First, we will import the necessary libraries.

In [None]:
import numpy as np

Run the command above! If there is no error or output, congratulations: your package is loaded and you can proceed.

# Lesson 01: NumPy

Data visualization needs data, obviously. The field is closely related to data science and machine learning, and therefore to statistics.

Before we start to plot visualizations, we need to be able to do simple operations on our data.

First, we will load the normal distribution data using the `genfromtxt` function from NumPy. The file is already in this folder.

In [None]:
normal_distribution = np.genfromtxt('./normal_distribution.csv', delimiter=',')
normal_distribution

`normal_distribution` is now loaded as a NumPy array (a `numpy.ndarray` object).

NumPy arrays allow many operations that are not implemented for built-in Python lists.

Feel free to familiarize yourself with NumPy's functionality by reading the official documentation on the NumPy website.

This is not necessary to complete this workshop, but it helps to better understand the behaviour of the functions introduced and to learn more about what NumPy can do for you.
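
As one illustration of an operation that arrays support and lists do not, here is a minimal sketch (the names `values_list` and `values_array` are made up for this example). Multiplying a list by 2 repeats it, whereas multiplying a NumPy array by 2 multiplies each element:

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array(values_list)

# A list is repeated by `*`; an array is multiplied element by element.
repeated_list = values_list * 2
doubled_array = values_array * 2

print(repeated_list)   # [1, 2, 3, 1, 2, 3]
print(doubled_array)   # [2 4 6]
```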

## Jupyter Notebook Tips

* You can access the documentation of a specific function in a code block by pressing Shift + Tab: click on `genfromtxt` above and try it!
* To expand the documentation, click on the arrow or press Shift + Tab a second time.
* Tab completion: start typing a command (like `np.gen`) and press Tab; you will see a list of `np` functions starting with `gen`.

## Slicing & Indexing with NumPy

The indexing syntax is the same as for lists: `[1, 2, 3][i]` returns the element at index `i`.

The basic slice syntax is `i:j:k`, where `i` is the starting index, `j` is the stopping index (excluded from the result), and `k` is the step, which cannot be 0.

Each index is optional: `:` on its own selects all the elements.

`i` and `j` can be negative; they are then interpreted as `n + i` and `n + j`, where `n` is the number of elements along the axis being sliced.

Examples:

In [None]:
print("Shape:\n", normal_distribution.shape)
print(f"The first value of the shape is the number of rows ({normal_distribution.shape[0]}) "
      f"and the second the number of columns ({normal_distribution.shape[1]}).\n")

print("Get the first row:\n", normal_distribution[0])
print("Get the last row:\n", normal_distribution[-1])

print("Get the first column:\n", normal_distribution[:, 0])
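
The same slicing rules can be tried on a small hand-made array, independently of the CSV file. This is only a sketch; the array `values` is made up for the illustration.

```python
import numpy as np

values = np.arange(10)       # [0 1 2 3 4 5 6 7 8 9]

first_three = values[:3]     # start omitted: begin at index 0
every_other = values[1:8:2]  # start at 1, stop before 8, step of 2
last_two = values[-2:]       # negative start: n + i = 10 - 2 = 8

print(first_three)  # [0 1 2]
print(every_other)  # [1 3 5 7]
print(last_two)     # [8 9]
```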

### Your turn!

How do we get the last column?

In [None]:
# TODO: Print the last column of `normal_distribution`.

print()

Great! 

Of course, you can combine indexing and slicing, as long as there is data left in your extract.

In [None]:
print("Get the first row without the first element:\n", normal_distribution[0][1:])
print("Get the first row with its elements reversed:\n", normal_distribution[0][::-1])
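
NumPy also accepts the row and column selections inside a single pair of brackets, separated by a comma. The sketch below, on a made-up array `grid`, suggests that both spellings select the same values:

```python
import numpy as np

grid = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8]])

chained = grid[1][::2]   # select row 1, then every second element
combined = grid[1, ::2]  # same selection in a single indexing step

print(chained)                            # [5 7]
print(combined)                           # [5 7]
print(np.array_equal(chained, combined))  # True
```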

In [None]:
# TODO: Print the first row without the last element

print()

# TODO: Print the third element of the first row without the first element

print()

# TODO: Print the third and the fourth element of the last row without the first element

print()

### Using statistics

In case you forgot, a little reminder...

* The *mean* is the arithmetic average. The mean of a list `L = X1, X2, ..., XN` is the sum of all its elements divided by the number of elements: `avg(L) = sum([X1, ..., XN]) / N`.
* The *median* is the middle value of the ordered list. If `N` is odd, the median is the value at index `int(N/2)` (counting from 0); if `N` is even, it is the average of the values at indexes `int(N/2) - 1` and `int(N/2)`.

The mean can be obtained with the function `np.mean(L)` and the median with `np.median(L)`.

In [None]:
np.mean([1, 2, 3])
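
To check the definitions above by hand, here is a small sketch (the lists `odd_values` and `even_values` are made up for the example):

```python
import numpy as np

odd_values = [7, 1, 5]       # sorted: [1, 5, 7]
even_values = [7, 1, 5, 3]   # sorted: [1, 3, 5, 7]

mean_odd = np.mean(odd_values)        # (7 + 1 + 5) / 3
median_odd = np.median(odd_values)    # middle value of the sorted list
median_even = np.median(even_values)  # average of the two middle values

print(mean_odd)     # 13 / 3, roughly 4.33
print(median_odd)   # 5.0
print(median_even)  # 4.0
```

Note that `np.median` sorts the values internally, so the input does not need to be ordered.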

Remember how we selected rows or columns just before?

You can also ask NumPy to do this by passing an `axis` value. `axis=0` runs down the rows (the vertical axis) and gives one result per column ("column-wise"), whereas `axis=1` runs across the columns (the horizontal axis) and gives one result per row ("row-wise").

In [None]:
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

print("Mean of each column:\n", np.mean(matrix, axis=0))
print("Mean of each row:\n", np.mean(matrix, axis=1))

print(np.mean(normal_distribution))
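
A non-square array makes the effect of `axis` easier to see, because the two results have different lengths. A minimal sketch with a made-up array `table`:

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])        # 2 rows, 3 columns

column_sums = np.sum(table, axis=0)  # collapse the rows: one value per column
row_sums = np.sum(table, axis=1)     # collapse the columns: one value per row

print(column_sums)  # [5 7 9]
print(row_sums)     # [ 6 15]
```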

### Your turn!

In [None]:
# TODO: Print the mean of the first row.
print()
# TODO: Print the mean of the last column.
print()
# TODO: Print the mean of the intersection of the first 2 rows and last 2 columns
print()

In [None]:
# TODO: Print the median of the last row.
print()
# TODO: Print the median of the first column.
print()
# TODO: Print the median of each row.
print()

## Filtering

To iterate over every element of a NumPy array `L`, whatever its number of dimensions, you can use `np.nditer(L)`.

This way you can apply filtering operations to your data.

In [None]:
for x in np.nditer(normal_distribution):
    if x > 105:
        print(x)

You can also use:
* Boolean indexing: `data[condition]`
* The NumPy function `extract`: `np.extract(condition, data)`

Note: the condition is not a function but an array of Boolean values (a mask).

In [None]:
print("Boolean mask:", normal_distribution > 105)

print("Filtered by boolean indexes:", normal_distribution[normal_distribution > 105])

print("Filtered by np.extract:", np.extract(normal_distribution > 105, normal_distribution))
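
Note that filtering a two-dimensional array with a Boolean mask returns a flat, one-dimensional array of the matching values. A small sketch on a made-up array `samples`:

```python
import numpy as np

samples = np.array([[98, 107],
                    [110, 93]])

mask = samples > 105                   # Boolean array, same shape as `samples`
selected = samples[mask]               # 1-D array of the values where mask is True
extracted = np.extract(mask, samples)  # same values, obtained with np.extract

print(mask.shape)     # (2, 2)
print(selected)       # [107 110]
print(selected.ndim)  # 1
```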

You may also want to get the **indexes** instead of the values.

You can use:
* Python's built-in `enumerate(iterable)` function. It yields tuples `(i, x)` where `i` is the index of the element `x` in the iterable (for a two-dimensional array, each `x` is a whole row).
* NumPy's `where` function: `np.where(condition)`
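
As a sketch of how the two approaches differ, here they are on a made-up one-dimensional array `readings` with an arbitrary threshold of 20. `enumerate` needs an explicit loop, while `np.where` returns a tuple containing one index array per dimension:

```python
import numpy as np

readings = np.array([12, 25, 7, 31])

# Built-in approach: loop over (index, value) pairs and keep matching indexes.
indexes_loop = [i for i, x in enumerate(readings) if x > 20]

# NumPy approach: np.where returns a tuple with one index array per dimension.
indexes_where = np.where(readings > 20)

print(indexes_loop)      # [1, 3]
print(indexes_where[0])  # [1 3]
```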

In [None]:
# TODO: Use both functions to print the indexes of the elements with values above 105.

print()

print()

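You can combine conditions with `&` (logical "and") and `|` (logical "or"). Wrap each condition in parentheses, because `&` and `|` bind more tightly than comparison operators.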

In [None]:
print(normal_distribution[(normal_distribution > 100) | (normal_distribution < 95)])

# TODO: Print the values that are higher than 95 but lower than 100

print()

## Array Creation

There are several ways to make NumPy arrays. Three attributes of an array are particularly useful to query: `shape`, `size` and the number of dimensions (`ndim`).

Don't forget to read the functions' documentation with Shift + Tab to better understand them.

In [None]:
a = np.array([1, 2, 3])
print(a.shape)
print(a.size)
print(a.ndim)

In [None]:
x = np.arange(100)
print(x.shape)
print(x.size)
print(x.ndim)

In [None]:
y = np.random.rand(5, 80)
print(y.shape)
print(y.size)
print(y.ndim)
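
A few other constructors are commonly used. The sketch below tries `np.zeros`, `np.linspace` and `reshape`; the variable names are chosen for this example only.

```python
import numpy as np

zeros = np.zeros((2, 3))            # 2 x 3 array filled with 0.0
points = np.linspace(0, 1, 5)       # 5 evenly spaced values from 0 to 1 inclusive
grid = np.arange(12).reshape(3, 4)  # 0..11 arranged in 3 rows and 4 columns

for array in (zeros, points, grid):
    print(array.shape, array.size, array.ndim)
# (2, 3) 6 2
# (5,) 5 1
# (3, 4) 12 2
```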