First, we will import the necessary libraries.

In [None]:
import numpy as np

Run the command above: select the cell and type Ctrl + Enter.

If there is no error or output, congratulations: your package is loaded and you can proceed to the next section!

# Lesson 01: NumPy

Data visualization needs data, obviously. The field is closely related to data science and machine learning, and therefore to statistics.

Before we start to plot visualizations, we need to be able to do simple operations on our data.

First, we will use very simple data.

In [None]:
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
]

matrix = np.array(matrix)

`matrix`, which was a built-in Python list, is now a `numpy.ndarray` object.

NumPy arrays allow many operations that are not implemented for Python lists.

Feel free to familiarize yourself with NumPy's functionality by reading the official documentation on its website.

This is not necessary to complete this workshop, but it will help you understand the behaviour of the functions introduced here and learn more about what NumPy can do for you.
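As a small illustration of what arrays add over lists, the sketch below applies the same operators to both. The sample values and variable names are made up for this example.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration.
values_list = [1, 2, 3]
values_array = np.array(values_list)

# With a list, `*` repeats the sequence.
repeated = values_list * 2      # [1, 2, 3, 1, 2, 3]

# With an array, arithmetic is applied element by element.
doubled = values_array * 2      # array([2, 4, 6])
shifted = values_array + 10     # array([11, 12, 13])

print(repeated)
print(doubled)
print(shifted)
```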

## Jupyter Notebook Tips

* You can access the documentation of a specific function in a code block by pressing Shift + Tab: click on `np.array` above and try it!
* To expand the documentation, click on the arrow or press Shift + Tab a second time.
* Tab completion: start typing a command (like `np.arr`) and press Tab to see a list of `np` members starting with `arr`.

## Slicing & Indexing with NumPy

Slicing has multiple uses:

* In machine learning, to split a big dataset into multiple sets.
* In data mining, to extract the data you need from large or dynamic content.
* In data visualization, to filter the data the way you want.

The indexing syntax is the same as for lists: `[1, 2, 3][i]` will return the number at the index `i`. Indexing starts from 0.

In [None]:
[1, 2, 3][0]


The basic slice syntax is `i:j:k`, where `i` is the starting index, `j` is the stopping index (excluded from the result), and `k` is the step. `k` cannot be 0.

Let `n` be the length of the string or array (the number of elements or characters).

By default, `i` = 0, `j` = `n`, and `k` = 1.

Every index is optional, so `:` alone selects all the elements (from 0 to `n` with a step of 1).

`i` and `j` can be negative, in which case they are interpreted as `n + i` and `n + j`.
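Before moving to `matrix`, here is a minimal sketch of these rules on a one-dimensional array. The `letters` array is a hypothetical example, not part of the workshop data.

```python
import numpy as np

letters = np.array(["a", "b", "c", "d", "e", "f"])
n = len(letters)  # n = 6

# Omitted indices fall back to their defaults (i=0, j=n, k=1).
print(letters[:])       # all six elements
print(letters[2:])      # from index 2 to the end
print(letters[:4:2])    # indices 0 and 2: ['a' 'c']

# A negative index counts from the end: -2 is read as n - 2.
print(letters[-2], letters[n - 2])      # the same element twice
print(letters[1:-1], letters[1:n - 1])  # the same slice twice
```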

Examples:

In [None]:
print("Get every line:\n", matrix[:])
print()

print("Get the first line (i=0):\n", matrix[0])
print("Get the last line (i=-1):\n", matrix[-1])
print()

print("Get the first two lines (j=2):\n", matrix[:2])
print("Get the second and third lines (i=1, j=3):\n", matrix[1:3])

### Your turn!

How do we get the two last lines of `matrix`?

In [None]:
# TODO: Print the last two lines of `matrix`.

print(matrix)

More examples, using `k`

In [None]:
print("Get every other line, starting from the first (k=2):\n", matrix[::2])
print("Get every other line, starting from the second (i=1, k=2):\n", matrix[1::2])
print("Get every line, in reverse order (k=-1):\n", matrix[::-1])

In [None]:
# TODO: Print the first two lines, in reverse order.

print(matrix)

## Slicing multidimensional arrays

You can use the indexing syntax along **multiple axes**.

You can address each axis by separating the indexes with commas: `matrix[:, :]` is equivalent to `matrix`. It selects every element along axis 0, meaning the *lines*, and every element along axis 1, meaning the *columns*.

But `matrix[:, :, :]` will raise an error: `matrix` only has 2 axes.
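The sketch below reproduces that error on a stand-in array called `grid` (a hypothetical name, with the same shape as `matrix`) and catches it so the cell keeps running.

```python
import numpy as np

grid = np.arange(12).reshape(4, 3)  # 2 axes: 4 lines, 3 columns
print("Number of axes:", grid.ndim)

try:
    grid[:, :, :]  # three indexes for a two-axis array
except IndexError as error:
    print("IndexError:", error)
```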

What we did before was equivalent to using the indexing syntax on axis 0 only, meaning we could only work with *lines*.

First, let's work with the columns of `matrix`.

In [None]:
print("Get every element of the first column:\n", matrix[:, 0])
print("Get the first two elements of the first column:\n", matrix[:2, 0])
print("Get every element of the first column in reverse order:\n", matrix[::-1, 0])

In [None]:
print(matrix)
print("Shape of the matrix:", matrix.shape)
print(f"The first value of the shape is the number of rows ({matrix.shape[0]}) "
      f"and the second the number of columns ({matrix.shape[1]}).\n")

print("Get the second column:\n", matrix[:, 1])
print("Above shape:", matrix[:, 1].shape)

print("Get a matrix subset without the first column (starting from the second):\n", matrix[:, 1:])
print("Above shape:", matrix[:, 1:].shape)

print("Get a matrix subset of the first two lines and first two columns:\n", matrix[:2, :2])
print("Above shape:", matrix[:2, :2].shape)

print("Get a matrix subset window of the first two lines without the first column:\n", matrix[:2, 1:])
print("Above shape:", matrix[:2, 1:].shape)
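One detail worth noticing in the shapes above: an integer index removes its axis from the result, while a slice keeps it, even when the slice selects a single column. A minimal sketch, using a stand-in array named `grid` (hypothetical) that holds the same values as `matrix`:

```python
import numpy as np

grid = np.arange(1, 13).reshape(4, 3)  # values 1..12 in 4 lines, 3 columns

column_1d = grid[:, 1]    # integer index: axis 1 is dropped
column_2d = grid[:, 1:2]  # slice of length 1: axis 1 is kept

print(column_1d.shape)    # (4,)
print(column_2d.shape)    # (4, 1)
```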

In [None]:
# TODO: Print the last column of the matrix.

print(matrix)

Now, we're going to work with a three-dimensional array. Let's define `d3array`:

In [None]:
d3array = np.array([
       [[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],
       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],
       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

print("d3array shape:", d3array.shape)

**From now on, use only `d3array` to solve the exercises.**

You can think of it as a cube of shape (x, y, z).

x indexes the rows (axis 0, the vertical axis, starting from 0 and going down to N)  
y indexes the columns (axis 1, the horizontal axis, starting from 0 and going up to N)  
z indexes the planes (axis 2, the diagonal axis, starting from 0 and going up to N)

In [None]:
print("We can select all elements from all axes...")

d3array[:, :, :]

In [None]:
print("Select the plane (matrix (3, 3)) at x=0:\n", d3array[0, :, :])
print("Select the plane (matrix (3, 3)) at y=0:\n", d3array[:, 0, :])
print("Select the plane (matrix (3, 3)) at z=0:\n", d3array[:, :, 0])

In [None]:
# TODO: Print the intersection of the planes at x=0 and z=0

print()

Great!

Of course, you can combine indexing and slicing, as long as there is data left in your extract.

In [None]:
print("Get the first matrix:\n", d3array[0])
print("Get the last matrix without the first line:\n", d3array[-1][1:])
print("Get the third element of the last line of the first matrix:\n", d3array[0][-1][2])
print("Get the third element of the last line of the first matrix:\n", d3array[0, -1, 2])
print("Get the third element of the last line of the first matrix:\n", d3array[0, -1, :][2])

In [None]:
# TODO: Print the first line of the first matrix without the last element

print()

# TODO: Print the third element of the first line of the second matrix

print()

### Using statistics

In case you forgot, a little reminder...

* The *mean* is the arithmetic average. The mean of a list `L = X1, X2, ... XN` is the sum of its elements divided by the number of elements: `avg(L) = sum([X1, ... XN]) / N`.
* The *median* is the middle value of the ordered list. If `N` is odd, the median is the value at index `int(N/2)`; if `N` is even, it is the average of the values at indexes `int(N/2) - 1` and `int(N/2)` (indexes start from 0).

These functions are provided to you by NumPy - in a more optimized way than if you had coded it yourself.

The mean can be obtained by the function `np.mean(L)` and the median by `np.median(L)`.
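To make the two definitions concrete, the sketch below computes both by hand and compares them with NumPy's results. The `samples` list is a hypothetical example; with zero-based indexes and an even `n`, the two middle values of the ordered list sit at `n // 2 - 1` and `n // 2`.

```python
import numpy as np

samples = [7, 1, 4, 12]     # hypothetical data, n = 4 (even)
ordered = sorted(samples)   # [1, 4, 7, 12]
n = len(ordered)

manual_mean = sum(samples) / n  # 24 / 4 = 6.0

if n % 2 == 1:
    manual_median = ordered[n // 2]
else:
    # Average of the two middle values: (4 + 7) / 2 = 5.5
    manual_median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(manual_mean, np.mean(samples))
print(manual_median, np.median(samples))
```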

In [None]:
np.mean([1, 2, 3])

Remember how we selected rows or columns just before?

You can also ask NumPy to compute a statistic along one axis by passing an `axis` value. `axis=0` runs down the vertical axis and returns one value per column ("column-wise"), whereas `axis=1` runs along the horizontal axis and returns one value per row ("row-wise").

In [None]:
print("Mean of each column:\n", np.mean(matrix, axis=0))
print("Mean of each row:\n", np.mean(matrix, axis=1))

print(np.mean(matrix))
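A useful way to remember the `axis` argument is that the axis you name is the one that disappears from the result. The sketch below shows this with `np.sum`, which accepts the same `axis` argument, on a stand-in array named `grid` (hypothetical) holding the same values as `matrix`.

```python
import numpy as np

grid = np.arange(1, 13).reshape(4, 3)  # shape (4, 3)

per_column = np.sum(grid, axis=0)  # collapses axis 0: one value per column
per_row = np.sum(grid, axis=1)     # collapses axis 1: one value per row

print(per_column, per_column.shape)  # [22 26 30] (3,)
print(per_row, per_row.shape)        # [ 6 15 24 33] (4,)
```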

### Your turn!

In [None]:
# TODO: Print the mean of the first line.
print()
# TODO: Print the mean of the last column.
print()
# TODO: Print the mean of the intersection of the first 2 rows and last 2 columns
print()

In [None]:
# TODO: Print the median of the last line.
print()
# TODO: Print the median of the first column.
print()
# TODO: Print the median of each row.
print()

## Filtering

To iterate over every element of your NumPy array, you can use `np.nditer(L)`.

This way you can apply filtering operations on your data.

In [None]:
for x in np.nditer(matrix):
    if x > 5:
        print(x)

You can also use:
* Boolean indexes: `data[condition]`
* The function "extract" from NumPy: `np.extract(condition, data)`

Note: The condition is not a function but an array of Boolean values, such as `matrix > 5`.

In [None]:
print("Boolean mask:", matrix > 5)

print("Filtered by boolean indexes:", matrix[matrix > 5])

print("Filtered by np.extract:", np.extract(matrix > 5, matrix))
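Because the mask is an ordinary array, you can inspect it like any other. The following sketch, on a hypothetical `readings` array, shows that the mask has the shape of the data, that summing it counts the matches, and that boolean indexing returns a flat one-dimensional result.

```python
import numpy as np

readings = np.array([[2, 9, 4],
                     [7, 1, 8]])  # hypothetical data

mask = readings > 5  # Boolean array with the same shape as `readings`
print(mask.shape, mask.dtype)

# True counts as 1, so summing the mask counts the matching elements.
print(mask.sum())      # 3

# The selected values come back as a flat 1-D array, in row order.
print(readings[mask])  # [9 7 8]
```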

You may also want to get the **indexes** instead of the values.

You can use:
* Python's built-in `enumerate(iterable)` function. It yields tuples `(i, x)`, where `i` is the index of the element `x` in the iterable.
* NumPy "where" function: `np.where(condition)`
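As a rough sketch of both approaches, here they are on a hypothetical one-dimensional `scores` array, with a different condition from the exercise below (even values). Note that `np.where(condition)` returns a tuple containing one index array per axis.

```python
import numpy as np

scores = np.array([12, 7, 30, 7, 25])  # hypothetical data

# enumerate yields (index, value) pairs one at a time.
even_idx = [i for i, x in enumerate(scores) if x % 2 == 0]
print(even_idx)         # [0, 2]

# np.where returns a tuple; for a 1-D array it holds a single index array.
where_result = np.where(scores % 2 == 0)
print(where_result[0])  # [0 2]
```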

In [None]:
# TODO: Use both functions to print the indexes of the elements with values above 5.

print()

print()

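You can combine conditions by using logical "and" with `&` and logical "or" with `|`. Wrap each condition in parentheses, because `&` and `|` bind more tightly than comparisons such as `>`.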

In [None]:
print(matrix[(matrix > 3) | (matrix < 8)])

# TODO: Print the values that are higher than 3 but lower than 8

print()

## Array Creation

There are several ways to create NumPy arrays. Every array has three attributes you will query often: its shape, its size (the total number of elements), and its number of dimensions.

Don't forget to read each function's documentation with Shift + Tab to better understand it.

In [None]:
a = np.array([1, 2, 3])
print(a.shape)
print(a.size)
print(a.ndim)

In [None]:
x = np.arange(100)
print(x.shape)
print(x.size)
print(x.ndim)

In [None]:
y = np.random.rand(5, 80)
print(y.shape)
print(y.size)
print(y.ndim)
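A few other creation functions follow the same pattern. The sketch below uses `np.zeros`, `np.linspace` and `reshape`, which are not covered elsewhere in this lesson and are shown here only as further examples. The variable names are hypothetical.

```python
import numpy as np

zeros = np.zeros((2, 3))           # 2x3 array filled with 0.0
evenly = np.linspace(0, 1, 5)      # 5 evenly spaced values from 0 to 1 inclusive
grid = np.arange(6).reshape(2, 3)  # values 0..5 arranged in 2 rows, 3 columns

for arr in (zeros, evenly, grid):
    print(arr.shape, arr.size, arr.ndim)
```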

This is the end of your NumPy initiation. 

If you have no questions or confusion to share with the class, you can now move on to Lesson 02.