# Python & Numpy basics

This notebook will give you a brief introduction to the Python language. Even if you've already coded in Python before, it will be helpful to familiarize yourself with the methods you will need for this workshop.

**Instructions:**
- You will use Python 3
- Don't use the 'for' or 'while' loops unless you are explicitly told to
- After having coded your function, run the following cell in order to verify your result

**At the end of this workshop, you will have learnt to:**
- Use iPython notebooks
- Use numpy methods for vector and matrix operations
- Understand the concept of broadcasting
- Vectorize code

If you are stuck or don't understand a concept, don't hesitate to come ask for help. **That's why we're here !**

## About iPython Notebooks ##

iPython notebooks are interactive coding environments embedded in a web page. They let you execute code cell by cell and directly inspect the content of a variable or a result. You will use iPython notebooks for most of the workshops. You only need to write code between the `### Start of code ###` and `### End of code ###` comments. Once your code is written, you can execute the cell either by pressing SHIFT+ENTER or by clicking 'Run' ( ▶️ button in the top bar ).

You will often come across "~ X lines of code". These are simply estimates of how many lines you need: feel free to write more or fewer.

**Exercise**: Define the `test` variable as "Hello world !" in order to display "Hello world !" and execute the following two cells.

In [None]:
### Start of code ### (≈ 1 line of code)
test = None
### End of code ###

In [None]:
print ("test: " + test)

**Expected result**:
test: Hello world !

<font color='blue'>
    
**What you should remember**:
- Execute your cells by pressing SHIFT+ENTER (or by clicking Run)
- Write your code only inside the intended cells
- Don't modify the other cells

## 1 - Create basic functions with Numpy ##

Numpy is the fundamental package for scientific computing in Python. It is maintained by a large community ([numpy.org](https://www.numpy.org)).
In this exercise, you will discover multiple key numpy methods such as `np.exp`, `np.log` and `np.reshape`. Remember them well because they will be useful for future workshops.

### 1.1 - np.exp(), the sigmoid function ###

Before using np.exp(), you will use math.exp() to implement the sigmoid function. This will help you understand why np.exp() is preferable to math.exp().

**Exercise**: Create a function that returns the sigmoid of a real number x. Use math.exp() for the exponential function.

**Reminder**:
$sigmoid(x) = \frac{1}{1+e^{-x}}$ is also known as the logistic function. It is a non-linear function used not only in Machine Learning (cf. logistic regression) but also in Deep Learning.

<img src="images/Sigmoid.png" style="width:500px;height:228px;">

In order to refer to a function belonging to a specific package, you can call it using `package_name.function()`. Execute the code below to see an example with `math.exp()`.

In [None]:
import math

def sigmoid_math(x):
    """
    Arguments:
    x -- a scalar

    Return:
    s -- sigmoid_math(x)
    """
    
    ### Start of code ### (≈ 1 line of code)
    s = None
    ### End of code ###
    
    return s

In [None]:
sigmoid_math(2.5)

**Expected result**: 
0.9241418199787566

The `math` library is rarely used in deep learning because its functions only accept real numbers as inputs. In deep learning, we mostly work with matrices and vectors. That's where numpy becomes appealing.

In [None]:
### One of the reasons we use "numpy" rather than "math" ###
x = [1, 2, 3]
sigmoid_math(x) # an error should pop up if you execute this cell
                # That is because x is a list (a vector), not a single real number

If $ x = (x_1, x_2, ..., x_n)$ is a vector, then $np.exp(x)$ will apply the exponential function to every element of x. The result will therefore be: $np.exp(x) = (e^{x_1}, e^{x_2}, ..., e^{x_n})$

In [None]:
import numpy as np

# example of np.exp
x = np.array([1, 2, 3])
print(np.exp(x)) # the result is (exp(1), exp(2), exp(3))

Also, if x is a vector, then a Python operation like $s = x + 3$ or $s = \frac{1}{x}$ will return a vector of the same dimensions as x.

In [None]:
# example of a vector operation
x = np.array([1, 2, 3])
print (x + 3)
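
The text above also mentions $s = \frac{1}{x}$. Here is a minimal sketch of that case, using an arbitrary example vector chosen only for illustration:

```python
import numpy as np

# Arbitrary example vector (floats, so the reciprocals are exact)
x = np.array([1.0, 2.0, 4.0])

# The division is applied to every element: 1/1, 1/2, 1/4
s = 1 / x
print(s)

# The result has the same shape as x
print(s.shape)  # (3,)
```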

Anytime you need additional information on a numpy function, we encourage you to check out [the official documentation](https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.exp.html).

You can also create a new cell in this notebook and write `np.exp?`, for example, to get quick access to the docs.

**Exercise**: Implement the sigmoid function using numpy.

**Instructions**: x can be a real number, a vector or a matrix. The data structures that numpy uses to represent these objects (vectors, matrices, etc.) are called numpy arrays. You don't need to know more for now.
$$ \text{For } x \in \mathbb{R}^n \text{,     } sigmoid(x) = sigmoid\begin{pmatrix}
    x_1  \\
    x_2  \\
    ...  \\
    x_n  \\
\end{pmatrix} = \begin{pmatrix}
    \frac{1}{1+e^{-x_1}}  \\
    \frac{1}{1+e^{-x_2}}  \\
    ...  \\
    \frac{1}{1+e^{-x_n}}  \\
\end{pmatrix}\tag{1} $$

In [1]:
import numpy as np # np is an abbreviation of numpy. You will be able to write np.exp() instead of numpy.exp()

def sigmoid(x):
    """
    Arguments:
    x -- a scalar or a numpy array of any size

    Return:
    s -- sigmoid(x)
    """
    
    ### Start of code ### (≈ 1 line of code)
    s = None
    ### End of code ###
    
    return s

In [None]:
x = np.array([1.5, 2.5, 3.5])
sigmoid(x)

**Expected result**: 
array([0.81757448, 0.92414182, 0.97068777])


### 1.2 - Sigmoid gradient

As you heard at the start of the workshop, you will need to calculate the gradients in order to optimize the cost function by using backpropagation.

**Exercise**: Implement the `sigmoid_deriv` function in order to compute the sigmoid function's gradient: $$sigmoid\_deriv(x) = \sigma'(x) = \sigma(x) (1 - \sigma(x))\tag{2}$$
You will code this function in 2 steps:
1. Define s as the sigmoid of x. The sigmoid(x) function might be useful.
2. Compute $\sigma'(x) = s(1-s)$
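
For reference, equation (2) can be obtained from the definition of the sigmoid with the chain rule. A short derivation, writing $\sigma(x) = (1+e^{-x})^{-1}$:

$$\sigma'(x) = \frac{e^{-x}}{(1+e^{-x})^{2}} = \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} = \sigma(x)\,(1 - \sigma(x))$$

The last step uses $1 - \sigma(x) = \frac{e^{-x}}{1+e^{-x}}$.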

In [2]:
def sigmoid_deriv(x):
    """    
    Arguments:
    x -- a scalar or a numpy array

    Return:
    ds -- your gradient
    """
    
    ### Start of code ### (≈ 2 lines of code)
    s = None
    ds = None
    ### End of code ###
    
    return ds

In [None]:
x = np.array([1.5, 2.5, 3.5])
print ("sigmoid_deriv(x) = " + str(sigmoid_deriv(x)))

**Expected result**: 
sigmoid_deriv(x) = [0.14914645 0.07010372 0.02845302]



### 1.3 - Reshaping Arrays ###

Two numpy methods that are often used in deep learning are [np.shape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html) and [np.reshape()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html). 
- X.shape allows you to get the shape (dimensions) of a matrix or a vector called X.
- X.reshape(...) is used to change X's shape



For example, in computer science, an image is represented by a 3-dimensional array of shape $(height, width, depth = 3)$.
However, when you use an image as an input to an algorithm, you need to convert it to a vector of shape (height\*width\*3, 1). In other words, you flatten or reshape the 3D array into a column vector.


<img src="images/image2vector_kiank.png" style="width:500px;height:300;">

**Exercise**: Implement `image_to_vector()` which takes as input an array of shape (height, width, 3) and returns a vector of shape (height\*width\*3, 1). For example, if you want to reshape an array v of shape (a, b, c) into an array of shape (a\*b, c), you'll write:
``` python
v = v.reshape((v.shape[0]*v.shape[1], v.shape[2])) # v.shape[0] = a ; v.shape[1] = b ; v.shape[2] = c
```
- Please don't hardcode the image's dimensions as constants ! Use `image.shape[0]`, etc. to access the required values instead. 
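
To make the (a, b, c) to (a\*b, c) example above concrete, here is a small sketch on a made-up array. The shape (2, 3, 4) and the variable names `w` and `flat` are assumptions chosen for illustration; they are not part of the exercise.

```python
import numpy as np

# Hypothetical array of shape (a, b, c) = (2, 3, 4), filled with 0..23
v = np.arange(24).reshape((2, 3, 4))

# Merge the first two axes: (a, b, c) -> (a*b, c)
w = v.reshape((v.shape[0] * v.shape[1], v.shape[2]))
print(w.shape)  # (6, 4)

# -1 asks numpy to infer that dimension from the total number of elements:
# here the last two axes are merged, (a, b, c) -> (a, b*c)
flat = v.reshape((v.shape[0], -1))
print(flat.shape)  # (2, 12)

# Reshaping never changes the number of elements
print(v.size == w.size == flat.size)  # True
```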

In [None]:
def image_to_vector(image):
    """
    Argument:
    image -- a numpy array of shape (height, width, depth)
    
    Returns:
    v -- a vector of shape (height*width*depth, 1)
    """
    
    ### Start of code ### (≈ 1 line of code)
    v = None
    ### End of code ###
    
    return v

In [None]:
# An array of shape (3, 3, 2). In general, images will be of shape (nb_pix_x, nb_pix_y, 3) where depth corresponds to the three RGB values.
image = np.array([[[ 0.67126139,  0.29381281],
        [ 0.90714982,  0.52835547],
        [ 0.42245251 ,  0.45012151]],

       [[ 0.92814219,  0.96677647],
        [ 0.85114703,  0.52351845],
        [ 0.19981397,  0.27417313]],

       [[ 0.6213595,  0.00531265],
        [ 0.1210313,  0.49974237],
        [ 0.3432129,  0.94631277]]])
print ("image_to_vector(image) = " + str(image_to_vector(image)))

**Expected result**: 

image_to_vector(image) = [[0.67126139]
 [0.29381281]
 [0.90714982]
 [0.52835547]
 [0.42245251]
 [0.45012151]
 [0.92814219]
 [0.96677647]
 [0.85114703]
 [0.52351845]
 [0.19981397]
 [0.27417313]
 [0.6213595 ]
 [0.00531265]
 [0.1210313 ]
 [0.49974237]
 [0.3432129 ]
 [0.94631277]]

### 1.4 - Normalize your data

Another common technique used in Machine Learning and Deep Learning is data normalization. It often leads to better performance because gradient descent converges faster after normalization. Here, by normalization we mean replacing x by $ \frac{x}{\| x\|} $ (dividing each row vector of x by its norm). 

For example, if $$x = 
\begin{bmatrix}
    0 & 3 & 4 \\
    2 & 6 & 4 \\
\end{bmatrix}\tag{3}$$ then $$\| x\| = np.linalg.norm(x, axis = 1, keepdims = True) = \begin{bmatrix}
    5 \\
    \sqrt{56} \\
\end{bmatrix}\tag{4} $$ and $$ x\_normalized = \frac{x}{\| x\|} = \begin{bmatrix}
    0 & \frac{3}{5} & \frac{4}{5} \\
    \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \\
\end{bmatrix}\tag{5}$$ 
Note that you can divide arrays of different shapes and it works well thanks to broadcasting, which we will talk about in section 1.5.
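
The worked example in equations (3) to (5) can be reproduced numerically. This is only a sketch that checks those numbers; the variable names are chosen here for illustration.

```python
import numpy as np

# The matrix from equation (3)
x = np.array([[0.0, 3.0, 4.0],
              [2.0, 6.0, 4.0]])

# One norm per row; keepdims=True keeps the result as a column of shape (2, 1)
x_norm = np.linalg.norm(x, axis=1, keepdims=True)
print(x_norm.shape)  # (2, 1)

# (2, 3) divided by (2, 1): each row is divided by its own norm
x_normalized = x / x_norm

# Every row of the result should now have norm 1 (up to rounding)
print(np.linalg.norm(x_normalized, axis=1))
```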


**Exercise**: Implement normalize_rows() to normalize the rows of a matrix. After applying this function to a matrix, each of its rows should be a vector of unit length (norm 1).

In [None]:
def normalize_rows(x):
    """
    Argument:
    x -- a numpy matrix x of shape (n, m)
    
    Returns:
    x -- the normalized matrix x
    """
    
    ### Start of code ### (≈ 2 lines of code)
    # First compute x's norm. Use np.linalg.norm(..., ord = 2, axis = ..., keepdims = True)
    x_norm = None
    
    # Divide x by x_norm.
    x = None
    ### End of code ###

    return x

In [None]:
x = np.array([
    [2, 3, 6],
    [5, 2, 8]])
print("normalize_rows(x) = " + str(normalize_rows(x)))

**Expected result**: 

normalize_rows(x) = [[0.28571429 0.42857143 0.85714286]
 [0.51847585 0.20739034 0.82956136]]

**Note**:
In the normalize_rows function, try printing the shapes of x_norm and x, then rerun the cell. You will observe that they have different shapes. That's expected: x_norm contains the norm of each row of x, so it has the same number of rows as x but only one column. How can x be divided by x_norm, then ? That is what broadcasting is for, and we will learn more about it next !

### 1.5 - Broadcasting and softmax ###

Broadcasting is a very important concept to understand in numpy. It is what makes mathematical operations between arrays of different shapes possible. For more information, you can consult [the official broadcasting docs](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
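
As a rough illustration of what "arrays of different shapes" means in practice, the sketch below combines a (2, 3) array with a (3,) vector and with a (2, 1) column. All values and names are made up for the example.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # shape (2, 3)
row = np.array([10.0, 20.0, 30.0])   # shape (3,)
col = np.array([[1.0], [2.0]])       # shape (2, 1)

# (2, 3) + (3,): row is added to every row of a
print(a + row)

# (2, 3) / (2, 1): each row of a is divided by the matching entry of col
print(a / col)

# Shapes that cannot be aligned raise a ValueError
try:
    a + np.array([1.0, 2.0])         # (2, 3) + (2,) is not compatible
except ValueError as err:
    print("broadcast error:", err)
```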

**Exercise**: Implement the softmax function by using numpy. You can see softmax as a normalization function used when your algorithm needs to classify 2 classes or more.

Warning: Your code needs to be able to function with a vector but also with a matrix.

**Instructions**:
- $ \text{For each } x \in \mathbb{R}^{1\times n} \text{,     } softmax(x) = softmax(\begin{bmatrix}
    x_1  &&
    x_2 &&
    ...  &&
    x_n  
\end{bmatrix}) = \begin{bmatrix}
     \frac{e^{x_1}}{\sum_{j}e^{x_j}}  &&
    \frac{e^{x_2}}{\sum_{j}e^{x_j}}  &&
    ...  &&
    \frac{e^{x_n}}{\sum_{j}e^{x_j}} 
\end{bmatrix} $ 

- $\text{For a matrix } x \in \mathbb{R}^{m \times n} \text{,  $x_{ij}$ corresponds to the element in the $i^{th}$ row and the $j^{th}$ column of $x$, we therefore have: }$  $$softmax(x) = softmax\begin{bmatrix}
    x_{11} & x_{12} & x_{13} & \dots  & x_{1n} \\
    x_{21} & x_{22} & x_{23} & \dots  & x_{2n} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    x_{m1} & x_{m2} & x_{m3} & \dots  & x_{mn}
\end{bmatrix} = \begin{bmatrix}
    \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots  & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\
    \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots  & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots  & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}}
\end{bmatrix} = \begin{pmatrix}
    softmax\text{(first row of x)}  \\
    softmax\text{(second row of x)} \\
    ...  \\
    softmax\text{(last row of x)} \\
\end{pmatrix} $$

In [None]:
def softmax(x):
    """
    Argument:
    x -- A numpy array of shape (m, n); a row vector has shape (1, n)

    Returns:
    s -- A numpy array equal to the softmax of x, of the same shape (m, n)
    """
    
    ### Start of code ### (≈ 3 lines of code)
    # Compute the exponential of each element of x. Use np.exp()
    x_exp = None

    # Create an x_sum vector that contains the sum of each row of x_exp. Use np.sum(..., axis = 1, keepdims = True)
    x_sum = None
    
    # Compute softmax(x) by dividing x_exp by x_sum. That will automatically use numpy's broadcasting.
    s = x_exp / x_sum
    ### End of code ###
    print(x_exp.shape, x_sum.shape, s.shape)
    
    return s

In [None]:
x_vect = np.array([[9, 4, 0, 0 ,0]])

print("softmax(x_vect) = " + str(softmax(x_vect)))

x_matr = np.array([
    [1, 7, 5, 0, 6],
    [3, 4, 0, 2 ,0]])

print("softmax(x_matr) = " + str(softmax(x_matr)))


**Expected result**:

softmax(x_vect) = [[9.92941993e-01 6.69039052e-03 1.22538777e-04 1.22538777e-04
  1.22538777e-04]]

softmax(x_matr) = [[1.64525645e-03 6.63743823e-01 8.98279582e-02 6.05256022e-04
  2.44177707e-01]
 [2.38906644e-01 6.49415590e-01 1.18944614e-02 8.78888428e-02
  1.18944614e-02]]


**Note**:
- If you print the shapes of x_exp, x_sum and s and rerun the cell above, you will see that in the matrix case x_sum is of shape (2, 1) while x_exp and s are of shape (2, 5). **x_exp/x_sum** works correctly thanks to broadcasting.
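
The role of `keepdims = True` in that division can be seen in a small sketch. It reuses the values of `x_matr` from the test cell; the names `kept`, `dropped` and `rows` are introduced here only for illustration.

```python
import numpy as np

x_exp = np.exp(np.array([[1.0, 7.0, 5.0, 0.0, 6.0],
                         [3.0, 4.0, 0.0, 2.0, 0.0]]))   # shape (2, 5)

kept = np.sum(x_exp, axis=1, keepdims=True)    # shape (2, 1)
dropped = np.sum(x_exp, axis=1)                # shape (2,)
print(kept.shape, dropped.shape)

# (2, 5) / (2, 1) broadcasts row by row
rows = x_exp / kept
print(rows.sum(axis=1))   # each row sums to 1 (up to rounding)

# (2, 5) / (2,) cannot be broadcast: the trailing dimensions 5 and 2 differ
try:
    x_exp / dropped
except ValueError as err:
    print("without keepdims:", err)
```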

Congratulations ! You now have a good understanding of Python and its numpy library, and you have implemented some functions which will be helpful in deep learning !


<font color='blue'>
    
**What you must remember:**
- np.exp(x) works for any np.array and applies the exponential function to all its elements
- the sigmoid function and its gradient
- image_to_vector is often used in deep learning
- np.reshape is often used. Later, you will see that its correct usage will prevent many bugs
- numpy has very efficient built-in functions
- broadcasting is extremely useful

## 2 - Vectorization ##

In deep learning, you work with very large datasets. A computation that isn't optimized can therefore become a bottleneck in your algorithm and lead to a model that takes far too long to run. To make sure your code is computationally efficient, you will use vectorization. For example, try to spot the difference between the following implementations of the dot, outer and elementwise products.

In [None]:
import time

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### Classic implementation of a vector dot product ###
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot += x1[i] * x2[i]
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms\n")

### Classic implementation of an outer product ###
tic = time.process_time()
outer = np.zeros((len(x1),len(x2))) # create a len(x1)*len(x2) matrix filled with zeros
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i,j] = x1[i] * x2[j]
toc = time.process_time()
print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms\n")

### Classic implementation of an elementwise product ###
tic = time.process_time()
mul = np.zeros(len(x1))
for i in range(len(x1)):
    mul[i] = x1[i] * x2[i]
toc = time.process_time()
print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms\n")

### Classic implementation of a general dot product ###
W = np.random.rand(3, len(x1)) # Random 3*len(x1) numpy array
tic = time.process_time()
gdot = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i,j]*x1[j]
toc = time.process_time()
print ("gdot = " + str(gdot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

In [None]:
x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### Numpy implementation of a dot product ###
tic = time.process_time()
dot = np.dot(x1,x2)
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms\n")

### Numpy implementation of an outer product ###
tic = time.process_time()
outer = np.outer(x1,x2)
toc = time.process_time()
print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms\n")

### Numpy implementation of an elementwise product ###
tic = time.process_time()
mul = np.multiply(x1,x2)
toc = time.process_time()
print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms\n")

### Numpy implementation of a general dot product ###
tic = time.process_time()
gdot = np.dot(W,x1) # W was defined in the previous cell
toc = time.process_time()
print ("gdot = " + str(gdot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

As you can observe, numpy's implementation is cleaner and more efficient. For vectors and matrices of greater sizes, the difference in terms of computing time becomes greater.
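
With only 15 elements, the timings above are too small to compare reliably. The sketch below repeats the dot product on larger random vectors. The size `n`, the random seed and the use of `time.perf_counter()` (a wall-clock timer, instead of the `time.process_time()` used in the cells above) are choices made for this illustration, and the measured times will vary from one machine to another.

```python
import time
import numpy as np

n = 200_000                      # arbitrary size chosen for illustration
rng = np.random.default_rng(0)   # fixed seed so the vectors are reproducible
a = rng.random(n)
b = rng.random(n)

# Explicit Python loop
tic = time.perf_counter()
dot_loop = 0.0
for i in range(n):
    dot_loop += a[i] * b[i]
loop_ms = 1000 * (time.perf_counter() - tic)

# Vectorized version
tic = time.perf_counter()
dot_np = np.dot(a, b)
np_ms = 1000 * (time.perf_counter() - tic)

print("loop:   " + str(loop_ms) + " ms")
print("np.dot: " + str(np_ms) + " ms")
print("same result:", np.isclose(dot_loop, dot_np))
```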

### 2.1 - Implement the L1 and L2 loss functions ###

**Exercise**: Implement the numpy version of the L1 loss function. The function abs(x) (absolute value of x) can prove very useful.

**Reminder**:
- The loss is used to evaluate the performance of your model. The greater the loss, the further your predictions ($ \hat{y} $) are from the real values ($y$). In deep learning, you will use optimization algorithms such as gradient descent to train your model and minimize the cost.
- L1 is defined as follows:

$$\begin{align*} & L_1(\hat{y}, y) = \sum_{i=1}^m|y^{(i)} - \hat{y}^{(i)}| \end{align*}\tag{6}$$

In [None]:
def L1_function(y_predicted, y):
    """
    Arguments:
    y_predicted -- vector of size m containing your predicted values
    y -- vector of size m containing the real values
    
    Returns:
    L1 -- value of the L1 loss function
    """
    
    ### Start of code ### (≈ 1 line of code)
    L1 = None
    ### End of code ###
    
    return L1

In [None]:
y_predicted = np.array([.8, 0.3, 0.2, .6, .2])
y = np.array([1, 1, 0, 1, 0])
print("L1 = " + str(L1_function(y_predicted,y)))

**Expected result**:
L1 = 1.7
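
The expected value can be checked by hand from definition (6), using the `y` and `y_predicted` vectors of the test cell:

$$L_1 = |1 - 0.8| + |1 - 0.3| + |0 - 0.2| + |1 - 0.6| + |0 - 0.2| = 0.2 + 0.7 + 0.2 + 0.4 + 0.2 = 1.7$$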

**Exercise**: Implement the vectorized version of the L2 loss function. There are many ways to do it. You might find the np.dot() function useful. As a reminder, if $x = [x_1, x_2, ..., x_n]$, then `np.dot(x,x)` = $\sum_{j=1}^n x_j^{2}$.
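
The reminder about `np.dot(x,x)` can be checked on a tiny vector. This is a sketch with made-up values; it is not the solution to the exercise.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])

# np.dot(x, x) multiplies x by itself elementwise and sums: 1 + 4 + 9
print(np.dot(x, x))    # 14.0

# The same quantity written with explicit elementwise operations
print(np.sum(x ** 2))  # 14.0
```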

- The L2 loss function is defined as follows: $$\begin{align*} & L_2(\hat{y},y) = \sum_{i=1}^m(y^{(i)} - \hat{y}^{(i)})^2 \end{align*}\tag{7}$$

In [None]:
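def L2_function(y_predicted, y):
    """
    Arguments: y_predicted, y -- vectors of size m (predicted and real values)
    Returns: L2 -- value of the L2 loss function
    """
    ### Start of code ### (≈ 1 line of code)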
    L2 = None
    ### End of code ###
    
    return L2

In [None]:
y_predicted = np.array([.8, 0.3, 0.2, .6, .2])
y = np.array([1, 1, 0, 1, 0])
print("L2 = " + str(L2_function(y_predicted,y)))

**Expected result**: 
L2 = 0.77

Congratulations, you have finished this workshop ! We hope that this little exercise allowed you to familiarize yourself with Python and numpy, which constitute a good base for future workshops.

<font color='blue'>

**What you must remember:**
- Vectorization is very important in deep learning. It gives you better computational performance and more readable code
- The L1 and L2 loss functions
- You can now explore other numpy functions such as np.sum, np.dot, np.multiply, np.maximum, etc.
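
As a starting point for exploring the functions listed above, here is a brief sketch of what each one returns on two small made-up vectors:

```python
import numpy as np

a = np.array([1.0, -2.0, 3.0])
b = np.array([4.0, 5.0, -6.0])

print(np.sum(a))          # sum of all elements: 2.0
print(np.dot(a, b))       # 1*4 + (-2)*5 + 3*(-6) = -24.0
print(np.multiply(a, b))  # elementwise product: 4, -10, -18
print(np.maximum(a, b))   # elementwise maximum: 4, 5, 3
print(np.maximum(a, 0))   # broadcasting against a scalar: 1, 0, 3
```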