# Python Basics with NumPy

Welcome to your first assignment. This assignment gives you a brief introduction to the NumPy library in Python. Complete the following exercises to earn credit:
1. Exercise 1.1.1 (5%)
2. Exercise 1.1.2 (5%)
3. Exercise 1.2 (10%)
4. Exercise 1.3 (10%)
5. Exercise 1.4 (10%)
6. Exercise 2.1 (40%)
7. Exercise 2.2 (20%)


**Instructions:**
- Avoid using for-loops and while-loops, unless you are explicitly told to do so.
- Write your code between the ### START CODE HERE ### and ### END CODE HERE ### comments. **Do not modify code out of the designated area.**
- After coding your function, run the cell to check if your result is correct.

**After this assignment you will:**
- Be able to use IPython (Jupyter) notebooks
- Be able to use numpy functions and numpy matrix/vector operations
- Understand the concept of "broadcasting"
- Be able to vectorize code

Let's get started!

## 1 - NumPy Functions ##

[NumPy](https://numpy.org/) is an open source project aiming to enable numerical computing with Python.  It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more. 

In this exercise you will learn how to construct customized functions using NumPy. 

### 1.1 - Sigmoid function ###
$\sigma(x) = \frac{1}{1+e^{-x}}$, also known as the logistic function, is commonly used as a non-linear activation function in deep learning.

![](https://mathworld.wolfram.com/images/eps-svg/SigmoidFunction_701.svg)

#### **(5%) Exercise 1.1.1: Build a sigmoid function using `math` library**
The function returns the sigmoid of a real number x. Use **`math.exp(x)`** for the exponential function.

In [1]:
import math

def math_sigmoid(x):
    """
    Compute sigmoid of x.

    Arguments:
    x -- A scalar

    Return:
    s -- sigmoid(x)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1 / (1 + math.exp(-x))
    ### END CODE HERE ###
    
    return s

y = math_sigmoid(3)
print(y)

0.9525741268224334


> Expected Output: 
```python
0.9525741268224334
```

In practice, the `math` library is rarely used in deep learning, because the inputs are mostly matrices and vectors rather than real numbers. This is why NumPy is more useful.

In [None]:
# This raises a TypeError: math_sigmoid only handles real numbers, not lists
x = [1, 2, 3]
math_sigmoid(x) 
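Since the cell above is not executed here, the sketch below shows what the failure looks like. It re-declares `math_sigmoid` so the snippet is self-contained. The error is actually raised by `-x`, because Python lists do not support unary minus; `math.exp` would reject a list as well, since it only accepts real numbers.

```python
import math

def math_sigmoid(x):
    # Same scalar-only sigmoid as in Exercise 1.1.1
    return 1 / (1 + math.exp(-x))

try:
    math_sigmoid([1, 2, 3])
except TypeError as err:
    # Negating a list (-x) is not defined in Python
    message = str(err)
    print("TypeError:", message)
```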

With NumPy, x can be a real number, a vector, or a matrix. The data structures NumPy uses to represent these shapes (vectors, matrices, ...) are called NumPy arrays. If $x = (x_1, x_2, ..., x_n)$, then `np.exp(x)` applies the exponential function to every element of x, so the output is $(e^{x_1}, e^{x_2}, ..., e^{x_n})$. Similarly, `x + 3` and `1 / x` are computed elementwise, and each returns an array with the same shape as x.

In [None]:
import numpy as np

x = np.array([1, 2, 3])
print(np.exp(x))
print(x + 3)
print(1 / x)

[ 2.71828183  7.3890561  20.08553692]
[4 5 6]
[1.         0.5        0.33333333]



#### **(5%) Exercise 1.1.2: Implement the sigmoid function using NumPy.** 
The function returns the sigmoid of a real number, a vector, or a matrix. 
$$ \text{For } x \in \mathbb{R}^n \text{,     } \sigma(x) = \sigma\begin{pmatrix}
    x_1  \\
    x_2  \\
    ...  \\
    x_n  \\
\end{pmatrix} = \begin{pmatrix}
    \frac{1}{1+e^{-x_1}}  \\
    \frac{1}{1+e^{-x_2}}  \\
    ...  \\
    \frac{1}{1+e^{-x_n}}  \\
\end{pmatrix}\tag{1} $$

In [2]:
import numpy as np 

def numpy_sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size

    Return:
    s -- sigmoid(x)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1 / (1 + np.exp(-x))
    ### END CODE HERE ###
    
    return s

x = np.array([1, 2, 3])
numpy_sigmoid(x)

array([0.73105858, 0.88079708, 0.95257413])

> Expected Output:
```python
array([0.73105858, 0.88079708, 0.95257413])
``` 
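One caveat worth knowing: for large negative inputs such as `x = -1000.0`, `np.exp(-x)` overflows to `inf` and NumPy emits a `RuntimeWarning` (the result, 0.0, is still correct). The sketch below shows one common way to avoid the overflow; it is not part of the assignment, and `stable_sigmoid` is a hypothetical helper name. If SciPy is available, `scipy.special.expit` provides a ready-made logistic sigmoid.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that avoids overflow in np.exp for large |x| (a sketch)."""
    x = np.asarray(x, dtype=float)
    # exp(-|x|) lies in (0, 1], so it can underflow to 0 but never overflow
    z = np.exp(-np.abs(x))
    # x >= 0: 1 / (1 + e^{-x});  x < 0: e^{x} / (1 + e^{x})  (same value, rearranged)
    return np.where(x >= 0, 1 / (1 + z), z / (1 + z))

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # no overflow warning
```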


### 1.2 - Sigmoid gradient

Gradients are the essential tool for backpropagation in deep learning: you need them to optimize loss functions.

#### **(10%) Exercise 1.2: Compute the gradient of the sigmoid function**
The formula is: $$\sigma'(x) = \sigma(x) (1 - \sigma(x))\tag{2}$$
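Formula (2) follows from the chain rule, using $1 - \sigma(x) = \frac{e^{-x}}{1+e^{-x}}$:

$$\sigma'(x) = \frac{d}{dx}\left(1+e^{-x}\right)^{-1} = \frac{e^{-x}}{\left(1+e^{-x}\right)^2} = \frac{1}{1+e^{-x}}\cdot\frac{e^{-x}}{1+e^{-x}} = \sigma(x)\left(1-\sigma(x)\right)$$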

In [3]:
def grad_sigmoid(x):
    """
    Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x.
    You can store the output of the sigmoid function in a variable and then use it to calculate the gradient.
    
    Arguments:
    x -- A scalar or numpy array

    Return:
    ds -- Your computed gradient.
    """
    
    ### START CODE HERE ### (≈ 2 lines of code)
    s = 1 / (1 + np.exp(-x))
    ds = s * (1 - s)
    ### END CODE HERE ###
    
    return ds

x = np.array([1, 2, 3])
print(f"grad(x) = {grad_sigmoid(x)}")

grad(x) = [0.19661193 0.10499359 0.04517666]


> Expected Output: 
```python
grad(x) = [0.19661193 0.10499359 0.04517666]
```
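A quick way to gain confidence in an analytic gradient is to compare it with a finite-difference estimate. The sketch below uses a central difference with a step `eps = 1e-5` (an arbitrary but typical choice); `sigmoid`, `sigmoid_grad`, and `numerical_grad` are hypothetical helper names, not part of the assignment.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Analytic gradient from formula (2): s * (1 - s)
    s = sigmoid(x)
    return s * (1 - s)

def numerical_grad(f, x, eps=1e-5):
    # Central-difference approximation of f'(x), applied elementwise
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = np.array([1.0, 2.0, 3.0])
max_err = np.max(np.abs(sigmoid_grad(x) - numerical_grad(sigmoid, x)))
print(f"max abs difference = {max_err:.2e}")
```

A difference many orders of magnitude smaller than the gradient values themselves suggests the analytic formula is implemented correctly.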


### 1.3 - Reshaping arrays ###

Two common numpy functions used in deep learning are [np.shape](https://numpy.org/doc/stable/reference/generated/numpy.shape.html) and [np.reshape()](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html). 
- X.shape is used to get the shape (dimension) of a matrix/vector X. 
- X.reshape(...) is used to reshape X into some other dimension. 

For example, in computer vision a color image is usually represented by a 3-dimensional array of shape $(width, height, 3)$, with one entry per color channel along the last axis. You can reshape it into a column vector, i.e. a 2-dimensional array of shape $(width \times height \times 3, 1)$.

![](https://eli.thegreenplace.net/images/2015/row-major-3D.png)
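A small sketch of `shape` and `reshape` on a toy array (the values 0 to 17 stand in for pixel intensities; this is an illustration, not the exercise solution). Passing `-1` for one dimension lets NumPy infer it, so `image.reshape(-1, 1)` gives the same column vector as multiplying out `image.shape[0] * image.shape[1] * image.shape[2]`. The elements are read in row-major order, as in the figure above.

```python
import numpy as np

# A toy "image": width 2, height 3, 3 color channels, filled with 0..17
image = np.arange(2 * 3 * 3).reshape(2, 3, 3)
print(image.shape)      # (2, 3, 3)

# -1 asks NumPy to infer that dimension: 2 * 3 * 3 = 18
v = image.reshape(-1, 1)
print(v.shape)          # (18, 1)
print(v[:4, 0])         # [0 1 2 3]: row-major (C) order, last axis varies fastest
```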

#### **(10%) Exercise 1.3: Build a function to convert an image into a vector**
The function takes an input of shape (width, height, 3) and returns a vector of shape (width\*height\*3, 1). 
**Don't hardcode the dimensions of the image as constants. Instead, look up the quantities you need with `image.shape[0]`, etc.** 

In [15]:
def image2vector(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)
    
    Returns:
    v -- a vector of shape (length*height*depth, 1)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    v = image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)
    ### END CODE HERE ###
    
    return v

image = np.array([[[ 0.67826139,  0.29380381],
        [ 0.90714982,  0.52835647],
        [ 0.4215251 ,  0.45017551]],

       [[ 0.92814219,  0.96677647],
        [ 0.85304703,  0.52351845],
        [ 0.19981397,  0.27417313]],

       [[ 0.60659855,  0.00533165],
        [ 0.10820313,  0.49978937],
        [ 0.34144279,  0.94630077]]])

print(f"vectorized image = {image2vector(image)}")

vectorized image = [[0.67826139]
 [0.29380381]
 [0.90714982]
 [0.52835647]
 [0.4215251 ]
 [0.45017551]
 [0.92814219]
 [0.96677647]
 [0.85304703]
 [0.52351845]
 [0.19981397]
 [0.27417313]
 [0.60659855]
 [0.00533165]
 [0.10820313]
 [0.49978937]
 [0.34144279]
 [0.94630077]]


>Expected Output: 
```python
vectorized image = [[0.67826139]
 [0.29380381]
 [0.90714982]
 [0.52835647]
 [0.4215251 ]
 [0.45017551]
 [0.92814219]
 [0.96677647]
 [0.85304703]
 [0.52351845]
 [0.19981397]
 [0.27417313]
 [0.60659855]
 [0.00533165]
 [0.10820313]
 [0.49978937]
 [0.34144279]
 [0.94630077]]
 ```


### 1.4 - Normalization

Normalization is a common technique in deep learning. The L2 norm of a vector is $\|x\|_2=\sqrt{\sum_{i} x_i^2}$. Normalizing a vector $x$ means replacing it with $\frac{x}{\|x\|}$; for a matrix, each row of $x$ is divided by its own norm.

For example, if 

$$x = 
\begin{bmatrix}
    0 & 3 & 4 \\
    2 & 6 & 4 \\
\end{bmatrix}\tag{3}$$ 

then 
$$\| x\| = \begin{bmatrix}
    5 \\
    \sqrt{56} \\
\end{bmatrix}\tag{4} $$

and        
$$ normalized\_x = \frac{x}{\| x\|} = \begin{bmatrix}
    0 & \frac{3}{5} & \frac{4}{5} \\
    \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \\
\end{bmatrix}\tag{5}$$ 

You can use `np.linalg.norm(x, axis=1, keepdims=True)` to compute the norm of each row of $x$.

Although $x$ has shape $(2, 3)$ and $\|x\|$ (computed with `keepdims=True`) has shape $(2, 1)$, the division in (5) still works: NumPy stretches the column of row norms across the three columns of $x$. This is called broadcasting. For the full details, see the official [broadcasting documentation](https://numpy.org/doc/stable/user/basics.broadcasting.html).
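The sketch below reproduces equations (3) to (5) and shows why `keepdims=True` matters: without it, the row norms have shape $(2,)$, and since broadcasting aligns shapes starting from the last axis, that length-2 axis is matched against the 3 columns of $x$ and the division fails.

```python
import numpy as np

# The matrix from equation (3)
x = np.array([[0.0, 3.0, 4.0],
              [2.0, 6.0, 4.0]])

norms = np.linalg.norm(x, axis=1, keepdims=True)
print(norms.shape)        # (2, 1): one norm per row, kept as a column
normalized = x / norms    # (2, 3) / (2, 1): each row is divided by its own norm
print(normalized)

# Without keepdims the norms have shape (2,), which cannot broadcast against (2, 3)
flat_norms = np.linalg.norm(x, axis=1)
try:
    x / flat_norms
    broadcast_ok = True
except ValueError as err:
    broadcast_ok = False
    print("ValueError:", err)
```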


#### **(10%) Exercise 1.4: Implement row normalization.** 
After applying this function to an input matrix x, each row of x should be a unit vector (length=1).

In [6]:
def normalizeRows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).
    
    Argument:
    x -- A numpy matrix of shape (n, m)
    
    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    
    ### START CODE HERE ### (≈ 2 lines of code)
    # Compute x_norm, the L2 norm of each row of x, kept as an (n, 1) column.
    x_norm = np.linalg.norm(x, ord=2, axis=1, keepdims=True)
    # Divide x by its norm; broadcasting divides each row by its own norm.
    x = x / x_norm
    
    ### END CODE HERE ###

    return x

x = np.array([
    [0, 3, 4],
    [1, 6, 4]])
print(f"normalizeRows(x) = {normalizeRows(x)}")

normalizeRows(x) = [[0.         0.6        0.8       ]
 [0.13736056 0.82416338 0.54944226]]


>Expected Output: 
```python
normalizeRows(x) = [[0.         0.6        0.8       ]
 [0.13736056 0.82416338 0.54944226]]
 ```

## 2. Vectorization

### 2.1 NumPy optimized vector operations
In deep learning you work with very large datasets, so a computationally inefficient function can become a major bottleneck and make your model take ages to run. To keep your code computationally efficient, you will use vectorization. To see the difference, compare the loop-based ("classic") implementations of the dot, outer, elementwise, and general dot products below with the vectorized versions you write in Exercise 2.1.

In [None]:
import time

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ###
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot+= x1[i]*x2[i]
toc = time.process_time()
print(f"dot = {dot}\n----- Computation time = {1000*(toc - tic)} ms\n")

### CLASSIC OUTER PRODUCT IMPLEMENTATION ###
tic = time.process_time()
outer = np.zeros((len(x1),len(x2))) # we create a len(x1)-by-len(x2) matrix with only zeros
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i,j] = x1[i]*x2[j]
toc = time.process_time()
print(f"outer = {outer}\n----- Computation time = {1000*(toc - tic)} ms\n")

### CLASSIC ELEMENTWISE IMPLEMENTATION ###
tic = time.process_time()
mul = np.zeros(len(x1))
for i in range(len(x1)):
    mul[i] = x1[i]*x2[i]
toc = time.process_time()
print(f"elementwise multiplication = {mul}\n----- Computation time = {1000*(toc - tic)} ms\n")

### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###
W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array
tic = time.process_time()
gdot = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i,j]*x1[j]
toc = time.process_time()
print(f"gdot = {gdot}\n----- Computation time = {1000*(toc - tic)} ms\n")


dot = 278
----- Computation time = 0.1961709999998007 ms

outer = [[81. 18. 18. 81.  0. 81. 18. 45.  0.  0. 81. 18. 45.  0.  0.]
 [18.  4.  4. 18.  0. 18.  4. 10.  0.  0. 18.  4. 10.  0.  0.]
 [45. 10. 10. 45.  0. 45. 10. 25.  0.  0. 45. 10. 25.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [63. 14. 14. 63.  0. 63. 14. 35.  0.  0. 63. 14. 35.  0.  0.]
 [45. 10. 10. 45.  0. 45. 10. 25.  0.  0. 45. 10. 25.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [81. 18. 18. 81.  0. 81. 18. 45.  0.  0. 81. 18. 45.  0.  0.]
 [18.  4.  4. 18.  0. 18.  4. 10.  0.  0. 18.  4. 10.  0.  0.]
 [45. 10. 10. 45.  0. 45. 10. 25.  0.  0. 45. 10. 25.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]

#### **(40%) Exercise 2.1 Optimize vector operations using NumPy functions**
Find the NumPy functions that correspond to the operations above (dot product, outer product, elementwise multiplication, and general dot product). Use them to recompute these operations and measure how long each one takes.

In [18]:
import time
x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.process_time()
### START CODE HERE ###
dot = np.dot(x1, x2)
### END CODE HERE ###
toc = time.process_time()
print(f"dot = {dot}\n----- Computation time = {1000*(toc - tic)} ms\n")  # 10%

### VECTORIZED OUTER PRODUCT ###
tic = time.process_time()
### START CODE HERE ###
outer = np.outer(x1, x2)
### END CODE HERE ###
toc = time.process_time()
print(f"outer = {outer}\n----- Computation time = {1000*(toc - tic)} ms\n")  # 10%

### VECTORIZED ELEMENTWISE MULTIPLICATION ###
tic = time.process_time()
### START CODE HERE ###
mul = np.multiply(x1, x2)
### END CODE HERE ###
toc = time.process_time()
print(f"elementwise multiplication = {mul}\n----- Computation time = {1000*(toc - tic)} ms\n")  # 10%

### VECTORIZED GENERAL DOT PRODUCT ###
W = np.random.rand(3, len(x1))  # random 3-by-len(x1) array, created before timing starts
tic = time.process_time()
### START CODE HERE ###
gdot = np.dot(W, x1)
### END CODE HERE ###
toc = time.process_time()
print(f"gdot = {gdot}\n----- Computation time = {1000*(toc - tic)} ms\n")  # 10%


dot = 278
----- Computation time = 0.0 ms

outer = [[81 18 18 81  0 81 18 45  0  0 81 18 45  0  0]
 [18  4  4 18  0 18  4 10  0  0 18  4 10  0  0]
 [45 10 10 45  0 45 10 25  0  0 45 10 25  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [63 14 14 63  0 63 14 35  0  0 63 14 35  0  0]
 [45 10 10 45  0 45 10 25  0  0 45 10 25  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [81 18 18 81  0 81 18 45  0  0 81 18 45  0  0]
 [18  4  4 18  0 18  4 10  0  0 18  4 10  0  0]
 [45 10 10 45  0 45 10 25  0  0 45 10 25  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]]
----- Computation time = 0.0 ms

elementwise multiplication = [81  4 10  0  0 63 10  0  0  0 81  4 25  0  0]
----- Computation time = 0.0 ms

gdot = [23.20870724 20.28636183 23.05462684]
----- Computation time = 0.0 ms
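The same operations also have operator forms: `@` for dot and matrix-vector products, and `*` for elementwise multiplication; an outer product can be written with broadcasting. The sketch below uses NumPy arrays rather than lists (for Python lists, `*` does not mean elementwise multiplication) and a seeded `np.random.default_rng(0)` for `W`, an assumption made only for reproducibility.

```python
import numpy as np

x1 = np.array([9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0])
x2 = np.array([9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0])
W = np.random.default_rng(0).random((3, len(x1)))  # random 3-by-15 matrix

dot = x1 @ x2                                  # same as np.dot(x1, x2) for 1-D arrays
mul = x1 * x2                                  # same as np.multiply(x1, x2)
outer = x1[:, np.newaxis] * x2[np.newaxis, :]  # (15, 1) * (1, 15) broadcasts to (15, 15)
gdot = W @ x1                                  # same as np.dot(W, x1) for a 2-D W

print(dot)          # 278
print(outer.shape)  # (15, 15)
```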



As you may have noticed, the vectorized implementation is much cleaner and faster. With vectors this small the measured times are tiny and noisy, but for larger vectors and matrices the gap in running time grows considerably.
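To see the gap more clearly, the sketch below times a pure-Python loop against `np.dot` on 100,000-element vectors using the standard `timeit` module, which repeats each measurement to reduce noise. The vector size, repeat counts, and seed are arbitrary choices for illustration; absolute timings depend on your machine.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
a = rng.random(n)
b = rng.random(n)
a_list, b_list = a.tolist(), b.tolist()

def loop_dot():
    # Classic element-by-element dot product over Python lists
    total = 0.0
    for i in range(n):
        total += a_list[i] * b_list[i]
    return total

def numpy_dot():
    # Vectorized dot product on NumPy arrays
    return np.dot(a, b)

# Best of 3 repeats, each running the function 10 times; report per-call time
loop_t = min(timeit.repeat(loop_dot, number=10, repeat=3)) / 10
vec_t = min(timeit.repeat(numpy_dot, number=10, repeat=3)) / 10
print(f"loop: {loop_t * 1e3:.3f} ms per call, np.dot: {vec_t * 1e3:.3f} ms per call")
```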


### 2.2 L1 and L2 loss functions

#### **(20%) Exercise 2.2: Implement the L1 and L2 loss functions** 
Implement both loss functions using vectorized NumPy operations (no loops). 

**Reminder**:
- The loss is used to evaluate the performance of your model. The larger the loss, the further your predictions ($\hat{y}$) are from the true values ($y$). In deep learning, you use optimization algorithms such as gradient descent to train your model by minimizing the cost.
- L1 loss is defined as:
$$L_1(\hat{y}, y) = \sum_{i=1}^{m}\left|y^{(i)} - \hat{y}^{(i)}\right|\tag{6}$$
- L2 loss is defined as:
$$L_2(\hat{y}, y) = \sum_{i=1}^{m}\left(y^{(i)} - \hat{y}^{(i)}\right)^2\tag{7}$$

In [21]:
# GRADED FUNCTION: L1

def L1(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    
    Returns:
    loss -- the value of the L1 loss function defined above
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    loss = np.sum(np.abs(y - yhat))
    ### END CODE HERE ###
    
    return loss

def L2(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    
    Returns:
    loss -- the value of the L2 loss function defined above
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    loss = np.dot(y - yhat, y - yhat)
    ### END CODE HERE ###
    
    return loss

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print(f"L1 Loss = {L1(yhat, y)}")  # 10%
print(f"L2 Loss = {L2(yhat, y)}")  # 10%

L1 Loss = 1.1
L2 Loss = 0.43


>Expected Output:
```python
L1 Loss = 1.1
L2 Loss = 0.43
```
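As a check on the expected output, here is the arithmetic for the example inputs $\hat{y} = (0.9, 0.2, 0.1, 0.4, 0.9)$ and $y = (1, 0, 0, 1, 1)$. The L2 loss is the dot product of the difference vector with itself, which is why `np.dot` can compute it:

$$y - \hat{y} = (0.1,\ -0.2,\ -0.1,\ 0.6,\ 0.1)$$
$$L_1 = 0.1 + 0.2 + 0.1 + 0.6 + 0.1 = 1.1$$
$$L_2 = (y - \hat{y}) \cdot (y - \hat{y}) = 0.01 + 0.04 + 0.01 + 0.36 + 0.01 = 0.43$$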


# Congratulations! You have finished this assignment!

