# Notes from the *Python Basics with Numpy v3* assignment

```python
import math
import numpy as np
```

## 1. Building basic functions with numpy

### 1.1. Sigmoid function, `np.exp()`

The sigmoid function, also known as the logistic function, is a non-linear function used in machine learning for logistic regression and in deep learning as an activation function.

![sigmoid.png](./sigmoid.png?raw=true)
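Written out, the sigmoid of a real input $z$ is the following. Its value always lies strictly between 0 and 1, and $\sigma(0) = 0.5$:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$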

In deep learning we mostly work with matrices and vectors, so the numpy library is more useful than the math library. `math.exp()` accepts only a single real number. `np.exp()` applies the exponential function to every element of an array `z`, so we use `np.exp()` instead.
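Here is a minimal sketch of the difference. `math.exp()` handles one number and `np.exp()` handles a whole array element-wise. Passing a list to `math.exp()` raises a `TypeError`; the list `[1, 2, 3]` is just an illustrative input.

```python
import math

import numpy as np

# math.exp() works on a single real number
print(math.exp(1))

# np.exp() applies the exponential to every element of an array
z = np.array([1, 2, 3])
print(np.exp(z))

# math.exp() cannot take several numbers at once
try:
    math.exp([1, 2, 3])
except TypeError as err:
    print("math.exp failed:", err)
```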

```python
def sigmoid(z):
    # Element-wise sigmoid: works on scalars and on numpy arrays of any shape
    s = 1 / (1 + np.exp(-z))
    return s
```

```python
z = np.array([1, 2, 3])
sigmoid(z)
```

```
array([0.73105858, 0.88079708, 0.95257413])
```

### 1.2. Sigmoid gradient

We optimize loss functions using gradients, which are computed with backpropagation.

The gradient (derivative) of the sigmoid function with respect to its input $z$ can be written in terms of the sigmoid itself:

$$sigmoid\_derivative(z) = \sigma'(z) = \sigma(z)\,(1 - \sigma(z))$$
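The formula follows from applying the chain rule to $\sigma(z) = (1 + e^{-z})^{-1}$:

$$\begin{aligned}
\sigma'(z) &= \frac{e^{-z}}{(1 + e^{-z})^2} \\
&= \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} \\
&= \sigma(z)\,\bigl(1 - \sigma(z)\bigr)
\end{aligned}$$

The last step uses $1 - \sigma(z) = \frac{e^{-z}}{1 + e^{-z}}$.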

```python
# Step 1. Find the sigmoid of z
z = np.array([1, 2, 3])
s = sigmoid(z)

# Step 2. Compute the sigmoid derivative
ds = s * (1 - s)
print(ds)
```

```
[0.19661193 0.10499359 0.04517666]
```
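The same two steps can be wrapped in a reusable function. The sketch below also adds a sanity check that is not part of the original assignment: it compares the analytic gradient with a central finite-difference approximation. `numerical_derivative` and `eps` are hypothetical names chosen here.

```python
import numpy as np


def sigmoid(z):
    """Element-wise sigmoid, as defined earlier in these notes."""
    return 1 / (1 + np.exp(-z))


def sigmoid_derivative(z):
    """Gradient of the sigmoid w.r.t. z, using sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1 - s)


def numerical_derivative(f, z, eps=1e-6):
    """Hypothetical helper: central finite-difference estimate of f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)


z = np.array([1.0, 2.0, 3.0])
analytic = sigmoid_derivative(z)
numeric = numerical_derivative(sigmoid, z)
print(analytic)
# The largest gap between the two estimates should be very small
print(np.max(np.abs(analytic - numeric)))
```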


### 1.3. Reshaping arrays

Two common tools used in deep learning:

- `X.shape` (or the function `np.shape(X)`): returns the shape (dimensions) of a matrix/vector `X`
- `np.reshape()` (or the method `X.reshape()`): reshapes a matrix/vector into some other dimensions
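Here is a small illustration using a hypothetical 2×3 matrix `X`. Reshaping keeps the same elements, so the total count (here 6) must stay unchanged.

```python
import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6]])

# .shape is an attribute; np.shape() is the equivalent function
print(X.shape)      # (2, 3)
print(np.shape(X))  # (2, 3)

# Reshape the same 6 elements into 3 rows and 2 columns
Y = np.reshape(X, (3, 2))
print(Y.shape)      # (3, 2)
print(Y)
```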

To unroll (reshape) a 3D array of shape `(a, b, c)` into a column vector of shape `(a*b*c, 1)`:

```python
# Initialize 3D array
arr = np.array([[[0.67826139, 0.29380381],
                 [0.90714982, 0.52835647],
                 [0.4215251 , 0.45017551]],

                [[0.92814219, 0.96677647],
                 [0.85304703, 0.52351845],
                 [0.19981397, 0.27417313]],

                [[0.60659855, 0.00533165],
                 [0.10820313, 0.49978937],
                 [0.34144279, 0.94630077]]])

print(arr)
```

```
[[[0.67826139 0.29380381]
  [0.90714982 0.52835647]
  [0.4215251  0.45017551]]

 [[0.92814219 0.96677647]
  [0.85304703 0.52351845]
  [0.19981397 0.27417313]]

 [[0.60659855 0.00533165]
  [0.10820313 0.49978937]
  [0.34144279 0.94630077]]]
```


```python
# Get dimensions
print("Dimensions: ", arr.shape, "\n")

# Reshape the 3D array into a column vector: the number of rows is the
# product of all dimensions (3 * 3 * 2 = 18)
dimensions = 1
for d in arr.shape:
    dimensions *= d
reshaped_arr = arr.reshape(dimensions, 1)
print(reshaped_arr)
```

```
Dimensions:  (3, 3, 2) 

[[0.67826139]
 [0.29380381]
 [0.90714982]
 [0.52835647]
 [0.4215251 ]
 [0.45017551]
 [0.92814219]
 [0.96677647]
 [0.85304703]
 [0.52351845]
 [0.19981397]
 [0.27417313]
 [0.60659855]
 [0.00533165]
 [0.10820313]
 [0.49978937]
 [0.34144279]
 [0.94630077]]
```


### 1.4. Normalizing rows

To normalize the rows of a matrix x, we divide each row vector by its Euclidean (L2) norm, so that every row ends up with length 1.

For example, if

$$x = \begin{bmatrix}
    0 & 3 & 4 \\
    1 & 6 & 4
\end{bmatrix}$$

then the norm of each row, computed with `np.linalg.norm(x, axis=1, keepdims=True)`, is

$$\| x\| = \begin{bmatrix}
    5 \\
    \sqrt{53}
\end{bmatrix}$$

and x normalized is

$$x\_normalized = \frac{x}{\| x\|} = \begin{bmatrix}
    0 & \frac{3}{5} & \frac{4}{5} \\
    \frac{1}{\sqrt{53}} & \frac{6}{\sqrt{53}} & \frac{4}{\sqrt{53}}
\end{bmatrix}$$

Implement normalization of matrix x:

```python
# Initialize matrix
x = np.array([
    [0, 3, 4],
    [1, 6, 4]])
print(x)
```

```
[[0 3 4]
 [1 6 4]]
```


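```python
# Normalize the rows of the matrix
# x has shape (2, 3) and x_norm has shape (2, 1), so broadcasting
# divides every element of row i by the norm of row i
x_norm = np.linalg.norm(x, axis=1, keepdims=True)
x = x / x_norm
print(x)
```

```
[[0.         0.6        0.8       ]
 [0.13736056 0.82416338 0.54944226]]
```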
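The same steps can be packaged as a function, followed by a check that every row now has unit length. `normalize_rows` is a hypothetical helper name, not part of numpy.

```python
import numpy as np


def normalize_rows(x):
    """Divide each row of x by its L2 norm (hypothetical helper name)."""
    # keepdims=True keeps the norms as an (m, 1) column, so broadcasting
    # divides every element of row i by the norm of row i
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    return x / x_norm


x = np.array([[0, 3, 4],
              [1, 6, 4]])
x_normalized = normalize_rows(x)
print(x_normalized)
# Every row now has norm 1 (up to floating-point rounding)
print(np.linalg.norm(x_normalized, axis=1))
```

Without `keepdims=True`, the norms would have shape `(2,)`. `x / x_norm` would then raise a `ValueError`, because shapes `(2, 3)` and `(2,)` cannot be broadcast together. A row of all zeros would produce `nan` values, since it divides 0 by 0.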


### 1.5. Broadcasting and the softmax function

Broadcasting documentation: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

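Broadcasting is an important numpy concept. It makes element-wise operations possible between arrays of different but compatible shapes: numpy virtually stretches dimensions of size 1 (and missing leading dimensions) to match the other array. We briefly discussed this in the first notebook from this week.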
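Here is a small sketch of the rules with hypothetical arrays. Two shapes are compared from the trailing dimension backwards, and each pair must be equal or contain a 1.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,), treated as (1, 3)
col = np.array([[100], [200]])     # shape (2, 1)

print(A + row)  # row is added to every row of A
print(A + col)  # col[i] is added to every element of row i

# Incompatible shapes: trailing dimensions 3 and 2 differ and neither is 1
try:
    A + np.array([1, 2])
except ValueError as err:
    print("Broadcasting failed:", err)
```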

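Softmax is a normalizing function, typically used when an algorithm needs to classify two or more classes. It maps each row of scores to non-negative values that sum to 1. More about softmax: https://en.wikipedia.org/wiki/Softmax_function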

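For a matrix $x$ of shape $(m, n)$, softmax is applied to each row $i$ separately:

$$\text{softmax}(x)_{ij} = \frac{e^{x_{ij}}}{\sum_{k=1}^{n} e^{x_{ik}}}$$

where $i$ indexes the instances (rows), and $j$ and $k$ index the features (columns). The sum in the denominator runs over all $n$ columns of row $i$.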

Calculate the softmax of each row of matrix x; the final division relies on numpy broadcasting:

```python
# Initialize matrix
x = np.array([
    [9, 2, 5, 0, 0],
    [7, 5, 0, 0, 0]])
print(x)
```

```
[[9 2 5 0 0]
 [7 5 0 0 0]]
```


```python
# Apply exp() to each element of x
x_exp = np.exp(x)
print(x_exp)
print("Shape of the softmax numerator: ", x_exp.shape)
```

```
[[8.10308393e+03 7.38905610e+00 1.48413159e+02 1.00000000e+00
  1.00000000e+00]
 [1.09663316e+03 1.48413159e+02 1.00000000e+00 1.00000000e+00
  1.00000000e+00]]
Shape of the softmax numerator:  (2, 5)
```


```python
# Sum each row of x_exp
x_sum = np.sum(x_exp, axis=1, keepdims=True)
print(x_sum)
print("Shape of the softmax denominator: ", x_sum.shape)
```

```
[[8260.88614278]
 [1248.04631753]]
Shape of the softmax denominator:  (2, 1)
```


When we divide the two matrices to compute the softmax of each row, numpy automatically applies broadcasting to expand the shape of the softmax denominator into (2, 5), to make this division possible.

```python
# Compute softmax of each row
s = x_exp / x_sum
print(s)
```

```
[[9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04
  1.21052389e-04]
 [8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04
  8.01252314e-04]]
```
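The three steps above can be collected into one function. This sketch adds something the original cells do not do: it subtracts each row's maximum before exponentiating. This is a standard numerical-stability trick, because `np.exp(1000)` overflows to `inf` in float64. `softmax` here is a hypothetical helper, not a numpy function.

```python
import numpy as np


def softmax(x):
    """Row-wise softmax of a 2D array (hypothetical helper name).

    Subtracting each row's maximum does not change the result, because
    the factor e^(-max) cancels between numerator and denominator.
    """
    x_shifted = x - np.max(x, axis=1, keepdims=True)
    x_exp = np.exp(x_shifted)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)


x = np.array([[9, 2, 5, 0, 0],
              [7, 5, 0, 0, 0]])
s = softmax(x)
print(s)
print(np.sum(s, axis=1))  # each row sums to 1 (up to rounding)

# Large scores would overflow np.exp without the shift, but work with it
big = np.array([[1000.0, 1001.0, 1002.0]])
print(softmax(big))
```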


## 2. Vectorization

### 2.1. Implement the L1 and L2 loss functions

In deep learning, we use optimization algorithms like gradient descent to train our model by minimizing the loss (cost). The loss evaluates the performance of our model: the bigger the loss, the more the predictions ($\hat{y}$) differ from the true values ($y$).

The L1 loss over $m$ examples is defined as

$$L_1(\hat{y}, y) = \sum_{i=1}^{m} \left| y^{(i)} - \hat{y}^{(i)} \right|$$

```python
# Initialize predicted values and true values
yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])
y = np.array([1, 0, 0, 1, 1])

# Implement the numpy vectorized version of the L1 loss
loss = np.sum(np.abs(y - yhat))
print(loss)
```

```
1.1
```
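To see what vectorization replaces, the sketch below compares an explicit Python loop with the numpy one-liner above. `l1_loop` and `l1_vectorized` are hypothetical names. Both return the same value up to floating-point rounding. The vectorized version avoids the Python-level loop, which typically makes it much faster on large arrays.

```python
import numpy as np

yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])
y = np.array([1, 0, 0, 1, 1])


def l1_loop(yhat, y):
    """Non-vectorized L1 loss: an explicit loop over the m examples."""
    total = 0.0
    for i in range(len(y)):
        total += abs(y[i] - yhat[i])
    return total


def l1_vectorized(yhat, y):
    """Vectorized L1 loss: one numpy expression over the whole array."""
    return np.sum(np.abs(y - yhat))


print(l1_loop(yhat, y), l1_vectorized(yhat, y))
```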


The L2 loss over $m$ examples is defined as

$$L_2(\hat{y}, y) = \sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2$$

```python
# Implement the numpy vectorized version of the L2 loss
loss = np.sum(np.power(y - yhat, 2))
print(loss)
```

```
0.43
```
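For a 1D vector $d = y - \hat{y}$, the sum of squares $\sum_i d_i^2$ equals the dot product $d \cdot d$. This means `np.dot` gives another way to write the L2 loss. The sketch below compares three equivalent expressions on the same data.

```python
import numpy as np

yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])
y = np.array([1, 0, 0, 1, 1])

diff = y - yhat
loss_power = np.sum(np.power(diff, 2))  # as in the cell above
loss_square = np.sum(diff ** 2)         # element-wise square with **
loss_dot = np.dot(diff, diff)           # sum of squares as a dot product

print(loss_power, loss_square, loss_dot)
```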
