## Basics of Python for Machine Learning

- This blog gives you a brief introduction to Python for machine learning (ML). Even if you already know the basics of Python, it will help you get familiar with the functions we'll need and with how Python is used for ML.

- Because we are building ML skills with Python, we need to write efficient code for model development. In ML, efficient code usually means code that avoids explicit `for` loops wherever a vectorized operation can do the same work.

---
#### Why?
ML is a process of learning, and learning takes multiple trials. We can't learn a topic just by reading it once; we revise it, and revision is an iterative process. In code, an iterative process is usually written as a `for` loop. In Python, however, looping over large arrays element by element is slow compared with NumPy's vectorized operations. Some loops are still unavoidable, for example the loop over training epochs.

- After reading this blog and trying the code, you will be able to use IPython (Jupyter) notebooks, use NumPy functions and matrix/vector operations, and understand the concept of "broadcasting". Broadcasting is an important NumPy concept that is used frequently in ML.

In [1]:
test = "Hello World!"
print("test: " + test)

test: Hello World!


- Run your cells using SHIFT + ENTER

## Building basic functions with numpy

- NumPy is one of the main packages for scientific computing in Python. It is maintained by a large community (www.numpy.org).
- In this blog, we will learn several NumPy functions such as `np.exp`, `np.log`, and `np.reshape`.

In [2]:
# Let's develop the sigmoid function.
# The final version will use np.exp(), but plain Python already offers math.exp() in the math module,
# so let's try that first.
import math
def basic_sigmoid(x):
    s = 1/(1+math.exp(-x))
    return s

print("basic_sigmoid(1) = " + str(basic_sigmoid(1)))

basic_sigmoid(1) = 0.7310585786300049


In [3]:
# math.exp() works well when the input is a real number.
# What if the input is a vector or matrix, as is common in deep learning?
# Let's see.
def basic_sigmoid(x):
    s = 1/(1+math.exp(-x))
    return s

print("basic_sigmoid([1, 2, 3]) = " + str(basic_sigmoid([1,2,3])))

TypeError: bad operand type for unary -: 'list'

In [8]:
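# The call fails because a Python list does not support unary minus (and math.exp() only accepts scalars).
# Now let's implement the same function with NumPy, which works element-wise on arrays.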
import numpy as np
def sigmoid(x):
    s = 1/(1+np.exp(-x))
    return s

t_x = np.array([1, 2, 3])
print("sigmoid(t_x) = " + str(sigmoid(t_x)))

sigmoid(t_x) = [0.73105858 0.88079708 0.95257413]


In [9]:
# You can also create a new cell in the notebook and write `np.exp?` (for example) to get quick access to the documentation.

In [12]:
# Let's implement the sigmoid gradient.
# Gradients are used to optimize the loss function during backpropagation.
# sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
def sigmoid_derivative(x):
    s = sigmoid(x)  # compute the sigmoid once and reuse it
    return s * (1 - s)

sigmoid_derivative(1)

0.19661193324148185

In [13]:
sigmoid_derivative(np.array([1,2,3]))

array([0.19661193, 0.10499359, 0.04517666])
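
A quick way to gain confidence in the formula above is to compare it with a numerical approximation of the derivative. The sketch below is not part of the original notebook: the step size `eps` and the use of a central difference are my own choices for illustration.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic sigmoid."""
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    """Analytic derivative: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([1.0, 2.0, 3.0])
eps = 1e-6  # illustrative step size (an assumption, not from the original post)

# Central difference: (f(x + eps) - f(x - eps)) / (2 * eps)
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = sigmoid_derivative(x)

print("analytic:", analytic)
print("numeric: ", numeric)
print("max abs difference:", np.max(np.abs(analytic - numeric)))
```

If the two rows agree to several decimal places, the analytic gradient is very likely implemented correctly.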

- Two NumPy features used all the time in deep learning are `shape` and `reshape`:
  - `x.shape` (an attribute, so no parentheses) gives the shape of the matrix/vector `x`; `np.shape(x)` is the equivalent function.
  - `x.reshape(...)` returns `x` rearranged into a different shape.
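
Here is a small sketch of both in action. The array values are made up purely for illustration.

```python
import numpy as np

x = np.arange(6)           # six elements: 0, 1, 2, 3, 4, 5
print(x.shape)             # (6,)

m = x.reshape(2, 3)        # same data viewed as 2 rows and 3 columns
print(m.shape)             # (2, 3)

col = x.reshape(-1, 1)     # -1 asks NumPy to infer that dimension
print(col.shape)           # (6, 1)

print(np.shape(m))         # (2, 3), the function form of m.shape
```

Note that `reshape` never changes the number of elements; it only changes how they are arranged.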

---

- Let's take an example. In computer science, an image is represented as a 3D array of shape $(length, height, depth = 3)$. When you feed an image to an algorithm, however, you typically convert it to a vector of shape $(length*height*3, 1)$. In other words, you "unroll" the 3D image into a single column vector.

In [15]:
t_image = np.array([[[ 0.67826139,  0.29380381],
                     [ 0.90714982,  0.52835647],
                     [ 0.4215251 ,  0.45017551]],

                   [[ 0.92814219,  0.96677647],
                    [ 0.85304703,  0.52351845],
                    [ 0.19981397,  0.27417313]],

                   [[ 0.60659855,  0.00533165],
                    [ 0.10820313,  0.49978937],
                    [ 0.34144279,  0.94630077]]])

t_image.shape

(3, 3, 2)

In [19]:
print(f"t_image[0]: {t_image[0]}")
print(f"t_image[1]: {t_image[1]}")
print(f"t_image[2]: {t_image[2]}")

t_image[0]: [[0.67826139 0.29380381]
 [0.90714982 0.52835647]
 [0.4215251  0.45017551]]
t_image[1]: [[0.92814219 0.96677647]
 [0.85304703 0.52351845]
 [0.19981397 0.27417313]]
t_image[2]: [[0.60659855 0.00533165]
 [0.10820313 0.49978937]
 [0.34144279 0.94630077]]


In [21]:
# image2vector() takes an input of shape (length, height, depth)
# and returns a column vector of shape (length*height*depth, 1)
def image2vector(image):
    v = image.reshape(image.shape[0]*image.shape[1]*image.shape[2],1)
    return v

In [22]:
print ("image2vector(image) = " + str(image2vector(t_image)))

image2vector(image) = [[0.67826139]
 [0.29380381]
 [0.90714982]
 [0.52835647]
 [0.4215251 ]
 [0.45017551]
 [0.92814219]
 [0.96677647]
 [0.85304703]
 [0.52351845]
 [0.19981397]
 [0.27417313]
 [0.60659855]
 [0.00533165]
 [0.10820313]
 [0.49978937]
 [0.34144279]
 [0.94630077]]
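
As a side note, the same unrolling can be written without multiplying the dimensions by hand. The sketch below assumes a hypothetical helper name, `image2vector_inferred`, and a random stand-in image; neither is part of the original notebook.

```python
import numpy as np

def image2vector(image):
    # Same approach as in the cell above: compute the length explicitly
    return image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)

def image2vector_inferred(image):
    # Hypothetical variant: -1 lets NumPy work out the length itself
    return image.reshape(-1, 1)

rng = np.random.default_rng(0)
image = rng.random((4, 5, 3))  # random stand-in "image", for illustration only

v1 = image2vector(image)
v2 = image2vector_inferred(image)

print(v1.shape, v2.shape)       # (60, 1) (60, 1)
print(np.array_equal(v1, v2))   # True
```

Both versions give the same column vector; the `-1` form is simply less to type and works for any number of dimensions.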


## Normalizing Rows
- Another important technique that we use in ML is normalizing the data.
- It often improves performance because gradient descent converges faster after normalization.
- Here, by normalization we mean changing x to $ \frac{x}{\| x\|} $ (dividing each row vector of x by its norm).

For example, if 
 $$x = \begin{bmatrix}
         0 & 3 & 4 \\
        2 & 6 & 4 \\
 \end{bmatrix}\tag{3}$$ 
 then 
 $$\| x\| = \text{np.linalg.norm(x, axis=1, keepdims=True)} = \begin{bmatrix}
     5 \\
     \sqrt{56} \\
 \end{bmatrix}\tag{4} $$
 and
 $$ x\_normalized = \frac{x}{\| x\|} = \begin{bmatrix}
     0 & \frac{3}{5} & \frac{4}{5} \\
     \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \\
 \end{bmatrix}\tag{5}$$ 

Note that you can divide matrices of different sizes and it works fine: this is called broadcasting.

- With `keepdims=True` the result will broadcast correctly against the original x.
- `axis=1` means you are going to get the norm in a row-wise manner. If you need the norm in a column-wise way, you would need to set `axis=0`.
- `np.linalg.norm` has another parameter, `ord`, which specifies the type of norm to compute. For vectors the default is the L2 (Euclidean) norm.
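
To make the `axis` and `ord` options concrete, here is a minimal sketch using the example matrix from the equations above. The variable names are my own.

```python
import numpy as np

x = np.array([[0., 3., 4.],
              [2., 6., 4.]])

# Row-wise L2 (Euclidean) norm, the default when ord is not given
l2_rows = np.linalg.norm(x, axis=1, keepdims=True)

# Row-wise L1 norm: the sum of absolute values in each row
l1_rows = np.linalg.norm(x, ord=1, axis=1, keepdims=True)

# Column-wise L2 norm
l2_cols = np.linalg.norm(x, axis=0, keepdims=True)

print("L2 per row:\n", l2_rows)            # 5 and sqrt(56)
print("L1 per row:\n", l1_rows)            # 7 and 12
print("L2 per column shape:", l2_cols.shape)  # (1, 3)
```

With `keepdims=True`, the row-wise results keep shape (2, 1) and the column-wise result keeps shape (1, 3), which is what lets them divide the original matrix directly.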

In [8]:
# normalize_rows() normalizes the rows of a matrix. After applying it to an input matrix x, each row of the result is a vector of unit length (meaning length 1).
import numpy as np
def normalize_rows(x):
    norm = np.linalg.norm(x, axis=1, keepdims=True)
    print("X_norm shape: " + str(norm.shape))
    x = x/norm
    print("X shape: " + str(x.shape))
    return x

In [9]:
x = np.array([[0., 3., 4.],
              [1., 6., 4.]])
print("normalizeRows(x) = " + str(normalize_rows(x)))

X_norm shape: (2, 1)
X shape: (2, 3)
normalizeRows(x) = [[0.         0.6        0.8       ]
 [0.13736056 0.82416338 0.54944226]]


- Notice that X has 3 columns while X_norm has only 1 column, yet the division still works. How? Through broadcasting: NumPy stretches the (2, 1) array across the 3 columns, so each row is divided by its own norm. We will see more of it in the next topics.
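
The division of a (2, 3) array by a (2, 1) array can be traced step by step. The sketch below uses `np.broadcast_to` only to make the stretching visible; NumPy does not need you to call it, and the variable names are my own.

```python
import numpy as np

x = np.array([[0., 3., 4.],
              [1., 6., 4.]])                       # shape (2, 3)
norm = np.linalg.norm(x, axis=1, keepdims=True)    # shape (2, 1)

# Conceptually, broadcasting repeats the single column 3 times
norm_stretched = np.broadcast_to(norm, x.shape)    # shape (2, 3)

explicit = x / norm_stretched   # division with matching shapes
implicit = x / norm             # NumPy broadcasts automatically

print(norm_stretched.shape)                 # (2, 3)
print(np.array_equal(explicit, implicit))   # True

# Without keepdims the norm has shape (2,), which does not line up with (2, 3)
flat_norm = np.linalg.norm(x, axis=1)
error_message = None
try:
    x / flat_norm
except ValueError as err:
    error_message = str(err)
print("ValueError:", error_message)
```

This is why `keepdims=True` matters: shapes are compared from the last dimension backwards, so (2, 1) is compatible with (2, 3) while (2,) is not.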

## Softmax

*Softmax is a mathematical function that converts a vector of real numbers into a probability distribution, where each value is between 0 and 1, and all values sum to 1*

- It is a normalization function specifically designed for multi-class classification problems.
- In further blogs I will explain in depth what normalization and softmax are and why we use them.

**Instructions**:
- $\text{for } x \in \mathbb{R}^{1\times n} \text{,     }$

\begin{align*}
 softmax(x) &= softmax\left(\begin{bmatrix}
    x_1  &&
    x_2 &&
    ...  &&
    x_n  
\end{bmatrix}\right) \\&= \begin{bmatrix}
    \frac{e^{x_1}}{\sum_{j}e^{x_j}}  &&
    \frac{e^{x_2}}{\sum_{j}e^{x_j}}  &&
    ...  &&
    \frac{e^{x_n}}{\sum_{j}e^{x_j}} 
\end{bmatrix} 
\end{align*}

- $\text{for a matrix } x \in \mathbb{R}^{m \times n} \text{,  $x_{ij}$ maps to the element in the $i^{th}$ row and $j^{th}$ column of $x$, thus we have: }$  

\begin{align*}
softmax(x) &= softmax\begin{bmatrix}
            x_{11} & x_{12} & x_{13} & \dots  & x_{1n} \\
            x_{21} & x_{22} & x_{23} & \dots  & x_{2n} \\
            \vdots & \vdots & \vdots & \ddots & \vdots \\
            x_{m1} & x_{m2} & x_{m3} & \dots  & x_{mn}
            \end{bmatrix} \\ \\&= 
 \begin{bmatrix}
    \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots  & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\
    \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots  & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots  & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}}
\end{bmatrix} \\ \\ &= \begin{pmatrix}
    softmax\text{(first row of x)}  \\
    softmax\text{(second row of x)} \\
    \vdots  \\
    softmax\text{(last row of x)} \\
\end{pmatrix} 
\end{align*}

In [14]:
# Each row of x is treated as one vector of scores (for example, one training example),
# and each column is one class/feature.

# Softmax is applied to every row independently, as in the formula above:
# we sum the exponentials along axis=1, so each row of the result sums to 1.
def softmax(x):
    x_exp = np.exp(x)
    print("x_exp shape: " + str(x_exp.shape))
    x_sum = np.sum(x_exp,axis=1,keepdims=True)
    print("x_sum shape: " + str(x_sum.shape))
    s = x_exp/x_sum
    
    return s

In [15]:
t_x = np.array([[9, 2, 5, 0, 0],
                [7, 5, 0, 0 ,0]])
s_x = softmax(t_x)
print("t_x shape: " + str(t_x.shape))
print("s_x shape: " + str(s_x.shape))
print("softmax(x) = " + str(s_x))

x_exp shape: (2, 5)
x_sum shape: (2, 1)
t_x shape: (2, 5)
s_x shape: (2, 5)
softmax(x) = [[9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04
  1.21052389e-04]
 [8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04
  8.01252314e-04]]


- You can see that x_exp and x_sum have different shapes, yet the division works because NumPy broadcasts x_sum across the columns of x_exp.
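
One practical caveat: `np.exp` overflows for large inputs, so the straightforward version can return `nan`. A common remedy, which is not used in the original notebook, is to subtract each row's maximum before exponentiating. Softmax is unchanged by this shift because the common factor cancels in the ratio. The helper name `softmax_stable` is hypothetical.

```python
import numpy as np

def softmax_stable(x):
    """Row-wise softmax with each row's maximum subtracted first.

    Hypothetical helper, not from the original post. The shift keeps
    the largest exponent at 0, so np.exp cannot overflow.
    """
    shifted = x - np.max(x, axis=1, keepdims=True)
    x_exp = np.exp(shifted)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

t_x = np.array([[9, 2, 5, 0, 0],
                [7, 5, 0, 0, 0]])
s = softmax_stable(t_x)
print(s)
print("row sums:", s.sum(axis=1))   # each row sums to 1 (up to rounding)

# Inputs this large would overflow in np.exp without the shift
big = np.array([[1000., 1001., 1002.]])
print(softmax_stable(big))
```

For the small test matrix the result matches the `softmax` function above; the difference only shows up when the inputs are large.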

**By now we have learned the basics of NumPy and implemented some useful functions that will help you develop machine learning and deep learning models.**

## Vectorization

Vectorization is key in deep learning because it makes your code run faster. When working with large datasets, using the right functions can save you a lot of time.

Vectorization helps avoid loops and makes your code more efficient.

- Let's say you are the teacher of a class consisting of 30 students and you have to calculate their total marks for the final exam. The mark sheet has 30 rows (students) and 10 columns (subjects). Adding everything up one by one with a pen is slow; using a calculator saves a lot of time.
- In the same way, if you use vectorization for the huge computations in ML/DL, you can save a lot of time.
- I will implement both approaches and show you how much difference it makes.
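
The teacher example translates almost directly into NumPy. This is a sketch with randomly generated marks, so the numbers themselves are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
# Made-up mark sheet: 30 students (rows) x 10 subjects (columns), marks 0..100
marks = rng.integers(0, 101, size=(30, 10))

# "Pen" version: add every mark one by one
totals_loop = []
for student in marks:
    total = 0
    for mark in student:
        total += mark
    totals_loop.append(total)

# "Calculator" version: one vectorized call sums along each row
totals_vec = marks.sum(axis=1)

print(totals_vec.shape)                         # (30,)
print(np.array_equal(totals_loop, totals_vec))  # True
```

Both versions produce the same 30 totals, but the vectorized one replaces two nested loops with a single call.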

In [25]:
# Let's take a simple example: computing the dot product
# of two vectors (which happens in neural networks all
# the time), first with a plain Python loop.

import time

x1 = [1, 2, 3, 4]
x2 = [5, 6, 7, 8]

tic = time.perf_counter()
dot_product = sum(x1[i] * x2[i] for i in range(len(x1)))
toc = time.perf_counter()

print("Dot Product:", dot_product)
print("Time Taken:", (toc - tic) * 1000, "ms")  # Convert to milliseconds

Dot Product: 70
Time Taken: 0.08309999975608662 ms


In [26]:
# Vectorized approach using numpy
import time
import numpy as np

x1 = np.array([1, 2, 3, 4])
x2 = np.array([5, 6, 7, 8])

tic = time.perf_counter()  # high-resolution wall-clock timer
dot_product = np.dot(x1, x2)  # vectorized: no explicit Python loop
toc = time.perf_counter()

print("Dot Product:", dot_product)
print("Time Taken:", (toc - tic) * 1000, "ms")  # Convert to milliseconds



Dot Product: 70
Time Taken: 0.05019999935029773 ms


In [27]:
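# The two timings above differ, but with only four elements the measurements are dominated by timer noise.
# The benefit of vectorization shows up clearly on large arrays.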
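
To see the gap more clearly, the same comparison can be repeated on larger vectors. This is a sketch, not a benchmark: the array size `n` is an arbitrary choice of mine, and the actual timings depend on your machine, so I don't quote any numbers here.

```python
import time
import numpy as np

n = 200_000  # illustrative size (an assumption; increase it to widen the gap)
rng = np.random.default_rng(0)
a = rng.random(n)
b = rng.random(n)

# Explicit Python loop
tic = time.perf_counter()
dot_loop = 0.0
for i in range(n):
    dot_loop += a[i] * b[i]
toc = time.perf_counter()
loop_ms = (toc - tic) * 1000

# Vectorized NumPy call
tic = time.perf_counter()
dot_vec = np.dot(a, b)
toc = time.perf_counter()
vec_ms = (toc - tic) * 1000

print(f"loop:   {dot_loop:.4f} in {loop_ms:.2f} ms")
print(f"np.dot: {dot_vec:.4f} in {vec_ms:.2f} ms")
```

Both approaches compute the same value up to floating-point rounding; only the time taken differs.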

## Numpy version of L1 Loss and L2 Loss

- Loss functions measure how well a model's predictions match the actual values. Let’s implement two common loss functions using vectorization.

**L1 loss is defined as:**
$$\begin{align*} & L_1(\hat{y}, y) = \sum_{i=0}^{m-1}|y^{(i)} - \hat{y}^{(i)}| \end{align*}\tag{6}$$

In [28]:
def L1(yhat, y):
    
    loss = np.sum(np.abs(y - yhat))
    
    return loss

In [29]:
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L1 = " + str(L1(yhat, y)))

L1 = 1.1


**L2 loss is defined as:**
$$\begin{align*} & L_2(\hat{y},y) = \sum_{i=0}^{m-1}(y^{(i)} - \hat{y}^{(i)})^2 \end{align*}\tag{7}$$

In [30]:
def L2(yhat, y):

    a = y - yhat
    loss = np.dot(a, a)  # dot product of a with itself = sum of squared differences

    return loss

In [31]:
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])

print("L2 = " + str(L2(yhat, y)))

L2 = 0.43
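
The dot product used for the L2 loss is just a compact way of writing "square every difference and add them up". The sketch below, with variable names of my own, shows three equivalent ways to compute it on the same test vectors.

```python
import numpy as np

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])

diff = y - yhat

l2_dot = np.dot(diff, diff)      # dot product of the difference with itself
l2_sum = np.sum(diff ** 2)       # square element-wise, then sum
l2_loop = sum((yi - yhi) ** 2 for yi, yhi in zip(y, yhat))  # plain Python loop

print(l2_dot, l2_sum, l2_loop)
```

All three agree up to floating-point rounding; the first two are vectorized, while the last one is the loop they replace.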
