# Assignment 2
In this assignment, we will go through Perceptron, Linear Classifiers, Loss Functions, Gradient Descent and Back Propagation.


P.S. This one is not from Stanford's course.




## Instructions
* This notebook contains blocks of code; you are required to complete those blocks (where required).
* You are required to copy this notebook ("Copy to Drive" above) and complete the code. (DO NOT CHANGE THE NAMES OF THE FUNCTIONS.)

# Part 1: Perceptron
In this section, we will see how to implement a perceptron. The goal is for you to delve into the mathematics.


## Intro
What's a perceptron? It's an algorithm, modelled on a biological neuron, that classifies inputs into one of two classes. It's a supervised learning algorithm, meaning that you need to provide labelled data containing features and the actual classifications. A perceptron takes these features as input and outputs a binary value (0 or 1). While training the model on the training data, we try to minimise the error and learn the parameters involved.

**How does it work?**\
A perceptron is modelled on a biological neuron. A neuron receives inputs through its dendrites and carries its output along its axon. Similarly, a perceptron takes inputs called "features" and, after processing them, produces an output. For the computation, it has a "weight" vector that is multiplied with the feature vector, and a bias is added. An activation function is then applied to this weighted sum to introduce a non-linearity and produce the output; for the classic perceptron this is a step function that outputs 1 when the weighted sum is non-negative and 0 otherwise.\
The weighted sum can be represented as: $$f=\sum_{i=1}^{m} w_ix_i + b$$
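To make the formula concrete, here is the weighted sum and step activation worked through for a three-feature input. The weights, bias and input are made-up numbers chosen only for illustration, and the step convention (output 1 when $f \ge 0$) matches the perceptron code in this section.

```python
import numpy as np

# Made-up weights, bias and input (three features), for illustration only
w = np.array([0.5, -1.0, 2.0])
b = 0.2
x = np.array([2.0, 1.0, 0.5])

# Weighted sum: f = 0.5*2 + (-1.0)*1 + 2.0*0.5 + 0.2 = 1.2
f = np.dot(w, x) + b

# Step activation: 1 when f >= 0, otherwise 0
output = 1 if f >= 0 else 0
print(f, output)
```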

Let's implement this simple function to give an output.



In [8]:
import numpy as np

class perceptron():
    def __init__(self,num_input_features=8):
        # One weight per input feature, plus a scalar bias
        self.weights = np.random.randn(num_input_features)
        self.bias = np.random.random()

    def activation(self,x):
        # Step activation: 0 for a negative weighted sum, 1 otherwise
        return 0 if x < 0 else 1

    def forward(self,x: np.ndarray):
        # Weighted sum f = w . x + b, passed through the step activation
        f = np.dot(x, self.weights) + self.bias
        return self.activation(f)

In [9]:
np.random.seed(0)
perc = perceptron(8)
assert perc.forward(np.arange(8))==1
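The class above only runs the forward pass. The intro mentions learning the parameters from labelled data; one classic way to do this is Rosenblatt's perceptron learning rule, $w \leftarrow w + \eta\,(y - \hat{y})\,x$ and $b \leftarrow b + \eta\,(y - \hat{y})$, which changes the parameters only on misclassified examples. Below is a minimal sketch of that rule. The functions `train_perceptron` and `predict` and their defaults (`lr`, `epochs`, `seed`) are hypothetical and not part of the assignment. The example learns the logical AND of two inputs, which is linearly separable, so the rule eventually stops making mistakes.

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=100, seed=0):
    """Sketch of Rosenblatt's perceptron learning rule.

    X: (n_samples, n_features) inputs; y: (n_samples,) labels in {0, 1}.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            y_hat = 1 if np.dot(w, xi) + b >= 0 else 0  # forward pass with step activation
            error = yi - y_hat                           # 0 if correct, +1 or -1 if wrong
            w = w + lr * error * xi                      # no change when the prediction is correct
            b = b + lr * error
    return w, b

def predict(w, b, X):
    # Step activation applied to every row of X
    return (X @ w + b >= 0).astype(int)

# Usage: learn logical AND, which is linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(predict(w, b, X))
```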

# Part 2: Linear Classifier
In this section, we will see how to implement a linear Classifier.


## Intro


**How does it work?**

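A linear classifier uses the following function: $$Y = WX+b$$ where $W$ is a 2-D array of weights with shape (#classes, #features), $X$ is the input with shape (#features, 1), and $b$ is a bias vector with shape (#classes, 1). Each entry of $Y$ is the score for one class, and the predicted class is the one with the highest score.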



Let's implement this classifier.



In [12]:
import numpy as np

class LinearClassifier():
    def __init__(self,num_input_features=32,num_classes=5):
        self.weights = np.random.randn(num_classes,num_input_features)
        self.bias = np.random.rand(num_classes,1)

    def forward(self,x: np.ndarray):
        # Scores Y = WX + b: (#classes, #features) @ (#features, 1) + (#classes, 1)
        z = np.dot(self.weights, x) + self.bias
        return z

In [13]:
np.random.seed(0)
lc = LinearClassifier()
lc.forward(np.random.rand(32,1))
# Should be close to:
# array([[  7.07730669],
#        [-10.24067722],
#        [  0.75398702],
#        [  9.8019519 ],
#        [  2.36684038]])

array([[  7.07730669],
       [-10.24067722],
       [  0.75398702],
       [  9.8019519 ],
       [  2.36684038]])
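The same `forward` computation extends to a batch. If $X$ holds one example per column, with shape (#features, N), then $WX + b$ has shape (#classes, N), and the bias column is broadcast across the N examples. Here is a minimal numpy sketch with made-up random data that also picks the predicted class for each example with `argmax`:

```python
import numpy as np

rng = np.random.default_rng(0)
num_features, num_classes, batch = 32, 5, 4

W = rng.normal(size=(num_classes, num_features))  # (#classes, #features)
b = rng.random(size=(num_classes, 1))             # (#classes, 1)
X = rng.random(size=(num_features, batch))        # one example per column

Y = W @ X + b                 # (#classes, batch); b is broadcast across the columns
predicted = Y.argmax(axis=0)  # highest-scoring class for each example
print(Y.shape, predicted)
```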

# Part 3: Loss Functions, Gradient descent and Backpropagation




## Intro

A loss function tells us how "off" the output of our model is. Depending on the application, you can use several different loss functions. Formally, a loss function is a function $L:(z,y)\in\mathbb{R}\times Y\longmapsto L(z,y)\in\mathbb{R}$ that takes as inputs the predicted value $z$ corresponding to the real data value $y$ and outputs how different they are. We'll implement the L1 loss, L2 loss, hinge loss, cross-entropy loss and 0-1 loss functions.

### **L1 loss**
L1 loss is the linear loss function $L = \dfrac{1}{2}|y-z|$



In [18]:
import numpy as np
def L1Loss(z,y):
    # np.abs works on scalars and element-wise on numpy arrays
    l1 = 0.5*np.abs(y-z)
    return l1

### **L2 loss**
L2 loss is the quadratic loss function, also called the least-squares error function: $L = \dfrac{1}{2}(y-z)^2$



In [19]:
import numpy as np
def L2Loss(z,y):
    l2 = 0.5*(y-z)**2
    return l2
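Because of the $\frac{1}{2}$ factors used here, the two losses agree when the error $|y - z|$ is exactly 1. Below that, L2 is smaller. Above it, L2 grows much faster, so L2 penalises large errors (and outliers) more heavily. The small sketch below compares them on made-up predictions:

```python
import numpy as np

# Same definitions as above: L = 0.5*|y - z| and L = 0.5*(y - z)**2
def L1Loss(z, y):
    return 0.5 * np.abs(y - z)

def L2Loss(z, y):
    return 0.5 * (y - z) ** 2

y = np.zeros(4)
z = np.array([0.1, 0.5, 1.0, 4.0])  # made-up predictions with growing error

print(L1Loss(z, y))  # grows linearly with the error
print(L2Loss(z, y))  # grows quadratically: smaller below error 1, much larger above it
```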

### **Hinge Loss**
Hinge loss is $L = \max(0, 1 - yz)$, where the label $y$ is $-1$ or $+1$ and $z$ is the raw score.

In [20]:
import numpy as np
def hingeLoss(z,y):
    # np.maximum is element-wise, so this also works on arrays of z and y
    l3 = np.maximum(0, 1 - y*z)
    return l3
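Hinge loss is zero once an example is on the correct side with a margin of $yz \ge 1$, and it grows linearly otherwise. It is not differentiable at $yz = 1$, so gradient-based training uses a subgradient instead, for example $-y$ when $yz < 1$ and $0$ otherwise. The sketch below computes both, using labels in $\{-1, +1\}$ and made-up scores. The function names `hinge` and `hinge_subgradient` are illustrative.

```python
import numpy as np

def hinge(z, y):
    # y must be -1 or +1; z is the raw score
    return np.maximum(0.0, 1.0 - y * z)

def hinge_subgradient(z, y):
    # dL/dz = -y while the margin y*z is below 1, otherwise 0
    return np.where(y * z < 1, -y, 0.0)

y = np.array([1, 1, 1, -1])
z = np.array([2.0, 0.5, -1.0, -3.0])
print(hinge(z, y))
print(hinge_subgradient(z, y))
```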

### **Cross Entropy Loss**
Another very well-known loss function is the cross-entropy loss: $L = -[y\log(z)+(1-y)\log(1-z)]$, where $z\in(0,1)$ is the predicted probability of the positive class and $y\in\{0,1\}$.

In [21]:
import numpy as np
def CELoss(z,y):
    # np.log works on scalars and arrays (math.log only accepts scalars)
    l4 = -(y*np.log(z) + (1-y)*np.log(1-z))
    return l4
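The formula is undefined at $z = 0$ or $z = 1$: `math.log(0)` raises a `ValueError`, and `np.log(0)` returns `-inf` with a warning. A common workaround is to clip $z$ into $[\epsilon, 1-\epsilon]$ before taking logs, as in the sketch below. The helper name `ce_loss_stable` and the value `eps=1e-12` are arbitrary choices for illustration.

```python
import numpy as np

def ce_loss_stable(z, y, eps=1e-12):
    # Clip z away from exactly 0 and 1 so np.log never sees 0
    z = np.clip(z, eps, 1.0 - eps)
    return -(y * np.log(z) + (1 - y) * np.log(1 - z))

print(ce_loss_stable(0.9, 1))  # confident and correct: small loss
print(ce_loss_stable(1.0, 0))  # confident and wrong: large but finite
```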

### **0-1 Loss**
The loss function used by the perceptron is the 0-1 loss, which is 0 for a correct prediction and 1 for a wrong one: $$L(z,y) = \begin{cases} 0 & z=y \\ 1 & z\neq y \end{cases}$$

In [22]:
import numpy as np
def zeroOneLoss(z,y):
    # 0 for a correct prediction, 1 for a wrong one
    if z==y :
        return 0
    else:
        return 1

## Cost Function
The cost function $J$ is commonly used to assess the performance of a model, and is defined with the loss function $L$ as follows:
$$\boxed{J(\theta)=\frac{1}{m}\sum_{i=1}^mL(h_\theta(x^{(i)}), y^{(i)})}$$
where $h_\theta$ is the hypothesis function, i.e. the function used to predict the output, and $m$ is the number of examples.

In [26]:
lossFunctions = {
    "l1" : L1Loss,
    "l2" : L2Loss,
    "hinge" : hingeLoss,
    "cross-entropy" : CELoss,
    "0-1" : zeroOneLoss
}

def cost(Z : np.ndarray, Y : np.ndarray, loss : str):
    '''
        Z : a numpy array of predictions.
        Y : a numpy array of true values.
        loss : name of the loss function to use (a key of lossFunctions).
        return : the cost J, i.e. the mean of the per-example losses.
    '''
    loss_func = lossFunctions[loss]
    m = len(Z)  # number of examples
    # Per-example losses L(z_i, y_i), computed with the selected loss function
    losses = np.array([loss_func(z, y) for z, y in zip(Z, Y)])
    # J = (1/m) * sum_i L(z_i, y_i)
    return np.sum(losses) / m
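With the 0-1 loss, $J$ reduces to the fraction of misclassified examples. With the L2 loss, it is half the mean squared error. The self-contained sketch below applies the cost formula to made-up labels. The names `mean_cost`, `l2` and `zero_one` are illustrative and are not the notebook's functions.

```python
import numpy as np

def l2(z, y):
    return 0.5 * (y - z) ** 2

def zero_one(z, y):
    return 0 if z == y else 1

def mean_cost(Z, Y, loss_func):
    # J = (1/m) * sum_i L(z_i, y_i)
    m = len(Z)
    return sum(loss_func(z, y) for z, y in zip(Z, Y)) / m

Z = np.array([1, 0, 1, 1])  # made-up predicted labels
Y = np.array([1, 1, 1, 0])  # made-up true labels
print(mean_cost(Z, Y, zero_one))  # 2 of the 4 predictions are wrong
print(mean_cost(Z, Y, l2))
```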

## Gradient Descent and Back Propagation
Gradient descent is an algorithm that minimizes the cost function by repeatedly stepping in the direction opposite to its gradient. With learning rate $\alpha\in\mathbb{R}$ and cost function $J$, the update rule for gradient descent is:

$$\boxed{ W \longleftarrow W -\alpha\nabla J( W )}$$
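To illustrate the update rule, the sketch below fits a one-feature linear model $z = wx + b$ by gradient descent on the cost $J$ built from the L2 loss above. The data are synthetic (generated from $y = 3x + 1$ with no noise). The learning rate and step count are arbitrary choices that converge for this data.

```python
import numpy as np

# Synthetic, noise-free data from y = 3x + 1
X = np.linspace(0, 1, 20)
Y = 3 * X + 1

w, b = 0.0, 0.0
alpha = 0.5  # learning rate
for step in range(2000):
    Z = w * X + b                 # predictions
    # J = (1/m) * sum 0.5*(y - z)^2, so dJ/dz_i = (z_i - y_i) / m
    dz = (Z - Y) / len(X)
    grad_w = np.sum(dz * X)       # chain rule: dz/dw = x
    grad_b = np.sum(dz)           # dz/db = 1
    w -= alpha * grad_w           # W <- W - alpha * grad J(W)
    b -= alpha * grad_b
print(w, b)
```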


But we need the partial derivative of the loss function with respect to every parameter to know how to adjust each parameter slightly. This becomes particularly hard if our model has more than one layer. This is where **backpropagation** comes in: it computes the gradient with respect to every parameter using the chain rule, propagating the difference between the actual output and the desired output backwards through the network so that the weights can be updated. Writing $z = wx + b$ for the pre-activation and $a = g(z)$ for the output of the activation function $g$, the derivative with respect to the weight $w$ is computed using the chain rule and has the following form:

$$\boxed{\frac{\partial L(z,y)}{\partial w}=\frac{\partial L(z,y)}{\partial a}\times\frac{\partial a}{\partial z}\times\frac{\partial z}{\partial w}}$$
 
As a result, the weight is updated as follows:

$$\boxed{w\longleftarrow w-\alpha\frac{\partial L(z,y)}{\partial w}}$$
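Here is a worked instance of the chain rule, assuming a single neuron with sigmoid activation $a = \sigma(z)$, pre-activation $z = wx + b$, and the cross-entropy loss from above applied to $a$. The three factors multiply out to $(a - y)\,x$. The sketch checks this result against a central finite difference and then takes one update step. The input, label, parameters and learning rate are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(a, y):
    # Cross-entropy loss from above, applied to the activation a
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Made-up input, label and parameters
x, y = 2.0, 1.0
w, b = 0.3, -0.1

z = w * x + b    # pre-activation
a = sigmoid(z)   # activation

# Chain rule: dL/dw = dL/da * da/dz * dz/dw
dL_da = -y / a + (1 - y) / (1 - a)
da_dz = a * (1 - a)
dz_dw = x
grad_w = dL_da * da_dz * dz_dw  # simplifies to (a - y) * x

# Check against a central finite difference
h = 1e-6
numeric = (loss(sigmoid((w + h) * x + b), y) - loss(sigmoid((w - h) * x + b), y)) / (2 * h)
print(grad_w, numeric)

# One gradient-descent update of w
alpha = 0.1
w_new = w - alpha * grad_w
```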

So, in a neural network, the weights are updated as follows:

* Step 1: Take a batch of training data.
* Step 2: Perform forward propagation to obtain the corresponding loss.
* Step 3: Backpropagate the loss to get the gradients.
* Step 4: Use the gradients to update the weights of the network.
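The four steps map directly onto a mini-batch training loop. Below is a minimal sketch for a one-layer model (logistic regression: a sigmoid output with the cross-entropy loss) on a synthetic dataset. The dataset, batch size, learning rate and epoch count are all hypothetical choices for illustration and are not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: label is 1 when x1 + x2 > 0
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)
b = 0.0
alpha, batch_size = 0.5, 20
losses = []

for epoch in range(50):
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]               # Step 1: take a batch
        s = xb @ w + b                        # Step 2: forward pass...
        a = 1.0 / (1.0 + np.exp(-s))          # ...sigmoid output
        # Cross-entropy loss, written with logaddexp to avoid log(0)
        loss = np.mean(yb * np.logaddexp(0, -s) + (1 - yb) * np.logaddexp(0, s))
        losses.append(loss)
        dz = (a - yb) / len(idx)              # Step 3: backprop through loss and sigmoid
        grad_w = xb.T @ dz
        grad_b = dz.sum()
        w -= alpha * grad_w                   # Step 4: update the weights
        b -= alpha * grad_b

accuracy = ((X @ w + b > 0).astype(float) == y).mean()
print("final batch loss:", losses[-1], "training accuracy:", accuracy)
```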

# **Bonus Problem**

Now, assuming that you know backpropagation (read a bit about it if you don't), we'll implement an image classification model on CIFAR-10.

In [30]:
!pip install tensorflow



In [31]:
import tensorflow as tf  
 
# Display the version
print(tf.__version__)    
 
# other imports
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.layers import Input, Conv2D, Dense, Flatten, Dropout
from tensorflow.keras.layers import GlobalMaxPooling2D, MaxPooling2D
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.models import Model

2.9.1



In [32]:
# Load in the data
cifar10 = tf.keras.datasets.cifar10
 
# Distribute it to train and test set
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)

# Scale pixel values from [0, 255] to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0
 
# Flatten the labels from shape (N, 1) to (N,)
y_train, y_test = y_train.flatten(), y_test.flatten()

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
(50000, 32, 32, 3) (50000, 1) (10000, 32, 32, 3) (10000, 1)


In [None]:
'''visualize data by plotting images'''
# YOUR CODE HERE
pass
# YOUR CODE HERE

In [None]:

# number of classes
K = len(set(y_train))
'''
 calculate total number of classes
 for output layer
'''
print("number of classes:", K)
''' 
 Build the model using the functional API
 input layer
'''
# YOUR CODE HERE
pass
# YOUR CODE HERE

'''Hidden layer'''
# YOUR CODE HERE
pass
# YOUR CODE HERE

"""output layer"""
# YOUR CODE HERE
pass
# YOUR CODE HERE

'''model description'''
model.summary()

In [None]:
# Compile
# YOUR CODE HERE
pass
# YOUR CODE HERE
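The labels were flattened to integer class indices, so a loss that accepts integer labels is a natural choice when compiling. Keras provides `sparse_categorical_crossentropy` for this purpose. The numpy sketch below is not the Keras implementation. It only illustrates what a softmax output layer followed by that kind of loss computes, connecting back to the cross-entropy loss in Part 3. The logits and labels are made up. With a uniform prediction over K = 10 classes, the loss equals $\log 10$, which is a handy sanity check.

```python
import numpy as np

def softmax(logits):
    # Subtract each row's max for numerical stability; each row then sums to 1
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_cross_entropy(logits, labels):
    # labels are integer class indices, like the flattened y_train / y_test
    probs = softmax(logits)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(true_class_probs))

# Made-up logits for 3 examples over K = 10 classes, all equal (uniform prediction)
logits = np.zeros((3, 10))
labels = np.array([0, 3, 9])
print(sparse_cross_entropy(logits, labels))  # log(10) for a uniform prediction
```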

In [None]:
# Fit
# YOUR CODE HERE
pass
# YOUR CODE HERE

In [None]:
# label mapping
 
labels = '''airplane automobile bird cat deer dog frog horse ship truck'''.split()
 
# select the image from our test dataset
image_number = 0
 
# display the image
plt.imshow(x_test[image_number])
 
# load the image in an array
n = np.array(x_test[image_number])
 
# reshape it
p = n.reshape(1, 32, 32, 3)
 
# pass in the network for prediction and
# save the predicted label
predicted_label = labels[model.predict(p).argmax()]
 
# load the original label
original_label = labels[y_test[image_number]]
 
# display the result
print("Original label is {} and predicted label is {}".format(
    original_label, predicted_label))