# DLCV Assignment 1

**Due Date: 22/02/2024 11:59PM IST**

**Name:** Rishav Saha

**Sr. No.:** 22573


In this assignment, we will cover the following topics:

1) Training a simple Linear Model 

2) Implementing Modules with Backprop functionality

3) Implementing Convolution Module on Numpy


It is crucial to work through the low-level details of the code to implement all of these. Do not use external packages (such as Caffe or PyTorch) that directly provide functions for these steps.

# Training a simple Linear Model

In this section, you will write the code to train a Linear Model. The goal is to classify an input $X_i$ of size $n$ into one of $m$ classes. For this, you need to consider the following:

1)  **Weight Matrix** $W_{n\times m}$: The weights are multiplied with the input $X_i$ (a vector of size $n$) to find the $m$ scores $S_m$ for the $m$ classes.

2)  **The Loss function**:   
  * The Cross Entropy Loss: By interpreting the scores as unnormalized log probabilities for each class, this loss tries to measure dissatisfaction with the scores in terms of the log probability of the right class:

$$
L_i = -\log\left(\frac{e^{f_{y_i}}}{ \sum_j e^{f_j} }\right) \hspace{0.5in} \text{or equivalently} \hspace{0.5in} L_i = -f_{y_i} + \log\sum_j e^{f_j}
$$

where $f_{y_i}$ is the $y_i$-th element of the score vector $f = W^T X_i$.

3) **A Regularization term**: In addition to the loss, you need a regularization term that encourages the weights to be more evenly distributed (in the case of $L_2$) or sparse (in the case of $L_1$). For example, with $L_2$ regularization, the loss has the following additional term:

$$
R(W) = \sum_k\sum_l W_{k,l}^2  
$$

Thus the total loss has the form:
$$
L =  \underbrace{ \frac{1}{N} \sum_i L_i }_\text{data loss} + \underbrace{ \lambda R(W) }_\text{regularization loss}
$$

4) **An Optimization Procedure**: This is the process that adjusts the weight matrix $W_{n\times m}$ to reduce the loss function $L$. In our case, it is the mini-batch gradient descent algorithm. We adjust the weights $W_{n\times m}$ based on the gradient of the loss $L$ w.r.t. $W_{n\times m}$. This leads to:
$$
W_{t+1} = W_{t} - \alpha \frac{\partial L}{\partial W},
$$
where $\alpha$ is the learning rate. Additionally, with "mini-batch" gradient descent, instead of computing the loss over the whole dataset, we use a small sample $B$ of the training data for each learning step. Hence,
$$
W_{t+1} = W_{t} - \frac{\alpha}{|B|} \frac{\partial \sum_{i \in B}{L_i}}{\partial W},
$$
where $|B|$ is the batch size.
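
The four ingredients above can be combined in a few lines of NumPy. The following is a minimal sketch rather than a reference solution: the function names `softmax_loss_and_grad` and `sgd_step`, the argument `lam` (standing for $\lambda$), the toy data, and the max-subtraction step for numerical stability are all assumptions introduced here for illustration.

```python
import numpy as np

def softmax_loss_and_grad(W, X, y, lam):
    """Total loss L and dL/dW for a batch X (N, n), labels y (N,), weights W (n, m)."""
    N = X.shape[0]
    scores = X @ W                                        # row i holds f = W^T X_i
    scores = scores - scores.max(axis=1, keepdims=True)   # stability; softmax is unchanged
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    data_loss = -log_probs[np.arange(N), y].mean()        # (1/N) * sum_i L_i
    reg_loss = lam * np.sum(W ** 2)                       # lambda * R(W)

    dscores = np.exp(log_probs)                           # softmax probabilities q
    dscores[np.arange(N), y] -= 1.0                       # dL_i/df = q - one_hot(y_i)
    dW = X.T @ dscores / N + 2.0 * lam * W                # data term + regularization term
    return data_loss + reg_loss, dW

def sgd_step(W, X_batch, y_batch, lam, lr):
    """One mini-batch update: W <- W - lr * dL/dW."""
    loss, dW = softmax_loss_and_grad(W, X_batch, y_batch, lam)
    return W - lr * dW, loss

# Toy data (assumed): 8 samples, n = 5 features, m = 3 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
y = rng.integers(0, 3, size=8)
W = 1e-3 * rng.normal(size=(5, 3))
W_new, loss = sgd_step(W, X, y, lam=1e-3, lr=0.1)
```

With near-zero initial weights, all classes are almost equally likely, so the first loss value should be close to $\log m$. This is a quick sanity check before training on MNIST.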

# Question 1

Train a **Single-Layer Classifier** for the MNIST dataset. 
* Use Softmax-Loss.
* Maintain a train-validation split of the original training set for finding the right value of $\lambda$ for the regularization, and to check for over-fitting.
* Finally, evaluate the classification performance on the test-set.
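
Evaluating classification performance comes down to comparing the highest-scoring class with the label. The sketch below is one possible way to write such helpers. The names `accuracy` and `per_class_accuracy` and the toy scores are hypothetical and not part of the assignment template.

```python
import numpy as np

def accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    return float(np.mean(np.argmax(scores, axis=1) == labels))

def per_class_accuracy(scores, labels, num_classes=10):
    """Accuracy computed separately per class; NaN for classes absent from labels."""
    predictions = np.argmax(scores, axis=1)
    result = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            result[c] = np.mean(predictions[mask] == c)
    return result

# Toy example (assumed values): 4 samples, 3 classes
scores = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.9, 0.8, 0.1],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 1, 1, 2])
print(accuracy(scores, labels))                # 0.75
print(per_class_accuracy(scores, labels, 3))
```

The same helpers can be applied unchanged to the training, validation and test splits, and the per-class vector directly identifies the best and worst performing classes.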


In [1]:
## Load The Mnist data:
# Download data from http://yann.lecun.com/exdb/mnist/
# load the data.
import idx2numpy
data=idx2numpy.convert_from_file('train-images.idx3-ubyte')
label=idx2numpy.convert_from_file('train-labels.idx1-ubyte')
test_data=idx2numpy.convert_from_file('t10k-images.idx3-ubyte')
test_label=idx2numpy.convert_from_file('t10k-labels.idx1-ubyte')
# maintain a train-val split
train_size=int(0.7*len(data))
training_x=data[:train_size]
print(len(training_x))
crossval_x=data[train_size:]
training_y=label[:train_size]
crossval_y=label[train_size:]
# Now, write a generator that yields (random) mini-batches of the input data
# Do not use same set of mini-batches for different epochs
import numpy as np

def get_minibatch(training_x=training_x, training_y=training_y, batch_size=100):
    ## Read about Python generators if required.

    ## WRITE CODE HERE
    # Draw a fresh random permutation on every call, so that each epoch
    # (one call to get_minibatch) sees a different set of mini-batches.
    num_samples = len(training_x)
    indices = np.random.permutation(num_samples)
    for start_idx in range(0, num_samples, batch_size):
        # The last batch holds fewer samples when batch_size does not divide num_samples
        batch_idx = indices[start_idx:start_idx + batch_size]
        yield training_x[batch_idx], training_y[batch_idx]

42000


In [2]:
import numpy as np
class Single_layer_classifier():
    
    def __init__(self, input_size, output_size):
        
        ## WRITE CODE HERE
        
        # Give the instance a weight matrix, initialized randomly
        # One possible strategy for a good initialization is Normal (0, σ) where σ = 1e-3.
        mean=0
        std_dev=10**(-3)
        self.W=np.random.normal(loc=mean,scale=std_dev,size=(input_size,output_size))
        # print(self.W)
        # Try experimenting with different values of σ.
        
        # Xavier (Glorot) init: std = sqrt(2 / (input_size + output_size))
        # std_dev_2=np.sqrt(2/(input_size+output_size))
        # self.W=np.random.normal(loc=mean,scale=std_dev_2,size=(input_size,output_size))
        # print(self.W)
        
        
    # Define the forward function
    def forward(self, input_x):
        
        # get the scores
        ## WRITE CODE HERE
        self.input_x=input_x  # saved for the backward pass
        self.scores=np.dot(input_x,self.W)
        return self.scores
    
    # Similarly a backward function
    # we define 2 backward functions (as Loss = L_data + L_reg, grad(Loss) = grad(L_data) + grad(L_reg))
    
    def backward_Ldata(self, grad_from_loss):
        
        # this function returns a matrix of the same size as the weights, 
        # where each element is the partial derivative of the loss w.r.t. the corresponding element of W
        
        ## WRITE CODE HERE
        # grad_from_loss: gradient of the data loss w.r.t. the scores, shape (batch_size, output_size).
        # Since scores = X W, the chain rule gives dL/dW = X^T (dL/dscores).
        grad_matrix=np.dot(self.input_x.T,grad_from_loss)
        return grad_matrix
        
    def backward_Lreg(self):
        
        # this function returns a matrix of the same size as the weights, 
        # where each element is the partial derivative of the regularization-term
        # w.r.t. the corresponding element of W
        
        ## WRITE CODE HERE
        # R(W) = sum of squared weights, so dR/dW = 2W (lambda is applied by the caller)
        grad_matrix=2*self.W
        return grad_matrix

In [43]:
# Implement the Softmax loss function
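def loss_function(input_y,scores):

    ## WRITE CODE HERE  
    # softmax; subtracting the row-wise maximum avoids overflow in np.exp
    # and leaves the probabilities unchanged
    exp_scores=np.exp(scores-np.max(scores,axis=1,keepdims=True))
    softmax_prob=exp_scores/np.sum(exp_scores,axis=1,keepdims=True)
    print(softmax_prob.shape)
    # NOTE: this returns the softmax probabilities q, which loss_backward consumes.
    # The cross-entropy value itself is the batch mean of -log(q[i, input_y[i]]).
    loss=softmax_prob
    return loss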


def loss_backward(loss,scores):
    # This part deals with the gradient of the loss w.r.t the output of network
    # for example, in case of softmax loss(-log(q_c)), this part gives grad(loss) w.r.t. q_c
    # pass this to backward_ldata
    
    ## WRITE CODE HERE    
    
    # `loss` holds the softmax probabilities returned by loss_function
    softmax=loss
    n,m=softmax.shape
    # Jacobian of the softmax w.r.t. the scores, for every sample (vectorized):
    # output_gradient[i,j,k] = softmax[i,j]*(delta_jk - softmax[i,k])
    output_gradient=softmax[:,:,None]*(np.eye(m)[None,:,:]-softmax[:,None,:])
    print(output_gradient.shape)
    return output_gradient
    
    

### Create utility functions for calculating training and validation accuracy

In [40]:
# WRITE CODE HERE

In [41]:
minibatch=get_minibatch()
classifier=Single_layer_classifier(784,10)

In [44]:
# Finally the trainer:
# Make an instance of Single_layer_classifier
# Train for t epochs:
###  Train on the train-set obtained from train-validation split
###  Use the mini-batch generator to get each mini-batch

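for iter_num,(input_x , input_y) in enumerate(minibatch):
    # Write code for each iteration of the training
    # Flatten each 28x28 image into a 784-vector (works for any batch size)
    input_x_reshaped=np.reshape(input_x,(len(input_x),28*28))
    # Forward pass
    scores=classifier.forward(input_x_reshaped)
    loss=loss_function(input_y,scores)

    # Backward pass
    output_gradient=loss_backward(loss,scores)
    print(output_gradient)
    # Chain rule with L_i = -log(q[i, y_i]): dL_i/df_k = -(1/q[i, y_i]) * dq[i, y_i]/df_k,
    # averaged over the batch to match the 1/N factor of the data loss
    rows=np.arange(len(input_y))
    grad_scores=-output_gradient[rows,input_y,:]/loss[rows,input_y][:,None]/len(input_y)
    grad_W=classifier.backward_Ldata(grad_scores)
    break
    # Update weights
    
    # Log the training loss value and training accuracy 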
(100, 10)
(100, 10, 10)
[[[ 1.88632113e-02 -4.57256476e-05 -6.55123578e-04 ... -4.21978235e-04
   -1.67786774e-03 -1.65907261e-03]
  [-4.57256476e-05  2.37179029e-03 -8.09810522e-05 ... -5.21615198e-05
   -2.07404374e-04 -2.05081072e-04]
  [-6.55123578e-04 -8.09810522e-05  3.29020174e-02 ... -7.47332040e-04
   -2.97153791e-03 -2.93825135e-03]
  ...
  [-4.21978235e-04 -5.21615198e-05 -7.47332040e-04 ...  2.14588097e-02
   -1.91402716e-03 -1.89258662e-03]
  [-1.67786774e-03 -2.07404374e-04 -2.97153791e-03 ... -1.91402716e-03
    7.96278936e-02 -7.52529343e-03]
  [-1.65907261e-03 -2.05081072e-04 -2.93825135e-03 ... -1.89258662e-03
   -7.52529343e-03  7.88202150e-02]]

 [[ 5.92786349e-03 -2.43665437e-05 -6.54240909e-06 ... -1.24021627e-05
   -3.78403033e-06 -2.29945638e-05]
  [-2.43665437e-05  4.06930215e-03 -4.48270300e-06 ... -8.49766669e-06
   -2.59272752e-06 -1.57553278e-05]
  [-6.54240909e-06 -4.48270300e-06  1.09588541e-03 ... -2.28162076e-06
   -6.96146501e-07 -4.23030042e-06]
  ...

### Plot the training loss and training accuracy plot

In [81]:
# WRITE CODE HERE

### Find the accuracy on the validation set

In [82]:
# WRITE CODE HERE

In [83]:
# The next step is to find the optimal value for lambda, number of epochs, learning rate and batch size. 
# CHOOSE ANY TWO of the above to tune.
# Create plot and table to show the effect of the hparams.

### Report final performance on MNIST test set

In [84]:
# WRITE CODE HERE

### Find the best performing class and the worst performing class

In [85]:
# WRITE CODE HERE

# Training a Linear Classifier on MNIST from scikit-learn

In this section you have to train a linear classifier from the scikit-learn library and compare its results against your implementation.
(https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)

In [90]:
# WRITE CODE HERE 
from sklearn.linear_model import LinearRegression
X_train=training_x
Y_train=training_y
x_test=test_data
y_test=test_label
regression_model=LinearRegression()
# Flatten each image; -1 infers the 28*28 feature dimension
X_train_reshaped=np.reshape(X_train,(len(X_train),-1))
x_test_reshaped=np.reshape(x_test,(len(x_test),-1))
regression_model.fit(X_train_reshaped,Y_train)
# LinearRegression predicts real values, so they still need to be mapped
# to class labels (e.g. rounded and clipped to 0..9) before computing accuracy
y_pred=regression_model.predict(x_test_reshaped)

### Compare the training and test accuracies of your implementation and the linear classifier from scikit-learn

In [None]:
    # WRITE CODE HERE

### Any additional observations / comments?

## BONUS Question
### Observe the effect on test set accuracy by changing the number of training samples.
### Train on 10%, 20% and 50% of the training data and plot the percentage of training data vs. the test accuracy.

In [None]:
# WRITE CODE HERE

# Implementing Backpropagation

Now that you have had some experience with single layer networks, we can proceed to more complex architectures. But first we need to completely understand and implement backpropagation.

## Backpropagation:

Simply put, backpropagation is a way of computing gradients of expressions through repeated application of the chain rule. If
$$
L = f (g (h (\textbf{x})))
$$
then, by the chain rule we have:
$$
\frac{\partial L}{\partial \textbf{x}} = \frac{\partial f}{\partial g} \cdot \frac{\partial g}{\partial h} \cdot \frac{\partial h}{\partial \textbf{x}} 
$$

**Refer to the class lecture for more detail.**
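
As a small worked instance of the chain rule above, the sketch below picks three concrete functions, $h(x) = x^2$, $g(h) = \sin(h)$ and $f(g) = e^g$. These choices are purely illustrative assumptions, not taken from the assignment. The forward pass caches the intermediate values, the backward pass multiplies the local derivatives, and the result is compared with a finite-difference estimate.

```python
import math

def forward(x):
    h = x ** 2           # h(x)
    g = math.sin(h)      # g(h)
    L = math.exp(g)      # f(g)
    return L, (x, h, g, L)

def backward(cache):
    x, h, g, L = cache
    dL_dg = L                 # d exp(g)/dg = exp(g), which is the cached output
    dg_dh = math.cos(h)       # d sin(h)/dh
    dh_dx = 2 * x             # d x^2/dx
    return dL_dg * dg_dh * dh_dx

L, cache = forward(0.7)
grad = backward(cache)

# Central-difference check of the analytic gradient
eps = 1e-6
numeric = (forward(0.7 + eps)[0] - forward(0.7 - eps)[0]) / (2 * eps)
print(grad, numeric)
```

The two printed numbers should agree to several decimal places. The same pattern (cache in `forward`, multiply local derivatives in `backward`) carries over to the module classes in the following questions.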



# Question 2 : Scalar Backpropagation

Evaluate the gradient of the following function w.r.t. the input:

$$ f(x,y,z) = \log\left(\sigma\left(\frac{\cos(\pi x)+\sin(\pi y/2)}{\tanh(z^2)}\right)\right) $$
where $\sigma$ is the sigmoid function. Find gradient for the following inputs:
  * $(x,y,z)$ =  (2,4,1)
  * $(x,y,z)$ =  (9,14,3)
  * $(x,y,z)$ =  (128,42,666)
  * $(x,y,z)$ =  (52,14,28)
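
A numerical gradient is a useful reference when debugging the backward pass of the computational graph. The sketch below is not the graph-based solution the question asks for. It only evaluates $f$ directly and estimates the gradient with central differences. The helper name `numerical_gradient`, the step size `eps`, and the use of `np.logaddexp` for a stable $\log\sigma$ are assumptions made here.

```python
import numpy as np

def f(x, y, z):
    """log(sigmoid((cos(pi*x) + sin(pi*y/2)) / tanh(z**2)))"""
    u = (np.cos(np.pi * x) + np.sin(np.pi * y / 2)) / np.tanh(z ** 2)
    return -np.logaddexp(0.0, -u)    # log(sigmoid(u)) = -log(1 + exp(-u)), computed stably

def numerical_gradient(func, point, eps=1e-6):
    """Central-difference estimate of the gradient of func at point."""
    point = np.asarray(point, dtype=float)
    grad = np.zeros_like(point)
    for i in range(point.size):
        step = np.zeros_like(point)
        step[i] = eps
        grad[i] = (func(*(point + step)) - func(*(point - step))) / (2 * eps)
    return grad

for inputs in [(2, 4, 1), (9, 14, 3), (128, 42, 666), (52, 14, 28)]:
    print(inputs, numerical_gradient(f, inputs))
```

The values returned by the `backward` function of the graph can then be compared entry by entry against these estimates.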

      

In [None]:
# To solve this problem, construct the computational graph
# Write a class with forward and backward functions, for each node if you like
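# For example:
import numpy as np

class Sigmoid():
    def __init__(self):
        self.out = None

    def forward(self, x):
        # save values useful for backpropagation
        self.out = 1.0 / (1.0 + np.exp(-x))
        return self.out

    def backward(self, grad_output):
        # d(sigmoid)/dx = sigmoid(x) * (1 - sigmoid(x))
        return grad_output * self.out * (1.0 - self.out)
        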
# CAUTION: Carefully treat the input and output dimension variation. At worst, handle them with if statements.

In [None]:
# Now write the class func
# which constructs the graph (all operators), forward and backward functions.

class Func():
    def __init__(self):
        # construct the graph here
        # assign the instances of function modules to self.var
        pass
        
    def forward(self,x,y,z):
        # Using the graph element's forward functions, get the output. 
        
        return output
    
    def backward(self,output):
        # Use the saved outputs of each module, and backward() function calls
        
        return [grad_x,grad_y,grad_z]
    

## Question 3 : Modular Vector Backpropagation

* Construct a Linear Layer module, implementing the forward and backward functions for arbitrary sizes.
* Construct a ReLU module, implementing the forward and backward functions for arbitrary sizes.
* Create a 2 layer MLP using the constructed modules.

* Modifying the functions built in Question 1, train this two-layer MLP on the same dataset, MNIST, with the same train-val split.
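
The modules described above could be organized along the following lines. This is a minimal sketch under stated assumptions, not the expected solution: the class names `Linear`, `ReLU` and `TwoLayerMLP`, the bias term, the hidden size of 64 and the initialization scale `std` are all choices made here for illustration.

```python
import numpy as np

class Linear:
    """y = x @ W + b for x of shape (N, in_features)."""
    def __init__(self, in_features, out_features, std=1e-2, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        self.W = std * rng.normal(size=(in_features, out_features))
        self.b = np.zeros(out_features)

    def forward(self, x):
        self.x = x                        # cached for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_out):
        self.dW = self.x.T @ grad_out     # dL/dW, same shape as W
        self.db = grad_out.sum(axis=0)    # dL/db
        return grad_out @ self.W.T        # dL/dx, passed to the previous module

class ReLU:
    def forward(self, x):
        self.mask = x > 0
        return x * self.mask

    def backward(self, grad_out):
        return grad_out * self.mask       # gradient flows only where the input was positive

class TwoLayerMLP:
    def __init__(self, n_in, n_hidden, n_out, rng=None):
        self.layers = [Linear(n_in, n_hidden, rng=rng), ReLU(),
                       Linear(n_hidden, n_out, rng=rng)]

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def backward(self, grad_out):
        for layer in reversed(self.layers):
            grad_out = layer.backward(grad_out)
        return grad_out

mlp = TwoLayerMLP(784, 64, 10, rng=np.random.default_rng(0))
scores = mlp.forward(np.zeros((4, 784)))   # shape (4, 10)
```

After `backward`, each `Linear` instance holds `dW` and `db`, so a training loop only needs to subtract the learning rate times these gradients from `W` and `b`.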

In [None]:
# Class for Linear Layer (If you're stuck, you can refer to code of PyTorch/Tensorflow packages) 


In [None]:
# Class for ReLU


In [None]:
# Your 2 layer MLP 


In [None]:
# Train the MLP


### Plot the training loss and training accuracy plot

In [None]:
# Use the same utility functions defined in the previous question
# WRITE CODE HERE

### Find the accuracy on the validation set

In [None]:
# WRITE CODE HERE

In [None]:
# Find the optimal value of learning rate and batch size. 
# Use the same tuning strategy as the previous question
# Create plot and table to show the effect of the hparams.

### Report final performance on MNIST test set

In [None]:
# WRITE CODE HERE

### Find the best performing class and the worst performing class

In [None]:
# WRITE CODE HERE

### Any additional observations / comments?

## BONUS Question
### Observe the effect on test set accuracy by changing the number of training samples.
### Train on 10%, 20% and 50% of the training data and plot the percentage of training data vs. the test accuracy.

In [None]:
# WRITE CODE HERE


# Implementing a Convolution Module with Numpy

* This topic will require you to implement the Convolution operation using Numpy.
* We will use the Module for tasks like Blurring.
* Finally, we implement Backpropagation for the convolution module.


## Question 4

* Implement a naive Convolution module with the basic functionalities: kernel_size, padding, stride and dilation
  
* Test the convolution layer by using it to apply Gaussian blurring to 10 random images from the CIFAR-10 dataset
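
To make the roles of padding, stride and dilation concrete, here is a hedged sketch of a loop-based forward pass together with a Gaussian kernel. It is a plain function, not the `Convolution_Layer` class requested below. The names `conv2d_naive` and `gaussian_kernel`, the choice `sigma=1.0`, and the random stand-in images are assumptions. As is conventional in deep learning, the kernel is not flipped (cross-correlation), which makes no difference for a symmetric Gaussian.

```python
import numpy as np

def conv2d_naive(x, w, stride=1, padding=0, dilation=1):
    """x: (N, C_in, H, W); w: (C_out, C_in, K, K). Returns (N, C_out, H_out, W_out)."""
    N, C_in, H, W = x.shape
    C_out, _, K, _ = w.shape
    x = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    span = dilation * (K - 1) + 1                     # effective kernel extent
    H_out = (H + 2 * padding - span) // stride + 1
    W_out = (W + 2 * padding - span) // stride + 1
    out = np.zeros((N, C_out, H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            r, c = i * stride, j * stride
            patch = x[:, :, r:r + span:dilation, c:c + span:dilation]   # (N, C_in, K, K)
            out[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3]))
    return out

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 2-D Gaussian of shape (size, size)."""
    ax = np.arange(size) - (size - 1) / 2
    g = np.exp(-ax ** 2 / (2 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

# Blur each RGB channel independently: filter c only looks at input channel c.
weights = np.zeros((3, 3, 5, 5))
for c in range(3):
    weights[c, c] = gaussian_kernel(5, 1.0)

images = np.random.default_rng(0).random((10, 3, 32, 32))   # stand-in for 10 CIFAR-10 images
blurred = conv2d_naive(images, weights, padding=2)           # same spatial size as the input
```

With `padding=2` and a 5$\times$5 kernel the output keeps the 32$\times$32 resolution, so the blurred images can be displayed next to the originals.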


In [None]:
## Define a class Convolution Layer, which is initialized with the various required params:
class Convolution_Layer():
    
    def __init__(self,input , filter_size, bias=True, stride=1, padding=0, dilation=1):
        # For an untrained layer, set random initial filter weights
        pass

    def forward(self,input):
        # Input Preprocess (according to pad etc.). Input will be of size (Batch_size, in_channels, inp_height, inp_width)
        
        # Reminder: Save Input for backward-prop
        # Simple Conv operation:
        # Loop over every location in inp_height * inp_width for the whole batch
        
        # Output will be of the size (Batch_size, out_channels, out_height, out_width)
        return output
    
    def backward(self, grad_of_output_size):
        
        # Naive Implementation
        # Speed is not a concern
        # Hint: gradients from each independent operation can be summed
        
        #  return gradient of the size of the weight kernel
        return grad
    
    def set_weights(self, new_weights):
        ## Replace the set of weights with the given 'new_weights'
        ## use this for setting weights for blurring, bilateral filtering etc. 
        pass
    

### Download the CIFAR-10 images and load them into a numpy array (https://www.cs.toronto.edu/~kriz/cifar.html)



In [183]:
# WRITE CODE HERE
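import os
import pickle
def load_CIFAR():
    # os.path.join keeps the path portable and avoids the '\d' escape sequence
    file_path=os.path.join('cifar-10-batches-py','data_batch_1')
    with open(file_path,'rb') as file:
        batch=pickle.load(file,encoding='bytes')
    data=batch[b'data']
    # each row is one flattened image: 3 channels * 32 * 32 = 3072 values
    print(len(data[0]))
    return data
cifar_data=load_CIFAR()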

3072


### Initialize a conv layer. Set weights for Gaussian blurring (do not train the filter for this part). Visualise the filters using matplotlib


In [None]:
# WRITE CODE HERE

### Generate output for the first 5 images of the training set

In [None]:
# WRITE CODE HERE

### Use matplotlib to show the input and corresponding blurred output

In [None]:
# WRITE CODE HERE

## Question 5
<br>
Now we will use this module for training a simple Convolution Layer using CIFAR-10 images. 

* The goal is to learn a set of weights, by using the backpropagation function created. To test the backpropagation, instead of training a whole network, we will train only a single layer.
  * Instantiate a Convolution  layer $C_0$ with 20 filters, each with size 5$\times$5 (RGB image, so 3 input channels). Load the given numpy array of size (20,3,5,5), which represents the weights of a convolution layer. Set the given values as the filter weights for $C_0$. Take 100 CIFAR-10 images. Save the output of these 100 images generated from this Convolution layer $C_0$. 
  
  * Now, initialize a new convolution layer $C$ with weight values sampled from a uniform distribution over $[-1,1]$. Use the $L_2$ loss between the output of this layer $C$ and the output generated in the previous step to learn the filter weights of $C_0$.
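
One way the loss for this experiment could look is sketched below. It assumes the squared $L_2$ distance averaged over the batch, which is only one of several reasonable readings of "the $L_2$ loss". The class name `SquaredL2Loss` and the random stand-in tensors are hypothetical, and the shapes assume 32$\times$32 inputs with 5$\times$5 filters and no padding.

```python
import numpy as np

class SquaredL2Loss:
    """loss = sum((prediction - target)**2) / batch_size"""
    def forward(self, target, prediction):
        self.diff = prediction - target           # cached for the backward pass
        self.batch_size = prediction.shape[0]
        return np.sum(self.diff ** 2) / self.batch_size

    def backward(self, output_grad=1.0):
        # d loss / d prediction, same shape as the conv output
        return output_grad * 2.0 * self.diff / self.batch_size

rng = np.random.default_rng(0)
target = rng.normal(size=(4, 20, 28, 28))       # stand-in for the saved outputs of C_0
prediction = rng.normal(size=(4, 20, 28, 28))   # stand-in for the outputs of C
criterion = SquaredL2Loss()
loss = criterion.forward(target, prediction)
grad = criterion.backward()                     # feed this into the conv layer's backward()
```

The gradient returned here has the shape of the convolution output, which is what the `backward` function of the convolution layer expects as its input.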


In [None]:
## Load filter weights from given numpy array "C0_weights.npy".
## Init a conv layer C_0 with these given weights

## For all images get output. Store in numpy array.



In [None]:
# for part 2 we need to write a class for the  L2 loss
class L2_loss():
    def __init__(self):
        pass
    
    def forward(self, C0_output,C_output):
        # Conv. output is of dimension (batchsize,channels,height,width)
        # calculate the L2 norm of (C0_output - C_output)
        
        return loss
    
    def backward(self,output_grad):
        # from the loss, and the conv. output, get the grad at each location
        # The grad is of the shape (batchsize,channels,height,width)
        return grad

# Now Init a new conv layer C and a L2 loss layer

# Train the new conv-layer C using the L2 loss to learn C_0, i.e., the set of given weights.
# Use mini-batches if required


# Print L2 dist between output from the new trained convolution layer C and the outputs generated from C_0.
