# Backprop-From-Scratch using PyTorch

In this exercise you will implement a simple neural network with one hidden layer from scratch using ```torch``` tensors, computing the gradients by hand. The training set is a simple toy dataset called the half-moon dataset. The necessary steps are outlined below, followed by pseudo-code for your implementation.
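
For orientation, a half-moon dataset can be generated with scikit-learn's `make_moons`. The sketch below is a stand-in, not the `make_train_test` helper used later in this notebook (that helper is not shown here); the sample counts, the noise level, and the variable names are assumptions chosen to mirror the hyper-parameters in the code cell.

```python
import torch
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Assumed sizes: 1000 training examples plus 100 test examples
X_np, y_np = make_moons(n_samples=1100, noise=0.2, random_state=42)

# Hold out exactly 100 examples for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_np, y_np, test_size=100, random_state=42
)

# Convert to float tensors; labels get a trailing dimension so they
# line up with a network output of shape (N, 1)
X_train_t = torch.from_numpy(X_tr).float()
y_train_t = torch.from_numpy(y_tr).float().unsqueeze(1)
X_test_t = torch.from_numpy(X_te).float()
y_test_t = torch.from_numpy(y_te).float().unsqueeze(1)

print(X_train_t.shape, y_train_t.shape, X_test_t.shape, y_test_t.shape)
```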

1. Set the random number generator

2. Which hyper-parameters will you need to set prior to training?

3. Define the size of the input layer (D), the number of hidden layer units (H) and the output layer units (M).
    - Suggestion: use a small number of neurons in the hidden layer e.g. H=3
    
    
4. Define the training and test sets of your half-moon dataset.

5. We will use ```sigmoid``` activation functions. Define functions to compute the forward and "backward" pass of the sigmoid. Each function should take in a ```torch.Tensor``` and return a ```torch.Tensor```.


6. Define the weight tensors of each layer and initialize them with ```torch.randn```. You should have two weight tensors, W1 and W2.


7. Within a training loop, perform the following operations for the forward and backward passes
    - Compute the affine layer transformation $z_1=XW_1$ (with $X$ of shape $N\times D$ and $W_1$ of shape $D\times H$)
    - Compute the non-linear activation $a_1=\sigma(z_1)$
    - Compute the affine layer transformation $z_2=a_1W_2$
    - Compute the non-linear activation $a_2=\sigma(z_2)$
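    - Recall the chain-rule: $\frac{\partial{L}}{\partial{W_2}}=\frac{\partial{L}}{\partial{a_2}}\frac{\partial{a_2}}{\partial{z_2}}\frac{\partial{z_2}}{\partial{W_2}}$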
    - Use the notes at the bottom to simplify the code\*
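    - Compute the gradient of the loss with respect to the weights of the output layer $\frac{\partial{L}}{\partial{W_2}}=a_1^T\left(\frac{\partial{L}}{\partial{a_2}}\frac{\partial{a_2}}{\partial{z_2}}\right)$. You will need to use ```torch.transpose``` and ```torch.matmul``` to perform this operation.
    - Compute the error on the output of the hidden layer $\frac{\partial{L}}{\partial{a_1}}=\left(\frac{\partial{L}}{\partial{a_2}}\frac{\partial{a_2}}{\partial{z_2}}\right)W_2^T$
    - Compute the gradient of the loss with respect to the hidden-layer weights $W_1$. This is the same operation as for the output layer, with $X$ in place of $a_1$: $\frac{\partial{L}}{\partial{W_1}}=X^T\left(\frac{\partial{L}}{\partial{a_1}}\frac{\partial{a_1}}{\partial{z_1}}\right)$, where the product inside the parentheses is element-wise.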
    - Bonus: Compute the sensitivity of the loss with respect to the input $\frac{\partial{L}}{\partial{X}}$
    - Perform a gradient descent step on the weights: $W_2^{t+1} = W_2^{t}-\frac{\alpha}{N}\frac{\partial{L}}{\partial{W_2}}$, and likewise for $W_1$. (Hint: the division by $N$ is necessary because the ```torch.matmul``` operation effectively sums over all the input examples.)
    - Compute the training loss as the binary cross entropy $BCE(y, a_2)=-\frac{1}{N}\sum{\left[y\cdot \log(a_2)+(1-y)\cdot \log(1-a_2)\right]}$
    
    
    
8. Repeat the above iteration for a number of epochs (full passes through the training set), using full-batch learning.



9. After training, evaluate the performance on the test set by computing $y_{pred}=\sigma(\sigma(XW_1)W_2)$, thresholding the predictions at 0.5 to obtain class labels, and measuring the accuracy with ```sklearn.metrics.accuracy_score```.



10. Plot the predictions on the training and the test set.


11. Bonus: Plot the sensitivity of the loss with respect to each data point in the training set.
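
The backward-pass formulas in step 7 can be sanity-checked against PyTorch's autograd on a small random problem. The sketch below is not the exercise solution: the sizes, the random data, and the names `delta1`, `delta2`, `dA1` are illustrative assumptions. It uses the standard identity that, for a sigmoid output combined with binary cross entropy summed over the examples, $\frac{\partial L}{\partial a_2}\frac{\partial a_2}{\partial z_2}$ reduces to $a_2-y$; whether this is the simplification meant by the notes is an assumption.

```python
import torch

torch.manual_seed(0)

def sigmoid(z):
    """Forward pass of the sigmoid."""
    return 1.0 / (1.0 + torch.exp(-z))

def d_sigmoid(a):
    """Backward pass of the sigmoid, written in terms of its output a = sigmoid(z)."""
    return a * (1.0 - a)

# Illustrative sizes (assumptions): N examples, D inputs, H hidden units, M outputs
N, D, H, M = 8, 2, 3, 1
X = torch.randn(N, D, dtype=torch.float64, requires_grad=True)
y = torch.randint(0, 2, (N, M)).double()
W1 = torch.randn(D, H, dtype=torch.float64, requires_grad=True)
W2 = torch.randn(H, M, dtype=torch.float64, requires_grad=True)

# Forward pass
a1 = sigmoid(torch.matmul(X, W1))
a2 = sigmoid(torch.matmul(a1, W2))

# Binary cross entropy, summed (not averaged) over the examples
loss = -(y * torch.log(a2) + (1 - y) * torch.log(1 - a2)).sum()

# Manual backward pass, kept out of the autograd graph
with torch.no_grad():
    delta2 = a2 - y                                         # dL/dz2
    dW2 = torch.matmul(torch.transpose(a1, 0, 1), delta2)   # dL/dW2
    dA1 = torch.matmul(delta2, torch.transpose(W2, 0, 1))   # dL/da1
    delta1 = dA1 * d_sigmoid(a1)                            # dL/dz1
    dW1 = torch.matmul(torch.transpose(X, 0, 1), delta1)    # dL/dW1
    dX = torch.matmul(delta1, torch.transpose(W1, 0, 1))    # dL/dX (bonus)

# Reference gradients from autograd
loss.backward()
for name, manual, auto in [("W2", dW2, W2.grad), ("W1", dW1, W1.grad), ("X", dX, X.grad)]:
    print(name, "matches autograd:", torch.allclose(manual, auto))
```

Because the loss here is summed rather than averaged, the manual gradients are sums over the examples, which is why the update rule divides by $N$.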

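The training loss and the evaluation step can be sketched on a handful of made-up predictions. The helper names `bce` and `accuracy`, the clamping constant `eps`, and the 0.5 threshold are assumptions for illustration; the clamp only guards against `log(0)` and is not part of the formula in the text. `sklearn.metrics.accuracy_score` expects hard class labels, so the same thresholding would be applied before calling it.

```python
import torch

def bce(y, a, eps=1e-7):
    """Mean binary cross entropy between labels y and predicted probabilities a."""
    a = a.clamp(eps, 1 - eps)  # avoid log(0) for saturated predictions
    return -(y * torch.log(a) + (1 - y) * torch.log(1 - a)).mean()

def accuracy(y, a, threshold=0.5):
    """Fraction of examples whose thresholded prediction equals the label."""
    y_hat = (a >= threshold).float()
    return (y_hat == y).float().mean().item()

# Made-up labels and predicted probabilities (illustrative only)
y = torch.tensor([[1.0], [0.0], [1.0], [0.0]])
a = torch.tensor([[0.9], [0.2], [0.4], [0.1]])

print("BCE:", bce(y, a).item())
print("accuracy:", accuracy(y, a))  # 3 of 4 predictions are correct: 0.75
```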
In [None]:


# STEP 1 : Set the seed 
set_seed(42)

# STEP 2 :Set the Hyperparameters
epochs = 1000 #Number of loops through whole dataset
batch_size = 1000 #Size of a single batch
batch_num = 1 #Use full batch training
test_size = 100 #Examples in test set

# STEP 3 : Set the parameters for the neural network - learning rate, network size
lr = 1.
I, H, O = 2, 500, 1 #Define input size (2), size of hidden layer (500), output size (1)


# STEP 4 : Define training and test sets for two moon dataset
X_train, y_train, X_test, y_test = make_train_test(batch_size, batch_num, test_size, noise=0.2)

#Define Train Set in Pytorch
X = torch.from_numpy(X_train).float()[0] #Convert to torch tensor, single batch
y = torch.from_numpy(y_train).float()[0] #Convert to torch tensor, single batch

#Define Test Set in Pytorch
X_test = torch.from_numpy(X_test).float() #Convert to torch tensor, already single batch
y_test = torch.from_numpy(y_test).float() #Convert to torch tensor, already single batch


# STEP 5 : Activation Function and derivative of the activation function
sigmoid = lambda x: 1./(1+torch.exp(-x)) #Sigmoid Activation Function
dSigmoid = lambda x: x*(1-x) #Derivative of sigmoid, expressed via its output: pass in x = sigmoid(z), not z

# STEP 6 : Define the weight tensors
W1 = torch.randn((I, H))
W2 = torch.randn((H,O))


# STEP 7 : Forward Pass
for i in range(epochs):
    
    # store number of inputs
    N = X.size(0)