# Building a Neural Network from Scratch

## Requirements 

- A working fully-connected deep neural network built from scratch using only NumPy.
- Includes dense layers, activation functions, optimizers, loss functions, and a sigmoid or softmax output for classification.
- Runtime and results on a public dataset.
- Documented code that includes a brief summary, technical details, and results.

## Extensions 

A comparison of the model's runtime and performance with/without each component:
- More than one optimizer: SGD, Momentum, RMSProp, Adam, etc.
- Regularization: L2/weight decay, dropout, and possibly augmentations for image data.
- Results on more than one dataset.

In [None]:
from sklearn.datasets import fetch_openml

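#load MNIST dataset (downloaded from OpenML on the first call)
#as_frame=False returns NumPy arrays instead of pandas objects
mnist = fetch_openml('mnist_784', as_frame=False)
x = mnist.data #pixel values, shape (70000, 784)
y = mnist.target #digit labels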
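The `nn` class in this notebook expects one-hot labels (see its `label (one hot encode)` comment), while `fetch_openml` typically returns the MNIST labels as strings and the pixels as raw 0-255 intensities. A minimal preprocessing sketch under those assumptions; `one_hot`, `x_toy` and `y_toy` are hypothetical names, and the toy arrays stand in for real MNIST rows:

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Turn class labels into a (n_samples, num_classes) 0/1 matrix."""
    labels = np.asarray(labels).astype(int)  # also converts '5' -> 5
    encoded = np.zeros((labels.size, num_classes))
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Hypothetical stand-ins for a few MNIST rows: string labels and raw pixels.
y_toy = np.array(['5', '0', '4'])
x_toy = np.array([[0, 128, 255],
                  [255, 0, 64],
                  [32, 32, 32]], dtype=float)

Y_toy = one_hot(y_toy)   # shape (3, 10)
X_toy = x_toy / 255.0    # scale pixel intensities into [0, 1]
print(Y_toy.shape, X_toy.min(), X_toy.max())
```

Scaling the inputs is a design choice rather than a requirement: it keeps the weighted sums small enough that the sigmoid is less likely to saturate.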


Forward pass on a single example:
$\hat{y}=\sigma(z), \quad z = w^Tx+b$ \
Sigmoid function:
$\sigma(z) = \frac{1}{1+e^{-z}}$ \
In backpropagation, we need to compute $\frac{\partial L}{\partial w_j}$, i.e. how the cost $L$ changes with respect to each component $w_j$ of the weights. In other words, we need to know how sensitive the cost function is to each individual weight.
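For a single sigmoid unit with the cross-entropy loss $L = -[y\log\hat{y} + (1-y)\log(1-\hat{y})]$, the chain rule gives $\frac{\partial L}{\partial w_j} = (\hat{y} - y)\,x_j$. The sketch below checks that formula against finite differences. The function names and toy values are illustrative assumptions, not part of the notebook's code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    """Binary cross-entropy for one example."""
    y_hat = sigmoid(w @ x + b)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def analytic_grad(w, b, x, y):
    """dL/dw_j = (y_hat - y) * x_j for sigmoid + cross-entropy."""
    y_hat = sigmoid(w @ x + b)
    return (y_hat - y) * x

def numeric_grad(w, b, x, y, eps=1e-6):
    """Central finite differences, perturbing one weight at a time."""
    grad = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (loss(w + step, b, x, y) - loss(w - step, b, x, y)) / (2 * eps)
    return grad

# Hypothetical toy values, chosen only for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
w = rng.normal(size=5)
b = 0.1
y = 1.0

print(np.allclose(analytic_grad(w, b, x, y), numeric_grad(w, b, x, y), atol=1e-6))
```

The same kind of check can be run against any hand-derived gradient before trusting it in training.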

In [12]:
import numpy as np

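#define activation functions
def sigmoid(A):
    #element-wise logistic function: maps any real value into (0, 1)
    return 1 / (1 + np.exp(-A))

#define loss functions
def cross_entropy(Y, Y_hat):
    #binary cross-entropy, summed over outputs and averaged over samples
    #(pairs with a sigmoid output; a softmax output would use -sum(Y * log(Y_hat)))
    n = Y.shape[0] #number of samples (one row per sample, as in nn.feedforward), not the bias
    Y_hat = np.clip(Y_hat, 1e-12, 1 - 1e-12) #avoid log(0)
    L = -(1./n) * (np.sum(Y * np.log(Y_hat)) + np.sum((1 - Y) * np.log(1 - Y_hat)))
    return L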
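The requirements mention softmax as an alternative to sigmoid for classification. A minimal sketch of a row-wise softmax and the matching categorical cross-entropy, assuming one sample per row and one-hot labels; `softmax`, `categorical_cross_entropy` and the toy arrays are hypothetical and not part of the notebook's code:

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax; Z has shape (n_samples, num_classes)."""
    Z = Z - Z.max(axis=1, keepdims=True)  # shift for numerical stability
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

def categorical_cross_entropy(Y, Y_hat, eps=1e-12):
    """Mean over samples of -sum_k Y_k * log(Y_hat_k); Y is one-hot."""
    n = Y.shape[0]
    return -np.sum(Y * np.log(np.clip(Y_hat, eps, 1.0))) / n

# Hypothetical logits for two samples and three classes.
Z = np.array([[2.0, 1.0, 0.1],
              [0.0, 0.0, 0.0]])
Y = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

P = softmax(Z)
print(P.sum(axis=1))                     # each row sums to 1
print(categorical_cross_entropy(Y, P))
```

Unlike independent sigmoids, the softmax outputs in each row form a probability distribution over the classes.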




In [1]:
#one input layer
#two hidden layers (planned; the class below currently implements one)
#one output layer

#MNIST -- 70000 images of 28x28 pixels and 10 classes
#stochastic gradient descent 

#softmax -- vector of 10 values with probability of belonging to said class
#(the output layer below still uses sigmoid)
class nn:
    def __init__(self, x, y, hidden_size=4, num_classes=10):
        self.input = x #inputs, shape (n_samples, n_features)
        self.y = y #labels, one-hot encoded, shape (n_samples, num_classes)
        self.output = np.zeros(num_classes) #predicted one-hot
        #weights initialization (randomly for now)
        #shape[1] is the number of input features
        self.weights1 = np.random.rand(self.input.shape[1], hidden_size)
        #hidden layer size defaults to 4 (hidden_size)
        self.weights2 = np.random.rand(hidden_size, num_classes)

    def feedforward(self):
        self.layer1 = sigmoid(np.dot(self.input, self.weights1)) #hidden activations
        self.output = sigmoid(np.dot(self.layer1, self.weights2)) #output activations
        self.loss = cross_entropy(self.y, self.output)

    def backpropagation(self):
        #derivative of the loss with respect to weights2 and weights1 (chain rule)
        n = self.input.shape[0] #number of samples (rows)
        #sigmoid output + cross-entropy: dL/dz2 is proportional to (output - y),
        #here averaged over the n samples
        delta2 = (self.output - self.y) / n
        self.weights2_deriv = np.dot(self.layer1.T, delta2)
        #push the error back through weights2 and the hidden sigmoid,
        #whose derivative is layer1 * (1 - layer1)
        delta1 = np.dot(delta2, self.weights2.T) * self.layer1 * (1 - self.layer1)
        self.weights1_deriv = np.dot(self.input.T, delta1)
        
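Gradients alone do not train the network; an update rule has to apply them to the weights. Below is a minimal sketch of two of the optimizers listed under Extensions, plain SGD and SGD with momentum. The function names, the learning rate, the momentum coefficient and the toy objective are all assumptions made for illustration:

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """Plain gradient descent update: w <- w - lr * grad."""
    return w - lr * grad

def momentum_step(w, grad, velocity, lr=0.1, beta=0.9):
    """Momentum update: keep a running velocity and step along it."""
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity

# Hypothetical objective f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w_sgd = np.array([1.0, -2.0])
w_mom = w_sgd.copy()
v = np.zeros_like(w_mom)

for _ in range(200):
    w_sgd = sgd_step(w_sgd, w_sgd)
    w_mom, v = momentum_step(w_mom, w_mom, v)

# Both norms shrink toward zero, the minimizer of the toy objective.
print(np.linalg.norm(w_sgd), np.linalg.norm(w_mom))
```

For the `nn` class, the same rules would be applied to `weights1` and `weights2`, using whatever gradients `backpropagation` ends up computing.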