# Building a Neural Network from Scratch

## Requirements 

- A working fully-connected deep neural network from scratch using only numpy.
- Includes dense layers, activation functions, optimizers, loss functions, and a sigmoid or softmax output for classification.
- Runtime and results on a public dataset.
- Documented code that includes a brief summary, technical details, and results.

## Extensions 

A comparison of the model's runtime and performance with/without each component:
- More than one optimizer: SGD, Momentum, RMSProp, Adam, etc.
- Regularization: L2/weight decay, dropout, and possibly data augmentation for image data.
- Results on more than one dataset.

In [None]:
from sklearn.datasets import fetch_openml

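# Load the MNIST dataset: 70,000 flattened 28x28 images (784 features each).
# as_frame=False returns NumPy arrays instead of pandas objects.
mnist = fetch_openml('mnist_784', as_frame=False)
x = mnist.data    # shape (70000, 784), pixel values in 0..255
y = mnist.target  # shape (70000,), digit labels stored as strings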
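
Before training, the raw pixels and labels usually need some preparation. The following is a minimal sketch, not the notebook's own method: the helper name `preprocess` and the tiny demo arrays are hypothetical, and the (features, samples) layout is an assumption chosen so that the batch size sits on axis 1.

```python
import numpy as np

def preprocess(x, y, num_classes=10):
    """Scale pixels to [0, 1] and one-hot encode the labels.

    x: array-like of shape (n_samples, n_features), values in 0..255
    y: array-like of n_samples labels (strings or integers)
    Returns X with shape (n_features, n_samples) and
    Y with shape (num_classes, n_samples).
    """
    X = np.asarray(x, dtype=np.float64).T / 255.0
    labels = np.asarray(y).astype(int)
    Y = np.zeros((num_classes, labels.size))
    Y[labels, np.arange(labels.size)] = 1.0  # one 1 per column
    return X, Y

# Tiny stand-in for MNIST: 3 samples, 2 "pixels", 3 classes (illustrative only)
x_demo = np.array([[0, 255], [128, 64], [255, 0]])
y_demo = np.array(['2', '0', '1'])
X_demo, Y_demo = preprocess(x_demo, y_demo, num_classes=3)
```

For the real data, the call would be `preprocess(x, y)` with the arrays loaded above.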


Forward pass on a single example:
$\hat{y}=\sigma(z)$, where $z = w^Tx+b$ \
Sigmoid function:
$\sigma(z) = \frac{1}{1+e^{-z}}$ \
In backpropagation, we need to compute $\frac{\partial L}{\partial w_j}$, that is, how sensitive the cost $L$ is to each component $w_j$ of the weights.
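
As a worked sketch of this derivative, assume $L$ is the binary cross-entropy for a single example with label $y$ (an assumption, since the text above does not define $L$) and write $z = w^Tx+b$, $\hat{y}=\sigma(z)$. The chain rule then gives:

$$
L = -\left[\,y\log\hat{y} + (1-y)\log(1-\hat{y})\,\right]
$$

$$
\frac{\partial L}{\partial w_j}
= \frac{\partial L}{\partial \hat{y}}\cdot\frac{\partial \hat{y}}{\partial z}\cdot\frac{\partial z}{\partial w_j}
= \frac{\hat{y}-y}{\hat{y}(1-\hat{y})}\cdot\hat{y}(1-\hat{y})\cdot x_j
= (\hat{y}-y)\,x_j
$$

The same steps with $\frac{\partial z}{\partial b}=1$ give $\frac{\partial L}{\partial b}=\hat{y}-y$.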

In [12]:
import numpy as np

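# Sigmoid activation function, applied element-wise
def sigmoid(a):
    return 1 / (1 + np.exp(-a))

# Binary cross-entropy, averaged over the batch.
# Y and Y_hat have shape (n_outputs, batch_size); Y_hat must lie strictly in (0, 1).
def cross_entropy(Y, Y_hat):
    b = Y.shape[1]  # batch size
    L = -(1. / b) * (np.sum(Y * np.log(Y_hat)) + np.sum((1 - Y) * np.log(1 - Y_hat)))
    return L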
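
A quick way to gain confidence in these two functions is to check them numerically. The sketch below is an illustration under stated assumptions: the data is random, the names `loss_at`, `analytic` and `numeric` are hypothetical, and the analytic gradient uses the standard result for a sigmoid output with binary cross-entropy, $\frac{\partial L}{\partial w} = \frac{1}{b}X(\hat{Y}-Y)^T$.

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def cross_entropy(Y, Y_hat):
    b = Y.shape[1]
    return -(1. / b) * (np.sum(Y * np.log(Y_hat)) + np.sum((1 - Y) * np.log(1 - Y_hat)))

# Sanity check: predicting 0.5 for every example should cost ln(2)
Y = np.array([[1., 0., 1., 0.]])
baseline = cross_entropy(Y, np.full_like(Y, 0.5))

# Gradient check for a single sigmoid unit on random data
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))   # 3 features, 4 examples
w = rng.normal(size=(3, 1))
bias = 0.1

def loss_at(w_vec):
    return cross_entropy(Y, sigmoid(w_vec.T @ X + bias))

Y_hat = sigmoid(w.T @ X + bias)             # shape (1, 4)
analytic = X @ (Y_hat - Y).T / Y.shape[1]   # shape (3, 1)

# Central finite differences, one weight at a time
eps = 1e-6
numeric = np.zeros_like(w)
for j in range(w.shape[0]):
    step = np.zeros_like(w)
    step[j] = eps
    numeric[j] = (loss_at(w + step) - loss_at(w - step)) / (2 * eps)

max_abs_diff = np.max(np.abs(analytic - numeric))
```

If the analytic and numeric gradients disagree by more than a small tolerance, the backward pass is the first place to look for a bug.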

In [None]:

# Network architecture: one input layer, two hidden layers, one output layer
# Dataset: MNIST, 70,000 images of 28x28 pixels in 10 classes
# Optimizer: stochastic gradient descent
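
The plan above could be sketched as follows. This is one possible layout, not a definitive implementation: the hidden sizes (64 and 32), the learning rate, the initialization scale, the softmax output with categorical cross-entropy, and all function names are assumptions, and random data stands in for MNIST so the block runs without a download.

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def softmax(z):
    z = z - z.max(axis=0, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

def init_params(layer_sizes, rng):
    # One (W, b) pair per layer; W has shape (n_out, n_in)
    return [(rng.normal(size=(n_out, n_in)) * np.sqrt(1.0 / n_in),
             np.zeros((n_out, 1)))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(X, params):
    # Returns [X, A1, A2, ..., output]; sigmoid hidden layers, softmax output
    activations = [X]
    for i, (W, b) in enumerate(params):
        Z = W @ activations[-1] + b
        is_last = i == len(params) - 1
        activations.append(softmax(Z) if is_last else sigmoid(Z))
    return activations

def categorical_cross_entropy(Y, Y_hat):
    return -np.sum(Y * np.log(Y_hat)) / Y.shape[1]

def sgd_step(X, Y, params, lr):
    # One gradient descent update on the mini-batch (X, Y)
    acts = forward(X, params)
    m = X.shape[1]
    dZ = acts[-1] - Y  # gradient for softmax + categorical cross-entropy
    updated = []
    for i in reversed(range(len(params))):
        W, b = params[i]
        dW = dZ @ acts[i].T / m
        db = dZ.sum(axis=1, keepdims=True) / m
        if i > 0:
            # Backpropagate through the sigmoid of the previous layer
            dZ = (W.T @ dZ) * acts[i] * (1 - acts[i])
        updated.append((W - lr * dW, b - lr * db))
    return updated[::-1]

rng = np.random.default_rng(0)
params = init_params([784, 64, 32, 10], rng)  # input, two hidden, output

# Random stand-in for one mini-batch of 16 MNIST images
X_batch = rng.random((784, 16))
Y_batch = np.eye(10)[:, rng.integers(0, 10, size=16)]

loss_before = categorical_cross_entropy(Y_batch, forward(X_batch, params)[-1])
for _ in range(20):
    params = sgd_step(X_batch, Y_batch, params, lr=0.1)
loss_after = categorical_cross_entropy(Y_batch, forward(X_batch, params)[-1])
```

Here the same batch is reused for every step, which is enough to see the loss fall; actual stochastic gradient descent would draw a fresh mini-batch from the shuffled training set on each step.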

    