# Building a Neural Network from Scratch

## Requirements 

- A working fully connected deep neural network built from scratch using only NumPy.
- Includes dense layers, activation functions, optimizers, and loss functions, with a sigmoid or softmax output layer for classification.
- Runtime and results reported on a public dataset.
- Documented code that includes a brief summary, technical details, and results.

## Extensions 

A comparison of the model's runtime and performance with/without each component:
- More than one optimizer: SGD, Momentum, RMSProp, Adam, etc.
- Regularization: L2/weight decay, dropout, and possibly augmentation for image data.
- Results on more than one dataset.
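
The optimizers listed above differ only in how they turn a gradient into a parameter update. A minimal NumPy sketch of the four update rules follows. The function names, the `state` dictionary used to carry running averages, and the default hyperparameters are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """Plain gradient descent: step against the gradient."""
    return w - lr * grad

def momentum_step(w, grad, state, lr=0.1, beta=0.9):
    """SGD with momentum: accumulate a decaying sum of past gradients."""
    v = beta * state.get("v", np.zeros_like(w)) + grad
    state["v"] = v
    return w - lr * v

def rmsprop_step(w, grad, state, lr=0.001, beta=0.9, eps=1e-8):
    """RMSProp: scale the step by a running average of squared gradients."""
    s = beta * state.get("s", np.zeros_like(w)) + (1 - beta) * grad**2
    state["s"] = s
    return w - lr * grad / (np.sqrt(s) + eps)

def adam_step(w, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: momentum plus RMSProp-style scaling, with bias correction."""
    t = state.get("t", 0) + 1
    m = beta1 * state.get("m", np.zeros_like(w)) + (1 - beta1) * grad
    v = beta2 * state.get("v", np.zeros_like(w)) + (1 - beta2) * grad**2
    state.update(t=t, m=m, v=v)
    m_hat = m / (1 - beta1**t)  # correct the bias toward zero in early steps
    v_hat = v / (1 - beta2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

# Toy usage: minimize f(w) = sum(w**2), whose gradient is 2 * w.
w, state = np.array([1.0, -2.0]), {}
for _ in range(100):
    w = momentum_step(w, 2 * w, state, lr=0.05)
print(np.abs(w).max())  # should be close to 0
```

Each parameter array needs its own `state` dictionary, so a network with `w1`, `b1`, `w2`, `b2` would keep four of them.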


### Load the dataset

In [24]:
import numpy as np

from sklearn.datasets import fetch_openml
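#load MNIST dataset as NumPy arrays (as_frame=False avoids a pandas DataFrame);
#the targets are returned as digit strings, e.g. '7'
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X = mnist.data
y = mnist.target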

In [25]:
X = X / 255  # scale pixel values from [0, 255] to [0, 1]
#y = np.where(y=='0', 0, 1)

In [27]:
m = 60000
m_test = X.shape[0] - m

#split into training and test sets: the first m samples are used for training
print(X.shape, y.shape)
X_train, X_test = X[:m], X[m:]
y_train, y_test = np.array(y[:m]), np.array(y[m:])
#y_train, y_test = np.array(y[:m]).reshape(1,m), np.array(y[m:]).reshape(1,m_test)
print(X_train.shape, y_train.shape, X_test.shape,  y_test.shape) 

(70000, 784) (70000,)
(60000, 784) (60000,) (10000, 784) (10000,)


In [28]:
y_test


array(['7', '2', '1', ..., '4', '5', '6'], dtype=object)
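
The targets come back as strings (`'7'`, `'2'`, ...), so they need converting before any arithmetic is done on them. A minimal sketch of one way to do this is shown here; `one_hot` is a hypothetical helper name, and the three sample labels are made up for illustration.

```python
import numpy as np

def one_hot(labels, n_classes=10):
    """Convert digit labels (strings or ints) to an (n_samples, n_classes) one-hot matrix."""
    idx = np.asarray(labels).astype(int)  # '7' -> 7
    return np.eye(n_classes)[idx]         # pick row idx[i] of the identity matrix

labels = np.array(['7', '2', '1'], dtype=object)
Y = one_hot(labels)
print(Y.shape)           # (3, 10)
print(Y.argmax(axis=1))  # [7 2 1]
```

With the labels in this form, a loss function can compare them elementwise against the network's class probabilities.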

In [75]:
def sigmoid(Z):
    """
    Sigmoid activation function.
    """
    return 1/(1+np.exp(-Z))

def der_sigmoid(Z):
    """
    Derivative of sigmoid activation function.
    """
    s = sigmoid(Z)  # evaluate once and reuse
    return s * (1 - s)

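def cross_entropy(Y, Y_hat):
    """
    Binary cross entropy loss function, averaged over the first axis of Y.
    Y holds 0/1 targets and Y_hat holds predicted probabilities of the same shape.
    """
    n = Y.shape[0]
    Y_hat = np.clip(Y_hat, 1e-12, 1 - 1e-12)  # avoid log(0)
    L = -(1/n) * (np.sum(Y * np.log(Y_hat)) + np.sum((1 - Y) * np.log(1 - Y_hat)))
    return L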

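def softmax(x):
    """
    Compute softmax over axis 0: each column of x holds the scores of one sample.
    """
    # subtracting the per-column max improves numerical stability without changing the result
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / e_x.sum(axis=0, keepdims=True)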
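
Before an analytic derivative such as `der_sigmoid` is relied on in backpropagation, it can be compared against a finite-difference estimate. The sketch below redefines the two functions so that it runs on its own; `numerical_derivative` and the step size `h` are assumptions introduced for this check.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def der_sigmoid(Z):
    s = sigmoid(Z)
    return s * (1 - s)

def numerical_derivative(f, Z, h=1e-5):
    """Central finite-difference approximation of f'(Z), applied elementwise."""
    return (f(Z + h) - f(Z - h)) / (2 * h)

Z = np.linspace(-3, 3, 7)
max_err = np.max(np.abs(der_sigmoid(Z) - numerical_derivative(sigmoid, Z)))
print(max_err)  # should be tiny if the analytic derivative is right
```

The same comparison works for any elementwise activation, which makes it a cheap test to run whenever a new one is added.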

In [78]:
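input_size = 784
hidden_size = 64
n_classes = 10

#Initialization: weights and biases
#small random weights keep the sigmoid out of its saturated range;
#each bias matches the output width of its layer
w1 = np.random.randn(input_size, hidden_size) * 0.01
w2 = np.random.randn(hidden_size, n_classes) * 0.01
b1 = np.zeros(hidden_size)
b2 = np.zeros(n_classes)

def forward_propagation(X,y):
    #single forward pass
    X = np.asarray(X, dtype=float)
    #labels arrive as digit strings; convert them to a one-hot matrix of shape (n, 10)
    Y = np.eye(n_classes)[np.asarray(y).astype(int)]
    print(X.shape,w1.shape, b1.shape)
    z1 = np.dot(X,w1) + b1 #the bias is added after the matrix product
    print(z1.shape)
    s1 = sigmoid(z1)
    print(w2.shape,s1.shape,b2.shape)
    z2 = np.dot(s1,w2) + b2
    #softmax normalizes along axis 0, so put the classes on axis 0 and transpose back
    s2 = softmax(z2.T).T #model output: class probabilities, shape (n, 10)
    #pass the probabilities (not the argmax) through the loss function
    L = cross_entropy(Y,s2)

    return L

### Backward 
# get that d_w and d_b for each layer 

In [79]:
loss = forward_propagation(X_train,y_train)

(60000, 784) (784, 64) (64,)
(60000, 64)
(64, 10) (60000, 64) (10,)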
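
For the backward step noted earlier (computing `d_w` and `d_b` for each layer), the following is a minimal sketch for a network with one sigmoid hidden layer and a softmax output. It makes several assumptions that are not in the notebook: samples are stored as rows, the loss is categorical cross-entropy (so the output gradient simplifies to `(probs - Y) / n`), the data is small and synthetic, and all names (`softmax_rows`, `forward`, `backward`, `params`) are hypothetical.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def softmax_rows(Z):
    """Row-wise softmax: each row of Z holds the class scores of one sample."""
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(X, params):
    z1 = X @ params["w1"] + params["b1"]
    s1 = sigmoid(z1)
    z2 = s1 @ params["w2"] + params["b2"]
    return s1, softmax_rows(z2)

def loss_fn(Y, probs):
    """Categorical cross-entropy averaged over samples."""
    return -np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1))

def backward(X, Y, s1, probs, params):
    n = X.shape[0]
    dz2 = (probs - Y) / n                          # gradient w.r.t. output scores
    grads = {"w2": s1.T @ dz2, "b2": dz2.sum(axis=0)}
    dz1 = (dz2 @ params["w2"].T) * s1 * (1 - s1)   # chain rule through the sigmoid
    grads["w1"] = X.T @ dz1
    grads["b1"] = dz1.sum(axis=0)
    return grads

# Synthetic stand-in for the real data: 32 samples, 8 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.random((32, 8))
Y = np.eye(3)[rng.integers(0, 3, size=32)]
params = {"w1": rng.normal(0, 0.1, (8, 5)), "b1": np.zeros(5),
          "w2": rng.normal(0, 0.1, (5, 3)), "b2": np.zeros(3)}

losses = []
for _ in range(200):
    s1, probs = forward(X, params)
    losses.append(loss_fn(Y, probs))
    grads = backward(X, Y, s1, probs, params)
    for k in params:
        params[k] -= 0.5 * grads[k]                # plain gradient descent step

print(losses[0], losses[-1])  # the loss should go down
```

For MNIST the same functions would apply with 784 input features, 64 hidden units, and 10 classes, most likely on mini-batches rather than on all 60,000 samples at once.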