## Content:
- [Part 1](#part1)- Importing the libraries, packages
- [Part 2](#part2)- Useful Functions
- [Part 3](#part3) -  One Hidden Layer Class
- [Part 4](#part4) -  Two Hidden Layers Class 
- [Part 5](#part5) -  Loading Fashion MNIST 
- [Part 6](#part6)-  Loading Fashion MNIST
- [Part 7](#part7)-  Fashion MNIST One Hidden Layer
- [Part 8](#part8)) -  Fashion MNIST Two Hidden Layers
- [Part 9](#part9) -  Results 
- [Part 10](#part10) -  --
- [Part 11](#part11) -  --

Weight initialisation :

- https://machinelearningmastery.com/weight-initialization-for-deep-learning-neural-networks/
- https://www.deeplearning.ai/ai-notes/initialization/
- https://datascience-enthusiast.com/DL/Improving-DeepNeural-Networks-Initialization.html

[Back to top](#Content:)


<a id='part1'></a>

### Part 1 -   Importing the libraries, packages

In [787]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import base64
import os
import io
import requests
import random 

from sklearn import preprocessing
from sklearn import metrics
from sklearn.model_selection import train_test_split

from scipy.special import expit as activation_function
from scipy.stats import truncnorm

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras import datasets

[Back to top](#Content:)


<a id='part2'></a>

### Part 2 -   Useful Functions

In [788]:
rng = np.random.default_rng() 

In [789]:
def truncated_normal(mean=0, sd=1, low=0, upp=10):
    return truncnorm(
        (low - mean) / sd, (upp - mean) / sd, loc=mean, scale=sd)

def softmax(X):
    e = np.exp(X - np.max(X))
    return e / e.sum(axis=0, keepdims=True)


def cross_entropy(target, output):
    return -np.mean(target*np.log(output))

def cross_entropy_matrix(output, target):
    target = np.array(target)
    output = np.array(output)
    product = target*np.log(output)
    errors = -np.sum(product, axis=1)
    m = len(errors)
    errors = np.sum(errors) / m
    return errors

def sigmoid(x):
    return 1/(1+np.exp(-x))

def ds(x):
    return sigmoid(x)*(1-sigmoid(x))

def relu(x):
    return np.maximum(x,0)
  

def dr(x):
    dr = (np.sign(x) + 1) / 2
    return dr

def tanh(x):
    a = np.exp(x)
    b = np.exp(-x)
    return (a-b)/(a+b)

def dt(x):
    return 1-tanh(x)**2
    
def leaky(x,a):
    leaky = np.maximum(x,0)*x + a*np.minimum(x,0)
    return leaky

def dl(x,a):
    dl = (np.sign(x)+1)/2 - a*(np.sign(x)-1)/2
    return dl

def derivative(f):
    if f == sigmoid :
        return ds
    if f == tanh :
        return dt
    if f == relu :
        return dr
    if f == leaky :
        return dl
    return None

def y2indicator(y, K):
    N = len(y)
    ind = np.zeros((N,K))
    for i in range(N):
        ind[i][y[i]]=1
    return ind

def classification_rate(Y, P):
    return np.mean(Y==P)

[Back to top](#Content:)


<a id='part3'></a>

### Part 3 -   One Hidden Layer Class

# One Hidden Layer

# Variables :

- **X**     : N_Samples x N_features
- **W1**    : Hidden x N_features
- **b1**    : Hidden
- **W2**    : Output x Hidden
- **b2**    : Output

In [790]:
class HiddenOne:
     
    def __init__(self, 
                 input_nodes, 
                 output_nodes, 
                 hidden_nodes,
                 activation_hidden,
                 learning_rate=0.01,
                 optimizer = None,
                 beta1 = 0.9,   #ADAM optimization parameter, default value taken from practical experience
                 beta2 = 0.999, #ADAM optimization parameter, default value taken from practical experience
                 batch_size = None,
                 delta_stop = None,
                 patience = 1,
                 leaky_intercept=0.01
                ):         
        # Initializations
        self.input_nodes = input_nodes
        self.output_nodes = output_nodes       
        self.hidden_nodes = hidden_nodes          
        self.learning_rate = learning_rate 
        self.activation_hidden = activation_hidden
        self.hidden_derivative = derivative(self.activation_hidden)
        self.beta1 = beta1
        self.beta2 = beta2
        self.optimizer = optimizer
        self.batch_size = batch_size
        self.delta_stop = delta_stop
        self.patience = patience
        self.leaky_intercept = leaky_intercept
        self.create_weight_matrices()
        self.create_biases()
        self.reset_adam()
             
    def create_weight_matrices(self):       
        if self.activation_hidden == relu : # He initialization
            self.W1 = np.random.randn(self.hidden_nodes, self.input_nodes )/np.sqrt(self.input_nodes/2 ) # hidden x features
            self.W2 = np.random.randn(self.output_nodes, self.hidden_nodes )/np.sqrt(self.hidden_nodes/2 )  # output x hidden
        else : # Xavier initialization
            self.W1 = np.random.randn(self.hidden_nodes, self.input_nodes )/np.sqrt(self.input_nodes ) # hidden x features
            self.W2 = np.random.randn(self.output_nodes, self.hidden_nodes )/np.sqrt(self.hidden_nodes )  # output x hidden
            
            
    
        
    
    def create_biases(self):    
        #tn = truncated_normal(mean=2, sd=1, low=-0.5, upp=0.5)
        #self.b1 = tn.rvs(self.hidden_nodes).reshape(-1,1) 
        #self.b2 = tn.rvs(self.output_nodes).reshape(-1,1) 
        self.b1 =  np.zeros((self.hidden_nodes, 1 ))
        self.b2 = np.zeros((self.output_nodes, 1 ))
          
    def reset_adam(self):
        '''
        Creates Adam optimizations variables
        '''
        self.Vdw1 = np.zeros((self.hidden_nodes, self.input_nodes ))
        self.Vdw2 = np.zeros((self.output_nodes, self.hidden_nodes ))
        self.Vdb1 = np.zeros((self.hidden_nodes, 1 ))
        self.Vdb2 = np.zeros((self.output_nodes, 1 ))
        self.Sdw1 = np.zeros((self.hidden_nodes, self.input_nodes ))
        self.Sdw2 = np.zeros((self.output_nodes, self.hidden_nodes ))
        self.Sdb1 = np.zeros((self.hidden_nodes, 1 ))
        self.Sdb2 = np.zeros((self.output_nodes, 1 ))
        
        
    def forward(self, X):
        Z1 = self.W1.dot(X.T) + self.b1 # Hidden x N_samples
        A1 = self.activation_hidden(Z1)      # Hidden x N_samples
        Z2 = self.W2.dot(A1) + self.b2  # Output x N_samples
        A2 = softmax(Z2)      #Output x N_samples
        return A2, Z2, A1, Z1
    
    
    def backprop(self, X, target):
        # Forward prop
        A2, Z2, A1, Z1 = self.forward(X)
        # Compute cost
        cost = cross_entropy(target, A2)
        # N samples
        m = X.shape[0]
        # deltas
        dZ2 = A2 - target                                       #Output x N_samples
        dW2 = dZ2.dot(A1.T)/m                                   #Output x hidden
        db2 = np.sum(dZ2, axis=1, keepdims=True)/m              #Output x 1
        dZ1 = self.W2.T.dot(dZ2)*self.hidden_derivative(Z1)     # Hidden x N_samples
        dW1 = dZ1.dot(X)/m                                      # Hidden x N_Features
        db1 = np.sum(dZ1, axis=1, keepdims=True)/m              # Hidden x 1
        # Update
        lr = self.learning_rate
        self.W2 -= lr*dW2
        self.b2 -= lr*db2
        self.W1 -= lr*dW1
        self.b1 -= lr*db1
        return cost
        
    def backpropSGD(self, X, target):
        m = X.shape[0]                  #N_samples
        X_SGD = X.copy()
        u = rng.shuffle(np.arange(m))
        X_SGD = X_SGD[u,:].squeeze()    # N_samples x N_Features
        target_SGD = target[:,u].squeeze() # Output x N_samples
        cost = 0
        for i in range(m) :
            # Forward prop
            x = X_SGD[i,:].reshape(1,-1)                   # 1 x N_features
            a2, z2, a1, z1 = self.forward(x)
            # cost update
            cost = cost + cross_entropy(target_SGD[:,i].reshape(-1,1), a2)/m
            # deltas
            dz2 = a2 - target[:,i].reshape(-1,1)                    #Output x 1
            dW2 = dz2.dot(a1.T)                                     #Output x hidden
            db2 = dz2                                               #Output x 1
            dz1 = self.W2.T.dot(dz2)*self.hidden_derivative(z1)     # Hidden x 1
            dW1 = dz1.dot(x)                                        # Hidden x N_Features
            db1 = dz1                                               # Hidden x 1
            # Update
            lr = self.learning_rate
            self.W2 -= lr*dW2
            self.b2 -= lr*db2
            self.W1 -= lr*dW1
            self.b1 -= lr*db1
        return cost
        
    def backprop_minibatch(self, X, target):
        n = X.shape[1]               # N_features
        batch_size = X.shape[0]      # N_samples
        if self.batch_size == None :
            batch_size = self.minibatch_size(batch_size)
        else :
            batch_size = self.batch_size
            
        X_SGD = X.copy()
        u = rng.shuffle(np.arange(X.shape[0] ))
        X_SGD = X_SGD[u,:].squeeze()    # N_samples x N_Features
        target_SGD = target[:,u].squeeze() # Output x N_samples
        cost = 0
        
        pass_length = int(X.shape[0]/batch_size)
        for i in range(pass_length) :
            k = i*batch_size
            # Forward prop
            X = X_SGD[k:k+batch_size,:].reshape(batch_size,-1)              #batch_size x N_features
            A2, Z2, A1, Z1 = self.forward(X)
            # cost update
            cost = cost + cross_entropy(target_SGD[:,k:k+batch_size].reshape(-1,batch_size), A2)/pass_length
            # deltas
            dZ2 = A2 - target_SGD[:,k:k+batch_size].reshape(-1,batch_size)   #Output x batch_size
            dW2 = dZ2.dot(A1.T)/batch_size                                   #Output x hidden
            db2 = np.sum(dZ2, axis=1, keepdims=True)/batch_size              #Output x 1
            dZ1 = self.W2.T.dot(dZ2)*self.hidden_derivative(Z1)              # Hidden x batch_size
            dW1 = dZ1.dot(X)/batch_size                                      # Hidden x N_Features
            db1 = np.sum(dZ1, axis=1, keepdims=True)/batch_size              #Hidden x1                                            # Hidden x 1
            # Update
            lr = self.learning_rate
            self.W2 -= lr*dW2
            self.b2 -= lr*db2
            self.W1 -= lr*dW1
            self.b1 -= lr*db1
        return cost
    
    def backprop_adam_minibatch(self, X, target):
        n = X.shape[1]               # N_features
        batch_size = X.shape[0]      # N_samples
        if self.batch_size == None :
            batch_size = self.minibatch_size(batch_size)
        else :
            batch_size = self.batch_size
            
        X_SGD = X.copy()
        u = rng.shuffle(np.arange(X.shape[0] ))
        X_SGD = X_SGD[u,:].squeeze()    # N_samples x N_Features
        target_SGD = target[:,u].squeeze() # Output x N_samples
        cost = 0
        
        pass_length = int(X.shape[0]/batch_size)
        for i in range(pass_length) :
            k = i*batch_size
            X = X_SGD[k:k+batch_size,:].reshape(batch_size,-1)  
            t = target_SGD[:,k:k+batch_size].reshape(-1,batch_size)
            cost = cost + self.backpropADAM(X, t)/pass_length
        return cost
        
    
    def backpropADAM(self, X, target):
        # Forward prop
        A2, Z2, A1, Z1 = self.forward(X)
        # Compute cost
        cost = cross_entropy(target, A2)
        # N samples
        m = X.shape[0]
        # deltas
        dZ2 = A2 - target                                       #Output x N_samples
        dW2 = dZ2.dot(A1.T)/m                                   #Output x hidden
        db2 = np.sum(dZ2, axis=1, keepdims=True)/m              #Output x 1
        dZ1 = self.W2.T.dot(dZ2)*self.hidden_derivative(Z1)     # Hidden x N_samples
        dW1 = dZ1.dot(X)/m                                      # Hidden x N_Features
        db1 = np.sum(dZ1, axis=1, keepdims=True)/m              # Hidden x 1
        # Adam updates
        beta1 = self.beta1
        beta2 = self.beta2
        # V
        self.Vdw1 = beta1*self.Vdw1 + (1-beta1)*dW1
        self.Vdw2 = beta1*self.Vdw2 + (1-beta1)*dW2
        self.Vdb1 = beta1*self.Vdb1 + (1-beta1)*db1
        self.Vdb2 = beta1*self.Vdb2 + (1-beta1)*db2
        # S
        self.Sdw1 = beta2*self.Sdw1 + (1-beta2)*dW1**2
        self.Sdw2 = beta2*self.Sdw2 + (1-beta2)*dW2**2
        self.Sdb1 = beta2*self.Sdb1 + (1-beta2)*db1**2
        self.Sdb2 = beta2*self.Sdb2 + (1-beta2)*db2**2    
        # Update
        lr = self.learning_rate
        self.W2 -= lr * self.Vdw2 / (np.sqrt(self.Sdw2)+1e-8)
        self.b2 -= lr * self.Vdb2 / (np.sqrt(self.Sdb2)+1e-8)
        self.W1 -= lr * self.Vdw1 / (np.sqrt(self.Sdw1)+1e-8)
        self.b1 -= lr * self.Vdb1 / (np.sqrt(self.Sdb1)+1e-8)
        return cost  
    
    def predict(self, X_predict):
        A2, Z2, A1, Z1 = self.forward(X_predict)
        return A2
    
    def predict_class(self, X_predict):
        A2, Z2, A1, Z1 = self.forward(X_predict)
        y_pred = np.argmax(A2, axis=0)
        return y_pred
                   
    def run(self, X_train, target, epochs=10):
        costs = [1e-10]
        if self.delta_stop == None : 
            if self.optimizer == 'adam':
                self.reset_adam()
                for i in range(epochs):
                    cost = self.backpropADAM(X_train, target)
                    costs.append(cost)
                    if i%10 == 0 and i>0 :
                        print(f'Loss after epoch {i} : {cost}')
                print(f'Loss after 1epoch {len(costs)} : {costs[-1]}')        
                costs.pop(0)
                return costs
             
            elif self.optimizer == 'mini_adam':
                self.reset_adam()
                for i in range(epochs):
                    cost = self.backprop_adam_minibatch(X_train, target)
                    costs.append(cost)
                    if i%10 == 0 and i>0 :
                        print(f'Loss after epoch {i} : {cost}')
                print(f'Loss after epoch {len(costs)} : {costs[-1]}')        
                costs.pop(0)
                return costs
            elif self.optimizer == 'SGD' :
                for i in range(epochs):
                    cost = self.backpropSGD(X_train, target)
                    costs.append(cost)
                    if i%10 == 0 and i>0 :
                        print(f'Loss after epoch {i} : {cost}')
                print(f'Loss after epoch {len(costs)} : {costs[-1]}')        
                costs.pop(0)
                return costs
                
            elif self.optimizer == 'minibatch' :
                for i in range(epochs):
                    cost = self.backprop_minibatch(X_train, target)
                    costs.append(cost)
                    if i%10 == 0 and i>0 :
                        print(f'Loss after epoch {i} : {cost}')
                print(f'Loss after epoch {len(costs)} : {costs[-1]}')        
                costs.pop(0)
                return costs
                
            else :
                for i in range(epochs):  
                    cost = self.backprop(X_train, target)
                    costs.append(cost)
                    if i%10 == 0 and i>0 :
                        print(f'Loss after epoch {i} : {cost}')
                print(f'Loss after epoch {len(costs)} : {costs[-1]}')        
                costs.pop(0)
                return costs
                
            
        else :
            counter = 0
            if self.optimizer == 'adam':
                self.reset_adam()
                for i in range(epochs):
                    cost = self.backpropADAM(X_train, target)
                    costs.append(cost)
                    n = len(costs)
                    delta = np.abs(costs[n-1]/costs[n-2]-1)
                    if(delta < self.delta_stop) :
                        counter+=1
                        if(counter>=self.patience):
                            print(f'Early stop at epoch {i}, the cost is : {cost}')
                            costs.pop(0)
                            return costs
                    else :
                        counter =0
                    if i%10 == 0 and i>0 :
                        print(f'Loss after epoch {i} : {cost}')
                print(f'Loss after epoch {len(costs)} : {costs[-1]}')        
                costs.pop(0)
                return costs
            elif self.optimizer == 'mini_adam':
                self.reset_adam()
                for i in range(epochs):
                    cost = self.backprop_adam_minibatch(X_train, target)
                    costs.append(cost)
                    n = len(costs)
                    delta = np.abs(costs[n-1]/costs[n-2]-1)
                    if(delta < self.delta_stop) :
                        counter+=1
                        if(counter>=self.patience):
                            print(f'Early stop at epoch {i}, the cost is : {cost}')
                            costs.pop(0)
                            return costs
                    else :
                        counter =0
                    if i%10 == 0 and i>0 :
                        print(f'Loss after epoch {i} : {cost}')
                print(f'Loss after epoch {len(costs)} : {costs[-1]}')        
                costs.pop(0)
                return costs     
            elif self.optimizer == 'SGD' :
                for i in range(epochs):
                    cost = self.backpropSGD(X_train, target)
                    costs.append(cost)
                    n = len(costs)
                    delta = np.abs(costs[n-1]/costs[n-2]-1)
                    if(delta < self.delta_stop) :
                        counter+=1
                        if(counter>=self.patience):
                            print(f'Early stop at epoch {i}, the cost is : {cost}')
                            costs.pop(0)
                            return costs
                    else :
                        counter =0
                    if i%10 == 0 and i>0 :
                        print(f'Loss after epoch {i} : {cost}')
                print(f'Loss after epoch {len(costs)} : {costs[-1]}')        
                costs.pop(0)
                return costs
                
            elif self.optimizer == 'minibatch' :
                for i in range(epochs):
                    cost = self.backprop_minibatch(X_train, target)
                    costs.append(cost)
                    n = len(costs)
                    delta = np.abs(costs[n-1]/costs[n-2]-1)
                    if(delta < self.delta_stop) :
                        counter+=1
                        if(counter>=self.patience):
                            print(f'Early stop at epoch {i}, the cost is : {cost}')
                            costs.pop(0)
                            return costs
                    else :
                        counter =0
                    if i%10 == 0 and i>0 :
                        print(f'Loss after epoch {i} : {cost}')
                print(f'Loss after epoch {len(costs)} : {costs[-1]}')        
                costs.pop(0)
                return costs
                
            else :  
                for i in range(epochs): 
                    cost = self.backprop(X_train, target)
                    costs.append(cost)
                    n = len(costs)
                    delta = np.abs(costs[n-1]/costs[n-2]-1)
                    if(delta < self.delta_stop) :
                        counter+=1
                        if(counter>=self.patience):
                            print(f'Early stop at epoch {i}, the cost is : {cost}')
                            costs.pop(0)
                            return costs
                        else :
                            counter =0
                    if i%10 == 0 and i>0 :
                        print(f'Loss after epoch {i} : {cost}')        
                print(f'Loss after epoch {len(costs)} : {costs[-1]}')        
                costs.pop(0)
                return costs
          
            
        
       
    def evaluate(self, X_evaluate, target):
        '''
        return accuracy score, target must be the classes and not the hot encoded target
        '''
        
        y_pred = self.predict_class(X_evaluate)
        accuracy = classification_rate(y_pred, target)
        print('Accuracy :', accuracy)
        return accuracy
        
       
    def minibatch_size(self, n_samples):
        '''
        Compute minibatch size in case its not provided
        '''
        if n_samples < 2000:
            return n_samples
        if n_samples < 12800:
            return 64
        if n_samples < 25600:
            return 128
        if n_samples < 51200:
            return 256
        if n_samples < 102400:
            return 512
        return 1024
    
        
        
            

[Back to top](#Content:)


<a id='part4'></a>

### Part 4 -   Two Hidden Layers Class

# Two Hidden Layers

# Variables :

- **X**     : N_Samples x N_features
- **W1**    : Hidden1 x N_features
- **b1**    : Hidden1
- **W2**    : Hidden2 x Hidden1
- **b2**    : Hidden2
- **W3**    : Output x Hidden
- **b3**    : Output

In [760]:
class HiddenTwo:
     
    def __init__(self, 
                 input_nodes, 
                 output_nodes, 
                 hidden_nodes_1,
                 hidden_nodes_2,
                 activation_hidden_1,
                 activation_hidden_2,
                 learning_rate=0.01,
                 optimizer = None,
                 beta1 = 0.9,   #ADAM optimization parameter, default value taken from practical experience
                 beta2 = 0.999, #ADAM optimization parameter, default value taken from practical experience
                 batch_size = None,
                 delta_stop = None,
                 patience = 1,
                 leaky_intercept=0.01
                 
                ):         
        # Initializations
        self.input_nodes = input_nodes
        self.output_nodes = output_nodes       
        self.hidden_nodes_1 = hidden_nodes_1    
        self.hidden_nodes_2 = hidden_nodes_2    
        self.learning_rate = learning_rate 
        self.activation_hidden_1 = activation_hidden_1
        self.activation_hidden_2 = activation_hidden_2
        self.hidden_derivative_1 = derivative(self.activation_hidden_1)
        self.hidden_derivative_2 = derivative(self.activation_hidden_2)
        self.beta1 = beta1
        self.beta2 = beta2
        self.optimizer = optimizer
        self.batch_size = batch_size
        self.delta_stop = delta_stop
        self.patience = patience
        self.leaky_intercept = leaky_intercept
        self.create_weight_matrices()
        self.create_biases()
        self.reset_adam()
             
    def create_weight_matrices(self):
        if self.activation_hidden_1 == relu : # He initialization
            self.W1 = np.random.randn(self.hidden_nodes_1, self.input_nodes )/np.sqrt(self.input_nodes/2 ) # hidden1 x features
            self.W2 = np.random.randn(self.hidden_nodes_2, self.hidden_nodes_1 )/np.sqrt(self.hidden_nodes_1/2 )  # hidden2 x hidden1
            self.W3 = np.random.randn(self.output_nodes, self.hidden_nodes_2 )/np.sqrt(self.hidden_nodes_2/2 )  # output x hidden2
        else : # Xavier initialization
            self.W1 = np.random.randn(self.hidden_nodes_1, self.input_nodes )/np.sqrt(self.input_nodes ) # hidden1 x features
            self.W2 = np.random.randn(self.hidden_nodes_2, self.hidden_nodes_1 )/np.sqrt(self.hidden_nodes_1)  # hidden2 x hidden1
            self.W3 = np.random.randn(self.output_nodes, self.hidden_nodes_2 )/np.sqrt(self.hidden_nodes_2)  # output x hidden2
        
    def create_biases(self):  
        self.b1 =  np.zeros((self.hidden_nodes_1, 1 ))
        self.b2 = np.zeros((self.hidden_nodes_2, 1 ))
        self.b3 = np.zeros((self.output_nodes, 1 ))
     
    def reset_adam(self):
        '''
        Creates Adam optimizations variables
        '''
        self.Vdw1 = np.zeros((self.hidden_nodes_1, self.input_nodes ))
        self.Vdw2 = np.zeros((self.hidden_nodes_2, self.hidden_nodes_1 ))
        self.Vdw3 = np.zeros((self.output_nodes, self.hidden_nodes_2))
       
        self.Vdb1 = np.zeros((self.hidden_nodes_1, 1 ))
        self.Vdb2 = np.zeros((self.hidden_nodes_2, 1 ))
        self.Vdb3 = np.zeros((self.output_nodes, 1 ))
        
        self.Sdw1 = np.zeros((self.hidden_nodes_1, self.input_nodes ))
        self.Sdw2 = np.zeros((self.hidden_nodes_2, self.hidden_nodes_1 ))
        self.Sdw3 = np.zeros((self.output_nodes, self.hidden_nodes_2))
       
        self.Sdb1 = np.zeros((self.hidden_nodes_1, 1 ))
        self.Sdb2 = np.zeros((self.hidden_nodes_2, 1 ))
        self.Sdb3 = np.zeros((self.output_nodes, 1 ))
                
    def forward(self, X):
        Z1 = self.W1.dot(X.T) + self.b1      # Hidden1 x N_samples
        A1 = self.activation_hidden_1(Z1)      # Hidden1 x N_samples
        Z2 = self.W2.dot(A1) + self.b2      # Hidden2 x N_samples
        A2 = self.activation_hidden_2(Z2)      # Hidden2 x N_samples
        Z3 = self.W3.dot(A2) + self.b3       # Output x N_samples
        A3 = softmax(Z3)                     #Output x N_samples
        return A3, Z3, A2, Z2, A1, Z1
    
    def backprop(self, X, target):
        # Forward prop
        A3, Z3, A2, Z2, A1, Z1 = self.forward(X)
        # Compute cost
        cost = cross_entropy(target, A3)
        # N_samples
        m = X.shape[0]
        # deltas
        dZ3 = A3 - target                                      #Output x N_samples
        dW3 = dZ3.dot(A2.T)/m                                  #Output x Hidden_2
        db3 = np.sum(dZ3, axis=1, keepdims=True)/m             #Output x 1
        dZ2 = self.W3.T.dot(dZ3)*self.hidden_derivative_2(Z2)    # Hidden2 x N_samples
        dW2 = dZ2.dot(A1.T)/m                                     # Hidden2 x Hidden1 
        db2 = np.sum(dZ2, axis=1, keepdims=True)/m             # Hidden2 x 1
        dZ1 = self.W2.T.dot(dZ2)*self.hidden_derivative_1(Z1)     # Hidden x N_samples
        dW1 = dZ1.dot(X)/m                                      # Hidden x N_Features
        db1 = np.sum(dZ1, axis=1, keepdims=True)/m              # Hidden x 1
     
        # Update
        lr = self.learning_rate
        self.W3 -= lr*dW3
        self.b3 -= lr*db3
        self.W2 -= lr*dW2
        self.b2 -= lr*db2
        self.W1 -= lr*dW1
        self.b1 -= lr*db1
        
        return cost
        
    
    def backprop_minibatch(self, X, target):
        n = X.shape[1]               # N_features
        batch_size = X.shape[0]      # N_samples
        if self.batch_size == None :
            batch_size = self.minibatch_size(batch_size)
        else :
            batch_size = self.batch_size
            
        X_SGD = X.copy()
        u = rng.shuffle(np.arange(X.shape[0] ))
        X_SGD = X_SGD[u,:].squeeze()    # N_samples x N_Features
        target_SGD = target[:,u].squeeze() # Output x N_samples
        cost = 0
        
        pass_length = int(X.shape[0]/batch_size)
        for i in range(pass_length) :
            k = i*batch_size
            # Forward prop
            X = X_SGD[k:k+batch_size,:].reshape(batch_size,-1)              #batch_size x N_features
            A3, Z3, A2, Z2, A1, Z1 = self.forward(X)
            # cost update
            cost = cost + cross_entropy(target_SGD[:,k:k+batch_size].reshape(-1,batch_size), A3)/pass_length
            # deltas
            dZ3 = A3 - target_SGD[:,k:k+batch_size].reshape(-1,batch_size)   #Output x batch_size
            dW3 = dZ3.dot(A2.T)/batch_size                                   #Output x hidden_2
            db3 = np.sum(dZ3, axis=1, keepdims=True)/batch_size              #Output x 1
            dZ2 = self.W3.T.dot(dZ3)*self.hidden_derivative_2(Z2)            # Hidden2 x batch_size
            dW2 = dZ2.dot(A1.T)/batch_size                                   # Hidden2 x Hidden1 
            db2 = np.sum(dZ2, axis=1, keepdims=True)/batch_size              # Hidden2 x 1
            dZ1 = self.W2.T.dot(dZ2)*self.hidden_derivative_1(Z1)            # Hidden x batch_size
            dW1 = dZ1.dot(X)/batch_size                                      # Hidden x N_Features
            db1 = np.sum(dZ1, axis=1, keepdims=True)/batch_size              # Hidden x 1                        
            # Update
            lr = self.learning_rate
            self.W3 -= lr*dW3
            self.b3 -= lr*db3
            self.W2 -= lr*dW2
            self.b2 -= lr*db2
            self.W1 -= lr*dW1
            self.b1 -= lr*db1
        return cost
    
    def backpropADAM(self, X, target):
        # Forward prop
        A3, Z3, A2, Z2, A1, Z1 = self.forward(X)
        # Compute cost
        cost = cross_entropy(target, A3)
        # N samples
        m = X.shape[0]   
        # deltas
        dZ3 = A3 - target                                      #Output x N_samples
        dW3 = dZ3.dot(A2.T)/m                                  #Output x Hidden_2
        db3 = np.sum(dZ3, axis=1, keepdims=True)/m             #Output x 1
        dZ2 = self.W3.T.dot(dZ3)*self.hidden_derivative_2(Z2)    # Hidden2 x N_samples
        dW2 = dZ2.dot(A1.T)/m                                     # Hidden2 x Hidden1 
        db2 = np.sum(dZ2, axis=1, keepdims=True)/m             # Hidden2 x 1
        dZ1 = self.W2.T.dot(dZ2)*self.hidden_derivative_1(Z1)     # Hidden x N_samples
        dW1 = dZ1.dot(X)/m                                      # Hidden x N_Features
        db1 = np.sum(dZ1, axis=1, keepdims=True)/m              # Hidden x 1
        # Adam updates
        beta1 = self.beta1
        beta2 = self.beta2
        # V
        self.Vdw1 = beta1*self.Vdw1 + (1-beta1)*dW1
        self.Vdw2 = beta1*self.Vdw2 + (1-beta1)*dW2
        self.Vdw3 = beta1*self.Vdw3 + (1-beta1)*dW3
        self.Vdb1 = beta1*self.Vdb1 + (1-beta1)*db1
        self.Vdb2 = beta1*self.Vdb2 + (1-beta1)*db2
        self.Vdb3 = beta1*self.Vdb3 + (1-beta1)*db3
        # S
        self.Sdw1 = beta2*self.Sdw1 + (1-beta2)*dW1**2
        self.Sdw2 = beta2*self.Sdw2 + (1-beta2)*dW2**2
        self.Sdw3 = beta2*self.Sdw3 + (1-beta2)*dW3**2
        self.Sdb1 = beta2*self.Sdb1 + (1-beta2)*db1**2
        self.Sdb2 = beta2*self.Sdb2 + (1-beta2)*db2**2
        self.Sdb3 = beta2*self.Sdb3 + (1-beta2)*db3**2  
        # Update
        lr = self.learning_rate
        self.W3 -= lr * self.Vdw3 / (np.sqrt(self.Sdw3)+1e-8)
        self.b3 -= lr * self.Vdb3 / (np.sqrt(self.Sdb3)+1e-8)
        self.W2 -= lr * self.Vdw2 / (np.sqrt(self.Sdw2)+1e-8)
        self.b2 -= lr * self.Vdb2 / (np.sqrt(self.Sdb2)+1e-8)
        self.W1 -= lr * self.Vdw1 / (np.sqrt(self.Sdw1)+1e-8)
        self.b1 -= lr * self.Vdb1 / (np.sqrt(self.Sdb1)+1e-8)
        return cost  
    
      
    def predict(self, X_predict):
        A3, Z3, A2, Z2, A1, Z1 = self.forward(X_predict)
        return A3
    
    def predict_class(self, X_predict):
        A3, Z3, A2, Z2, A1, Z1 = self.forward(X_predict)
        y_pred = np.argmax(A3, axis=0)
        return y_pred
    # To be deleted               
    def xrun(self, X_train, target, epochs=10):
        costs = [1e-10]
        for i in range(epochs):
            cost = self.backprop(X_train, target)
            costs.append(cost)
            if i%10 == 0:
                print(f'Loss after epoch {i} : {cost}')
        costs.pop(0)
        return costs  
         
    def run(self, X_train, target, epochs=10):
        costs = [1e-10]
        if self.delta_stop == None : 
            if self.optimizer == 'adam':
                self.reset_adam()
                for i in range(epochs):
                    cost = self.backpropADAM(X_train, target)
                    
                    costs.append(cost)
                    if i%10 == 0 and i>0 :
                        print(f'Loss after epoch {i} : {cost}')
                print(f'Loss after 1epoch {len(costs)} : {costs[-1]}')        
                costs.pop(0)
                return costs
                    
            elif self.optimizer == 'SGD' :
                for i in range(epochs):
                    cost = self.backpropSGD(X_train, target)
                    costs.append(cost)
                    if i%10 == 0 and i>0 :
                        print(f'Loss after epoch {i} : {cost}')
                print(f'Loss after 2epoch {len(costs)} : {costs[-1]}')        
                costs.pop(0)
                return costs
                
            elif self.optimizer == 'minibatch' :
                for i in range(epochs):
                    cost = self.backprop_minibatch(X_train, target)
                    costs.append(cost)
                    if i%10 == 0 and i>0 :
                        print(f'Loss after epoch {i} : {cost}')
                print(f'Loss after 3epoch {len(costs)} : {costs[-1]}')        
                costs.pop(0)
                return costs
                
            else :
                for i in range(epochs):  
                    cost = self.backprop(X_train, target)
                    costs.append(cost)
                    if i%10 == 0 and i>0 :
                        print(f'Loss after epoch {i} : {cost}')
                print(f'Loss after 4epoch {len(costs)} : {costs[-1]}')        
                costs.pop(0)
                return costs
                
            
        else :
            counter = 0
            if self.optimizer == 'adam':
                self.reset_adam()
                for i in range(epochs):
                    cost = self.backpropADAM(X_train, target)
                    print(f'Loss after epoch {i} : {cost}')
                    costs.append(cost)
                    n = len(costs)
                    delta = np.abs(costs[n-1]/costs[n-2]-1)
                    if(delta < self.delta_stop) :
                        counter+=1
                        if(counter>=self.patience):
                            print(f'Early stop at epoch {i}, the cost is : {cost}')
                            costs.pop(0)
                            return costs
                    else :
                        counter =0
                    if i%10 == 0 and i>0 :
                        print(f'Loss after epoch {i} : {cost}')
                print(f'Loss after 5epoch {len(costs)} : {costs[-1]}')        
                costs.pop(0)
                return costs
                    
            elif self.optimizer == 'SGD' :
                for i in range(epochs):
                    cost = self.backpropSGD(X_train, target)
                    costs.append(cost)
                    n = len(costs)
                    delta = np.abs(costs[n-1]/costs[n-2]-1)
                    if(delta < self.delta_stop) :
                        counter+=1
                        if(counter>=self.patience):
                            print(f'Early stop at epoch {i}, the cost is : {cost}')
                            costs.pop(0)
                            return costs
                    else :
                        counter =0
                    if i%10 == 0 and i>0 :
                        print(f'Loss after epoch {i} : {cost}')
                print(f'Loss after 6epoch {len(costs)} : {costs[-1]}')        
                costs.pop(0)
                return costs
                
            elif self.optimizer == 'minibatch' :
                for i in range(epochs):
                    cost = self.backprop_minibatch(X_train, target)
                    costs.append(cost)
                    n = len(costs)
                    delta = np.abs(costs[n-1]/costs[n-2]-1)
                    if(delta < self.delta_stop) :
                        counter+=1
                        if(counter>=self.patience):
                            print(f'Early stop at epoch {i}, the cost is : {cost}')
                            costs.pop(0)
                            return costs
                    else :
                        counter =0
                    if i%10 == 0 and i>0 :
                        print(f'Loss after epoch {i} : {cost}')
                print(f'Loss after 7epoch {len(costs)} : {costs[-1]}')        
                costs.pop(0)
                return costs
                
            else :  
                for i in range(epochs): 
                    cost = self.backprop(X_train, target)
                    costs.append(cost)
                    n = len(costs)
                    delta = np.abs(costs[n-1]/costs[n-2]-1)
                    if(delta < self.delta_stop) :
                        counter+=1
                        if(counter>=self.patience):
                            print(f'Early stop at epoch {i}, the cost is : {cost}')
                            costs.pop(0)
                            return costs
                        else :
                            counter =0
                    if i%10 == 0 and i>0 :
                        print(f'Loss after epoch {i} : {cost}')        
                print(f'Loss after 8epoch {len(costs)} : {costs[-1]}')        
                costs.pop(0)
                return costs
          
            
       
    def evaluate(self, X_evaluate, target):
        '''
        return accuracy score, target must be the classes and not the hot encoded target
        '''
        
        y_pred = self.predict_class(X_evaluate)
        accuracy = classification_rate(y_pred, target)
        print('Accuracy :', accuracy)
        return accuracy
    
    def minibatch_size(self, n_samples):
        '''
        Compute minibatch size in case its not provided
        '''
        if n_samples < 2000:
            return n_samples
        if n_samples < 12800:
            return 64
        if n_samples < 25600:
            return 128
        if n_samples < 51200:
            return 256
        if n_samples < 102400:
            return 512
        return 1024
        
        
        
        
            

In [761]:
nn = HiddenTwo(input_nodes = D, 
               output_nodes = K, 
               hidden_nodes_1 = 9,
               hidden_nodes_2 = 7,
               learning_rate = 0.01,
               activation_hidden_1 = relu,
               activation_hidden_2 = relu,
               optimizer='adam',
              #batch_size=200
              )


In [762]:
c=nn.run(X_train, y_train_cat, epochs=400 )

Loss after epoch 10 : 0.1975488547415771
Loss after epoch 20 : 0.14997360424852205
Loss after epoch 30 : 0.13539839591865355
Loss after epoch 40 : 0.12082195270255933
Loss after epoch 50 : 0.10849031963453487
Loss after epoch 60 : 0.09933015158491876
Loss after epoch 70 : 0.09242548581001123
Loss after epoch 80 : 0.08535395251213562
Loss after epoch 90 : 0.07905414503163215
Loss after epoch 100 : 0.07459900842701588
Loss after epoch 110 : 0.07104861957478723
Loss after epoch 120 : 0.06828063573786859
Loss after epoch 130 : 0.06520633758146221
Loss after epoch 140 : 0.06268914849925311
Loss after epoch 150 : 0.060934624902296466
Loss after epoch 160 : 0.059322254995836946
Loss after epoch 170 : 0.05824582616622491
Loss after epoch 180 : 0.057193997812181624
Loss after epoch 190 : 0.05634407472788578
Loss after epoch 200 : 0.05559537018714807
Loss after epoch 210 : 0.05500607944048086
Loss after epoch 220 : 0.05443849314650221
Loss after epoch 230 : 0.05385838817668915
Loss after epoch 2

In [763]:
nn.evaluate(X_test, y_test)

Accuracy : 0.8118


0.8118

In [451]:
nn = HiddenOne(input_nodes = D, 
               output_nodes = K, 
               hidden_nodes = 5,
               #hidden_nodes_2 = 1,
               learning_rate = 0.01,
               activation_hidden = relu,
               #activation_hidden_2 = relu,
               optimizer='minibatch',
              batch_size=200)


In [452]:
c=nn.run(X1, y11, epochs=400 )

Loss after epoch 10 : 0.08597256371026611
Loss after epoch 20 : 0.07169481260925979
Loss after epoch 30 : 0.06567010778371164
Loss after epoch 40 : 0.0620478213421242
Loss after epoch 50 : 0.05950785690640602
Loss after epoch 60 : 0.05757478829274955
Loss after epoch 70 : 0.056051360056106406
Loss after epoch 80 : 0.05482564820920087
Loss after epoch 90 : 0.05382021171029689
Loss after epoch 100 : 0.052978879908718995
Loss after epoch 110 : 0.05226140707559768
Loss after epoch 120 : 0.05163947120984345
Loss after epoch 130 : 0.051092929718120156
Loss after epoch 140 : 0.05060807398578545
Loss after epoch 150 : 0.050173258883087074
Loss after epoch 160 : 0.04978128511246916
Loss after epoch 170 : 0.04942493618253959
Loss after epoch 180 : 0.04909974908371374
Loss after epoch 190 : 0.04880092430208747
Loss after epoch 200 : 0.048525469148654576
Loss after epoch 210 : 0.048270381760029984
Loss after epoch 220 : 0.048033530775136404
Loss after epoch 230 : 0.04781163041760578
Loss after epo

In [453]:
acc = nn.evaluate(X2, y2)

Accuracy : 0.8292666666666667


In [415]:
cost = nn.backprop(X_train, y_train_cat)

In [417]:
A3, Z3, A2, Z2, A1, Z1 = nn.forward(X_train)

In [418]:
print(A3-y_train_cat)

[[ 0.10485694 -0.89514306 -0.89514306 ...  0.10485694 -0.89514306
   0.10485694]
 [ 0.06021112  0.06021112  0.06021112 ...  0.06021112  0.06021112
   0.06021112]
 [ 0.0907259   0.0907259   0.0907259  ...  0.0907259   0.0907259
   0.0907259 ]
 ...
 [ 0.09401693  0.09401693  0.09401693 ...  0.09401693  0.09401693
   0.09401693]
 [ 0.09631682  0.09631682  0.09631682 ...  0.09631682  0.09631682
   0.09631682]
 [-0.86754949  0.13245051  0.13245051 ...  0.13245051  0.13245051
   0.13245051]]


[Back to top](#Content:)


<a id='part6'></a>

### Part 6 -  Loading Fashion MNIST

In [764]:
from tensorflow.keras.datasets import fashion_mnist


In [765]:
fashion = fashion_mnist.load_data()

In [766]:
(X_train, y_train),(X_test, y_test) = fashion

In [767]:
print(X_train.shape)

(60000, 28, 28)


In [768]:
M = X_train.shape[1]
N_train = X_train.shape[0]
N_test = X_test.shape[0]

In [769]:
X_train = X_train.reshape(N_train, M*M, 1).squeeze()
X_test = X_test.reshape(N_test, M*M, 1).squeeze()

In [770]:
y_train_cat = to_categorical(y_train).T
y_test_cat = to_categorical(y_test).T

In [771]:
#MAX = 255
#X_train = X_train/ MAX
#X_test =X_test/ MAX

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train= scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [613]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dropout

# Fashion MNIST with 1 hidden layer

In [772]:
D = X_train.shape[1]
K = y_train_cat.shape[0]
M=5
nn = HiddenOne(input_nodes = D, 
               output_nodes = K, 
               hidden_nodes = M,
               learning_rate = 0.01,
               activation_hidden = tanh)

In [773]:
c = nn.run(X_train, y_train_cat, epochs=500 )

Loss after epoch 10 : 0.19769455850518092
Loss after epoch 20 : 0.18447470664662308
Loss after epoch 30 : 0.17788118929486993
Loss after epoch 40 : 0.1733091170234118
Loss after epoch 50 : 0.16971472610771088
Loss after epoch 60 : 0.1667337580919098
Loss after epoch 70 : 0.16420194915662806
Loss after epoch 80 : 0.16200510514988745
Loss after epoch 90 : 0.1600520905801858
Loss after epoch 100 : 0.15827740724881514
Loss after epoch 110 : 0.15663658039234027
Loss after epoch 120 : 0.1550994347574728
Loss after epoch 130 : 0.15364506952402274
Loss after epoch 140 : 0.15225863050225014
Loss after epoch 150 : 0.15092928672653272
Loss after epoch 160 : 0.14964894702059203
Loss after epoch 170 : 0.14841142958676912
Loss after epoch 180 : 0.14721191364184538
Loss after epoch 190 : 0.14604656980465372
Loss after epoch 200 : 0.1449123055584184
Loss after epoch 210 : 0.1438065860696706
Loss after epoch 220 : 0.1427273054123765
Loss after epoch 230 : 0.1416726922897992
Loss after epoch 240 : 0.140

In [774]:
acc = nn.evaluate(X_test, y_test)


Accuracy : 0.7287


In [775]:
D = X_train.shape[1]
K = y_train_cat.shape[0]
M=5
nn = HiddenOne(input_nodes = D, 
               output_nodes = K, 
               hidden_nodes = M,
               learning_rate = 0.01,
               activation_hidden = relu,
               optimizer='minibatch',
                delta_stop = 1e-7,
                patience = 5,
              )

In [776]:
c = nn.run(X_train, y_train_cat, epochs=200 )

Loss after epoch 10 : 0.07201086010993014
Loss after epoch 20 : 0.06456045855266555
Loss after epoch 30 : 0.06075946433965692
Loss after epoch 40 : 0.058230730209001644
Loss after epoch 50 : 0.05641187204620377
Loss after epoch 60 : 0.05500763006864552
Loss after epoch 70 : 0.05388005931007338
Loss after epoch 80 : 0.05294915246231096
Loss after epoch 90 : 0.052159997077627474
Loss after epoch 100 : 0.05147454478004901
Loss after epoch 110 : 0.050868256916434304
Loss after epoch 120 : 0.050330873441123324
Loss after epoch 130 : 0.04985564676707296
Loss after epoch 140 : 0.04942866576952212
Loss after epoch 150 : 0.04903962599001507
Loss after epoch 160 : 0.04868285820185186
Loss after epoch 170 : 0.04835792550200942
Loss after epoch 180 : 0.048062425352798074
Loss after epoch 190 : 0.047790328795408774
Loss after epoch 201 : 0.04756171465217987


In [777]:
acc = nn.evaluate(X_test, y_test)

Accuracy : 0.8241


### SGD

In [315]:
nn_SGD = HiddenOne(input_nodes = D, 
               output_nodes = K, 
               hidden_nodes = M,
               learning_rate = 0.01,
               activation_hidden = relu,
               optimizer='SGD',
               #batch_size = 28,
                delta_stop = 1e-3,
                patience = 5,
              )

In [316]:
c = nn_SGD.run(X_train, y_train_cat, epochs=50 )

Loss after epoch 10 : 0.05229169746879698
Loss after epoch 20 : 0.050989022653112885
Loss after epoch 30 : 0.05047409297339392
Loss after epoch 40 : 0.05006307293211981
Loss after epoch 51 : 0.04998152154949375


In [317]:
acc = nn.evaluate(X_test, y_test)

Accuracy : 0.7947


# ADAM

In [318]:
nn_adam = HiddenOne(input_nodes = D, 
               output_nodes = K, 
               hidden_nodes = M,
               learning_rate = 0.01,
               activation_hidden = relu,
               optimizer='adam',
               #batch_size = 28,
                #delta_stop = 1e-3,
                #patience = 5,
              )

In [319]:
c = nn_adam.run(X_train, y_train_cat, epochs=300 )

Loss after epoch 10 : 0.23329427329855534
Loss after epoch 20 : 0.22581268191815174
Loss after epoch 30 : 0.21756016625246444
Loss after epoch 40 : 0.2123454866753959
Loss after epoch 50 : 0.20782633720502175
Loss after epoch 60 : 0.20225578209607875
Loss after epoch 70 : 0.19722409237726474
Loss after epoch 80 : 0.1927940246394797
Loss after epoch 90 : 0.18775465784580125
Loss after epoch 100 : 0.18286143393355006
Loss after epoch 110 : 0.179449483118376
Loss after epoch 120 : 0.17719147826304704
Loss after epoch 130 : 0.17550718613772454
Loss after epoch 140 : 0.1741688920517013
Loss after epoch 150 : 0.17307389693182942
Loss after epoch 160 : 0.17215258888543558
Loss after epoch 170 : 0.1713067040286042
Loss after epoch 180 : 0.1704599507829086
Loss after epoch 190 : 0.16944868175105815
Loss after epoch 200 : 0.16803511558947992
Loss after epoch 210 : 0.16629823343822747
Loss after epoch 220 : 0.16506756526536606
Loss after epoch 230 : 0.16371251732833814
Loss after epoch 240 : 0.16

In [320]:
acc = nn.evaluate(X_test, y_test)

Accuracy : 0.7947


# 2 layers

In [629]:
nn = HiddenTwo(input_nodes = D, 
               output_nodes = K, 
               hidden_nodes_1 = 5,
               hidden_nodes_2 = 7,
               learning_rate = 0.01,
               activation_hidden_1 = relu,
               activation_hidden_2 = relu,
               optimizer='adam',
              #delta_stop = 1e-5,
              #patience=5
              )


In [630]:
c= nn.run(X_train, y_train_cat, epochs=600 )

Loss after epoch 10 : 0.23009918657484552
Loss after epoch 20 : 0.20925003257289768
Loss after epoch 30 : 0.19687719692833824
Loss after epoch 40 : 0.1830438426783252
Loss after epoch 50 : 0.17300164977829482
Loss after epoch 60 : 0.1633852997230326
Loss after epoch 70 : 0.15475056906538215
Loss after epoch 80 : 0.14603352547099815
Loss after epoch 90 : 0.13740840593385814
Loss after epoch 100 : 0.1305420250692707
Loss after epoch 110 : 0.12421605815376298
Loss after epoch 120 : 0.11855536991873979
Loss after epoch 130 : 0.11297379956453496
Loss after epoch 140 : 0.1063691771139391
Loss after epoch 150 : 0.09924395750899022
Loss after epoch 160 : 0.09293506478708678
Loss after epoch 170 : 0.0884201466092327
Loss after epoch 180 : 0.08514571794258906
Loss after epoch 190 : 0.08252271186843954
Loss after epoch 200 : 0.08016558721847178
Loss after epoch 210 : 0.07820037675723318
Loss after epoch 220 : 0.07662357319133095
Loss after epoch 230 : 0.07534951891270372
Loss after epoch 240 : 0.

In [631]:
acc = nn.evaluate(X_test, y_test)

Accuracy : 0.7435


In [784]:
nn = HiddenTwo(input_nodes = D, 
               output_nodes = K, 
               hidden_nodes_1 = 64,
               hidden_nodes_2 = 32,
               learning_rate = 0.01,
               activation_hidden_1 = relu,
               activation_hidden_2 = relu,
               optimizer='adam',
               delta_stop = 1e-5,
               patience=5
              )


In [785]:
c= nn.run(X_train, y_train_cat, epochs=500 )

Loss after epoch 0 : 0.3347957865433994
Loss after epoch 1 : 0.5320168134159778
Loss after epoch 2 : 1.157130499416637
Loss after epoch 3 : 1.5418694393832684
Loss after epoch 4 : 0.5432880684295185
Loss after epoch 5 : 0.3182735420275621
Loss after epoch 6 : 0.23358968794332943
Loss after epoch 7 : 0.19897185593060784
Loss after epoch 8 : 0.18063892545249638
Loss after epoch 9 : 0.1791005209125011
Loss after epoch 10 : 0.18049939152442432
Loss after epoch 10 : 0.18049939152442432
Loss after epoch 11 : 0.1844177825873545
Loss after epoch 12 : 0.18075155984887634
Loss after epoch 13 : 0.17418912013981377
Loss after epoch 14 : 0.16705448213065835
Loss after epoch 15 : 0.16054894526335764
Loss after epoch 16 : 0.15604620473849787
Loss after epoch 17 : 0.17019228153304622
Loss after epoch 18 : 0.1552243908167947
Loss after epoch 19 : 0.14732152623433026
Loss after epoch 20 : 0.14343440518989856
Loss after epoch 20 : 0.14343440518989856
Loss after epoch 21 : 0.14389630115708066
Loss after e

  # This is added back by InteractiveShellApp.init_path()
  # This is added back by InteractiveShellApp.init_path()


Loss after epoch 48 : nan
Loss after epoch 49 : nan
Loss after epoch 50 : nan
Loss after epoch 50 : nan
Loss after epoch 51 : nan
Loss after epoch 52 : nan
Loss after epoch 53 : nan
Loss after epoch 54 : nan
Loss after epoch 55 : nan
Loss after epoch 56 : nan
Loss after epoch 57 : nan
Loss after epoch 58 : nan
Loss after epoch 59 : nan
Loss after epoch 60 : nan
Loss after epoch 60 : nan
Loss after epoch 61 : nan
Loss after epoch 62 : nan
Loss after epoch 63 : nan
Loss after epoch 64 : nan
Loss after epoch 65 : nan
Loss after epoch 66 : nan
Loss after epoch 67 : nan
Loss after epoch 68 : nan
Loss after epoch 69 : nan
Loss after epoch 70 : nan
Loss after epoch 70 : nan
Loss after epoch 71 : nan
Loss after epoch 72 : nan
Loss after epoch 73 : nan
Loss after epoch 74 : nan
Loss after epoch 75 : nan
Loss after epoch 76 : nan
Loss after epoch 77 : nan
Loss after epoch 78 : nan
Loss after epoch 79 : nan
Loss after epoch 80 : nan
Loss after epoch 80 : nan
Loss after epoch 81 : nan
Loss after e

KeyboardInterrupt: 

In [786]:
acc = nn.evaluate(X_test, y_test)

Accuracy : 0.779


# Optimizer

In [366]:
nn = HiddenTwo(input_nodes = D, 
               output_nodes = K, 
               hidden_nodes_1 = M,
               hidden_nodes_2 = M,
               learning_rate = 0.01,
               activation_hidden_1 = relu,
               activation_hidden_2 = relu,
              optimizer = 'adam')
c= nn.run(X_train, y_train_cat, epochs=200 )

Loss after epoch 0 : 4.0248719069912555
Loss after epoch 1 : 2.276420196595539
Loss after epoch 2 : 1.4171308091484893
Loss after epoch 3 : 0.4909401642439696
Loss after epoch 4 : 0.23462532199671818
Loss after epoch 5 : 0.23362828058409388
Loss after epoch 6 : 0.2339391955562715
Loss after epoch 7 : 0.23403101872804918
Loss after epoch 8 : 0.23408034835165503
Loss after epoch 9 : 0.23408641247524933
Loss after epoch 10 : 0.23405133302759268
Loss after epoch 11 : 0.2339796667265713
Loss after epoch 12 : 0.23387689997194383
Loss after epoch 13 : 0.23374862251877318
Loss after epoch 14 : 0.2336030481667226
Loss after epoch 15 : 0.23349178793024195
Loss after epoch 16 : 0.23336433962006922
Loss after epoch 17 : 0.23322363763084164
Loss after epoch 18 : 0.23307246338027093
Loss after epoch 19 : 0.23291342250478173
Loss after epoch 20 : 0.23274895271160082
Loss after epoch 21 : 0.23258134424862817
Loss after epoch 22 : 0.23241275768847888
Loss after epoch 23 : 0.23224522931747094
Loss after

Loss after epoch 196 : 0.23025850931269418
Loss after epoch 197 : 0.23025850931118047
Loss after epoch 198 : 0.23025850930993022
Loss after epoch 199 : 0.2302585093088975
Loss after 1epoch 201 : 0.2302585093088975


In [367]:
acc = nn.evaluate(X_test, y_test)

Accuracy : 0.1
