In [6]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

Activation Function (sigmoid):
$$\sigma(x) = \frac{1} {1 + e^{-x}}$$

Derivative of Activation Function:
$$\sigma'(x) = \sigma'(x)(1-\sigma'(x))$$

In [7]:
class SimpleNeuralNetwork:
    
    # Simple 2 layer neural network
    # First layer has as a node for each feature 
    # Second layer has a configurable number of nodes that map to 1 output
    def __init__(self, num_of_features, second_layer_size):
        self.weights0 = np.random.rand(num_of_features, second_layer_size)
        self.weights1 = np.random.rand(second_layer_size, 1)
    
    def activation_func(self, x):
        return 1/(1+np.exp(-x))
    
    def activation_func_derivative(self, x):
        return self.activation_func(x)*(1-self.activation_func(x))
        
    def predict(self, features):
        results0 = self.activation_func(features.dot(self.weights0))
        results1 = self.activation_func(results0.dot(self.weights1))
        return results1;

$$\texttt{Let }w_0 \texttt{ represent the first layer's weights,}$$
$$w_1 \texttt{ represent the second layer's weights,}$$
$$\texttt{and }y_i \texttt{ represent the training labels}$$

For this simple example, we will use mean squared error for our loss function:
$$MSE = L(\boldsymbol{x_0}) = \sum_{i=1}^{n}(y_i-\sigma(\boldsymbol{w_1}(\sigma(\boldsymbol{w_0}\boldsymbol{x_0}+b_0))+b_1))^2$$

Ideally, for a logistic regression neural network classifier, the log loss would be the preferred loss function as it generates a convex curve while MSE does not. But for this simple example, we will proceed with using MSE as our loss function so the derivative for back propogation is easier to follow.

$$Log Loss = \sum_{i=1}^{n}(y_i\log(\sigma(\boldsymbol{w_1}(\sigma(\boldsymbol{w_0}\boldsymbol{x_0}+b_0))+b_1)) + (1-y_i)\log(1-\sigma(\boldsymbol{w_1}(\sigma(\boldsymbol{w_0}\boldsymbol{x_0}+b_0))+b_1)))$$

Note, these formulas applicable for batch gradient descent because we are summing all data for each single step.

Gradient with respect to layer 1 for backpropagation:
$$\nabla L(\boldsymbol{w_1}) = \sum_{i=1}^{n}(2/n)(\sigma(\boldsymbol{w_1}\boldsymbol{x_1}+b_1)-y_i)(\sigma'(\boldsymbol{w_1}\boldsymbol{x_1}+b_1))(\boldsymbol{x_1})$$

This formula can be mapped to the code below as follows:  
$$\texttt{get_errors(): }\sigma(\boldsymbol{w_1}\boldsymbol{x_1}+b_1)-y_i$$
$$\texttt{activation_func_derivative(results1): }\sigma'(\boldsymbol{w_1}\boldsymbol{x_1}+b_1)$$
$$\texttt{results0: }\boldsymbol{x_1}$$

Gradient with respoect to layer 0 for backpropagation:
$$\nabla(L\boldsymbol{w_0}) = \sum_{i=1}^{n}(2/n)(\sigma(\boldsymbol{w_1}(\sigma(\boldsymbol{w_0}\boldsymbol{x_0}+b_0)+b_1))-y_i)\sigma'(\boldsymbol{w_1}(\sigma(\boldsymbol{w_0}\boldsymbol{x_0}+b_0)+b_1))(\boldsymbol{w_1})\sigma'(\boldsymbol{w_0}\boldsymbol{x_0}+b_0)(\boldsymbol{x_0})$$

This formula can be mapped to the code below as follows:  
$$\texttt{get_errors(): }\sigma(\boldsymbol{w_1}(\sigma(\boldsymbol{w_0}\boldsymbol{x_0}+b_0)+b_1))-y_i$$
$$\texttt{activation_func_derivative(results1): }\sigma'(\boldsymbol{w_1}(\sigma(\boldsymbol{w_0}\boldsymbol{x_0}+b_0))+b_1)$$
$$\texttt{weights1: }\boldsymbol{w_1}$$
$$\texttt{activation_func_derivative(results0): }\sigma'(\boldsymbol{w_0}\boldsymbol{x_0}+b_0)$$
$$\texttt{features: }\boldsymbol{x_0}$$

Notice that for a simple two layer network, the derivative for first layer becomes large quickly through the chain rule. Most modern deep learning libraries will contain a automatic differentiation engine built upon computation graphs to help with this for more complex models. Autograd is an example from Pytorch: https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html.

In [8]:
class Trainer:
    
    def get_errors(self, labels, predictions):
        return labels - predictions
    
    def get_mean_squared_error(self, errors):
        return np.sum(np.square(errors))/errors.size 
        
    def feedforward(self, nn, features):
        results0 = nn.activation_func(features.dot(nn.weights0))
        results1 = nn.activation_func(results0.dot(nn.weights1))
        return results0, results1
    
    def backpropagate(self, nn, features, results0, results1, errors, learning_rate):
        weights1_delta = (2/errors.size)*results0.T.dot(errors*nn.activation_func_derivative(results1))
        weights0_delta = (2/errors.size)*features.T.dot(((errors*nn.activation_func_derivative(results1)).dot(nn.weights1.T))*(nn.activation_func_derivative(results0)))
                
        nn.weights1 += learning_rate*weights1_delta
        nn.weights0 += learning_rate*weights0_delta
        
    def train(self, nn, features, labels, learning_rate, epochs):
        for epoch in range(epochs):
            results0, results1 = self.feedforward(nn, features)
            errors = self.get_errors(labels, results1)
            mean_squared_errors = self.get_mean_squared_error(errors)
            print("At epoch:", epoch, ", MSE = ", mean_squared_errors) 
            self.backpropagate(nn, features, results0, results1, errors, learning_rate)

In [9]:
trainer = Trainer()

In [10]:
# Creating some simple training data just to check basic funcationality
simple_features = np.array([[-3,-3],
                            [-3,3],
                            [3,-3],
                            [3,3]])

simple_labels = np.array([[0], [1], [1], [1]])

In [11]:
simple_neural_network = SimpleNeuralNetwork(2, 10)
trainer.train(simple_neural_network, simple_features, simple_labels, 0.1, 100)

At epoch: 0 , MSE =  0.09576415704131132
At epoch: 1 , MSE =  0.09512216288054746
At epoch: 2 , MSE =  0.09449519608429688
At epoch: 3 , MSE =  0.09388287364821901
At epoch: 4 , MSE =  0.0932848210078843
At epoch: 5 , MSE =  0.09270067206268927
At epoch: 6 , MSE =  0.09213006917363828
At epoch: 7 , MSE =  0.09157266313785678
At epoch: 8 , MSE =  0.09102811314247154
At epoch: 9 , MSE =  0.09049608670027723
At epoch: 10 , MSE =  0.08997625956940519
At epoch: 11 , MSE =  0.08946831565902003
At epoch: 12 , MSE =  0.08897194692289354
At epoch: 13 , MSE =  0.08848685324253829
At epoch: 14 , MSE =  0.08801274230143098
At epoch: 15 , MSE =  0.08754932945171366
At epoch: 16 , MSE =  0.08709633757462774
At epoch: 17 , MSE =  0.08665349693581538
At epoch: 18 , MSE =  0.0862205450365102
At epoch: 19 , MSE =  0.08579722646153623
At epoch: 20 , MSE =  0.08538329272493933
At epoch: 21 , MSE =  0.08497850211398723
At epoch: 22 , MSE =  0.08458261953219702
At epoch: 23 , MSE =  0.0841954163419738
At ep

In [12]:
# Creating some simple training data just to check basic funcationality
simple_test_data = np.array([[-10,-10],
                             [-10,10],
                             [10,-10],
                             [10,10]])

simple_test_labels = np.array([[0], [1], [1], [1]])
predicted_values = simple_neural_network.predict(simple_test_data)
print("Predicted Values:", predicted_values)
errors = trainer.get_errors(simple_test_labels, predicted_values)
mean_squared_error = trainer.get_mean_squared_error(errors);
print("Mean Squared Error:", mean_squared_error)

Predicted Values: [[0.50003473]
 [0.90679407]
 [0.97406575]
 [0.99727043]]
Mean Squared Error: 0.06485052790261346


In [13]:
# Okay, now that it seems to be working, lets test with some more complex data
# Using data from here: http://archive.ics.uci.edu/ml/datasets/Abalone
df = pd.read_csv("abalone.csv")
df.head()

Unnamed: 0,Type,LongestShell,Diameter,Height,WholeWeight,ShuckedWeight,VisceraWeight,ShellWeight,Rings
0,M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
1,M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
2,F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
3,M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
4,I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7


In [14]:
# Doing a bit of preprocessing
def string_to_binary(string):
    if string == "F":
        return 1
    else:
        return 0
    
df = df.loc[df['Type'] != "I"]
df['Type'] = df['Type'].apply(string_to_binary)
df.head()

Unnamed: 0,Type,LongestShell,Diameter,Height,WholeWeight,ShuckedWeight,VisceraWeight,ShellWeight,Rings
0,0,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
1,0,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
2,1,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
3,0,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
6,1,0.53,0.415,0.15,0.7775,0.237,0.1415,0.33,20


In [15]:
# Splitting into training and testing data
training_data, testing_data = train_test_split(df, test_size=0.2)
training_features = training_data.iloc[:, ~training_data.columns.isin(['Type'])].values
training_labels = training_data['Type'].values.reshape(len(training_features),1)
testing_features = testing_data.iloc[:, ~testing_data.columns.isin(['Type'])].values
testing_labels = testing_data['Type'].values.reshape(len(testing_features),1)

In [16]:
# Training
classification_neural_network = SimpleNeuralNetwork(8, 10)
trainer.train(classification_neural_network, training_features, training_labels, 0.1, 100)

At epoch: 0 , MSE =  0.5222390349214945
At epoch: 1 , MSE =  0.5175807931473139
At epoch: 2 , MSE =  0.511954531789382
At epoch: 3 , MSE =  0.5052089091181172
At epoch: 4 , MSE =  0.4971947564779357
At epoch: 5 , MSE =  0.48777935589302085
At epoch: 6 , MSE =  0.4768657379710762
At epoch: 7 , MSE =  0.46441595541928615
At epoch: 8 , MSE =  0.45047541818215414
At epoch: 9 , MSE =  0.43519324514394236
At epoch: 10 , MSE =  0.4188320635851792
At epoch: 11 , MSE =  0.401761004491445
At epoch: 12 , MSE =  0.3844287502183131
At epoch: 13 , MSE =  0.367319095742956
At epoch: 14 , MSE =  0.35089755419011226
At epoch: 15 , MSE =  0.3355611166245467
At epoch: 16 , MSE =  0.3216023611904771
At epoch: 17 , MSE =  0.30919410057957714
At epoch: 18 , MSE =  0.2983942100274783
At epoch: 19 , MSE =  0.28916515334455856
At epoch: 20 , MSE =  0.2814006586236127
At epoch: 21 , MSE =  0.27495279028739855
At epoch: 22 , MSE =  0.2696550054280362
At epoch: 23 , MSE =  0.26533929372507287
At epoch: 24 , MSE =

In [17]:
# Checking performance on testing data
# Note: not expecting optimal performance here as a non optimal loss function is being used
predicted_values = classification_neural_network.predict(testing_features)
errors = trainer.get_errors(testing_labels, predicted_values)
print("Mean Squared Error:", trainer.get_mean_squared_error(errors))

Mean Squared Error: 0.249204095367048
