### Deep Learning Fundamentals - Feedforward Neural Network from Scratch
_Yoav Rabinovich, October 2019_

-------------

In this assignment I avoided using any external package bar CSV and Numpy. I implemented a simple feedforward, fully connected neural network of arbitrary size and activation function, and arbitrary input and output dimensions. I've implemented the Sigmoid, Tanh, and Leaky ReLU activation functions, matching them with approporiate weight initialization methods, as well as the logarithmic loss function.

At first, I implemented Sigmoid and Tanh, an interesting result occurs when the network was trained using either: The network defaults to outputing 1 for both classes regardless of input. This was caused due to saturation of the activation functions: the activation values were all high enough to map to 1 on the activation function curves. Indeed, implementing and training the network using Leaky ReLU, the problem disappeared.

Training on the Titanic dataset using Leaky ReLU, the network hit a local minimum at loss=500. An evaluation of the result shows perfect precision and recall, but low accuracy. This is because the local minimum found defaulted to declaring all people on board as casualties automatically. This suggests a need to decrease the biasof the network, but as this wasn't the focus of the assignment, this step was skipped to conserve time.

Colab Link: https://colab.research.google.com/drive/1LSppmf8Ws0H4jtTkcoltxGdmZjc8MNGq

In [0]:
#@title Imports and Data (Automatic Import)

Kaggle_Username = 'xxx' #@param {type:"string"}
Kaggle_Key = 'xxxx' #@param {type:"string"}

import numpy as np
import csv

import os
os.environ['KAGGLE_USERNAME'] = Kaggle_Username 
os.environ['KAGGLE_KEY'] = Kaggle_Key
!kaggle competitions download -c titanic


#### Neuron and Layer Classes

In [0]:
class Neuron:
    """
    Initializes weights and bias using schemes depending on activation function:
    - Sigmoid -> uniform from [0,1).
    - Tanh -> Xavier initialization.
    - Leaky ReLU -> He initialization.
    """
    def __init__(self, inputs_n, activation_f):
        if activation_f == sigmoid_activation_f:
            self.weights = np.random.uniform(-1,1,inputs_n+1)
        if activation_f == tanh_activation_f:
            self.weights = np.random.uniform(-1,1,inputs_n+1)*np.sqrt(3/inputs_n)
        if activation_f == lrelu_activation_f:
            self.weights = np.random.uniform(-1,1,inputs_n+1)*np.sqrt(2/inputs_n)
        self.activation = activation_f
        self.output = 0
        self.delta = 0

    def activate(self,inputs):
        """
        Adds together bias and weighted inputs and applies the activation function.
        """
        a = self.weights[-1]
        for i in range(len(self.weights)-1):
            a += self.weights[i]*inputs[i]
        self.output = self.activation.evaluate(a)
        return (self.output)

class Layer:
    """
    Collects neurons into a list representing a layer
    """
    def __init__(self, neurons_n, inputs_n, activation_f):
        self.neurons = []
        for i in range(neurons_n):
            self.neurons.append(Neuron(inputs_n,activation_f))


#### Activation Function Class

In [0]:
class Activation:
    """
    Wraps an activation function together with its derivative
    """
    def __init__(self,f,fp):
        self.evaluate = f
        self.evaluate_derivative = fp

#### Feedforward Neural Network Class

In [0]:
class FFNN:
    """
        Takes:
        - Input_dim (Integer): number of input features.
        - Hidden_dims (List of integers): where the length represents the number
            of hidden layers, and each entry represents the number of neurons
            in that layer.
        - Output_dim (Integer): number of output neurons.
        - Activation_f (activation object): activation function.
            Activations are assigned per network for now, but can be set by hand
            per neuron if one wishes to experiment.
        - Eta (Float): Initial learning rate.
        """
    def __init__(self,input_dim,hidden_dims,output_dim,activation_f,loss_f):
        self.layers = []
        self.layers.append(Layer(hidden_dims[0],input_dim,activation_f))
        for i in range(1,len(hidden_dims)):
            self.layers.append(Layer(hidden_dims[i],hidden_dims[i-1],activation_f))
        self.layers.append(Layer(output_dim,hidden_dims[-1],activation_f))
        self.loss = loss_f

    # Pass an input through the system, writing output values for each neuron,
    # and returning the output prediction
    def forward_prop(self, inputs):
        # Walk through layers
        for l in self.layers:
            activations=[]
            # Activate each neuron and record
            for n in l.neurons:
                activations.append(n.activate(inputs))
            inputs = activations
        return inputs

    # Pass the true label backwards through the network, updating delta values
    # for each neuron, based on the product rule using the derivative of the
    # activation functions
    def back_prop(self, labels):
        # Backward walk through layers
        for i in reversed(range(len(self.layers))):
            errors = []
            # For intermediate layers, propagate error
            if i != len(self.layers)-1:
                for j in range(len(self.layers[i].neurons)):
                    error = 0
                    for n in self.layers[i+1].neurons:
                        error += (n.weights[j] * n.delta)
                    errors.append(error)
            # For first layer, calculate loss
            else:
                for j in range(len(self.layers[i].neurons)):
                    errors.append(self.loss(labels[j],self.layers[i].neurons[j].output))
            # For all layers, set delta
            for j in range(len(self.layers[i].neurons)):
                self.layers[i].neurons[j].delta = errors[j] * self.layers[i].neurons[j].activation.evaluate_derivative(self.layers[i].neurons[j].output)

    # Update the weights of each neurons based on the delta values and input
    def update_weights(self, inputs, eta):
        # Walk through layers
        for i in range(len(self.layers)):
            inputs = inputs[0:-1]
            # For intermediate layers, set inputs as outputs of prev layer
            if i != 0:
                inputs = []
                for n in self.layers[i-1].neurons:
                    inputs.append(n.output)
            # Update weights based on delta and inputs (except bias)
            for n in self.layers[i].neurons:
                n.weights[-1] += eta * n.delta
                for j in range(len(inputs)):
                    n.weights[j] += eta * n.delta * inputs[j]
    # Run forward and back propagation for all data for each epoch
    def fit(self,X,y,eta,epochs_n,verbose=1):

        for e in range(epochs_n):
            loss = 0
            for i in range(len(X)):
                outputs = self.forward_prop(X[i])
                for j in range(len(y[i])):
                    loss += np.abs(self.loss(y[i][j],outputs[j]))
                self.back_prop(y[i])
                self.update_weights(X[i], eta)
            if verbose==1:
                print('Epoch: %d, Loss=%.2f' % (e, loss))
            np.random.shuffle(X)



#### Sigmoid Activation:

In [0]:
def sigmoid(x):
    return 1/(1+np.exp(-x))

def sigmoid_p(x):
    sx = sigmoid(x)
    return sx*(1-sx)

sigmoid_activation_f = Activation(sigmoid,sigmoid_p)

#### Tanh Activation

In [0]:
def tanh(x):
    return np.sinh(x)/np.cosh(x)

def tanh_p(x):
    return 1-tanh(x)**2

tanh_activation_f = Activation(tanh,tanh_p)

#### Leaky Relu Activation

In [0]:
def lrelu(x):
    return x/20 if x<0 else x

def lrelu_p(x):
    return 1/20 if x<=0 else 1

lrelu_activation_f = Activation(lrelu,lrelu_p)

#### Logistic Loss

In [0]:
def log_loss(y,y_hat):
    # print(y,y_hat)
    return -np.log(np.abs(1-y_hat)) if y==0 else -np.log(np.abs(y_hat))

#### Data Preprocessing

In [0]:
# Read data
X_raw = []
with open("train.csv", 'r') as file:
    csv_reader = csv.reader(file)
    for row in csv_reader:
        if not row:
            continue
        X_raw.append(row)
del X_raw[0]

# Separate out labels, discard name, ticket number, cabin and origin, 
# remove incomplete observations, destring numbers,
# convert gender to one-hot encoding
X = []
y = []
for row in X_raw:
    if int(row[1])==0:
        y.append([1,0])
    else:
        y.append([0,1])
    del row[1],row[2],row[6],row[7],row[7]
    processed_row=[]
    incomplete = False
    for i in range(len(row)):
        if row[i]=="":
            incomplete = True
            break
        if i in [0,1,4,5]:
            row[i] = int(row[i])
            processed_row.append(row[i])
        if i in [3,6]:
            row[i] = float(row[i])
            processed_row.append(row[i])
        if i==2:
            if row[i]=="male":
                processed_row.append(0)
                processed_row.append(1)
            else:
                processed_row.append(1)
                processed_row.append(0)
    if not incomplete:
        X.append(processed_row)
    incomplete = False

    # Train/test split
    split = len(X)//4
    X_train = X[split:]
    X_test = X[:split]
    y_train = y[split:]
    y_test = y[:split]

#### Test

In [0]:
nn = FFNN(8,[8,8,8,8,8],2,lrelu_activation_f,log_loss)

In [0]:
nn.fit(X_train,y_train,1e-3,50)

Epoch: 0, Loss=945.50
Epoch: 1, Loss=792.56
Epoch: 2, Loss=756.39
Epoch: 3, Loss=690.52
Epoch: 4, Loss=603.64
Epoch: 5, Loss=574.78
Epoch: 6, Loss=552.89
Epoch: 7, Loss=543.16
Epoch: 8, Loss=533.27
Epoch: 9, Loss=527.34
Epoch: 10, Loss=523.59
Epoch: 11, Loss=520.38
Epoch: 12, Loss=517.33
Epoch: 13, Loss=514.79
Epoch: 14, Loss=513.00
Epoch: 15, Loss=512.01
Epoch: 16, Loss=511.22
Epoch: 17, Loss=511.14
Epoch: 18, Loss=508.93
Epoch: 19, Loss=508.33
Epoch: 20, Loss=507.64
Epoch: 21, Loss=508.21
Epoch: 22, Loss=507.95
Epoch: 23, Loss=507.64
Epoch: 24, Loss=506.05
Epoch: 25, Loss=506.66
Epoch: 26, Loss=506.50
Epoch: 27, Loss=506.36
Epoch: 28, Loss=505.97
Epoch: 29, Loss=506.11
Epoch: 30, Loss=506.38
Epoch: 31, Loss=505.46
Epoch: 32, Loss=505.71
Epoch: 33, Loss=505.48
Epoch: 34, Loss=505.77
Epoch: 35, Loss=505.56
Epoch: 36, Loss=505.38
Epoch: 37, Loss=505.61
Epoch: 38, Loss=505.43
Epoch: 39, Loss=505.44
Epoch: 40, Loss=504.96
Epoch: 41, Loss=505.40
Epoch: 42, Loss=504.66
Epoch: 43, Loss=504.3

#### Evaluate Network

In [0]:
n = len(X_test)
T=0
P=0
FP = 0
TP = 0
FN = 0
TN = 0
for i in range(n):
    output = nn.forward_prop(X_test[i])
    decision = np.argmax(output)
    true = False
    positive = False
    if y_test[i][1]==1:
        true = True
        T+=1
    if y_test[i][decision] == 1:
        positive = True
        P+=0
    if true and positive:
        TP+=1
    if true and not positive:
        TN+=1
    if positive and not true:
        FP+=1
    if not positive and not true:
        FN+=1
print("Precision: ",TP/(TP+FP))
print("Recall: ",TP/T)
print("Accuracy: ",(TN+TP)/n)
print(n,T,P)


Precision:  1.0
Recall:  1.0
Accuracy:  0.33146067415730335
178 59 0
