# Implementing a Neural Network from Scratch

--- 
## Introduction
In this post, we're going to implement a fully-connected neural network from scratch and use it to perform sentiment analysis on the IMDB movie reviews dataset. That is – we'll create a network to tell us if a movie review is positive or negative. The IMDB dataset is included with Keras, so let's dive right in. 

---
## Loading the Data

We load the data as usual. The `num_words=10000` argument says that we only want to keep the 10,000 most frequent words.

In [1]:
from keras.datasets import imdb

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

Using TensorFlow backend.


Currently the reviews are stored as integer sequences where each integer represents a given word. The following function can convert an integer sequence back into the original review. The `word_index` maps each word to its integer, and by reversing that index, we can map each integer to its word. The '?' represents a word that was not in the top 10,000 words, so we don't know what it is. 

In [2]:
def sequence_to_text(sequence):
    """Converts an integer sequence into the actual review text."""
    word_index = imdb.get_word_index()
    reverse_index = {value: key for (key, value) in word_index.items()}
    return ' '.join([reverse_index.get(i - 3, '?') for i in sequence])

print("Review")
print("~~~~~~")
print(sequence_to_text(train_data[0]), end="\n\n")

print("Sentiment")
print("~~~~~~~~~")
print(train_labels[0])

Review
~~~~~~
? this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert ? is an amazing actor and now the same being director ? father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for ? and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also ? to the two little boy's that played the ? of norman and paul they were just brilliant children are often left out of the ? list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done
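
The `i - 3` offset in `sequence_to_text` deserves a note. By default, `imdb.load_data` reserves index 0 for padding, 1 for the start-of-sequence marker, and 2 for out-of-vocabulary words, and shifts every real word index up by 3. That is also why the decoded review begins with a '?': the start marker has no entry in the word index. The sketch below reproduces the decoding on a tiny, made-up `toy_word_index` (a hypothetical stand-in for `imdb.get_word_index()`), so it runs without downloading anything.

```python
# Hypothetical miniature word index; the real one comes from imdb.get_word_index().
toy_word_index = {"the": 1, "film": 2, "brilliant": 3}


def toy_sequence_to_text(sequence, word_index):
    """Decode a sequence that uses the default Keras offsets (0, 1, 2 reserved)."""
    reverse_index = {value: key for key, value in word_index.items()}
    return " ".join(reverse_index.get(i - 3, "?") for i in sequence)


# 1 = start marker, 2 = out-of-vocabulary, other ids are word indices shifted by 3.
encoded = [1, 4, 5, 2, 6]
decoded = toy_sequence_to_text(encoded, toy_word_index)
print(decoded)  # ? the film ? brilliant
```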

We have to prepare our data to be fed into a neural network. We'll do this by creating a binary vector to represent each review. Each vector will be `(10000,1)`, and a 1 at index $i$ will indicate that word $i$ occurred in our review. One thing to note is that this loses some valuable sequential information. For instance, the reviews "bad. not good." and "good. not bad." would have the same representation since they contain the same words. However, the first review describes a bad movie and the second review describes a good movie. There are other models that can handle sequential data like this, but this is an okay representation for now.
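
To make the loss of word order concrete, here is a minimal sketch using a toy vocabulary of five words. The word ids and the helper name `toy_vectorize` are made up for illustration; the encoding itself is the same binary scheme described above.

```python
import numpy as np


def toy_vectorize(sequence, dimension):
    """Binary bag-of-words vector: 1 at every index that appears in the sequence."""
    vector = np.zeros(dimension)
    vector[sequence] = 1
    return vector


# Hypothetical word ids: 0 = "bad", 1 = "not", 2 = "good".
bad_not_good = [0, 1, 2]  # "bad. not good."
good_not_bad = [2, 1, 0]  # "good. not bad."

v1 = toy_vectorize(bad_not_good, dimension=5)
v2 = toy_vectorize(good_not_bad, dimension=5)
print(v1)                      # [1. 1. 1. 0. 0.]
print(np.array_equal(v1, v2))  # True
```

Both reviews map to exactly the same vector, so no model trained on this representation can tell them apart.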

In [3]:
import numpy as np 

def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1
    return results

# Vectorize our input data 
vectorized_train_data = vectorize_sequences(train_data)
vectorized_test_data = vectorize_sequences(test_data)

# Vectorize our label data as well 
vectorized_train_labels = np.asarray(train_labels).astype('float32')
vectorized_test_labels = np.asarray(test_labels).astype('float32')

Just to make sure everything is in order, let's check the shape of our data.

In [4]:
print('Input Data')
print('Train Data', vectorized_train_data.shape)
print('Test Data', vectorized_test_data.shape)

print('\nOutput Data')
print('Train Labels', vectorized_train_labels.shape)
print('Test Labels', vectorized_test_labels.shape)

Input Data
Train Data (25000, 10000)
Test Data (25000, 10000)

Output Data
Train Labels (25000,)
Test Labels (25000,)


Finally, we can create our train, validation, and test splits.

In [30]:
# Separate input data
x_val = vectorized_train_data[:10000].T
x_train = vectorized_train_data[10000:].T
x_test = vectorized_test_data[:].T

# Separate labels 
y_val = vectorized_train_labels[:10000]
y_train = vectorized_train_labels[10000:]
y_test = vectorized_test_labels[:]

--- 
## Activation and Cost Functions 

### Activation Functions

In [14]:
class ActivationFunction:
    """An abstract class representing an activation function."""
    
    def __init__(self):
        pass 
    
    def forward(self, Z): 
        raise NotImplementedError
        
    def backward(self, dA, Z): 
        raise NotImplementedError
        

class Sigmoid(ActivationFunction): 
    """Implementation of the sigmoid activation function."""
    
    def forward(self, Z): 
        return 1.0 / (1.0 + np.exp(-Z))
        
    def backward(self, dA, Z): 
        s = 1 / (1+np.exp(-Z))
        dZ = dA * s * (1-s)
        return dZ
    

class Tanh(ActivationFunction): 
    """Implementation of the hyperbolic tangent activation function."""
    
    def forward(self, Z): 
        return np.tanh(Z)
        
    def backward(self, dA, Z): 
        # d/dZ tanh(Z) = 1 - tanh(Z)^2
        t = np.tanh(Z)
        return dA * (1 - t*t)
    

class ReLU(ActivationFunction): 
    """Implementation of the ReLU function."""
    
    def forward(self, Z): 
        return np.maximum(Z, 0)
        
    def backward(self, dA, Z): 
        dZ = np.array(dA, copy=True)
        dZ[Z <= 0] = 0
        return dZ
    

class LeakyReLU(ActivationFunction): 
    """Implementation of the Leaky ReLU function."""
    
    def __init__(self, alpha): 
        super().__init__()
        self.alpha = alpha
    
    def forward(self, Z): 
        return np.maximum(Z, self.alpha*Z)
        
    def backward(self, dA, Z): 
        dZ = np.ones_like(Z)
        dZ[Z <= 0] = self.alpha
        return dA*dZ
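
A quick way to gain confidence in a `backward` method is to compare it with a finite-difference estimate of the derivative. The sketch below does this for the sigmoid using standalone functions rather than the classes above; `numerical_derivative` and the step size `h` are choices made here for illustration.

```python
import numpy as np


def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))


def sigmoid_backward(dA, Z):
    """Same formula as Sigmoid.backward: dZ = dA * s * (1 - s)."""
    s = sigmoid(Z)
    return dA * s * (1 - s)


def numerical_derivative(f, Z, h=1e-5):
    """Central finite difference, applied element-wise."""
    return (f(Z + h) - f(Z - h)) / (2 * h)


Z = np.array([[-2.0, -0.5, 0.0, 0.5, 2.0]])
dA = np.ones_like(Z)  # an upstream gradient of 1 isolates the activation's own derivative

analytic = sigmoid_backward(dA, Z)
numeric = numerical_derivative(sigmoid, Z)
max_error = np.max(np.abs(analytic - numeric))
print(max_error < 1e-8)  # True
```

The same check should work for `Tanh`. For `ReLU` and `LeakyReLU` it only makes sense away from `Z = 0`, where the function has a kink.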

### Cost Functions

In [33]:
class CostFunction:
    """An abstract class representing a cost function."""
    
    def __init__(self):
        pass 
    
    def cost(self, AL, Y):
        raise NotImplementedError
        
    def derivative(self, AL, Y):
        raise NotImplementedError
    

class CrossEntropy(CostFunction):
    """Implementation of the Cross Entropy Cost Function."""
    
    def cost(self, AL, Y):
        m = Y.shape[-1]
        eps = 1e-8
        cost = (-1/m) * np.sum(np.multiply(np.log(AL + eps), Y) + np.multiply(np.log((1-AL) + eps),(1-Y)))
        return np.squeeze(cost)
    
    def derivative(self, AL, Y):
        eps = 1e-8
        return - (np.divide(Y, AL + eps) - np.divide(1 - Y, 1 - AL + eps))
    

class MeanSquaredError(CostFunction):
    """Implementation of the Mean Squared Error Cost Function"""
    
    def cost(self, AL, Y):
        m = Y.shape[-1]
        cost = (1/(2*m)) * np.sum(np.square(Y - AL))
        return np.squeeze(cost)
    
    def derivative(self, AL, Y):
        return AL - Y
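
One useful sanity check on these derivatives: when the output layer is a sigmoid and the cost is cross entropy, the chain rule collapses to `dZ = AL - Y`. The sketch below verifies this numerically with standalone functions (not the classes above), leaving out the `eps` term for clarity.

```python
import numpy as np


def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))


def cross_entropy_derivative(AL, Y):
    """Per-example dCost/dAL, as in CrossEntropy.derivative but without eps."""
    return -(Y / AL - (1 - Y) / (1 - AL))


Z = np.array([[-1.5, 0.3, 2.0]])
Y = np.array([[0.0, 1.0, 1.0]])
AL = sigmoid(Z)

dA = cross_entropy_derivative(AL, Y)
dZ = dA * AL * (1 - AL)  # chain rule through the sigmoid
print(np.allclose(dZ, AL - Y))  # True
```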

## Implementing our Neural Network

### Initialization 
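
Each layer `l` gets a weight matrix of shape `(layer_sizes[l], layer_sizes[l-1])` and a bias column of shape `(layer_sizes[l], 1)`. Weights start as small random values so that units in the same layer do not all compute the same thing, and biases start at zero. The sketch below mirrors the `initialize_parameters` method of the `Network` class as a standalone function; the name `init_parameters`, the `scale` argument, and the smaller input size are chosen here for illustration.

```python
import numpy as np


def init_parameters(layer_sizes, scale=0.01):
    """Small random weights and zero biases for every layer after the input."""
    parameters = {}
    for l in range(1, len(layer_sizes)):
        parameters["W" + str(l)] = np.random.randn(layer_sizes[l], layer_sizes[l - 1]) * scale
        parameters["b" + str(l)] = np.zeros((layer_sizes[l], 1))
    return parameters


params = init_parameters([100, 32, 32, 1])
shapes = {name: value.shape for name, value in params.items()}
print(shapes["W1"], shapes["b1"])  # (32, 100) (32, 1)
print(shapes["W3"], shapes["b3"])  # (1, 32) (1, 1)
```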

### Forward Propagation 
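
In equation form, the `forward_propagation_step` method of the `Network` class computes the following for each layer $l$. The bracketed superscript notation and the symbol $g^{[l]}$ for the layer's activation function are conventions adopted here, not names from the code.

$$
A^{[0]} = X, \qquad Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}, \qquad A^{[l]} = g^{[l]}\left(Z^{[l]}\right)
$$

Each column of $X$ is one example, so $A^{[l]}$ has one column per example and one row per unit in layer $l$. The output of the last layer, called `AL` in the code, is the network's prediction.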

### Backpropagation
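
The `backward_propagation_step` method of the `Network` class applies the chain rule one layer at a time. Written as equations, with $m$ the number of examples in the batch, $\odot$ the element-wise product, and the same bracketed superscript convention (again a notation chosen here, not taken from the code):

$$
\begin{aligned}
dZ^{[l]} &= dA^{[l]} \odot g^{[l]\prime}\left(Z^{[l]}\right) \\
dW^{[l]} &= \frac{1}{m} \, dZ^{[l]} \left(A^{[l-1]}\right)^{T} \\
db^{[l]} &= \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)} \\
dA^{[l-1]} &= \left(W^{[l]}\right)^{T} dZ^{[l]}
\end{aligned}
$$

The recursion starts at the output layer, where $dA^{[L]}$ comes from the cost function's `derivative` method, and walks backward to layer 1.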

### Putting It All Together

In [39]:
class Network:
    
    #
    # Initialization
    #
    
    def __init__(self, layer_sizes):
        """Initializes a new network with the given dimensions."""
        self.num_layers = len(layer_sizes)
        self.layer_sizes = layer_sizes
        self.cost_function = CrossEntropy()
        self.activation_functions = self.initialize_activation_functions(layer_sizes)
        self.parameters = self.initialize_parameters(layer_sizes) 
        
        
    def initialize_activation_functions(self, layer_sizes):
        """Returns an array of activation functions."""
        activations = [None]
        for _ in range(1, self.num_layers-1):
            activations.append(ReLU())
        activations.append(Sigmoid())
        return activations
            
        
    def initialize_parameters(self, layer_sizes):
        """Returns a new dictionary of weights and biases."""
        parameters = {}
        for l in range(1, self.num_layers):
            parameters['W' + str(l)] = np.random.randn(layer_sizes[l], layer_sizes[l-1]) * 0.01
            parameters['b' + str(l)] = np.zeros((layer_sizes[l], 1))
        return parameters
    
    
    #
    # Forward Propagation
    #
    
    def forward_propagation(self, X):
        """Compute the output for the given examples."""
        A = X 
        caches = [None]
        
        for layer in range(1, self.num_layers):
            A_prev = A
            W = self.parameters['W' + str(layer)]
            b = self.parameters['b' + str(layer)]
            activation = self.activation_functions[layer]
            A, cache = self.forward_propagation_step(W, A_prev, b, activation)
            caches.append(cache)
            
        return A, caches
    
    
    def forward_propagation_step(self, W, A_prev, b, activation):
        """Compute the output of a single layer."""
        Z = np.dot(W, A_prev) + b
        A = activation.forward(Z)
        cache = (W, A_prev, b, Z, A)
        return A, cache
    
    
    #
    # Backward Propagation 
    #
    
    def backward_propagation(self, AL, Y, caches): 
        """Compute the gradients based on the results of forward propagation."""
        gradients = {}
        Y = Y.reshape(AL.shape)
        
        dA = self.cost_function.derivative(AL, Y)
        
        for layer in reversed(range(1,self.num_layers)):
            cache = caches[layer]
            activation = self.activation_functions[layer]
            dA, dW, db = self.backward_propagation_step(dA, cache, activation)
            gradients["dA" + str(layer-1)] = dA 
            gradients["dW" + str(layer)] = dW 
            gradients["db" + str(layer)] = db
            
        return gradients 
    
    
    def backward_propagation_step(self, dA, cache, activation):
        """Compute the gradients of a single layer."""
        (W, A_prev, b, Z, A) = cache
        m = A_prev.shape[1]
        dZ = activation.backward(dA, Z)
        dW = (1/m) * np.dot(dZ, A_prev.T)
        db = (1/m) * np.sum(dZ, axis=1, keepdims=True)
        dA_prev = np.dot(W.T, dZ)
        return dA_prev, dW, db
    
    
    #
    # Compute Cost 
    #
    
    def compute_cost(self, AL, Y):
        """Computes the overall cost using the network's cost function."""
        return self.cost_function.cost(AL, Y)

---
## Training the Neural Network

In [40]:
def gradient_descent_update(network, gradients, eta):
    parameters = network.parameters
    for i in range(1, network.num_layers):
        parameters["W" + str(i)] = network.parameters["W" + str(i)] - eta * gradients["dW" + str(i)]
        parameters["b" + str(i)] = network.parameters["b" + str(i)] - eta * gradients["db" + str(i)]
    return parameters
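
The update rule itself is easiest to see on a one-parameter problem. The sketch below applies the same `w = w - eta * gradient` step to a toy cost, $(w - 3)^2$, which is not part of this post's model and is used only for illustration.

```python
def toy_gradient(w):
    """Gradient of the toy cost (w - 3)^2."""
    return 2.0 * (w - 3.0)


w = 0.0
eta = 0.3
history = [w]
for _ in range(20):
    w = w - eta * toy_gradient(w)  # same rule as gradient_descent_update
    history.append(w)

print(history[:3])
print(abs(w - 3.0) < 1e-6)  # True
```

Each step moves `w` against the gradient, and the distance to the minimum at 3 shrinks by a constant factor per step for this particular cost and learning rate.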

In [41]:
def minibatches(X, Y, batch_size):
    """Generates shuffled mini batches up to the given batch size."""
    permutation = np.random.permutation(X.shape[1])
    for k in range(0, X.shape[1], batch_size):
        batch_columns = permutation[k:k+batch_size]
        X_mini = X[:,batch_columns]
        Y_mini = Y[batch_columns]
        yield (X_mini, Y_mini)
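
To see what the generator produces, the sketch below runs a copy of `minibatches` on a toy dataset of 10 examples with 3 features each. The toy arrays are built so that the first feature of each example equals its label, which makes it easy to confirm that inputs and labels stay aligned after shuffling.

```python
import numpy as np


def minibatches(X, Y, batch_size):
    """Generates shuffled mini batches up to the given batch size."""
    permutation = np.random.permutation(X.shape[1])
    for k in range(0, X.shape[1], batch_size):
        batch_columns = permutation[k:k + batch_size]
        yield X[:, batch_columns], Y[batch_columns]


X = np.arange(30).reshape(3, 10)  # 3 features, 10 examples (one per column)
Y = np.arange(10)                 # label j belongs to column j

batch_shapes = []
for X_mini, Y_mini in minibatches(X, Y, batch_size=4):
    # Row 0 of X holds 0..9, so it must match the labels in every batch.
    assert np.array_equal(X_mini[0], Y_mini)
    batch_shapes.append(X_mini.shape)

print(batch_shapes)  # [(3, 4), (3, 4), (3, 2)]
```

The last batch is smaller because 10 is not a multiple of 4.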

In [42]:
def train(network, X, Y, epochs, learning_rate, batch_size=128, validation_data=None):
    """Trains a neural network using mini-batch gradient descent."""
    for epoch in range(epochs):
        # Sum of the per-batch costs over one pass through the training data
        cost = 0
        for (mini_X, mini_Y) in minibatches(X, Y, batch_size):
            AL, caches = network.forward_propagation(mini_X)
            gradients = network.backward_propagation(AL, mini_Y, caches)
            cost += network.compute_cost(AL, mini_Y)
            gradient_descent_update(network, gradients, learning_rate)
        print("Training cost on epoch {}: {}".format(epoch+1, cost))
        # Every fifth epoch, report the summed cost on the validation set
        if validation_data is not None and (epoch+1) % 5 == 0:
            (val_X, val_Y) = validation_data
            val_cost = 0
            for (mini_val_X, mini_val_Y) in minibatches(val_X, val_Y, batch_size):
                AL, _ = network.forward_propagation(mini_val_X)
                val_cost += network.compute_cost(AL, mini_val_Y)
            print("Validation cost on epoch {}: {}".format(epoch+1, val_cost))

In [None]:
network = Network([10000, 32, 32, 1])
train(network, x_train, y_train, 50, 0.3)

Training cost on epoch 1: 81.81183973260202
Training cost on epoch 2: 81.80225538315646
Training cost on epoch 3: 81.7751012294881
Training cost on epoch 4: 77.46582880480942
Training cost on epoch 5: 55.51079593481038
Training cost on epoch 6: 40.635461493206876
Training cost on epoch 7: 36.007702988856714
Training cost on epoch 8: 31.694659472161195
Training cost on epoch 9: 29.182836233801
Training cost on epoch 10: 26.517782052437877
Training cost on epoch 11: 24.978710591139063
Training cost on epoch 12: 28.43087660277484
Training cost on epoch 13: 22.14125657947487
Training cost on epoch 14: 24.96093224206402
Training cost on epoch 15: 17.65338243271313
Training cost on epoch 16: 30.16677157439399
Training cost on epoch 17: 20.70157852340505
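
The cost alone does not say how many reviews are classified correctly. One simple way to turn the sigmoid output into a class label is to threshold it at 0.5. The `accuracy` helper below is a hypothetical addition, not part of the code above, and it is shown on made-up predictions rather than on real results.

```python
import numpy as np


def accuracy(AL, Y, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    predictions = (AL.reshape(-1) >= threshold).astype("float32")
    return float(np.mean(predictions == Y.reshape(-1)))


# Made-up network outputs of shape (1, m) and labels of shape (m,).
AL_example = np.array([[0.9, 0.2, 0.6, 0.4]])
Y_example = np.array([1.0, 0.0, 0.0, 0.0], dtype="float32")
print(accuracy(AL_example, Y_example))  # 0.75
```

With a trained network, something like `AL, _ = network.forward_propagation(x_val)` followed by `accuracy(AL, y_val)` should give the validation accuracy.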


In [31]:
x_train.shape

(10000, 15000)