# Buzz Words but What do They Mean?

![NNs are a small subset of ai, ml, and deep learning](https://github.com/epochml/epoch_intersession_2020-2021/blob/master/images/what_is_a_neural_network/small_subset_of_ai.jpg?raw=true)

### Simple Neural Network

![simple neural network architecture](https://github.com/epochml/epoch_intersession_2020-2021/blob/master/images/what_is_a_neural_network/basic_neural_network.jpeg?raw=true)

Neural networks are really a tiny subset of a bunch of different, larger categories of problem-solving techniques.

![basic neuron in a neural network](https://github.com/epochml/epoch_intersession_2020-2021/blob/master/images/what_is_a_neural_network/basic_neuron_with_bias.jpg?raw=true)

The most basic unit of a neural network is a single neuron.

### What are the Parts of a Neuron?

1. inputs
2. weight & bias
3. sum (can be expressed as the dot product)
4. activation function 
5. output
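
As a rough sketch, the five parts above map onto a few lines of NumPy. The input, weight, and bias values are made up for illustration, and the choice of sigmoid as the activation function is an assumption rather than something the list requires.

```python
import numpy as np

# 1. inputs (illustrative values)
inputs = np.array([1.0, 2.0, 3.0])
# 2. weights and bias (illustrative values)
weights = np.array([0.5, -0.25, 0.1])
bias = 0.2
# 3. weighted sum, written as a dot product
weighted_sum = np.dot(inputs, weights) + bias
# 4. activation function (sigmoid chosen here as an assumption)
activation = 1.0 / (1.0 + np.exp(-weighted_sum))
# 5. output
print(weighted_sum, activation)
```

The weighted sum here is 0.5 - 0.5 + 0.3 + 0.2 = 0.5, and the activation squashes it to a value between 0 and 1.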

### The basic Neuron class

In [None]:
class BasicNeuron:
    def __init__(self):
        self.weights = None
        self.bias = None
        
    def calc_neuron_output(self, inputs):
        ###############
        ## IMPORTANT ##
        ###############
        # there is a much better way to do this calculation, which we will talk about. This is just for instructional purposes
        total = 0  # named `total` so we don't shadow Python's built-in sum()
        for inpt, weight in zip(inputs, self.weights):
            total += inpt * weight
        total += self.bias
        return total

Our basic Neuron class contains a few methods and a few attributes. 

It has a constructor which allows us to create Neurons.

It has a `calc_neuron_output` method (a method is just a function that belongs to a class) which takes in some inputs and performs the following calculation: 

\begin{equation*}
y_j = b_j +  \sum_{i} x_iw_{ij}
\end{equation*}

Which means the output of neuron "y sub j" is the sum of all the neuron's inputs times their respective weights plus a bias.

In [None]:
n = BasicNeuron()
n.weights = [0.1, 0.2, 0.3, 0.4]
n.bias = 1
inpts = [1, 1, 1, 1]

n.calc_neuron_output(inpts)

The output should be 2.0 (0.1 + 0.2 + 0.3 + 0.4 from the weighted inputs, plus the bias of 1), so it looks like our basic Neuron class works!

### Putting a layer together

A "layer" is composed of many neurons, each with their own weights and biases.

The benefit of thinking of our neural network in terms of layers is it simplifies a lot of the calculations we must do. 

### The dot product

![dot product representation](https://github.com/epochml/epoch_intersession_2020-2021/blob/master/images/what_is_a_neural_network/dot_product_representation.png?raw=true)

Using the dot product instead of calculating the output of each and every neuron separately makes things much easier and more efficient. 

Actually, we don't even need our BasicNeuron class if we use the dot product instead of calculating the output of each neuron one-by-one. All we need to consider is the ENTIRE layer and all of the weights and biases that belong to it.
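
To make that concrete, here is a small sketch comparing the neuron-by-neuron loop with a single dot product over the whole layer. The layer size (3 inputs, 2 neurons) and all of the values are made up for illustration.

```python
import numpy as np

# Illustrative layer: 3 inputs feeding 2 neurons (values are made up).
weights = np.array([[0.1, 0.2, 0.3],   # weights of neuron 0
                    [0.4, 0.5, 0.6]])  # weights of neuron 1
biases = np.array([1.0, -1.0])
inputs = np.array([1.0, 2.0, 3.0])

# Neuron-by-neuron, as in the loop-based approach.
one_by_one = []
for neuron_weights, neuron_bias in zip(weights, biases):
    total = 0.0
    for x, w in zip(inputs, neuron_weights):
        total += x * w
    one_by_one.append(total + neuron_bias)

# Whole layer at once with a single dot product.
all_at_once = np.dot(weights, inputs) + biases

print(one_by_one)
print(all_at_once)
```

Both approaches give the same two outputs (2.4 and 2.2, up to floating-point rounding), but the dot product version has no Python loops and scales much better to large layers.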

Let's make things simpler by making a layer class.

In [None]:
import numpy as np
import os
import codecs
np.random.seed(2)

In [None]:
class Layer:
    def __init__(self, number_input_neurons = 0, number_output_neurons = 0, weights = np.array([]), biases = np.array([])):
        self.number_input_neurons = number_input_neurons
        self.number_output_neurons = number_output_neurons
        self.weights = weights
        self.biases = biases
        self.input = None
        self.output = None
        
    def initalize_random_weights(self):
        self.weights = np.random.rand(self.number_output_neurons, self.number_input_neurons) - 0.5
    
    def initalize_random_biases(self):
        self.biases = np.random.rand(self.number_output_neurons, 1) - 0.5
        
    def forward_propagation(self, input_data):
        self.input = input_data
        self.output = np.dot(self.weights, self.input) + self.biases
        return self.output
    
    # We'll explain this in a bit! For now know that this is important for later.
    def backward_propagation(self, output_error, learning_rate):
        input_error = np.dot(self.weights.T, output_error)
        weights_error = np.dot(output_error, self.input.T)

        # update parameters
        self.weights -= learning_rate * weights_error
        self.biases -= learning_rate * output_error
        return input_error

In [None]:
layer = Layer(2, 2)
layer.initalize_random_weights()
layer.initalize_random_biases()
print("weights:\n{}".format(layer.weights))
print("\n##########\n")
print("biases:\n{}".format(layer.biases))

In [None]:
inputs = np.array([[0,0],
                   [0,1],
                   [1,0],
                   [1,1]])

expected_outputs = np.array([[0], 
                             [1], 
                             [1], 
                             [0]])

inputs = np.reshape(inputs, (4,-1,1))
 # Reshapes each input row into a 2x1 column vector. We wrote the inputs as rows above so they make visual sense.
 # In most cases, you would not have to do this, but it might aid in understanding to see it this way:
     #[[[0],
     # [0]],
     #
     #[[0],
     #[1]],
     #
     #[[1],
     #[0]],
     #
     #[[1],
     #[1]]]

for inpt in inputs:
    print(layer.forward_propagation(inpt))
    print("----")

### Types of Activation Functions

![a few activation functions](https://github.com/epochml/epoch_intersession_2020-2021/blob/master/images/what_is_a_neural_network/types_of_activation_functions.jpg?raw=true)

### The Sigmoid Activation Function
This is what we'll be using for this simple example:

![the sigmoid function](https://github.com/epochml/epoch_intersession_2020-2021/blob/master/images/what_is_a_neural_network/sigmoid.png?raw=true)

In [None]:
def sigmoid(z):
    return 1/(1 + np.exp(-z))

# we'll need the derivative of this function later!
def sigmoid_derivative(z):
    return sigmoid(z) * (1 - sigmoid(z))
    

In [None]:
# numpy allows us to apply a function to every value in the array
activation = sigmoid(layer.output)

print(layer.output)
print("\n#######\n")
print("after applying the sigmoid function to every value in the array:\n\n{}".format(activation))
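
The `sigmoid_derivative` helper can be sanity-checked numerically. The sketch below redefines both functions so it runs on its own, and compares the formula against a central finite-difference estimate of the slope; the test points and the step `h` are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)

z = np.array([-4.0, 0.0, 4.0])
print(sigmoid(z))  # values squashed into the open interval (0, 1)

# Central finite difference as an independent estimate of the slope.
h = 1e-5
numeric_slope = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
print(np.allclose(numeric_slope, sigmoid_derivative(z)))
```

If the last line prints `True`, the closed-form derivative agrees with the numerical estimate at those points.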


You will often see these activation functions being included in their own layers called the "activation layer."

Let's make an activation layer class.

In [None]:
class ActivationLayer(Layer):
    def __init__(self):
        self.output = None
        self.input = None
        
    def forward_propagation(self, input_data):
        self.input = input_data
        self.output = sigmoid(self.input)
        return self.output
    
    # Stay with us with all this backward propagation stuff. It'll make more sense in a moment.
    def backward_propagation(self, output_error, learning_rate):
        return sigmoid_derivative(self.input) * output_error

In [None]:
activation = ActivationLayer()
activation.forward_propagation(layer.output)

We can add another layer to our network with two inputs and one output.


In [None]:
layer_2 = Layer(2, 1)
layer_2.initalize_random_weights()
layer_2.initalize_random_biases()
print("weights:\n{}".format(layer_2.weights))
print("\n##########\n")
print("biases:\n{}".format(layer_2.biases))

In [None]:
layer_2.forward_propagation(activation.output)

Works just about how you would expect it to!

We've created a basic Layer class which takes some inputs, has some weights and biases, and helps us "train" our neural network. We've also made an activation layer class that applies an activation function to all the data we pass into it and will also help us "train" our NN.

You might be wondering what it means to "train" a NN, and that is a very important question. 

A neural network works because we can use a little bit of calculus to adjust the weights and biases in a smart way that makes our predictions more accurate. This process of adjusting the weights and biases is what it means to "train" a NN. It is OK if you are not comfortable with the calculus. You do not need to understand it fully right now, but it might give you a better intuition for exactly what makes neural networks work. We'll keep things as simple as we can, so let's dive into it.

### Measuring Error

The first step to adjusting our weights and biases is understanding how off our prediction was. After we understand how off our prediction was, we can start talking about how we can go about making our neural network perform better.

### Loss Functions (Cost Functions)

We can measure the error in our prediction using what is called a loss function (also known as a cost function). We will be using the Mean Squared Error (MSE) function for this example.
![MSE](https://github.com/epochml/epoch_intersession_2020-2021/blob/master/images/what_is_a_neural_network/mean_squared_error.png?raw=true)

In [None]:
# we take the two arrays and calculate the mse.
def mean_squared_error(y_true, y_pred):
    return np.mean(np.power(y_true - y_pred, 2))

# we'll need this later too!
def mean_squared_error_derivative(y_true, y_pred):
    return 2 * (y_pred - y_true) / y_true.size

This gives us a measure of how off our predictions are. 
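
As a small worked example (with made-up targets and predictions), the sketch below redefines the two helpers so it runs on its own and shows what they return for a pair of values.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean(np.power(y_true - y_pred, 2))

def mean_squared_error_derivative(y_true, y_pred):
    return 2 * (y_pred - y_true) / y_true.size

y_true = np.array([[1.0], [0.0]])  # illustrative targets
y_pred = np.array([[0.8], [0.4]])  # illustrative predictions

# errors: 0.2 and -0.4 -> squares 0.04 and 0.16 -> mean 0.10
mse = mean_squared_error(y_true, y_pred)
# gradient with respect to each prediction: 2 * (pred - true) / 2
grad = mean_squared_error_derivative(y_true, y_pred)

print(mse)
print(grad)
```

The sign of each gradient entry tells us which way to move that prediction: the first prediction is too low (negative gradient) and the second is too high (positive gradient).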

### Back Propagation

The goal of backpropagation (backprop) is pretty simple: adjust the weights in the network in proportion to how much each neuron contributes to the overall error. Through optimization with gradient descent (which will be explained later), the weights will be iteratively changed so that the loss is minimized. 

Since forward propagation is effectively a nested version of y = mx + b, backpropagation works backwards through that nesting, using calculus (the chain rule) to find the derivative of the error with respect to each variable. 

Given a forward prop:

f(x) = A(B(C(x)))

with A, B, and C being the activation functions, we can find neuron B's impact on the overall error by finding df/dB, or f'(B), or the derivative of f(x) w.r.t B.
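
Written out with the chain rule, and treating A, B, and C as differentiable functions of a single variable (a simplifying assumption; in the network they act on vectors), the derivatives look like this:

\begin{equation*}
\frac{df}{dB} = A'(B(C(x))), \qquad
\frac{df}{dx} = A'(B(C(x))) \cdot B'(C(x)) \cdot C'(x)
\end{equation*}

Each factor is the local derivative of one stage. That is why each layer's `backward_propagation` only needs the error handed back from the layer after it: it multiplies in its own local derivative and passes the result along.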

### Gradient Descent

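Gradient descent is an optimization method that minimizes the loss by repeatedly adjusting the weights using the derivatives computed by backprop. If you have taken calculus before, you know that the derivative of a function gives its slope, so it points in the direction in which the function increases. Stepping in the opposite direction therefore moves us toward a local minimum. We take many small steps (the size of each step is the learning rate) to find the weights for which the loss is lowest. 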

![gd_1](https://github.com/epochml/epoch_intersession_2020-2021/blob/master/images/what_is_a_neural_network/gradient_descent.png?raw=true)

![gd_2](https://github.com/epochml/epoch_intersession_2020-2021/blob/master/images/what_is_a_neural_network/gradient_descent_demystified.png?raw=true)
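
Here is a minimal sketch of gradient descent on a single weight. The loss function, its minimum at `w = 3`, the starting point, and the learning rate are all assumptions chosen to keep the demo small; a real network does the same thing for every weight and bias at once.

```python
# Illustrative loss with a known minimum at w = 3 (an assumption for the demo).
def loss(w):
    return (w - 3.0) ** 2

def loss_derivative(w):
    return 2.0 * (w - 3.0)

w = 0.0              # arbitrary starting weight
learning_rate = 0.1  # step size

for step in range(50):
    # Move against the derivative, because the derivative points uphill.
    w -= learning_rate * loss_derivative(w)

print(w, loss(w))
```

After 50 steps `w` ends up very close to 3 and the loss is close to 0. A learning rate that is too large can overshoot the minimum, while one that is too small makes progress slowly.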

Now that we have all the parts of our NN, we can put them together in a nice little class to make things easy.

In [None]:
class NeuralNetwork:
    def __init__(self):
        self.layers = []

        
        
    def add(self, layer):
        self.layers.append(layer)
        
    def predict(self, input_data):
        
        result = []
        
        for data in input_data:
            output = data
            for layer in self.layers:
                output = layer.forward_propagation(output)
            result.append(output)
            
        return result
    
    def fit(self, x_train, y_train, no_epochs, learning_rate):
        for i in range(no_epochs):
            error = 0
            for counter, j in enumerate(x_train):
                output = j
                for layer in self.layers:
                    output = layer.forward_propagation(output)
                
                error += mean_squared_error(y_train[counter], output)
                
                # backprop
                
                err = mean_squared_error_derivative(y_train[counter], output)
                for layer in reversed(self.layers):
                    err = layer.backward_propagation(err, learning_rate)
                    
            error /= len(x_train)
            print("epoch {}  error = {}".format(i, error))
    

In [None]:
net = NeuralNetwork()
first_layer = Layer(2,2)
first_layer.initalize_random_weights()
first_layer.initalize_random_biases()
second_layer = Layer(2,1)
second_layer.initalize_random_weights()
second_layer.initalize_random_biases()

print(first_layer.forward_propagation(np.array([[0],[0]])))

net.add(first_layer)
net.add(ActivationLayer())
net.add(second_layer)
net.add(ActivationLayer())

net.fit(inputs, expected_outputs, 1000, 1)

In [None]:
predictions = net.predict(inputs)
for inpt, expected, actual in zip(inputs, expected_outputs, predictions):
    print("input: {}, expected: {}, result: {}".format([int(inpt[0]), int(inpt[1])], expected, actual))


It works! Our hand-coded neural network has learned how to map our inputs to the desired outputs! Maybe this doesn't seem like a huge deal, but let's scale up the project a bit.

### The MNIST Dataset

The MNIST dataset is a very famous dataset that contains 70,000 labeled hand-written digits (60,000 for training and 10,000 for testing). Let's use the same classes we created to help us classify these digits.

The MNIST dataset is included in our "data" folder, but it can also be downloaded from [this link](http://yann.lecun.com/exdb/mnist/).

All the images and labels of the MNIST dataset are encoded in four files. We need to be able to extract the images from the files to work with them.

### File descriptions
Four files are provided (the .gz files are zipped versions of these files):

* Test Images : t10k-images-idx3-ubyte
* Test Labels :  t10k-labels-idx1-ubyte
* Train Images : train-images-idx3-ubyte
* Train Labels :  train-labels-idx1-ubyte

The IDX file format is a simple format for vectors and multidimensional matrices of various numerical types.

#### The basic format for labels
  
|Offset | Type               | Value           |   Description                   |
|-------|--------------------|-----------------|---------------------------------|
|0000   |4 byte integer      |0x00000801(2049) |magic number (MSB first)         |
|0004   |4 byte integer      |10000 or 60000   |number of items (test or train)  |
|0008   |unsigned byte       |??               |label                            |
|0009   |unsigned byte       |??               |label                            |
|...    |...                 |...              |...                              |
|xxxx   |unsigned byte       |??               |label                            |


#### The basic format for images

|Offset | Type               | Value           |   Description                   |
|-------|--------------------|-----------------|---------------------------------|
|0000   |4 byte integer      |0x00000803(2051) |magic number (MSB first)         |
|0004   |4 byte integer      |10000 or 60000   |number of images (test or train) |
|0008   |4 byte integer      |28               |number of rows                   |
|0012   |4 byte integer      |28               |number of columns                |
|0016   |unsigned byte       |??               |pixel intensity (0-255)          |
|0017   |unsigned byte       |??               |pixel intensity (0-255)          |
|...    |...                 |...              |...                              |
|xxxx   |unsigned byte       |??               |pixel intensity (0-255)          |
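
To see how those offsets are used, here is a small sketch that builds a tiny synthetic IDX image file in memory and parses it back. It uses Python's `struct` module for the big-endian header, which is an alternative to the hex-decoding helper used later in this notebook. The 2 images of 2x3 pixels are made up and much smaller than real MNIST images.

```python
import struct
import numpy as np

# Build a tiny in-memory IDX image file: 2 images of 2x3 pixels (synthetic data).
header = struct.pack(">IIII", 2051, 2, 2, 3)  # magic, count, rows, cols (big-endian)
pixels = bytes(range(12))                     # 12 pixel intensities: 0, 1, ..., 11
raw = header + pixels

# Parse it back using the offsets from the table: four 4-byte integers, then pixels.
magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
images = np.frombuffer(raw, dtype=np.uint8, offset=16).reshape(count, rows, cols)

print(magic, count, rows, cols)
print(images[1])
```

The `>` in the format string means big-endian ("MSB first" in the tables), and each `I` is one unsigned 4-byte integer.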


### Converting the ubyte files to numpy arrays for easy processing
The following code converts the ubyte files into four numpy n dimensional arrays and stores them in a dictionary called `data_dict` which has four key, value pairs.

| Key           |  Type        |Shape         |
|---------------|--------------|--------------|
|*train_images* |numpy ndarray |[60000,28,28] |
|*train_labels* |numpy ndarray |[60000]       |
|*test_images*  |numpy ndarray |[10000,28,28] |
|*test_labels*  |numpy ndarray |[10000]       |


In [None]:
from google.colab import files # this only works in Google Colab
files = files.upload()

# PROVIDE YOUR DIRECTORY WITH THE EXTRACTED FILES HERE
# datapath = '../data/MNIST/raw/uncompressed/'

# files = os.listdir(datapath)
# print(files)

def get_int(b):   # CONVERTS 4 BYTES (MSB FIRST) TO AN INT
    return int(codecs.encode(b, 'hex'), 16)

data_dict = {}
for file in files:
    if file.endswith('ubyte'):  # FOR ALL 'ubyte' FILES
        print('Reading ',file)
        # with open (datapath+file,'rb') as f:
        with open (file,'rb') as f:
            data = f.read()
            # NOTE: we use the names `magic` and `split` so we don't shadow Python's built-in type() and set()
            magic = get_int(data[:4])   # 0-3: THE MAGIC NUMBER TELLS US WHETHER THIS IS AN IMAGE OR LABEL FILE
            length = get_int(data[4:8])  # 4-7: LENGTH OF THE ARRAY  (DIMENSION 0)
            if (magic == 2051):
                category = 'images'
                num_rows = get_int(data[8:12])  # NUMBER OF ROWS  (DIMENSION 1)
                num_cols = get_int(data[12:16])  # NUMBER OF COLUMNS  (DIMENSION 2)
                parsed = np.frombuffer(data,dtype = np.uint8, offset = 16)  # READ THE PIXEL VALUES AS INTEGERS
                parsed = parsed.reshape(length,num_rows,num_cols)  # RESHAPE THE ARRAY AS [NO_OF_SAMPLES x HEIGHT x WIDTH]
            elif(magic == 2049):
                category = 'labels'
                parsed = np.frombuffer(data, dtype=np.uint8, offset=8) # READ THE LABEL VALUES AS INTEGERS
                parsed = parsed.reshape(length)  # RESHAPE THE ARRAY AS [NO_OF_SAMPLES]
            if (length==10000):
                split = 'test'
            elif (length==60000):
                split = 'train'
            data_dict[split+'_'+category] = parsed  # SAVE THE NUMPY ARRAY TO A CORRESPONDING KEY

In [None]:
print(data_dict["train_images"].shape)
print(data_dict["train_labels"].shape)
print(data_dict["test_images"].shape)
print(data_dict["test_labels"].shape)

Let's make a NN that'll help us classify these digits.

In [None]:
mnist_net = NeuralNetwork()

input_layer = Layer(28*28, 1000)
input_layer.initalize_random_weights()
input_layer.initalize_random_biases()
mnist_net.add(input_layer)                

mnist_net.add(ActivationLayer())

first_hidden_layer = Layer(1000, 50)
first_hidden_layer.initalize_random_weights()
first_hidden_layer.initalize_random_biases()
mnist_net.add(first_hidden_layer)   

mnist_net.add(ActivationLayer())

second_hidden_layer = Layer(50, 10)
second_hidden_layer.initalize_random_weights()
second_hidden_layer.initalize_random_biases()
mnist_net.add(second_hidden_layer)   

mnist_net.add(ActivationLayer())

# 784 -> 1000 -> activation -> 50 -> activation -> 10 -> activation

reshaped_train_image_data = np.reshape(data_dict["train_images"], (60000, 28*28, 1))
reshaped_train_label_data = np.reshape(data_dict["train_labels"], (60000, -1, 1))

In [None]:
# this is a little complex, but in order to change our labels from a single value like 1 or 2 to an array
# like [0,0,1,0,0,0,0,0,0,0], we need to create a "one-hot" vector/array.
def one_hot(classes, data):
    vector = np.zeros(classes)
    vector[int(data)] = 1
    vector = np.reshape(vector, (-1,1))
    return vector
    
reshaped_train_label_data_one_hot = []
for data in reshaped_train_label_data:
    reshaped_train_label_data_one_hot.append(one_hot(10, data))
    

reshaped_train_label_data = np.array(reshaped_train_label_data_one_hot)

shuffler = np.random.permutation(len(reshaped_train_label_data))
reshaped_train_image_data = reshaped_train_image_data[shuffler]
reshaped_train_label_data = reshaped_train_label_data[shuffler]
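
As an aside, the same one-hot encoding can be sketched without a Python loop by indexing into an identity matrix. The `labels` array below is made up for illustration, and this is an alternative to the `one_hot` helper, not a replacement the rest of the notebook depends on.

```python
import numpy as np

labels = np.array([2, 0, 9])       # illustrative digit labels
one_hot_rows = np.eye(10)[labels]  # shape (3, 10): row i has a 1 at column labels[i]
# Reshape into column vectors, matching the (samples, 10, 1) layout used for training.
one_hot_columns = one_hot_rows.reshape(len(labels), 10, 1)

print(one_hot_rows[0])
print(one_hot_columns.shape)
```

Row `i` of the 10x10 identity matrix is exactly the one-hot vector for digit `i`, so indexing with the labels picks out one vector per label.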


In [None]:
# we divide by 255 to normalize the data i.e. making all the data between 0 and 1 instead of between 0 and 255
mnist_net.fit((reshaped_train_image_data / 255)[:1000], (reshaped_train_label_data)[:1000], 50, .1)

In [None]:
reshaped_test_image_data = np.reshape(data_dict["test_images"], (10000, 28*28, 1))
reshaped_test_label_data = np.reshape(data_dict["test_labels"], (10000, -1, 1))

In [None]:
reshaped_test_label_data_one_hot = []
for data in reshaped_test_label_data:
    reshaped_test_label_data_one_hot.append(one_hot(10, data))
    

reshaped_test_label_data = np.array(reshaped_test_label_data_one_hot)

# shuffle the TEST data (not the training data) so we evaluate on images the network has never seen
shuffler = np.random.permutation(len(reshaped_test_label_data))
reshaped_test_image_data = reshaped_test_image_data[shuffler]
reshaped_test_label_data = reshaped_test_label_data[shuffler]

In [None]:
# divide by 255 so the images are normalized the same way as during training
outputs = mnist_net.predict(reshaped_test_image_data / 255)
wrong = 0
for count, output in enumerate(outputs):
    index = np.argmax(output)
    true_index = np.argmax(reshaped_test_label_data[count])
    print(index, true_index)
    if index != true_index:
        wrong += 1
        
print("percent wrong: {}%".format(100 * wrong / len(reshaped_test_image_data)))

And we've done it. There are ways to get much higher accuracy, but we've shown that our code does what we want it to do. It can, with reasonable accuracy, classify hand-written digits! Tomorrow we'll talk about another very common technique that can do this task with much higher accuracy.