# Buzzwords, but What Do They Mean?

![NNs are a small subset of ai, ml, and deep learning](images\what_is_a_neural_network\small_subset_of_ai.jpg)

### Simple Neural Network

![simple neural network architecture](images\what_is_a_neural_network\basic_neural_network.jpeg)

Neural networks are a small subset of several larger categories of problem-solving techniques: artificial intelligence, machine learning, and deep learning.

![basic neuron in a neural network](images\what_is_a_neural_network\basic_neuron_with_bias.jpg)

The most basic unit of a neural network is a single neuron.

### What are the Parts of a Neuron?

1. inputs
2. weight & bias
3. sum (can be expressed as the dot product)
4. activation function 
5. output

### The basic Neuron class

In [336]:
class BasicNeuron:
    def __init__(self):
        self.weights = None
        self.bias = None

    def calc_neuron_output(self, inputs):
        # IMPORTANT: this loop is just for instructional purposes. There is a much
        # better way to do this calculation (the dot product), which we will talk about.
        total = 0  # named "total" so we don't shadow Python's built-in sum()
        for inpt, weight in zip(inputs, self.weights):
            total += inpt * weight
        total += self.bias
        return total

Our BasicNeuron class has two attributes (weights and bias) and two methods.

It has a constructor, which allows us to create neurons.

It has a calc_neuron_output method (a method is simply a function that belongs to a class) which takes in some inputs and performs the following calculation:

\begin{equation*}
y_j = b_j +  \sum_{i} x_iw_{ij}
\end{equation*}

This means that the output of neuron $j$, written $y_j$, is the sum of all the neuron's inputs times their respective weights, plus a bias.

In [337]:
n = BasicNeuron()
n.weights = [0.1, 0.2, 0.3, 0.4]
n.bias = 1
inpts = [1, 1, 1, 1]

n.calc_neuron_output(inpts)

2.0

It looks like our basic Neuron class works!

### Putting a layer together

A "layer" is composed of many neurons, each with their own weights and biases.

The benefit of thinking of our neural network in terms of layers is that it simplifies a lot of the calculations we must do.

### The dot product

![dot product representation](images\what_is_a_neural_network\dot_product_representation.png)

Using the dot product instead of calculating the output of each and every neuron separately makes things much easier and more efficient.

Actually, we don't even need our BasicNeuron class if we use the dot product instead of calculating the output of each neuron one-by-one. All we need to consider is the ENTIRE layer and all of the weights and biases that belong to it.
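As a rough illustration of this point, the sketch below compares the one-by-one loop with `np.dot`, first for a single neuron and then for a whole layer. The single-neuron values reuse the BasicNeuron example; `layer_weights` and `layer_biases` are made-up values used only for illustration.

```python
import numpy as np

weights = np.array([0.1, 0.2, 0.3, 0.4])
inputs = np.array([1.0, 1.0, 1.0, 1.0])
bias = 1.0

# One neuron, computed one multiplication at a time.
loop_output = bias
for x, w in zip(inputs, weights):
    loop_output += x * w

# The same neuron as a single dot product.
dot_output = np.dot(weights, inputs) + bias

# A whole layer: each row of the matrix holds one neuron's weights
# (illustrative values, not taken from the notebook).
layer_weights = np.array([[0.1, 0.2, 0.3, 0.4],
                          [0.5, 0.5, 0.5, 0.5]])
layer_biases = np.array([1.0, 0.0])
layer_output = np.dot(layer_weights, inputs) + layer_biases

print(loop_output, dot_output, layer_output)
```

Both single-neuron results agree up to floating-point rounding, and the layer version produces one output per row of the weight matrix in a single call.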

Let's make things simpler by making a Layer class.

In [338]:
import numpy as np
import os
import codecs
np.random.seed(2)

In [339]:
class Layer:
    def __init__(self, number_input_neurons = 0, number_output_neurons = 0, weights = np.array([]), biases = np.array([])):
        self.number_input_neurons = number_input_neurons
        self.number_output_neurons = number_output_neurons
        self.weights = weights
        self.biases = biases
        self.input = None
        self.output = None
        
    def initalize_random_weights(self):
        self.weights = np.random.rand(self.number_output_neurons, self.number_input_neurons)
    
    def initalize_random_biases(self):
        self.biases = np.random.rand(self.number_output_neurons, 1)
        
    def forward_propagation(self, input_data):
        self.input = input_data
        self.output = np.dot(self.weights, self.input) + self.biases
        return self.output
    
    # We'll explain this in a bit! For now know that this is important for later.
    def backward_propagation(self, output_error, learning_rate):
        input_error = np.dot(self.weights.T, output_error)
        weights_error = np.dot(output_error, self.input.T)

        # update parameters
        self.weights -= learning_rate * weights_error
        self.biases -= learning_rate * output_error
        return input_error

In [340]:
layer = Layer(2, 2)
layer.initalize_random_weights()
layer.initalize_random_biases()
print("weights:\n{}".format(layer.weights))
print("\n##########\n")
print("biases:\n{}".format(layer.biases))

weights:
[[0.4359949  0.02592623]
 [0.54966248 0.43532239]]

##########

biases:
[[0.4203678 ]
 [0.33033482]]


In [341]:
inputs = np.array([[0,0],
                   [0,1],
                   [1,0],
                   [1,1]])

expected_outputs = np.array([[0], 
                             [1], 
                             [1], 
                             [0]])

inputs = np.reshape(inputs, (4,-1,1))
 # Reshapes each of the 4 input rows into a 2x1 column vector. We need to do this because we wrote the
 # inputs as rows so that they make visual sense. In most cases, you would not have to do this, but it
 # might aid in understanding to see it this way:
     #[[[0],
     # [0]],
     #
     #[[0],
     #[1]],
     #
     #[[1],
     #[0]],
     #
     #[[1],
     #[1]]]

for inpt in inputs:
    print(layer.forward_propagation(inpt))
    print("----")

[[0.4203678 ]
 [0.33033482]]
----
[[0.44629403]
 [0.76565721]]
----
[[0.8563627]
 [0.8799973]]
----
[[0.88228894]
 [1.31531969]]
----
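The loop above feeds the layer one input at a time. As a sketch of an alternative, all four inputs can go through the same layer in one matrix product. The weights and biases below are copied from the printed values (rounded to 8 decimals), and `batch_output` is a name introduced here for illustration.

```python
import numpy as np

# Weights and biases as printed above (rounded to 8 decimals).
weights = np.array([[0.4359949, 0.02592623],
                    [0.54966248, 0.43532239]])
biases = np.array([[0.4203678],
                   [0.33033482]])

inputs = np.array([[0, 0],
                   [0, 1],
                   [1, 0],
                   [1, 1]])

# Transposing puts one input per column, so a single matrix product
# handles all four inputs; the (2, 1) bias broadcasts across the columns.
batch_output = np.dot(weights, inputs.T) + biases

print(batch_output.shape)
print(batch_output)
```

Each column of `batch_output` should match one of the 2x1 results printed by the loop, up to rounding.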


### Types of Activation Functions

![types of activation functions](images\what_is_a_neural_network\types_of_activation_functions.jpg)

### The Sigmoid Activation Function
This is what we'll be using for this simple example:

![the sigmoid function](images\what_is_a_neural_network\sigmoid.png)

In [342]:
def sigmoid(z):
    return 1/(1 + np.exp(-z))

# we'll need the derivative of this function later!
def sigmoid_derivative(z):
    return sigmoid(z) * (1 - sigmoid(z))
    

In [343]:
# numpy allows us to apply a function to every value in the array
activation = sigmoid(layer.output)

print(layer.output)
print("\n#######\n")
print("after applying the sigmoid function to every value in the array:\n\n{}".format(activation))


[[0.88228894]
 [1.31531969]]

#######

after applying the sigmoid function to every value in the array:

[[0.70729632]
 [0.78840197]]
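One way to gain some confidence in `sigmoid_derivative` is to compare it against a numerical slope. The sketch below does that with a central finite difference; the test points and the step size `h` are arbitrary choices made for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    return sigmoid(z) * (1 - sigmoid(z))

z = np.array([-2.0, 0.0, 0.88228894, 1.31531969])
h = 1e-5  # small step for the finite difference (an arbitrary choice)

# Slope estimated from two nearby points on the curve.
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
# Slope from the closed-form derivative.
analytic = sigmoid_derivative(z)

print(np.max(np.abs(numeric - analytic)))
```

The largest difference between the two should be tiny, which suggests the formula and the function agree.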


You will often see activation functions placed in their own layer, called an "activation layer."

Let's make an activation layer class.

In [344]:
class ActivationLayer(Layer):
    def __init__(self):
        self.output = None
        self.input = None
        
    def forward_propagation(self, input_data):
        self.input = input_data
        self.output = sigmoid(self.input)
        return self.output
    
    # Stay with us with all this backward propagation stuff. It'll make more sense in a moment.
    def backward_propagation(self, output_error, learning_rate):
        return sigmoid_derivative(self.input) * output_error

In [345]:
activation = ActivationLayer()
activation.forward_propagation(layer.output)

array([[0.70729632],
       [0.78840197]])

We can add another layer to our network, this one with two inputs and one output.


In [346]:
layer_2 = Layer(2, 1)
layer_2.initalize_random_weights()
layer_2.initalize_random_biases()
print("weights:\n{}".format(layer_2.weights))
print("\n##########\n")
print("biases:\n{}".format(layer_2.biases))

weights:
[[0.20464863 0.61927097]]

##########

biases:
[[0.29965467]]


In [347]:
layer_2.forward_propagation(activation.output)

array([[0.93263635]])
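To see where that number comes from, the whole chain (first layer, sigmoid, second layer) can be written out with plain arrays. This is only a sketch: the values are copied from the printed output above (rounded to 8 decimals), and the short names `w1`, `b1`, `w2`, `b2` are introduced here for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Values copied from the printed output above (rounded to 8 decimals).
w1 = np.array([[0.4359949, 0.02592623],
               [0.54966248, 0.43532239]])
b1 = np.array([[0.4203678], [0.33033482]])
w2 = np.array([[0.20464863, 0.61927097]])
b2 = np.array([[0.29965467]])

x = np.array([[1], [1]])  # the last input the first layer saw

hidden = sigmoid(np.dot(w1, x) + b1)  # first layer followed by the activation
output = np.dot(w2, hidden) + b2      # second layer (no activation applied yet)

print(hidden)
print(output)
```

Up to rounding, `hidden` reproduces the activation layer's output and `output` reproduces the value returned by `layer_2.forward_propagation`.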

Works just about how you would expect it to!

We've created a basic Layer class which takes some inputs, has some weights and biases, and helps us "train" our neural network. We've also made an activation layer class that applies an activation function to all the data we pass into it and will also help us "train" our NN.

You might be wondering what it means to "train" a NN, and that is a very important question. 

A neural network works because we can use a little bit of calculus to adjust the weights and biases in a way that makes our predictions more accurate. This process of adjusting the weights and biases is what it means to "train" a NN. It is OK if you are not comfortable with the calculus. You do not need to understand it fully right now, but it might give you a better intuition for what makes neural networks work. We'll keep things as simple as we can, so let's dive in.

### Measuring Error

The first step in adjusting our weights and biases is measuring how far off our prediction was. Once we know that, we can start talking about how to make our neural network perform better.

### Loss Functions (Cost Functions)

We can measure the error in our prediction using what is called a loss function (also known as a cost function). We will be using the Mean Squared Error function for this example:
![MSE](images\what_is_a_neural_network\mean_squared_error.png)

In [357]:
# we take the two arrays and calculate the mse.
def mean_squared_error(y_true, y_pred):
    return np.mean(np.power(y_true - y_pred, 2))

# we'll need this later too!
def mean_squared_error_derivative(y_true, y_pred):
    return 2 * (y_pred - y_true) / y_true.size

This gives us a measure of how far off our predictions are.
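A small worked example may help make the two functions concrete. The targets and predictions below are made up purely for illustration.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean(np.power(y_true - y_pred, 2))

def mean_squared_error_derivative(y_true, y_pred):
    return 2 * (y_pred - y_true) / y_true.size

# Made-up targets and predictions, purely for illustration.
y_true = np.array([1.0, 0.0])
y_pred = np.array([0.8, 0.3])

loss = mean_squared_error(y_true, y_pred)             # (0.2**2 + 0.3**2) / 2
grad = mean_squared_error_derivative(y_true, y_pred)  # 2 * (y_pred - y_true) / 2

print(loss)
print(grad)
```

The loss comes out to about 0.065. The derivative is negative for the first prediction (it is too low and should go up) and positive for the second (it is too high and should come down).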

### Back Propagation

 **Explain back prop + gradient descent**

 what it is in 2d, 3d, and how to think about it in higher-dimension space
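One way to picture gradient descent in the simplest, one-variable case is the sketch below. It is an illustration only: the function `f(w) = (w - 3)^2`, the starting point, and the learning rate are arbitrary choices and are not part of the network above.

```python
def f(w):
    # A simple bowl-shaped curve whose lowest point is at w = 3.
    return (w - 3) ** 2

def f_derivative(w):
    # Slope of the curve at w.
    return 2 * (w - 3)

w = 0.0              # arbitrary starting guess
learning_rate = 0.1  # arbitrary step size

for step in range(100):
    # Step against the slope, i.e. downhill on the curve.
    w -= learning_rate * f_derivative(w)

print(w, f(w))
```

After enough steps `w` settles very close to 3, the bottom of the curve. The update `w -= learning_rate * f_derivative(w)` has the same shape as the `self.weights -= learning_rate * weights_error` line in the Layer class.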

Now we have all the parts of our NN! We can put them all together in a nice little class to make things easy.

In [349]:
class NeuralNetwork:
    def __init__(self):
        self.layers = []

        
        
    def add(self, layer):
        self.layers.append(layer)
        
    def predict(self, input_data):
        
        result = []
        
        for data in input_data:
            output = data
            for layer in self.layers:
                output = layer.forward_propagation(output)
            result.append(output)
            
        return result
    
    def fit(self, x_train, y_train, no_epochs, learning_rate):
        for i in range(no_epochs):
            error = 0
            for counter, j in enumerate(x_train):
                output = j
                for layer in self.layers:
                    output = layer.forward_propagation(output)
                
                error += mean_squared_error(y_train[counter], output)
                
                # backprop
                
                err = mean_squared_error_derivative(y_train[counter], output)
                for layer in reversed(self.layers):
                    err = layer.backward_propagation(err, learning_rate)
                    
            error /= len(x_train)
            print("epoch {}  error = {}".format(i, error))
    

In [350]:
net = NeuralNetwork()
first_layer = Layer(2,2)
first_layer.initalize_random_weights()
first_layer.initalize_random_biases()
second_layer = Layer(2,1)
second_layer.initalize_random_weights()
second_layer.initalize_random_biases()

print(first_layer.forward_propagation(np.array([[0],[0]])))

net.add(first_layer)
net.add(ActivationLayer())
net.add(second_layer)
net.add(ActivationLayer())

net.fit(inputs, expected_outputs, 1000, 1)

[[0.51357812]
 [0.18443987]]
epoch 0  error = 0.3619846065449515
epoch 1  error = 0.3303944767799727
epoch 2  error = 0.3171451337041745
epoch 3  error = 0.3128223147569227
epoch 4  error = 0.3112581260445585
epoch 5  error = 0.31042065411551256
epoch 6  error = 0.30975261295354267
epoch 7  error = 0.3091155221223979
epoch 8  error = 0.30847896007551767
epoch 9  error = 0.30783853706625364
epoch 10  error = 0.30719547108757245
epoch 11  error = 0.30655184757082055
epoch 12  error = 0.3059096718467903
epoch 13  error = 0.3052707561459219
epoch 14  error = 0.30463674979403493
epoch 15  error = 0.30400917247024484
epoch 16  error = 0.3033894331501373
epoch 17  error = 0.30277883892009294
epoch 18  error = 0.30217859846983586
epoch 19  error = 0.3015898229591944
epoch 20  error = 0.301013525465779
epoch 21  error = 0.3004506195229071
epoch 22  error = 0.2999019170405745
epoch 23  error = 0.2993681259060479
epoch 24  error = 0.2988498476280575
epoch 25  error = 0.29834757543888174
...

epoch 272  error = 0.24765088114572953
epoch 273  error = 0.2464657940157132
epoch 274  error = 0.24526870377784749
epoch 275  error = 0.24406186919041967
epoch 276  error = 0.2428476219041365
epoch 277  error = 0.24162834226643595
epoch 278  error = 0.24040643425115804
epoch 279  error = 0.23918430012604786
epoch 280  error = 0.23796431547201286
epoch 281  error = 0.2367488051435352
epoch 282  error = 0.2355400207112568
epoch 283  error = 0.23434011985900907
epoch 284  error = 0.23315114812308335
epoch 285  error = 0.2319750232665633
epoch 286  error = 0.2308135224815967
epoch 287  error = 0.22966827251286975
epoch 288  error = 0.2285407427010599
epoch 289  error = 0.2274322408596575
epoch 290  error = 0.22634391182525565
epoch 291  error = 0.2252767384621629
epoch 292  error = 0.22423154485788976
epoch 293  error = 0.22320900141667582
epoch 294  error = 0.2222096315429195
epoch 295  error = 0.22123381960371843
epoch 296  error = 0.22028181986787293
epoch 297  error = 0.21935376613556

epoch 558  error = 0.016012453740705083
epoch 559  error = 0.015736682633237298
epoch 560  error = 0.015469327523015717
epoch 561  error = 0.015210033999995345
epoch 562  error = 0.014958466131021093
epoch 563  error = 0.014714305343377535
epoch 564  error = 0.014477249382580586
epoch 565  error = 0.014247011339271889
epoch 566  error = 0.01402331874041369
epoch 567  error = 0.013805912700305012
epoch 568  error = 0.0135945471272444
epoch 569  error = 0.013388987981954783
epoch 570  error = 0.01318901258415623
epoch 571  error = 0.01299440896392937
epoch 572  error = 0.01280497525474952
epoch 573  error = 0.012620519125295911
epoch 574  error = 0.012440857247348051
epoch 575  error = 0.012265814797274992
epoch 576  error = 0.012095224988803756
epoch 577  error = 0.011928928634920146
epoch 578  error = 0.01176677373691137
epoch 579  error = 0.01160861509870251
epoch 580  error = 0.011454313964774053
epoch 581  error = 0.011303737680070276
epoch 582  error = 0.011156759370423455
...

epoch 826  error = 0.002445553361059408
epoch 827  error = 0.00243729592395864
epoch 828  error = 0.0024290918148812076
epoch 829  error = 0.002420940528792512
epoch 830  error = 0.002412841566939361
epoch 831  error = 0.0024047944367534724
epoch 832  error = 0.002396798651756676
epoch 833  error = 0.0023888537314678744
epoch 834  error = 0.0023809592013116954
epoch 835  error = 0.0023731145925287626
epoch 836  error = 0.002365319442087683
epoch 837  error = 0.0023575732925984446
epoch 838  error = 0.002349875692227582
epoch 839  error = 0.002342226194614669
epoch 840  error = 0.0023346243587904257
epoch 841  error = 0.0023270697490962444
epoch 842  error = 0.0023195619351051248
epoch 843  error = 0.0023121004915440412
epoch 844  error = 0.0023046849982176383
epoch 845  error = 0.0022973150399333165
epoch 846  error = 0.002289990206427608
epoch 847  error = 0.002282710092293851
epoch 848  error = 0.0022754742969110734
epoch 849  error = 0.0022682824243742245
epoch 850  error = 0.002261

In [351]:
predictions = net.predict(inputs)
for inpt, expected, actual in zip(inputs, expected_outputs, predictions):
    print("input: {}, expected: {}, result: {}".format([int(inpt[0]), int(inpt[1])], expected, actual))


input: [0, 0], expected: [0], result: [[0.03328136]]
input: [0, 1], expected: [1], result: [[0.96342524]]
input: [1, 0], expected: [1], result: [[0.96248899]]
input: [1, 1], expected: [0], result: [[0.04699934]]
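The raw results are values between 0 and 1 rather than exact 0s and 1s. One common way to turn them into hard labels is to threshold them. The sketch below uses a cutoff of 0.5, which is an assumption made here and not something the network itself defines.

```python
import numpy as np

# Network outputs and expected labels copied from the printout above.
raw_outputs = np.array([0.03328136, 0.96342524, 0.96248899, 0.04699934])
expected = np.array([0, 1, 1, 0])

# Anything at or above 0.5 counts as a 1; the 0.5 cutoff is an assumption.
predicted_labels = (raw_outputs >= 0.5).astype(int)

print(predicted_labels)
print(np.array_equal(predicted_labels, expected))
```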


It works! Our hand-coded neural network has correctly learned to map each input to its expected output, which here is the XOR function. Maybe this doesn't seem like a huge deal, but let's scale up the project a bit.

### The MNIST Dataset

The MNIST dataset is a very famous dataset that contains labeled images of handwritten digits. Let's use the same classes we created to help us classify these digits.

The MNIST dataset is included in our "data" folder, but it can also be downloaded from [this link](http://yann.lecun.com/exdb/mnist/).

All the images and labels of the MNIST dataset are encoded in four files. We need to be able to extract the images from the files to work with them.

### File descriptions
Four files are provided (the .gz files are compressed versions of these files):

* Test Images : t10k-images-idx3-ubyte
* Test Labels :  t10k-labels-idx1-ubyte
* Train Images : train-images-idx3-ubyte
* Train Labels :  train-labels-idx1-ubyte

The IDX file format is a simple format for vectors and multidimensional matrices of various numerical types.

#### The basic format for labels
  
|Offset | Type               | Value           |   Description                   |
|-------|--------------------|-----------------|---------------------------------|
|0000   |4 byte integer      |0x00000801(2049) |magic number (MSB first)         |
|0004   |4 byte integer      |10000 or 60000   |number of items (test or train)  |
|0008   |unsigned byte       |??               |label                            |
|0009   |unsigned byte       |??               |label                            |
|...    |...                 |...              |...                              |
|xxxx   |unsigned byte       |??               |label                            |


#### The basic format for images

|Offset | Type               | Value           |   Description                   |
|-------|--------------------|-----------------|---------------------------------|
|0000   |4 byte integer      |0x00000803(2051) |magic number (MSB first)         |
|0004   |4 byte integer      |10000 or 60000   |number of images (test or train) |
|0008   |4 byte integer      |28               |number of rows                   |
|0012   |4 byte integer      |28               |number of columns                |
|0016   |unsigned byte       |??               |pixel intensity (0-255)          |
|0017   |unsigned byte       |??               |pixel intensity (0-255)          |
|...    |...                 |...              |...                              |
|xxxx   |unsigned byte       |??               |pixel intensity (0-255)          |
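To make the image layout concrete, the sketch below builds a tiny fake IDX image file in memory and parses its header. It uses Python's standard `struct` module, which is a different approach from the one this notebook uses to read the real files. The two 2x2 "images" are made-up data.

```python
import struct
import numpy as np

# Build a tiny synthetic IDX image file in memory: 2 images of 2x2 pixels.
# ">IIII" means four big-endian (MSB first) unsigned 4-byte integers.
header = struct.pack(">IIII", 2051, 2, 2, 2)
pixels = bytes([0, 64, 128, 255, 255, 128, 64, 0])
fake_file = header + pixels

# The first 16 bytes are the header: magic number, image count, rows, columns.
magic, count, rows, cols = struct.unpack(">IIII", fake_file[:16])

# Everything after the header is one unsigned byte per pixel.
images = np.frombuffer(fake_file, dtype=np.uint8, offset=16).reshape(count, rows, cols)

print(hex(magic), count, rows, cols)
print(images)
```

The label files follow the same idea with a shorter, 8-byte header (magic number and item count) followed by one byte per label.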


### Converting the ubyte files to numpy arrays for easy processing
The following code converts the ubyte files into four n-dimensional numpy arrays and stores them in a dictionary called `data_dict`, which has four key-value pairs.

| Key           |  Type        |Shape         |
|---------------|--------------|--------------|
|*train_images* |numpy ndarray |[60000,28,28] |
|*train_labels* |numpy ndarray |[60000]       |
|*test_images*  |numpy ndarray |[10000,28,28] |
|*test_labels*  |numpy ndarray |[10000]       |


In [352]:
# PROVIDE YOUR DIRECTORY WITH THE EXTRACTED FILES HERE
datapath = 'data/MNIST/raw/uncompressed/'

files = os.listdir(datapath)
print(files)

def get_int(b):   # CONVERTS 4 BYTES (MSB FIRST) TO AN INT
    return int.from_bytes(b, byteorder='big')

data_dict = {}
for file in files:
    if file.endswith('ubyte'):  # FOR ALL 'ubyte' FILES
        print('Reading ',file)
        with open(os.path.join(datapath, file), 'rb') as f:
            data = f.read()
            magic = get_int(data[:4])   # 0-3: THE MAGIC NUMBER TELLS US WHETHER THIS IS AN IMAGE OR LABEL FILE
            length = get_int(data[4:8])  # 4-7: LENGTH OF THE ARRAY  (DIMENSION 0)
            if magic == 2051:
                category = 'images'
                num_rows = get_int(data[8:12])  # NUMBER OF ROWS  (DIMENSION 1)
                num_cols = get_int(data[12:16])  # NUMBER OF COLUMNS  (DIMENSION 2)
                parsed = np.frombuffer(data, dtype=np.uint8, offset=16)  # READ THE PIXEL VALUES AS INTEGERS
                parsed = parsed.reshape(length, num_rows, num_cols)  # RESHAPE THE ARRAY AS [NO_OF_SAMPLES x HEIGHT x WIDTH]
            elif magic == 2049:
                category = 'labels'
                parsed = np.frombuffer(data, dtype=np.uint8, offset=8)  # READ THE LABEL VALUES AS INTEGERS
                parsed = parsed.reshape(length)  # RESHAPE THE ARRAY AS [NO_OF_SAMPLES]
            # "magic" and "split" replace the names "type" and "set", which shadow Python built-ins
            if length == 10000:
                split = 'test'
            elif length == 60000:
                split = 'train'
            data_dict[split + '_' + category] = parsed  # SAVE THE NUMPY ARRAY TO A CORRESPONDING KEY

['t10k-images-idx3-ubyte', 't10k-labels-idx1-ubyte', 'train-images-idx3-ubyte', 'train-labels-idx1-ubyte']
Reading  t10k-images-idx3-ubyte
Reading  t10k-labels-idx1-ubyte
Reading  train-images-idx3-ubyte
Reading  train-labels-idx1-ubyte


In [353]:
print(data_dict["train_images"].shape)
print(data_dict["train_labels"].shape)
print(data_dict["test_images"].shape)
print(data_dict["test_labels"].shape)

(60000, 28, 28)
(60000,)
(10000, 28, 28)
(10000,)


Let's make a NN that'll help us classify these digits.

In [420]:
mnist_net = NeuralNetwork()

input_layer = Layer(28*28, 100)
input_layer.initalize_random_weights()
input_layer.initalize_random_biases()
mnist_net.add(input_layer)                

mnist_net.add(ActivationLayer())

first_hidden_layer = Layer(100, 50)
first_hidden_layer.initalize_random_weights()
first_hidden_layer.initalize_random_biases()
mnist_net.add(first_hidden_layer)   

mnist_net.add(ActivationLayer())

second_hidden_layer = Layer(50, 10)
second_hidden_layer.initalize_random_weights()
second_hidden_layer.initalize_random_biases()
mnist_net.add(second_hidden_layer)   

mnist_net.add(ActivationLayer())

# 784 -> 100 -> activation -> 50 -> activation -> 10 -> activation

reshaped_train_image_data = np.reshape(data_dict["train_images"], (60000, 28*28, 1))
reshaped_train_label_data = np.reshape(data_dict["train_labels"], (60000, -1, 1))

In [421]:
# this is a little complex, but in order to change our labels from a single value like 1 or 2 to an array
# like [0,0,1,0,0,0,0,0,0,0], we need to create a "one-hot" vector/array.
def one_hot(classes, data):
    vector = np.zeros(classes)
    vector[int(data)] = 1
    vector = np.reshape(vector, (-1,1))
    return vector
    
reshaped_train_label_data_one_hot = []
for data in reshaped_train_label_data:
    reshaped_train_label_data_one_hot.append(one_hot(10, data))
    

np.array(reshaped_train_label_data_one_hot)

array([[[0.],
        [0.],
        [0.],
        ...,
        [0.],
        [0.],
        [0.]],

       [[1.],
        [0.],
        [0.],
        ...,
        [0.],
        [0.],
        [0.]],

       [[0.],
        [0.],
        [0.],
        ...,
        [0.],
        [0.],
        [0.]],

       ...,

       [[0.],
        [0.],
        [0.],
        ...,
        [0.],
        [0.],
        [0.]],

       [[0.],
        [0.],
        [0.],
        ...,
        [0.],
        [0.],
        [0.]],

       [[0.],
        [0.],
        [0.],
        ...,
        [0.],
        [1.],
        [0.]]])
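As an aside, the same one-hot encoding can be done without a Python loop. The sketch below indexes into an identity matrix; the labels are made up for illustration, and `one_hot_rows` and `one_hot_columns` are names introduced here.

```python
import numpy as np

labels = np.array([5, 0, 4, 1, 9])  # made-up labels for illustration

# Row i of the 10x10 identity matrix is the one-hot vector for class i,
# so indexing with the labels picks one row per label.
one_hot_rows = np.eye(10)[labels]

# Reshape to (number_of_labels, 10, 1) so each label becomes a column vector,
# matching the shape produced by the loop above.
one_hot_columns = one_hot_rows.reshape(len(labels), 10, 1)

print(one_hot_columns.shape)
print(one_hot_columns[0].ravel())
```

Each column vector contains a single 1 at the position of its label and 0 everywhere else.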

In [403]:
# we divide by 255 to normalize the data i.e. making all the data between 0 and 1 instead of between 0 and 255
mnist_net.fit(reshaped_train_image_data / 255, reshaped_train_label_data, 1, 0.1)

IndexError: index 10 is out of bounds for axis 0 with size 10