# Buzzwords, but What Do They Mean?

![NNs are a small subset of ai, ml, and deep learning](images\what_is_a_neural_network\small_subset_of_ai.jpg)

### Simple Neural Network

![simple neural network architecture](images\what_is_a_neural_network\basic_neural_network.jpeg)

Neural networks are a small subset of several larger categories of problem-solving techniques: artificial intelligence, machine learning, and deep learning.

![basic neuron in a neural network](images\what_is_a_neural_network\basic_neuron_with_bias.jpg)

The most basic unit of a neural network is a single neuron.

### What are the Parts of a Neuron?

1. inputs
2. weight & bias
3. sum (can be expressed as the dot product)
4. activation function 
5. output

### The basic Neuron class

In [874]:
class BasicNeuron:
    def __init__(self):
        self.weights = None
        self.bias = None
        
    def calc_neuron_output(self, inputs):
        # weighted sum of the inputs plus the bias
        total = 0  # named `total` so we don't shadow Python's built-in sum()
        for inpt, weight in zip(inputs, self.weights):
            total += inpt * weight
        total += self.bias
        return total

        ###############
        ## IMPORTANT ##
        ###############
        # There is a much better way to do this calculation, which we will talk about.
        # This is just for instructional purposes.

Our `BasicNeuron` class has two attributes (`weights` and `bias`) and two methods.

It has a constructor, which lets us create neurons.

It has a `calc_neuron_output` method (a method is simply a function that belongs to a class), which takes some inputs and performs the following calculation:

\begin{equation*}
y_j = b_j +  \sum_{i} x_iw_{ij}
\end{equation*}

This means the output of neuron j ("y sub j") is the sum of each of the neuron's inputs times its respective weight, plus a bias.

In [875]:
n = BasicNeuron()
n.weights = [0.1, 0.2, 0.3, 0.4]
n.bias = 1
inpts = [1, 1, 1, 1]

n.calc_neuron_output(inpts)

2.0

It looks like our basic Neuron class works!

### Putting a layer together

A "layer" is composed of many neurons, each with their own weights and biases.

The benefit of thinking of our neural network in terms of layers is that it simplifies a lot of the calculations we must do.

### The dot product

![dot product representation](images\what_is_a_neural_network\dot_product_representation.png)

Using the dot product instead of calculating the output of each neuron separately makes things much easier and more efficient.

In fact, we don't even need our BasicNeuron class if we use the dot product instead of calculating the output of each neuron one by one. All we need to consider is the entire layer and all of the weights and biases that belong to it.
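To make that equivalence concrete, here is a small sketch showing that a single `np.dot` call gives the same result as looping over the neurons one at a time. The weights, biases, and inputs are made-up values for illustration only.

```python
import numpy as np

# Illustrative values only: a layer with 2 neurons, each taking 3 inputs.
weights = np.array([[0.1, 0.2, 0.3],    # weights of neuron 0
                    [0.4, 0.5, 0.6]])   # weights of neuron 1
biases = np.array([1.0, -1.0])
inputs = np.array([1.0, 2.0, 3.0])

# One neuron at a time, the way BasicNeuron does it.
loop_outputs = []
for neuron_weights, bias in zip(weights, biases):
    total = bias
    for x, w in zip(inputs, neuron_weights):
        total += x * w
    loop_outputs.append(total)

# The whole layer at once: each row of `weights` is dotted with `inputs`.
dot_outputs = np.dot(weights, inputs) + biases

print(loop_outputs)
print(dot_outputs)
print(np.allclose(loop_outputs, dot_outputs))  # True
```

Each row of the weight matrix plays the role of one neuron's weight list, which is why the layer can replace the individual neuron objects.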

Let's make things simpler by making a layer class.

In [876]:
import numpy as np
import os
import codecs
np.random.seed(2)

In [877]:
class Layer:
    def __init__(self, number_input_neurons = 0, number_output_neurons = 0, weights = np.array([]), biases = np.array([])):
        self.number_input_neurons = number_input_neurons
        self.number_output_neurons = number_output_neurons
        self.weights = weights
        self.biases = biases
        self.input = None
        self.output = None
        
    def initalize_random_weights(self):
        self.weights = np.random.rand(self.number_output_neurons, self.number_input_neurons) - 0.5
    
    def initalize_random_biases(self):
        self.biases = np.random.rand(self.number_output_neurons, 1) - 0.5
        
    def forward_propagation(self, input_data):
        self.input = input_data
        self.output = np.dot(self.weights, self.input) + self.biases
        return self.output
    
    # We'll explain this in a bit! For now know that this is important for later.
    def backward_propagation(self, output_error, learning_rate):
        input_error = np.dot(self.weights.T, output_error)
        weights_error = np.dot(output_error, self.input.T)

        # update parameters
        self.weights -= learning_rate * weights_error
        self.biases -= learning_rate * output_error
        return input_error

In [878]:
layer = Layer(2, 2)
layer.initalize_random_weights()
layer.initalize_random_biases()
print("weights:\n{}".format(layer.weights))
print("\n##########\n")
print("biases:\n{}".format(layer.biases))

weights:
[[-0.0640051  -0.47407377]
 [ 0.04966248 -0.06467761]]

##########

biases:
[[-0.0796322 ]
 [-0.16966518]]


In [879]:
inputs = np.array([[0,0],
                   [0,1],
                   [1,0],
                   [1,1]])

expected_outputs = np.array([[0], 
                             [1], 
                             [1], 
                             [0]])

inputs = np.reshape(inputs, (4,-1,1))
# Reshape each of the 4 samples from a row [x1, x2] into a 2x1 column vector,
# because forward_propagation computes np.dot(weights, input) and expects a column.
# We wrote the inputs as rows above only so the table is easy to read.
# After the reshape, inputs looks like this:
#   [[[0],
#     [0]],
#
#    [[0],
#     [1]],
#
#    [[1],
#     [0]],
#
#    [[1],
#     [1]]]

for inpt in inputs:
    print(layer.forward_propagation(inpt))
    print("----")

[[-0.0796322 ]
 [-0.16966518]]
----
[[-0.55370597]
 [-0.23434279]]
----
[[-0.1436373]
 [-0.1200027]]
----
[[-0.61771106]
 [-0.18468031]]
----
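The reshape step in the cell above can be checked on its own. This is a minimal sketch; the names `flat`, `columns`, and `example_weights` are made up for illustration and are not part of the notebook.

```python
import numpy as np

flat = np.array([[0, 0],
                 [0, 1],
                 [1, 0],
                 [1, 1]])
columns = np.reshape(flat, (4, -1, 1))

print(flat.shape)      # (4, 2)
print(columns.shape)   # (4, 2, 1)
print(columns[1])      # the second sample as a 2x1 column vector

# A (2, 2) weight matrix times a (2, 1) column gives a (2, 1) output,
# which matches the shape of the layer outputs printed above.
example_weights = np.ones((2, 2))
print(np.dot(example_weights, columns[1]).shape)  # (2, 1)
```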


### Types of Activation Functions

![types of activation functions](images\what_is_a_neural_network\types_of_activation_functions.jpg)

### The Sigmoid Activation Function
This is what we'll be using for this simple example:

![the sigmoid function](images\what_is_a_neural_network\sigmoid.png)

In [880]:
def sigmoid(z):
    return 1/(1 + np.exp(-z))

# we'll need the derivative of this function later!
def sigmoid_derivative(z):
    return sigmoid(z) * (1 - sigmoid(z))
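One way to sanity-check `sigmoid_derivative` is to compare it against a numerical slope estimate. The sketch below redefines both functions so it runs on its own. The test points in `z` and the step size `h` are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    return sigmoid(z) * (1 - sigmoid(z))

z = np.array([-2.0, 0.0, 2.0])   # arbitrary test points
h = 1e-5                         # small step for the slope estimate

# Central difference: slope of the line through two nearby points.
numerical = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

print(sigmoid_derivative(z))
print(numerical)
print(np.allclose(sigmoid_derivative(z), numerical))  # True
```

The derivative is largest at `z = 0`, where it equals 0.25, and shrinks toward zero as `z` moves away from zero in either direction.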
    

In [881]:
# numpy allows us to apply a function to every value in the array
activation = sigmoid(layer.output)

print(layer.output)
print("\n#######\n")
print("after applying the sigmoid function to every value in the array:\n\n{}".format(activation))


[[-0.61771106]
 [-0.18468031]]

#######

after applying the sigmoid function to every value in the array:

[[0.35030221]
 [0.4539607 ]]


You will often see activation functions placed in their own layer, called an "activation layer."

Let's make an activation layer class.

In [882]:
class ActivationLayer(Layer):
    def __init__(self):
        self.output = None
        self.input = None
        
    def forward_propagation(self, input_data):
        self.input = input_data
        self.output = sigmoid(self.input)
        return self.output
    
    # Stay with us with all this backward propagation stuff. It'll make more sense in a moment.
    def backward_propagation(self, output_error, learning_rate):
        return sigmoid_derivative(self.input) * output_error

In [883]:
activation = ActivationLayer()
activation.forward_propagation(layer.output)

array([[0.35030221],
       [0.4539607 ]])

We can add another layer to our network with two inputs and one output.


In [884]:
layer_2 = Layer(2, 1)
layer_2.initalize_random_weights()
layer_2.initalize_random_biases()
print("weights:\n{}".format(layer_2.weights))
print("\n##########\n")
print("biases:\n{}".format(layer_2.biases))

weights:
[[-0.29535137  0.11927097]]

##########

biases:
[[-0.20034533]]


In [885]:
layer_2.forward_propagation(activation.output)

array([[-0.24966323]])

Works just about how you would expect it to!

We've created a basic Layer class which takes some inputs, has some weights and biases, and helps us "train" our neural network. We've also made an activation layer class that applies an activation function to all the data we pass into it and will also help us "train" our NN.

You might be wondering what it means to "train" a NN, and that is a very important question. 

A neural network works because we can use a little bit of calculus to adjust the weights and biases in a way that makes our predictions more accurate. This process of adjusting the weights and biases is what it means to "train" a NN. It is OK if you are not comfortable with the calculus. You do not need to understand it fully right now, but it may give you a better intuition for what makes neural networks work. We'll keep things as simple as we can, so let's dive in.

### Measuring Error

The first step in adjusting our weights and biases is measuring how far off our prediction was. Once we know that, we can talk about how to make our neural network perform better.

### Loss Functions (Cost Functions)

We can measure the error in our prediction using what is called a loss function (also known as a cost function). We will be using the Mean Squared Error (MSE) function for this example.

![MSE](images\what_is_a_neural_network\mean_squared_error.png)

In [886]:
# we take the two arrays and calculate the mse.
def mean_squared_error(y_true, y_pred):
    return np.mean(np.power(y_true - y_pred, 2))

# we'll need this later too!
def mean_squared_error_derivative(y_true, y_pred):
    return 2 * (y_pred - y_true) / y_true.size

This gives us a measure of how off our predictions are. 
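As a small worked example, here are both functions applied to a made-up target and prediction. The values in `y_true` and `y_pred` are illustrative assumptions, not outputs from the network above.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean(np.power(y_true - y_pred, 2))

def mean_squared_error_derivative(y_true, y_pred):
    return 2 * (y_pred - y_true) / y_true.size

# Made-up values: two targets and two slightly-off predictions.
y_true = np.array([[1.0], [0.0]])
y_pred = np.array([[0.8], [0.2]])

# Each prediction is off by 0.2, so each squared error is about 0.04,
# and the mean of the two is also about 0.04.
error = mean_squared_error(y_true, y_pred)
print(error)

# The derivative is negative where the prediction is too low and
# positive where it is too high: about [[-0.2], [0.2]].
gradient = mean_squared_error_derivative(y_true, y_pred)
print(gradient)
```

The sign of each entry in the derivative tells us which way to move that prediction to reduce the error, which is what training relies on.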

### Back Propagation

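Back propagation is how the network figures out which way to nudge each weight and bias. After the forward pass we measure the error with the loss function, then pass that error backward through the layers, from the last layer to the first. Each layer uses the chain rule from calculus to work out how much its own weights and biases contributed to the error, and hands the remaining error to the layer before it.

Gradient descent is the update rule that uses those gradients: each weight and bias moves a small step in the direction that lowers the error, and the size of that step is set by the learning rate. In 2D, picture a ball rolling down a curve toward its lowest point. In 3D, picture walking downhill on a hilly surface. With more weights the picture is harder to draw, but the idea is the same: every parameter is one more direction we can step in, and we always step downhill.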
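The update in `Layer.backward_propagation` treats `np.dot(output_error, self.input.T)` as the gradient of the error with respect to the weights, and `output_error` itself as the gradient with respect to the biases. The sketch below checks the weight formula against a numerical estimate for a tiny one-neuron layer, then takes one gradient descent step. The random seed, the sample `x`, the target `y_true`, and the learning rate are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.random((1, 2)) - 0.5   # one output neuron, two inputs
biases = rng.random((1, 1)) - 0.5
x = np.array([[1.0], [0.0]])         # one input sample as a column vector
y_true = np.array([[1.0]])

def loss(w, b):
    y_pred = np.dot(w, x) + b
    return np.mean(np.power(y_true - y_pred, 2))

# Analytic gradients, using the same formulas as Layer.backward_propagation.
y_pred = np.dot(weights, x) + biases
output_error = 2 * (y_pred - y_true) / y_true.size   # dE/dY (MSE derivative)
weights_error = np.dot(output_error, x.T)            # dE/dW
biases_error = output_error                          # dE/dB

# Numerical gradient for the weights via central differences.
h = 1e-6
numerical = np.zeros_like(weights)
for i in range(weights.shape[0]):
    for j in range(weights.shape[1]):
        up = weights.copy()
        up[i, j] += h
        down = weights.copy()
        down[i, j] -= h
        numerical[i, j] = (loss(up, biases) - loss(down, biases)) / (2 * h)

gradients_match = np.allclose(weights_error, numerical)
print(gradients_match)

# One gradient descent step: move against the gradient.
learning_rate = 0.1
before = loss(weights, biases)
weights = weights - learning_rate * weights_error
biases = biases - learning_rate * biases_error
after = loss(weights, biases)
print(before, after)   # the error after the step is smaller
```

Repeating that last step many times, over every training sample, is what the training loop in a network like this one amounts to.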

Now we have all the parts of our NN! We can put them all together in a nice little class to make things easy.

In [887]:
class NeuralNetwork:
    def __init__(self):
        self.layers = []

        
        
    def add(self, layer):
        self.layers.append(layer)
        
    def predict(self, input_data):
        
        result = []
        
        for data in input_data:
            output = data
            for layer in self.layers:
                output = layer.forward_propagation(output)
            result.append(output)
            
        return result
    
    def fit(self, x_train, y_train, no_epochs, learning_rate):
        for i in range(no_epochs):
            error = 0
            for counter, j in enumerate(x_train):
                output = j
                for layer in self.layers:
                    output = layer.forward_propagation(output)
                
                error += mean_squared_error(y_train[counter], output)
                
                # backprop
                
                err = mean_squared_error_derivative(y_train[counter], output)
                for layer in reversed(self.layers):
                    err = layer.backward_propagation(err, learning_rate)
                    
            error /= len(x_train)
            print("epoch {}  error = {}".format(i, error))
    

In [888]:
net = NeuralNetwork()
first_layer = Layer(2,2)
first_layer.initalize_random_weights()
first_layer.initalize_random_biases()
second_layer = Layer(2,1)
second_layer.initalize_random_weights()
second_layer.initalize_random_biases()

print(first_layer.forward_propagation(np.array([[0],[0]])))

net.add(first_layer)
net.add(ActivationLayer())
net.add(second_layer)
net.add(ActivationLayer())

net.fit(inputs, expected_outputs, 1000, 1)

[[ 0.01357812]
 [-0.31556013]]
epoch 0  error = 0.2995760762143457
epoch 1  error = 0.29733144744057466
epoch 2  error = 0.2962840620853773
epoch 3  error = 0.295663120905906
epoch 4  error = 0.2952055582942539
epoch 5  error = 0.2948203028502162
epoch 6  error = 0.29447397443709233
epoch 7  error = 0.29415316694511917
epoch 8  error = 0.29385181626510937
epoch 9  error = 0.2935667810713601
epoch 10  error = 0.29329619140369256
epoch 11  error = 0.29303877969260056
epoch 12  error = 0.29279358633969743
epoch 13  error = 0.29255982050573615
epoch 14  error = 0.292336790729144
epoch 15  error = 0.29212386919555344
epoch 16  error = 0.291920473028459
epoch 17  error = 0.2917260544318798
epoch 18  error = 0.2915400954790287
epoch 19  error = 0.29136210531880274
epoch 20  error = 0.2911916186042094
epoch 21  error = 0.29102819450120454
epoch 22  error = 0.2908714159388566
epoch 23  error = 0.29072088892772985
epoch 24  error = 0.290576241864205
epoch 25  error = 0.29043712478751865
epoch 26

epoch 260  error = 0.2238024837416421
epoch 261  error = 0.22294460701139712
epoch 262  error = 0.22210622905120864
epoch 263  error = 0.22128749606850706
epoch 264  error = 0.2204884754524482
epoch 265  error = 0.21970916219264397
epoch 266  error = 0.21894948537549502
epoch 267  error = 0.21820931465062052
epoch 268  error = 0.21748846657597914
epoch 269  error = 0.2167867107661263
epoch 270  error = 0.21610377578318246
epoch 271  error = 0.21543935472416073
epoch 272  error = 0.21479311047108854
epoch 273  error = 0.21416468058170787
epoch 274  error = 0.21355368180841275
epoch 275  error = 0.21295971424147045
epoch 276  error = 0.21238236507954186
epoch 277  error = 0.21182121203615512
epoch 278  error = 0.21127582639521048
epoch 279  error = 0.21074577573193617
epoch 280  error = 0.2102306263181045
epoch 281  error = 0.20972994523189592
epoch 282  error = 0.20924330219368953
epoch 283  error = 0.20877027114938373
epoch 284  error = 0.20831043162272067
epoch 285  error = 0.20786336

epoch 556  error = 0.1843576914545333
epoch 557  error = 0.1843182950088189
epoch 558  error = 0.18427830606425455
epoch 559  error = 0.1842376989949751
epoch 560  error = 0.18419644700745705
epoch 561  error = 0.1841545220724608
epoch 562  error = 0.18411189485240392
epoch 563  error = 0.18406853462382175
epoch 564  error = 0.18402440919453472
epoch 565  error = 0.18397948481511714
epoch 566  error = 0.18393372608422243
epoch 567  error = 0.18388709584728632
epoch 568  error = 0.18383955508808497
epoch 569  error = 0.183791062812584
epoch 570  error = 0.1837415759244624
epoch 571  error = 0.18369104909164635
epoch 572  error = 0.18363943460312607
epoch 573  error = 0.18358668221527014
epoch 574  error = 0.1835327389867813
epoch 575  error = 0.1834775491013641
epoch 576  error = 0.183421053677096
epoch 577  error = 0.1833631905614022
epoch 578  error = 0.18330389411044548
epoch 579  error = 0.1832430949516343
epoch 580  error = 0.18318071972784566
epoch 581  error = 0.18311669082183757

epoch 854  error = 0.004896187159041639
epoch 855  error = 0.00486450484242925
epoch 856  error = 0.004833211035723929
epoch 857  error = 0.004802298807487204
epoch 858  error = 0.004771761387456301
epoch 859  error = 0.00474159216194803
epoch 860  error = 0.004711784669417178
epoch 861  error = 0.004682332596163676
epoch 862  error = 0.004653229772182651
epoch 863  error = 0.004624470167151862
epoch 864  error = 0.004596047886551558
epoch 865  error = 0.004567957167911411
epoch 866  error = 0.004540192377179952
epoch 867  error = 0.004512748005211831
epoch 868  error = 0.004485618664368552
epoch 869  error = 0.004458799085228426
epoch 870  error = 0.0044322841134017505
epoch 871  error = 0.00440606870644731
epoch 872  error = 0.0043801479308864895
epoch 873  error = 0.004354516959311513
epoch 874  error = 0.004329171067584307
epoch 875  error = 0.004304105632122746
epoch 876  error = 0.004279316127271253
epoch 877  error = 0.0042547981227525275
epoch 878  error = 0.004230547281197749


In [889]:
predictions = net.predict(inputs)
for inpt, expected, actual in zip(inputs, expected_outputs, predictions):
    print("input: {}, expected: {}, result: {}".format([int(inpt[0]), int(inpt[1])], expected, actual))


input: [0, 0], expected: [0], result: [[0.0401224]]
input: [0, 1], expected: [1], result: [[0.95343181]]
input: [1, 0], expected: [1], result: [[0.9526447]]
input: [1, 1], expected: [0], result: [[0.06110162]]


It works! Our hand-coded neural network has correctly learned how to classify our inputs into the desired outputs! Maybe this doesn't seem like a huge deal, but let's scale up the project a bit.
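The network outputs values between 0 and 1 rather than exact labels. One common way to turn them into labels is to threshold them. The sketch below applies a 0.5 cutoff (an assumption, not something the network itself defines) to the four results printed above.

```python
import numpy as np

# The four sigmoid outputs printed above, in the same order as the inputs.
raw_predictions = np.array([0.0401224, 0.95343181, 0.9526447, 0.06110162])
expected = np.array([0, 1, 1, 0])

# Assumption: treat anything at or above 0.5 as class 1.
predicted_classes = (raw_predictions >= 0.5).astype(int)

print(predicted_classes)                             # [0 1 1 0]
print(np.array_equal(predicted_classes, expected))   # True
```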

### The MNIST Dataset

The MNIST dataset is a very famous dataset that contains labeled handwritten digits. Let's use the same classes we created to help us classify these digits.

The MNIST dataset is included in our "data" folder, but it can also be downloaded from [this link](http://yann.lecun.com/exdb/mnist/).

All the images and labels of the MNIST dataset are encoded in four files. We need to extract the images from these files to work with them.

### File descriptions
Four files are provided (the .gz files are zipped versions of these files):

* Test Images : t10k-images-idx3-ubyte
* Test Labels :  t10k-labels-idx1-ubyte
* Train Images : train-images-idx3-ubyte
* Train Labels :  train-labels-idx1-ubyte

The IDX file format is a simple format for vectors and multidimensional matrices of various numerical types.

#### The basic format for labels
  
|Offset | Type               | Value           |   Description                   |
|-------|--------------------|-----------------|---------------------------------|
|0000   |4 byte integer      |0x00000801(2049) |magic number (MSB first)         |
|0004   |4 byte integer      |10000 or 60000   |number of items (test or train)  |
|0008   |unsigned byte       |??               |label                            |
|0009   |unsigned byte       |??               |label                            |
|...    |...                 |...              |...                              |
|xxxx   |unsigned byte       |??               |label                            |


#### The basic format for images

|Offset | Type               | Value           |   Description                   |
|-------|--------------------|-----------------|---------------------------------|
|0000   |4 byte integer      |0x00000803(2051) |magic number (MSB first)         |
|0004   |4 byte integer      |10000 or 60000   |number of images (test or train) |
|0008   |4 byte integer      |28               |number of rows                   |
|0012   |4 byte integer      |28               |number of columns                |
|0016   |unsigned byte       |??               |pixel intensity (0-255)          |
|0017   |unsigned byte       |??               |pixel intensity (0-255)          |
|...    |...                 |...              |...                              |
|xxxx   |unsigned byte       |??               |pixel intensity (0-255)          |
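To see the header layout in action without touching the real files, the sketch below builds a tiny fake IDX image file in memory and reads it back. It uses Python's standard `struct` module. The 2-image, 2x3-pixel payload is invented for illustration and is much smaller than real MNIST data.

```python
import struct
import numpy as np

# Build a tiny fake IDX image file in memory: 2 images of 2 rows x 3 columns.
# ">IIII" means four big-endian (MSB first) unsigned 32-bit integers.
header = struct.pack(">IIII", 2051, 2, 2, 3)
pixels = bytes(range(12))            # 12 made-up pixel intensities
raw = header + pixels

# Read the 16-byte header back, following the offsets in the table.
magic, count, rows, cols = struct.unpack(">IIII", raw[:16])

# Everything after offset 16 is one unsigned byte per pixel.
images = np.frombuffer(raw, dtype=np.uint8, offset=16).reshape(count, rows, cols)

print(magic, count, rows, cols)  # 2051 2 2 3
print(hex(magic))                # 0x803
print(images.shape)              # (2, 2, 3)
```

A label file works the same way, except the header is only 8 bytes (magic number 2049 and the item count) and the data that follows is one byte per label.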


### Converting the ubyte files to numpy arrays for easy processing
The following code converts the ubyte files into four NumPy n-dimensional arrays and stores them in a dictionary called `data_dict`, which has four key-value pairs.

| Key           |  Type        |Shape         |
|---------------|--------------|--------------|
|*train_images* |numpy ndarray |[60000,28,28] |
|*train_labels* |numpy ndarray |[60000]       |
|*test_images*  |numpy ndarray |[10000,28,28] |
|*test_labels*  |numpy ndarray |[10000]       |


In [890]:
# PROVIDE YOUR DIRECTORY WITH THE EXTRACTED FILES HERE
datapath = 'data/MNIST/raw/uncompressed/'

files = os.listdir(datapath)
print(files)

def get_int(b):   # CONVERTS 4 BYTES (MSB FIRST) TO AN INT
    return int.from_bytes(b, byteorder='big')

data_dict = {}
for file_name in files:
    if file_name.endswith('ubyte'):  # FOR ALL 'ubyte' FILES
        print('Reading ', file_name)
        with open(os.path.join(datapath, file_name), 'rb') as f:
            data = f.read()
        # NAMES LIKE magic AND split AVOID SHADOWING THE BUILT-INS type AND set
        magic = get_int(data[:4])   # 0-3: THE MAGIC NUMBER TELLS US WHETHER IMAGE OR LABEL
        length = get_int(data[4:8])  # 4-7: LENGTH OF THE ARRAY  (DIMENSION 0)
        if magic == 2051:
            category = 'images'
            num_rows = get_int(data[8:12])  # NUMBER OF ROWS  (DIMENSION 1)
            num_cols = get_int(data[12:16])  # NUMBER OF COLUMNS  (DIMENSION 2)
            parsed = np.frombuffer(data, dtype=np.uint8, offset=16)  # READ THE PIXEL VALUES AS INTEGERS
            parsed = parsed.reshape(length, num_rows, num_cols)  # RESHAPE THE ARRAY AS [NO_OF_SAMPLES x HEIGHT x WIDTH]
        elif magic == 2049:
            category = 'labels'
            parsed = np.frombuffer(data, dtype=np.uint8, offset=8)  # READ THE LABEL VALUES AS INTEGERS
            parsed = parsed.reshape(length)  # RESHAPE THE ARRAY AS [NO_OF_SAMPLES]
        else:
            continue  # NOT AN MNIST IMAGE OR LABEL FILE
        if length == 10000:
            split = 'test'
        elif length == 60000:
            split = 'train'
        else:
            continue  # UNEXPECTED NUMBER OF ITEMS
        data_dict[split + '_' + category] = parsed  # SAVE THE NUMPY ARRAY TO A CORRESPONDING KEY

['t10k-images-idx3-ubyte', 't10k-labels-idx1-ubyte', 'train-images-idx3-ubyte', 'train-labels-idx1-ubyte']
Reading  t10k-images-idx3-ubyte
Reading  t10k-labels-idx1-ubyte
Reading  train-images-idx3-ubyte
Reading  train-labels-idx1-ubyte


In [891]:
print(data_dict["train_images"].shape)
print(data_dict["train_labels"].shape)
print(data_dict["test_images"].shape)
print(data_dict["test_labels"].shape)

(60000, 28, 28)
(60000,)
(10000, 28, 28)
(10000,)


Let's make a NN that'll help us classify these digits.

In [892]:
mnist_net = NeuralNetwork()

input_layer = Layer(28*28, 1000)
input_layer.initalize_random_weights()
input_layer.initalize_random_biases()
mnist_net.add(input_layer)                

mnist_net.add(ActivationLayer())

first_hidden_layer = Layer(1000, 50)
first_hidden_layer.initalize_random_weights()
first_hidden_layer.initalize_random_biases()
mnist_net.add(first_hidden_layer)   

mnist_net.add(ActivationLayer())

second_hidden_layer = Layer(50, 10)
second_hidden_layer.initalize_random_weights()
second_hidden_layer.initalize_random_biases()
mnist_net.add(second_hidden_layer)   

mnist_net.add(ActivationLayer())

# 784 -> 1000 -> activation -> 50 -> activation -> 10 -> activation

reshaped_train_image_data = np.reshape(data_dict["train_images"], (60000, 28*28, 1))
reshaped_train_label_data = np.reshape(data_dict["train_labels"], (60000, -1, 1))

In [893]:
# this is a little complex, but in order to change our labels from a single value like 1 or 2 to an array
# like [0,0,1,0,0,0,0,0,0,0], we need to create a "one-hot" vector/array.
def one_hot(classes, data):
    vector = np.zeros(classes)
    vector[int(data)] = 1
    vector = np.reshape(vector, (-1,1))
    return vector
    
reshaped_train_label_data_one_hot = []
for data in reshaped_train_label_data:
    reshaped_train_label_data_one_hot.append(one_hot(10, data))
    

reshaped_train_label_data = np.array(reshaped_train_label_data_one_hot)

shuffler = np.random.permutation(len(reshaped_train_label_data))
reshaped_train_image_data = reshaped_train_image_data[shuffler]
reshaped_train_label_data = reshaped_train_label_data[shuffler]
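The loop above builds the one-hot vectors one label at a time. As an alternative sketch (not what the notebook uses), the same result can be produced for a whole array of labels at once by indexing into an identity matrix. The three labels below are illustrative values.

```python
import numpy as np

labels = np.array([5, 0, 4])   # illustrative digit labels

# Row k of the 10x10 identity matrix is the one-hot vector for digit k,
# so indexing with the labels picks out one row per label.
one_hot_rows = np.eye(10)[labels]                   # shape (3, 10)

# Reshape to column vectors to match what one_hot() returns per label.
one_hot_columns = one_hot_rows.reshape(-1, 10, 1)   # shape (3, 10, 1)

print(one_hot_columns.shape)        # (3, 10, 1)
print(one_hot_columns[0].ravel())   # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```

For 60,000 labels this avoids a Python-level loop, though the explicit loop is easier to follow when first learning what a one-hot vector is.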


In [894]:
# we divide by 255 to normalize the data i.e. making all the data between 0 and 1 instead of between 0 and 255
mnist_net.fit((reshaped_train_image_data / 255)[:1000], (reshaped_train_label_data)[:1000], 50, .1)

epoch 0  error = 0.09388195733263206
epoch 1  error = 0.08102232354572304
epoch 2  error = 0.07272317978363252
epoch 3  error = 0.06505121880134793
epoch 4  error = 0.05816477156825603
epoch 5  error = 0.05215967320331103
epoch 6  error = 0.047016079262734774
epoch 7  error = 0.042584596156359046
epoch 8  error = 0.03875818927804898
epoch 9  error = 0.03543753196769752
epoch 10  error = 0.032544779455834345
epoch 11  error = 0.030015770489381983
epoch 12  error = 0.027791856636822192
epoch 13  error = 0.025821739780090124
epoch 14  error = 0.024063246716153293
epoch 15  error = 0.02248235848474939
epoch 16  error = 0.021051779688596946
epoch 17  error = 0.01975196431756762
epoch 18  error = 0.018570055181225256
epoch 19  error = 0.017492293926278657
epoch 20  error = 0.01650415094983572
epoch 21  error = 0.015593565209480369
epoch 22  error = 0.014750702579691664
epoch 23  error = 0.013967099235149534
epoch 24  error = 0.013235459462936182
epoch 25  error = 0.01254990057752228
epoch 26

In [895]:

print(input_layer.output)

[[ 8.40576437e-01]
 [-3.16746020e+00]
 [-4.54835351e+00]
 [-1.93873391e+00]
 [ 2.40579135e+00]
 [-5.65265752e+00]
 [ 1.10778292e+00]
 [-2.00070684e+00]
 [-4.48966947e-01]
 [ 1.48647092e-01]
 [-6.22998201e-01]
 [-1.18130555e+00]
 [-3.68958628e-01]
 [ 3.65818342e+00]
 [-4.48439220e+00]
 [-4.71311929e+00]
 [ 6.58971258e-01]
 [ 2.06526909e+00]
 [-2.69010377e-01]
 [-2.11263712e+00]
 [-6.27751959e-01]
 [-2.99821086e+00]
 [-1.60352211e+00]
 [-2.19073121e+00]
 [-8.24743809e-01]
 [ 4.19477334e-01]
 [-2.88549186e+00]
 [ 1.55220009e+00]
 [ 2.90582586e+00]
 [ 9.80140703e-01]
 [-3.02918931e-01]
 [ 2.45873762e-01]
 [ 3.13923959e+00]
 [-1.35233677e+00]
 [ 5.33726629e-01]
 [-3.69242774e-01]
 [-2.85827047e+00]
 [ 3.52574039e+00]
 [ 1.44718491e+00]
 [-1.55782491e+00]
 [-1.63901477e+00]
 [-1.05822528e+00]
 [-1.49248814e+00]
 [ 2.41366472e-01]
 [-2.18574559e+00]
 [-2.47725777e+00]
 [-4.60722300e+00]
 [ 3.33269021e-02]
 [ 1.90277304e+00]
 [-6.22835997e-04]
 [-4.08892270e-01]
 [ 3.93050756e+00]
 [-2.2759811