# Introduction

This is a intuitive guide made for those who are just now getting into neural networks, the objective is to go step by step into how to build a neural network, making sure each and every step is fully understandable. We will be following the book made by Michael Nielsen (link to the book: http://neuralnetworksanddeeplearning.com/index.html), and will try to make the code easier to understand, since a few steps might get lost when going from the book into the code

We will be working with the Hello World of neural Networks, The MNIST Dataset, which is a collection of handwritten digits as well as the respective labels (1, 2, 3...). The network will try to correctly classify the digits, and the accuracy will be displayed and used as a measure to how well the network did.


# Necessary libraries:

If you dont have any of the libraries necessary, just run the following commands on your windows/linux terminal:
* Numpy: !pip install numpy 

In [106]:
import math
import numpy as np

# First Step: Creating the network

In order to create a network, we will be taking advantage of python's classes.

If you dont know what a class is, I suggest you take a look at the following link: https://www.w3schools.com/python/python_classes.asp

## Why use classes?

Classes are just an easy form to make the code easier to use, aggregating meaningful attributes into a single object.
In this manner, we are able to more easily track each attribute of the network.


### Explaining necessary attributes:

In order to create a neural network, a few attributes will be necessary:

* number of inputs (nInputs): How many input neurons will the network create. For this simple NN (Neural Network), each pixel will represent an input neuron. The MNIST images have each 28x28 pixels, thus, we will have 28x28 input neurons
* number of HiddenLayers (nHiddenLayers): How many hiddend layers should the network have. For a simple NN, only 1 hidden layer will suffice, but for more complex projects, multiple hidden layers may be used (a network with multiple hidden layers is called a *deep neural network* ). The purpose of the hidden layers is to try and "catch" patterns of the image, with each subsequent layer bulding upon the last; so for example, a network with 3 hidden layers could work like the following: the first hl (hidden layer) catches the edges, the second turn the edges into shapes (like circles, squares, triangles), and the third turn the shapes into more complex shapes (like eyes, mouth, etc).
* number of Neurons per Hidden Layer: How many neurons will each hidden layer have. This is kind of an arbitrary choice, for this network we will be using 30 hidden neurons, simply because this is what Michael Nielsen did in his book. 
* number of output neurons (nOutputs): How many output Neurons should the network have. It's a good idea for each outcome to have a neuron specific to it, and since we are trying to classify 10 different digits (0 to 9), we will be using 10 output neurons

Aside from those attributes which the user will give to the network, other attributes will be made based on those initial ones.
### Weights: 
The weights between two neurons from different layers. For simplicity, we are using numpy to build the matrices with each index assuming a random value between -5 and 5. 
The matrix we need to build is a 3D matrix. 3D matrix might be scary at first, but our code was structured to have an intuitive manner of accessing each component:
* The first index is which layer the "weight layer" we are dealing with. A "weight layer" is simply the layer of weights between two neurons
* The second index is from what neuron the weight is exiting from
* The third index is to what neuron the weight is arriving

### Bias: 
For the bias, a 2D matrix will suffice. Like with the weights, the biases will also be started randomly
* The first index tells which layer the bias belongs to
* The second index tells which neuron the bias belongs to

The following code will create the class with all atributes necessary

### Activation:
For the activation of the neurons, a 2D matrix will also suffice 

In [107]:
#we will be using a fixed seed as to make our network behave equally between runs
np.random.seed(9)
class Network:
    def __init__(self, nInputs, nHiddenLayers, nNeuronsPerHL, nOutputs):
        totalLayers = nHiddenLayers + 2 # total layers = (one input+ nHiddenLayers+ one output)
        self.totalLayers = totalLayers
        self.nInputs = nInputs
        self.nHiddenLayers = nHiddenLayers
        self.nNeuronsPerHL = nNeuronsPerHL
        self.nOutputs = nOutputs
        
        #initializing the weights and biases randomly using a gaussian distribution with mean 0 and standard deviation 1
        sizes = [nInputs]
        for i in range(nHiddenLayers):
            sizes.append(nNeuronsPerHL)
        sizes.append(nOutputs)
        self.biases = [np.random.normal(loc=0, scale=1, size=s) for s in sizes[1:None]]                    #which layer, which neuron
        
        self.weights =[np.random.normal(loc=0, scale=1, size=(x,y)) for x,y in zip(sizes[:None], sizes[1:])]
        #which layer, which neuron, which weight
        #to access the 1st weight layer 5th exiting neuron 10th arriving neuron: network.weights[0][4][9]
        #+1 because the number of weight layers is the total number of layers - 1
        self.zActivations = [np.zeros(s) for s in sizes[1:None]]
        self.activations = [np.zeros(s) for s in sizes]

# The sigmoid function
In order to our activation neurons on the hidden layers to not explode (meaning, not have an exagerated value, which in turn would mean an exagerated importance); we implement the sigmoid function. It has a few nice traits that are of use to us and is simple enough that can be implement easily, the book has more details about our choice (chapter 1 in the section sigmoid neurons)

In [108]:
def sigmoid(number):
    sigNumber = 1/(1 + np.exp(-number))
    return sigNumber  

# Feeding Forward
Now, for the bread and butter of neural networks, the feeding forward step. Feeding forward is one of the most fundamental steps of neural networks, and consists of passing information from previous layers to later layers. Combining weights, activations and biases, our neurons will be able to try and find patterns and relative importances on the input, with each passing layer finding (hopefuly) increasingly complex patterns.
In pratical terms, the logic is fairly simple: Starting from the input layer, all neurons from the previous layer will pass information to ALL neurons of the next layer, making the iconic shape that represents neural networks. 
If you dont understand (or wanta a nice visual explanation), I suggest this video of 3blue1brown explaining how it works: https://youtu.be/aircAruvnKk?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi&t=274

## Explaining the logic of our code:
The logic is as follow:
* Starting from the input layer, find the number of neurons of the giving layer and of the receiving (next) layer
* Then, loop through every neuron of the receiving layer in the following manner: First, get the bias by inputing the layer and position of the  neuron. Second, loop through every neuron of the previous layers, combining the activation with its respective weight, and summing the results.
* Finally, change layers, with the receiving layer now being the giving layer, and the receiving layer being the next one after the current layer.
* Repeat until there are no more receiving layers

In [109]:
def feedforward(network: Network):
    givingLayer = 0
    # looping until the last layer
    while givingLayer < network.totalLayers-1:                 
        if givingLayer == 0:                           #if its the input layer
            nGivingNeurons = network.nInputs
            nReceivingNeurons = network.nNeuronsPerHL
        elif givingLayer == network.totalLayers-2: #if its the layer before the output layer
            nGivingNeurons = network.nNeuronsPerHL
            nReceivingNeurons = network.nOutputs
        else:                                          #if its any layer inbetween
            nGivingNeurons = network.nNeuronsPerHL
            nReceivingNeurons = network.nNeuronsPerHL
        currentLayer = 0
        #for each neuron in the layer being fed    
        for receivingNeuron in range(nReceivingNeurons):
            #the activation of the current neuron is its own biases + all the weights*activations of the previous layer
            activation = network.biases[currentLayer][receivingNeuron]
            for givingNeuron in range(nGivingNeurons):
                activation += network.weights[givingLayer][givingNeuron][receivingNeuron]*network.activations[givingLayer][givingNeuron]
            network.activations[givingLayer+1][receivingNeuron] = sigmoid(activation)
            
        givingLayer += 1
        currentLayer += 1
    
    return network

In [110]:
def classify(network: Network):
    lastLayer = network.totalLayers-1
    maxIndex = np.argmax(network.activations[lastLayer])
    return maxIndex, network.activations[lastLayer][maxIndex]

# Getting the MNIST Dataset
There are several ways of getting the MNIST Dataset, I used the keras library just because it seemed the easiest, but feel free to choose what you prefer.
If you dont really want to look for an alternative way, use the following commands to install the keras library (keep in mid that the tensorflow library needed is fairly large):
* !pip install tensorflow
* !pip install keras.

Using keras, our data will already be split, between training and testing. If you did it any other way, just make sure that roughly 25% of the images goes to the test set

In [111]:
from keras.datasets import mnist
(trainX, trainY), (testX, testY) = mnist.load_data()

# Passing images through the network
Note that it may take a few minutes, depending on your hardware

In [112]:
network = Network(28*28, 1, 30, 10)
hits = 0
misses = 0
nTrainingImages = trainX.shape[0]
#for all training images
for currentImage in range(nTrainingImages):
    #passing the inputs to our network
    for i in range(28):
        for j in range(28):
            """ In order to go through all input neurons, we have to transform both loops 
            in just one index, we do that by using the formula 28*i + j"""
            network.activations[0][28*i + j] = trainX[currentImage][i][j]
    network = feedforward(network)
    numberNetworkThinks, certainty  = classify(network) 
    if numberNetworkThinks == trainY[currentImage]:
        hits += 1
    else:
        misses += 1
    #print(f"Network Classified as : {numberNetworkThinks}, certainty: {certainty}, real number: {trainY[currentImage]}")
acc = hits/misses
print(acc)

  sigNumber = 1/(1 + np.exp(-number))


0.1111111111111111


# Congratulations!
You know have a fully functional, guess generator. Since our network does not have any learning, it's output is no better than a random a guess; in turn, ou accuracy reflects that line of thought (roughly, our network guessed 10% of tests righ). In the next notebook, we will implement learning, and our network will finaly be of use!