# Artificial Neural Network (Optional)

Neural networks are parallel computing systems that are, at heart, an attempt to build a computer model of the brain. The main objective is to develop a system that performs certain computational tasks faster than traditional systems. These tasks include pattern recognition and classification, approximation, optimization, and data clustering.

## What is Artificial Neural Network?

An Artificial Neural Network (ANN) is an efficient computing system whose central theme is borrowed from the analogy of biological neural networks. ANNs are also called “artificial neural systems,” “parallel distributed processing systems,” or “connectionist systems.” An ANN consists of a large collection of units that are interconnected in some pattern to allow communication between them. These units, also referred to as nodes or neurons, are simple processors that operate in parallel.

Every neuron is connected to other neurons through connection links. Each link has an associated weight that carries information about the input signal. This is the most useful information for neurons to solve a particular problem, because the weight usually excites or inhibits the signal being communicated. Each neuron has an internal state, called its activation signal. Output signals, produced by combining the input signals with the activation rule, may be sent on to other units.

### A Brief History of ANN

- 1940s - The beginning of Neural Networks (Electronic Brain)
- 1950s and 1960s - The first golden age of Neural Networks (Perceptron)
- 1970s - The winter of Neural Networks (XOR problem)
- 1980s - Renewed enthusiasm (Multilayered Perceptron, backpropagation)
- 1990s - Subfield of Radial Basis Function Networks was developed
- 2000s - The power of Neural Networks Ensembles & Support Vector Machines is apparent
- 2006 - Hinton presents the Deep Belief Network (DBN)
- 2009 - Deep Recurrent Neural Network
- 2010 - Convolutional Deep Belief Network (CDBN)
- 2011 - Max-Pooling CDBN

<img src='./images/ann_history.jpg' width="800">

[image source] http://qingkaikong.blogspot.com/2016/11/machine-learning-3-artificial-neural.html

## Biological Neurons: An Overly Simplified Illustration

<img src='./images/neuron_1.png' width="500">

[image source] https://towardsdatascience.com/mcculloch-pitts-model-5fdf65ac5dd1

- **Dendrite**: Receives signals from other neurons

- **Soma**: Processes the information

- **Axon**: Transmits the output of this neuron

- **Synapse**: Point of connection to other neurons

Basically, a neuron takes an input signal (dendrite), processes it like a CPU (soma), and passes the output through a cable-like structure to other connected neurons (axon to synapse to the other neuron’s dendrite). This is biologically simplistic, as there is a lot more going on, but at a high level this is what a neuron in our brain does: it takes an input, processes it, and produces an output.

Our sense organs interact with the outer world and send visual and sound information to the neurons. Let's say you are watching Friends. The information your brain receives is taken in by the “laugh or not” set of neurons that help you decide whether to laugh. Each neuron fires (is activated) only when its respective criterion (more on this later) is met, as shown below.

<img src='./images/neuron_2.png' width="1000">

[image source] https://towardsdatascience.com/mcculloch-pitts-model-5fdf65ac5dd1

## McCulloch-Pitts Neuron

The first computational model of a neuron was proposed by Warren McCulloch (neuroscientist) and Walter Pitts (logician) in 1943.

<img src='./images/m_pitts_neuron.png' width="500">
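
As a rough sketch of the idea (the text above does not spell out the model, so this uses the commonly described form): a McCulloch-Pitts unit takes binary inputs, adds them up, and fires (outputs 1) when the sum reaches a threshold. The function name `mp_neuron` and the threshold values are illustrative assumptions.

```python
def mp_neuron(inputs, threshold):
    """Fire (return 1) when the number of active binary inputs reaches the threshold."""
    return 1 if sum(inputs) >= threshold else 0

# With two binary inputs, threshold 2 behaves like AND and threshold 1 like OR.
# Input order: (0, 0), (0, 1), (1, 0), (1, 1)
and_outputs = [mp_neuron([a, b], threshold=2) for a in (0, 1) for b in (0, 1)]
or_outputs = [mp_neuron([a, b], threshold=1) for a in (0, 1) for b in (0, 1)]
print(and_outputs)  # [0, 0, 0, 1]
print(or_outputs)   # [0, 1, 1, 1]
```

Unlike the neuron in the next example, this unit has no learnable weights or smooth activation; the threshold alone decides whether it fires.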

### A Simple Example

<img src='./images/perceptron.png' width="500">
[image source] https://victorzhou.com/blog/intro-to-neural-networks/

3 things are happening here. First, each input is multiplied by a weight: 

$x_1 \rightarrow x_1 \cdot w_1$

$x_2 \rightarrow x_2 \cdot w_2$

Next, all the weighted inputs are added together with a bias $b$:

$(x_1 \cdot w_1) + (x_2 \cdot w_2) + b$

Finally, the sum is passed through an activation function:

$y = f(x_1 \cdot w_1 + x_2 \cdot w_2 + b)$

The activation function is used to turn an unbounded input into an output that has a nice, predictable form. A commonly used activation function is the sigmoid function:

<img src='./images/sigmoid.png' width="500">
[image source] https://victorzhou.com/blog/intro-to-neural-networks/

The sigmoid function only outputs numbers in the range (0, 1). You can think of it as compressing (−∞, +∞) to (0, 1): big negative numbers become ~0, and big positive numbers become ~1.
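
To make the squashing behaviour concrete, here is a small sketch that evaluates the sigmoid at a few sample points. The sample values in `xs` are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x))
    return 1 / (1 + np.exp(-x))

xs = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])  # arbitrary sample inputs
ys = sigmoid(xs)

for x, y in zip(xs, ys):
    print(f"sigmoid({x:+.0f}) = {y:.5f}")
```

The outputs stay strictly between 0 and 1, increase with the input, and equal exactly 0.5 at an input of 0.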

Assume we have a 2-input neuron that uses the sigmoid activation function and has the following parameters:

$w = [0, 1]$

$b = 4$

$w = [0, 1]$ is just a way of writing w1=0, w2=1 in vector form. Now, let’s give the neuron an input of x=[2,3]. We’ll use the dot product to write things more concisely:


$(w \cdot x) + b = ((w_1 \cdot x_1) + (w_2 \cdot x_2)) + b$

$= 0 \cdot 2 + 1 \cdot 3 + 4$

$= 7$

$y = f(w \cdot x + b) = f(7) = 0.999$

The neuron outputs 0.999 given the inputs x=[2,3]. 

That’s it! This process of passing inputs forward to get an output is known as **feedforward**.

## Coding a Neuron

Time to implement a neuron! We’ll use NumPy, a popular and powerful computing library for Python, to help us do math:

In [37]:
import numpy as np

def sigmoid(x):
    # Our activation function: f(x) = 1 / (1 + e^(-x))
    return 1 / (1 + np.exp(-x))

class Neuron:
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def feedforward(self, inputs):
        # Weight inputs, add bias, then use the activation function
        total = np.dot(self.weights, inputs) + self.bias
        return sigmoid(total)

weights = np.array([0, 1]) # w1 = 0, w2 = 1
bias = 4                   # b = 4
n = Neuron(weights, bias)

x = np.array([2, 3])       # x1 = 2, x2 = 3
print(n.feedforward(x))    # 0.9990889488055994

0.9990889488055994


Recognize those numbers? That’s the example we just did! We get the same answer of 0.999.

## Combining Neurons into a Neural Network

A neural network is nothing more than a bunch of neurons connected together. Here’s what a simple neural network might look like:

<img src='./images/network.png' width="500">
[image source] https://victorzhou.com/blog/intro-to-neural-networks/

This network has 2 inputs, a hidden layer with 2 neurons (h1 and h2), and an output layer with 1 neuron (o1). Notice that the inputs for o1 are the outputs from h1 and h2 - that’s what makes this a network.

A **hidden layer** is any layer between the input (first) layer and output (last) layer. **There can be multiple hidden layers!** 

The more recently coined term **Deep Learning** refers to neural networks with many hidden layers, sometimes tens or even hundreds.

### An Example: Feedforward

Let’s use the network pictured above and assume all neurons have the same weights w=[0,1], the same bias b=0, and the same sigmoid activation function. Let h1,h2,o1 denote the outputs of the neurons they represent.

What happens if we pass in the input x=[2,3]?

$h_1 = h_2 = f(w \cdot x + b)$

$= f((0 \cdot 2) + (1 \cdot 3) + 0)$

$= f(3)$

$= 0.9526$

$o_1 = f(w \cdot [h_1, h_2] + b)$

$= f((0 \cdot h_1) + (1 \cdot h_2) + 0)$

$= f(0.9526)$

$= 0.7216$
    
    
The output of the neural network for input x=[2,3] is 0.7216. Pretty simple, right?
     

A neural network can have any number of layers with any number of neurons in those layers. The basic idea stays the same: feed the input(s) forward through the neurons in the network to get the output(s) at the end. For simplicity, we’ll keep using the network pictured above for the rest of this post.
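
The "any number of layers" idea can be sketched by storing one weight matrix and one bias vector per layer and looping over them. This is only an illustrative sketch: the standalone `feedforward` function and the `(W, b)` layer representation are assumptions, not the class-based code used in the rest of this notebook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def feedforward(x, layers):
    """Pass x through a list of (W, b) layers, applying sigmoid after each one."""
    a = np.asarray(x, dtype=float)
    for W, b in layers:
        # Each row of W holds the weights of one neuron in this layer.
        a = sigmoid(W @ a + b)
    return a

# Same network as above: every neuron uses w = [0, 1] and b = 0.
hidden = (np.array([[0.0, 1.0], [0.0, 1.0]]), np.zeros(2))  # h1, h2
output = (np.array([[0.0, 1.0]]), np.zeros(1))              # o1

out = feedforward([2, 3], [hidden, output])
print(out)  # approximately [0.7216]
```

Adding another hidden layer would only mean appending another `(W, b)` pair to the list, as long as the matrix shapes line up.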

### Coding a Neural Network: Feedforward

In [38]:
class OurNeuralNetwork:
    '''
    A neural network with:
    - 2 inputs
    - a hidden layer with 2 neurons (h1, h2)
    - an output layer with 1 neuron (o1)
    Each neuron has the same weights and bias:
    - w = [0, 1]
    - b = 0
    '''
    def __init__(self):
        weights = np.array([0, 1])
        bias = 0

        # The Neuron class here is from the previous section
        self.h1 = Neuron(weights, bias)
        self.h2 = Neuron(weights, bias)
        self.o1 = Neuron(weights, bias)

    def feedforward(self, x):
        out_h1 = self.h1.feedforward(x)
        out_h2 = self.h2.feedforward(x)

        # The inputs for o1 are the outputs from h1 and h2
        out_o1 = self.o1.feedforward(np.array([out_h1, out_h2]))

        return out_o1

network = OurNeuralNetwork()
x = np.array([2, 3])
print(network.feedforward(x)) # 0.7216325609518421

0.7216325609518421


We got 0.7216 again! Looks like it works.

### Training a neural network Part 1

<img src='./images/gender.png' width="800">
[image source] https://victorzhou.com/blog/intro-to-neural-networks/

Let’s train our network to predict someone’s gender given their weight and height:

<img src='./images/network2.png' width="500">
[image source] https://victorzhou.com/blog/intro-to-neural-networks/

We’ll represent Male with a 0 and Female with a 1, and we’ll also shift the data to make it easier to use:

<img src='./images/table2.png' width="800">
[image source] https://victorzhou.com/blog/intro-to-neural-networks/

*I arbitrarily chose the shift amounts (135 and 66) to make the numbers look nice. Normally, you’d shift by the mean.*
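
As a sketch of the two shifting options, the snippet below applies both the fixed shifts (135 and 66) and the column means. The raw weight and height values are reconstructed by adding the shifts back to the shifted dataset used in the training code later in this notebook, so treat them as an assumption about the table above.

```python
import numpy as np

# Raw (weight in lb, height in in), reconstructed from the shifted values used later.
raw = np.array([
    [133, 65],  # Alice
    [160, 72],  # Bob
    [152, 70],  # Charlie
    [120, 60],  # Diana
], dtype=float)

shifted_fixed = raw - np.array([135, 66])  # the arbitrary shifts used in this example
column_means = raw.mean(axis=0)            # per-column mean: weight, height
shifted_mean = raw - column_means          # the more usual choice

print(shifted_fixed)
print(column_means)   # [141.25  66.75]
print(shifted_mean)
```

After mean-shifting, each column averages to zero, which is the property that normally makes this choice convenient.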

### Loss

Before we train our network, we first need a way to quantify how “good” it’s doing so that it can try to do “better”. That’s what the loss is.

We’ll use the **mean squared error (MSE)** loss:

$MSE = \frac{1}{n}\sum_{i=1}^n(y_{true} - y_{pred})^2$

Let’s break this down:

- n is the number of samples, which is 4 (Alice, Bob, Charlie, Diana).
- y represents the variable being predicted, which is Gender.
- $y_{true}$ is the true value of the variable (the “correct answer”). For example, $y_{true}$ for Alice would be 1 (Female).

- $y_{pred}$ is the predicted value of the variable. It’s whatever our network outputs.


$(y_{true} - y_{pred})^2$ is known as the **squared error**. Our loss function is simply taking the average over all squared errors (hence the name mean squared error). The better our predictions are, the lower our loss will be!

Better predictions = Lower loss.

**Training a network = trying to minimize its loss.**

### An Example Loss Calculation 

Let’s say our network always outputs 0 - in other words, it’s confident all humans are Male 🤔. What would our loss be?

<img src='./images/table3.png' width="800">
[image source] https://victorzhou.com/blog/intro-to-neural-networks/

$MSE = \frac{1}{4}(1 + 0 + 0 + 1) = 0.5$

### Code: MSE Loss

Here’s some code to calculate loss for us:

In [3]:
import numpy as np

def mse_loss(y_true, y_pred):
    # y_true and y_pred are numpy arrays of the same length.
    return ((y_true - y_pred) ** 2).mean()

y_true = np.array([1, 0, 0, 1])
y_pred = np.array([0, 0, 0, 0])

print(mse_loss(y_true, y_pred)) # 0.5

0.5


### Training a neural network Part 2  (Math!!!)

**We now have a clear goal: minimize the loss of the neural network**. We know we can change the network’s weights and biases to influence its predictions, but how do we do so in a way that decreases loss?

For simplicity, let’s pretend we only have Alice in our dataset:

<img src='./images/table_alice.png' width="800">
[image source] https://victorzhou.com/blog/intro-to-neural-networks/

Then the mean squared error loss is just Alice’s squared error:

$MSE = \frac{1}{1}\sum_{i=1}^1(y_{true} - y_{pred})^2$

$= (y_{true} - y_{pred})^2$

$= (1 - y_{pred})^2$

Let’s label each weight and bias in our network:

<img src='./images/network3.png' width="500">
[image source] https://victorzhou.com/blog/intro-to-neural-networks/

Then, we can write loss as a multivariable function:

$L(w_1, w_2, w_3, w_4, w_5, w_6, b_1, b_2, b_3)$

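In previous lectures we introduced **gradient descent**: compute the partial derivative of the loss with respect to each weight and bias, then nudge each parameter in the direction that decreases the loss. Computing those partial derivatives by working backwards through the network with the chain rule is known as **backpropagation**. The code below does exactly this for our small network.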

more info : https://victorzhou.com/blog/intro-to-neural-networks/
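
As a worked sketch of one such partial derivative, take $w_1$, writing $f$ for the sigmoid and $\eta$ for the learning rate (`learn_rate` in the code that follows; the symbol $\eta$ is introduced here only for illustration). Since $w_1$ affects the loss only through $h_1$, the chain rule gives:

$$\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial y_{pred}} \cdot \frac{\partial y_{pred}}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_1}$$

With $L = (y_{true} - y_{pred})^2$, $y_{pred} = f(w_5 h_1 + w_6 h_2 + b_3)$ and $h_1 = f(w_1 x_1 + w_2 x_2 + b_1)$, the three factors are:

$$\frac{\partial L}{\partial y_{pred}} = -2\,(y_{true} - y_{pred})$$

$$\frac{\partial y_{pred}}{\partial h_1} = w_5 \cdot f'(w_5 h_1 + w_6 h_2 + b_3)$$

$$\frac{\partial h_1}{\partial w_1} = x_1 \cdot f'(w_1 x_1 + w_2 x_2 + b_1)$$

where the sigmoid derivative is $f'(x) = f(x)\,(1 - f(x))$. The gradient descent update is then:

$$w_1 \leftarrow w_1 - \eta \, \frac{\partial L}{\partial w_1}$$

The other weights and biases follow the same pattern, with the last factor swapped for the derivative with respect to the parameter in question.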

In [40]:
def sigmoid(x):
    # Sigmoid activation function: f(x) = 1 / (1 + e^(-x))
    return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
    # Derivative of sigmoid: f'(x) = f(x) * (1 - f(x))
    fx = sigmoid(x)
    return fx * (1 - fx)

def mse_loss(y_true, y_pred):
    # y_true and y_pred are numpy arrays of the same length.
    return ((y_true - y_pred) ** 2).mean()

class OurNeuralNetwork:
    '''
    A neural network with:
    - 2 inputs
    - a hidden layer with 2 neurons (h1, h2)
    - an output layer with 1 neuron (o1)

    *** DISCLAIMER ***:
    The code below is intended to be simple and educational, NOT optimal.
    Real neural net code looks nothing like this. DO NOT use this code.
    Instead, read/run it to understand how this specific network works.
    '''
    def __init__(self):
        # Weights
        self.w1 = np.random.normal()
        self.w2 = np.random.normal()
        self.w3 = np.random.normal()
        self.w4 = np.random.normal()
        self.w5 = np.random.normal()
        self.w6 = np.random.normal()

        # Biases
        self.b1 = np.random.normal()
        self.b2 = np.random.normal()
        self.b3 = np.random.normal()

    def feedforward(self, x):
        # x is a numpy array with 2 elements.
        h1 = sigmoid(self.w1 * x[0] + self.w2 * x[1] + self.b1)
        h2 = sigmoid(self.w3 * x[0] + self.w4 * x[1] + self.b2)
        o1 = sigmoid(self.w5 * h1 + self.w6 * h2 + self.b3)
        return o1

    def train(self, data, all_y_trues):
        '''
        - data is a (n x 2) numpy array, n = # of samples in the dataset.
        - all_y_trues is a numpy array with n elements.
          Elements in all_y_trues correspond to those in data.
        '''
        learn_rate = 0.1
        epochs = 1000 # number of times to loop through the entire dataset

        for epoch in range(epochs):
            for x, y_true in zip(data, all_y_trues):
                # --- Do a feedforward (we'll need these values later)
                sum_h1 = self.w1 * x[0] + self.w2 * x[1] + self.b1
                h1 = sigmoid(sum_h1)

                sum_h2 = self.w3 * x[0] + self.w4 * x[1] + self.b2
                h2 = sigmoid(sum_h2)

                sum_o1 = self.w5 * h1 + self.w6 * h2 + self.b3
                o1 = sigmoid(sum_o1)
                y_pred = o1

                # --- Calculate partial derivatives.
                # --- Naming: d_L_d_w1 represents "partial L / partial w1"
                d_L_d_ypred = -2 * (y_true - y_pred)

                # Neuron o1
                d_ypred_d_w5 = h1 * deriv_sigmoid(sum_o1)
                d_ypred_d_w6 = h2 * deriv_sigmoid(sum_o1)
                d_ypred_d_b3 = deriv_sigmoid(sum_o1)

                d_ypred_d_h1 = self.w5 * deriv_sigmoid(sum_o1)
                d_ypred_d_h2 = self.w6 * deriv_sigmoid(sum_o1)

                # Neuron h1
                d_h1_d_w1 = x[0] * deriv_sigmoid(sum_h1)
                d_h1_d_w2 = x[1] * deriv_sigmoid(sum_h1)
                d_h1_d_b1 = deriv_sigmoid(sum_h1)

                # Neuron h2
                d_h2_d_w3 = x[0] * deriv_sigmoid(sum_h2)
                d_h2_d_w4 = x[1] * deriv_sigmoid(sum_h2)
                d_h2_d_b2 = deriv_sigmoid(sum_h2)

                # --- Update weights and biases
                # Neuron h1
                self.w1 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w1
                self.w2 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w2
                self.b1 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_b1

                # Neuron h2
                self.w3 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_w3
                self.w4 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_w4
                self.b2 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_b2

                # Neuron o1
                self.w5 -= learn_rate * d_L_d_ypred * d_ypred_d_w5
                self.w6 -= learn_rate * d_L_d_ypred * d_ypred_d_w6
                self.b3 -= learn_rate * d_L_d_ypred * d_ypred_d_b3

            # --- Calculate total loss at the end of each epoch
            if epoch % 10 == 0:
                y_preds = np.apply_along_axis(self.feedforward, 1, data)
                loss = mse_loss(all_y_trues, y_preds)
                print("Epoch %d loss: %.3f" % (epoch, loss))

# Define dataset
data = np.array([
  [-2, -1],  # Alice
  [25, 6],   # Bob
  [17, 4],   # Charlie
  [-15, -6], # Diana
])

all_y_trues = np.array([
  1, # Alice
  0, # Bob
  0, # Charlie
  1, # Diana
])

# Train our neural network!
network = OurNeuralNetwork()
network.train(data, all_y_trues)

Epoch 0 loss: 0.336
Epoch 10 loss: 0.208
Epoch 20 loss: 0.132
Epoch 30 loss: 0.090
Epoch 40 loss: 0.066
Epoch 50 loss: 0.051
Epoch 60 loss: 0.041
Epoch 70 loss: 0.034
Epoch 80 loss: 0.028
Epoch 90 loss: 0.025
Epoch 100 loss: 0.022
Epoch 110 loss: 0.019
Epoch 120 loss: 0.017
Epoch 130 loss: 0.016
Epoch 140 loss: 0.014
Epoch 150 loss: 0.013
Epoch 160 loss: 0.012
Epoch 170 loss: 0.011
Epoch 180 loss: 0.011
Epoch 190 loss: 0.010
Epoch 200 loss: 0.009
Epoch 210 loss: 0.009
Epoch 220 loss: 0.008
Epoch 230 loss: 0.008
Epoch 240 loss: 0.007
Epoch 250 loss: 0.007
Epoch 260 loss: 0.007
Epoch 270 loss: 0.007
Epoch 280 loss: 0.006
Epoch 290 loss: 0.006
Epoch 300 loss: 0.006
Epoch 310 loss: 0.006
Epoch 320 loss: 0.005
Epoch 330 loss: 0.005
Epoch 340 loss: 0.005
Epoch 350 loss: 0.005
Epoch 360 loss: 0.005
Epoch 370 loss: 0.005
Epoch 380 loss: 0.004
Epoch 390 loss: 0.004
Epoch 400 loss: 0.004
Epoch 410 loss: 0.004
Epoch 420 loss: 0.004
Epoch 430 loss: 0.004
Epoch 440 loss: 0.004
Epoch 450 loss: 0.004

<img src='./images/loss.png' width="500">
[image source] https://victorzhou.com/blog/intro-to-neural-networks/
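
One way to sanity-check hand-derived partial derivatives like those in `train` is to compare one of them against a finite-difference estimate. The sketch below does this for the derivative of the loss with respect to `w1` on a single sample; the helper names `loss` and `analytic_d_L_d_w1`, the seed, and the step size `eps` are assumptions made for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
    fx = sigmoid(x)
    return fx * (1 - fx)

def loss(params, x, y_true):
    """Squared error of the 2-2-1 network for one sample; params = [w1..w6, b1, b2, b3]."""
    w1, w2, w3, w4, w5, w6, b1, b2, b3 = params
    h1 = sigmoid(w1 * x[0] + w2 * x[1] + b1)
    h2 = sigmoid(w3 * x[0] + w4 * x[1] + b2)
    y_pred = sigmoid(w5 * h1 + w6 * h2 + b3)
    return (y_true - y_pred) ** 2

def analytic_d_L_d_w1(params, x, y_true):
    """Chain-rule derivative of the loss w.r.t. w1, using the same factors as train()."""
    w1, w2, w3, w4, w5, w6, b1, b2, b3 = params
    sum_h1 = w1 * x[0] + w2 * x[1] + b1
    h1 = sigmoid(sum_h1)
    h2 = sigmoid(w3 * x[0] + w4 * x[1] + b2)
    sum_o1 = w5 * h1 + w6 * h2 + b3
    y_pred = sigmoid(sum_o1)
    d_L_d_ypred = -2 * (y_true - y_pred)
    d_ypred_d_h1 = w5 * deriv_sigmoid(sum_o1)
    d_h1_d_w1 = x[0] * deriv_sigmoid(sum_h1)
    return d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w1

rng = np.random.default_rng(0)           # fixed seed so the check is repeatable
params = rng.normal(size=9)
x, y_true = np.array([-2.0, -1.0]), 1.0  # Alice

# Central finite difference: nudge w1 (index 0) up and down by eps.
eps = 1e-6
up, down = params.copy(), params.copy()
up[0] += eps
down[0] -= eps
numeric = (loss(up, x, y_true) - loss(down, x, y_true)) / (2 * eps)
analytic = analytic_d_L_d_w1(params, x, y_true)

print(numeric, analytic)  # the two values should agree to several decimal places
```

If the two numbers disagreed noticeably, that would point to a mistake in one of the chain-rule factors.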

In [41]:
# Make some predictions
emily = np.array([-7, -3]) # 128 pounds, 63 inches
frank = np.array([20, 2])  # 155 pounds, 68 inches
print("Emily: %.3f" % network.feedforward(emily)) # close to 1 - F (exact value varies with the random initialization)
print("Frank: %.3f" % network.feedforward(frank)) # close to 0 - M

Emily: 0.964
Frank: 0.039
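
To turn these outputs into a class label, one simple convention is to threshold at 0.5, since Male is encoded as 0 and Female as 1. The helper `to_label` and the 0.5 cutoff are assumptions for illustration, not part of the original code.

```python
def to_label(prob, threshold=0.5):
    """Map a network output in (0, 1) to 'F' (encoded as 1) or 'M' (encoded as 0)."""
    return "F" if prob >= threshold else "M"

print(to_label(0.964))  # F
print(to_label(0.039))  # M
```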


In [None]:
# TensorFlow/Keras (Google)
# PyTorch   (Facebook)