### An Introduction to Neural Networks

What makes people smarter than computers?

If you had asked this question in the 1980s, you might have been told that humans can perceive objects in natural scenes and note their relationships, understand language, retrieve contextually appropriate information from memory, make plans, carry out contextually appropriate actions, and perform a wide range of other natural cognitive tasks.

Today, though, you might get a different answer, as computers are increasingly able to perform tasks once thought to be the sole domain of humans. Computers can now dig through all your photos and tell you which ones have your face in them², understand natural language and hold a conversation, translate between languages, make art, drive you from one point to another through busy city streets, and play games well enough to defeat our greatest champions, among a host of other tasks. With each passing day the capabilities of computers keep expanding, and the set of things that are strictly the province of humans keeps shrinking.

The cause of this Cambrian explosion in cognitive capabilities can be traced to an invention that changed how computers process information: the perceptron, the simplest learning machine.

#### The Perceptron

The ideas behind the perceptron date back to 1943, when Warren McCulloch, a neuropsychiatrist, and Walter Pitts, a mathematician, proposed a simplified mathematical model of the neuron as a result of their research into the behaviour of neurons in the brain.
 
Unlike biological neurons, which have complicated features and complex firing patterns and timings, the perceptron is a simple model of a neuron that thresholds a weighted sum of its inputs to produce an output.
 
It can classify an input into one of two categories and adjust the weights assigned to its inputs depending on whether its prediction was correct. When this is done repeatedly, the perceptron slowly learns the correct classifications and makes more accurate predictions.

The simplicity of the model made it possible to implement on a digital computer, which Frank Rosenblatt did in 1957.

The first implementation of the perceptron was a program that ran on the IBM 704, a computer the size of a room. The computer was tasked with distinguishing between two kinds of punch cards. At first it could not tell them apart, but after 50 trials it had taught itself to distinguish cards marked with a square on the left from cards marked on the right. Rosenblatt built on this feat with the Mark I, a visual pattern classifier implemented in custom-built hardware and able to handle even more complex patterns.

The perceptron mimics the computational architecture the brain uses to process information and complete tasks that require the simultaneous consideration of many pieces of information or constraints.

#### Understanding Perceptrons

The perceptron is composed of:
1. Input vector
2. Weight vector
3. Neuron function
4. Output

To model the behaviour of a biological neuron, the perceptron performs two consecutive functions: it calculates the weighted sum of the inputs to represent the total strength of the input signals, and it applies a step function to the result to determine whether to fire. It outputs `1` if the signal exceeds a certain threshold, or `0` if it doesn’t.

Not all input features are equally useful or important. To represent that, each input node is assigned a weight value, called its connection weight, that reflects its importance.

Inputs assigned greater weight have a greater effect on the output. If the weight is high, it amplifies the input signal; and if the weight is low, it diminishes the input signal. In common representations of neural networks, the weights are represented by lines or edges from the input node to the perceptron.
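As a rough illustration of how a connection weight amplifies or diminishes its input, the sketch below nudges each of two inputs by the same amount and compares how far the weighted sum moves. The helper `weighted_sum` and the weight values are made up for this example.

```python
def weighted_sum(inputs, weights):
    """Sum of each input multiplied by its connection weight (no bias here)."""
    return sum(x * w for x, w in zip(inputs, weights))

# Hypothetical weights: the first input matters a lot, the second very little
weights = [2.0, 0.25]

baseline = weighted_sum([1.0, 1.0], weights)
bump_first = weighted_sum([2.0, 1.0], weights)   # raise the heavily weighted input by 1
bump_second = weighted_sum([1.0, 2.0], weights)  # raise the lightly weighted input by 1

print(bump_first - baseline)   # 2.0
print(bump_second - baseline)  # 0.25
```

The same change of 1 in an input moves the sum by exactly that input's weight, which is why a high weight makes an input more influential.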

##### Weighted Sum Function

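Also known as a linear combination, the weighted sum function is the sum of all inputs multiplied by their weights, plus a bias term. This function produces a straight line, represented by the following equation, where $x_i$ are the inputs, $w_i$ are their weights, and $b$ is the bias:

$$z = \sum_{i=1}^{n} x_i w_i + b = x_1 w_1 + x_2 w_2 + \dots + x_n w_n + b$$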

```python
inputs = [1, 2, 3]
weights = [0.2, 0.8, -0.5]
bias = 1

# Multiply each input by its weight, sum the products, then add the bias
output = (inputs[0] * weights[0] +
          inputs[1] * weights[1] +
          inputs[2] * weights[2] + bias)

print(output)
```

```
1.3
```


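The same weighted sum can be computed with NumPy's dot product:

```python
import numpy as np

inputs = np.array([1, 2, 3])
weights = np.array([0.2, 0.8, -0.5])
bias = 1

# The dot product multiplies matching elements and sums them in one call
output = np.dot(weights, inputs) + bias

print(output)
```

```
1.3
```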


#### What is a bias in the perceptron, and why do we add it?

A straight line is represented by the equation y = mx + b, where m is the slope and b is the y-intercept. To define a line, you need two things: its slope and a point on it. The bias is that point: the place where the line crosses the y-axis. The bias lets you move the line up and down the y-axis to better fit the prediction to the data. Without the bias (b), the line always has to pass through the origin (0, 0), and you will get a poorer fit.
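A minimal sketch of why the bias matters, assuming a one-input neuron that fires when `weight * x + bias` is positive. The function name `fires`, the sample points, and the chosen weight and bias are illustrative assumptions, not values from a trained model.

```python
def fires(x, weight, bias):
    """Return 1 if the weighted input plus the bias is positive, else 0."""
    return 1 if weight * x + bias > 0 else 0

# Goal for this toy example: 1.0 should map to 0 and 3.0 should map to 1
points = [1.0, 3.0]

no_bias = [fires(x, weight=1.0, bias=0.0) for x in points]
with_bias = [fires(x, weight=1.0, bias=-2.0) for x in points]

print(no_bias)    # [1, 1]
print(with_bias)  # [0, 1]
```

Without a bias the decision boundary is pinned at x = 0, so no single weight can separate 1.0 from 3.0: `weight * 1.0` and `weight * 3.0` always have the same sign. A bias of -2.0 moves the boundary to x = 2, which puts the two points on opposite sides.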

The input layer can be given biases by introducing an extra input node that always has a value of 1, as you can see in the next figure. In neural networks, the value of the bias (b) is treated as an extra weight and is learned and adjusted by the neuron to minimize the cost function.
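The extra-input trick can be sketched as follows, reusing the example inputs and weights from the weighted sum section. The variable names `inputs_aug` and `weights_aug` are assumptions made for this illustration.

```python
import numpy as np

inputs = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.8, -0.5])
bias = 1.0

# Usual form: weighted sum plus a separate bias term
explicit = np.dot(weights, inputs) + bias

# Folded form: append a constant input of 1 and treat the bias as its weight
inputs_aug = np.append(inputs, 1.0)
weights_aug = np.append(weights, bias)
folded = np.dot(weights_aug, inputs_aug)

print(np.isclose(explicit, folded))  # True
```

Because the bias is now just another entry in the weight vector, whatever rule adjusts the weights adjusts the bias as well.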

#### Step Activation Function


In both artificial and biological neural networks, a neuron does not simply pass along the raw input it receives. There is one more step, called the activation function, which acts as the neuron's decision-making unit. In artificial neural networks (ANNs), the activation function takes the weighted sum computed earlier and activates (fires) the neuron if that sum is higher than a certain threshold.

The simplest activation function, and the one used by the perceptron algorithm, is the step function, which produces a binary output (0 or 1). If the summed input is greater than 0, the neuron “fires” (output = 1); otherwise (summed input ≤ 0), it doesn’t fire (output = 0).

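```python
def step_function(weighted_sum):
    """Return 1 (fire) if the weighted sum is above 0, otherwise 0."""
    if weighted_sum <= 0:
        return 0
    else:
        return 1

output = 1.3  # weighted sum computed in the previous section
prediction = step_function(output)

print(prediction)
```

```
1
```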
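Putting the two stages together, a full forward pass might look like the sketch below. The function name `predict` and the example weights and bias are assumptions chosen so that the arithmetic is easy to follow. The neuron fires only when the weighted sum is strictly above 0, matching the `step_function` code above.

```python
import numpy as np

def predict(inputs, weights, bias):
    """Weighted sum followed by a step activation (fires when the sum is > 0)."""
    z = np.dot(weights, inputs) + bias
    return 1 if z > 0 else 0

# Hypothetical parameters for a two-input perceptron
weights = np.array([0.5, -1.0])
bias = 0.25

fired = predict(np.array([1.0, 0.0]), weights, bias)   # z = 0.5 + 0.25 = 0.75
silent = predict(np.array([0.0, 1.0]), weights, bias)  # z = -1.0 + 0.25 = -0.75

print(fired)   # 1
print(silent)  # 0
```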


#### How does the perceptron learn?

The perceptron uses trial and error to learn from its mistakes. It treats the weights as knobs, tuning their values up and down until the network is trained.

The perceptron’s learning logic goes like this:
1. The neuron calculates the weighted sum and applies the activation function to make a prediction. This is called the feedforward process.
2. It compares the output prediction with the correct label to calculate the error: `error = label - prediction`
3. It then updates the weights. If the prediction is too high, it adjusts the weights to make a lower prediction the next time, and vice versa.
4. Repeat!

This process is repeated many times, and the neuron continues to update the weights to improve its predictions until step 2 produces a very small error (close to zero), which means the neuron’s prediction is very close to the correct value. At this point, we can stop the training and save the weight values that yielded the best results to apply to future cases where the outcome is unknown.
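The learning loop described above can be sketched as follows. The text does not spell out the exact update, so this example assumes the classic perceptron rule, in which each weight moves by `learning_rate * error * input`. The function names, the learning rate, the epoch limit, and the logical AND toy dataset are all illustrative choices.

```python
import numpy as np

def step(z):
    """Step activation: fire (1) when the weighted sum is above 0."""
    return 1 if z > 0 else 0

def train_perceptron(X, y, learning_rate=0.1, max_epochs=100):
    """Train with the classic perceptron update rule (an assumption of this sketch)."""
    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(max_epochs):
        total_error = 0
        for x_i, label in zip(X, y):
            # 1. Feedforward: weighted sum, then activation
            prediction = step(np.dot(weights, x_i) + bias)
            # 2. Compare the prediction with the correct label
            error = label - prediction
            # 3. Nudge the weights and bias in the direction that reduces the error
            weights += learning_rate * error * x_i
            bias += learning_rate * error
            total_error += abs(error)
        # 4. Repeat until a full pass over the data makes no mistakes
        if total_error == 0:
            break
    return weights, bias

# Toy dataset: logical AND, which a single perceptron can separate
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

weights, bias = train_perceptron(X, y)
predictions = [step(np.dot(weights, x_i) + bias) for x_i in X]
print(predictions)  # [0, 0, 0, 1]
```

When the prediction is too high the error is -1, so the weights of the active inputs shrink. When it is too low the error is +1 and they grow. A correct prediction leaves everything unchanged.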