# Neural Networks and Deep Learning

## 1. History

**Where do intelligence and learning come from?** \
To mimic human intelligence, researchers started by looking at the neuronal circuitry of the brain. The human brain has close to 90 billion neurons, interconnected in neuronal pathways. These cells differ from the other cells in the body.

These cells, called neurons, look like this:

| Image of human neuron | Image of neurons in macaque |
| --- | --- |
|![Photo by Bob Jacobs, Colorado College.](./resources/golgi2.jpg)| ![Photo by BrainMaps.org](./resources/smi32-macaque.jpg) |
| A picture of an actual human neuron (a neocortical pyramidal neuron, stained using the Golgi technique). <br/> [Photo by Bob Jacobs, Colorado College.](https://en.wikipedia.org/wiki/Golgi%27s_method#/media/File:GolgiStainedPyramidalCell.jpg) <br/> Golgi's method stains a limited number of cells at random using a silver chromate reaction. It is still not known why one cell undergoes the reaction while a cell right next to it does not. As a result, the morphology of the stained cells can be seen clearly, without contamination from the dendrites of nearby cells. Read more at [1](https://cellularscale.blogspot.com/2012/03/seeing-cells-nissl-and-golgi-together.html), [2](https://embryo.asu.edu/pages/neuron-doctrine-1860-1895). | SMI32-immunoreactive pyramidal neuron in the medial prefrontal cortex of a macaque. <br/> [Photo by BrainMaps.org](https://brainmaps.org/index.php?p=screenshots)|

The structure of a neuron is the following: \
![Photo by BruceBlaus](./resources/blausen_multipolarneuron.png) \
The neuron is essentially a computation unit which performs the following operations: 
1. **Input reception**: The neuron receives input from other neurons through the synapses located on its dendrites. These inputs can be excitatory (increasing the likelihood of the neuron firing) or inhibitory (decreasing that likelihood).
2. **Signal integration**: The cell body integrates these incoming inputs and computes a response.
3. **Signal transmission**: If this response exceeds a certain threshold, the neuron generates an electrical signal called an action potential, which travels along its axon.
4. **Output release**: When the action potential reaches the end of the axon, it triggers the release of neurotransmitters into the synaptic gap. These neurotransmitters then bind to the receptors on the dendrites of adjacent neurons, thereby transmitting the signal.

It is important to note that this is a simplification. The neurons used in neural networks today are modelled after this simplified picture, which neuroscientists of the 1950s were already familiar with. Even so, neural networks perform really well.

**How to model an artificial neuron after a biological neuron?** \
An artificial neuron should perform these four operations as well. \
![Photo by ashishbhatti](./resources/artificial_neuron.png)

An artificial neuron does the following:
1. It receives input from multiple other neurons, say $n$ of them.
2. It multiplies each input by a weight, increasing or decreasing that input's significance.
   (This mimics the excitatory / inhibitory effect.)
3. It sums these weighted inputs and adds a bias, also known as a threshold.
   $$z = \sum_{i=1}^{n} w_i x_i + b$$
4. The result is passed to an activation function, which produces an output.
   $$y = f(z)$$
5. The output is released to other neurons or as the overall output.
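
The operations listed above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `sigmoid` and `neuron_output` and the sample values are assumptions chosen for the example, and the logistic function is just one possible choice for $f$.

```python
import math

def sigmoid(z):
    """Logistic activation: maps a real z into (0, 1).

    Kept simple for readability; math.exp can overflow for very
    large negative z, which is fine for the small values used here.
    """
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Compute y = f(sum_i w_i * x_i + b) for one artificial neuron."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Weight each input, sum the results, and add the bias.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Pass the weighted sum through the activation function.
    return activation(z)

# Illustrative values only (not taken from the text).
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
b = 0.75
y = neuron_output(x, w, b)
print(y)
```

Passing a different `activation` (for example `lambda z: z`) changes only the last step; the weighted sum stays the same.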

**The weights are learned during the training process.**

Note that if you use the logistic function (sigmoid function) as the activation function, the above neuron is exactly the same as logistic regression. In fact, the logistic regression model, or rather its generalization to multiclass classification, called the softmax regression model, is a standard unit in a neural network.

## 2. Perceptron

A neural network with a single neuron is called a perceptron. In this section we will develop the mathematical model of a single neuron and we will implement it in code.

**Mathematics of a single neuron**
$$z = \sum_{i=1}^{n} w_ix_i  + b$$
$$y = f(z)$$

The above equations define a neuron that performs two operations, where:
- $x_i \in \mathbb{R}$ are the inputs at the different synapses.
- $w_i \in \mathbb{R}$ are the synaptic weights.
- $b \in \mathbb{R}$ is the neuron's bias or threshold.
- $z = w_1x_1 + w_2x_2 + w_3x_3 + \dots + w_nx_n + b$ is the weighted sum of the inputs plus the bias, computed at the neuron's body. Note $z \in \mathbb{R}$.
- $f : \mathbb{R} \to \mathbb{R}$ is the activation function for the axon, which takes $z$ as input. Its output $y$ is the output at the axon terminal.

In vector form we can write the above equations as:
$$y = f(z) = f(\mathbf{w} \cdot \mathbf{x} + b)$$
where $\mathbf{w} \in \mathbb{R}^n$ is the synaptic weight vector, $\mathbf{x} \in \mathbb{R}^n$ is the input vector, and $\cdot$ is the dot product.
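
A small worked example may help make the vector form concrete. The numbers below are made up for illustration (three inputs, so $n = 3$), and the logistic function is assumed as the activation only for the final step.

$$\mathbf{w} = (2, -1, 0.5), \quad \mathbf{x} = (1, 3, 4), \quad b = -0.5$$
$$z = \mathbf{w} \cdot \mathbf{x} + b = (2)(1) + (-1)(3) + (0.5)(4) - 0.5 = 0.5$$
$$y = f(z) = \frac{1}{1 + e^{-0.5}} \approx 0.62$$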

**What is an activation function?** \
The activation function is a mathematical function that takes the weighted sum as input and computes a response from it. It mimics the way a biological neuron generates an action potential once a threshold is exceeded.

In principle, any function can serve as the activation function. Some popular ones are:

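| S.No | Activation Function | Description |
| --- | --- | --- |
|1| Identity | This is when there is no activation function. <br/> Nothing is done to the weighted input; it is passed on as the output unchanged. |
|2| Binary Step | The output of this function is either 0 or 1. <br/> With this, our neuron can be used as an ON-OFF switch. |
|3| Logistic (aka Sigmoid or Soft Step) | Restricts the output to the interval (0, 1). |
|4| tanh | Hyperbolic tangent. Restricts the output to the interval (-1, 1). |
|5| ReLU | Rectified Linear Unit. <br/> Outputs 0 for negative input and passes positive input through unchanged: $f(z) = \max(0, z)$. |
|6| Gaussian | Bell-shaped curve, $f(z) = e^{-z^2}$. The output lies in (0, 1] and peaks at $z = 0$. |
|7| Softmax | Acts on a vector of values rather than a single one, turning them into positive numbers that sum to 1. |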
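
The functions in the table can be written out as short Python functions. This is a sketch using the common textbook definitions; the function names are illustrative, and the conventions used (the step function switching at $z = 0$, the Gaussian written as $e^{-z^2}$) are assumptions, since other variants exist.

```python
import math

def identity(z):
    return z

def binary_step(z):
    # Assumed convention: 1 when z >= 0, otherwise 0.
    return 1.0 if z >= 0 else 0.0

def logistic(z):
    # Two branches so that math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def tanh(z):
    return math.tanh(z)

def relu(z):
    return max(0.0, z)

def gaussian(z):
    # Assumed form exp(-z^2); scaled variants are also in use.
    return math.exp(-z * z)

def softmax(zs):
    # Softmax takes a list of values, not a single number.
    # Subtracting the maximum first avoids overflow in math.exp.
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

for f in (identity, binary_step, logistic, tanh, relu, gaussian):
    print(f.__name__, [f(z) for z in (-2.0, 0.0, 2.0)])
print("softmax", softmax([1.0, 2.0, 3.0]))
```

All of these except softmax map one real number to one real number, matching $f : \mathbb{R} \to \mathbb{R}$; softmax is the odd one out because it normalizes across several values at once.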



**How useful is a single neuron?** \
This is similar to asking how useful a single logic gate is. A single logic gate can perform only a simple calculation, but many gates arranged in a network become a computer.

Our neuron takes inputs and produces an output, and that is all. If the weights and bias are set to suitable values for the problem at hand, the output can be useful; otherwise it is no more useful than a random number.
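
To make the logic-gate analogy concrete, here is a small sketch in which a two-input neuron with a binary step activation behaves like an AND gate or an OR gate, depending only on its weights and bias. The parameter values are hand-picked for illustration and the names `step`, `neuron`, `AND_PARAMS`, and `OR_PARAMS` are hypothetical.

```python
def step(z):
    # Binary step activation (assumed convention: fire when z >= 0).
    return 1 if z >= 0 else 0

def neuron(x1, x2, w1, w2, b):
    # Weighted sum of two inputs plus bias, then the step activation.
    return step(w1 * x1 + w2 * x2 + b)

# Hand-picked parameters: the bias decides how many inputs must be on.
AND_PARAMS = dict(w1=1.0, w2=1.0, b=-1.5)  # fires only if both inputs are 1
OR_PARAMS = dict(w1=1.0, w2=1.0, b=-0.5)   # fires if at least one input is 1

print("x1 x2 AND OR")
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, neuron(x1, x2, **AND_PARAMS), neuron(x1, x2, **OR_PARAMS))
```

The same neuron with arbitrary parameters would still produce 0s and 1s, but they would not correspond to any useful gate.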

Thus, to make a neuron useful, we need to train it. To train our neuron we need a separate trainer program that can:
- evaluate the neuron's performance throughout training
- improve the neuron's performance when it makes mistakes
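
As a rough sketch of what such a trainer could look like, the code below pairs an evaluation function with an update loop. The text does not specify a training rule at this point, so the classic perceptron learning rule is used here as an assumption; the names `predict`, `accuracy`, and `train`, the AND dataset, and the default settings are all illustrative.

```python
def step(z):
    return 1 if z >= 0 else 0

def predict(weights, bias, x):
    # Output of a single neuron with a binary step activation.
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)

def accuracy(weights, bias, data):
    """Evaluate: fraction of examples the neuron classifies correctly."""
    correct = sum(predict(weights, bias, x) == target for x, target in data)
    return correct / len(data)

def train(data, n_inputs, learning_rate=1.0, max_epochs=100):
    """Improve: adjust weights and bias whenever the neuron makes a mistake.

    Uses the perceptron learning rule (an assumption, not from the text).
    """
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in data:
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            if error != 0:
                mistakes += 1
                # Nudge each weight and the bias toward the correct answer.
                weights = [w + learning_rate * error * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if mistakes == 0:
            break  # every example is classified correctly
    return weights, bias

# Toy dataset: the AND function as (inputs, target) pairs.
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(and_data, n_inputs=2)
print(weights, bias, accuracy(weights, bias, and_data))
```

On this tiny, linearly separable dataset the loop stops as soon as an epoch passes without mistakes. For data that a single neuron cannot separate, it would simply run until `max_epochs`.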

**How to train a single neuron?**