<h1><center>Lecture 11 Introduction to Artificial Neural Networks (ANN)</center></h1>

<h3> References </h3>

1. https://theclevermachine.wordpress.com/2014/09/6/
2. https://en.wikipedia.org/wiki/Activation_function
3. https://www.geeksforgeeks.org/activation-functions-neural-networks/
4. https://medium.com/the-theory-of-everything/understanding-activation-functions-in-neural-networks-9491262884e0


<h2>11.1 What is an Artificial Neural Network?</h2>


Artificial Neural Networks (ANNs) are computational models inspired by the human brain. Many recent advances in Artificial Intelligence, including voice recognition, image recognition and robotics, have been made using ANNs. ANNs are biologically inspired simulations, run on a computer, that perform specific tasks such as:

1. Clustering
2. Classification
3. Pattern Recognition

In general, an Artificial Neural Network is a biologically inspired network of artificial neurons configured to perform specific tasks.



<h2>11.2 Parts of Neuron and their Functions</h2>

A typical nerve cell (neuron) of the human brain consists of four parts:

1. **Dendrites**. They receive signals from other neurons.
2. **Soma (cell body)**. It sums all the incoming signals.
3. **Axon**. When the sum reaches a threshold value, the neuron fires and the signal travels down the axon to other neurons.
4. **Synapses**. The points where one neuron connects with other neurons. The amount of signal transmitted depends on the strength (synaptic weight) of each connection. Connections can be inhibitory (decreasing the signal) or excitatory (increasing it).

So the brain's neural network is, in general, a highly interconnected network of billions of neurons with trillions of interconnections between them.

```python
from IPython.display import Image
Image(filename="D:/home/Machine Learning Course/images/Structure-Of-Neurons-In-Brain.jpg")
```

**Figure 1.** *Structure of a biological neuron*

<h2>11.3  Comparison Between Artificial and Biological Neural Networks </h2>

The dendrites of a biological neuron are analogous to the weighted inputs of an artificial neuron, where the weights play the role of the synaptic interconnection strengths.

The cell body is comparable to the artificial neuron unit in the ANN, which also consists of a summation unit and a threshold unit.

The axon carries the output, which is analogous to the output unit of the ANN. Thus, ANNs are modeled on the working of basic biological neurons.


```python
from IPython.display import Image
Image(filename="D:/home/Machine Learning Course/images/Analogy_of_Biological_Network_with_Artificial_Neural_Network.jpg")
```

**Figure 2.** *Comparison between artificial and biological neural networks.*

<h2><span class="header-section-number">11.4</span> How does an artificial neuron work? </h2>

Figure 3 shows how an artificial neuron works as part of an ANN. The ANN receives information from the external world as a pattern in vector form, say $x=(x_1,\ldots,x_n)^t$. In this case the input layer has $n$ artificial neurons.


Each component $x_i$ of the input is multiplied by a corresponding weight $w_{i}$. The weights are the information the neural network uses to solve a problem, and they must be learned (fitted) in the training step. The weights represent the knowledge that the ANN has about the problem to be solved. In the biological metaphor, the weights represent the strength of the interconnections between neurons inside the network.

The weighted inputs are summed inside the computing unit (the artificial neuron), and a bias $b$ is added, as Figure 3 shows.

The sum is a real number, $z = \sum_i x_iw_i + b \in \mathcal{R}$. This sum is transformed by an activation function $g(\cdot)$ to obtain the net output $x^* = g(z)$. The activation function determines the behavior of the neuron (see the discussion of non-linear activation functions below).
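The computation of a single neuron can be sketched in a few lines of NumPy. The logistic sigmoid used for $g$ is only an example choice (an assumption, since the text does not fix $g$), and the input, weights and bias are made-up values chosen so that $z$ is exactly zero.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + exp(-z)); an example choice of g (assumption)."""
    return 1.0 / (1.0 + np.exp(-z))


def neuron(x, w, b, g=sigmoid):
    """Single artificial neuron: weighted sum of the inputs plus a bias, then activation."""
    z = np.dot(x, w) + b  # z = sum_i x_i w_i + b, a real number
    return g(z)           # net output x* = g(z)


x = np.array([1.0, 2.0, 3.0])     # input pattern (n = 3), illustrative values
w = np.array([0.5, -0.25, 0.25])  # one weight per input component
b = -0.75                         # bias

print(neuron(x, w, b))  # z = 0.75 - 0.75 = 0, so this prints 0.5
```

Passing a different function as `g` changes the behavior of the neuron without touching the weighted sum.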

```python
from IPython.display import Image
Image(filename="D:/home/Machine Learning Course/images/Working-Of-Artificial-Neuron.jpg")
```

**Figure 3.** *Conceptual structure of an artificial neuron in an ANN*.


<h2><span class="header-section-number">11.5</span> Neural Network Architecture Types </h2>

1. **Single-layer Perceptron Model**. A neural network with two input units and one output unit, and no hidden layers. These networks are also known as single-layer perceptrons.
2. **Radial Basis Function Neural Network**. These networks are similar to feed-forward neural networks, except that radial basis functions are used as the activation functions of the neurons. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters. Radial basis function networks have many uses, including function approximation, time series prediction, classification, and system control.
3. **Multilayer Perceptron Neural Network**. Unlike the single-layer perceptron, these networks have one or more hidden layers of neurons. They are also known as Deep Feedforward Neural Networks.
4. **Recurrent Neural Network**. A type of neural network in which hidden-layer neurons have self-connections, so recurrent networks possess memory. At each time step, a hidden-layer neuron receives activation from the layer below as well as its own previous activation value.
5. **Long Short-Term Memory Neural Network (LSTM)**. A type of recurrent neural network in which memory cells are incorporated into the hidden-layer neurons.
6. **Hopfield Network**. A fully interconnected network in which each neuron is connected to every other neuron. The network is trained on an input pattern by setting the values of the neurons to the desired pattern, and then its weights are computed. The weights are not changed afterwards; this is what distinguishes it from the other networks described here. Once trained on one or more patterns, the network converges to the learned patterns. The units in Hopfield nets are binary threshold units, i.e. they take only two different values for their states, and the value is determined by whether or not the unit's input exceeds its threshold. Hopfield nets normally have units that take values 1 or -1. This model uses the concept of associative memory, and it has been used to model how human memory works.
7. **Boltzmann Machine Neural Network**. These networks are similar to the Hopfield network, except that some neurons are input (visible) units while others are hidden. The weights are initialized randomly and learned with a local, Hebbian-style learning rule rather than backpropagation. Boltzmann machines are theoretically intriguing because of the locality and Hebbian nature of their training algorithm, and because of their parallelism and the resemblance of their dynamics to simple physical processes. Boltzmann machines with unconstrained connectivity have not proven useful for practical problems in machine learning or inference, but if the connectivity is properly constrained, learning can be made efficient enough to be useful for practical problems.
8. **Convolutional Neural Network**. CNNs are regularized versions of multilayer perceptrons. Multilayer perceptrons usually mean fully connected networks, that is, each neuron in one layer is connected to all neurons in the next layer. This full connectivity makes them prone to overfitting the data. Convolutional networks were inspired by biological processes, in that the connectivity pattern between neurons resembles the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap so that together they cover the entire visual field.
9. **Modular Neural Network**. A combination of different types of neural networks, such as multilayer perceptrons, Hopfield networks and recurrent neural networks, each incorporated as a module that performs an independent subtask of the complete network.
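The Hopfield description above (weights computed once from the stored patterns, binary threshold units with values 1 or -1, convergence to a learned pattern) can be sketched in NumPy. The Hebbian outer-product rule used to compute the weights and the threshold of 0 are standard textbook choices but are assumptions here, since the text does not specify them; the stored patterns are made-up examples.

```python
import numpy as np


def hopfield_weights(patterns):
    """Weights computed once from +/-1 patterns (Hebbian outer-product rule, assumed)."""
    p = np.asarray(patterns, dtype=float)
    n = p.shape[1]
    W = p.T @ p / n           # sum of the outer products of the stored patterns
    np.fill_diagonal(W, 0.0)  # no self-connections
    return W


def recall(W, state, max_sweeps=10):
    """Asynchronous updates of binary threshold units taking values +1 or -1."""
    s = np.array(state, dtype=float)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            new = 1.0 if W[i] @ s >= 0 else -1.0  # threshold 0 (assumption)
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:  # no unit changed: the network has converged
            break
    return s


stored = np.array([[1, -1, 1, -1, 1, -1],
                   [1, 1, 1, -1, -1, -1]])
W = hopfield_weights(stored)            # weights are fixed from here on
noisy = [-1, -1, 1, -1, 1, -1]          # first pattern with its first unit flipped
print(recall(W, noisy))                 # converges back to the first stored pattern
```

Note that training is a single matrix computation; all later "work" happens during recall, as the network settles into one of the stored patterns.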

```python
from IPython.display import Image
Image(filename="D:/home/Machine Learning Course/images/Popular-Neural-Network-Architecture.jpg")
```

**Figure 4.** *ANN architecture types*

<h2><span class="header-section-number">11.6</span> Multilayer perceptron: Deep learning ANN </h2>

The multilayer perceptron is currently one of the most widely used architectures. Deep learning networks (DNNs) are multilayer perceptrons with one or more hidden layers; the depth of the network is determined by the number of hidden layers.

Neural networks can be viewed as weighted directed graphs in which artificial neurons are nodes, and weighted directed edges are connections between neuron outputs and neuron inputs. Figure 5 shows a multilayer perceptron with two hidden layers. ANNs of this kind are also known as Deep Feedforward Neural Networks.

```python
from IPython.display import Image
Image(filename="D:/home/Machine Learning Course/images/Artificial-Neural-Network-Architecture.jpg")
```

**Figure 5.** *Architecture of a multilayer perceptron with two hidden layers*.

<h3><span class="header-section-number">11.7</span> Mathematical Approach of a DNN  </h3>

The first hidden layer of an ANN can act as a dimension reduction. In general, however, it is more practical to reduce the data beforehand. So, from this point on we assume that the training data have already been reduced (if a reduction is necessary).

In this section we consider only one hidden layer.
It is assumed that:

1. The input layer has $n$ neurons. So the input values are $n$-vectors.
2. The hidden layer has $q$ neurons. This implies that there are $q$ connections from each input neuron to the hidden layer, so in total there are $n\times q$ connections between the input layer and the hidden layer. Each connection has a weight $w^{1}_{ij}$, representing the strength of the connection between neuron $i$ in the input layer and neuron $j$ in the hidden layer.
3. The output layer has $L$ neurons. This implies that there are $L$ connections from each hidden neuron to the output layer, so in total there are $q\times L$ connections between the hidden layer and the output layer. Each connection has a weight $w^{2}_{jk}$, representing the strength of the connection between neuron $j$ in the hidden layer and neuron $k$ in the output layer.
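As a quick check of these counts, the sketch below creates weight matrices with the stated shapes in NumPy. The sizes $n=4$, $q=3$, $L=2$ and the random initialization are arbitrary choices for illustration only.

```python
import numpy as np

n, q, L = 4, 3, 2  # input, hidden and output layer sizes (example values)
rng = np.random.default_rng(0)

# W1[i, j] plays the role of w^1_{ij}: input neuron i -> hidden neuron j
W1 = rng.normal(size=(n, q))
# W2[j, k] plays the role of w^2_{jk}: hidden neuron j -> output neuron k
W2 = rng.normal(size=(q, L))

print(W1.size, W2.size)   # n*q = 12 and q*L = 6 connections
print(W1.size + W2.size)  # 18 weights in total (biases not counted)
```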


$\leadsto$ **Vector notation**. *For convenience, we write vectors as rows, so that each training example is a row of the data matrix. Column notation is also common, particularly in statistics*.


<h2> Mathematical modeling of a DNN  with a hidden layer</h2>


<h3> From the input layer to the hidden layer</h3>


Let $X_{N\times n}$ be the matrix of the $N$ input training examples, one example per row.

<h4> Affine transformation of the data</h4>

Let $W^{1}$ be the $n\times q$ matrix whose entry $(i,j)$ is the weight $w^{1}_{ij}$, which conceptually connects neuron $i$ of the input layer with neuron $j$ of the hidden layer. Let $b^{1}=(b^1_1,\ldots, b^1_j,\ldots,b^1_q)$ be the $q$-vector of the corresponding biases, so that $b^1_j$ is the bias of neuron $j$ in the hidden layer.


Given an input vector $x$ (a row of the matrix $X$), the complete input to the hidden layer is obtained as


$$
z^{1} = xW^{1} + b^{1} \quad (1)
$$

$\leadsto$ Recall that we assume $n$ neurons in the input layer. If $q<n$, $W^{1}$ maps the input onto a space of reduced dimension. If $q>n$ (and $W^1$ has full rank), $W^{1}$ embeds the input in a space of higher dimension.
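Equation (1) is written for a single row $x$ of $X$. The NumPy sketch below (arbitrary sizes and random values) applies it to the whole data matrix at once, $Z^1 = XW^1 + b^1$, where $b^1$ is added to every row by broadcasting, and checks that each row matches equation (1).

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, q = 5, 4, 3             # examples, input neurons, hidden neurons (example sizes)
X = rng.normal(size=(N, n))   # training data, one example per row
W1 = rng.normal(size=(n, q))  # input-to-hidden weights
b1 = rng.normal(size=q)       # hidden-layer biases

Z1 = X @ W1 + b1              # b1 is broadcast to every row: shape (N, q)

# Row r of Z1 is equation (1) applied to the input vector x = X[r]
for r in range(N):
    assert np.allclose(Z1[r], X[r] @ W1 + b1)
print(Z1.shape)               # (5, 3)
```

The row-vector convention is what makes this batch form natural: stacking inputs as rows of $X$ stacks the outputs as rows of $Z^1$.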

Equation (1) is an **affine transformation**, which can be expressed in homogeneous coordinates as follows.

The homogeneous coordinates are obtained as $\tilde{x} = (x,1)$ and $\tilde{z}^1 = (z^1,1)$. Let $\tilde{W}^1$ be defined as

$$
\tilde{W}^1 = \begin{pmatrix} W^1 & 0\\ b^1 & 1\end{pmatrix}
$$

where $0$ denotes the $n\times 1$ column of zeros. Then

$$
\tilde{z}^1 = \tilde{x}\tilde{W}^1.
$$

This equation means that an affine transformation can be expressed as a linear transformation in homogeneous coordinates. For details, see the next lecture on affine transformations.
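The homogeneous-coordinates identity $\tilde{z}^1 = \tilde{x}\tilde{W}^1$ can be checked numerically. This sketch builds $\tilde{W}^1$ with `np.block` from random example values of $W^1$ and $b^1$ and compares a single matrix product against the affine map of equation (1).

```python
import numpy as np

rng = np.random.default_rng(2)
n, q = 4, 3                   # example sizes
x = rng.normal(size=n)        # one input vector (row format)
W1 = rng.normal(size=(n, q))
b1 = rng.normal(size=q)

# (n+1) x (q+1) matrix [[W1, 0], [b1, 1]], with 0 an n x 1 column of zeros
W1_tilde = np.block([
    [W1, np.zeros((n, 1))],
    [b1.reshape(1, q), np.ones((1, 1))],
])

x_tilde = np.append(x, 1.0)   # homogeneous coordinates (x, 1)
z_tilde = x_tilde @ W1_tilde  # a single (linear) matrix product

z1 = x @ W1 + b1              # the affine map of equation (1)
assert np.allclose(z_tilde, np.append(z1, 1.0))  # z_tilde = (z1, 1)
print(z_tilde[-1])            # the last coordinate stays 1.0
```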


<h4>Activation of the neurons in the hidden layer</h4>


Let $f^1$ be the activation function of the hidden layer; $f^1$ is applied to each element of $z^1$. Writing $x^1 = (x_1^1,\ldots,x_j^1,\ldots, x_q^1)$, the effect of the activation function is

$$
x^1 = f^1(z^1),
$$
where $x^1_j = f^1(z^1_j)$, for $j=1,\ldots,q$.


<h3> From the hidden layer to the output layer</h3>

<h4> Affine transformation of the data</h4>


Let $W^{2}$ be the $q\times L$ matrix whose entry $(j,k)$ is the weight $w^{2}_{jk}$, which conceptually connects neuron $j$ of the hidden layer with neuron $k$ of the output layer. Let $b^{2}$ be the $L$-vector of the corresponding biases.

$\leadsto$ When a DNN is applied to a classification problem, $L$ usually corresponds to the number of classes.


Given the output vector $x^1$ of the hidden layer, the complete input to the output layer is obtained as


$$
z^{2} = x^1W^{2} + b^{2}. \quad (2)
$$

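In homogeneous coordinates, with $\tilde{x}^1 = (x^1,1)$, $\tilde{z}^2 = (z^2,1)$ and $\tilde{W}^2$ built from $W^2$ and $b^2$ in the same way as $\tilde{W}^1$, we have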

$$
\tilde{z}^2 = \tilde{x}^1\tilde{W}^2.
$$

<h4>Activation of the neurons in the output layer</h4>


Let $f^2$ be the activation function of the output layer; $f^2$ is applied to each element of $z^2$. Writing $y = (y_1,\ldots,y_k,\ldots, y_L)$, the effect of the activation function is

$$
y = f^2(z^2),
$$
where $y_k = f^2(z^2_k)$, for $k=1,\ldots,L$.
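Putting the two stages together gives the forward pass of a network with one hidden layer. In this sketch, $f^1 = \tanh$ and $f^2$ = logistic sigmoid are assumptions made only for the example (the text leaves both activation functions unspecified), and the sizes and weights are arbitrary.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, used here as an example output activation (assumption)."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, W1, b1, W2, b2, f1=np.tanh, f2=sigmoid):
    """One-hidden-layer forward pass in row-vector notation."""
    z1 = x @ W1 + b1   # equation (1): input to the hidden layer
    x1 = f1(z1)        # hidden activations, x^1_j = f^1(z^1_j)
    z2 = x1 @ W2 + b2  # equation (2): input to the output layer
    return f2(z2)      # outputs, y_k = f^2(z^2_k)


rng = np.random.default_rng(3)
n, q, L = 4, 3, 2      # example layer sizes
W1, b1 = rng.normal(size=(n, q)), rng.normal(size=q)
W2, b2 = rng.normal(size=(q, L)), rng.normal(size=L)

x = rng.normal(size=n)
y = forward(x, W1, b1, W2, b2)
print(y.shape)         # (2,): one value per output neuron
```

Because every step uses row vectors, the same function also accepts a whole data matrix $X$ of shape $(N, n)$ and returns an $(N, L)$ array.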



<h2> Why do we need non-linear activation functions? </h2>

A neural network without non-linear activation functions is essentially just a linear regression model. The activation function applies a non-linear transformation to its input, which makes the network capable of learning and performing more complex tasks.

To see this, suppose that $\tilde{G}^1$ and $\tilde{G}^2$ represent the matrices associated with linear activation functions in homogeneous coordinates. Then

$$
\tilde{y} = \tilde{x}\tilde{W}^1\tilde{G}^1\tilde{W}^2\tilde{G}^2 = \tilde{x}\tilde{W},
$$
where $\tilde{W}=\tilde{W}^1\tilde{G}^1\tilde{W}^2\tilde{G}^2$.

$\leadsto$ Thus, in this case the DNN reduces to a single linear map in homogeneous coordinates (an affine map in the original coordinates), which is not very useful in practice.
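The collapse can also be checked numerically. In this sketch the linear activations are taken to be elementwise maps $f(z) = az + c$ (an illustrative assumption; the constants are arbitrary), and the two-layer network is compared with a single affine map $y = xW + b$ whose $W$ and $b$ follow from expanding the composition by hand.

```python
import numpy as np

rng = np.random.default_rng(4)
n, q, L = 4, 3, 2
W1, b1 = rng.normal(size=(n, q)), rng.normal(size=q)
W2, b2 = rng.normal(size=(q, L)), rng.normal(size=L)
a1, c1 = 2.0, 0.5    # assumed linear hidden activation f1(z) = a1*z + c1
a2, c2 = -1.0, 0.1   # assumed linear output activation f2(z) = a2*z + c2


def two_layer(x):
    """Network with one hidden layer and linear activations."""
    x1 = a1 * (x @ W1 + b1) + c1
    return a2 * (x1 @ W2 + b2) + c2


# Expanding the composition gives one equivalent affine map y = x W + b
W = a1 * a2 * (W1 @ W2)
b = a2 * ((a1 * b1 + c1) @ W2 + b2) + c2

X = rng.normal(size=(10, n))                 # ten test inputs, one per row
print(np.allclose(two_layer(X), X @ W + b))  # True
```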


<h2> A DNN is a vector-valued function</h2>

From equations (1) and (2) and the activation steps above, a DNN with one hidden layer is a function $f:\mathcal{R}^n \to \mathcal{R}^L$ defined as

$$
y = f(x)  = f^2(f^1( x W^1 + b^1 ) W^2 + b^2).
$$

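$\leadsto$ As can be seen, if the DNN has more than one hidden layer, the function $f$ extends directly by recursion: with $x^0 = x$, each layer computes $x^{l} = f^{l}(x^{l-1}W^{l} + b^{l})$, and the output $y$ is the value produced by the last layer.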
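Extending $f$ to several hidden layers amounts to iterating $x^{l} = f^{l}(x^{l-1}W^{l} + b^{l})$ starting from $x^0 = x$, which is a simple loop over the layers. The layer sizes, the tanh hidden activations and the identity output activation below are example choices, not part of the text.

```python
import numpy as np


def mlp_forward(x, layers):
    """Apply x^l = f^l(x^{l-1} W^l + b^l) for each (W, b, f) in order, with x^0 = x."""
    for W, b, f in layers:
        x = f(x @ W + b)
    return x


def identity(z):
    return z


rng = np.random.default_rng(5)
sizes = [4, 8, 8, 2]  # example: n = 4, two hidden layers of 8 neurons, L = 2 outputs

layers = []
for l, (m, k) in enumerate(zip(sizes[:-1], sizes[1:])):
    f = identity if l == len(sizes) - 2 else np.tanh  # tanh hidden, identity output (assumption)
    layers.append((rng.normal(size=(m, k)), rng.normal(size=k), f))

x = rng.normal(size=4)
print(mlp_forward(x, layers).shape)  # (2,)
```

With a list of two layers, `mlp_forward` computes exactly $f^2(f^1(xW^1 + b^1)W^2 + b^2)$; adding entries to the list adds hidden layers.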