# Neural Networks

![Real NN](https://upload.wikimedia.org/wikipedia/commons/thumb/b/be/Forest_of_synthetic_pyramidal_dendrites_grown_using_Cajal%27s_laws_of_neuronal_branching.png/440px-Forest_of_synthetic_pyramidal_dendrites_grown_using_Cajal%27s_laws_of_neuronal_branching.png)

## Introduction

Neural networks, often referred to as artificial neural networks (ANNs) or simply neural nets, are computational models inspired by the structure and functionality of the human brain. They are used in computer science and artificial intelligence (AI) to enable machines to learn from data and perform tasks such as pattern recognition, classification, decision making, and prediction. Neural networks have been instrumental in the success of deep learning, a subfield of machine learning.

A neural network consists of interconnected nodes or artificial neurons, organized into layers. There are typically three types of layers in a neural network:

* Input layer: This is the layer that receives input data. Each input node corresponds to a feature of the input data.
* Hidden layer(s): These are the layers between the input and output layers. They perform the bulk of the computations in the network. There can be multiple hidden layers in a deep neural network, which allows the network to learn complex, hierarchical features.
* Output layer: This is the final layer that produces the result or prediction. The number of output nodes depends on the task being performed, such as the number of classes in a classification problem.

![ANN](https://upload.wikimedia.org/wikipedia/commons/thumb/4/46/Colored_neural_network.svg/500px-Colored_neural_network.svg.png)

The neurons in each layer are connected to the neurons in the adjacent layers via weighted connections. During the training process, the neural network adjusts these weights to minimize the error between its predictions and the actual target values. This is typically done using a technique called backpropagation, which involves computing gradients of the error with respect to the weights and updating the weights accordingly.

Activation functions play a crucial role in neural networks, introducing non-linearity into the network and enabling it to learn complex patterns.

## History

The history of neural networks can be traced back to several key milestones and developments. Here's a brief overview of how neural networks evolved over time:

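* Early 1900s: Charles Scott Sherrington and, later, Edgar Adrian studied how nerve cells (neurons) function and how they communicate with each other across junctions called synapses, a term Sherrington introduced. They shared the 1932 Nobel Prize in Physiology or Medicine for their discoveries regarding the functions of neurons.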

![Rats](https://neurosciencenews.com/files/2019/04/emotional-mirror-neurons-rats-neuroscienews-public.jpg)

Source: [The Nobel Prize in Physiology or Medicine 1932](https://www.nobelprize.org/prizes/medicine/1932/summary/)

* 1943: Warren McCulloch and Walter Pitts introduced the concept of artificial neurons in their paper "A Logical Calculus of the Ideas Immanent in Nervous Activity." They proposed a simplified computational model of biological neurons, which laid the groundwork for future research in neural networks.

* 1957-1958: Frank Rosenblatt developed the Perceptron, the first neural network model with learning capability. The Perceptron is a simple, single-layer neural network that can learn linearly separable patterns. Although it had limited capabilities, it was a significant milestone in neural network research.

* 1969: Marvin Minsky and Seymour Papert published their book "Perceptrons," which highlighted the limitations of single-layer neural networks and hampered further research on neural networks for a while. They showed that simple Perceptrons were unable to solve some basic problems, such as the XOR problem.

* 1970s to mid-1980s: Funding for and interest in neural network research declined, a period often described as an "AI winter", largely because of the limitations of early models.

* 1986: The backpropagation algorithm was popularized by David Rumelhart, Geoffrey Hinton, and Ronald Williams ([paper in Nature](https://www.nature.com/articles/323533a0)). Backpropagation enabled the training of multi-layer neural networks, overcoming some of the limitations highlighted by Minsky and Papert. This led to a resurgence in neural network research and marked the beginning of the "connectionist" era.

* 1982-1997: Recurrent neural networks (RNNs) and Long Short-Term Memory (LSTM) networks were introduced. John Hopfield described a recurrent network in 1982, and others developed RNNs further to process sequences of data, which is crucial for tasks like speech and language processing. LSTM, introduced by Sepp Hochreiter and Jürgen Schmidhuber in 1997, addressed the vanishing gradient problem in RNNs, enabling them to learn long-range dependencies more effectively.

* 1998: Yann LeCun and his colleagues introduced the LeNet-5 convolutional neural network (CNN) architecture for digit recognition. This was an important milestone for computer vision and laid the foundation for more complex CNN architectures.

* 2006-2012: The "Deep Learning" era began, marked by breakthroughs in training deep neural networks. In 2006, Geoffrey Hinton and his colleagues introduced the concept of "greedy layer-wise pretraining," which made it possible to train deeper networks. In 2012, Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton developed the AlexNet architecture, which significantly outperformed other models in the ImageNet Large Scale Visual Recognition Challenge, highlighting the potential of deep learning.

* 2013 onwards: Neural networks started dominating various AI and machine learning tasks. Researchers introduced advanced architectures such as the Transformer (Vaswani et al., 2017), which revolutionized natural language processing and led to the development of models like BERT, GPT, and T5. These models have achieved state-of-the-art performance across a wide range of NLP tasks, such as machine translation, sentiment analysis, and question answering.

* 2018-2021: Pre-trained language models based on the Transformer architecture, such as BERT, GPT-2, GPT-3, and other variants, have become increasingly powerful and ubiquitous. These models leverage massive amounts of data and computational resources to achieve remarkable performance in various NLP tasks, even with little or no fine-tuning.

Throughout this timeline, advances in hardware, especially GPUs (Graphics Processing Units), have been crucial in enabling the training and deployment of increasingly large and complex neural network models. In addition, the development of software libraries and frameworks, such as TensorFlow, PyTorch, and Keras, has made it easier for researchers and practitioners to experiment with and implement neural networks.

As a result of these advances, neural networks have become the backbone of many AI applications, from computer vision and natural language processing to reinforcement learning and robotics. The field continues to evolve rapidly, with ongoing research into new architectures, optimization techniques, and applications.

## Main ideas behind neural networks

Neural network algorithms are inspired by the structure and functioning of the human brain. They consist of interconnected artificial neurons organized into layers, which work together to learn from data and perform tasks like pattern recognition, classification, and prediction. The main ideas behind neural network algorithms include:

* Artificial neurons: The basic building block of neural networks, artificial neurons, or nodes, are designed to mimic biological neurons. They receive input from other neurons (or external data), process it, apply an activation function to introduce non-linearity, and transmit the output to the next layer of neurons.

* Network architecture: Neural networks are organized into layers, typically consisting of an input layer, one or more hidden layers, and an output layer. Different architectures, such as feedforward networks, recurrent networks, and convolutional networks, can be designed to address specific problems or data types.

* Connection weights: The connections between neurons in adjacent layers have associated weights, which determine the strength of the connections. During the learning process, these weights are adjusted to minimize the error between the network's predictions and the actual target values.

* Activation functions: To introduce non-linearity into the network and enable it to learn complex patterns, activation functions are applied to the output of each neuron. 
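The four ideas above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names (`neuron`, `dense_layer`), the 2-3-1 layer sizes, and every weight value are assumptions chosen for the example, and the sigmoid activation is only one possible choice.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """One artificial neuron: weighted sum of inputs plus bias, then activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def dense_layer(inputs, weight_rows, biases, activation=sigmoid):
    """A layer is a list of neurons that all read the same inputs."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]

# Hypothetical 2-3-1 network with hand-picked weights (illustrative values only).
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]  # 3 hidden neurons, 2 inputs each
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.7, -0.5, 0.2]]                      # 1 output neuron, 3 inputs
output_b = [0.05]

x = [1.0, 2.0]                                     # input layer: one value per feature
hidden = dense_layer(x, hidden_w, hidden_b)        # hidden layer activations
output = dense_layer(hidden, output_w, output_b)   # output layer prediction
```

Each row of `hidden_w` holds the connection weights of one hidden neuron, so changing the architecture amounts to changing the shapes of these lists.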

## Shallow vs Deep Neural Networks

The main difference between shallow and deep neural networks lies in the number of hidden layers they contain. Both types of networks consist of input and output layers, but the complexity of their internal structures varies.

### Shallow Neural Networks

1. Shallow neural networks have a limited number of hidden layers, typically just one or two.
2. They can learn simple patterns and representations in the input data.
3. Shallow networks are less computationally intensive and can be trained more quickly.
4. They may not be suitable for complex tasks or for learning high-level abstractions, as they lack the capacity to capture intricate relationships in the data.

### Deep Neural Networks

1. Deep neural networks consist of multiple hidden layers, which can range from a few to hundreds or even thousands in some architectures.
2. They can learn complex, hierarchical patterns and representations in the input data.
3. Deep networks are more computationally intensive and may require specialized hardware (like GPUs) and optimization techniques to train efficiently.
4. They are suitable for a wide range of tasks, including image recognition, natural language processing, and speech recognition, where they often outperform shallow networks.

In summary, the main difference between shallow and deep neural networks is the depth of their architectures, with deep networks having more hidden layers. This added depth allows deep networks to learn more complex patterns and representations, making them more suitable for a wide range of challenging tasks. However, they can also be more computationally intensive and may require more sophisticated training techniques.
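One rough way to see the cost of added depth is to count trainable parameters in a fully connected network. The sketch below assumes dense layers only; the helper name `count_parameters` and the layer sizes (784 inputs, 64-unit hidden layers, 10 outputs) are hypothetical values picked for illustration. Parameter count is only a proxy for training cost, not a measure of it.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of units per layer, input first, output last.
    Each pair of adjacent layers contributes n_in * n_out weights and n_out biases.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical layer sizes for a 784-feature input and 10 output classes.
shallow = [784, 64, 10]            # one hidden layer
deep = [784, 64, 64, 64, 64, 10]   # four hidden layers of the same width

shallow_params = count_parameters(shallow)
deep_params = count_parameters(deep)
```

Every extra hidden layer in this example adds `64 * 64 + 64` parameters, each of which needs a gradient and an update at every training step.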

## Backpropagation

Backpropagation (short for "backward propagation of errors") is a widely used algorithm for training feedforward artificial neural networks. It is used in supervised learning to compute how much each weight contributes to the error between the network's predictions and the actual target values, so that an optimizer can adjust the weights to reduce that error. Training with backpropagation consists of two main steps: forward pass and backward pass.

* Forward pass: In this step, the input is passed through the network to compute the output. The input data is fed into the input layer, processed through the hidden layers, and then transformed into an output in the output layer. This step involves computing the weighted sum of the inputs and applying the activation functions for each neuron in the network.

* Backward pass: In this step, the error between the network's output and the actual target values is calculated and propagated backward through the network. The error is used to update the weights in each layer. This process involves the following steps:

  1. Compute the output error: Calculate the difference between the network's output and the actual target values. This can be measured using a loss function, such as mean squared error or cross-entropy loss.

  2. Calculate gradients: Compute the gradient of the error with respect to the weights in the network using the chain rule of calculus. This step involves calculating the partial derivatives of the error with respect to each weight in the network, which represents how much the error would change if a particular weight were adjusted.

  3. Update weights: Adjust the weights in the network based on the calculated gradients. This is done using an optimization algorithm, such as gradient descent or its variants like stochastic gradient descent, Adam, or RMSprop. In plain gradient descent, each weight is moved in the direction opposite to its gradient, with the step size scaled by a small factor called the learning rate, so that the overall error decreases.

The forward and backward passes are repeated for each data point (or batch of data points) in the training dataset, iteratively adjusting the weights until the error converges to a minimum value or a stopping criterion is reached. This process of updating the weights using backpropagation is typically performed over multiple epochs (i.e., complete passes through the training dataset) to ensure that the network learns the underlying patterns in the data.
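The forward pass, backward pass, and weight update can be sketched for a tiny network as follows. This is an illustrative sketch under stated assumptions rather than a production implementation: it assumes one hidden layer, sigmoid activations, a squared-error loss, and plain per-example gradient descent, and it uses the XOR problem as toy data. All names (`forward`, `backward`, `update`, `train`), the hidden size, the learning rate, and the epoch count are choices made for this example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, params):
    """Forward pass: weighted sums and activations, layer by layer."""
    w1, b1, w2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, output

def backward(x, target, hidden, output, params):
    """Backward pass: gradients of 0.5 * (output - target)**2 via the chain rule."""
    _, _, w2, _ = params
    # Error signal at the output neuron; sigmoid'(z) equals output * (1 - output).
    delta_out = (output - target) * output * (1.0 - output)
    grad_w2 = [delta_out * h for h in hidden]
    grad_b2 = delta_out
    # Push the error signal back through w2 to each hidden neuron.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w2, hidden)]
    grad_w1 = [[d * xi for xi in x] for d in delta_hidden]
    grad_b1 = delta_hidden
    return grad_w1, grad_b1, grad_w2, grad_b2

def update(params, grads, learning_rate):
    """Gradient descent step: move each weight against its gradient."""
    w1, b1, w2, b2 = params
    g_w1, g_b1, g_w2, g_b2 = grads
    return (
        [[w - learning_rate * g for w, g in zip(row, g_row)]
         for row, g_row in zip(w1, g_w1)],
        [b - learning_rate * g for b, g in zip(b1, g_b1)],
        [w - learning_rate * g for w, g in zip(w2, g_w2)],
        b2 - learning_rate * g_b2,
    )

def mean_loss(data, params):
    """Average squared-error loss over a dataset."""
    return sum(0.5 * (forward(x, params)[1] - t) ** 2 for x, t in data) / len(data)

def train(data, params, learning_rate=0.5, epochs=5000):
    """Repeat forward pass, backward pass, and update for every example."""
    for _ in range(epochs):
        for x, target in data:
            hidden, output = forward(x, params)
            grads = backward(x, target, hidden, output, params)
            params = update(params, grads, learning_rate)
    return params

random.seed(0)  # fixed seed so the run is repeatable
n_hidden = 3    # assumed hidden-layer size
initial_params = (
    [[random.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)],  # w1
    [0.0] * n_hidden,                                                      # b1
    [random.uniform(-1, 1) for _ in range(n_hidden)],                      # w2
    0.0,                                                                   # b2
)
xor_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

loss_before = mean_loss(xor_data, initial_params)
trained_params = train(xor_data, initial_params)
loss_after = mean_loss(xor_data, trained_params)
```

Comparing `loss_before` with `loss_after` shows whether training reduced the error. How close the loss gets to zero depends on the seed, learning rate, and number of epochs, which is why practical frameworks pair backpropagation with more robust optimizers and initialization schemes.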

In summary, backpropagation is a crucial learning algorithm for neural networks that involves computing the error between the network's predictions and actual target values, calculating gradients, and updating the weights to minimize the error. It is an efficient and widely used technique for training feedforward neural networks in supervised learning tasks.

## Activation Functions

An activation function in a neural network is a mathematical function applied to the output of an artificial neuron or node. It serves to introduce non-linearity into the network, allowing the model to learn and represent complex patterns and relationships in the input data. Without activation functions, neural networks would be limited to modeling linear relationships, which would severely restrict their ability to solve a wide range of problems.

The activation function takes the weighted sum of the inputs (along with a bias term) as its input and transforms it into an output value. This output value is then passed on to the neurons in the subsequent layer. With non-linear activation functions and enough hidden units, a network can approximate any continuous function on a bounded input range to arbitrary accuracy (the universal approximation theorem), which makes it a powerful tool for various machine learning tasks.
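To see why the non-linearity matters, consider what happens without it: two stacked linear layers compute exactly the same function as a single layer whose weight matrix is the product of the two. The sketch below checks this on a small example. The matrices, the input, and the helper names (`matvec`, `matmul`) are made up for illustration, and biases are omitted to keep it short.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def relu(values):
    """Element-wise ReLU: negative entries become zero."""
    return [max(0.0, v) for v in values]

# Illustrative weights for two layers and one input vector.
w1 = [[1.0, -2.0], [0.5, 1.0]]
w2 = [[2.0, 1.0], [-1.0, 3.0]]
x = [1.0, 1.0]

# Without an activation, stacking layers is the same as one merged layer.
two_linear_layers = matvec(w2, matvec(w1, x))
one_merged_layer = matvec(matmul(w2, w1), x)

# With ReLU between the layers, the result can no longer be merged this way.
with_relu = matvec(w2, relu(matvec(w1, x)))
```

Here `two_linear_layers` and `one_merged_layer` are identical, while `with_relu` differs because the negative intermediate value is clipped to zero before the second layer sees it.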

![Activ Func](https://miro.medium.com/v2/resize:fit:1400/format:webp/0*LsOuo_SPh7wq4tqH.png)
src: https://3344.medium.com/activation-functions-1dae8fedd951

Some common activation functions used in neural networks are:

* Sigmoid (logistic) function: This function maps the input values to a range between 0 and 1. It is commonly used for binary classification tasks and in the output layer for producing probabilities. The sigmoid function is defined as:

`f(x) = 1 / (1 + exp(-x))`

* Hyperbolic Tangent (tanh) function: The tanh function is similar to the sigmoid function but maps the input values to a range between -1 and 1. It is often preferred in hidden layers because its output is zero-centered, which can help gradient-based optimization. The tanh function is defined as:

`f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))`

* Rectified Linear Unit (ReLU) function: ReLU is a popular activation function for deep neural networks, as it is computationally efficient and helps mitigate the vanishing gradient problem. It outputs the input value if it's positive and zero otherwise. The ReLU function is defined as:

`f(x) = max(0, x)`

* Leaky Rectified Linear Unit (Leaky ReLU) function: Leaky ReLU is a variant of the ReLU function that allows a small, non-zero output for negative input values, which can help prevent dead neurons during training. The Leaky ReLU function is defined as:

`f(x) = max(alpha * x, x)`

where alpha is a small positive constant (e.g., 0.01).

* Exponential Linear Unit (ELU) function: ELU is another variant of the ReLU function that has a smooth curve for negative input values, which can help with the vanishing gradient problem. The ELU function is defined as:

`f(x) = x if x > 0, alpha * (exp(x) - 1) if x <= 0`

where alpha is a positive constant.

* Softmax function: The softmax function is commonly used in the output layer of a neural network for multi-class classification problems. It converts the output values into a probability distribution over the classes. The softmax function is defined as:

`f_i(x) = exp(x_i) / sum_j exp(x_j)`

where the sum runs over all output neurons j.
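The definitions above translate directly into Python. The following is a minimal sketch using only the standard library. The default values for `alpha` are assumptions, and the max-subtraction in `softmax` is a common numerical-stability trick that is not part of the formula itself.

```python
import math

def sigmoid(x):
    """Maps x to (0, 1). Note: math.exp overflows for very large negative x."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Maps x to (-1, 1); equivalent to (e^x - e^-x) / (e^x + e^-x)."""
    return math.tanh(x)

def relu(x):
    """Returns x if positive, zero otherwise."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but negative inputs are scaled by alpha instead of zeroed."""
    return max(alpha * x, x)

def elu(x, alpha=1.0):
    """Identity for positive x, smooth exponential curve for x <= 0."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def softmax(xs):
    """Turns a list of scores into probabilities that sum to 1."""
    # Subtracting the maximum does not change the result mathematically,
    # but it keeps exp() from overflowing on large scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Example: class scores from a hypothetical 3-class output layer.
probabilities = softmax([2.0, 1.0, 0.1])
```

The first five functions act on a single value, whereas `softmax` needs the whole output vector because each probability depends on all the scores.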

These are just a few of the many activation functions that can be employed in neural networks. The choice of the activation function depends on the specific problem, the network architecture, and the type of data being used.