# Chapter 1: Neural Networks

<br>

(1.1)=
## 1.1 Deep Learning

**Artificial Intelligence (AI)** is a branch of computer science focused on creating systems that perform tasks requiring **human-like intelligence**, such as language comprehension, pattern recognition, problem-solving, and decision-making. AI aims to enable machines to perform complex tasks in ways that mimic human reasoning and adaptability.

**Machine Learning (ML)** is a subset of AI that involves **training algorithms on data** to identify patterns and make predictions. ML models learn from data and improve their accuracy over time, typically using one of three main approaches:
- **Supervised Learning**: The model is trained on labeled data, where each input is paired with a known output. The model learns to associate inputs with outputs, making it well-suited for tasks such as classification (e.g., image recognition) and regression (e.g., predicting prices).
- **Unsupervised Learning**: The model is trained on unlabeled data, without predefined outputs. This approach is used to discover hidden patterns or groupings within the data, commonly applied in clustering and association tasks.
- **Reinforcement Learning**: The model learns through feedback from rewards and penalties based on its actions. Reinforcement learning is often applied in environments where decision-making is complex, such as strategic games (e.g., chess) and robotics, where learning occurs via trial and error.

**Deep Learning (DL)** is a specialized area within ML that uses **neural networks** to recognize complex patterns in large datasets.

```{figure} ../images/deep-learning.png
---
width: 140px
name: deep-learning
---
Deep Learning Overview
```

<br>

(1.2)=
## 1.2 Neural Networks

**Neural networks** are computational models inspired by the structure of the human brain, designed to recognize patterns and make predictions. They consist of **layers of interconnected nodes** (often called neurons) that process information through mathematical operations. 

A basic neural network has the following structure: 
- **Input Layer**: This first layer receives raw data, like images, text, or numerical values. Each node in this layer represents an **input feature**.
- **Hidden Layers**: These intermediate layers between the input and output layers **process information**. Each hidden layer transforms data from the previous layer, allowing the network to progressively learn and recognize patterns.
- **Output Layer**: This final layer provides the network’s **output**, such as classifying an image or predicting a value.

```{figure} ../images/neural-network.png
---
width: 340px
name: neural-network
---
Basic Structure of a Neural Network
```
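As a sketch of how this layout could be expressed with PyTorch's layer modules, the network described later in this chapter (3 input features, two hidden layers of 6 neurons each, 2 output neurons) can be written with `torch.nn.Sequential`. Biases and activation functions are left out here only to mirror the simplified forward pass used later; a real model would normally include them.

```python
import torch
import torch.nn as nn

# Sketch of a 3-6-6-2 network: 3 input features, two hidden layers of
# 6 neurons each, and 2 output neurons. bias=False and the missing
# activation functions are simplifications for illustration only.
model = nn.Sequential(
    nn.Linear(3, 6, bias=False),  # input layer -> hidden layer 1
    nn.Linear(6, 6, bias=False),  # hidden layer 1 -> hidden layer 2
    nn.Linear(6, 2, bias=False),  # hidden layer 2 -> output layer
)

x = torch.randn(5, 3)   # 5 examples, 3 input features each
out = model(x)          # one row of 2 outputs per example
print(out.shape)        # torch.Size([5, 2])
```

Note that `nn.Linear` stores its weight with shape `(out_features, in_features)`, so `model[0].weight` has shape `(6, 3)`, the transpose of the `(input_to_layer, output_from_layer)` layout used for the hand-written weight matrices later in this chapter.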

<br>


(1.3)=
## 1.3 Neurons

In a neural network, each **neuron** is a fundamental unit that takes in **multiple inputs** and processes them to produce a **single output**. As shown in [*Fig. 2 Basic Structure of a Neural Network*](neural-network), each neuron in the hidden and output layers connects to all the neurons in the previous layer. These connections have associated values known as **weights** that represent the strength of the connection between the neurons. 

When inputs reach a neuron, each one is multiplied by the weight of its connection, and the weighted inputs are summed. An additional value, called a **bias** (b), may be added to shift this sum. The result may then be passed through an **activation function** (σ), which determines the neuron's output and introduces non-linearity.

```{figure} ../images/neuron.png
---
width: 250px
name: neuron
---
Structure of a Neuron
```

<br>

The output of the above neuron would be:

$$
\text{output} = \sigma\left(w_{1}x_{1} + w_{2}x_{2} + w_{3}x_{3} + b\right)
$$

Thus, in general, the output of a neuron can be expressed as:

$$
\text{output} = \sigma\left(\sum_{j} w_{j} x_{j} + b\right)
$$

Writing the weights and inputs as vectors $w$ and $x$, the weighted sum becomes a dot product, and the above expression can be represented as:

$$
\text{output} = \sigma({w} \cdot {x} + b)
$$
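To make the formulas concrete, here is a small sketch with made-up values for a 3-input neuron. The text leaves σ generic; this sketch assumes the sigmoid function purely for illustration.

```python
import torch

# Hypothetical values for a neuron with 3 inputs
x = torch.tensor([1.0, 2.0, 3.0])    # inputs x1, x2, x3
w = torch.tensor([0.5, -1.0, 0.25])  # weights w1, w2, w3
b = torch.tensor(0.1)                # bias

# Weighted sum written out term by term...
z_explicit = w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + b
# ...and the same quantity as a dot product
z_dot = torch.dot(w, x) + b

# Assumption: sigma is the sigmoid function
output = torch.sigmoid(z_dot)
print(z_dot.item(), output.item())
```

Both forms of the weighted sum give 0.5 - 2.0 + 0.75 + 0.1 = -0.65, and the sigmoid maps this to roughly 0.343.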

```{important}
The output of the above neuron would be one of the inputs for the neurons in the next layer, and so on, allowing the neural network to learn complex patterns in the data through layers of transformations.
```

```{note}
Also note that a larger weight indicates a stronger connection between the neurons, while the bias term allows the activation of a neuron to be adjusted independently of its inputs, enabling the model to better fit the training data and capture more complex patterns.
```

<br>

(1.4)=
## 1.4 PyTorch

We will use **PyTorch** to build our neural networks. PyTorch is an open-source machine learning library widely used for building and training deep learning models due to its flexibility, ease of use, and efficient computation. It provides multi-dimensional arrays, known as **tensors**, which are similar to NumPy arrays but optimized for GPU processing. Depending on their dimensions, we will refer to the tensors differently.

A 1-dimensional tensor is called a **vector**. A vector with shape (3) would look like this:

```python
tensor([1, 2, 3])
```

A 2-dimensional tensor is called a **matrix**. A matrix with shape (2, 3) would look like this:

```python
tensor([[1, 2, 3],
        [4, 5, 6]])
```

A tensor with 3 or more dimensions is called an **n-dimensional tensor**. A 3D tensor with shape (2, 2, 3) would look like this:

```python
tensor([[[1, 2, 3],
         [4, 5, 6]],
        [[7, 8, 9],
         [10, 11, 12]]])
```
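The three tensors above can be created with `torch.tensor`, and their `ndim` and `shape` attributes confirm the number of dimensions and the size along each one:

```python
import torch

vector = torch.tensor([1, 2, 3])
matrix = torch.tensor([[1, 2, 3],
                       [4, 5, 6]])
tensor3d = torch.tensor([[[1, 2, 3],
                          [4, 5, 6]],
                         [[7, 8, 9],
                          [10, 11, 12]]])

# .ndim is the number of dimensions, .shape the size along each dimension
for name, t in [("vector", vector), ("matrix", matrix), ("3D tensor", tensor3d)]:
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}")
```

This prints shapes `(3,)`, `(2, 3)`, and `(2, 2, 3)`, matching the descriptions above.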

The following code will set up the environment for working with PyTorch:

```python
# import PyTorch library
import torch

# seed the random number generator for reproducibility
g = torch.Generator().manual_seed(2)

# set print options to avoid scientific notation
torch.set_printoptions(sci_mode=False)
```

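A quick sketch of what seeding a `torch.Generator` guarantees (the names `g_a` and `g_b` are hypothetical): two generators seeded with the same value produce identical numbers, while successive draws from one generator continue its sequence. This is also why re-running a cell that draws from `g` gives different values than its first run, unless `g` is re-seeded.

```python
import torch

# Two generators seeded with the same value produce the same numbers
g_a = torch.Generator().manual_seed(2)
g_b = torch.Generator().manual_seed(2)
a = torch.randn((2, 3), generator=g_a)
b = torch.randn((2, 3), generator=g_b)
print(torch.equal(a, b))   # True

# Drawing again from g_a continues its sequence, giving new numbers
c = torch.randn((2, 3), generator=g_a)
print(torch.equal(a, c))
```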

<br>

(1.5)=
## 1.5 Understanding How Neural Networks Learn

Neural networks use supervised learning to **fine-tune their parameters**. In other words, a neural network learns by adjusting its weights and biases, iteratively performing four key steps:
- **Forward pass**: The input data flows through the network layer by layer, producing an output.
- **Loss Function**: The network's output is compared to the true target, and a loss function is used to measure the prediction error.
- **Backpropagation**: The network calculates gradients by propagating the error backward through the layers, determining how much each parameter contributed to the error.
- **Update Parameters**: The calculated gradients are used to adjust the weights and biases through an optimization algorithm (like gradient descent or Adam), updating the parameters to minimize the loss.
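The four steps map onto a few lines of PyTorch. The following is a minimal sketch, not this chapter's own implementation: the toy data are random, and the ReLU activations, the `torch.optim.SGD` optimizer with learning rate 0.1, and the `F.cross_entropy` loss are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # reproducible initialization for this sketch

# Hypothetical toy data: 5 examples with 3 features, class labels 0 or 1
x = torch.randn(5, 3)
targets = torch.tensor([1, 0, 1, 0, 1])

# Small 3-6-6-2 network; ReLU is an assumed activation function
model = nn.Sequential(
    nn.Linear(3, 6), nn.ReLU(),
    nn.Linear(6, 6), nn.ReLU(),
    nn.Linear(6, 2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(100):
    logits = model(x)                        # 1. forward pass
    loss = F.cross_entropy(logits, targets)  # 2. loss function
    optimizer.zero_grad()                    # clear gradients left from the previous step
    loss.backward()                          # 3. backpropagation
    optimizer.step()                         # 4. update parameters
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

The call to `optimizer.zero_grad()` is needed because PyTorch accumulates gradients in each parameter's `.grad` across `backward()` calls rather than overwriting them.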

<br>

(1.6)=
## 1.6 Forward Pass

We are going to implement a **forward pass** using 5 input examples for the neural network presented in [*Fig. 2 Basic Structure of a Neural Network*](neural-network). This network has 3 input features, 2 hidden layers with 6 neurons each, and 2 output neurons. For simplicity, we are not going to add any biases or activation functions yet, and the code below uses only two weight matrices (`w1` and `w2`), so the 6 values computed in `h1` feed directly into the 2 outputs in `h2`.

```{note}
- The `torch.randn` function generates a tensor filled with random numbers drawn from a standard normal distribution (mean = 0, standard deviation = 1).
- The `requires_grad=True` argument tells PyTorch to track the operations performed on the tensor, so that calling `loss.backward()` during backpropagation computes the gradient of the loss function with respect to that tensor (and every other parameter created with `requires_grad=True`).
```
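A tiny sketch of what `requires_grad=True` enables, using a made-up loss: for $\text{loss} = \sum_j w_j x_j$, the gradient with respect to $w$ is simply $x$, and `loss.backward()` stores it in `w.grad`.

```python
import torch

# Hypothetical example: loss = sum(w * x), so d(loss)/dw = x
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.randn(3, requires_grad=True)   # track gradients for w

loss = (w * x).sum()
loss.backward()          # fills w.grad with d(loss)/dw

print(w.grad)            # tensor([1., 2., 3.])
print(x.requires_grad)   # False: x was created without requires_grad
```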

```python
# initialize 5 random input examples
x = torch.randn((5, 3), generator=g)                        # (num_examples, input_features)

# initialize random weight matrices
w1 = torch.randn((3, 6), generator=g, requires_grad=True)   # (input_to_layer, output_from_layer)
w2 = torch.randn((6, 2), generator=g, requires_grad=True)   # (input_to_layer, output_from_layer)

# matrix multiplication
h1 = x @ w1     # (5,6) = (5,3) x (3,6)
h2 = h1 @ w2    # (5,2) = (5,6) x (6,2)

# print input and output matrices
print(f"Input:\n{x}\n")
print(f"Output:\n{h2}")
```

```
Input:
tensor([[-2.574, -0.343, -1.281],
        [-0.447,  0.061,  1.326],
        [-1.069, -0.536, -1.345],
        [-0.833,  0.910,  1.061],
        [-0.582,  0.698, -1.133]])

Output:
tensor([[10.319,  5.496],
        [ 8.038, -0.801],
        [ 1.571,  3.432],
        [ 7.749, -0.519],
        [-2.501,  1.623]], grad_fn=<MmBackward0>)
```


```{important}
Note that matrix multiplication lets us **evaluate in parallel** the outputs for all 5 input examples at once.
```
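To check this claim, the sketch below repeats the computation with a freshly seeded generator (a setup that only mirrors the cell above) and compares the batched result with a loop that processes one example at a time. The tolerance `atol=1e-5` is an assumption to absorb floating-point rounding from different summation orders.

```python
import torch

g = torch.Generator().manual_seed(2)
x = torch.randn((5, 3), generator=g)   # 5 examples, 3 features
w1 = torch.randn((3, 6), generator=g)
w2 = torch.randn((6, 2), generator=g)

# All 5 examples at once: (5,3) @ (3,6) @ (6,2) -> (5,2)
batched = x @ w1 @ w2

# One example at a time: (3,) @ (3,6) @ (6,2) -> (2,), stacked into (5,2)
looped = torch.stack([x[i] @ w1 @ w2 for i in range(x.shape[0])])

print(torch.allclose(batched, looped, atol=1e-5))  # True
```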

<br>

(1.7)=
## 1.7 Loss Function

Right now, the neural network outputs two arbitrary real values (positive or negative) for each input example. To turn these outputs into a probability distribution, we apply the **Softmax** function, which makes every output positive and ensures that the outputs for each example sum to one, so each value can be read as the predicted probability of the corresponding class.

```python
probs = torch.softmax(h2, dim=1)
probs
```

```
tensor([[    0.277,     0.723],
        [    0.711,     0.289],
        [    0.963,     0.037],
        [    0.998,     0.002],
        [    0.000,     1.000]], grad_fn=<SoftmaxBackward0>)
```

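Softmax exponentiates each output and divides by the sum of the exponentials in the same row:

$$
\text{softmax}(z)_k = \frac{e^{z_k}}{\sum_{j} e^{z_j}}
$$

A sketch with made-up logits comparing a hand-written version against `torch.softmax`. Subtracting the row maximum first is a common numerical-stability trick; it does not change the result, because the common factor cancels in the ratio.

```python
import torch

# Hypothetical logits for 2 examples and 2 classes
logits = torch.tensor([[2.0, 1.0],
                       [-1.0, 3.0]])

# Softmax by hand, row by row (dim=1): shift by the row max, exponentiate,
# then divide by the row sum
shifted = logits - logits.max(dim=1, keepdim=True).values
exp = torch.exp(shifted)
manual = exp / exp.sum(dim=1, keepdim=True)

builtin = torch.softmax(logits, dim=1)

print(torch.allclose(manual, builtin))   # True
print(manual.sum(dim=1))                 # each row sums to 1 (up to rounding)
```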
```python
# indices of the correct class for each example
targets = torch.tensor([1, 0, 1, 0, 1])
targets
```

```
tensor([1, 0, 1, 0, 1])
```


A **loss function** is a mathematical function that quantifies how well a machine learning model's predictions match the actual target values. It measures the discrepancy between the predicted outputs and the true outputs in a dataset. The goal of training a model is to minimize this loss function, which improves the model's accuracy.


- **Purpose**: The loss function provides feedback on the model's performance during training. It guides the optimization process by indicating how much the model's predictions deviate from the true values.
- **Types**: There are various types of loss functions, each suitable for different tasks:
  - **Regression tasks**: For predicting continuous values, common loss functions include Mean Squared Error (MSE) and Mean Absolute Error (MAE).
  - **Classification tasks**: For predicting discrete classes, loss functions such as Cross-Entropy Loss or Hinge Loss are often used.
- **Optimization**: During training, algorithms like gradient descent use the loss function to update the model's parameters (weights and biases) in order to minimize the loss. The gradients of the loss function with respect to the model parameters are calculated, indicating how to adjust each parameter to reduce the loss.

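The two regression losses named above can be sketched in a few lines with made-up predictions, and compared against PyTorch's built-in `F.mse_loss` and `F.l1_loss` (the latter computes MAE):

```python
import torch
import torch.nn.functional as F

# Hypothetical predictions and true values for a regression task
pred = torch.tensor([2.5, 0.0, 2.0])
true = torch.tensor([3.0, -0.5, 2.0])

mse = ((pred - true) ** 2).mean()   # Mean Squared Error
mae = (pred - true).abs().mean()    # Mean Absolute Error

print(mse.item(), F.mse_loss(pred, true).item())
print(mae.item(), F.l1_loss(pred, true).item())
```

The errors are -0.5, 0.5, and 0, so MSE is (0.25 + 0.25 + 0) / 3 ≈ 0.167 and MAE is (0.5 + 0.5 + 0) / 3 ≈ 0.333. Squaring makes MSE penalize large errors more heavily than MAE does.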


```{note}
The [`tensor.item()`](https://pytorch.org/docs/stable/generated/torch.Tensor.item.html) method returns the value of the single-element tensor.
```

```python
loss = -torch.log(probs[range(probs.shape[0]), targets]).mean()
print(f"Loss: {loss.item():.4f}")
```

```
Loss: 0.7931
```

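The expression above is the average negative log-likelihood of the correct classes, i.e. the cross-entropy loss for $N$ examples with true classes $y_i$:

$$
\text{loss} = -\frac{1}{N}\sum_{i=1}^{N} \log p_{i,\,y_i}
$$

The indexing `probs[range(N), targets]` picks, for each row $i$, the column `targets[i]` (here `probs[0, 1]`, `probs[1, 0]`, `probs[2, 1]`, and so on). The sketch below, with made-up logits, checks this manual computation against PyTorch's `F.cross_entropy`, which expects raw logits rather than probabilities and applies the softmax internally.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits (raw network outputs) for 5 examples and 2 classes
logits = torch.tensor([[ 1.0,  2.0],
                       [ 0.5, -0.5],
                       [ 2.0, -1.0],
                       [ 3.0,  0.0],
                       [-2.0,  4.0]])
targets = torch.tensor([1, 0, 1, 0, 1])

# Manual version: probability of the correct class per row, -log, then mean
probs = torch.softmax(logits, dim=1)
manual = -torch.log(probs[range(probs.shape[0]), targets]).mean()

# Built-in version: takes the raw logits, not the probabilities
builtin = F.cross_entropy(logits, targets)

print(f"{manual.item():.4f} {builtin.item():.4f}")
```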