# <span style="color:green"> Numerical Simulation Laboratory </span>
## <span style="color:brown"> Python Exercise 11 </span>
## <span style="color:orange"> Keras - Neural Network regression </span>

### Overview 

In this notebook our task will be to perform machine learning regression on noisy data with a Neural Network (NN).

We will explore how the ability to fit depends on the structure of the NN. The goal is also to build intuition about why prediction is difficult.

### The Prediction Problem

Consider a probabilistic process that gives rise to labeled data $(x,y)$. The data is generated by drawing samples from the equation

$$
    y_i= f(x_i) + \eta_i,
$$

where $f(x_i)$ is some fixed (but possibly unknown) function, and $\eta_i$ is an uncorrelated Gaussian noise variable such that

$$
\langle \eta_i \rangle=0
$$
$$
\langle \eta_i \eta_j \rangle = \delta_{ij} \sigma^2
$$

We will refer to the $f(x_i)$ as the **true features** used to generate the data. 
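
As an illustration, noisy samples of this kind can be generated as in the following minimal sketch. The helper name `make_dataset`, the uniform sampling of $x$, the seed and the example values are assumptions made for illustration only, and `sigma` is interpreted as the standard deviation of the noise.

```python
import numpy as np

def make_dataset(f, n_points, sigma, x_min=-1.0, x_max=1.0, seed=0):
    """Return (x, y) with y = f(x) + eta, eta ~ N(0, sigma^2) i.i.d."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(x_min, x_max, n_points)   # inputs drawn uniformly (assumption)
    eta = rng.normal(0.0, sigma, n_points)    # independent zero-mean Gaussian noise
    return x, f(x) + eta

# Example with a linear target and illustrative values.
x, y = make_dataset(lambda t: 2.0 * t + 1.0, n_points=1000, sigma=0.3)
residuals = y - (2.0 * x + 1.0)               # recovers the noise eta_i
print(residuals.mean(), residuals.std())      # should be close to 0 and to sigma
```

Calling the same helper with a different `seed` gives an independent data set, which is one simple way to obtain separate training and validation samples.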

To make predictions, we will consider a NN that depends on its parameters (weights and biases). The functions that the NN can model represent the **model class** that we are using to try to model the data and make predictions.

To learn the parameters of the NN, we will train our models on a **training data set** and then test the effectiveness of the NN on a *different* data set, the **validation data set** (also referred to below as the test set). We must split the data in this way because the point of machine learning is to make accurate predictions about new data we have not seen.

To measure our ability to predict, we learn the parameters by fitting the training data set and then make predictions on the test data set. A common measure of predictive performance compares the predictions, $\{y_j^\mathrm{pred}\}$, with the true values, $\{y_j\}$, through the mean squared error (MSE) on the test set:
$$
MSE= \frac{1}{N_\mathrm{test}}\sum_{j=1}^{N_\mathrm{test}} (y_j^\mathrm{pred}-y_j)^2
$$
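
The formula above translates directly into a few lines of NumPy. This is only a sketch: the function name `mean_squared_error` and the three sample values are made up for illustration.

```python
import numpy as np

def mean_squared_error(y_pred, y_true):
    """Average of the squared differences between predictions and targets."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return np.mean((y_pred - y_true) ** 2)

# Hypothetical values, chosen so the result is easy to check by hand.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])
mse = mean_squared_error(y_pred, y_true)
print(mse)  # (0.25 + 0.0 + 1.0) / 3
```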

We will try to get a qualitative picture by examining plots on validation and training data.

# Exercise 11.1

In this section, we attempt to fit a linear function:

$f(x) = 2x + 1$

for $x \in [-1, 1]$. For this task, a single neuron is sufficient. Let's evaluate how the quality of the fit depends on a series of parameters:

- The number of training epochs of the model $N_{epoch}$
- The number of training data points $N_{train}$
- The noise in the data $\sigma$

To explore the dependence on these parameters, we started from the values $N_{epoch} = 30$, $N_{train} = 1000$, $\sigma = 0.3$ and then increased and decreased them one at a time, keeping the others fixed. The result is evaluated through the cost function (mean squared error). We can also compare the model's prediction with the original line, that is, the weight and bias of the neuron with the slope $m = 2$ and the y-intercept $b = 1$ of the line. In all cases, we kept a 10:1 ratio between the number of training and validation data points.
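
To make the role of the single neuron concrete, the sketch below fits $y = wx + b$ by mini-batch gradient descent on the MSE. It is a plain NumPy stand-in, not the Keras code used in the notebook: the function name `fit_single_neuron`, the batch size of 32, the learning rate and the seeds are all assumptions chosen for illustration.

```python
import numpy as np

def fit_single_neuron(x, y, n_epochs=30, batch_size=32, lr=0.05, seed=0):
    """Fit y ~ w*x + b by mini-batch gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0                              # weight and bias of the neuron
    n = len(x)
    for _ in range(n_epochs):
        order = rng.permutation(n)               # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = w * x[idx] + b - y[idx]        # prediction error on the batch
            w -= lr * 2.0 * np.mean(err * x[idx])  # d(MSE)/dw
            b -= lr * 2.0 * np.mean(err)           # d(MSE)/db
    return w, b

# Starting values quoted in the text: N_train = 1000, sigma = 0.3, N_epoch = 30.
rng = np.random.default_rng(1)
x_train = rng.uniform(-1.0, 1.0, 1000)
y_train = 2.0 * x_train + 1.0 + rng.normal(0.0, 0.3, 1000)
m_fit, b_fit = fit_single_neuron(x_train, y_train)
print(m_fit, b_fit)                              # to be compared with m = 2, b = 1
```

Changing `n_epochs`, the number of samples or the noise level in this sketch mirrors the three scans described above, although the numbers will not match those of the Keras runs.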

Let's report the results obtained with the initial parameters:

![](Dati_iniziali.png)

## Varying the number of epochs

With $N_{epoch} = 300$ we obtain $m=2.0014355$ and $b=1.0067096$.

![](Nepochs_300.png)

With $N_{epoch} = 5$ we obtain $m=1.636246$ and $b=0.96392345$.

![](Nepochs_5.png)

## Varying $N_{train}$ and $N_{valid}$ 

With $N_{train}=100$ and $N_{valid}=10$ we obtain $m=1.7317797$ and $b=0.90938854$.

![](Ntrain_100.png)

With $N_{train}=2500$ and $N_{valid}=250$ we obtain $m=1.9870311$ and $b=0.9919341$.

![](Ntrain_2500.png)

## Varying $\sigma$

With $\sigma= 0.01$ we obtain $m=1.9991245$ and $b=1.0001116$. 

![](Sigma_001.png)

With $\sigma= 2$ we obtain $m=1.9992996$ and $b=1.0387775$. 

![](Sigma_2.png)

We observed that varying $N_{train}$ and $N_{valid}$ affects the results more than changing the number of epochs. One might assume that more epochs always give better results, but this is not necessarily the case: once the loss has converged, further epochs bring little improvement and can even lead to overfitting. Varying $\sigma$ also strongly affects the results, as expected, because it changes the spread of the data around the line. Finally, it is worth noting that we obtained reasonable results even for very noisy data ($\sigma = 2$).


# Exercise 11.2

We now want to fit a more complex function, a cubic polynomial:

$f(x) = 4 - 3x - 2x^2 + 3x^3$

for $x \in [-1, 1]$. For this purpose, we experiment with different neuron arrangements that could be more or less effective. We have chosen $N_{train} = 5000$, $N_{valid} = 500$, $\sigma = 0.3$, $N_{epoch} = 80$. The points of the validation set are represented here:

![](Dataset_112.png)

## First Option: Single Hidden Layer

For our initial approach, we considered a neural network with only one hidden layer. We varied the number of neurons in this layer, testing with $N = 6$, $50$, and $100$. The structure of these neural networks can be represented as $1|N|1$, where:

- The first "1" represents the input layer with one neuron (for $x$)
- $N$ represents the number of neurons in the hidden layer (varying as 6, 50, or 100)
- The last "1" represents the output layer with one neuron (for $f(x)$)

Below we show graphs illustrating:

1. The cost function's evolution during training
2. The model's predictions compared to the true function

These visualizations will help us assess the performance of each network configuration.

### 1 layer and 6 neurons

![](1layer6neuroni.png)

### 1 layer and 50 neurons

![](1layer50neuroni.png)

### 1 layer and 100 neurons

![](1layer100neuroni.png)

Increasing the number of neurons in the single hidden layer clearly improves the model's fit. However, the network behaves very similarly with 50 and with 100 neurons. Moreover, its ability to predict function values outside the training range $x \in [-1, 1]$ is very limited.

## Second Option: Four Hidden Layers

For our next approach, we explore a deeper neural network architecture. This structure consists of four hidden layers, each containing ten neurons. The complete architecture can be represented as:

$1|10|10|10|10|1$

This notation describes:
- An input layer with 1 neuron
- Four hidden layers, each with 10 neurons
- An output layer with 1 neuron

This deeper structure allows for more complex feature extraction and potentially better approximation of our polynomial function.
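
To put the architectures on a common footing, one can count their trainable parameters. The sketch below assumes fully connected layers with one bias per neuron, which the text does not state explicitly; the helper name `count_parameters` is hypothetical.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network, e.g. [1, 50, 1]."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# The single-hidden-layer networks and the deeper network discussed in the text.
architectures = {
    "1|6|1": [1, 6, 1],
    "1|50|1": [1, 50, 1],
    "1|100|1": [1, 100, 1],
    "1|10|10|10|10|1": [1, 10, 10, 10, 10, 1],
}
counts = {name: count_parameters(sizes) for name, sizes in architectures.items()}
for name, n_params in counts.items():
    print(name, n_params)
```

Under these assumptions the $1|10|10|10|10|1$ network has 361 parameters, the same order of magnitude as the 301 parameters of the $1|100|1$ network, so the two are comparable in size even though they differ in depth.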

Below, we present graphs similar to those in the first option:

1. The evolution of the cost function during training
2. The model's predictions compared to the true polynomial function

These visualizations will help us assess whether this deeper architecture provides any advantages over the simpler models from our first option.

![](4layer10neuroni.png)

In this scenario, the model fits the data quite well, although not better than when using just one hidden layer. Also, it still faces challenges when predicting values outside the original range.

# Exercise 11.3
  
Try to extend the model to fit a simple trigonometric 2D function such as $f(x,y) = \sin(x^2+y^2)$ in the range $x \in [-3/2,3/2]$ and $y \in [-3/2,3/2]$.

After analyzing the results of the previous exercises, we decided to use a neural network structured with 3 hidden layers, each containing a different number of neurons. After experimenting with various configurations, we settled on the following structure: $2|35|25|20|1$. In earlier exercises, we observed that increasing the number of neurons in a layer improves the fitting. We believe that, given the complexity of the function to evaluate, using more than one hidden layer will also be beneficial. However, it is worth noting that in exercise 11.2, adding more layers did not necessarily result in better fitting. We also set $N_{train}=10000$, $N_{epoch}=100$ and $\sigma=0.2$.
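
A training set for this two-dimensional problem could be generated along the following lines. This is a sketch only: the uniform sampling over the square, the seed and the names `target` and `make_2d_dataset` are assumptions, while $N_{train}$ and $\sigma$ are the values quoted above.

```python
import numpy as np

def target(x, y):
    """The function to be fitted, f(x, y) = sin(x^2 + y^2)."""
    return np.sin(x**2 + y**2)

def make_2d_dataset(n_points, sigma, half_width=1.5, seed=0):
    """Sample points uniformly in the square and add Gaussian noise to f."""
    rng = np.random.default_rng(seed)
    xy = rng.uniform(-half_width, half_width, size=(n_points, 2))
    z = target(xy[:, 0], xy[:, 1]) + rng.normal(0.0, sigma, n_points)
    return xy, z

xy_train, z_train = make_2d_dataset(n_points=10000, sigma=0.2)
print(xy_train.shape, z_train.shape)   # two input columns, one noisy target each
```

The two columns of `xy_train` correspond to the two input neurons in the $2|35|25|20|1$ structure.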

We show in the following graph the function we want to fit and the validation data.

![](Dataset113.png)

Next, we present the graphs of the cost function and the model's predictions.

![](Lossandpredict113.png)

We can deduce that the fit is quite good, with the exception of the points at the "corners" of the domain.

All the images and data shown in this file are obtained by running the code in LSN_Exercises_11.ipynb.