# Neural Networks: Introduction to FCNs

In this tutorial, we will train our first neural network (NN) using PyTorch. Like in the previous examples, our goal is to predict the housing prices in the California Housing dataset.

## Basic imports

In [None]:
from sklearn.datasets import fetch_california_housing
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tqdm import trange
import matplotlib.pyplot as plt
import random
import torch
import torch.nn as nn

In [None]:
# To ensure reproducibility
np.random.seed(42)

In [None]:
def batch(iterable, n=1):
    """Yield successive chunks of at most n elements from an indexable sequence."""
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

## Fetch the California Housing dataset

In [None]:
dataset = fetch_california_housing(as_frame=True)

In [None]:
housing_df = dataset['data']
target_df = dataset['target']

In [None]:
# Insert the housing prices into the housing DataFrame
housing_df['HousePrice'] = target_df

## Prepare the training and testing set


### Splitting the dataframe (using sklearn)

In [None]:
train_df, test_df = train_test_split(housing_df, test_size=0.2)

In [None]:
# Briefly check whether we have the correct set sizes
print('Test ratio: ', len(test_df) / (len(train_df) + len(test_df)))

In [None]:
feature_columns = ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude']
target_column = 'HousePrice'

x_train = train_df[feature_columns].values
y_train = train_df[[target_column]].values

x_test = test_df[feature_columns].values
y_test = test_df[[target_column]].values

## Build a neural network with PyTorch

We will now build our first neural network that is able to predict housing prices based on the eight features provided in the dataset.

The network architecture should look as follows:

<img src="imgs/linear_model_1.png" width="600px"/>

In [None]:
# TODO: Define the network shown above

In [None]:
# TODO: Check whether the network has been defined correctly by
# (a) printing the network
# (b) feeding a sample through the network
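
The two TODO cells above could be filled in roughly as follows. This is a minimal sketch that assumes the pictured architecture is a single fully connected (linear) layer mapping the eight input features to one output; the file name `linear_model_1.png` hints at a linear model, but check the figure and add hidden layers if it shows any. The names `NUM_FEATURES` and `dummy_input` are illustrative.

```python
import torch
import torch.nn as nn

# Assumption: the pictured network is one linear layer, 8 features -> 1 output.
NUM_FEATURES = 8

model = nn.Sequential(
    nn.Linear(in_features=NUM_FEATURES, out_features=1),
)

# (a) Print the network to inspect its layers
print(model)

# (b) Feed a dummy batch of 4 samples through the network.
# nn.Linear expects float32 inputs of shape (batch_size, in_features).
dummy_input = torch.randn(4, NUM_FEATURES)
output = model(dummy_input)
print(output.shape)  # torch.Size([4, 1])
```

A single linear layer has 8 weights plus 1 bias, so `model` holds 9 trainable parameters.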

## Preprocess the samples 

Before training the neural network, the input features should be normalized (often to the range [-1, 1]) or standardized (zero mean, unit variance), and converted to a PyTorch tensor.

### Standardize the input features

In this example, we will use scikit-learn's StandardScaler to standardize the input features. In future examples, we will start to use PyTorch's "on-board capabilities" to preprocess the input features (e.g., https://pytorch.org/vision/main/generated/torchvision.transforms.Normalize.html).

In [None]:
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
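
Note that the scaler is fitted on the training set only and then reused for the test set, so no statistics from the test data leak into preprocessing. The sketch below, on small synthetic arrays (`x_train_demo` and `x_test_demo` are hypothetical stand-ins), checks that `StandardScaler` computes `z = (x - mean) / std` with the per-feature mean and population standard deviation of the training set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data with 2 features (stand-ins for x_train / x_test)
rng = np.random.default_rng(0)
x_train_demo = rng.normal(loc=5.0, scale=3.0, size=(100, 2))
x_test_demo = rng.normal(loc=5.0, scale=3.0, size=(20, 2))

scaler_demo = StandardScaler()
x_train_scaled = scaler_demo.fit_transform(x_train_demo)
x_test_scaled = scaler_demo.transform(x_test_demo)

# Statistics of the TRAINING set only (population std, i.e. ddof=0)
mean = x_train_demo.mean(axis=0)
std = x_train_demo.std(axis=0)

assert np.allclose(scaler_demo.mean_, mean)
assert np.allclose(scaler_demo.scale_, std)

# The test set is standardized with the training statistics
assert np.allclose(x_test_scaled, (x_test_demo - mean) / std)
```

Only the standardized training set is guaranteed to have exactly zero mean and unit variance; the test set will be close but not exact.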

### Convert all samples to a PyTorch tensor

The entire dataset (`x_train`, `y_train`, `x_test`, and `y_test`) is stored as NumPy arrays at this point, and we still have to convert it to PyTorch tensors.

Tensors are similar to NumPy ndarrays, except that tensors can run on GPUs or other hardware accelerators. In fact, tensors and NumPy arrays can often share the same underlying memory, eliminating the need to copy data (see the "Bridge with NumPy" section of the PyTorch tensors tutorial). Tensors are also optimized for automatic differentiation.
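
A small sketch of the two common conversion routes and the dtype pitfall: `torch.from_numpy()` shares memory with the array, `torch.tensor()` copies it, and because NumPy defaults to float64 while `nn.Linear` parameters are float32 by default, the data usually needs a `.float()` conversion before it is fed to the network. The array `x_np` is a toy example.

```python
import numpy as np
import torch

# A float64 NumPy array, like the output of StandardScaler
x_np = np.array([[1.0, 2.0], [3.0, 4.0]])

# torch.from_numpy() shares memory with the NumPy array (no copy)
x_shared = torch.from_numpy(x_np)
x_np[0, 0] = 100.0
print(x_shared[0, 0].item())  # 100.0, the tensor sees the change

# torch.tensor() always copies the data
x_copy = torch.tensor(x_np)
x_np[0, 0] = 1.0
print(x_copy[0, 0].item())  # 100.0, the copy is unaffected

# Convert float64 -> float32 to match the default dtype of nn.Linear.
# .float() returns a new float32 tensor (a copy).
x_tensor = torch.from_numpy(x_np).float()
print(x_tensor.dtype)  # torch.float32
```

Feeding a float64 tensor into a float32 layer raises a dtype mismatch error, which is why the conversion step matters.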

In [None]:
# TODO: Convert the dataset to a PyTorch tensor

## Train the neural network

In [None]:
# Fix seed to ensure reproducibility
np.random.seed(42)
torch.manual_seed(0)

### Helper function that computes the RMSE for a given dataset

The *evaluate_model_performance()* function receives a set of features and their target values as input and evaluates the model performance using the root mean square error (RMSE). To do this, the network must be switched to *evaluation mode* (`model.eval()`).

In [None]:
def evaluate_model_performance(x, y, batch_size=128):
    # Note: this function uses the global `model` defined above.

    assert x.shape[0] == y.shape[0], 'Features and target labels have a different number of samples'

    # Some layers (e.g., batch norm and dropout) behave differently during training and inference.
    # During training, a batch norm layer keeps a running estimate of the mean and variance.
    # At inference time, these estimates must not be updated; they are used for normalization instead.
    model.eval()

    total_error = 0.

    num_samples = x.shape[0]

    # No gradients have to be computed at inference time. Disabling gradient computation
    # with the no_grad() context manager reduces memory consumption.
    with torch.no_grad():

        for indices in batch(range(num_samples), batch_size):

            # Select the current batch (x and y are already PyTorch tensors)
            sample_x = x[indices]
            sample_y = y[indices]

            pred = model(sample_x)

            # Accumulate the sum of squared errors over all batches
            total_error += torch.sum((pred - sample_y) ** 2)

    # The targets are expressed in units of $100,000, so scale the RMSE to dollars
    rmse = torch.sqrt(total_error / num_samples) * 100000

    # Switch the network back to training mode since we want to continue training.
    model.train()

    # .item() returns a Python float and also works for tensors on the GPU
    return rmse.item()
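
The function sums the squared errors over all batches and divides by the total number of samples only once, i.e. RMSE = sqrt((1/N) * sum_i (pred_i - y_i)^2). Averaging per-batch means instead would over-weight the smaller last batch. The sketch below, on random hypothetical predictions and targets, checks that the batched accumulation matches the RMSE computed over the whole set in one go. The final scaling by 100000 converts the California Housing target (median house value in units of $100,000) to dollars.

```python
import torch

torch.manual_seed(0)

# Hypothetical predictions and targets (in units of $100,000), 1000 samples
pred = torch.randn(1000, 1)
target = torch.randn(1000, 1)
batch_size = 128  # 1000 is not a multiple of 128, so the last batch is smaller

# Accumulate the SUM of squared errors over batches, then divide once by N
total_error = 0.0
for start in range(0, pred.shape[0], batch_size):
    stop = start + batch_size
    total_error += torch.sum((pred[start:stop] - target[start:stop]) ** 2)
rmse_batched = torch.sqrt(total_error / pred.shape[0])

# Reference: RMSE over the full set in one go
rmse_full = torch.sqrt(torch.mean((pred - target) ** 2))
assert torch.allclose(rmse_batched, rmse_full)

# Scale from units of $100,000 to dollars
print(f'RMSE: ${rmse_batched.item() * 100000:,.0f}')
```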


### Training loop

The training procedure for a neural network looks as follows: First, draw a random subset of samples from the training set. This randomly drawn subset is referred to as a **batch**.

The training batch is fed through the network to obtain the prediction (the output of the network). Next, we need to update the parameters (weights) of the network. To accomplish this, we first compute the loss (a.k.a. cost function) and then compute the gradient of the loss with respect to every parameter in the network. Once we have the individual gradients, we can update the parameters using an optimizer such as Adam or SGD.
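
The steps above can be sketched as a minimal iteration-based loop. It assumes a mean squared error loss and the Adam optimizer, and runs on synthetic data (`x` and `y` are hypothetical stand-ins for the training tensors); the learning rate, batch size and number of iterations are illustrative, not tuned values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical synthetic data: 1000 samples, 8 features, noisy linear targets
x = torch.randn(1000, 8)
true_w = torch.randn(8, 1)
y = x @ true_w + 0.1 * torch.randn(1000, 1)

model = nn.Linear(8, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

num_iterations = 2000  # hypothetical value
batch_size = 64        # hypothetical value

for iteration in range(num_iterations):
    # 1. Draw a random subset of samples (a batch) without replacement
    indices = torch.randperm(x.shape[0])[:batch_size]
    x_batch, y_batch = x[indices], y[indices]

    # 2. Forward pass: obtain the network's prediction
    pred = model(x_batch)

    # 3. Compute the loss
    loss = loss_fn(pred, y_batch)

    # 4. Backward pass: clear old gradients, then compute new ones
    optimizer.zero_grad()
    loss.backward()

    # 5. Update the parameters
    optimizer.step()

with torch.no_grad():
    final_loss = loss_fn(model(x), y).item()
print(f'Final training MSE: {final_loss:.4f}')
```

`optimizer.zero_grad()` is needed because PyTorch accumulates gradients across `backward()` calls by default.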

In [None]:
# TODO: Implement network training

#### Defining the training duration via the number of epochs

In the previous section, we introduced the concept of 'iterations' in the context of training a neural network. Specifically, a 'training iteration' is a discrete step in the training process during which the model's parameters, namely weights and biases, are updated to improve its performance. During each iteration, we randomly select a subset of the training data, known as a 'batch', and use this batch to refine the network's parameters. By changing the number of iterations, we controlled the training duration.

In PyTorch, the training duration is typically specified by setting the number of training epochs. An **epoch** is defined as **one complete traversal through the entire training dataset**, where **the neural network processes each sample once**. 

The relationship between epochs and iterations depends on the size of the training dataset and the batch size. The batch size determines how many samples the network processes in a single iteration. For instance, if your training dataset comprises 1000 samples and you set a batch size of 100, it takes 10 iterations (batches) to process the entire dataset, thereby completing one epoch, because the 1000 samples are divided into 10 batches of 100 samples each. If the dataset size N is not a multiple of the batch size B, the last batch is smaller, so one epoch takes N / B iterations rounded up.
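
The arithmetic and the epoch structure can be sketched as follows. `iterations_per_epoch` is a hypothetical helper, and the loop only tracks which samples are visited (the actual forward/backward/update steps are left as a comment). Each epoch shuffles all sample indices with `torch.randperm` and walks through them in consecutive batches, so every sample is processed exactly once per epoch.

```python
import math
import torch


def iterations_per_epoch(num_samples, batch_size):
    # The last batch may be smaller than batch_size, hence the ceiling
    return math.ceil(num_samples / batch_size)


print(iterations_per_epoch(1000, 100))  # 10, the example from the text
print(iterations_per_epoch(1000, 128))  # 8 (7 full batches + 1 batch of 104 samples)

# Hypothetical values for the epoch loop below
num_samples, batch_size, num_epochs = 1000, 128, 3
torch.manual_seed(0)

for epoch in range(num_epochs):
    # Shuffle the sample indices anew at the start of every epoch
    permutation = torch.randperm(num_samples)
    visited = []
    num_iterations = 0
    for start in range(0, num_samples, batch_size):
        batch_indices = permutation[start:start + batch_size]
        # ... forward pass, loss, backward pass and optimizer step on this batch ...
        visited.append(batch_indices)
        num_iterations += 1
    # Each sample was processed exactly once during this epoch
    all_visited = torch.sort(torch.cat(visited)).values
    assert torch.equal(all_visited, torch.arange(num_samples))
    assert num_iterations == iterations_per_epoch(num_samples, batch_size)
    print(f'Epoch {epoch + 1}: {num_iterations} iterations')
```

In practice, `torch.utils.data.DataLoader` with `shuffle=True` takes care of this per-epoch shuffling and batching.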

Let's take a closer look at what specifying the training duration via epochs looks like in practice ...

In [None]:
# TODO: Control the duration of the training process via its epochs

### Performance evaluation

In [None]:
train_rmse = evaluate_model_performance(x_train, y_train)
test_rmse = evaluate_model_performance(x_test, y_test)

In [None]:
print('Train RMSE:', train_rmse)
print('Test RMSE:', test_rmse)

Congrats, you just trained your first neural network! As you can see, the RMSE obtained on the test set isn't great. To reduce the RMSE, we can alter the architecture (add more layers, change the activation function, etc.).
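
As an illustration of "adding more layers", here is a hypothetical deeper variant with two hidden layers and ReLU activations. The hidden size of 64 is an arbitrary choice, and whether this actually lowers the test RMSE has to be checked by retraining and re-evaluating; it keeps the same input/output contract as a single-layer model, so it could replace `model` without changing the rest of the notebook.

```python
import torch
import torch.nn as nn

# Hypothetical deeper architecture; hidden sizes are illustrative, not tuned
deeper_model = nn.Sequential(
    nn.Linear(8, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)

# Same contract as before: (batch_size, 8) -> (batch_size, 1)
out = deeper_model(torch.randn(16, 8))
print(out.shape)  # torch.Size([16, 1])

# (8*64 + 64) + (64*64 + 64) + (64*1 + 1) trainable parameters
num_params = sum(p.numel() for p in deeper_model.parameters())
print(num_params)  # 4801
```

Without the non-linear activations in between, stacked linear layers would collapse to a single linear map, so the ReLUs are what give the extra layers additional expressive power.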