<a href="https://colab.research.google.com/github/andreacini/coesi-ml/blob/master/02_nonlinear_regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exercise 02: Neural networks in Python

To build our neural network we will use [TensorFlow](https://www.tensorflow.org/), one of the two most popular deep learning libraries for Python (the other being [PyTorch](https://pytorch.org/)). 
Like NumPy, TensorFlow provides a large number of functions for manipulating arrays, but it offers two major advantages over NumPy: 

1. the computation can be accelerated on GPU via the CUDA library;
2. the library implements __automatic differentiation__, meaning that the most analytically complex step of training, the computation of the gradient, is handled for you.
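
To get a feeling for what automatic differentiation saves you from, here is a minimal NumPy sketch (not TensorFlow code) that derives the gradient of a mean squared error by hand for a linear model and checks it against finite differences. The toy data and the names `loss`, `grad_by_hand`, and `grad_numeric` are illustrative assumptions, not part of the exercise.

```python
import numpy as np

# Toy regression problem (made-up data, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5])

def loss(w):
    # Mean squared error of the linear model X @ w
    return np.mean((X @ w - y) ** 2)

def grad_by_hand(w):
    # Analytical gradient: d/dw mean((Xw - y)^2) = (2 / N) * X^T (Xw - y)
    return 2.0 / len(y) * X.T @ (X @ w - y)

def grad_numeric(w, eps=1e-6):
    # Central finite differences, one coordinate at a time
    g = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        g[i] = (loss(w + step) - loss(w - step)) / (2 * eps)
    return g

w = np.zeros(3)
print(np.allclose(grad_by_hand(w), grad_numeric(w), atol=1e-5))
```

For a one-layer linear model the derivation is short, but for deep networks it quickly becomes tedious and error-prone, which is why having the library compute it is so convenient.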

While TensorFlow is a very powerful library that offers fine-grained control over what you build, for this course we will skip the low-level details and instead use the official high-level API for TensorFlow: [Keras](https://keras.io).

# Introduction to TensorFlow/Keras

![alt text](https://www.tensorflow.org/images/tf_logo_social.png)

(most of what follows is adapted from [the introduction on the TensorFlow website](https://www.tensorflow.org/guide/low_level_intro))

The most important things to understand about building neural networks with TensorFlow are the concepts of __tensor__ and __computational graph__. 

## Tensors
A tensor consists of a set of primitive values shaped into an array of any number of dimensions. A tensor's __rank__ is its number of dimensions, while its __shape__ is a tuple of integers specifying the array's length along each dimension. 

Intuitively, tensors are the TensorFlow version of Numpy arrays. In fact, TF and Numpy are heavily intertwined and arrays can often be fed to TF models without modifications. 
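
As a small illustration of rank and shape, the sketch below uses NumPy arrays as stand-ins for tensors (NumPy calls the rank `ndim`). The array names are made up for this example.

```python
import numpy as np

scalar = np.array(3.0)              # rank 0, shape ()
vector = np.array([1.0, 2.0, 3.0])  # rank 1, shape (3,)
matrix = np.zeros((2, 3))           # rank 2, shape (2, 3)
batch = np.zeros((4, 2, 3))         # rank 3, shape (4, 2, 3)

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("batch", batch)]:
    print(name, "rank:", t.ndim, "shape:", t.shape)
```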

## Computational graph
A computational graph is a series of TensorFlow operations arranged into a graph. The graph is composed of two types of objects.

- `tf.Operation` (or "ops"): The nodes of the graph. Operations describe calculations that consume and produce tensors.
- `tf.Tensor`: The edges in the graph. These represent the values that will flow through the graph. Most TensorFlow functions return tf.Tensors.


For this exercise session (and most of the course, probably), you will only need to keep in mind one thing: when using TF, __you first define the computation, and then you provide the data__. 

This means that all of your operations will be defined on symbolic objects, and only at the end will you actually run the computation.

Don't worry if you don't get this at first, it will become clearer by doing. 

# Keras

![alt text](https://s3.amazonaws.com/keras.io/img/keras-logo-2018-large-1200.png)



Keras offers collections of TF operations already arranged to implement neural networks with little to no effort. 
For instance, building a layer of 4 neurons like the one we saw above is as easy as calling `Dense(4)`. That's it. 

Moreover, Keras offers a high-level API for the usual steps involved in working with a neural network, like training on some data, evaluating the performance, and predicting on unseen data. 

The core data structure of Keras is a model, a way to organize layers. The simplest type of model is the `Sequential` model, a linear stack of layers.
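
To make "a linear stack of layers" concrete, here is a rough NumPy sketch of the arithmetic behind two stacked dense layers: each layer computes `activation(x @ W + b)`, and the output of one layer is the input of the next. This only illustrates the math, not how Keras is implemented; the helper `dense`, the sizes, and the random weights are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, W, b, activation=None):
    # Affine map followed by an optional element-wise non-linearity
    z = x @ W + b
    return activation(z) if activation is not None else z

n_features = 3  # hypothetical input size

# Layer 1: 4 neurons with tanh activation; layer 2: 1 linear output neuron
W1, b1 = rng.normal(scale=0.1, size=(n_features, 4)), np.zeros(4)
W2, b2 = rng.normal(scale=0.1, size=(4, 1)), np.zeros(1)

x = rng.normal(size=(5, n_features))  # a batch of 5 samples
hidden = dense(x, W1, b1, activation=np.tanh)
output = dense(hidden, W2, b2)

print(hidden.shape, output.shape)  # (5, 4) (5, 1)
```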

---

In this exercise we will see how to build and train a simple neural network from scratch, in Keras.

# Auto MPG dataset

Let's build a model to predict the fuel efficiency (in miles per gallon, MPG) of cars from the 1970s and early 1980s.

**Data Set Characteristics:**  

1. mpg: continuous
2. cylinders: multi-valued discrete
3. displacement: continuous
4. horsepower: continuous
5. weight: continuous
6. acceleration: continuous
7. model year: multi-valued discrete
8. origin: multi-valued discrete

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
import keras

dataset_path = keras.utils.get_file("auto-mpg.data", "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
                'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(dataset_path, names=column_names,
                      na_values = "?", comment='\t',
                      sep=" ", skipinitialspace=True)

dataset = raw_dataset.copy()
dataset.dropna(inplace=True)
dataset.tail()

In [None]:
dataset['Origin'] = dataset['Origin'].map(lambda x: {1: 'USA', 2: 'Europe', 3: 'Japan'}.get(x))
dataset = pd.get_dummies(dataset, prefix='', prefix_sep='')
dataset.tail()

In [None]:
corr = dataset.corr()
plt.figure(figsize=(10,10))
sns.heatmap(
    corr, 
    vmin=-1, vmax=1, center=0,
    cmap=sns.diverging_palette(20, 220, n=200),
    square=True
)

Notice how the values of the features are not commensurable with one another.

While this in principle is not a problem for our machine learning models, in practice it can lead to issues in the training procedure.

To standardize the data, we compute the following transformation: 

$$
X_{\textrm{standardized}} = \frac{X - \textrm{mean}(X)}{\textrm{std}(X)}
$$

In [None]:
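# Extract features (every column except the target, MPG) as a float array
X = dataset[dataset.columns[1:]].values.astype(float)

# Standardize features to zero mean and unit variance
X -= np.mean(X, axis=0)
X /= np.std(X, axis=0)

# Extract targets
y = dataset['MPG'].copy()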

In order to train our network, we will split the data into three main sets:

- training, which we will use to train the model
- validation, which we use to monitor the performance of the model while training
- test, which we use to evaluate the model at the end

In [None]:
from sklearn.model_selection import train_test_split
np.random.seed(20)
# Split train / test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
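
The cell above only separates training and test data. One possible way to also obtain a validation set is to call `train_test_split` a second time on the training portion. The sketch below does this on made-up arrays; the 60/20/20 proportions and the names `X_toy`, `y_toy`, `X_tr`, `X_val`, and `X_te` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 samples with 2 features each
X_toy = np.arange(200, dtype=float).reshape(100, 2)
y_toy = np.arange(100, dtype=float)

# First split: hold out 20% of the samples as the test set
X_rest, X_te, y_rest, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0)

# Second split: hold out 25% of the remainder (20% of the total) for validation
X_tr, X_val, y_tr, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_tr), len(X_val), len(X_te))  # 60 20 20
```

A validation set obtained this way can be passed to Keras through the `validation_data` argument of `fit`; alternatively, the `validation_split` argument of `fit` holds out a fraction of the training data for you.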

Now that we have loaded and pre-processed our data, we only need to build the neural network that we will train. Using Keras, this is achieved in a few lines of code!

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Define the network
network = Sequential()
network.add(Dense(32, activation='tanh', input_shape=X.shape[1:]))
network.add(Dense(1))

# Choose an optimizer (plain stochastic gradient descent)
optim = keras.optimizers.SGD(learning_rate=0.01)

# Prepare the computational graph and training operations,
# passing the optimizer object defined above
network.compile(optimizer=optim,
                loss='mse',
                metrics=['mae'])

# Train the network
network.fit(X_train, y_train, batch_size=64, epochs=100)

In [None]:
# Evaluate the performance
eval_results = network.evaluate(X_test, y_test)
print('Test mae: {}'.format(eval_results[-1]))
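
For reference, the two quantities used above, the mean squared error (the loss) and the mean absolute error (the metric), can be computed by hand as in this small NumPy sketch. The numbers are made up for illustration.

```python
import numpy as np

# Made-up targets and predictions (in MPG)
y_true = np.array([20.0, 30.0, 25.0, 15.0])
y_pred = np.array([22.0, 27.0, 25.0, 16.0])

errors = y_true - y_pred        # [-2.,  3.,  0., -1.]
mse = np.mean(errors ** 2)      # (4 + 9 + 0 + 1) / 4 = 3.5
mae = np.mean(np.abs(errors))   # (2 + 3 + 0 + 1) / 4 = 1.5

print(mse, mae)  # 3.5 1.5
```

The MAE is expressed in the same unit as the target, which makes it easier to interpret than the MSE.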

Let's check the error distribution.

In [None]:
y_hat = network.predict(X_test)
err = y_test.values - y_hat.squeeze()

In [None]:
plt.figure(figsize=(10,5))
# plot error distribution
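# histplot replaces the deprecated distplot; stat='density' with kde=True
# gives a normalized histogram with a density curve on top
sns.histplot(err, bins=20, kde=True, stat='density', label='error')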
plt.xlim(-11,11)
plt.vlines(0, 0., 0.2, linestyles='--')
plt.legend()