# Building neural networks for regression

In this activity we will build a neural network regressor for the medical dataset on diabetes progression. We will also introduce the learning rate.

## Dataset

We use the same dataset as before. We can shorten our code a bit:

In [None]:
##### added line to ensure plots are showing
%matplotlib inline
#####

import pandas as pd
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler

dataset = load_diabetes()

X = pd.DataFrame(data=dataset['data'],columns=dataset['feature_names'])

y = pd.DataFrame(data=dataset['target'],columns=['progression'])
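
Before modelling, it can help to check what was loaded. The following is a minimal sketch that repeats the loading step and prints the shapes, the feature names, and summary statistics of the target. The `describe()` call is an addition for illustration and is not part of the original activity.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

dataset = load_diabetes()
X = pd.DataFrame(data=dataset['data'], columns=dataset['feature_names'])
y = pd.DataFrame(data=dataset['target'], columns=['progression'])

# Number of samples and features
print(X.shape, y.shape)

# Names of the input features
print(X.columns.tolist())

# Range and spread of the regression target
print(y['progression'].describe())
```

The target is a positive number on a scale of tens to hundreds, so the network output must be able to produce values well outside the interval (0, 1).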

## Building a neural network regressor

Again, we prepare our training and test sets:

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)

# Fit the scaler on the training data only, then apply the same scaling to both sets
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
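
A common convention is to compute the scaling statistics on the training set only and to reuse them for the test set, so that no information from the test set leaks into preprocessing. The sketch below illustrates this on synthetic data. The arrays, the seed, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for a training and a test set (illustrative only)
rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
test = rng.normal(loc=50.0, scale=10.0, size=(50, 3))

# Learn mean and standard deviation from the training data only
scaler = StandardScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)

# The training set has mean 0 and standard deviation 1 per feature
print(train_scaled.mean(axis=0).round(3), train_scaled.std(axis=0).round(3))

# The test set is close to, but not exactly, mean 0 and standard deviation 1
print(test_scaled.mean(axis=0).round(3), test_scaled.std(axis=0).round(3))
```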


This time we use a metric suited to regression, the mean squared error (MSE), which also serves as the loss function. We also switch from stochastic gradient descent to a different optimiser, Adam, which works better in this instance, and train for 10 epochs:

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from sklearn.metrics import mean_squared_error as mse

input_dim = X_train.shape[1]
# We only have 1 output dimension, as our regression outputs a single real number
output_dim = 1

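model = Sequential()
# Hidden layer with 50 neurons; no activation is specified, so this layer is linear
model.add(Dense(50,input_dim=input_dim))

# Output layer: one linear unit for the predicted progression value
model.add(Dense(output_dim))

# We now use a dedicated optimiser instance, which allows us to set the learning rate later
model.compile(optimizer=Adam(),loss='mean_squared_error',metrics=['mean_squared_error'])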

model.summary()

# We add the number of epochs as a parameter to our fit method
model.fit(X_train,y_train,epochs=10)

prediction = model.predict(X_test)

print('RMSE:', np.sqrt(mse(y_test,prediction)))
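
The RMSE is the square root of the MSE, which puts the error back in the units of the target. A small worked example with made-up numbers (the values are assumptions for illustration, not results from the model):

```python
import numpy as np

# Hypothetical true and predicted progression values
y_true = np.array([100.0, 150.0, 200.0])
y_pred = np.array([110.0, 140.0, 230.0])

# Squared difference per sample
squared_errors = (y_true - y_pred) ** 2

# MSE: average of the squared errors
mse_value = squared_errors.mean()

# RMSE: square root of the MSE, in the same units as the target
rmse_value = np.sqrt(mse_value)

print('MSE:', mse_value)
print('RMSE:', rmse_value)
```

Because the errors are squared before averaging, the single error of 30 contributes far more to the MSE than the two errors of 10.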

## Hyperparameters

Next, we run a small hyperparameter optimisation exercise in which we search for the best regression model by varying:
- The number of neurons in the hidden layer
- The activation function
- The learning rate
- The number of epochs
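
To build some intuition for the learning rate, the sketch below runs plain gradient descent on a one-dimensional toy function. This is a simplification: the function, the starting point, and the three rates are assumptions chosen for illustration, and adaptive optimisers such as Adam adjust the step size per parameter rather than using it directly like this.

```python
def gradient_descent(learning_rate, steps=20, start=0.0):
    """Minimise f(w) = (w - 3)^2 with plain gradient descent."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)          # derivative of (w - 3)^2
        w = w - learning_rate * gradient    # step against the gradient
    return w

# Too small, reasonable, and too large learning rates
for lr in (0.001, 0.1, 1.1):
    w_final = gradient_descent(lr)
    print(f'learning_rate={lr}: w={w_final:.4f}, '
          f'distance to minimum={abs(w_final - 3.0):.4f}')
```

With the smallest rate the parameter barely moves in 20 steps, with the middle rate it lands close to the minimum at 3, and with the largest rate each step overshoots and the distance grows.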

We can use ```GridSearchCV``` from scikit-learn. However, the grid search needs a way to build a fresh neural network for each combination of hyperparameters. Hence, we first write a function that takes the hyperparameters as inputs and returns a compiled network:

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.optimizers import Adam

# Note: 'kernel' is the name of the activation function used in the hidden layers
def nn_model(no_neurons,learning_rate,kernel='relu'):
    model = Sequential()
    model.add(Dense(no_neurons,input_dim=X_train.shape[1]))
    model.add(Activation(kernel))

    # Extra hidden layer
    model.add(Dense(no_neurons))
    model.add(Activation(kernel))

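    # Output: a single linear unit, as the regression target is an unbounded real number
    # (a sigmoid here would squash every prediction into the interval (0, 1))
    model.add(Dense(1))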
    
    # Here, we can add the learning rate to the optimiser
    model.compile(optimizer=Adam(learning_rate=learning_rate),loss='mean_squared_error',metrics=['mean_squared_error'])
        
    return model
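
Because the hyperparameters are ordinary keyword arguments, a search tool can build a network from a dictionary of settings. The sketch below uses a stand-in function (`nn_model_stub`, hypothetical, no Keras needed) and an example dictionary to show how keys can be matched against the function signature. Treating the remaining keys as training settings is an assumption about how a wrapper might route them, not a description of Keras internals.

```python
import inspect

def nn_model_stub(no_neurons, learning_rate, kernel='relu'):
    """Stand-in with the same signature as nn_model; returns its arguments instead of a network."""
    return {'no_neurons': no_neurons, 'learning_rate': learning_rate, 'kernel': kernel}

# One hypothetical combination of settings
candidate = {'no_neurons': 20, 'kernel': 'linear', 'learning_rate': 0.01,
             'epochs': 10, 'verbose': 0}

# Argument names accepted by the build function
build_args = set(inspect.signature(nn_model_stub).parameters)

# Split the settings into build arguments and everything else
build_params = {k: v for k, v in candidate.items() if k in build_args}
other_params = {k: v for k, v in candidate.items() if k not in build_args}

print('passed to the build function:', nn_model_stub(**build_params))
print('left for training:', other_params)
```

This is why the dictionary keys have to be spelled exactly like the function arguments: a key that matches no argument cannot be passed to the build function.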

Now, we pass that model function to our grid search as follows. Notice also how we set up the parameters to match the inputs of the function we just created.

In [None]:
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV

# We create a dictionary again, with keys matching the arguments of the network function we created above
parameters = {'no_neurons':[5,20],'kernel':['relu','linear'],'learning_rate':[0.0001,0.01],'epochs':[5,10],'verbose':[0]}

# This is a regression task, so we wrap our model in KerasRegressor to bridge the gap between scikit-learn and Keras
grid_search = GridSearchCV(KerasRegressor(nn_model), parameters, cv=5,scoring='neg_root_mean_squared_error')
grid_search.fit(X_train, y_train.values.ravel())

means = grid_search.cv_results_['mean_test_score']
stds = grid_search.cv_results_['std_test_score']

print('Mean RMSE (+/- standard deviation), for parameters')
for mean, std, params in zip(means, stds, grid_search.cv_results_['params']):
    print("%0.3f (+/- %0.03f) for %r"
          # The RMSE is returned as a negative score, so we multiply it by -1
          % (-1*mean, std, params))

If the RMSE values differ very little across the grid, we cannot say which hyperparameters work better than others. A wider search might separate them, but it takes even more time. Later on, we will see how much the results can vary with the hyperparameters.
A good hyperparameter search can produce very different networks that are more or less suitable for the data at hand.
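
To see why a wider search quickly becomes expensive, the size of the grid can be counted with scikit-learn's `ParameterGrid`. The sketch below repeats the parameter dictionary and the number of folds used in this activity; the variable names `cv_folds`, `n_combinations`, and `n_fits` are introduced here for illustration.

```python
from sklearn.model_selection import ParameterGrid

parameters = {'no_neurons': [5, 20], 'kernel': ['relu', 'linear'],
              'learning_rate': [0.0001, 0.01], 'epochs': [5, 10], 'verbose': [0]}
cv_folds = 5

# Every combination of the listed values is one candidate model
n_combinations = len(ParameterGrid(parameters))

# Each candidate is trained once per cross-validation fold
n_fits = n_combinations * cv_folds

print(n_combinations, 'combinations x', cv_folds, 'folds =', n_fits, 'model fits')
```

Adding one more value to any hyperparameter list multiplies the number of fits, and by default `GridSearchCV` also refits the best candidate once more on the full training set.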