<a href="https://colab.research.google.com/github/cagBRT/IntroToDNNwKeras/blob/master/Choosing_Loss_Functions_LinearRegression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Regression Loss Functions**

This notebook discusses how to choose and implement loss functions for linear regression models.

In [None]:
# Clone the entire repo.
!git clone -l -s https://github.com/cagBRT/IntroToDNNwKeras.git cloned-repo
%cd cloned-repo

Three loss functions are discussed and implemented in this notebook:<br>
>Mean Squared Error Loss<br>
Mean Squared Logarithmic Error Loss<br>
Mean Absolute Error Loss<br>

These loss functions can also be used when working with CNN and RNN regression models.

**Import the libraries**

In [None]:
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot
from sklearn.model_selection import train_test_split

In [None]:
# This constant is used in two places:
# creating the dataset and defining the model
NUM_FEATURES = 20

**Create a synthetic dataset for doing regression**

In [None]:
# generate regression dataset
X, y = make_regression(n_samples=1000, n_features= NUM_FEATURES, 
                       noise=0.2)

In [None]:
# standardize the features and the target to zero mean and unit variance
X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(y.reshape(len(y),1))[:,0]
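
To make the scaling step concrete, here is a minimal sketch of what standardization does to a single column: subtract the mean, then divide by the standard deviation. The `values` array is hypothetical and chosen only so the arithmetic is easy to follow; it is not part of the notebook's dataset.

```python
import numpy as np

# Hypothetical column of values, used only for illustration
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = values.mean()   # 5.0
std = values.std()     # 2.0 (population standard deviation, ddof=0)

# Subtract the mean and divide by the standard deviation
standardized = (values - mean) / std

print(standardized)
print(standardized.mean(), standardized.std())
```

After this transformation the column has a mean of 0 and a standard deviation of 1, which is the same per-column result `StandardScaler` produces.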

**Split the dataset into training and test sets**

In [None]:
# split into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10)

**The model**<br>
The model expects NUM_FEATURES values as its input. <br>
The model has one hidden layer with 25 nodes and uses the ReLU (rectified linear unit) activation function.<br>
The model predicts one value, so the output layer has 1 node and uses the linear activation function.
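
A quick sanity check on this architecture is to count its trainable parameters by hand. The sketch below assumes the layer sizes described above; the name `HIDDEN_NODES` is introduced here only for illustration. The total should match what `model.summary()` reports once the model is built.

```python
NUM_FEATURES = 20   # inputs to the model
HIDDEN_NODES = 25   # nodes in the hidden layer (illustrative name)

# Each hidden node has one weight per input plus one bias
hidden_params = NUM_FEATURES * HIDDEN_NODES + HIDDEN_NODES

# The single output node has one weight per hidden node plus one bias
output_params = HIDDEN_NODES * 1 + 1

total_params = hidden_params + output_params
print(hidden_params, output_params, total_params)
```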

In [None]:
# define model
model = Sequential()
model.add(Dense(25, input_dim=NUM_FEATURES, activation='relu', 
                kernel_initializer='he_uniform'))
# For regression, use one node on the output and a linear activation function
model.add(Dense(1, activation='linear'))
opt = SGD(learning_rate=0.01, momentum=0.9)

In [None]:
# patience is the number of epochs without improvement in the monitored
# loss after which training stops
early_stop = keras.callbacks.EarlyStopping(monitor='loss', patience=10)
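
The idea behind `patience` can be sketched in plain Python. This is a simplified illustration, not the Keras implementation: the function name `stopping_epoch` and the `losses` list are hypothetical, and the sketch assumes any decrease in loss counts as an improvement.

```python
def stopping_epoch(losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float('inf')
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # training ran to the end without triggering a stop

# Hypothetical loss curve: improves for three epochs, then stalls
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.5]

print(stopping_epoch(losses, patience=3))
print(stopping_epoch(losses, patience=10))
```

With a patience of 3 the stall ends training before the late improvement in the final epoch is ever seen, while a patience of 10 lets training run to completion. This is the trade-off the parameter controls.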

## **Mean Squared Error Loss**<br>
MSE is the default loss for regression problems and should be evaluated first.  
*Only change it if you have a good reason*.

Mean squared error is the average of the squared differences between the predicted and actual values. <br>
*The result is never negative.* <br>
A perfect value is 0.0.<br>
Because the differences are squared, a large mistake contributes disproportionately more to the loss than a small one. <br>
In other words, **the model is punished more heavily for larger mistakes**.
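
The effect of squaring can be seen by computing MSE by hand. The sketch below uses NumPy and a small set of hypothetical targets and predictions chosen only for illustration.

```python
import numpy as np

# Hypothetical targets and predictions, used only for illustration
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 3.0, 6.0])

errors = y_pred - y_true        # differences: 0.5, 0.0, 0.0, 2.0
squared_errors = errors ** 2    # squares:     0.25, 0.0, 0.0, 4.0
mse = squared_errors.mean()     # average of the squared differences

print(squared_errors)
print(mse)
```

The last prediction is off by 4 times as much as the first one, yet it contributes 16 times as much to the loss.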

In [None]:
model.compile(loss='mean_squared_error', optimizer=opt,metrics=['mse'])
# train the model
historyMSE = model.fit(X_train, y_train, validation_data=(X_test, y_test),
                       epochs=400, verbose=0,callbacks=[early_stop])

In [None]:
# evaluate the model
_,train_mse = model.evaluate(X_train, y_train, verbose=0)
_,test_mse = model.evaluate(X_test, y_test, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_mse, test_mse))

In [None]:
# plot loss during training
pyplot.title('Loss / Mean Squared Error')
pyplot.plot(historyMSE.history['loss'], label='train')
pyplot.plot(historyMSE.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()

The model converged quickly, and the training and test performance are nearly equal. <br>
It is fair to say MSE is a good choice for this data and model.

## **Mean Squared Logarithmic Error Loss**<br>
MSLE is useful when the target values span a wide range and some of them are large.<br>
MSLE does not punish the model as much as MSE for large errors.

MSLE first takes the logarithm of one plus each predicted value and each actual value. <br>
It then calculates the mean squared error between those logarithms.<br>
This reduces the effect of large differences when the predicted values are large. 
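
A small NumPy sketch shows how the logarithm dampens large differences. The helper names `msle` and `mse` and the sample values are hypothetical. The sketch assumes non-negative values, because log(1 + y) is undefined for y at or below -1.

```python
import numpy as np

def msle(y_true, y_pred):
    """Mean squared difference between log(1 + y_true) and log(1 + y_pred)."""
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

def mse(y_true, y_pred):
    """Mean squared difference between y_true and y_pred."""
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical cases: each prediction is double its target,
# once at a small scale and once at a large scale
small_true, small_pred = np.array([10.0]), np.array([20.0])
large_true, large_pred = np.array([1000.0]), np.array([2000.0])

print('MSE :', mse(small_true, small_pred), mse(large_true, large_pred))
print('MSLE:', msle(small_true, small_pred), msle(large_true, large_pred))
```

MSE grows by a factor of 10,000 between the two cases, while MSLE stays on the same order of magnitude because it responds to the ratio between prediction and target rather than to the raw difference.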

In [None]:
#The model is the same as above
model = Sequential()
model.add(Dense(25, input_dim=NUM_FEATURES, activation='relu', 
                kernel_initializer='he_uniform'))
model.add(Dense(1, activation='linear'))
opt = SGD(learning_rate=0.01, momentum=0.9)

Compile the model with MSLE as the loss and track `msle` as the metric.

In [None]:
model.compile(loss='mean_squared_logarithmic_error', optimizer=opt,
              metrics=['msle'])
# fit model
historyMSLE = model.fit(X_train, y_train, validation_data=(X_test, y_test),
                        epochs=400, verbose=0,callbacks=[early_stop])

In [None]:
# evaluate the model
_,train_msle = model.evaluate(X_train, y_train, verbose=0)
_,test_msle = model.evaluate(X_test, y_test, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_msle, test_msle))

In [None]:
# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss / Mean Squared Logarithmic Error')
pyplot.plot(historyMSLE.history['loss'], label='train')
pyplot.plot(historyMSLE.history['val_loss'], label='test')
pyplot.legend()
# plot the loss from the earlier MSE run for comparison
pyplot.subplot(212)
pyplot.title('Mean Squared Error')
pyplot.plot(historyMSE.history['loss'], label='train')
pyplot.plot(historyMSE.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()

With MSLE, the gap between training and test loss is larger than with MSE. <br>
Training also stopped after 13 epochs, which suggests the model tends to overfit with this loss.

## **Mean Absolute Error Loss**<br>
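Use the Mean Absolute Error Loss when the dataset has outliers. <br>
MAE is the average of the absolute differences between the predicted and actual values, so a single large error does not dominate the loss the way it does with MSE.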
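
To see why MAE copes better with outliers, compare how MAE and MSE react when one error is much larger than the rest. The error values below are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical prediction errors: every prediction is off by 0.5
errors_clean = np.array([0.5, -0.5, 0.5, -0.5, 0.5])

# Same errors, except the last one is an outlier
errors_outlier = np.array([0.5, -0.5, 0.5, -0.5, 10.0])

mae_clean = np.abs(errors_clean).mean()
mse_clean = (errors_clean ** 2).mean()
mae_outlier = np.abs(errors_outlier).mean()
mse_outlier = (errors_outlier ** 2).mean()

print('MAE: %.2f -> %.2f' % (mae_clean, mae_outlier))
print('MSE: %.2f -> %.2f' % (mse_clean, mse_outlier))
```

The single outlier raises MAE from 0.5 to 2.4 but raises MSE from 0.25 to 20.2, so under MSE the outlier dominates the loss the model is trained to reduce.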

In [None]:
#The same model as the MSE
model = Sequential()
model.add(Dense(25, input_dim=NUM_FEATURES, activation='relu', 
                kernel_initializer='he_uniform'))
model.add(Dense(1, activation='linear'))
opt = SGD(learning_rate=0.01, momentum=0.9)

In [None]:
model.compile(loss='mean_absolute_error', optimizer=opt, metrics=['mse'])
# fit model
historyMAE = model.fit(X_train, y_train, validation_data=(X_test, y_test), 
                       epochs=400, verbose=0,callbacks=[early_stop])

In [None]:
# evaluate the model
# evaluate() returns [loss, metric]: here the loss is MAE and the metric is MSE
train_mae, _ = model.evaluate(X_train, y_train, verbose=0)
test_mae, _ = model.evaluate(X_test, y_test, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_mae, test_mae))

In [None]:
# plot loss during training
pyplot.subplot(211)
pyplot.title('Mean Absolute Error Loss')
pyplot.plot(historyMAE.history['loss'], label='train')
pyplot.plot(historyMAE.history['val_loss'], label='test')
pyplot.legend()
# plot the loss from the earlier MSE run for comparison
pyplot.subplot(212)
pyplot.title('Mean Squared Error')
pyplot.plot(historyMSE.history['loss'], label='train')
pyplot.plot(historyMSE.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()