The aim of this notebook is to explore combinations of hyper-parameters to narrow down which could **possibly** perform better in future tests. 

The hyper-parameters to be tested here are:
* Loss Function: MSE, MAPE
* Activation Function: ReLU, Leaky ReLU, Parametric ReLU
* Gradient Descent Optimizer: Adam, Adadelta, AMSGrad
* Dropout Rate: 0.0, 0.1, 0.2

This makes 54 models to be trained and evaluated according to a standard error. This error will be MSPE (Mean Square Percentage Error) because in all previous tests it has been **extremely** high.

All the models try to predict HC.

In [72]:
from keras.models import Sequential, load_model, Model
from keras.layers import Input, Dense, Dropout, advanced_activations, BatchNormalization, LeakyReLU, PReLU
from keras import losses, optimizers, activations
import keras.backend as K

import h5py

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from sklearn.externals import joblib
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

import time
import datetime
import os

In [2]:
output_path = os.path.join('.','output')

## Load Data

In [3]:
# This dataset was scaled using MinMax
data_scaled_shuffled = pd.read_csv('Dataset_Scaled_Shuffled.csv')
print('Shuffled dataset loaded.')

Shuffled dataset loaded.


## Prepare Data

In [4]:
# Get number of data points
data_points = data_scaled_shuffled.shape[0]

# Set sizes for train, dev, test sets
train_percent = 0.8
train_size = round(train_percent*data_points)

if (data_points-train_size)%2 == 0:
    dev_size = int((data_points-train_size)/2)
    test_size = dev_size
    print('Train Size = {}'.format(train_size))
    print('Dev Size = {}'.format(dev_size))
    print('Test Size = {}'.format(test_size))
    print('Remainder = {}'.format(train_size+dev_size+test_size-data_points))
    
else:
    train_size = train_size-1
    dev_size = int((data_points-train_size)/2)
    test_size = dev_size 
    print('Train Size = {}'.format(train_size))
    print('Dev Size = {}'.format(dev_size))
    print('Test Size = {}'.format(test_size))
    print('Remainder = {}'.format(train_size+dev_size+test_size-data_points))

Train Size = 62511
Dev Size = 7814
Test Size = 7814
Remainder = 0


In [5]:
# Divide data into train, dev, and test sets
train_set = data_scaled_shuffled[:train_size]
dev_set = data_scaled_shuffled[train_size:train_size+dev_size]
test_set = data_scaled_shuffled[train_size+dev_size:train_size+dev_size+test_size]

# Reset index for all sets
train_set = train_set.reset_index(drop=True)
dev_set = dev_set.reset_index(drop=True)
test_set = test_set.reset_index(drop=True)

# Get values
train_set_values = train_set.values
dev_set_values = dev_set.values
test_set_values = test_set.values

# Number of emissions: HC, CO, CO2, NOX
n_out = 4

# SLICING: [start row:end row , start column:end column]
# Split into inputs and outputs
x_train = train_set_values[:,:-n_out]
x_dev = dev_set_values[:,:-n_out]
x_test = test_set_values[:,:-n_out]

HC_train = train_set_values[:,-n_out]
#CO_train = train_set_values[:,-n_out+1]
#CO2_train = train_set_values[:,-n_out+2]
#NOX_train = train_set_values[:,-n_out+3]

y_train = HC_train

HC_dev = dev_set_values[:,-n_out]
#CO_dev = dev_set_values[:,-n_out+1]
#CO2_dev = dev_set_values[:,-n_out+2]
#NOX_dev = dev_set_values[:,-n_out+3]

y_dev = HC_dev

HC_test = test_set_values[:,-n_out]
#CO_test = test_set_values[:,-n_out+1]
#CO2_test = test_set_values[:,-n_out+2]
#NOX_test = test_set_values[:,-n_out+3]

y_test = HC_test

## Inverse Scaling of Data

* This will be used later in the code to evaluate models

#### Import scalers

In [6]:
# Create an empty list to put all the scalers
scalers = []

for i in range(np.size(data_scaled_shuffled.columns)):
    
    scaler_filename = "Scalers/scaler{}.save".format(i)
    scaler = joblib.load(scaler_filename)
    
    scalers.append(scaler)

#### Inverse Scale Data

In [7]:
# First, inverse transform all original values from the test_set
test_set_inverse = test_set.copy()

for i in range(np.size(data_scaled_shuffled.columns)):
    
    col_name = data_scaled_shuffled.columns[i]
    
    values = test_set_inverse[col_name].values
    values = values.astype('float64')
    values = values.reshape(values.shape[0],1)
    
    test_set_inverse[col_name] = scalers[i].inverse_transform(values)
    
    print('Success with feature: {}'.format(col_name))

Success with feature: Year
Success with feature: Vehicle_Code
Success with feature: Manufacturer_Code
Success with feature: Displacement
Success with feature: Fuel_System
Success with feature: Gears
Success with feature: Transmission_Code
Success with feature: ETW
Success with feature: HP
Success with feature: Drive_System_Code
Success with feature: Fuel_Code
Success with feature: V_avg
Success with feature: V_max
Success with feature: V_std
Success with feature: a_pos
Success with feature: a_neg
Success with feature: Peak_pos
Success with feature: Peak_neg
Success with feature: HC
Success with feature: CO
Success with feature: CO2
Success with feature: Nox


-----------------
## Models

#### Basics

In [8]:
# Mini-batch size, epochs
batch_size = 64
epochs = 300

#### Variables

In [9]:
# Loss functions to try
names_losses = ['MSE','MAPE']    
dict_losses ={'MSE': losses.mean_squared_error,
              'MAPE': losses.mean_absolute_percentage_error}

#--------------------------------------------------------------------------------- 

# Activation functions to try
names_activations = ['ReLU', 'LReLU', 'PReLU']
    # A function has to be called so that a new instance of the function can be created in each layer
def get_activation(name):
    
    if name == 'ReLU':
        function = advanced_activations.ReLU()
    if name == 'LReLU':
        function = advanced_activations.LeakyReLU()
    if name == 'PReLU':
        function = advanced_activations.PReLU()
        
    return function

#---------------------------------------------------------------------------------     

# Optimizers to be tried out
names_optimizers = ['Adadelta', 'Adam', 'AMSGrad']
dict_optimizers ={'Adadelta': optimizers.Adadelta(),
                  'Adam': optimizers.Adam(amsgrad=False),
                  'AMSGrad':optimizers.Adam(amsgrad=True)}   

#--------------------------------------------------------------------------------- 

# Dropout rate to be tried
dropouts = [0.0, 0.1, 0.2]

#--------------------------------------------------------------------------------- 

print('Loss Functions = {}'.format(len(names_losses)))
print('Activation Functions = {}'.format(len(names_activations)))
print('Optimizers = {}'.format(len(names_optimizers)))
print('Dropout Rates = {}'.format(len(dropouts)))
print('--------------------------------')
print('Number of models for test = {}'.format(len(names_losses)*len(names_activations)*len(names_optimizers)*len(dropouts)))

Loss Functions = 2
Activation Functions = 3
Optimizers = 3
Dropout Rates = 3
--------------------------------
Number of models for test = 54


#### Build Model

In [10]:
def build_model(number, loss_func, activation_name, optimizer, dd):
    
    # Create model
    model = Sequential(name='Model_{}'.format(number))

    model.add(Dense(256, input_dim=x_train.shape[1]))
    model.add(get_activation(activation_name))
    model.add(Dropout(dd))
    model.add(BatchNormalization())

    model.add(Dense(128))
    model.add(get_activation(activation_name))
    model.add(Dropout(dd))
    model.add(BatchNormalization())

    model.add(Dense(64))
    model.add(get_activation(activation_name))
    model.add(Dropout(dd))
    model.add(BatchNormalization())

    model.add(Dense(32))
    model.add(get_activation(activation_name))
    model.add(Dropout(dd))
    model.add(BatchNormalization())

    model.add(Dense(16))
    model.add(get_activation(activation_name))
    model.add(Dropout(dd))
    model.add(BatchNormalization())

    model.add(Dense(1))

    #Compile model
    model.compile(loss=loss_func, optimizer=optimizer, metrics = ['accuracy'])
    
    print('{} Created'.format(model.name))
    print('----------------------------------')
    
    return model

#### Train Model

In [11]:
def train_models(model):
    
    print('{} - Training'.format(model.name))
    print('- Started on {} at {}'.format(str(datetime.datetime.now())[5:-16], str(datetime.datetime.now())[11:-10]))
    # Start timer
    start_time = time.time()

    # fit network
    history = model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, 
                        validation_data=(x_dev, y_dev), verbose=0, shuffle=True)

    # End timer
    end_time = time.time() - start_time
    print('{} - Training Complete'.format(model.name))
    print('- Time: {:.3f} min'.format(end_time/60))
    print('- Loss = {:.5f}'.format(history.history['loss'][-1]))
    print('- Val Loss = {:.5f}'.format(history.history['val_loss'][-1]))
    print('----------------------------------')
        
    return history

#### Make Predictions and Calculate Error

In [12]:
# Function to define MSPE
def msp_error(true,pred):
    error = 100*np.sum(((true-pred)/true)**2)/np.size(true)
    return error

In [13]:
def predict_get_error(model):
    
    print('Predicting with {}'.format(model.name))
    predictions = model.predict(x_test)
    
    print('Inverse Scaling Operation')        
    # Inverse the scaling operation on the predictions (only HC is being predicted)
    HC_predict = scalers[-4].inverse_transform(predictions)
    
    print('Calculating Error')
    mspe = msp_error(test_set_inverse['HC'].values, HC_predict)
    print('- HC Error  = {:.2e}'.format(mspe))
    print('----------------------------------')
    
    return mspe

#### Process Models and Rank with MSPE

In [14]:
def process_models():
    
    count = 1
    model_list = []
    history_list = []
    HC_error_list = []

    for loss_name in names_losses:
        loss = dict_losses[loss_name]

        for activation_name in names_activations:

            for optimizer_name in names_optimizers:
                optimizer = dict_optimizers[optimizer_name]

                for dd in dropouts:

                    # Print model variables
                    print('Model_{} Variables:'.format(count))
                    print('- Loss: {}'.format(loss_name))
                    print('- Activation: {}'.format(activation_name))
                    print('- Optimizer: {}'.format(optimizer_name))
                    print('- Dropout: {}%'.format(dd*100))
                    print('----------------------------------')

                    # Create model
                    model = build_model(count,loss,activation_name,optimizer,dd)
                    model_list.append(model)

                    # Train model
                    history = train_models(model)
                    history_list.append(history)

                    # Make predictions and calculate error
                    error = predict_get_error(model)
                    HC_error_list.append([model.name, loss_name, activation_name, optimizer_name, dd, error])

                    # Announce one model process ended
                    print('============== MODEL {} PROCESS END =============='.format(count))
                    print(' ')

                    # Increase counter by 1
                    count = count+1

    print('Creating DataFrame')                
    HC_error = pd.DataFrame(HC_error_list)

    print('Changing DataFrame column names')
    HC_error.columns = ['Model', 'Loss', 'Activation', 'Optimizer', 'Dropout', 'MSPE']

    print('Ranking Models')
    HC_error.sort_values(by=['MSPE'], inplace=True)

    count = 0
    
    return HC_error, model_list, history_list

In [16]:
ranking, models, histories = process_models()

Model_1 Variables:
- Loss: MSE
- Activation: ReLU
- Optimizer: Adadelta
- Dropout: 0.0%
----------------------------------
Model_1 Created
----------------------------------
Model_1 - Training
- Started on 03-23 at 18:31
Model_1 - Training Complete
- Time: 39.336 min
- Loss = 0.00216
- Val Loss = 0.00195
----------------------------------
Predicting with Model_1
Inverse Scaling Operation
Calculating Error
- HC Error  = 1.63e+11
----------------------------------
 
Model_2 Variables:
- Loss: MSE
- Activation: ReLU
- Optimizer: Adadelta
- Dropout: 10.0%
----------------------------------
Model_2 Created
----------------------------------
Model_2 - Training
- Started on 03-23 at 19:11
Model_2 - Training Complete
- Time: 41.556 min
- Loss = 0.00253
- Val Loss = 0.00212
----------------------------------
Predicting with Model_2
Inverse Scaling Operation
Calculating Error
- HC Error  = 1.33e+11
----------------------------------
 
Model_3 Variables:
- Loss: MSE
- Activation: ReLU
- Optimizer

- HC Error  = 1.52e+11
----------------------------------
 
Model_17 Variables:
- Loss: MSE
- Activation: LReLU
- Optimizer: AMSGrad
- Dropout: 10.0%
----------------------------------
Model_17 Created
----------------------------------
Model_17 - Training
- Started on 03-24 at 05:38
Model_17 - Training Complete
- Time: 44.917 min
- Loss = 0.00316
- Val Loss = 0.00284
----------------------------------
Predicting with Model_17
Inverse Scaling Operation
Calculating Error
- HC Error  = 1.82e+11
----------------------------------
 
Model_18 Variables:
- Loss: MSE
- Activation: LReLU
- Optimizer: AMSGrad
- Dropout: 20.0%
----------------------------------
Model_18 Created
----------------------------------
Model_18 - Training
- Started on 03-24 at 06:23
Model_18 - Training Complete
- Time: 45.098 min
- Loss = 0.00334
- Val Loss = 0.00284
----------------------------------
Predicting with Model_18
Inverse Scaling Operation
Calculating Error
- HC Error  = 1.50e+11
---------------------------

Model_32 - Training Complete
- Time: 49.190 min
- Loss = 806.24804
- Val Loss = 228.15783
----------------------------------
Predicting with Model_32
Inverse Scaling Operation
Calculating Error
- HC Error  = 1.15e+13
----------------------------------
 
Model_33 Variables:
- Loss: MAPE
- Activation: ReLU
- Optimizer: Adam
- Dropout: 20.0%
----------------------------------
Model_33 Created
----------------------------------
Model_33 - Training
- Started on 03-24 at 18:51
Model_33 - Training Complete
- Time: 49.632 min
- Loss = 630.52380
- Val Loss = 2425.23381
----------------------------------
Predicting with Model_33
Inverse Scaling Operation
Calculating Error
- HC Error  = 2.61e+17
----------------------------------
 
Model_34 Variables:
- Loss: MAPE
- Activation: ReLU
- Optimizer: AMSGrad
- Dropout: 0.0%
----------------------------------
Model_34 Created
----------------------------------
Model_34 - Training
- Started on 03-24 at 19:40
Model_34 - Training Complete
- Time: 49.502 m

Model_48 Created
----------------------------------
Model_48 - Training
- Started on 03-25 at 08:08
Model_48 - Training Complete
- Time: 61.229 min
- Loss = 203.33634
- Val Loss = 87.78839
----------------------------------
Predicting with Model_48
Inverse Scaling Operation
Calculating Error
- HC Error  = 1.11e+09
----------------------------------
 
Model_49 Variables:
- Loss: MAPE
- Activation: PReLU
- Optimizer: Adam
- Dropout: 0.0%
----------------------------------
Model_49 Created
----------------------------------
Model_49 - Training
- Started on 03-25 at 09:10
Model_49 - Training Complete
- Time: 56.922 min
- Loss = 674.43140
- Val Loss = 175.92957
----------------------------------
Predicting with Model_49
Inverse Scaling Operation
Calculating Error
- HC Error  = 4.95e+08
----------------------------------
 
Model_50 Variables:
- Loss: MAPE
- Activation: PReLU
- Optimizer: Adam
- Dropout: 10.0%
----------------------------------
Model_50 Created
-------------------------------

In [17]:
ranking

Unnamed: 0,Model,Loss,Activation,Optimizer,Dropout,MSPE
41,Model_42,MAPE,LReLU,Adam,0.2,47582620.0
40,Model_41,MAPE,LReLU,Adam,0.1,58247250.0
39,Model_40,MAPE,LReLU,Adam,0.0,81934570.0
48,Model_49,MAPE,PReLU,Adam,0.0,495301500.0
50,Model_51,MAPE,PReLU,Adam,0.2,618747200.0
49,Model_50,MAPE,PReLU,Adam,0.1,883310200.0
47,Model_48,MAPE,PReLU,Adadelta,0.2,1110413000.0
44,Model_45,MAPE,LReLU,AMSGrad,0.2,1641920000.0
30,Model_31,MAPE,ReLU,Adam,0.0,1788013000.0
29,Model_30,MAPE,ReLU,Adadelta,0.2,1857675000.0


In [56]:
epoch_vector = np.linspace(1,epochs,epochs)

for i in [39,40,41,48,49,50]:
    
    model = models[i]
    history = histories[i]
    
    model.save(os.path.join(output_path,'{}.h5'.format(model.name)))
    
    
    hist_data = [epoch_vector, history.history['loss'], history.history['val_loss']]
    
    hist_data = pd.DataFrame(hist_data).transpose()
    
    # Change column names
    hist_data.columns = ['Epochs','loss','val_loss']
    
    hist_data.to_csv(os.path.join(output_path,'Training_History_{}.csv'.format(model.name)), 
                     encoding='utf-8', index=False)

In [57]:
ranking.to_csv(os.path.join(output_path,'Model_Ranking.csv'), encoding='utf-8', index=False)

## Next Steps

All the data will be moved from the *output* folder to the *Gen_3* folder. Top six models were saved for reference. The error is so big they probably won't be used again.

Pick the best combinations (with some flexibility - all errors are through the roof):
* MAPE + Leaky ReLU + Adam + 20% Dropout
* MAPE + Leaky ReLU + Adadelta + 20% Dropout
* MAPE + PReLU + Adam + 20% Dropout
* MAPE + PReLU + Adadelta + 20% Dropout

The outputs being predicted by the neural networks are **NEGATIVE**, which makes no sense. The problem could lie within the **linear** activation function in the output neuron. Train those configurations using a different function like ReLU or Sigmoid

#### Other possible next steps
* Try predicting another pollutant besides HC
* Try the R2 loss function
* Try varying the input variables

In [None]:
# Print model description
#print('{} Description:'.format(model.name))
#print('- Loss: {}'.format(str(model.loss)[10:-23]))
#print('- Optimizer: {}'.format(str(model.optimizer)[18:-29]))



#total_count = model.count_params()
#trainable_count = int(np.sum([K.count_params(p) for p in set(model.trainable_weights)]))
#non_trainable_count = int(np.sum([K.count_params(p) for p in set(model.non_trainable_weights)]))
#print('- Total params: {:,}'.format(total_count))
#print('- Trainable params: {:,}'.format(trainable_count))
#print('- Non-trainable params: {:,}'.format(non_trainable_count))