# Regression models with Keras

### Tasks & Objectives:

We will use the concrete dataset to built a deep regression model. The models task is to use the numeric input features of the concrete composition Cement, Blast Furnance Slag, Fly Ash, Water, Superplasticizer, Coarse Aggregate, Fine Aggregate, Age to predict the Strength of the concrete.

The performance of four different model architectures is compared:

#### A)  Baseline model

This model will use the input features as is, without any preprocessing.

-Feature preprocessing: None

-Architecture: 1 Hidden layer of 10 nodes, ReLu activation function.

-Numer of training epochs = 50.

-Optimizer: Adam.

-Loss: mean squared error.

#### B)  Normalized data model

We will preprocess the data using a standard scaler normalization on the features.

-Feature preprocessing: Standard Scaler

-Architecture: 1 Hidden layer of 10 nodes, ReLu activation function.

-Numer of training epochs = 50.

-Optimizer: Adam.

-Loss: mean squared error.

#### C) Increas training

This time we increase the training epochs to 100.

-Feature preprocessing: Standard Scaler

-Architecture: 1 Hidden layer of 10 nodes, ReLu activation function.

-Numer of training epochs = 100.

-Optimizer: Adam.

-Loss: mean squared error.

#### D) Increase complexity

The model architecture will change s.t. the complexity of the model increases. This is achived by increasing the number of hidden layers from 1 to 3.

-Feature preprocessing: Standard Scaler

-Architecture: 3 Hidden layer of 10 nodes, ReLu activation function.

-Numer of training epochs = 50.

-Optimizer: Adam.

-Loss: mean squared error. 


 


### Importing modules

In [1]:
import tensorflow.keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np
import pandas as pd

### Downloading data as pandas dataframe

In [2]:
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
print('\nShape of dataset: {}'.format(concrete_data.shape))
print('\nCheck for missing values in the dataset: \n{}'.format(concrete_data.isnull().sum()))
print('\nHeader of dataset: ')
concrete_data.head()


Shape of dataset: (1030, 9)

Check for missing values in the dataset: 
Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

Header of dataset: 


Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


### Feature selection

Use the column Strength as label and all other columns as features.

In [3]:
concrete_data_columns = concrete_data.columns

features = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
n_features = features.shape[1]
label = concrete_data['Strength'] # Strength column

### Defining train - test split

In [4]:
def data_split(features,label) :
    X_train, X_test, y_train, y_test = train_test_split(features, label, test_size=0.3)
    
    # Create a list containing X_train, X_test, Y_train, Y_test and return the list
    splits = [X_train, X_test, y_train, y_test] 
    return splits

### Defining the model

The definition of the model is done s.t. the architecture of the model is created according to the number of hidden layers, number of nodes per layer and dimensionality of the feature vector.

In [5]:
def regression_model(n_hidden_layers,n_nodes,n_features):
    model = Sequential()
    model.add(Dense(n_nodes, activation='relu', input_shape=(n_features,))) #input layer
    
    for i in range(int(n_hidden_layers)):                                   #hidden layers
        model.add(Dense(n_nodes, activation='relu'))
        
    model.add(Dense(1))                                                     #output layer
    
    model.compile(optimizer='adam', loss='mean_squared_error')
    
    return model

### Model training and scoring

The specifications of the preprocessing procedure and model arhitecture are summerized in the variable parts. The model definition allows to use this variable as input and train and evaluate each part (A,B,C,D) in a consecutive order. 

Every model is trained and evaluated seperately 50 times in a row to increase the statistical confidence in the scoring of each model.  

In [6]:
parts = [
    [0,1,10,50,False], #part,n_hidden_layers,n_nodes,n_epochs,normalize
    [1,1,10,50,True],
    [2,1,10,100,True],
    [3,3,10,50,True],
]
repeat = 50
mse = np.zeros((len(parts),repeat))

for part, n_hidden_layers, n_nodes, n_epochs, normalize in parts:  
    print('\n\nPart {}'.format(part) +' Model architecture: \n\t#Hidden layers = {}'.format(n_hidden_layers)+'\n\t#Nodes per layer = {}'.format(n_nodes) \
          +'\n\t#Training epochs = {}'.format(n_epochs)+ '\n\tFeature noramlzation: {}'.format(normalize))
    for j in range(repeat):
        print('Evaluation #{}'.format(j)+' of model {}'.format(part))
        
        X_train, X_test, y_train, y_test = data_split(features, label) #train test split
        
        if normalize == True: #normalize features if normalize is true (part 1,2,3)
            X_train = (X_train - X_train.mean()) / X_train.std()
            X_test  = (X_test - X_test.mean()) / X_test.std()
            
        model = regression_model(n_hidden_layers,n_nodes,n_features)                    #initialize model
        model.fit(X_train,y_train, validation_split=0.3, epochs=n_epochs, verbose=1)    #fit model
        y_predicted = model.predict(X_test)                                             #predict with test set
        mse[part,j] = mean_squared_error(y_test,y_predicted)                            #track performance as mean squared error 
        



Part 0 Model architecture: 
	#Hidden layers = 1
	#Nodes per layer = 10
	#Training epochs = 50
	Feature noramlzation: False
Evaluation #0 of model 0
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Evaluation #1 of model 0
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epo

### Model Evaluation

The performance measure for the four model architectures under consideration is the mean squared error. 
Training and evaluating each model 50 times seperately the mean and standard deviation of each prediction gives gives a good intuition about the performance of every model.

In [7]:
col_names =  ['Score','Baseline (A)', 'Normalized (B)','Increased Training (C)', 'Increased Complexity (D)']
evaluation_df = pd.DataFrame(columns = col_names)
score_names = ['mean of MSE', 'std of MSE']
scores = [mse.mean(axis=1),mse.std(axis=1)]


for score_method, scoring in zip(score_names,scores):
    evaluation_df.loc[len(evaluation_df)] = [score_method,*scoring]
    
evaluation_df

Unnamed: 0,Score,Baseline (A),Normalized (B),Increased Training (C),Increased Complexity (D)
0,mean of MSE,205.992079,178.106949,133.570489,135.9944
1,std of MSE,200.711925,16.846957,13.192868,14.386215


### Remarks

The table above compares the mean of the mean squared error for every model respectively. It can be concluded that normalizing the features reduces variance of the model. Increasing training by increasing the number of training epochs and increasing the complexity of the model by increasing the number of hidden layers leads to the best overall results.