<h1 align=center><font size = 5>Assignment: Build a Regression Model in Keras</font></h1>

## Data Preprocessing

Before we build any model, we will first download the dataset and prepare it for learning.

<a id="item31"></a>

In [8]:
import pandas as pd
import numpy as np

Let's download the data and read it into a <em>pandas</em> dataframe.

In [9]:
concrete_data = pd.read_csv('https://cocl.us/concrete_data')
concrete_data.head()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


In [10]:
# check for null values in all the attributes
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

#### Split data into predictors and target

The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.

In [11]:
concrete_data_columns = concrete_data.columns
predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column
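
An equivalent way to express the same split is `DataFrame.drop`, which some readers find easier to follow than boolean indexing on the column index. The sketch below uses a tiny hand-made frame (the name `demo` and its two rows are illustrative, not the real dataset) so that it runs without downloading anything.

```python
import pandas as pd

# Tiny illustrative frame; the real notebook uses concrete_data instead.
demo = pd.DataFrame({
    "Cement": [540.0, 540.0],
    "Water": [162.0, 162.0],
    "Strength": [79.99, 61.89],
})

# Every column except the target becomes a predictor.
predictors_demo = demo.drop(columns=["Strength"])
# The target is the Strength column on its own.
target_demo = demo["Strength"]

print(list(predictors_demo.columns))
```

On the notebook's data, the same call would be `concrete_data.drop(columns=['Strength'])`.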

<a id="item2"></a>

Let's do a quick sanity check of the predictors and the target dataframes.

In [12]:
predictors.head()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360


In [13]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

The data is ready to be used for training! Let us begin building the neural networks.

Let's save the number of predictors to *n_cols* since we will need this number when building our network.

In [15]:
n_cols = predictors.shape[1] # number of predictors

## A. Baseline Model

In this part of the assignment, we will build a neural network with Keras to the following specifications:


1. One hidden layer of 10 nodes, and a ReLU activation function

2. The adam optimizer and the mean squared error as the loss function.

In [27]:
import keras
from keras.models import Sequential
from keras.layers import Dense

def baseline_model():
    model = Sequential()
    #add hidden layer
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    #add output layer
    model.add(Dense(1))
    #compile model
    model.compile(optimizer='adam',loss='mean_squared_error')
    
    return model

#### Train and Test the Network

In [44]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Train the model for the given number of epochs and evaluate it on the test set,
# repeating the whole process `iters` times.
# Note: the same model instance is reused, so each iteration continues training from
# the previous weights, and random_state=12 produces the same split every time.
def trainAndTest(model, predictors, target, test_size, epochs, iters):
    errors = np.zeros(iters)
    
    # run iters number of times
    for i in range(iters):
        X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=test_size, random_state=12)
        # fit the model
        model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, verbose=0)
        # predict and evaluate the model
        y_hat = model.predict(X_test)
        errors[i] = mean_squared_error(y_test, y_hat)  # sklearn's argument order is (y_true, y_pred)
    return (errors.mean(), errors.std())
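
In `trainAndTest`, the model is built once outside the loop and the split uses a fixed `random_state`, so every iteration sees the same split and keeps training the same weights. If the repetitions are meant to be independent, one possible variant is sketched below: it takes a model factory and draws a new split each time. The names `repeated_evaluation` and `LeastSquaresStandIn` are hypothetical. The stand-in class only mimics the `fit`/`predict` calls used here so the sketch runs without Keras, and the split is done with NumPy rather than `train_test_split`.

```python
import numpy as np

def repeated_evaluation(build_model, X, y, test_size=0.3, epochs=50, iters=5, seed=0):
    """Train a fresh model on a new random split in every iteration."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n_test = int(round(test_size * len(X)))
    errors = np.zeros(iters)
    for i in range(iters):
        idx = rng.permutation(len(X))  # a different shuffle, hence a different split
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        model = build_model()  # new, untrained model for this iteration
        model.fit(X[train_idx], y[train_idx], epochs=epochs, verbose=0)
        y_hat = np.asarray(model.predict(X[test_idx])).ravel()
        errors[i] = np.mean((y[test_idx] - y_hat) ** 2)  # mean squared error
    return errors.mean(), errors.std()


class LeastSquaresStandIn:
    """Hypothetical stand-in with the same fit/predict calls as a Keras model."""

    def fit(self, X, y, epochs=1, verbose=0):
        A = np.c_[X, np.ones(len(X))]  # append an intercept column
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.c_[X, np.ones(len(X))] @ self.coef_


# Synthetic, exactly linear data so the stand-in can fit it.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + 3.0
demo_mean, demo_std = repeated_evaluation(LeastSquaresStandIn, X_demo, y_demo, iters=3)
```

In the notebook itself, the factory would be `baseline_model` (passed without calling it), with the predictors and target converted to arrays as above. Results from such a variant are not comparable with the numbers reported in this notebook.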

In [51]:
# build the model
model = baseline_model()
#train and test the model
mean,std = trainAndTest(model, predictors,target,test_size=0.3,epochs=50,iters=50)
print('The mean and standard deviation of the mean squared errors are: ', mean, std)

The mean and standard deviation of the mean squared errors are:  88.85512900804568 65.07359666203364


## B. Normalize Data and retrain the Baseline model

In [46]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
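
The cell above computes the mean and standard deviation over the full dataset, before any train/test split, so the test rows contribute to the statistics. A possible alternative, which is not what this notebook does, is to estimate the statistics on the training rows only and reuse them for the test rows. The function name `standardize_with_train_stats` and the small arrays are illustrative assumptions.

```python
import numpy as np

def standardize_with_train_stats(X_train, X_test):
    """Z-score both sets using statistics estimated on the training set only."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)  # ddof=1 matches the pandas default for .std()
    return (X_train - mean) / std, (X_test - mean) / std

# Small illustrative values (two features, three training rows, one test row).
X_train_demo = np.array([[540.0, 162.0], [332.5, 228.0], [198.6, 192.0]])
X_test_demo = np.array([[266.0, 228.0]])
train_norm, test_norm = standardize_with_train_stats(X_train_demo, X_test_demo)
```

After this transformation, each training column has mean 0 and sample standard deviation 1. The test columns generally do not, which is expected.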

In [47]:
model = baseline_model()
mean,std = trainAndTest(model,predictors_norm,target,test_size=0.3,epochs=50,iters=50)
print('The mean and standard deviation of the mean squared errors are: ', mean, std)

The mean and standard deviation of the mean squared errors are:  61.64758816698005 50.180386306279686


<b>Observation: </b> The mean of the mean squared errors (about 61.6) is considerably lower than in part A (about 88.9), so normalizing the predictors yields a better-trained model for the same number of epochs and iterations.

## C. Repeat part B with 100 epochs

In [48]:
model = baseline_model()
mean,std = trainAndTest(model,predictors_norm,target,test_size=0.3,epochs=100,iters=50)
print('The mean and standard deviation of the mean squared errors are: ', mean, std)

The mean and standard deviation of the mean squared errors are:  52.8761383623337 17.610748240165567


<b>Observation: </b> Both the mean (about 52.9) and the standard deviation (about 17.6) of the mean squared errors are lower than in part B (about 61.6 and 50.2), suggesting that the model learns more and predicts better when trained for more epochs.

## D. Increase the number of hidden layers & repeat part B

In [52]:
def deeper_model():
    model = Sequential()
    model.add(Dense(10,activation='relu', input_shape=(n_cols,)))
    model.add(Dense(10,activation='relu'))
    model.add(Dense(10,activation='relu'))
    model.add(Dense(1))
    
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
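
To get a feel for how much larger `deeper_model` is than `baseline_model`, the trainable parameters of each can be counted by hand: a fully connected layer with `n` inputs and `u` units has `n * u` weights plus `u` biases. The helper `dense_param_count` is a hypothetical name, and the input size of 8 is an assumption based on the eight predictor columns listed in the null check.

```python
def dense_param_count(n_inputs, layer_units):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units in layer_units:
        total += fan_in * units + units  # weight matrix plus one bias per unit
        fan_in = units  # this layer's outputs feed the next layer
    return total

n_inputs = 8  # assumed: the eight predictor columns
baseline_params = dense_param_count(n_inputs, [10, 1])         # one hidden layer
deeper_params = dense_param_count(n_inputs, [10, 10, 10, 1])   # three hidden layers
print(baseline_params, deeper_params)  # prints: 101 321
```

Calling `model.summary()` on the Keras models should report the same totals if the input size assumption holds.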

In [54]:
model = deeper_model()
mean, std = trainAndTest(model, predictors_norm,target,test_size=0.3,epochs=50,iters=50)

In [55]:
print('The mean and standard deviation of the mean squared errors are: ', mean, std)

The mean and standard deviation of the mean squared errors are:  40.57299172575729 14.970495444694443


<b>Observation: </b> The mean of the mean squared errors (about 40.6) is much lower than in part B (about 61.6), which suggests that the deeper network, with three hidden layers instead of one, fits the data better.