


# Regression in Keras
We will be using the dataset provided in the assignment.

The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them. Ingredients include:

1. Cement

2. Blast Furnace Slag

3. Fly Ash

4. Water

5. Superplasticizer

6. Coarse Aggregate

7. Fine Aggregate


## Load and Clean Dataset
<br>
<b>Import the required Python libraries.</b>

In [71]:
import pandas as pd
import numpy as np

<b>Import Keras</b>

In [72]:
import keras

from keras.models import Sequential
from keras.layers import Dense

<b>Import scikit-learn utilities</b>

In [73]:
from sklearn.model_selection import train_test_split

<b>Let's read the dataset into a pandas dataframe.</b>

In [74]:
concrete_data = pd.read_csv('concrete_data.csv')
concrete_data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


The first concrete sample has 540 cubic meters of cement, 0 cubic meters of blast furnace slag, 0 cubic meters of fly ash, 162 cubic meters of water, 2.5 cubic meters of superplasticizer, 1040 cubic meters of coarse aggregate, and 676 cubic meters of fine aggregate. This mix, at 28 days old, has a compressive strength of 79.99 MPa.

<b>Check how many data points are in the dataset</b>

In [75]:
concrete_data.shape

(1030, 9)

Let's check for missing values in the dataset.

In [76]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The data in the dataset looks clean and is ready to be used to build our model.

<b>Summary of the dataset</b>

In [77]:
concrete_data.describe()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


<b>Splitting data into predictors and target</b>
    
The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.

In [78]:
concrete_data_columns = concrete_data.columns
predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column

Let's do a quick check of the predictors and the target dataframes.

In [79]:
predictors.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [80]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

## Normalize the data 

In [81]:
# Normalize the data by subtracting the mean and dividing by the standard deviation
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |

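As a quick sanity check on the normalization step, the sketch below recomputes two of the normalized values of the first sample (Cement and Age) from the mean and standard deviation reported by `describe()`. It is written in plain Python so it runs without the dataset. The helper name `zscore` and the toy column are illustrative assumptions, not part of the notebook. It also relies on pandas' `DataFrame.std()` defaulting to the sample standard deviation (`ddof=1`), which is what `statistics.stdev` computes.

```python
import statistics

def zscore(value, mean, std):
    """Standardize one value: subtract the mean, divide by the standard deviation."""
    return (value - mean) / std

# Mean and std for Cement and Age, copied from the describe() output.
cement_z = zscore(540.0, 281.167864, 104.506364)
age_z = zscore(28, 45.662136, 63.169912)
print(round(cement_z, 6), round(age_z, 6))

# Toy column (hypothetical values) standardized the same way as predictors_norm.
column = [2.0, 4.0, 4.0, 6.0, 9.0]
col_mean = statistics.mean(column)
col_std = statistics.stdev(column)  # sample std (ddof=1), like pandas .std()
standardized = [zscore(v, col_mean, col_std) for v in column]
print(standardized)
```

The two printed values should agree with the first row of `predictors_norm` to about six decimals, and the standardized toy column should have a mean of 0 and a sample standard deviation of 1.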

In [82]:
n_cols = predictors_norm.shape[1] # number of predictors
n_cols

8

<b>Building our regression model.</b>
<br>It has one hidden layer with 10 neurons and a ReLU activation function. It uses the adam optimizer and the mean squared error as the loss function.

In [83]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

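To make the architecture concrete, here is a minimal plain-Python sketch of what a network of this shape computes for one sample, together with its parameter count. The helpers `count_dense_params`, `relu`, and `dense` are hypothetical names written for illustration, and the tiny weights are made up. Keras initializes and stores its weights differently, so this shows only the arithmetic, not the library's internals.

```python
def count_dense_params(n_inputs, n_units):
    """A fully connected layer has one weight per input per unit, plus one bias per unit."""
    return n_inputs * n_units + n_units

# 8 predictors -> 10 hidden units -> 1 output, as in regression_model().
hidden_params = count_dense_params(8, 10)
output_params = count_dense_params(10, 1)
total_params = hidden_params + output_params
print(hidden_params, output_params, total_params)  # 90 11 101

def relu(x):
    """Rectified linear unit: negative values become zero."""
    return max(0.0, x)

def dense(inputs, weights, biases, activation=None):
    """Compute one dense layer; weights[j] holds the input weights of unit j."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs

# Toy example with 2 inputs and 2 hidden units (made-up numbers).
sample = [1.0, -2.0]
hidden = dense(sample, [[0.5, -0.25], [1.0, 1.0]], [0.0, 0.5], activation=relu)
prediction = dense(hidden, [[2.0, 3.0]], [0.5])
print(hidden, prediction)  # [1.0, 0.0] [2.5]
```

The output layer has no activation, which is the usual choice for regression because the prediction should be free to take any real value.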
<b>Splitting the data into training and test sets by holding out 30% of the data for testing</b>

In [84]:
predictors_train, predictors_test, target_train, target_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=42)

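The effect of `test_size=0.3` on 1030 rows can be sketched without scikit-learn. The helper `shuffle_split_indices` below is a hypothetical stand-in: it assumes the test count is the ceiling of `test_size * n_samples` with the remainder used for training, and it will not pick the same rows as `train_test_split` because the shuffling algorithm differs.

```python
import math
import random

def shuffle_split_indices(n_samples, test_size, seed):
    """Shuffle row indices with a fixed seed and cut them into train and test parts."""
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = shuffle_split_indices(1030, 0.3, seed=42)
print(len(train_idx), len(test_idx))  # 721 309
```

Fixing the seed, as `random_state=42` does in the notebook, makes the split repeatable between runs.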
Let's call the function now to create our model.

In [85]:
# build the model
model = regression_model()

Next, we will train the model for 100 epochs.

In [86]:
# fit the model
epochs = 100
model.fit(predictors_train, target_train, epochs=epochs, verbose=1)

Epoch 1/100
...
Epoch 78

<keras.callbacks.callbacks.History at 0x289ca7c77f0>

Next, we need to evaluate the model on the test data.

In [87]:
loss_val = model.evaluate(predictors_test, target_test)
target_pred = model.predict(predictors_test)
print("Loss value from test data : {}".format(loss_val))

Loss value from test data : 153.32810816101272


Next, we need to compute the mean squared error between the predicted concrete strength and the actual concrete strength.

Let's import the mean_squared_error function from Scikit-learn.

In [88]:
from sklearn.metrics import mean_squared_error

In [89]:
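# mean_squared_error returns a single scalar for this one train/test split
mse = mean_squared_error(target_test, target_pred)
# with only one MSE value, the mean equals that value and the standard deviation is 0
mean = np.mean(mse)
std = np.std(mse)
print("Mean : {}, Standard Deviation : {}".format(mean, std))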

Mean : 153.3281121907033, Standard Deviation : 0.0

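For reference, the metric itself is simple to compute by hand. The sketch below uses a hypothetical helper `mse_by_hand` and made-up toy values, then takes the square root of the test MSE reported above to express the error in the same units as the target (MPa). It also illustrates why the standard deviation printed above is 0.0: a single value has no spread.

```python
import math
import statistics

def mse_by_hand(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)

# Toy values (hypothetical): the squared errors are 1, 0, 4, 1.
toy_mse = mse_by_hand([3.0, 5.0, 2.0, 7.0], [2.0, 5.0, 4.0, 6.0])
print(toy_mse)  # 1.5

# Root mean squared error for the test MSE reported above, in MPa.
test_mse = 153.3281121907033
rmse = math.sqrt(test_mse)
print(round(rmse, 2))  # 12.38

# The population standard deviation of one value is zero.
single_value_std = statistics.pstdev([test_mse])
print(single_value_std)  # 0.0
```

An RMSE of roughly 12 MPa can be read against the standard deviation of Strength in the summary table (about 16.7 MPa) to get a rough sense of how much of the variation this single run captures.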

Create a list of 50 mean squared errors and report the mean and the standard deviation of the mean squared errors.

In [90]:
no_of_mses = 50
epochs = 100
mse_list = []
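for i in range(0, no_of_mses):
    # draw a new 70/30 split on each iteration, seeded by the loop index
    predictors_train, predictors_test, target_train, target_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=i)
    # NOTE: `model` is the instance trained above and is not rebuilt here, so each
    # iteration continues training from the previous weights; call regression_model()
    # inside the loop to train every run from scratch
    model.fit(predictors_train, target_train, epochs=epochs, verbose=0)
    loss_val = model.evaluate(predictors_test, target_test, verbose=0)
    print("{}. Mean squared error {} ".format(i+1, loss_val))
    # recompute the test MSE with scikit-learn and store it
    target_pred = model.predict(predictors_test)
    mse = mean_squared_error(target_test, target_pred)
    mse_list.append(mse)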

mse_array = np.array(mse_list)
mean = np.mean(mse_array)
std = np.std(mse_array)

print('\n*************************************************\n')
print("Mean and Standard deviation of {} mean squared errors with normalized data. Total number of epochs for each training is : {} ".format(no_of_mses, epochs))
print('\n *************************************************\n')
print("Mean : {} and Standard Deviation : {} ".format(mean,std))
print('\n=================================================\n')


1. Mean squared error 100.4554979515693 
2. Mean squared error 87.22405220698384 
3. Mean squared error 58.71979922692753 
4. Mean squared error 55.85762780692585 
5. Mean squared error 52.14265061659334 
6. Mean squared error 53.08115111662732 
7. Mean squared error 53.77659606933594 
8. Mean squared error 37.864895854567244 
9. Mean squared error 42.18528288313486 
10. Mean squared error 41.91870140643567 
11. Mean squared error 40.05650178977201 
12. Mean squared error 36.950105327618544 
13. Mean squared error 42.72513770439864 
14. Mean squared error 46.36302796150874 
15. Mean squared error 38.85120532582107 
16. Mean squared error 33.23610239738785 
17. Mean squared error 40.18738254682918 
18. Mean squared error 37.22529705134024 
19. Mean squared error 37.54354211504791 
20. Mean squared error 39.39298466802801 
21. Mean squared error 34.04186973448324 
22. Mean squared error 36.31869930896944 
23. Mean squared error 31.377133563884254 
24. Mean squared error 34.74372169963750