<h1 align=center><font size = 5>Regression Model with Keras</font></h1>

Let's start by importing the pandas and NumPy libraries.

In [2]:
import pandas as pd
import numpy as np

The predictors in the concrete strength data are:

1. Cement

2. Blast Furnace Slag

3. Fly Ash

4. Water

5. Superplasticizer

6. Coarse Aggregate

7. Fine Aggregate

8. Age

In [3]:
concrete_data = pd.read_csv('https://cocl.us/concrete_data')
concrete_data.head()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


#### Let's check how many data points we have.


In [4]:
concrete_data.shape

(1030, 9)

Next, check for missing values. Every count below is zero, so the data is clean and no imputation is needed.

In [5]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

In [6]:
concrete_data.describe()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.


In [7]:
concrete_data_columns = concrete_data.columns

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column

In [8]:
predictors.head()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360


In [9]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

In [10]:
n_cols = predictors.shape[1] # number of predictors

#### Let's import the Keras library and the other packages we will need to build our regression model.


In [11]:
import keras
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

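Let's define our regression model: one hidden layer of 10 ReLU units and a single linear output, compiled with the Adam optimizer and a mean squared error loss.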

In [12]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
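
To make the architecture concrete, the following sketch reproduces in plain NumPy what this network computes for a batch of inputs: one ReLU hidden layer of 10 units followed by a single linear output. The weights are random placeholders and the helper name `forward` is hypothetical; Keras learns the real weights during training.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    """Forward pass of a Dense(10, relu) -> Dense(1) network."""
    hidden = np.maximum(0.0, X @ W1 + b1)  # ReLU hidden layer
    return hidden @ W2 + b2                # linear output, one value per row

rng = np.random.default_rng(0)
n_cols = 8                                  # number of predictors in this notebook
X = rng.normal(size=(5, n_cols))            # 5 made-up samples
W1, b1 = rng.normal(size=(n_cols, 10)), np.zeros(10)  # placeholder weights
W2, b2 = rng.normal(size=(10, 1)), np.zeros(1)

y_hat = forward(X, W1, b1, W2, b2)
print(y_hat.shape)  # (5, 1)
```

The output layer has no activation, which is the usual choice for regression because the prediction is not confined to a fixed range.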

Randomly split the data into training (70%) and test (30%) sets.

In [13]:
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3,random_state=42)
X_train.shape

(721, 8)
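
The shape above follows from how `train_test_split` sizes the two sets, and the same arithmetic explains the `23/23` step count in the training logs that follow. A minimal sketch, assuming scikit-learn rounds the test set size up and that `model.fit` falls back to its default batch size of 32 (no `batch_size` is passed in this notebook):

```python
import math

n_samples = 1030      # rows in concrete_data
test_size = 0.3       # fraction passed to train_test_split
batch_size = 32       # assumed Keras default when batch_size is not given

n_test = math.ceil(test_size * n_samples)          # test rows, rounded up
n_train = n_samples - n_test                       # remaining rows go to training
steps_per_epoch = math.ceil(n_train / batch_size)  # batches per epoch

print(n_test, n_train, steps_per_epoch)
```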

Let's build the model, train it for 10 epochs, and predict on the test set.

In [14]:
# build the model
model = regression_model()
# fit the model
model.fit(X_train, y_train,epochs=10, validation_data=(X_test, y_test) , verbose=2)
# evaluate the model
y_pred = model.predict(X_test)

Epoch 1/10
23/23 - 0s - loss: 433223.7188 - val_loss: 275509.5312
Epoch 2/10
23/23 - 0s - loss: 191546.0781 - val_loss: 106806.5469
Epoch 3/10
23/23 - 0s - loss: 69622.8984 - val_loss: 33559.1328
Epoch 4/10
23/23 - 0s - loss: 21455.6797 - val_loss: 9591.6436
Epoch 5/10
23/23 - 0s - loss: 7072.4458 - val_loss: 4147.3994
Epoch 6/10
23/23 - 0s - loss: 4022.1714 - val_loss: 3446.7778
Epoch 7/10
23/23 - 0s - loss: 3591.7012 - val_loss: 3390.5459
Epoch 8/10
23/23 - 0s - loss: 3507.9109 - val_loss: 3340.2209
Epoch 9/10
23/23 - 0s - loss: 3434.9424 - val_loss: 3272.1716
Epoch 10/10
23/23 - 0s - loss: 3370.8130 - val_loss: 3193.2896


Let's print the root mean squared error (RMSE) on the test set.

In [15]:
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("Root Mean Squared Error: {}".format(rmse))

Root Mean Squared Error: 56.509199652875395
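
Because the model is compiled with `loss='mean_squared_error'` and the test set is passed as `validation_data`, the `val_loss` reported for the last epoch is the mean squared error on that same test set, so the RMSE above should be its square root. A quick check using the value from the log:

```python
import math

final_val_loss = 3193.2896        # val_loss at epoch 10/10 in the log above
rmse_from_log = math.sqrt(final_val_loss)

# Should agree with the printed RMSE up to the rounding of the logged loss
print(round(rmse_from_log, 4))
```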


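A. Repeat the above process 50 times, now training for 50 epochs per run, and store the root mean squared error of each run in a list. Because `random_state=42` is fixed, every run uses the same train/test split, so the spread across runs reflects only the random weight initialization and batch shuffling.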

In [30]:
rmse_list =  []

for i in range(0,50):
    print("Iteration : "+str(i))
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3,random_state=42)
    # build the model
    model = regression_model()
    # fit the model
    model.fit(X_train, y_train,epochs=50, validation_data=(X_test, y_test) , verbose=2)
    # evaluate the model
    y_pred = model.predict(X_test)
    rmse_list.append(np.sqrt(mean_squared_error(y_test, y_pred))) 


Epoch 26/50
23/23 - 0s - loss: 192.7719 - val_loss: 156.4818
Epoch 27/50
23/23 - 0s - loss: 191.8211 - val_loss: 156.5783
Epoch 28/50
23/23 - 0s - loss: 188.1168 - val_loss: 155.3496
Epoch 29/50
23/23 - 0s - loss: 186.9978 - val_loss: 152.6521
Epoch 30/50
23/23 - 0s - loss: 186.1342 - val_loss: 150.8150
Epoch 31/50
23/23 - 0s - loss: 184.8246 - val_loss: 154.5755
Epoch 32/50
23/23 - 0s - loss: 182.7115 - val_loss: 148.9712
Epoch 33/50
23/23 - 0s - loss: 178.9999 - val_loss: 148.3594
Epoch 34/50
23/23 - 0s - loss: 177.9026 - val_loss: 148.2332
Epoch 35/50
23/23 - 0s - loss: 175.2612 - val_loss: 147.1378
Epoch 36/50
23/23 - 0s - loss: 173.8643 - val_loss: 147.7946
Epoch 37/50
23/23 - 0s - loss: 172.2210 - val_loss: 144.5125
Epoch 38/50
23/23 - 0s - loss: 170.1053 - val_loss: 144.3433
Epoch 39/50
23/23 - 0s - loss: 168.9363 - val_loss: 144.4762
Epoch 40/50
23/23 - 0s - loss: 167.5355 - val_loss: 141.9494
Epoch 41/50
23/23 - 0s - loss: 165.8962 - val_loss: 141.1074
Epoch 42/50
[output truncated]

In [31]:
rmse_mean = np.mean(rmse_list)
rmse_std = np.std(rmse_list)
print("Mean : {} & Standard Deviation: {}".format(rmse_mean,rmse_std))

Mean : 15.37417347209806 & Standard Deviation: 5.046770673194134
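
One detail worth keeping in mind when reading these numbers: `np.std` computes the population standard deviation (`ddof=0`) by default, whereas pandas' `.std()` uses the sample version (`ddof=1`). The sketch below uses the standard library and a made-up list of RMSE values to show the difference; the list is illustrative only and is not taken from the runs above.

```python
import statistics

rmse_values = [14.0, 16.0, 12.0, 18.0]      # hypothetical RMSEs, for illustration

mean = statistics.mean(rmse_values)
pop_std = statistics.pstdev(rmse_values)     # same convention as np.std(x)
sample_std = statistics.stdev(rmse_values)   # same convention as np.std(x, ddof=1)

print(mean, pop_std, sample_std)
```

With 50 runs the two conventions differ by only about 1%, so the choice does not change the conclusions here.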


B. Normalize the predictors by subtracting each column's mean and dividing by its standard deviation, then repeat the same process with the normalized predictors.


In [18]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,0.862735,-1.217079,-0.279597
1,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,1.055651,-1.217079,-0.279597
2,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,3.55134
3,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,5.055221
4,-0.790075,0.678079,-0.846733,0.488555,-1.038638,0.070492,0.647569,4.976069
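
As a sanity check, the first normalized Cement value can be reproduced from the `describe()` output earlier, since pandas' `.std()` is the sample standard deviation listed there. Note also that these statistics are computed over all rows, including those that later land in the test set. A common alternative, shown here only as a sketch with made-up numbers and a hypothetical helper `standardize_with_train_stats`, is to compute the mean and standard deviation on the training rows alone.

```python
import numpy as np

# Reproduce the first Cement entry from the summary statistics.
cement_mean, cement_std = 281.167864, 104.506364   # from describe()
z = (540.0 - cement_mean) / cement_std
print(round(z, 4))

def standardize_with_train_stats(X_train, X_test):
    """Scale both sets using the mean and sample std of the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)   # ddof=1 matches pandas' default
    return (X_train - mean) / std, (X_test - mean) / std

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])  # made-up data
X_test = np.array([[4.0, 40.0]])
X_train_n, X_test_n = standardize_with_train_stats(X_train, X_test)
print(X_test_n)
```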


In [32]:
rmse_list_b =  []

for i in range(0,50):
    print("Iteration : "+str(i))
    # Get test and training data from new predictor normalized values
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3,random_state=42)
    # build the model
    model = regression_model()
    # fit the model
    model.fit(X_train, y_train,epochs=50, validation_data=(X_test, y_test) , verbose=2)
    # evaluate the model
    y_pred = model.predict(X_test)
    rmse_list_b.append(np.sqrt(mean_squared_error(y_test, y_pred))) 


[earlier output truncated]
Epoch 28/50
23/23 - 0s - loss: 791.4631 - val_loss: 761.7501
Epoch 29/50
23/23 - 0s - loss: 757.5076 - val_loss: 730.7083
Epoch 30/50
23/23 - 0s - loss: 724.4641 - val_loss: 699.4601
Epoch 31/50
23/23 - 0s - loss: 691.5662 - val_loss: 669.6746
Epoch 32/50
23/23 - 0s - loss: 660.2375 - val_loss: 640.3816
Epoch 33/50
23/23 - 0s - loss: 630.0383 - val_loss: 611.8934
Epoch 34/50
23/23 - 0s - loss: 600.4311 - val_loss: 585.5704
Epoch 35/50
23/23 - 0s - loss: 573.0035 - val_loss: 559.0477
Epoch 36/50
23/23 - 0s - loss: 546.3494 - val_loss: 534.1003
Epoch 37/50
23/23 - 0s - loss: 521.1490 - val_loss: 509.8691
Epoch 38/50
23/23 - 0s - loss: 497.0578 - val_loss: 487.7932
Epoch 39/50
23/23 - 0s - loss: 474.9696 - val_loss: 466.1114
Epoch 40/50
23/23 - 0s - loss: 453.6747 - val_loss: 445.8498
Epoch 41/50
23/23 - 0s - loss: 433.8264 - val_loss: 426.9724
Epoch 42/50
23/23 - 0s - loss: 415.6124 - val_loss: 408.7473
Epoch 43/50
23/23 - 0s - loss: 398.1325 - val_loss: 392.3541
Epoch 44/50
[output truncated]

In [33]:
rmse_mean_b = np.mean(rmse_list_b)
rmse_std_b = np.std(rmse_list_b)
print("(Normalized) Mean: {} & Standard Deviation: {}".format(rmse_mean_b,rmse_std_b))

(Normalized) Mean: 18.444794671078675 & Standard Deviation: 2.5573457286630488


C. Let's increase the number of epochs to 100, still using the normalized predictors.

In [21]:
rmse_list_c =  []

for i in range(0,50):
    print("Iteration : "+str(i))
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3,random_state=42)
    # build the model
    model = regression_model()
    # fit the model
    model.fit(X_train, y_train,epochs=100, validation_data=(X_test, y_test) , verbose=2)
    # evaluate the model
    y_pred = model.predict(X_test)
    rmse_list_c.append(np.sqrt(mean_squared_error(y_test, y_pred))) 


Epoch 81/100
23/23 - 0s - loss: 176.0107 - val_loss: 164.4622
Epoch 82/100
23/23 - 0s - loss: 174.6428 - val_loss: 163.0094
Epoch 83/100
23/23 - 0s - loss: 173.1603 - val_loss: 161.8493
Epoch 84/100
23/23 - 0s - loss: 171.8309 - val_loss: 160.3820
Epoch 85/100
23/23 - 0s - loss: 170.5053 - val_loss: 159.0677
Epoch 86/100
23/23 - 0s - loss: 169.2324 - val_loss: 157.7787
Epoch 87/100
23/23 - 0s - loss: 167.9895 - val_loss: 156.7231
Epoch 88/100
23/23 - 0s - loss: 166.8070 - val_loss: 155.4565
Epoch 89/100
23/23 - 0s - loss: 165.5728 - val_loss: 154.4194
Epoch 90/100
23/23 - 0s - loss: 164.5314 - val_loss: 153.4077
Epoch 91/100
23/23 - 0s - loss: 163.3354 - val_loss: 152.3983
Epoch 92/100
23/23 - 0s - loss: 162.1712 - val_loss: 151.3975
Epoch 93/100
23/23 - 0s - loss: 161.1549 - val_loss: 150.3482
Epoch 94/100
23/23 - 0s - loss: 159.9953 - val_loss: 149.4117
Epoch 95/100
23/23 - 0s - loss: 159.0034 - val_loss: 148.6315
Epoch 96/100
23/23 - 0s - loss: 158.0686 - val_loss: 147.4882
Epoch 97/100

In [22]:
rmse_mean_c = np.mean(rmse_list_c)
rmse_std_c = np.std(rmse_list_c)
print("(Normalized + 100 Epochs) Mean: {} & Standard Deviation: {}".format(rmse_mean_c,rmse_std_c))

(Normalized + 100 Epochs) Mean: 12.447875884499453 & Standard Deviation: 0.4639573994718731


D. New model with three hidden layers, each with 10 nodes and the ReLU activation function, trained for 100 epochs on the normalized predictors.

In [25]:
# define regression model
def regression_model_2():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
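
The deeper network has roughly three times as many trainable parameters as the single-hidden-layer model. A small sketch of the count, assuming 8 input columns (the value of `n_cols` in this notebook) and using a hypothetical helper `count_dense_params`; each Dense layer contributes `inputs * units` weights plus `units` biases:

```python
def count_dense_params(n_inputs, layer_units):
    """Total weights and biases in a stack of fully connected layers."""
    total = 0
    for units in layer_units:
        total += n_inputs * units + units  # weight matrix plus bias vector
        n_inputs = units                   # this layer's output feeds the next
    return total

n_cols = 8
params_one_hidden = count_dense_params(n_cols, [10, 1])
params_three_hidden = count_dense_params(n_cols, [10, 10, 10, 1])
print(params_one_hidden, params_three_hidden)  # 101 321
```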

In [26]:
rmse_list_d =  []

for i in range(0,50):
    print("Iteration : "+str(i))
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3,random_state=42)
    # build the new model
    model = regression_model_2()
    # fit the model
    model.fit(X_train, y_train,epochs=100, validation_data=(X_test, y_test) , verbose=2)
    # evaluate the model against test data
    y_pred = model.predict(X_test)
    rmse_list_d.append(np.sqrt(mean_squared_error(y_test, y_pred))) 

[earlier output truncated]
Epoch 78/100
23/23 - 0s - loss: 96.9838 - val_loss: 102.9267
Epoch 79/100
23/23 - 0s - loss: 94.5743 - val_loss: 102.6439
Epoch 80/100
23/23 - 0s - loss: 93.1012 - val_loss: 100.5902
Epoch 81/100
23/23 - 0s - loss: 91.9971 - val_loss: 99.1869
Epoch 82/100
23/23 - 0s - loss: 90.5820 - val_loss: 98.4939
Epoch 83/100
23/23 - 0s - loss: 89.3547 - val_loss: 96.9177
Epoch 84/100
23/23 - 0s - loss: 88.2677 - val_loss: 96.0509
Epoch 85/100
23/23 - 0s - loss: 86.8790 - val_loss: 95.4020
Epoch 86/100
23/23 - 0s - loss: 85.5234 - val_loss: 93.7451
Epoch 87/100
23/23 - 0s - loss: 84.7618 - val_loss: 92.3971
Epoch 88/100
23/23 - 0s - loss: 83.0052 - val_loss: 91.2941
Epoch 89/100
23/23 - 0s - loss: 81.6826 - val_loss: 89.8130
Epoch 90/100
23/23 - 0s - loss: 80.3926 - val_loss: 88.5663
Epoch 91/100
23/23 - 0s - loss: 79.2539 - val_loss: 87.0813
Epoch 92/100
23/23 - 0s - loss: 78.3488 - val_loss: 86.5066
Epoch 93/100
23/23 - 0s - loss: 76.8622 - val_loss: 85.0911
Epoch 94/100

In [27]:
rmse_mean_d = np.mean(rmse_list_d)
rmse_std_d = np.std(rmse_list_d)
print("(Normalized + 100 Epochs+ 3 hidden layers) Mean: {} & Standard Deviation: {}".format(rmse_mean_d,rmse_std_d))

(Normalized + 100 Epochs+ 3 hidden layers) Mean: 9.472857043752427 & Standard Deviation: 1.2336552251749198


Let's summarize our findings.

In [35]:
list_actions = ['Initial','Normalized','Normalized + 100 Epochs','Normalized + 100 Epochs + 3 hidden layers']
list_mean = [rmse_mean,rmse_mean_b,rmse_mean_c,rmse_mean_d]
list_std = [rmse_std,rmse_std_b,rmse_std_c,rmse_std_d]

df = pd.DataFrame(list_actions, index=['Part A','Part B','Part C','Part D'])
df.columns = ['Description']
# The lists hold root mean squared errors, so label the column accordingly
df.insert(loc=1, column='Mean RMSE', value=list_mean)
df.insert(loc=2, column='Standard Deviation', value=list_std)
df

,Description,Mean RMSE,Standard Deviation
Part A,Initial,15.374173,5.046771
Part B,Normalized,18.444795,2.557346
Part C,Normalized + 100 Epochs,12.447876,0.463957
Part D,Normalized + 100 Epochs + 3 hidden layers,9.472857,1.233655


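The mean root mean squared error (RMSE) did not fall at every step. Normalizing the predictors alone (Part B) raised it from 15.37 to 18.44, although it roughly halved the standard deviation across runs. Training for 100 epochs (Part C) lowered the mean RMSE to 12.45 and the standard deviation to 0.46. Adding two more hidden layers (Part D) gave the lowest mean RMSE, 9.47, with a somewhat higher standard deviation of 1.23.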
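
To put the table in relative terms, the sketch below computes the percentage change in the mean error of each variant against Part A, using the rounded values reported above. The variable names are illustrative.

```python
# Mean errors per experiment, copied from the summary table
mean_rmse = {
    "Part A": 15.374173,
    "Part B": 18.444795,
    "Part C": 12.447876,
    "Part D": 9.472857,
}

baseline = mean_rmse["Part A"]
pct_change = {
    part: 100.0 * (value - baseline) / baseline
    for part, value in mean_rmse.items()
}

# Negative values mean a lower error than the baseline
for part, change in pct_change.items():
    print(f"{part}: {change:+.1f}%")
```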