Regression Model in Keras

In [18]:
import pandas as pd
import numpy as np

Get the dataset

In [19]:
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')

Prepare the data

In [20]:
concrete_data_columns = concrete_data.columns

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] 
target = concrete_data['Strength']
n_cols = predictors.shape[1]
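The boolean column filter above keeps every column except `Strength`. An equivalent, arguably more readable form uses `DataFrame.drop`. In the sketch below, the tiny `toy` frame and its values are hypothetical stand-ins for `concrete_data`, so the snippet runs without downloading the CSV.

```python
import pandas as pd

# Hypothetical stand-in for concrete_data: two predictor columns plus the target.
toy = pd.DataFrame({
    "Cement": [1.0, 2.0],
    "Water": [3.0, 4.0],
    "Strength": [5.0, 6.0],
})

# Every column except the target becomes a predictor.
predictors = toy.drop(columns="Strength")
target = toy["Strength"]
n_cols = predictors.shape[1]  # number of input features for the network

print(list(predictors.columns), n_cols)
```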

Load the Keras library

In [21]:
import keras 
from keras.models import Sequential
from keras.layers import Dense

A. Build a baseline model

In [22]:
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
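For intuition about what `baseline_model()` computes, here is a NumPy sketch of the forward pass of a `Dense(10, activation='relu')` layer followed by a linear `Dense(1)` output. The random weights are placeholders for trained ones, and `n_cols = 8` is an assumption matching the eight predictor columns shown in Part B.

```python
import numpy as np

def relu(z):
    # Rectified linear unit: max(0, z), applied elementwise
    return np.maximum(0.0, z)

def baseline_forward(x, w1, b1, w2, b2):
    """Forward pass of a Dense(10, relu) -> Dense(1) network.

    x:  (n_samples, n_cols) input batch
    w1: (n_cols, 10), b1: (10,)  hidden layer weights and biases
    w2: (10, 1),      b2: (1,)   linear output layer
    """
    hidden = relu(x @ w1 + b1)  # (n_samples, 10)
    return hidden @ w2 + b2     # (n_samples, 1): one strength estimate per row

rng = np.random.default_rng(0)
n_cols = 8  # assumed number of predictor columns
x = rng.normal(size=(5, n_cols))
w1, b1 = rng.normal(size=(n_cols, 10)), np.zeros(10)
w2, b2 = rng.normal(size=(10, 1)), np.zeros(1)

y_hat = baseline_forward(x, w1, b1, w2, b2)
print(y_hat.shape)  # (5, 1)
```

Note the `(n_samples, 1)` output shape: Keras returns predictions in the same column shape, which matters when computing errors by hand.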

In [23]:
# build the model
model = baseline_model()


In [24]:
# Import train_test_split
from sklearn.model_selection import train_test_split

1. Randomly split the data into training and test sets, holding out 30% of the data for testing. You can use the train_test_split helper function from scikit-learn.

In [25]:
x_train, x_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3)
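To make "holding out 30% of the data" concrete, the sketch below shuffles row indices and reserves 30% of them for testing. The helper `shuffle_split` is hypothetical and is a hand-rolled illustration of the idea, not scikit-learn's implementation; passing a fixed `seed` plays the same role as `random_state` in `train_test_split`, making the split reproducible.

```python
import numpy as np

def shuffle_split(n_samples, test_size=0.3, seed=None):
    """Return (train_idx, test_idx) index arrays for a random hold-out split."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)             # shuffled row indices
    n_test = int(round(test_size * n_samples))    # rows reserved for testing
    return perm[n_test:], perm[:n_test]

train_idx, test_idx = shuffle_split(100, test_size=0.3, seed=63)
print(len(train_idx), len(test_idx))  # 70 30
```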

2. Train the model on the training data using 50 epochs.

In [26]:
model.fit(x_train, y_train, epochs=50, verbose=2)

Epoch 1/50
 - 0s - loss: 73730.8483
Epoch 2/50
 - 0s - loss: 14946.8661
Epoch 3/50
 - 0s - loss: 5498.2245
Epoch 4/50
 - 0s - loss: 4548.9001
Epoch 5/50
 - 0s - loss: 3559.6980
Epoch 6/50
 - 0s - loss: 2874.8914
Epoch 7/50
 - 0s - loss: 2373.2158
Epoch 8/50
 - 0s - loss: 1952.8746
Epoch 9/50
 - 0s - loss: 1626.7593
Epoch 10/50
 - 0s - loss: 1359.8177
Epoch 11/50
 - 0s - loss: 1139.5588
Epoch 12/50
 - 0s - loss: 965.1954
Epoch 13/50
 - 0s - loss: 832.9714
Epoch 14/50
 - 0s - loss: 725.7188
Epoch 15/50
 - 0s - loss: 646.4630
Epoch 16/50
 - 0s - loss: 572.8200
Epoch 17/50
 - 0s - loss: 517.3771
Epoch 18/50
 - 0s - loss: 473.7443
Epoch 19/50
 - 0s - loss: 434.4817
Epoch 20/50
 - 0s - loss: 403.5192
Epoch 21/50
 - 0s - loss: 380.4286
Epoch 22/50
 - 0s - loss: 356.4296
Epoch 23/50
 - 0s - loss: 340.2010
Epoch 24/50
 - 0s - loss: 326.1402
Epoch 25/50
 - 0s - loss: 316.2545
Epoch 26/50
 - 0s - loss: 305.8511
Epoch 27/50
 - 0s - loss: 297.8380
Epoch 28/50
 - 0s - loss: 298.3458
Epoch 29/50
 - 0

<keras.callbacks.History at 0x7f01e0216210>

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength.

In [27]:
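from sklearn.metrics import mean_squared_error

# Predict on the held-out rows and compare against the true strengths
y_pred = model.predict(x_test)
mean_squared_error(y_test, y_pred)  # argument order is (y_true, y_pred)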

221.9238548205726
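The mean squared error is the average of the squared differences between true and predicted values. scikit-learn's `mean_squared_error` expects `(y_true, y_pred)`; because the squared difference is symmetric, swapping the arguments does not change the value. The NumPy sketch below (hypothetical helper `mse`, illustrative values) also shows a pitfall when computing it by hand: `model.predict` returns a column of shape `(n, 1)`, and subtracting that from a 1-D target silently broadcasts to an `(n, n)` matrix unless the predictions are flattened first.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of (y_true - y_pred)**2 over all samples."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()  # flatten (n, 1) -> (n,)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same number of elements")
    return float(np.mean((y_true - y_pred) ** 2))

y_test = np.array([30.0, 40.0, 50.0])
y_pred = np.array([[32.0], [37.0], [50.0]])  # column shape, like model.predict output

print(mse(y_test, y_pred))       # (4 + 9 + 0) / 3
# Without flattening, broadcasting builds a 3x3 matrix of differences:
print((y_test - y_pred).shape)   # (3, 3)
```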

4. Repeat steps 1 to 3 fifty times, i.e., create a list of 50 mean squared errors.

In [28]:
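mse = []

for i in range(50):
    # A new seeded 70/30 split per repeat (random_state 63 to 112)
    x_train, x_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=i + 63)
    # Note: `model` is not rebuilt here, so each fit continues from the previous weights
    model.fit(x_train, y_train, epochs=50, verbose=0)
    y_pred = model.predict(x_test)
    mse.append(mean_squared_error(y_test, y_pred))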
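In the loop above, `model` is created once before the loop, so every call to `fit` starts from the weights left by the previous repeat, and rows used for testing in one repeat may already have been trained on in an earlier one. A repeated hold-out evaluation usually fits a fresh model on each split. The sketch below shows that pattern; to keep it runnable without Keras, an ordinary least-squares fit (hypothetical helpers `fit_linear` and `predict_linear`) stands in for `baseline_model()` plus `model.fit`, and the data are synthetic, not the concrete dataset. In the Keras version, the equivalent step would be calling `baseline_model()` inside the loop.

```python
import numpy as np

def fit_linear(x, y):
    """Hypothetical stand-in for building and fitting a model: least squares."""
    x1 = np.column_stack([x, np.ones(len(x))])  # add an intercept column
    coef, *_ = np.linalg.lstsq(x1, y, rcond=None)
    return coef

def predict_linear(coef, x):
    return np.column_stack([x, np.ones(len(x))]) @ coef

def repeated_holdout_mse(x, y, n_repeats=50, test_size=0.3, seed0=63):
    """Score a freshly fitted model on n_repeats seeded random splits."""
    n = len(x)
    n_test = int(round(test_size * n))
    scores = []
    for i in range(n_repeats):
        rng = np.random.default_rng(seed0 + i)  # one reproducible split per repeat
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        coef = fit_linear(x[train], y[train])   # new model every repeat: no carried-over state
        err = y[test] - predict_linear(coef, x[test])
        scores.append(float(np.mean(err ** 2)))
    return scores

# Synthetic data: y = 2*x0 - x1 + Gaussian noise with standard deviation 0.5
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = 2 * x[:, 0] - x[:, 1] + rng.normal(scale=0.5, size=200)

scores = repeated_holdout_mse(x, y)
print(len(scores), np.mean(scores), np.std(scores))
```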

5. Report the mean and the standard deviation of the mean squared errors.

In [29]:
print('Mean of the mean squared errors: '+str(np.mean(mse)))
print('Standard deviation of the mean squared errors: '+str(np.std(mse)))

Mean of the mean squared errors: 55.196495149221455
Standard deviation of the mean squared errors: 16.52724811142092
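`np.std` computes the population standard deviation (`ddof=0`) by default, whereas pandas `Series.std` and `DataFrame.std`, used for normalization in Part B, default to the sample standard deviation (`ddof=1`). With 50 repeats the two differ only by a factor of sqrt(50/49), about 1.01, but the difference matters when comparing numbers across libraries. The values below are illustrative, not the notebook's results.

```python
import numpy as np
import pandas as pd

scores = [1.0, 2.0, 3.0, 4.0]  # illustrative values

print(np.mean(scores))          # 2.5
print(np.std(scores))           # population std (ddof=0): sqrt(1.25)
print(np.std(scores, ddof=1))   # sample std (ddof=1): sqrt(5/3)
print(pd.Series(scores).std())  # pandas defaults to ddof=1: sqrt(5/3)
```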


B. Normalize the data

In [30]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

      Cement  Blast Furnace Slag    Fly Ash      Water  Superplasticizer  Coarse Aggregate  Fine Aggregate        Age
0   2.476712           -0.856472  -0.846733  -0.916319         -0.620147          0.862735       -1.217079  -0.279597
1   2.476712           -0.856472  -0.846733  -0.916319         -0.620147          1.055651       -1.217079  -0.279597
2   0.491187            0.795140  -0.846733   2.174405         -1.038638         -0.526262       -2.239829   3.551340
3   0.491187            0.795140  -0.846733   2.174405         -1.038638         -0.526262       -2.239829   5.055221
4  -0.790075            0.678079  -0.846733   0.488555         -1.038638          0.070492        0.647569   4.976069
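The cell above standardizes each predictor as z = (x - mean) / std using statistics from the whole dataset, so the test rows also influence the scaling. A common refinement, sketched here as an alternative rather than what this notebook does, computes the mean and standard deviation on the training split only and reuses them for the test split. The helper `standardize` and the toy frames are hypothetical stand-ins for `x_train` and `x_test`.

```python
import numpy as np
import pandas as pd

def standardize(train, test):
    """Z-score both frames using the training frame's column statistics."""
    mu = train.mean()
    sigma = train.std()  # pandas default ddof=1, as in the cell above
    return (train - mu) / sigma, (test - mu) / sigma

# Hypothetical toy frames standing in for x_train / x_test
train = pd.DataFrame({"Cement": [1.0, 2.0, 3.0], "Age": [10.0, 20.0, 60.0]})
test = pd.DataFrame({"Cement": [4.0], "Age": [30.0]})

train_z, test_z = standardize(train, test)
print(train_z.round(3))
print(test_z.round(3))
```

After this transform the training columns have mean 0 and standard deviation 1, while the test columns are only approximately centered, which is the expected behavior when the test set is treated as unseen data.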


In [31]:
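mse_norm = []

for i in range(50):
    # Same seeds (63 to 112) as above, now on the normalized predictors
    x_train_norm, x_test_norm, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=i + 63)
    # `model` is still the object trained in Part A, so training continues from those weights
    model.fit(x_train_norm, y_train, epochs=50, verbose=0)
    y_pred = model.predict(x_test_norm)
    mse_norm.append(mean_squared_error(y_test, y_pred))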

print('Mean of the mean squared errors (normalized data): '+str(np.mean(mse_norm)))
print('Standard deviation of the mean squared errors (normalized data): '+str(np.std(mse_norm)))

Mean of the mean squared errors (normalized data): 51.46225863628196
Standard deviation of the mean squared errors (normalized data): 50.35773656566548


Conclusion: with the normalized predictors, the mean of the mean squared errors decreases (from about 55.20 to 51.46) compared with the model trained on the raw predictors, but the standard deviation increases (from about 16.53 to 50.36).

C. Increase the number of epochs

In [32]:
mse_100epcs = []

for i in range(50):
    x_train_norm, x_test_norm, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=i + 63)
    # `model` keeps its weights from Parts A and B; each split now trains for 100 epochs
    model.fit(x_train_norm, y_train, epochs=100, verbose=0)
    y_pred = model.predict(x_test_norm)
    mse_100epcs.append(mean_squared_error(y_test, y_pred))

print('Mean of the mean squared errors (100 epochs): '+str(np.mean(mse_100epcs)))
print('Standard deviation of the mean squared errors (100 epochs): '+str(np.std(mse_100epcs)))

Mean of the mean squared errors (100 epochs): 27.571666599060414
Standard deviation of the mean squared errors (100 epochs): 2.1582590706067926


With 100 epochs, the mean of the mean squared errors drops from about 51.46 (Part B) to 27.57, and the standard deviation drops from about 50.36 to 2.16.

D. Increase the number of hidden layers

In [12]:
def enhanced_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
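Part D adds two more `Dense(10, activation='relu')` hidden layers. One way to see what that changes is to count trainable parameters: a fully connected layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The pure-Python sketch below (hypothetical helpers `dense_params` and `count_params`) assumes 8 input features, the number of predictor columns shown in Part B; Keras `model.summary()` should report the same totals.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out

def count_params(n_inputs, layer_sizes):
    """Total parameters for a stack of Dense layers fed by n_inputs features."""
    total, n_in = 0, n_inputs
    for n_out in layer_sizes:
        total += dense_params(n_in, n_out)
        n_in = n_out
    return total

n_cols = 8  # assumed: eight predictor columns
baseline = count_params(n_cols, [10, 1])          # Dense(10) -> Dense(1)
enhanced = count_params(n_cols, [10, 10, 10, 1])  # three Dense(10) -> Dense(1)
print(baseline, enhanced)  # 101 321
```

The extra hidden layers roughly triple the parameter count (101 to 321), giving the network more capacity to model nonlinear interactions between the mix ingredients.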

In [13]:
model_enhanced=enhanced_model()

In [14]:
mse_enh = []

for i in range(50):
    x_train_norm, x_test_norm, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=i + 63)
    # `model_enhanced` is built once above, so its weights carry over from one split to the next
    model_enhanced.fit(x_train_norm, y_train, epochs=50, verbose=0)
    y_pred = model_enhanced.predict(x_test_norm)
    mse_enh.append(mean_squared_error(y_test, y_pred))

print('Mean of the mean squared errors (enhanced model): '+str(np.mean(mse_enh)))
print('Standard deviation of the mean squared errors (enhanced model): '+str(np.std(mse_enh)))

Mean of the mean squared errors (enhanced model): 28.916239428805685
Standard deviation of the mean squared errors (enhanced model): 12.156192359296238
