# A. Build a baseline model 

Use the Keras library to build a neural network with the following:

- One hidden layer of 10 nodes, and a ReLU activation function

- Use the adam optimizer and the mean squared error  as the loss function.

1. Randomly split the data into a training and test sets by holding 30% of the data for testing. You can use the 
train_test_split
helper function from Scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.

Submit your Jupyter Notebook with your code and comments.

# Import Keras

In [97]:
import keras #import the keras library

from keras.models import Sequential 
from keras.layers import Dense
from keras.optimizers import Adam

# Build a Neural Network

In [98]:
#define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation = 'relu')) #One hidden layer of 10 nodes, and a ReLU activation function
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer = 'adam', loss = 'mean_squared_error') #Use the adam optimizer and the mean squared error as the loss function.
    return model

# Load data from csv file

In [99]:
import pandas as pd

In [100]:
#download the data
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


In [101]:
# check how many points we have
concrete_data.shape

(1030, 9)

In [102]:
#check data for any missing values
concrete_data.describe()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [103]:
#check for any nulls
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

# Split data into traning and testing sets

In [104]:
# Random split the model with scikit-learn
predictors_train, predictors_test, target_train, target_test = train_test_split(predictors, target, test_size = 0.3, random_state = 42)
#30% of the data for testing.
#random_state ensures the random processes are the same across different runs of the code.

# Train and Test the Network

In [105]:
#import scikit-learn
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

In [106]:
# build the model
model = regression_model()

In [107]:
#fit the model
model.fit(predictors_train, target_train, epochs=50, verbose=2) 
#train for 50 epochs.
#verbose=2: Verbose mode with one line per epoch, showing the progress bar and some additional information.


Epoch 1/50
23/23 - 0s - loss: 486385.4375 - 280ms/epoch - 12ms/step
Epoch 2/50
23/23 - 0s - loss: 284785.3750 - 14ms/epoch - 609us/step
Epoch 3/50
23/23 - 0s - loss: 150851.6250 - 16ms/epoch - 695us/step
Epoch 4/50
23/23 - 0s - loss: 70959.6953 - 16ms/epoch - 696us/step
Epoch 5/50
23/23 - 0s - loss: 29379.3457 - 17ms/epoch - 739us/step
Epoch 6/50
23/23 - 0s - loss: 10908.5596 - 16ms/epoch - 714us/step
Epoch 7/50
23/23 - 0s - loss: 4284.2183 - 17ms/epoch - 740us/step
Epoch 8/50
23/23 - 0s - loss: 2400.7639 - 17ms/epoch - 739us/step
Epoch 9/50
23/23 - 0s - loss: 1972.5153 - 17ms/epoch - 739us/step
Epoch 10/50
23/23 - 0s - loss: 1858.8336 - 19ms/epoch - 826us/step
Epoch 11/50
23/23 - 0s - loss: 1800.2673 - 21ms/epoch - 892us/step
Epoch 12/50
23/23 - 0s - loss: 1745.6765 - 22ms/epoch - 956us/step
Epoch 13/50
23/23 - 0s - loss: 1692.2970 - 17ms/epoch - 744us/step
Epoch 14/50
23/23 - 0s - loss: 1638.6847 - 17ms/epoch - 739us/step
Epoch 15/50
23/23 - 0s - loss: 1586.5044 - 20ms/epoch - 870us/

<keras.callbacks.History at 0x26442d31070>

# Evaluate the model on the test data 

In [108]:
predictions = model.predict(predictors_test)

#mean square error
mse = mean_squared_error(target_test, predictions)



# Repeat 50 times 

In [162]:
num_iterations = 50 

In [110]:
mse_list = [] #List to store the mean squared errors.

In [111]:
for iteration in range(num_iterations):
    #repeat what done earlier:
    model.fit(predictors_train, target_train, epochs = 50, verbose = 2) #train the neural network.
    
    predictions = model.predict(predictors_test)
    mse = mean_squared_error(target_test, predictions) #evaluate the test data.
    
    mse_list.append(mse)

Epoch 1/50
23/23 - 0s - loss: 348.1536 - 16ms/epoch - 708us/step
Epoch 2/50
23/23 - 0s - loss: 335.7025 - 17ms/epoch - 738us/step
Epoch 3/50
23/23 - 0s - loss: 322.6603 - 18ms/epoch - 787us/step
Epoch 4/50
23/23 - 0s - loss: 311.6375 - 20ms/epoch - 870us/step
Epoch 5/50
23/23 - 0s - loss: 300.1060 - 21ms/epoch - 913us/step
Epoch 6/50
23/23 - 0s - loss: 290.7174 - 19ms/epoch - 826us/step
Epoch 7/50
23/23 - 0s - loss: 279.5477 - 17ms/epoch - 740us/step
Epoch 8/50
23/23 - 0s - loss: 270.6174 - 17ms/epoch - 744us/step
Epoch 9/50
23/23 - 0s - loss: 262.3084 - 17ms/epoch - 739us/step
Epoch 10/50
23/23 - 0s - loss: 254.5950 - 20ms/epoch - 870us/step
Epoch 11/50
23/23 - 0s - loss: 246.8700 - 18ms/epoch - 783us/step
Epoch 12/50
23/23 - 0s - loss: 240.2424 - 17ms/epoch - 739us/step
Epoch 13/50
23/23 - 0s - loss: 233.6895 - 19ms/epoch - 805us/step
Epoch 14/50
23/23 - 0s - loss: 227.9000 - 18ms/epoch - 783us/step
Epoch 15/50
23/23 - 0s - loss: 222.3721 - 16ms/epoch - 696us/step
Epoch 16/50
23/23 -

# Report Mean and s.d. of the mean squared errors

In [112]:
#import Numpy
import numpy as np

In [113]:
mean_mse = np.mean(mse_list)

In [114]:
std_mse = np.std(mse_list)

In [115]:
print(f'Mean of Mean Squared Errors: {mean_mse}')
print(f'Standard Deviation of Mean Squared Errors: {std_mse}')

Mean of Mean Squared Errors: 113.10754443680304
Standard Deviation of Mean Squared Errors: 9.11749022215385


# B. Normalize the data 

Repeat Part A but use a normalized version of the data. Recall that one way to normalize the data is by subtracting the mean from the individual predictors and dividing by the standard deviation.

How does the mean of the mean squared errors compare to that from Step A?

# Normalize the data

In [138]:
predictors_train.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
196,194.7,0.0,100.5,165.6,7.5,1006.4,905.9,28
631,325.0,0.0,0.0,184.0,0.0,1063.0,783.0,7
81,318.8,212.5,0.0,155.7,14.3,852.1,880.4,3
526,359.0,19.0,141.0,154.0,10.9,942.0,801.0,3
830,162.0,190.0,148.0,179.0,19.0,838.0,741.0,28


In [139]:
predictors_test.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
31,266.0,114.0,0.0,228.0,0.0,932.0,670.0,365
109,362.6,189.0,0.0,164.9,11.6,944.7,755.8,7
136,389.9,189.0,0.0,145.9,22.0,944.7,755.8,28
88,362.6,189.0,0.0,164.9,11.6,944.7,755.8,3
918,145.0,0.0,179.0,202.0,8.0,824.0,869.0,28


In [140]:
#Normalize the data by substracting the mean and dividing by the standard deviation.
predictors_train_norm = (predictors_train - predictors_train.mean()) / predictors_train.std()
predictors_train_norm.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
196,-0.827909,-0.854703,0.761173,-0.765956,0.227116,0.415257,1.67564,-0.292777
631,0.374563,-0.854703,-0.816346,0.103676,-1.013292,1.13619,0.141805,-0.633406
81,0.317346,1.567846,-0.816346,-1.233856,1.351752,-1.550115,1.357391,-0.698287
526,0.688331,-0.638098,1.39689,-1.314203,0.789434,-0.405028,0.366451,-0.698287
830,-1.12968,1.311341,1.506767,-0.132637,2.129073,-1.729712,-0.382369,-0.292777


In [141]:
predictors_test_norm = (predictors_test - predictors_test.mean()) / predictors_test.std()
predictors_test_norm.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
31,-0.080275,0.513997,-0.919251,2.152678,-1.099856,-0.511079,-1.34739,4.803073
109,0.940408,1.418579,-0.919251,-0.737984,0.897876,-0.34408,-0.278088,-0.566292
136,1.228861,1.418579,-0.919251,-1.60839,2.688945,-0.34408,-0.278088,-0.251329
88,0.940408,1.418579,-0.919251,-0.737984,0.897876,-0.34408,-0.278088,-0.626285
918,-1.358769,-0.860966,1.85663,0.961597,0.27789,-1.931229,1.132692,-0.251329


# Compare the mean of mse in step A

In [142]:
mse_listB = []

In [143]:
#calculate the normazized mse_list in Step B
for iteration in range(num_iterations):
    model.fit(predictors_train_norm, target_train, epochs = 50, verbose = 2) #train the neural network.
    
    predictions = model.predict(predictors_test_norm)
    mse = mean_squared_error(target_test, predictions) #evaluate the test data.
    
    mse_listB.append(mse)

Epoch 1/50
23/23 - 0s - loss: 1632.5500 - 18ms/epoch - 784us/step
Epoch 2/50
23/23 - 0s - loss: 1572.4868 - 18ms/epoch - 794us/step
Epoch 3/50
23/23 - 0s - loss: 1522.5582 - 17ms/epoch - 739us/step
Epoch 4/50
23/23 - 0s - loss: 1478.3322 - 18ms/epoch - 783us/step
Epoch 5/50
23/23 - 0s - loss: 1436.2235 - 16ms/epoch - 696us/step
Epoch 6/50
23/23 - 0s - loss: 1394.7446 - 17ms/epoch - 739us/step
Epoch 7/50
23/23 - 0s - loss: 1354.5435 - 18ms/epoch - 783us/step
Epoch 8/50
23/23 - 0s - loss: 1314.7755 - 17ms/epoch - 718us/step
Epoch 9/50
23/23 - 0s - loss: 1275.8309 - 17ms/epoch - 739us/step
Epoch 10/50
23/23 - 0s - loss: 1237.3947 - 17ms/epoch - 738us/step
Epoch 11/50
23/23 - 0s - loss: 1200.1224 - 17ms/epoch - 739us/step
Epoch 12/50
23/23 - 0s - loss: 1163.6002 - 17ms/epoch - 739us/step
Epoch 13/50
23/23 - 0s - loss: 1127.3263 - 17ms/epoch - 719us/step
Epoch 14/50
23/23 - 0s - loss: 1092.0508 - 17ms/epoch - 721us/step
Epoch 15/50
23/23 - 0s - loss: 1058.5516 - 18ms/epoch - 782us/step
Epoc

In [144]:
#mean of mse in step A
print(f'Mean of Mean Squared Errors: {mean_mse}')

Mean of Mean Squared Errors: 113.10754443680304


In [145]:
#mean of mse in step B
mean_mseB = np.mean(mse_listB)
print(f'Mean of Mean Squared Errors: {mean_mseB}')

Mean of Mean Squared Errors: 61.08719832938643


# Conclusion:

The mean of Mean Squared Erros are almost half size smaller than in step A.

# C. Increate the number of epochs to 100

Repeat Part B but use 100 epochs this time for training.

How does the mean of the mean squared errors compare to that from Step B?

In [163]:
mse_listC = []

In [164]:
#calculate the normazized mse_list in Step B
for iteration in range(num_iterations):
    model.fit(predictors_train_norm, target_train, epochs = 100, verbose = 2) #train the neural network.
    
    predictions = model.predict(predictors_test_norm)
    mse = mean_squared_error(target_test, predictions) #evaluate the test data.
    
    mse_listC.append(mse)

Epoch 1/100
23/23 - 0s - loss: 27.6821 - 18ms/epoch - 771us/step
Epoch 2/100
23/23 - 0s - loss: 27.6778 - 15ms/epoch - 671us/step
Epoch 3/100
23/23 - 0s - loss: 27.6567 - 15ms/epoch - 652us/step
Epoch 4/100
23/23 - 0s - loss: 27.6153 - 18ms/epoch - 783us/step
Epoch 5/100
23/23 - 0s - loss: 27.6375 - 17ms/epoch - 739us/step
Epoch 6/100
23/23 - 0s - loss: 27.6498 - 17ms/epoch - 738us/step
Epoch 7/100
23/23 - 0s - loss: 27.6256 - 16ms/epoch - 696us/step
Epoch 8/100
23/23 - 0s - loss: 27.6429 - 17ms/epoch - 718us/step
Epoch 9/100
23/23 - 0s - loss: 27.7264 - 16ms/epoch - 695us/step
Epoch 10/100
23/23 - 0s - loss: 27.6395 - 16ms/epoch - 696us/step
Epoch 11/100
23/23 - 0s - loss: 27.6395 - 17ms/epoch - 739us/step
Epoch 12/100
23/23 - 0s - loss: 27.6469 - 16ms/epoch - 696us/step
Epoch 13/100
23/23 - 0s - loss: 27.6157 - 18ms/epoch - 784us/step
Epoch 14/100
23/23 - 0s - loss: 27.6429 - 18ms/epoch - 783us/step
Epoch 15/100
23/23 - 0s - loss: 27.6714 - 17ms/epoch - 739us/step
Epoch 16/100
23/23 

In [165]:
#mean of mse in step B
print(f'Mean of Mean Squared Errors: {mean_mseB}')

Mean of Mean Squared Errors: 61.08719832938643


In [166]:
#mean of mse in step C
mean_mseC = np.mean(mse_listC)
print(f'Mean of Mean Squared Errors: {mean_mseC}')

Mean of Mean Squared Errors: 53.45038127376013


# Conclusion

The mean square error got smaller from 61.09 to 53.45 essentially.

# D. Increase the number of hidden layers 

Repeat part B but use a neural network with the following instead:

- Three hidden layers, each of 10 nodes and ReLU activation function.

How does the mean of the mean squared errors compare to that from Step B?

In [171]:
#Build a 3 hidden layers neural network with 10 nodes and ReLU activation function for each
def regression_model_three_layer():
    # create model
    model = Sequential()
    model.add(Dense(10, activation = 'relu')) #One hidden layer of 10 nodes, and a ReLU activation function
    model.add(Dense(10, activation = 'relu')) #One hidden layer of 10 nodes, and a ReLU activation function
    model.add(Dense(10, activation = 'relu')) #One hidden layer of 10 nodes, and a ReLU activation function
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer = 'adam', loss = 'mean_squared_error') #Use the adam optimizer and the mean squared error as the loss function.
    return model

In [172]:
mse_listD = []

In [173]:
model3 = regression_model_three_layer()

In [174]:
#calculate the normazized mse_list in Step B
for iteration in range(num_iterations):
    model3.fit(predictors_train_norm, target_train, epochs = 50, verbose = 2) #train the neural network.
    
    predictions = model.predict(predictors_test_norm)
    mse = mean_squared_error(target_test, predictions) #evaluate the test data.
    
    mse_listD.append(mse)

Epoch 1/50
23/23 - 0s - loss: 1588.8141 - 350ms/epoch - 15ms/step
Epoch 2/50
23/23 - 0s - loss: 1555.9117 - 15ms/epoch - 652us/step
Epoch 3/50
23/23 - 0s - loss: 1509.8563 - 17ms/epoch - 739us/step
Epoch 4/50
23/23 - 0s - loss: 1441.6788 - 21ms/epoch - 892us/step
Epoch 5/50
23/23 - 0s - loss: 1337.0587 - 19ms/epoch - 826us/step
Epoch 6/50
23/23 - 0s - loss: 1182.8137 - 19ms/epoch - 826us/step
Epoch 7/50
23/23 - 0s - loss: 973.3470 - 20ms/epoch - 870us/step
Epoch 8/50
23/23 - 0s - loss: 735.5505 - 16ms/epoch - 696us/step
Epoch 9/50
23/23 - 0s - loss: 524.3343 - 17ms/epoch - 739us/step
Epoch 10/50
23/23 - 0s - loss: 381.2111 - 22ms/epoch - 936us/step
Epoch 11/50
23/23 - 0s - loss: 300.6748 - 19ms/epoch - 826us/step
Epoch 12/50
23/23 - 0s - loss: 258.4716 - 18ms/epoch - 783us/step
Epoch 13/50
23/23 - 0s - loss: 235.6461 - 22ms/epoch - 957us/step
Epoch 14/50
23/23 - 0s - loss: 222.9384 - 21ms/epoch - 893us/step
Epoch 15/50
23/23 - 0s - loss: 214.3007 - 20ms/epoch - 870us/step
Epoch 16/50
2

In [175]:
#mean of mse in step B
print(f'Mean of Mean Squared Errors: {mean_mseB}')

Mean of Mean Squared Errors: 61.08719832938643


In [176]:
#mean of mse in step D
mean_mseD = np.mean(mse_listD)
print(f'Mean of Mean Squared Errors: {mean_mseD}')

Mean of Mean Squared Errors: 54.806222452022766


# Conclusion

The mean of Mean Squared Errors in Step D (for 3 hidden layers) is smaller than the value we found in step A (for one single hidden layer).