# Build a Regression Model in Keras
In this course project, you will build a regression model using the Keras deep learning library. You will then experiment with increasing the number of training epochs and changing the number of hidden layers to see how these parameters affect the model's performance.

## Review Criteria

This assignment will be marked by your peers and will be worth 20% of your total grade. The breakdown will be:

Part A: 5 marks

Part B: 5 marks

Part C: 5 marks

Part D: 5 marks

## Step-By-Step Assignment Instructions

### 1. Assignment Topic:

In this project, you will build a regression model using the Keras library to model the same concrete compressive strength data that we used in lab 3.

### 2. Concrete Data:

For your convenience, the data can be found here again: https://cocl.us/concrete_data. To recap, the predictors in the data of concrete strength include:

1. Cement
2. Blast Furnace Slag
3. Fly Ash
4. Water
5. Superplasticizer
6. Coarse Aggregate
7. Fine Aggregate
8. Age

In [1]:
import pandas as pd
import numpy as np

In [2]:
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


In [3]:
concrete_data.shape

(1030, 9)

In [4]:
concrete_data.describe()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


In [5]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

### 3. Assignment Instructions 

#### A. Build a baseline model (5 marks)

Use the Keras library to build a neural network with the following:

- One hidden layer of 10 nodes, and a ReLU activation function

- Use the adam optimizer and the mean squared error as the loss function.

1. Randomly split the data into training and test sets, holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.
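
The five steps above can be sketched as a small evaluation loop. The version below is only an illustration under stated assumptions: `run_once` and `repeat_experiment` are hypothetical names, the data is synthetic, and an ordinary least-squares fit stands in for the Keras network so that the loop runs with NumPy alone. In the assignment, the body of `run_once` would instead split with `train_test_split`, train the network, and score it with `mean_squared_error`.

```python
import numpy as np

def run_once(X, y, rng, test_fraction=0.3):
    """One repetition: random split, fit on train, return test MSE."""
    n = len(X)
    order = rng.permutation(n)
    n_test = int(round(test_fraction * n))
    test_idx, train_idx = order[:n_test], order[n_test:]

    # Stand-in model (assumption): least squares with an intercept column.
    A_train = np.column_stack([X[train_idx], np.ones(len(train_idx))])
    coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)

    A_test = np.column_stack([X[test_idx], np.ones(len(test_idx))])
    y_pred = A_test @ coef
    return float(np.mean((y[test_idx] - y_pred) ** 2))

def repeat_experiment(X, y, n_repeats=50, seed=0):
    """Repeat the split/fit/score cycle and summarize the MSEs."""
    rng = np.random.default_rng(seed)
    mses = [run_once(X, y, rng) for _ in range(n_repeats)]
    # np.std defaults to the population standard deviation (ddof=0).
    return mses, float(np.mean(mses)), float(np.std(mses))

# Synthetic data for illustration only (not the concrete dataset).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

mses_demo, mean_mse, std_mse = repeat_experiment(X_demo, y_demo)
print(len(mses_demo), mean_mse, std_mse)
```

Drawing a new random split inside every repetition is what makes the spread of the 50 errors meaningful: it reflects variation from both the data split and the model fit.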

Split the data into training and test data.

In [6]:
concrete_data_columns = concrete_data.columns

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column

In [7]:
predictors.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [8]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

In [9]:
from sklearn.model_selection import train_test_split

In [10]:
p_train, p_test, t_train, t_test = train_test_split(predictors, target, test_size=0.3, random_state=42)

In [11]:
p_train

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 196 | 194.7 | 0.0 | 100.5 | 165.6 | 7.5 | 1006.4 | 905.9 | 28 |
| 631 | 325.0 | 0.0 | 0.0 | 184.0 | 0.0 | 1063.0 | 783.0 | 7 |
| 81 | 318.8 | 212.5 | 0.0 | 155.7 | 14.3 | 852.1 | 880.4 | 3 |
| 526 | 359.0 | 19.0 | 141.0 | 154.0 | 10.9 | 942.0 | 801.0 | 3 |
| 830 | 162.0 | 190.0 | 148.0 | 179.0 | 19.0 | 838.0 | 741.0 | 28 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 87 | 286.3 | 200.9 | 0.0 | 144.7 | 11.2 | 1004.6 | 803.7 | 3 |
| 330 | 246.8 | 0.0 | 125.1 | 143.3 | 12.0 | 1086.8 | 800.9 | 14 |
| 466 | 190.3 | 0.0 | 125.2 | 166.6 | 9.9 | 1079.0 | 798.9 | 100 |
| 121 | 475.0 | 118.8 | 0.0 | 181.1 | 8.9 | 852.1 | 781.5 | 28 |


In [12]:
t_train

196    25.72
631    17.54
81     25.20
526    23.64
830    33.76
       ...  
87     24.40
330    42.22
466    33.56
121    68.30
860    38.46
Name: Strength, Length: 721, dtype: float64

In [13]:
n_cols = predictors.shape[1]

In [14]:
import keras
from keras.models import Sequential
from keras.layers import Dense

In [16]:
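def classification_model(number_of_hidden_layers=1):
    # Despite its name, this builds a regression network: the output is a single linear unit.
    model = Sequential()
    # first hidden layer; it also declares the input shape
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    # remaining hidden layers, so the total equals number_of_hidden_layers
    for _ in range(number_of_hidden_layers - 1):
        model.add(Dense(10, activation='relu'))
    model.add(Dense(1))  # output layer: predicted strength

    # compile with the adam optimizer and mean squared error loss
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model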

In [17]:
# build the model
model = classification_model()

# train on the training split; val_loss is then the MSE on the held-out test split
history = model.fit(p_train, t_train, validation_data=(p_test, t_test), epochs=50, verbose=2)

Epoch 1/50
23/23 - 0s - loss: 741778.5000 - val_loss: 354278.8438
Epoch 2/50
23/23 - 0s - loss: 179133.4531 - val_loss: 30152.4922
Epoch 3/50
23/23 - 0s - loss: 10652.0801 - val_loss: 4773.3584
Epoch 4/50
23/23 - 0s - loss: 6647.0928 - val_loss: 3003.5303
Epoch 5/50
23/23 - 0s - loss: 5091.5654 - val_loss: 2994.8311
Epoch 6/50
23/23 - 0s - loss: 4586.6699 - val_loss: 2605.4287
Epoch 7/50
23/23 - 0s - loss: 4234.3901 - val_loss: 2433.7676
Epoch 8/50
23/23 - 0s - loss: 3875.4778 - val_loss: 2263.9448
Epoch 9/50
23/23 - 0s - loss: 3544.5422 - val_loss: 2093.1904
Epoch 10/50
23/23 - 0s - loss: 3231.4602 - val_loss: 1945.9033
Epoch 11/50
23/23 - 0s - loss: 2950.8416 - val_loss: 1779.7709
Epoch 12/50
23/23 - 0s - loss: 2700.3447 - val_loss: 1647.3555
Epoch 13/50
23/23 - 0s - loss: 2456.4365 - val_loss: 1545.5496
Epoch 14/50
23/23 - 0s - loss: 2241.3755 - val_loss: 1438.6107
Epoch 15/50
23/23 - 0s - loss: 2050.6543 - val_loss: 1342.8519
Epoch 16/50
23/23 - 0s - loss: 1873.2642 - val_loss: 125

In [18]:
mse = history.history['val_loss'][-1]
mse

403.3968200683594

In [None]:
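mses = []
for _ in range(50):
    # draw a fresh random 70/30 split for every repetition
    p_train, p_test, t_train, t_test = train_test_split(predictors, target, test_size=0.3)
    model = classification_model()
    history = model.fit(p_train, t_train, validation_data=(p_test, t_test), epochs=50, verbose=2)
    mses.append(history.history['val_loss'][-1])  # test MSE after the final epoch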

Epoch 1/50
23/23 - 0s - loss: 263.0257 - val_loss: 215.7152
Epoch 2/50
23/23 - 0s - loss: 227.9920 - val_loss: 185.2743
Epoch 3/50
23/23 - 0s - loss: 218.0305 - val_loss: 173.7897
Epoch 4/50
23/23 - 0s - loss: 210.0826 - val_loss: 164.1433
Epoch 5/50
23/23 - 0s - loss: 204.5463 - val_loss: 171.3229
Epoch 6/50
23/23 - 0s - loss: 201.1040 - val_loss: 162.8923
Epoch 7/50
23/23 - 0s - loss: 195.3816 - val_loss: 149.7326
Epoch 8/50
23/23 - 0s - loss: 190.7119 - val_loss: 163.7332
Epoch 9/50
23/23 - 0s - loss: 188.3505 - val_loss: 148.1870
Epoch 10/50
23/23 - 0s - loss: 185.9009 - val_loss: 143.4464
Epoch 11/50
23/23 - 0s - loss: 183.1894 - val_loss: 137.0417
Epoch 12/50
23/23 - 0s - loss: 182.4208 - val_loss: 138.0639
Epoch 13/50
23/23 - 0s - loss: 180.3596 - val_loss: 143.8092
Epoch 14/50
23/23 - 0s - loss: 173.3993 - val_loss: 143.3804
Epoch 15/50
23/23 - 0s - loss: 171.1282 - val_loss: 161.8365
Epoch 16/50
23/23 - 0s - loss: 173.0867 - val_loss: 153.8988
Epoch 17/50
23/23 - 0s - loss: 16

Epoch 36/50
23/23 - 0s - loss: 318.8294 - val_loss: 228.4406
Epoch 37/50
23/23 - 0s - loss: 311.4042 - val_loss: 235.4987
Epoch 38/50
23/23 - 0s - loss: 303.2945 - val_loss: 219.6166
Epoch 39/50
23/23 - 0s - loss: 295.4662 - val_loss: 212.0551
Epoch 40/50
23/23 - 0s - loss: 289.9875 - val_loss: 198.0489
Epoch 41/50
23/23 - 0s - loss: 281.3578 - val_loss: 216.8077
Epoch 42/50
23/23 - 0s - loss: 277.8694 - val_loss: 206.7755
Epoch 43/50
23/23 - 0s - loss: 269.4404 - val_loss: 196.4112
Epoch 44/50
23/23 - 0s - loss: 264.4266 - val_loss: 180.1418
Epoch 45/50
23/23 - 0s - loss: 260.4202 - val_loss: 203.6991
Epoch 46/50
23/23 - 0s - loss: 254.1015 - val_loss: 180.5992
Epoch 47/50
23/23 - 0s - loss: 248.7690 - val_loss: 166.9274
Epoch 48/50
23/23 - 0s - loss: 249.0365 - val_loss: 180.5292
Epoch 49/50
23/23 - 0s - loss: 241.9357 - val_loss: 165.8226
Epoch 50/50
23/23 - 0s - loss: 235.8055 - val_loss: 163.8498
Epoch 1/50
23/23 - 0s - loss: 22352.6250 - val_loss: 11730.6924
Epoch 2/50
23/23 - 0s

Epoch 20/50
23/23 - 0s - loss: 156.9849 - val_loss: 147.4167
Epoch 21/50
23/23 - 0s - loss: 148.3490 - val_loss: 155.1842
Epoch 22/50
23/23 - 0s - loss: 143.4302 - val_loss: 157.1603
Epoch 23/50
23/23 - 0s - loss: 140.3721 - val_loss: 152.7765
Epoch 24/50
23/23 - 0s - loss: 140.3168 - val_loss: 171.4224
Epoch 25/50
23/23 - 0s - loss: 138.3065 - val_loss: 178.2561
Epoch 26/50
23/23 - 0s - loss: 136.9516 - val_loss: 151.9977
Epoch 27/50
23/23 - 0s - loss: 134.2310 - val_loss: 144.9882
Epoch 28/50
23/23 - 0s - loss: 139.4535 - val_loss: 143.8331
Epoch 29/50
23/23 - 0s - loss: 137.9500 - val_loss: 192.5611
Epoch 30/50
23/23 - 0s - loss: 141.6259 - val_loss: 135.7545
Epoch 31/50
23/23 - 0s - loss: 132.2956 - val_loss: 136.5634
Epoch 32/50
23/23 - 0s - loss: 140.0877 - val_loss: 132.9626
Epoch 33/50
23/23 - 0s - loss: 132.9153 - val_loss: 139.6146
Epoch 34/50
23/23 - 0s - loss: 128.9726 - val_loss: 146.2348
Epoch 35/50
23/23 - 0s - loss: 127.4742 - val_loss: 133.0620
Epoch 36/50
23/23 - 0s -

Epoch 4/50
23/23 - 0s - loss: 1238.1173 - val_loss: 1290.1295
Epoch 5/50
23/23 - 0s - loss: 1109.0927 - val_loss: 1134.6658
Epoch 6/50
23/23 - 0s - loss: 1052.6129 - val_loss: 1051.8906
Epoch 7/50
23/23 - 0s - loss: 987.6514 - val_loss: 982.8220
Epoch 8/50
23/23 - 0s - loss: 929.7112 - val_loss: 909.5162
Epoch 9/50
23/23 - 0s - loss: 873.4098 - val_loss: 822.8059
Epoch 10/50
23/23 - 0s - loss: 817.6308 - val_loss: 780.4646
Epoch 11/50
23/23 - 0s - loss: 751.2393 - val_loss: 716.1453
Epoch 12/50
23/23 - 0s - loss: 689.9584 - val_loss: 650.0869
Epoch 13/50
23/23 - 0s - loss: 628.3169 - val_loss: 587.5564
Epoch 14/50
23/23 - 0s - loss: 572.6119 - val_loss: 508.1401
Epoch 15/50
23/23 - 0s - loss: 517.3846 - val_loss: 451.5414
Epoch 16/50
23/23 - 0s - loss: 466.4947 - val_loss: 484.0857
Epoch 17/50
23/23 - 0s - loss: 421.8880 - val_loss: 354.0492
Epoch 18/50
23/23 - 0s - loss: 384.5851 - val_loss: 319.8900
Epoch 19/50
23/23 - 0s - loss: 344.9807 - val_loss: 285.2241
Epoch 20/50
23/23 - 0s -

23/23 - 0s - loss: 109.8518 - val_loss: 84.9034
Epoch 40/50
23/23 - 0s - loss: 108.2628 - val_loss: 86.6326
Epoch 41/50
23/23 - 0s - loss: 107.2922 - val_loss: 81.6542
Epoch 42/50
23/23 - 0s - loss: 107.3212 - val_loss: 81.3113
Epoch 43/50
23/23 - 0s - loss: 105.2628 - val_loss: 80.7003
Epoch 44/50
23/23 - 0s - loss: 103.6167 - val_loss: 80.8068
Epoch 45/50
23/23 - 0s - loss: 102.5872 - val_loss: 81.0290
Epoch 46/50
23/23 - 0s - loss: 101.8534 - val_loss: 81.2880
Epoch 47/50
23/23 - 0s - loss: 100.7937 - val_loss: 83.7342
Epoch 48/50
23/23 - 0s - loss: 102.1872 - val_loss: 80.7075
Epoch 49/50
23/23 - 0s - loss: 98.8853 - val_loss: 76.4317
Epoch 50/50
23/23 - 0s - loss: 97.8990 - val_loss: 77.5146
Epoch 1/50
23/23 - 0s - loss: 1729.9454 - val_loss: 390.5802
Epoch 2/50
23/23 - 0s - loss: 395.2162 - val_loss: 351.6808
Epoch 3/50
23/23 - 0s - loss: 335.8435 - val_loss: 314.0500
Epoch 4/50
23/23 - 0s - loss: 311.7326 - val_loss: 308.1620
Epoch 5/50
23/23 - 0s - loss: 289.8962 - val_loss: 29

In [42]:
import statistics
mean_mses = statistics.mean(mses)
mean_mses

117.08843309052136
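
Step 5 asks for the standard deviation as well as the mean. A minimal sketch using the standard library follows; the two values in `example_mses` are final-epoch `val_loss` figures visible in the truncated log above and are used purely for illustration, so in the notebook the full `mses` list would be passed instead.

```python
import statistics

# Final-epoch val_loss values visible in the log (illustration only);
# in the notebook, use the full list `mses` instead.
example_mses = [163.8498, 77.5146]

mean_mse = statistics.mean(example_mses)
std_mse = statistics.stdev(example_mses)  # sample standard deviation (n - 1)

print(f"mean MSE: {mean_mse:.4f}, std of MSE: {std_mse:.4f}")
```

`statistics.pstdev` would give the population standard deviation instead; either is reasonable as long as the choice is stated alongside the result.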

#### B. Normalize the data (5 marks)

Repeat Part A but use a normalized version of the data. Recall that one way to normalize the data is by subtracting the mean from the individual predictors and dividing by the standard deviation.

**How does the mean of the mean squared errors compare to that from Step A?**
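
The normalization described above is the z-score, z = (x - mean) / std. The sketch below applies it with NumPy to the Cement and Water values of the first five rows shown earlier in this notebook. As one possible design choice (an assumption, not something the assignment requires), it computes the mean and standard deviation from the training rows only and reuses them for the test rows, which keeps test information out of the preprocessing. The `zscore_fit` and `zscore_apply` helpers are hypothetical names.

```python
import numpy as np

def zscore_fit(X):
    """Return per-column mean and sample standard deviation (ddof=1, as pandas uses)."""
    return X.mean(axis=0), X.std(axis=0, ddof=1)

def zscore_apply(X, mean, std):
    """Standardize X with previously computed statistics."""
    return (X - mean) / std

# Cement and Water for rows 0-4 of the dataset shown earlier.
X = np.array([
    [540.0, 162.0],
    [540.0, 162.0],
    [332.5, 228.0],
    [332.5, 228.0],
    [198.6, 192.0],
])
X_train, X_test = X[:4], X[4:]  # arbitrary split, for illustration only

mean, std = zscore_fit(X_train)
X_train_norm = zscore_apply(X_train, mean, std)
X_test_norm = zscore_apply(X_test, mean, std)  # reuse the training statistics

print(X_train_norm.mean(axis=0))         # approximately 0 per column
print(X_train_norm.std(axis=0, ddof=1))  # approximately 1 per column
```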

In [43]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |


In [44]:
mses_b = []

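for _ in range(50):
    # fresh random split and a freshly initialized model for every repetition
    p_train, p_test, t_train, t_test = train_test_split(predictors_norm, target, test_size=0.3)
    model = classification_model()
    mses_b.append(model.fit(p_train, t_train, validation_data=(p_test, t_test), epochs=50, verbose=2).history['val_loss'][-1])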

Epoch 1/50
23/23 - 0s - loss: 1638.6543 - val_loss: 1135.2166
Epoch 2/50
23/23 - 0s - loss: 1489.4409 - val_loss: 992.2640
Epoch 3/50
23/23 - 0s - loss: 1290.2433 - val_loss: 825.8729
Epoch 4/50
23/23 - 0s - loss: 1063.8208 - val_loss: 663.6851
Epoch 5/50
23/23 - 0s - loss: 838.8440 - val_loss: 525.3444
Epoch 6/50
23/23 - 0s - loss: 642.8145 - val_loss: 420.8675
Epoch 7/50
23/23 - 0s - loss: 482.3181 - val_loss: 348.7704
Epoch 8/50
23/23 - 0s - loss: 373.5791 - val_loss: 304.1924
Epoch 9/50
23/23 - 0s - loss: 307.2779 - val_loss: 275.7570
Epoch 10/50
23/23 - 0s - loss: 270.2526 - val_loss: 258.1033
Epoch 11/50
23/23 - 0s - loss: 249.0557 - val_loss: 246.3729
Epoch 12/50
23/23 - 0s - loss: 236.4117 - val_loss: 236.9051
Epoch 13/50
23/23 - 0s - loss: 226.6402 - val_loss: 227.7021
Epoch 14/50
23/23 - 0s - loss: 219.0582 - val_loss: 220.4647
Epoch 15/50
23/23 - 0s - loss: 212.7766 - val_loss: 213.4995
Epoch 16/50
23/23 - 0s - loss: 207.3124 - val_loss: 208.0290
Epoch 17/50
23/23 - 0s - los

Epoch 36/50
23/23 - 0s - loss: 103.7666 - val_loss: 111.1153
Epoch 37/50
23/23 - 0s - loss: 103.7094 - val_loss: 111.4944
Epoch 38/50
23/23 - 0s - loss: 103.2196 - val_loss: 111.2564
Epoch 39/50
23/23 - 0s - loss: 102.9822 - val_loss: 110.5969
Epoch 40/50
23/23 - 0s - loss: 102.8767 - val_loss: 110.6945
Epoch 41/50
23/23 - 0s - loss: 102.5226 - val_loss: 110.7307
Epoch 42/50
23/23 - 0s - loss: 102.2388 - val_loss: 110.1290
Epoch 43/50
23/23 - 0s - loss: 102.1406 - val_loss: 110.0521
Epoch 44/50
23/23 - 0s - loss: 101.8300 - val_loss: 110.0107
Epoch 45/50
23/23 - 0s - loss: 101.4843 - val_loss: 109.6741
Epoch 46/50
23/23 - 0s - loss: 101.2472 - val_loss: 110.0469
Epoch 47/50
23/23 - 0s - loss: 101.0670 - val_loss: 111.3422
Epoch 48/50
23/23 - 0s - loss: 100.9991 - val_loss: 110.3063
Epoch 49/50
23/23 - 0s - loss: 100.5489 - val_loss: 110.5194
Epoch 50/50
23/23 - 0s - loss: 100.4023 - val_loss: 109.1801
Epoch 1/50
23/23 - 0s - loss: 100.0513 - val_loss: 110.0727
Epoch 2/50
23/23 - 0s - l

Epoch 24/50
23/23 - 0s - loss: 72.1265 - val_loss: 94.7992
Epoch 25/50
23/23 - 0s - loss: 71.5884 - val_loss: 94.1176
Epoch 26/50
23/23 - 0s - loss: 71.2574 - val_loss: 94.5675
Epoch 27/50
23/23 - 0s - loss: 71.1232 - val_loss: 94.6623
Epoch 28/50
23/23 - 0s - loss: 70.7308 - val_loss: 92.9737
Epoch 29/50
23/23 - 0s - loss: 70.6098 - val_loss: 94.6382
Epoch 30/50
23/23 - 0s - loss: 70.1143 - val_loss: 92.9061
Epoch 31/50
23/23 - 0s - loss: 69.9629 - val_loss: 92.4863
Epoch 32/50
23/23 - 0s - loss: 69.7883 - val_loss: 94.6757
Epoch 33/50
23/23 - 0s - loss: 69.3578 - val_loss: 92.4138
Epoch 34/50
23/23 - 0s - loss: 69.3089 - val_loss: 93.1309
Epoch 35/50
23/23 - 0s - loss: 68.8697 - val_loss: 91.2781
Epoch 36/50
23/23 - 0s - loss: 68.5795 - val_loss: 93.0995
Epoch 37/50
23/23 - 0s - loss: 68.3584 - val_loss: 91.8172
Epoch 38/50
23/23 - 0s - loss: 68.0472 - val_loss: 93.2192
Epoch 39/50
23/23 - 0s - loss: 67.8033 - val_loss: 91.6763
Epoch 40/50
23/23 - 0s - loss: 67.5042 - val_loss: 91.49

Epoch 14/50
23/23 - 0s - loss: 45.7742 - val_loss: 91.4897
Epoch 15/50
23/23 - 0s - loss: 45.5846 - val_loss: 91.7609
Epoch 16/50
23/23 - 0s - loss: 45.4191 - val_loss: 90.6444
Epoch 17/50
23/23 - 0s - loss: 45.2744 - val_loss: 90.0962
Epoch 18/50
23/23 - 0s - loss: 45.2971 - val_loss: 90.2964
Epoch 19/50
23/23 - 0s - loss: 45.2829 - val_loss: 90.3183
Epoch 20/50
23/23 - 0s - loss: 45.0552 - val_loss: 90.5646
Epoch 21/50
23/23 - 0s - loss: 44.9519 - val_loss: 90.2729
Epoch 22/50
23/23 - 0s - loss: 45.1532 - val_loss: 91.4755
Epoch 23/50
23/23 - 0s - loss: 44.7385 - val_loss: 91.2922
Epoch 24/50
23/23 - 0s - loss: 44.7305 - val_loss: 91.0438
Epoch 25/50
23/23 - 0s - loss: 44.5667 - val_loss: 91.3934
Epoch 26/50
23/23 - 0s - loss: 44.5129 - val_loss: 90.9618
Epoch 27/50
23/23 - 0s - loss: 44.3055 - val_loss: 90.2098
Epoch 28/50
23/23 - 0s - loss: 44.3471 - val_loss: 90.7452
Epoch 29/50
23/23 - 0s - loss: 44.1115 - val_loss: 89.9894
Epoch 30/50
23/23 - 0s - loss: 44.0111 - val_loss: 90.02

KeyboardInterrupt: 

#### C. Increase the number of epochs (5 marks)

Repeat Part B **but use 100 epochs this time for training.**

**How does the mean of the mean squared errors compare to that from Step B?**
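
To see what doubling the epochs means in terms of training work, the arithmetic below counts gradient updates. It assumes the Keras default `batch_size` of 32 (the notebook never sets it) and the 721 training rows reported for `t_train`; the resulting 23 batches per epoch match the `23/23` shown in the training logs.

```python
import math

n_train = 721    # rows in the 70% training split (Length: 721 above)
batch_size = 32  # Keras default when batch_size is not passed to fit()

# The last batch is partial, so round up.
steps_per_epoch = math.ceil(n_train / batch_size)

for epochs in (50, 100):
    print(epochs, "epochs ->", steps_per_epoch * epochs, "weight updates")
```

More updates give the optimizer more time to reduce the training loss, but whether the test error also improves is exactly what Part C asks you to measure.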

#### D. Increase the number of hidden layers (5 marks)

Repeat Part B, but use a neural network with the following instead:

- Three hidden layers, each of 10 nodes and ReLU activation function.

**How does the mean of the mean squared errors compare to that from Step B?**
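
One way to reason about Part D is to count trainable parameters. A fully connected layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases. The sketch below applies that rule to the 8 predictors in this dataset; `count_dense_params` is a hypothetical helper, not a Keras function.

```python
def count_dense_params(layer_sizes):
    """Count weights and biases for a stack of dense layers.

    layer_sizes lists the input width followed by each layer's unit count.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + biases
    return total

n_inputs = 8  # predictors: every column except Strength

one_hidden = count_dense_params([n_inputs, 10, 1])
three_hidden = count_dense_params([n_inputs, 10, 10, 10, 1])

print(one_hidden)    # 101
print(three_hidden)  # 321
```

The deeper network has roughly three times as many parameters, so it can represent more complex functions, though it may also need more epochs to train well.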