## Build a Regression Model in Keras

### A. Build a baseline model (5 marks)

Use the Keras library to build a neural network with the following:

- One hidden layer of 10 nodes with a ReLU activation function

- The Adam optimizer, with mean squared error as the loss function

1. Randomly split the data into training and test sets, holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

4. Repeat steps 1-3 a total of 50 times, producing a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.
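Steps 1-5 can be sketched as one helper. This is a minimal sketch, not the notebook's own code: `repeated_test_mse` and `summarize` are hypothetical names, and the helper assumes `build_model` is a zero-argument function such as the notebook's `regression_model`, returning a model with Keras-style `fit` and `predict` methods. A fresh model is built on every repetition so that trained weights never carry over from one split to the next. `summarize` uses NumPy's default (population) standard deviation; that choice is an assumption, since the assignment does not specify one.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def repeated_test_mse(X, y, build_model, n_repeats=50, epochs=50, test_size=0.3, seed=None):
    """Steps 1-4: re-split, retrain a fresh model and score it, n_repeats times.

    build_model (assumed) is any zero-argument callable returning an object with
    Keras-style fit(X, y, epochs=..., verbose=...) and predict(X) methods.
    """
    X = np.asarray(X, dtype="float32")
    y = np.asarray(y, dtype="float32")
    rng = np.random.RandomState(seed)
    mses = []
    for _ in range(n_repeats):
        # Step 1: a new random train/test split on every repetition
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=rng.randint(0, 2**31 - 1)
        )
        # Step 2: a freshly built model, trained only on the training rows
        model = build_model()
        model.fit(X_train, y_train, epochs=epochs, verbose=0)
        # Step 3: MSE between predicted and actual values on the held-out rows
        y_pred = np.ravel(model.predict(X_test))
        mses.append(mean_squared_error(y_test, y_pred))
    return mses


def summarize(mses):
    """Step 5: mean and population standard deviation of the MSE list."""
    return float(np.mean(mses)), float(np.std(mses))
```

With the notebook's variables, `summarize(repeated_test_mse(predictors, target, regression_model))` would give the Part A figures; passing `predictors_norm` instead covers Part B, and adding `epochs=100` covers Part C.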

```python
import pandas as pd
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense
```



```python
concrete_data = pd.read_csv('https://cocl.us/concrete_data')
concrete_data.head()
```

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


```python
concrete_data_columns = concrete_data.columns

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column
```


```python
n_cols = predictors.shape[1] # number of predictors
```

```python
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
```

```python
# build the model
model = regression_model()
```



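```python
# fit the model
# note: validation_split=0.3 holds out the last 30% of rows (taken before shuffling), not a random 30%,
# and this is a single run, not the 50 train/test repetitions of steps 1-4
model.fit(predictors, target, validation_split=0.3, epochs=50, verbose=2)
```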

```
Train on 721 samples, validate on 309 samples
Epoch 1/50
 - 1s - loss: 8116.1945 - val_loss: 1331.4697
Epoch 2/50
 - 0s - loss: 1180.1746 - val_loss: 1122.3880
Epoch 3/50
 - 0s - loss: 880.0846 - val_loss: 1044.0893
Epoch 4/50
 - 0s - loss: 704.6353 - val_loss: 1025.3368
Epoch 5/50
 - 0s - loss: 618.2367 - val_loss: 1029.3556
Epoch 6/50
 - 0s - loss: 555.7834 - val_loss: 1014.2989
Epoch 7/50
 - 0s - loss: 514.9053 - val_loss: 985.3188
Epoch 8/50
 - 1s - loss: 483.5298 - val_loss: 943.4723
Epoch 9/50
 - 0s - loss: 456.3448 - val_loss: 898.9151
Epoch 10/50
 - 0s - loss: 435.0102 - val_loss: 852.2643
Epoch 11/50
 - 0s - loss: 411.3667 - val_loss: 809.5996
Epoch 12/50
 - 0s - loss: 391.8816 - val_loss: 762.1050
Epoch 13/50
 - 0s - loss: 367.7425 - val_loss: 722.4221
Epoch 14/50
 - 0s - loss: 353.3408 - val_loss: 681.4385
Epoch 15/50
 - 0s - loss: 334.2795 - val_loss: 643.3894
Epoch 16/50
 - 0s - loss: 319.5816 - val_loss: 604.6509
Epoch 17/50
```
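The fit above relies on `validation_split=0.3`, which is not the random hold-out described in step 1: Keras takes the validation rows from the end of the data, before any shuffling. The pure-NumPy sketch below mimics that partition next to a random 30% hold-out. Both helper names are hypothetical, and the Keras split point is assumed to be `int(n * (1 - validation_split))`, which reproduces the 721/309 counts in the log; the random variant rounds the test count up, similar to scikit-learn's `train_test_split`.

```python
import numpy as np


def keras_validation_split_indices(n_samples, validation_split):
    """Mimic fit(validation_split=...): the LAST fraction of rows is held out."""
    split_at = int(n_samples * (1.0 - validation_split))  # assumed split-point formula
    idx = np.arange(n_samples)
    return idx[:split_at], idx[split_at:]


def random_split_indices(n_samples, test_size, seed=None):
    """A random hold-out of test_size of the rows, as step 1 asks for."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(n_samples)
    n_test = int(np.ceil(n_samples * test_size))  # round the test count up
    return np.sort(perm[n_test:]), np.sort(perm[:n_test])
```

With 1030 rows, the Keras-style partition always validates on rows 721 to 1029, so any ordering in the CSV carries straight into the validation set; the random split draws a different 309 rows for each seed.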

### B. Normalize the data (5 marks)

Repeat Part A but use a normalized version of the data. Recall that one way to normalize the data is by subtracting the mean from the individual predictors and dividing by the standard deviation.
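The normalization described here is the z-score: for each predictor column, subtract its mean and divide by its standard deviation. A minimal pandas sketch follows; `zscore` and its `reference` argument are hypothetical. By default it computes the same thing as the notebook's `(predictors - predictors.mean()) / predictors.std()`, including pandas' sample standard deviation (`ddof=1`). Passing a training split as `reference` normalizes another split with training statistics only, a common variant when the data is split before normalizing; the assignment itself does not require it.

```python
import pandas as pd


def zscore(df, reference=None):
    """Column-wise (x - mean) / std, with statistics taken from `reference`
    (default: df itself). pandas .std() is the sample std (ddof=1)."""
    ref = df if reference is None else reference
    return (df - ref.mean()) / ref.std()
```

For example, `zscore(X_test, reference=X_train)` (with hypothetical split names) scales the test rows using only the training rows' means and standard deviations.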

### How does the mean of the mean squared errors compare to that from Part A?

It is higher than in Part A.

```python
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()
```

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |


```python
n_cols = predictors_norm.shape[1] # number of predictors
```

```python
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
```

```python
# build the model
model = regression_model()
```

```python
# fit the model
model.fit(predictors_norm, target, validation_split=0.3, epochs=50, verbose=2)
```

```
Train on 721 samples, validate on 309 samples
Epoch 1/50
 - 1s - loss: 1670.3226 - val_loss: 1192.0467
Epoch 2/50
 - 0s - loss: 1651.8594 - val_loss: 1180.2937
Epoch 3/50
 - 0s - loss: 1632.7083 - val_loss: 1167.8401
Epoch 4/50
 - 0s - loss: 1612.9688 - val_loss: 1154.9617
Epoch 5/50
 - 0s - loss: 1591.9446 - val_loss: 1141.4478
Epoch 6/50
 - 0s - loss: 1569.6820 - val_loss: 1127.1006
Epoch 7/50
 - 0s - loss: 1546.0242 - val_loss: 1111.8828
Epoch 8/50
 - 0s - loss: 1520.2511 - val_loss: 1095.9470
Epoch 9/50
 - 0s - loss: 1493.0426 - val_loss: 1079.0188
Epoch 10/50
 - 0s - loss: 1463.9715 - val_loss: 1061.2720
Epoch 11/50
 - 0s - loss: 1432.9560 - val_loss: 1042.7496
Epoch 12/50
 - 0s - loss: 1400.0928 - val_loss: 1022.8635
Epoch 13/50
 - 0s - loss: 1365.4331 - val_loss: 1002.8420
Epoch 14/50
 - 0s - loss: 1329.7402 - val_loss: 982.2253
Epoch 15/50
 - 0s - loss: 1291.8561 - val_loss: 960.3786
Epoch 16/50
 - 0s - loss: 1253.3371 - val_loss: 938.2041
Epoch 17/50
 - 0s - loss: 1213.1642 - 
```


### D. Increase the number of hidden layers (5 marks)

Repeat part B but use a neural network with the following instead:

- Three hidden layers, each of 10 nodes and ReLU activation function.
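One concrete way to see what the two extra hidden layers add is to count trainable parameters: each `Dense` layer has one weight per input-unit pair plus one bias per unit. The plain-Python sketch below (the helper name is hypothetical) assumes 8 predictors, matching the normalized table above; Keras' `model.summary()` should report the same totals for the notebook's models.

```python
def dense_param_count(n_inputs, hidden_units, n_outputs=1):
    """Trainable parameters of a stack of Dense layers:
    (fan_in * units) weights plus `units` biases per layer."""
    total, fan_in = 0, n_inputs
    for units in list(hidden_units) + [n_outputs]:
        total += fan_in * units + units
        fan_in = units
    return total


print(dense_param_count(8, [10]))          # Parts A-C: 90 + 11 = 101
print(dense_param_count(8, [10, 10, 10]))  # Part D: 90 + 110 + 110 + 11 = 321
```

The deeper network has roughly three times as many parameters, all of them trained in the same 50 epochs.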

### How does the mean of the mean squared errors compare to that from Part B?

It is smaller than in Part B.

```python
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
```

```python
# build the model
model = regression_model()
```

```python
# fit the model
model.fit(predictors_norm, target, validation_split=0.3, epochs=50, verbose=2)
```

```
Train on 721 samples, validate on 309 samples
Epoch 1/50
 - 2s - loss: 1719.7826 - val_loss: 1243.7953
Epoch 2/50
 - 1s - loss: 1707.0568 - val_loss: 1234.8653
Epoch 3/50
 - 1s - loss: 1700.0734 - val_loss: 1229.3391
Epoch 4/50
 - 1s - loss: 1694.8520 - val_loss: 1222.8525
Epoch 5/50
 - 1s - loss: 1687.7476 - val_loss: 1211.8628
Epoch 6/50
 - 1s - loss: 1672.7782 - val_loss: 1188.6516
Epoch 7/50
 - 1s - loss: 1637.9293 - val_loss: 1147.0392
Epoch 8/50
 - 1s - loss: 1571.7620 - val_loss: 1088.5636
Epoch 9/50
 - 1s - loss: 1479.7020 - val_loss: 1015.5367
Epoch 10/50
 - 1s - loss: 1352.4552 - val_loss: 921.4113
Epoch 11/50
 - 1s - loss: 1182.8887 - val_loss: 802.5283
Epoch 12/50
 - 1s - loss: 960.9918 - val_loss: 664.4310
Epoch 13/50
 - 1s - loss: 716.0375 - val_loss: 524.3876
Epoch 14/50
 - 1s - loss: 492.4057 - val_loss: 409.1550
Epoch 15/50
 - 1s - loss: 343.6518 - val_loss: 327.7438
Epoch 16/50
 - 1s - loss: 266.7434 - val_loss: 278.5412
Epoch 17/50
 - 1s - loss: 235.3004 - val_loss: 
```


### C. Increase the number of epochs (5 marks)

Repeat Part B but use 100 epochs this time for training.

### How does the mean of the mean squared errors compare to that from Part B?

It is smaller than in Part B.
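Each of these comparison questions comes down to the means of two lists of 50 test MSEs, so it can help to print every part side by side. A small sketch follows; `compare_parts` is a hypothetical helper, and it assumes each list was produced by the 50-repetition procedure of steps 1-5 with a freshly built model per repetition.

```python
import numpy as np


def compare_parts(results):
    """results maps a part label (e.g. "A") to its list of test MSEs.
    Returns (label, mean, population std) rows, lowest mean MSE first."""
    rows = [(label, float(np.mean(m)), float(np.std(m))) for label, m in results.items()]
    return sorted(rows, key=lambda row: row[1])
```

Reporting the standard deviation next to each mean also shows whether a difference between two parts is large relative to the run-to-run spread.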

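```python
# caution: `model` is still Part D's three-hidden-layer network, already trained for 50 epochs,
# so this call continues that training instead of fitting a fresh one-hidden-layer Part B model
model.fit(predictors_norm, target, validation_split=0.3, epochs=100, verbose=2)
```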

```
Train on 721 samples, validate on 309 samples
Epoch 1/100
 - 1s - loss: 135.4927 - val_loss: 151.6882
Epoch 2/100
 - 1s - loss: 134.8238 - val_loss: 150.2588
Epoch 3/100
 - 1s - loss: 133.8184 - val_loss: 151.1882
Epoch 4/100
 - 1s - loss: 133.0631 - val_loss: 150.1581
Epoch 5/100
 - 1s - loss: 132.1522 - val_loss: 150.2797
Epoch 6/100
 - 1s - loss: 131.2598 - val_loss: 151.1688
Epoch 7/100
 - 1s - loss: 130.5217 - val_loss: 150.8563
Epoch 8/100
 - 1s - loss: 129.9703 - val_loss: 149.5671
Epoch 9/100
 - 1s - loss: 128.5875 - val_loss: 150.2714
Epoch 10/100
 - 1s - loss: 128.3450 - val_loss: 149.4545
Epoch 11/100
 - 1s - loss: 127.2865 - val_loss: 148.8934
Epoch 12/100
 - 1s - loss: 125.9867 - val_loss: 148.0806
Epoch 13/100
 - 1s - loss: 124.9951 - val_loss: 148.9770
Epoch 14/100
 - 1s - loss: 124.1583 - val_loss: 148.0600
Epoch 15/100
 - 1s - loss: 123.0922 - val_loss: 148.7309
Epoch 16/100
 - 1s - loss: 122.2805 - val_loss: 148.1935
Epoch 17/100
 - 1s - loss: 121.4099 - val_loss: 147
```
