# Prepare the environment

First, let's do some imports and download the data set

In [1]:
import pandas as pd
import numpy as np
import keras

from keras.models import Sequential
from keras.layers import Dense

# load the concrete_data dataset
concrete_data = pd.read_csv('https://cocl.us/concrete_data')

# verify that the data was loaded correctly
concrete_data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


In [2]:
# split the concrete_data set into predictors (inputs) and target (output)
predictors = concrete_data.drop(columns=['Strength'])
target = concrete_data['Strength']

n_cols = predictors.shape[1]

# Part A - Build a baseline model

Use the Keras library to build a neural network with the following:

- One hidden layer of 10 nodes, and a ReLU activation function
- Use the adam optimizer and the mean squared error as the loss function.


In [5]:
# Create the neural network in a function so we can use it multiple times in the
# subsequent sections
def regression_model():
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

model = regression_model()

1. Randomly split the data into a training and test sets by holding 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

In [6]:
from sklearn.model_selection import train_test_split

predictors_train, predictors_test, target_train, target_test = train_test_split(predictors, target, 
                                                                                test_size=0.3, random_state=71)
# let's have a look at the shape of the predictors set
predictors_train.shape

(721, 8)

2. Train the model on the training data using 50 epochs.

In [7]:
model.fit(predictors_train, target_train, epochs=50)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<keras.callbacks.History at 0x7fb8ac0dfdc0>

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

In [8]:
predictions_test = model.predict(predictors_test)
from sklearn.metrics import mean_squared_error
print('Mean squared error on test data is %.3f' % (mean_squared_error(target_test, predictions_test)))

Mean squared error on test data is 149.975


4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.
5. Report the mean and the standard deviation of the mean squared errors.

In [9]:
# Create a function that evaluates the model so we can use it to evaluate the models created in part A, B, C and D
# the 'create_model_func' parameter is the function that is used to build the model. For part A, this is the
# regression_model function defined above
def evaluate_model(create_model_func, predictors, targets, epochs=50):
    mean_squared_errors = []
    for i in range(50):
        # create the model. I wasn't 100% clear whether this should be inside the loop, but I _think_ that was the
        # intent of the question. Otherwise, the average and stddev of the mean squared error is not that meaningful
        model = create_model_func()
        # 1. split the data in a train and test set
        predictors_train, predictors_test, target_train, target_test = train_test_split(predictors, target, 
                                                                                    test_size=0.3, random_state=71)
        # 2. train 50 epochs (suppress logging this time)
        model.fit(predictors_train, target_train, epochs=epochs, verbose=0)
        # 3. measure the mse and add this to the list
        predictions_test = model.predict(predictors_test)
        mse = mean_squared_error(target_test, predictions_test)
        mean_squared_errors.append(mse)
        print('.', end='') # output a dot so we can see that the function is still running
    print(' Done!')
    # return the mean and stddev of the mse list
    return np.mean(mean_squared_errors), np.std(mean_squared_errors)

In [10]:
# Evaluate the model and print the mean and std dev of the mean squared errors. Note that we pass in 
# the regression_model _function_ here. This is used in the evaluate_model function to create a fresh
# neural network in each loop
mean_mse, std_mse = evaluate_model(regression_model, predictors, target)
# Report the mean and stddev of the mean squared errors
print("Mean squared errors for 50 regression models: mean = %.3f, std dev = %.3f" %(mean_mse, std_mse))

.................................................. Done!
Mean squared errors for 50 regression models: mean = 312.522, std dev = 347.886


# Part B - Normalize the data

Repeat Part A but use a normalized version of the data. Recall that one way to normalize the data is by subtracting the mean from the individual predictors and dividing by the standard deviation.

In [11]:
# Normalize the data
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,0.862735,-1.217079,-0.279597
1,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,1.055651,-1.217079,-0.279597
2,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,3.55134
3,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,5.055221
4,-0.790075,0.678079,-0.846733,0.488555,-1.038638,0.070492,0.647569,4.976069


In [12]:
mean_mse, std_mse = evaluate_model(regression_model, predictors_norm, target)
# Report the mean and stddev of the mean squared errors
print("Mean squared errors for 50 regression models on normalized data: mean = %.3f, std dev = %.3f" % 
      (mean_mse, std_mse))

.................................................. Done!
Mean squared errors for 50 regression models on normalized data: mean = 334.617, std dev = 97.424


**How does the mean of the mean squared errors compare to that from Step A?**

The mean of the mean squared error has gone down a bit (~10%), but the std deviation has gone down significantly. This implies that when using normalized data, the performance on the test set is much less dependent on how the data happened to be (randomly) split into a training and a test data set.

# Part C - Increate the number of epochs

Repeat Part B **but use 100 epochs this time for training.**

In [13]:
mean_mse, std_mse = evaluate_model(regression_model, predictors_norm, target, epochs=100)
# Report the mean and stddev of the mean squared errors
print("Mean squared errors for 50 regression models on normalized data, trained 100 epochs: mean = %.3f, std dev = %.3f" % 
      (mean_mse, std_mse))

.................................................. Done!
Mean squared errors for 50 regression models on normalized data, trained 100 epochs: mean = 151.157, std dev = 11.388


**How does the mean of the mean squared errors compare to that from Step B?**

The mean squared error on the test set more than halved. Clearly, there was plenty of improvement to be gained with additional training. The standard deviation has gone down a lot as well.

# Part D - Increate the number of hidden layers

Repeat part B but use a neural network with the following instead:

- Three hidden layers, each of 10 nodes and ReLU activation function.

In [14]:
def regression_model_D():
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

# Evaluate model D
mean_mse, std_mse = evaluate_model(regression_model_D, predictors_norm, target)
# Report the mean and stddev of the mean squared errors
print("Mean squared errors for 50 regression models with 3 hidden layers: mean = %.3f, std dev = %.3f" % 
      (mean_mse, std_mse))

.................................................. Done!
Mean squared errors for 50 regression models with 3 hidden layers: mean = 123.939, std dev = 13.328


**How does the mean of the mean squared errors compare to that from Step B?**

The mean of the mean squared errors is a lot better than in part B (123 vs 321). This shows that the network with multiple hidden layers was significantly better at learning to predict the 'Strength' feature than the single layer network.