## Part A - Baseline Model

This part consists of the following tasks.

Use the Keras library to build a neural network with the following:

- One hidden layer of 10 nodes with a ReLU activation function.

- The Adam optimizer and the mean squared error as the loss function.

1. Randomly split the data into training and test sets by holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.
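
The five steps above can be sketched as a small reusable routine. This is a minimal sketch under stated assumptions, not the notebook's own code: `repeated_mse` and `MeanRegressor` are hypothetical names, and `MeanRegressor` is only a stand-in exposing `fit` and `predict` so that the routine runs without Keras. With Keras, `build_model` would instead be a function returning a freshly compiled `Sequential` model. The sketch draws a new random split and builds a new model on every repetition.

```python
import numpy as np


class MeanRegressor:
    """Hypothetical stand-in for a Keras model: predicts the training mean."""

    def fit(self, X, y, epochs=1, verbose=0):
        # `epochs` and `verbose` are accepted only to mirror the Keras call.
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


def repeated_mse(build_model, X, y, n_repeats=50, epochs=50,
                 test_size=0.30, seed=0):
    """Return one test MSE per repetition (steps 1-4 of the task)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n_test = int(round(test_size * len(X)))
    errors = []
    for _ in range(n_repeats):
        # Step 1: a new random split on every repetition.
        idx = rng.permutation(len(X))
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        # Step 2: a freshly built model, so no weights carry over.
        model = build_model()
        model.fit(X[train_idx], y[train_idx], epochs=epochs, verbose=0)
        # Step 3: mean squared error on the held-out rows.
        y_hat = np.asarray(model.predict(X[test_idx])).reshape(-1)
        errors.append(float(np.mean((y[test_idx] - y_hat) ** 2)))
    return errors


# Synthetic data, used only to exercise the routine.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(200, 7))
y_demo = X_demo.sum(axis=1) + demo_rng.normal(scale=0.1, size=200)

mse_list = repeated_mse(MeanRegressor, X_demo, y_demo, n_repeats=50)
# Step 5: summarize the 50 errors.
print("mean: %.3f, std: %.3f" % (np.mean(mse_list), np.std(mse_list)))
```

Passing the model as a factory (`build_model`) rather than as an instance is a deliberate choice in this sketch: it makes it hard to accidentally keep training the same network across repetitions.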



In [2]:
#!wget https://cocl.us/concrete_data

In [3]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import keras
from keras.models import Sequential
from keras.layers import Dense
import numpy as np

In [4]:
concrete_data = pd.read_csv('concrete_data')
# Check data consistency
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

As can be seen, none of the columns contains missing values.

From the practice exercises, we can take the predictors, which are:
1. Cement
2. Blast Furnace Slag
3. Fly Ash
4. Water
5. Superplasticizer
6. Coarse Aggregate
7. Fine Aggregate

The variable we would like to predict is **Concrete strength**, given the variables listed above.

In [9]:
# Note: the 'Age' column is present in the data but is not used as a predictor here
predictors = ['Cement', 'Blast Furnace Slag', 'Fly Ash', 'Water', 'Superplasticizer',
              'Coarse Aggregate', 'Fine Aggregate']
# The number of predictors determines the input size of the neural network
n_cols = len(predictors)
X = concrete_data[predictors]
y = concrete_data['Strength']

In [10]:
# define regression model
def baseline_model():
    # create model with only one hidden layer with 10 nodes
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

In [11]:
model = baseline_model()

In [12]:
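mean_sqr_err = list()
for i in range(50):
    # NOTE: random_state=42 fixes the seed, so every iteration uses the same
    # train/test partition rather than a new random split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=42)
    # NOTE: `model` is not re-created here, so each call to fit() continues
    # training from the weights left by the previous iteration.
    model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50, verbose=0)
    # Evaluate on the held-out 30% of the data
    y_hat = model.predict(X_test)
    msqr_err = mean_squared_error(y_test, y_hat)
    mean_sqr_err.append(msqr_err)

print("The mean of the 50 mean square errors is: %.3f" % np.mean(mean_sqr_err))
print("The standard deviation of the 50 mean square errors is: %.3f" % np.std(mean_sqr_err))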

The mean of the 50 mean square errors is: 163.737
The standard deviation of the 50 mean square errors is: 5.950
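
When reporting the spread of the 50 errors, it may help to know that `np.std` computes the population standard deviation (`ddof=0`) by default, while the sample standard deviation requires `ddof=1`. The sketch below uses a short made-up list, `errors_demo`, purely for illustration.

```python
import numpy as np

# Made-up values for illustration only; not results from the notebook.
errors_demo = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean_err = np.mean(errors_demo)
std_population = np.std(errors_demo)       # ddof=0, the NumPy default
std_sample = np.std(errors_demo, ddof=1)   # divides by n - 1 instead of n

print("mean: %.3f" % mean_err)
print("population std: %.3f" % std_population)
print("sample std: %.3f" % std_sample)
```

With 50 values the two estimates differ by only about 1%, so the choice matters little here, but it is worth stating which one is reported.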


## Part B - Normalize the data
Repeat Part A but use a normalized version of the data. Recall that one way to normalize the data is by subtracting the mean from the individual predictors and dividing by the standard deviation.
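
The normalization described above can be sketched with NumPy as follows. This is an illustrative sketch: `zscore` is a hypothetical helper and the array is made-up data. It passes `ddof=1` so that the result matches the pandas default for `DataFrame.std()`, whereas NumPy's own default is `ddof=0`.

```python
import numpy as np


def zscore(X, ddof=1):
    """Standardize each column: subtract its mean, divide by its std."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=ddof)


# Made-up predictor matrix: 4 samples, 2 columns on very different scales.
X_demo = np.array([[100.0, 0.1],
                   [200.0, 0.2],
                   [300.0, 0.3],
                   [400.0, 0.4]])

X_demo_norm = zscore(X_demo)
print(X_demo_norm.mean(axis=0))         # close to 0 for every column
print(X_demo_norm.std(axis=0, ddof=1))  # close to 1 for every column
```

After standardization both columns are on the same scale, even though the raw values differ by three orders of magnitude.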

In [23]:
X_norm = (X - X.mean()) / X.std()

In [24]:
mean_sqr_err_norm = list()
# Same loop as in Part A (fixed split via random_state=42, existing `model`
# reused without re-creation), now run on the normalized predictors X_norm.
for i in range(50):
    X_train, X_test, y_train, y_test = train_test_split(
        X_norm, y, test_size=0.30, random_state=42)
    model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50, verbose=0)
    y_hat = model.predict(X_test)
    msqr_err = mean_squared_error(y_test, y_hat)
    mean_sqr_err_norm.append(msqr_err)

print("The mean of the 50 mean square errors is: %.3f" % np.mean(mean_sqr_err_norm))
print("The standard deviation of the 50 mean square errors is: %.3f" % np.std(mean_sqr_err_norm))

The mean of the 50 mean square errors is: 167.629
The standard deviation of the 50 mean square errors is: 1.250


How does the mean of the mean squared errors compare to that from Part A?

The mean of the mean squared errors is slightly higher (167.629 versus 163.737). However, the normalized data shows a considerably lower standard deviation (1.250 versus 5.950).

## Part C - More Epochs
Repeat Part B but use 100 epochs this time for training.

In [25]:
mean_sqr_err_norm = list()
# Same loop as in Part B, but each fit() call now runs for 100 epochs.
# There are still 50 repetitions, hence 50 mean squared errors.
for i in range(50):
    X_train, X_test, y_train, y_test = train_test_split(
        X_norm, y, test_size=0.30, random_state=42)
    model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=100, verbose=0)
    y_hat = model.predict(X_test)
    msqr_err = mean_squared_error(y_test, y_hat)
    mean_sqr_err_norm.append(msqr_err)

print("The mean of the 50 mean square errors is: %.3f" % np.mean(mean_sqr_err_norm))
print("The standard deviation of the 50 mean square errors is: %.3f" % np.std(mean_sqr_err_norm))

The mean of the 50 mean square errors is: 171.616
The standard deviation of the 50 mean square errors is: 1.622


How does the mean of the mean squared errors compare to that from Part B?

In this case the mean of the mean squared errors is slightly higher (171.616 versus 167.629), so increasing the number of epochs to 100 did not improve the model in this experiment.

## Part D - More hidden layers
Repeat Part B but use a neural network with the following instead:

- Three hidden layers, each with 10 nodes and a ReLU activation function.
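
To get a sense of how much larger this network is, the trainable parameters can be counted by hand: a fully connected layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The helper below, `dense_param_count`, is a hypothetical name used for illustration, and it assumes 7 input columns, as in the `predictors` list.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a stack of fully connected layers.

    layer_sizes lists the input width followed by the units of each layer.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


n_inputs = 7  # number of predictor columns assumed here

baseline_params = dense_param_count([n_inputs, 10, 1])
deeper_params = dense_param_count([n_inputs, 10, 10, 10, 1])

print(baseline_params)  # 91
print(deeper_params)    # 311
```

The two extra hidden layers each add 110 parameters, so the deeper network has roughly three and a half times as many parameters as the baseline.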

In [21]:
def deeper_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

model = deeper_model()

In [22]:
mean_sqr_err_norm = list()
# Same loop as in Part B (fixed split via random_state=42, 50 epochs per
# fit() call); `model` now refers to the network built by deeper_model().
for i in range(50):
    X_train, X_test, y_train, y_test = train_test_split(
        X_norm, y, test_size=0.30, random_state=42)
    model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50, verbose=0)
    y_hat = model.predict(X_test)
    msqr_err = mean_squared_error(y_test, y_hat)
    mean_sqr_err_norm.append(msqr_err)

print("The mean of the 50 mean square errors is: %.3f" % np.mean(mean_sqr_err_norm))
print("The standard deviation of the 50 mean square errors is: %.3f" % np.std(mean_sqr_err_norm))

The mean of the 50 mean square errors is: 159.690
The standard deviation of the 50 mean square errors is: 5.265


How does the mean of the mean squared errors compare to that from Part B?

In this case the mean of the mean squared errors is lower (159.690 versus 167.629), which suggests that the deeper neural network fits the data somewhat better.
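
The reported means can be put side by side to compare each part against Part B. The snippet below is a small sketch that only reuses the figures printed in this notebook (the Part A mean rounded to three decimals); the names `reported_mean_mse` and `relative_change` are introduced here for illustration.

```python
# Mean of the mean squared errors as printed in Parts A-D of this notebook.
reported_mean_mse = {
    "A (baseline)": 163.737,
    "B (normalized)": 167.629,
    "C (100 epochs)": 171.616,
    "D (3 hidden layers)": 159.690,
}

reference = reported_mean_mse["B (normalized)"]
relative_change = {}
for part, value in reported_mean_mse.items():
    # Relative change with respect to Part B, in percent.
    relative_change[part] = 100.0 * (value - reference) / reference
    print("%-20s %8.3f  %+6.2f%%" % (part, value, relative_change[part]))
```

All four means lie within a few percent of each other, so the differences between the parts are modest.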