# Peer-graded Assignment: Build a Regression Model in Keras

# Instructions

Use the Keras library to build a neural network with the following:

- One hidden layer of 10 nodes with a ReLU activation function
- The adam optimizer and mean squared error as the loss function

1. Randomly split the data into training and test sets by holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

4. Repeat steps 1-3 50 times, i.e., create a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.

# Answer

We will start by importing the necessary libraries.

In [14]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense

Next, we define a model with one hidden layer of 10 nodes and a ReLU activation function, built from Keras's Sequential model and Dense layers.
It is compiled with the adam optimizer and mean squared error as the loss function.

In [15]:
def create_one_layer_model(n_cols):
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    model.compile(optimizer='adam', loss='mean_squared_error')
    
    return model

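Next, we define a model with three hidden layers, each of 10 nodes with a ReLU activation function, again using Keras's Sequential model and Dense layers. This model is used in Part D.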

In [16]:
def create_three_layer_model(n_cols):
    
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    model.compile(optimizer='adam', loss='mean_squared_error')
    
    return model
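
As a quick sanity check on the two architectures above, the trainable parameters in a stack of Dense layers can be counted by hand: each layer contributes (inputs x units) weights plus one bias per unit. The sketch below uses a hypothetical helper, `dense_param_count`, and assumes 8 predictor columns (the concrete dataset's columns minus `Strength`). Neither is part of the original notebook.

```python
def dense_param_count(n_inputs, layer_units):
    """Count weights and biases in a stack of fully connected layers.

    Hypothetical helper for illustration; not part of Keras.
    """
    total = 0
    fan_in = n_inputs
    for units in layer_units:
        total += fan_in * units + units  # weight matrix plus one bias per unit
        fan_in = units
    return total


n_cols = 8  # assumption: number of predictor columns in the concrete dataset

one_layer_params = dense_param_count(n_cols, [10, 1])
three_layer_params = dense_param_count(n_cols, [10, 10, 10, 1])
print(one_layer_params, three_layer_params)  # 101 321
```

Under that assumption, these totals should agree with what `model.summary()` reports for the corresponding Keras models.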

The next function randomly splits the data into training and test sets, holding out 30% of the data for testing with the train_test_split helper function from Scikit-learn.
It then trains the model on the training data for 50 epochs by default.

In [17]:
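def split_train_evaluate(data, model, epochs=50, normalize=False):
    # Work on a copy so the caller's DataFrame is left untouched
    df = data.copy()

    # Separate the target column from the predictors
    y = df.pop('Strength')
    X = df.copy()

    if normalize:
        # Standardize each predictor using statistics from the full dataset
        X = (X - X.mean()) / X.std()

    # Hold out 30% of the rows for testing; each call draws a new random split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True)

    model.fit(X_train, y_train, epochs=epochs, verbose=0)

    # Mean squared error between predicted and actual strength on the test set
    y_pred = model.predict(X_test)
    score = mean_squared_error(y_test, y_pred)

    return score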
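
To make the evaluation step concrete: the mean squared error is the average of the squared differences between actual and predicted values. The sketch below recomputes it in plain Python. `mse` is a hypothetical helper, and the numbers are made up for illustration, not taken from the concrete dataset.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up strengths for illustration only
actual = [30.0, 45.0, 52.0, 20.0]
predicted = [28.0, 47.0, 50.0, 25.0]

score = mse(actual, predicted)
print(score)  # 9.25
```

With its default arguments, Scikit-learn's `mean_squared_error` computes the same quantity.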

The helper above finishes by evaluating the model on the test data: it computes the mean squared error between the predicted and the actual concrete strength with the mean_squared_error function from Scikit-learn.

The function below repeats these steps 50 times by default, building a list of 50 mean squared errors, and reports their mean and standard deviation.

In [18]:
def run_model(times=50, epochs=50, normalize=False, three_layer_model=False):
    concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')

    # One mean squared error per repetition
    mean_squared_errors = np.zeros(times)

    # Every column except the 'Strength' target is a predictor
    n_cols = concrete_data.shape[1] - 1
    # Note: the model is built once, so its weights carry over between repetitions
    model = create_three_layer_model(n_cols) if three_layer_model else create_one_layer_model(n_cols)

    for i in range(times):
        mean_squared_errors[i] = split_train_evaluate(concrete_data, model, epochs, normalize)

    mse_mean = mean_squared_errors.mean()
    mse_std = mean_squared_errors.std()

    return f"After {times} runs the average Mean Squared Error was {mse_mean} and the \
    Standard Deviation was {mse_std}"
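
In `run_model` above, the model is created once before the loop, so each repetition continues training from the weights left by the previous one. If the repetitions are meant to be independent, one option is to pass in a factory that builds a fresh model each time. The sketch below shows the pattern in plain Python. `repeat_experiment`, `build_model`, and `evaluate` are hypothetical names, and the dummy stand-ins only track how many models get built.

```python
def repeat_experiment(build_model, evaluate, times=50):
    """Run `evaluate` on a newly built model `times` times and collect the scores."""
    scores = []
    for _ in range(times):
        model = build_model()  # fresh, untrained model for every repetition
        scores.append(evaluate(model))
    return scores


# Dummy stand-ins so the sketch runs without Keras
built = []


def build_dummy_model():
    model = {"id": len(built)}
    built.append(model)
    return model


def evaluate_dummy(model):
    return float(model["id"])  # placeholder score, not a real error


scores = repeat_experiment(build_dummy_model, evaluate_dummy, times=3)
print(scores)  # [0.0, 1.0, 2.0]
```

With the notebook's functions, `build_model` could be `lambda: create_one_layer_model(n_cols)` and `evaluate` could be `lambda m: split_train_evaluate(concrete_data, m, epochs, normalize)`. Scores from such a variant would not be directly comparable with the results reported in this notebook.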

# Results Part - A Build a baseline model

In [19]:
output = run_model()
print(output)

After 50 runs the average Mean Squared Error was 59.48019608595686 and the     Standard Deviation was 17.43089923104828
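
The two summary figures are the arithmetic mean and the standard deviation of the per-run errors. NumPy's `.std()` defaults to the population form (dividing by n), which corresponds to `statistics.pstdev` in the standard library. A minimal sketch on made-up scores, not the real results:

```python
import statistics

# Made-up mean squared errors for illustration only
mse_list = [50.0, 60.0, 70.0, 60.0]

mse_mean = statistics.fmean(mse_list)
mse_std = statistics.pstdev(mse_list)  # population form, like np.std with ddof=0
print(mse_mean, mse_std)
```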


# Results Part - B Normalize the data

We repeat Part A, but this time with a normalized version of the data. One way to normalize the data is to subtract the mean from each predictor and divide by its standard deviation.

In [20]:
output = run_model(normalize=True)
print(output)

After 50 runs the average Mean Squared Error was 108.64486829341254 and the     Standard Deviation was 20.11558056598766
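
The normalization used here is the z-score: each predictor column has its mean subtracted and is then divided by its standard deviation. The pandas `.std()` method defaults to the sample form (dividing by n - 1), which corresponds to `statistics.stdev`. A minimal sketch on a made-up column, where `zscore` is a hypothetical helper:

```python
import statistics


def zscore(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample form (n - 1), like pandas' default
    return [(v - mean) / std for v in values]


# Made-up predictor values for illustration only
column = [2.0, 4.0, 6.0]
normalized = zscore(column)
print(normalized)  # [-1.0, 0.0, 1.0]
```

In this notebook the mean and standard deviation come from the full dataset before splitting. Computing them on the training split alone is a common alternative that keeps test data out of the preprocessing.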


# Results Part - C Increase the number of epochs

We repeat Part B, but this time train for 100 epochs.

In [21]:
output = run_model(normalize=True, epochs=100)
print(output)

After 50 runs the average Mean Squared Error was 33.273851116527005 and the     Standard Deviation was 26.132623356112273


# Results Part - D Increase the number of hidden layers

We repeat Part B, but with a neural network that has the following instead:
- Three hidden layers, each of 10 nodes with a ReLU activation function.

In [23]:
output = run_model(normalize=True, epochs=50, three_layer_model=True)
print(output)

After 50 runs the average Mean Squared Error was 35.41509640540951 and the     Standard Deviation was 24.93765719187196
