# Build a Regression Model in Keras

  - In this course project, you will build a regression model with the Keras deep learning library. You will then experiment with the number of training epochs and the number of hidden layers to see how these settings affect the model's performance.

# A. Build a baseline model (5 marks)

In [33]:
import pandas as pd
import numpy as np

# read the csv file by using pandas
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


In [34]:
# import keras library
import keras
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from termcolor import colored  # just for color print

**- One hidden layer of 10 nodes, and a ReLU activation function**

**- Use the adam optimizer and the mean squared error as the loss function**

In [35]:
# build up the neural network with keras by defining it in a function
def Regression_model(inputs_cols):
    # create the model
    model = Sequential()
    # One hidden layer of 10 nodes, and a ReLU activation function
    model.add(Dense(10, activation='relu', input_shape=(inputs_cols,)))
    model.add(Dense(1))
    
    # Use the adam optimizer and the mean squared error as the loss function.
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
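
To make concrete what the two `Dense` layers above compute, here is a minimal NumPy sketch of the forward pass. It is an illustration rather than what Keras does internally: the weights are random instead of trained, the input width of 8 is an assumption taken from the eight predictor columns in the table above, and `forward` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs = 8    # assumption: the eight predictor columns shown in the table above
n_hidden = 10   # one hidden layer of 10 nodes

# Randomly initialised parameters (illustrative values, not trained weights)
W1 = rng.normal(size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(X):
    """Hypothetical helper: hidden ReLU layer followed by a linear output."""
    hidden = np.maximum(0.0, X @ W1 + b1)   # corresponds to Dense(10, activation='relu')
    return hidden @ W2 + b2                 # corresponds to Dense(1), no activation

X_batch = rng.normal(size=(5, n_inputs))    # five made-up samples
y_hat = forward(X_batch)
print(y_hat.shape)  # (5, 1)
```

The output layer has no activation, so the network can produce any real value, which is what a regression target such as concrete strength needs.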

1. Randomly split the data into training and test sets, holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.
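
The five steps above amount to a repeated random hold-out evaluation. The sketch below shows only the structure of that loop in plain NumPy, under stated assumptions: the data is synthetic, `repeated_holdout_mse` is a hypothetical helper, and the "model" is a stand-in that always predicts the training mean, not the Keras network used in this notebook.

```python
import numpy as np

def repeated_holdout_mse(X, y, n_repeats=50, test_fraction=0.3, seed=0):
    """Hypothetical helper: repeat a random 70/30 split and collect test MSEs."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_fraction * len(y)))
    errors = []
    for _ in range(n_repeats):
        order = rng.permutation(len(y))          # a fresh random split each time
        test_idx, train_idx = order[:n_test], order[n_test:]
        # Stand-in "model": always predict the mean of the training targets.
        # A real model would be fitted on X[train_idx], y[train_idx] here.
        prediction = np.full(n_test, y[train_idx].mean())
        errors.append(np.mean((y[test_idx] - prediction) ** 2))
    return np.array(errors)

# Synthetic data, used only to exercise the loop
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 8))
y_demo = X_demo[:, 0] * 2.0 + rng.normal(size=200)

mse_list = repeated_holdout_mse(X_demo, y_demo)
print(len(mse_list), mse_list.mean(), mse_list.std())
```

Because every repetition draws a new split, the spread of `mse_list` reflects how much the error depends on which rows end up in the test set.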

In [36]:
# split concrete data into X(predictors) and y(target)
X_data = concrete_data[concrete_data.columns[concrete_data.columns != 'Strength']]
y_data = concrete_data['Strength']

# initialize a list of 50 mean squared errors. 
mean_squared_error_list = []

for i in range(50):
    
    # split into train and test data, 30% for test data; seeding with i gives a different but reproducible split on every iteration
    X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, test_size=0.3, random_state=i)

    # get number of input
    number_columns = X_train.shape[1]
    
    # build the model
    model = Regression_model(number_columns)

    # fit the model
    model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50, verbose=0)  # verbose = 0 => don't show out any process

    # evaluate the model with mean_squared_error
    prediction = model.predict(X_test)
    error = mean_squared_error(y_test, prediction)
    # add the error to our list
    mean_squared_error_list.append(error)
    
    print(colored("The {} Iteration finished".format(i+1), 'red'))


print("The mean of the mean squared errors of our predictions is {:.4f}".format(np.mean(mean_squared_error_list)))
print("The standard deviation of the mean squared errors of our predictions is {:.4f}".format(np.std(mean_squared_error_list)))


The 1 Iteration finished
The 2 Iteration finished
The 3 Iteration finished
The 4 Iteration finished
The 5 Iteration finished
The 6 Iteration finished
The 7 Iteration finished
The 8 Iteration finished
The 9 Iteration finished
The 10 Iteration finished
The 11 Iteration finished
The 12 Iteration finished
The 13 Iteration finished
The 14 Iteration finished
The 15 Iteration finished
The 16 Iteration finished
The 17 Iteration finished
The 18 Iteration finished
The 19 Iteration finished
The 20 Iteration finished
The 21 Iteration finished
The 22 Iteration finished
The 23 Iteration finished
The 24 Iteration finished
The 25 Iteration finished
The 26 Iteration finished
The 27 Iteration finished
The 28 Iteration finished
The 29 Iteration finishe

# B. Normalize the data (5 marks)

Repeat Part A but use a normalized version of the data. Recall that one way to normalize the data is by subtracting the mean from the individual predictors and dividing by the standard deviation.

How does the mean of the mean squared errors compare to that from Step A?
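  - The two values are not directly comparable. The code below standardizes the target `Strength` as well as the predictors, so the mean squared error is expressed in units of the target's variance rather than in squared strength units. On this scale, a model that always predicts the mean scores about 1; the error is not bounded to the range 0-1 and is not a percentage.
  - The standard deviation of the mean squared errors is on the same standardized scale.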
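
To compare an error computed on a standardized target with one computed in original units, it helps to see how the two are related. The sketch below uses made-up targets and predictions (the values 35.0, 16.0 and 5.0 are arbitrary assumptions, not statistics of the concrete dataset) and shows that the two mean squared errors differ by the factor `sigma**2`.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=35.0, scale=16.0, size=300)    # made-up targets in original units
y_pred = y + rng.normal(scale=5.0, size=300)      # made-up predictions

mu, sigma = y.mean(), y.std(ddof=1)               # pandas .std() also uses ddof=1

# Standardise targets and predictions with the same mean and std
y_norm = (y - mu) / sigma
y_pred_norm = (y_pred - mu) / sigma

mse_raw = np.mean((y - y_pred) ** 2)
mse_norm = np.mean((y_norm - y_pred_norm) ** 2)

# The two errors differ exactly by the factor sigma**2
print(np.isclose(mse_norm * sigma**2, mse_raw))  # True
```

Multiplying a standardized-scale error by the squared standard deviation of the target therefore puts it back on the scale used in Part A.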

In [37]:
# split concrete data into X(predictors) and y(target)
X_data = concrete_data[concrete_data.columns[concrete_data.columns != 'Strength']]
y_data = concrete_data['Strength']

# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
# Normalize the data 
X_data_norm = (X_data - X_data.mean()) / X_data.std()
y_data_norm = (y_data - y_data.mean()) / y_data.std()
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

# initialize a list of 50 mean squared errors.
mean_squared_error_list = []

for i in range(50):
        
    # split into train and test data, 30% for test data; seeding with i gives a different but reproducible split on every iteration
    X_train, X_test, y_train, y_test = train_test_split(X_data_norm, y_data_norm, test_size=0.3, random_state=i)

    # get number of input
    number_columns = X_train.shape[1]
    
    # build the model
    model = Regression_model(number_columns)
    # fit the model
    model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50, verbose=0)

    # evaluate the model
    prediction = model.predict(X_test)
    error = mean_squared_error(y_test, prediction)
    
    # add the error to our list
    mean_squared_error_list.append(error)
    
    print(colored("The {} Iteration finished".format(i+1), 'red'))


print("The mean of the mean squared errors of our predictions is {:.4f}".format(np.mean(mean_squared_error_list)))
print("The standard deviation of the mean squared errors of our predictions is {:.4f}".format(np.std(mean_squared_error_list)))

The 1 Iteration finished
The 2 Iteration finished
The 3 Iteration finished
The 4 Iteration finished
The 5 Iteration finished
The 6 Iteration finished
The 7 Iteration finished
The 8 Iteration finished
The 9 Iteration finished
The 10 Iteration finished
The 11 Iteration finished
The 12 Iteration finished
The 13 Iteration finished
The 14 Iteration finished
The 15 Iteration finished
The 16 Iteration finished
The 17 Iteration finished
The 18 Iteration finished
The 19 Iteration finished
The 20 Iteration finished
The 21 Iteration finished
The 22 Iteration finished
The 23 Iteration finished
The 24 Iteration finished
The 25 Iteration finished
The 26 Iteration finished
The 27 Iteration finished
The 28 Iteration finished
The 29 Iteration finishe

# C. Increase the number of epochs (5 marks)

Repeat Part B but use 100 epochs this time for training.

How does the mean of the mean squared errors compare to that from Step B?
  - The mean of the mean squared errors is clearly lower in comparison with the result from B.
  - In this experiment, training for more epochs gave a more accurate model. This does not hold in general, because training for too long can lead to overfitting.
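
The effect of the epoch count can be illustrated without Keras. The sketch below is a stand-in under stated assumptions: it trains a plain linear model with full-batch gradient descent on synthetic data (not the Adam optimizer or the network from this notebook), and `train` is a hypothetical helper. It compares the training error after 50 and after 100 epochs.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=200)

def train(n_epochs, lr=0.01):
    """Hypothetical helper: full-batch gradient descent on a linear model."""
    w = np.zeros(3)
    for _ in range(n_epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)   # gradient of the MSE
        w -= lr * grad
    return np.mean((X @ w - y) ** 2)              # training MSE after n_epochs

mse_50 = train(50)
mse_100 = train(100)
print(mse_50, mse_100)
```

Note that this measures the training error, which keeps falling with a small enough learning rate. The error on held-out data can stop improving or rise again once the model starts to overfit, which is why the test-set comparison in this part matters.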

In [38]:
# split concrete data into X(predictors) and y(target)
X_data = concrete_data[concrete_data.columns[concrete_data.columns != 'Strength']]
y_data = concrete_data['Strength']

# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
# Normalize the data 
X_data_norm = (X_data - X_data.mean()) / X_data.std()
y_data_norm = (y_data - y_data.mean()) / y_data.std()
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

# initialize a list of 50 mean squared errors.
mean_squared_error_list = []

for i in range(50):
    
    # split into train and test data, 30% for test data; seeding with i gives a different but reproducible split on every iteration
    X_train, X_test, y_train, y_test = train_test_split(X_data_norm, y_data_norm, test_size=0.3, random_state=i)

    # get number of input
    number_columns = X_train.shape[1]
    
    # build the model
    model = Regression_model(number_columns)
    # fit the model
    model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=100, verbose=0)

    # evaluate the model
    prediction = model.predict(X_test)
    error = mean_squared_error(y_test, prediction)
    
    # add the error to our list
    mean_squared_error_list.append(error)
    
    print(colored("The {} Iteration finished".format(i+1), 'red'))


print("The mean of the mean squared errors of our predictions is {:.4f}".format(np.mean(mean_squared_error_list)))
print("The standard deviation of the mean squared errors of our predictions is {:.4f}".format(np.std(mean_squared_error_list)))

The 1 Iteration finished
The 2 Iteration finished
The 3 Iteration finished
The 4 Iteration finished
The 5 Iteration finished
The 6 Iteration finished
The 7 Iteration finished
The 8 Iteration finished
The 9 Iteration finished
The 10 Iteration finished
The 11 Iteration finished
The 12 Iteration finished
The 13 Iteration finished
The 14 Iteration finished
The 15 Iteration finished
The 16 Iteration finished
The 17 Iteration finished
The 18 Iteration finished
The 19 Iteration finished
The 20 Iteration finished
The 21 Iteration finished
The 22 Iteration finished
The 23 Iteration finished
The 24 Iteration finished
The 25 Iteration finished
The 26 Iteration finished
The 27 Iteration finished
The 28 Iteration finished
The 29 Iteration finishe

# D. Increase the number of hidden layers (5 marks)

Repeat part B but use a neural network with the following instead:

- Three hidden layers, each of 10 nodes and ReLU activation function.

How does the mean of the mean squared errors compare to that from Step B?
  - The mean of the mean squared errors is also clearly lower in comparison with the result from B.
  - In this experiment, adding hidden layers gave a more accurate model. Extra layers add capacity, but they do not guarantee better results on every dataset.

In [39]:
# new neural network with 3 hidden layers
def Regression_model(inputs_cols):
    # create the model
    model = Sequential()
    # Three hidden layers of 10 nodes each, with ReLU activation functions
    model.add(Dense(10, activation='relu', input_shape=(inputs_cols,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    # Use the adam optimizer and the mean squared error as the loss function.
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
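
One way to see how much larger the three-layer network is than the baseline is to count trainable parameters. The sketch below does the arithmetic by hand. `dense_param_count` is a hypothetical helper, and the input width of 8 is an assumption based on the eight predictor columns in the table in Part A. In Keras, `model.summary()` should report matching totals for that input width.

```python
def dense_param_count(layer_sizes):
    """Hypothetical helper: weights plus biases for a stack of Dense layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out   # weight matrix + bias vector
    return total

n_inputs = 8  # assumption: eight predictor columns, as in the table in Part A

baseline = dense_param_count([n_inputs, 10, 1])          # one hidden layer
deeper = dense_param_count([n_inputs, 10, 10, 10, 1])    # three hidden layers
print(baseline, deeper)  # 101 321
```

The deeper model has roughly three times as many parameters, which gives it more capacity to fit the training data within the same 50 epochs.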

In [40]:
# split concrete data into X(predictors) and y(target)
X_data = concrete_data[concrete_data.columns[concrete_data.columns != 'Strength']]
y_data = concrete_data['Strength']

# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
# Normalize the data 
X_data_norm = (X_data - X_data.mean()) / X_data.std()
y_data_norm = (y_data - y_data.mean()) / y_data.std()
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

# initialize a list of 50 mean squared errors.
mean_squared_error_list = []

for i in range(50):
    
    # split into train and test data, 30% for test data; seeding with i gives a different but reproducible split on every iteration
    X_train, X_test, y_train, y_test = train_test_split(X_data_norm, y_data_norm, test_size=0.3, random_state=i)

    # get number of input
    number_columns = X_train.shape[1]
    
    # build the model
    model = Regression_model(number_columns)
    # fit the model
    model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50, verbose=0)

    # evaluate the model
    prediction = model.predict(X_test)
    error = mean_squared_error(y_test, prediction)
    
    # add the error to our list
    mean_squared_error_list.append(error)
    
    print(colored("The {} Iteration finished".format(i+1), 'red'))


print("The mean of the mean squared errors of our predictions is {:.4f}".format(np.mean(mean_squared_error_list)))
print("The standard deviation of the mean squared errors of our predictions is {:.4f}".format(np.std(mean_squared_error_list)))

The 1 Iteration finished
The 2 Iteration finished
The 3 Iteration finished
The 4 Iteration finished
The 5 Iteration finished
The 6 Iteration finished
The 7 Iteration finished
The 8 Iteration finished
The 9 Iteration finished
The 10 Iteration finished
The 11 Iteration finished
The 12 Iteration finished
The 13 Iteration finished
The 14 Iteration finished
The 15 Iteration finished
The 16 Iteration finished
The 17 Iteration finished
The 18 Iteration finished
The 19 Iteration finished
The 20 Iteration finished
The 21 Iteration finished
The 22 Iteration finished
The 23 Iteration finished
The 24 Iteration finished
The 25 Iteration finished
The 26 Iteration finished
The 27 Iteration finished
The 28 Iteration finished
The 29 Iteration finishe