<h1>Regression Models with Keras</h1>


## Introduction

In this project, you will build a regression model using the Keras library to model the same concrete compressive strength data that we used in lab 3. For your convenience, the data can be found here again: https://cocl.us/concrete_data. To recap, the predictors in the concrete strength data include:

Cement, Blast Furnace Slag, Fly Ash, Water, Superplasticizer, Coarse Aggregate, Fine Aggregate

The four parts of the capstone:

A. Build a baseline model
B. Normalize the data
C. Increase the number of epochs
D. Increase the number of hidden layers

## Download and Clean the Dataset

Let's start by importing the pandas and NumPy libraries.

In [3]:
import pandas as pd
import numpy as np

Let's download the data and read it into a pandas dataframe.

In [6]:
#!wget -O concrete_data.csv 'https://cocl.us/concrete_data'

concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


Let's check the size of the data.

In [7]:
concrete_data.shape

(1030, 9)

The data has 1,030 rows, so there are approximately 1,000 samples to train our model on.

In [9]:
concrete_data.describe()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


Let's check the dataset for any missing values.

In [11]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

## Separate the Data into Predictors and Target

In [12]:
concrete_data_columns = concrete_data.columns

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column

In [13]:
predictors.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [14]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64
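
An equivalent way to express the same separation is `DataFrame.drop`. The sketch below is only an illustration on a small toy frame (the name `toy` and its two-row contents are assumptions, with a few values copied from the rows shown earlier), not a replacement for the cell above.

```python
import pandas as pd

# Toy frame for illustration; only the column layout mirrors the real data
toy = pd.DataFrame({
    "Cement": [540.0, 332.5],
    "Water": [162.0, 228.0],
    "Strength": [79.99, 40.27],
})

toy_predictors = toy.drop(columns=["Strength"])  # every column except the target
toy_target = toy["Strength"]                     # the target column as a Series
print(list(toy_predictors.columns), toy_target.shape)
```

`drop` returns a new frame, so the original `toy` still contains the `Strength` column afterwards.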

Let's save the number of predictors to `n_cols` since we will need this number when building our network.

In [20]:
n_cols = predictors.shape[1] # number of predictors

# Part A: Build a Baseline Model

Use the Keras library to build a neural network with the following:

One hidden layer of 10 nodes, and a ReLU activation function

 1. Use the adam optimizer and the mean squared error as the loss function.

 2. Randomly split the data into training and test sets by holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

 3. Train the model on the training data using 50 epochs.

 4. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

 5. Repeat steps 2 - 4 (split, train, evaluate) 50 times, i.e., create a list of 50 mean squared errors.

 6. Report the mean and the standard deviation of the mean squared errors
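
The repeat-and-report part of the steps above can be sketched independently of Keras. The example below is a rough illustration under stated assumptions: it uses synthetic data rather than the concrete dataset, and ordinary least squares as a stand-in for the neural network so that it runs with NumPy alone. The helper names `split_train_test`, `fit_and_score`, and `repeated_mse` are hypothetical. The point it shows is that every repetition gets a new random split and a freshly fitted model.

```python
import numpy as np

def split_train_test(X, y, test_fraction, seed):
    """Shuffle row indices with a seeded generator and hold out a test fraction."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

def fit_and_score(X_train, X_test, y_train, y_test):
    """Stand-in for 'build a fresh model, train it, compute the test MSE'.

    Ordinary least squares is used here instead of a Keras network
    (an assumption for illustration); swap in the real model at this point.
    """
    A_train = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([X_test, np.ones(len(X_test))])
    residuals = y_test - A_test @ coef
    return float(np.mean(residuals ** 2))

def repeated_mse(X, y, n_repeats=50, test_fraction=0.3):
    """Collect one test MSE per repetition, then summarize them."""
    scores = []
    for seed in range(n_repeats):
        parts = split_train_test(X, y, test_fraction, seed)
        scores.append(fit_and_score(*parts))  # a new fit in every repetition
    scores = np.array(scores)
    return scores, scores.mean(), scores.std()

# Synthetic data for the demonstration (not the concrete dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

scores, mse_mean, mse_std = repeated_mse(X_demo, y_demo, n_repeats=50)
print(len(scores), mse_mean, mse_std)
```

With a Keras model, the body of `fit_and_score` would build the network, call `fit` on the training split, and compute the error on the test split.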

## Import Keras

In [16]:
import keras
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

## Build a Neural Network

In [22]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
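
As a quick sanity check on this architecture, the number of trainable parameters can be worked out by hand: each Dense layer has one weight per input-output pair plus one bias per unit. The sketch below assumes 8 predictors (the nine columns minus `Strength`); `dense_params` and `n_features` are hypothetical names used only for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

n_features = 8  # assumption: 9 columns in the data minus the Strength target

hidden = dense_params(n_features, 10)  # hidden layer: 8*10 weights + 10 biases
output = dense_params(10, 1)           # output layer: 10*1 weights + 1 bias
total = hidden + output
print(hidden, output, total)
```

If the assumption about the input width holds, the total should agree with the parameter count printed by `model.summary()`.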


In [25]:
# Hold out 30% of the data for testing (no normalization in Part A)
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=42)

## Train and Test the Network

Let's call the function now to create our model.

In [27]:
# build the model
model = regression_model()

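# fit the model
# Note: this call trains on all rows (predictors, target), not only X_train,
# so the test rows are also seen during training
model.fit(predictors, target, epochs=50, verbose=2)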

Epoch 1/50
33/33 - 0s - loss: 2343.1116 - 380ms/epoch - 12ms/step
Epoch 2/50
33/33 - 0s - loss: 729.3951 - 32ms/epoch - 979us/step
Epoch 3/50
33/33 - 0s - loss: 594.5317 - 33ms/epoch - 1ms/step
Epoch 4/50
33/33 - 0s - loss: 497.9270 - 32ms/epoch - 985us/step
Epoch 5/50
33/33 - 0s - loss: 430.3865 - 33ms/epoch - 988us/step
Epoch 6/50
33/33 - 0s - loss: 366.3284 - 32ms/epoch - 968us/step
Epoch 7/50
33/33 - 0s - loss: 321.4952 - 32ms/epoch - 958us/step
Epoch 8/50
33/33 - 0s - loss: 291.9433 - 33ms/epoch - 990us/step
Epoch 9/50
33/33 - 0s - loss: 267.3935 - 31ms/epoch - 944us/step
Epoch 10/50
33/33 - 0s - loss: 247.8407 - 31ms/epoch - 937us/step
Epoch 11/50
33/33 - 0s - loss: 230.5810 - 30ms/epoch - 913us/step
Epoch 12/50
33/33 - 0s - loss: 218.2923 - 31ms/epoch - 941us/step
Epoch 13/50
33/33 - 0s - loss: 205.4050 - 30ms/epoch - 922us/step
Epoch 14/50
33/33 - 0s - loss: 196.1132 - 36ms/epoch - 1ms/step
Epoch 15/50
33/33 - 0s - loss: 187.7839 - 31ms/epoch - 949us/step
Epoch 16/50
33/33 - 0s

<keras.callbacks.History at 0x7f70f0692c40>
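
The `33/33` in the log above is the number of batches processed per epoch. A minimal sketch of that arithmetic follows, assuming Keras's default `batch_size` of 32 (the `fit` call does not set one) and the 1,030 rows reported by `concrete_data.shape`; `steps_per_epoch` is a hypothetical helper name.

```python
import math

def steps_per_epoch(n_samples, batch_size=32):
    """Batches needed to cover n_samples once (the last batch may be partial)."""
    return math.ceil(n_samples / batch_size)

full_steps = steps_per_epoch(1030)   # all rows, as in fit(predictors, target)
train_steps = steps_per_epoch(721)   # assumption: 70% training split of 1030 rows
print(full_steps, train_steps)
```

Under these assumptions, fitting on every row gives 33 steps per epoch, which matches the log, whereas fitting on a 721-row training split would show 23.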

## Evaluate the model on the test data.

In [32]:
model_evaluation = model.evaluate(X_test, y_test)
y_pred = model.predict(X_test)
model_evaluation



113.7392578125

## The mean squared error between the predicted concrete strength and the actual concrete strength

In [33]:

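mean_square_error = mean_squared_error(y_test, y_pred)
# mean_square_error is a single number here, so its mean is the value itself
# and its standard deviation is 0
mean = np.mean(mean_square_error)
standard_deviation = np.std(mean_square_error)
print(mean, standard_deviation)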

113.73925420040668 0.0
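
For reference, the mean squared error is the average of the squared differences between actual and predicted values, and the standard deviation of a single number is always zero, which is why the second value printed above is 0.0. The small sketch below uses made-up toy values (not the concrete data) and only the standard library; `mse` is a hypothetical helper name.

```python
import statistics

def mse(actual, predicted):
    """Average of squared differences between paired values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

toy_actual = [3.0, 5.0, 2.0]      # made-up values for illustration
toy_predicted = [2.0, 5.0, 4.0]

toy_mse = mse(toy_actual, toy_predicted)          # (1 + 0 + 4) / 3
single_value_std = statistics.pstdev([toy_mse])   # population std of one value
print(toy_mse, single_value_std)
```

A meaningful standard deviation needs several mean squared errors, one per repetition of the split, train, and evaluate cycle.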


## List of 50 mean squared errors with report of mean and the standard deviation of the mean squared errors

In [34]:
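total_mean_squared_errors = 50  # number of split/train/evaluate repetitions
epochs = 50
mean_squared_errors = []
# Note: `model` is the network fitted above and is not rebuilt inside the loop,
# so every iteration keeps training the same weights and the 50 errors are not independent.
# Calling regression_model() at the top of the loop would start each repetition from scratch.
for i in range(0, total_mean_squared_errors):
    # new random 70/30 split for each repetition
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=i)
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    MSE = model.evaluate(X_test, y_test, verbose=0)  # test loss (MSE) as reported by Keras
    print("MSE "+str(i+1)+": "+str(MSE))
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)  # same quantity via scikit-learn
    mean_squared_errors.append(mean_square_error)

# summarize the collected errors
mean_squared_errors = np.array(mean_squared_errors)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print('\n')
print("Mean and standard deviation of " +str(total_mean_squared_errors) + " mean squared errors without normalized data. \n Total number of epochs for each training is: " +str(epochs) + "\n")
print("Mean: "+str(mean))
print("Standard Deviation: "+str(standard_deviation))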

MSE 1: 106.98067474365234
MSE 2: 122.17414093017578
MSE 3: 112.68267059326172
MSE 4: 123.04710388183594
MSE 5: 121.07058715820312
MSE 6: 115.3509521484375
MSE 7: 130.46192932128906
MSE 8: 109.5716781616211
MSE 9: 118.83709716796875
MSE 10: 113.289794921875
MSE 11: 108.24131774902344
MSE 12: 108.77088165283203
MSE 13: 114.61260986328125
MSE 14: 119.10987854003906
MSE 15: 126.44869995117188
MSE 16: 91.1926498413086
MSE 17: 68.96589660644531
MSE 18: 60.52132034301758
MSE 19: 50.51111602783203
MSE 20: 56.25905227661133
MSE 21: 48.897491455078125
MSE 22: 50.619171142578125
MSE 23: 47.15250015258789
MSE 24: 49.457923889160156
MSE 25: 50.509552001953125
MSE 26: 51.37287139892578
MSE 27: 52.53009796142578
MSE 28: 45.21180725097656
MSE 29: 52.29566192626953
MSE 30: 53.44602584838867
MSE 31: 52.546932220458984
MSE 32: 43.183807373046875
MSE 33: 50.26768493652344
MSE 34: 50.12297058105469
MSE 35: 48.38895034790039
MSE 36: 53.33306884765625
MSE 37: 52.70392608642578
MSE 38: 52.84963607788086
MSE 3