<h1>Regression Models with Keras: Part B</h1>


## Introduction

In this project, you will build a regression model using the Keras library to model the same concrete compressive strength data that we used in lab 3. For your convenience, the data can be found here again: https://cocl.us/concrete_data. To recap, the predictors in the concrete strength data include:

Cement, Blast Furnace Slag, Fly Ash, Water, Superplasticizer, Coarse Aggregate, Fine Aggregate, Age

The four parts of the capstone:

A. Build a baseline model
B. Normalize the data
C. Increase the number of epochs
D. Increase the number of hidden layers

## Download and Clean The Dataset

Let's start by importing the pandas and NumPy libraries.

In [1]:
import pandas as pd
import numpy as np

Let's download the data and read it into a pandas dataframe.

In [2]:
#!wget -O concrete_data.csv 'https://cocl.us/concrete_data'

concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


Let's check the size of the data.

In [3]:
concrete_data.shape

(1030, 9)

The dataset contains 1,030 samples, each with 9 columns (8 predictors plus the target).

In [4]:
concrete_data.describe()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


Let's check the dataset for any missing values.

In [5]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

## Data separation into predictors and target

In [6]:
concrete_data_columns = concrete_data.columns

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column

In [7]:
predictors.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [8]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64
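The boolean-mask column selection used above can also be written with `DataFrame.drop`. The following is a minimal sketch on a two-row toy frame (the name `toy` and the reduced column set are assumptions for illustration; the values are rows 0 and 2 of the table above) showing that both forms give the same predictors.

```python
import pandas as pd

# Toy frame with a subset of the notebook's columns (illustrative only).
toy = pd.DataFrame({
    "Cement": [540.0, 332.5],
    "Age": [28, 270],
    "Strength": [79.99, 40.27],
})

# Form used in the notebook: mask the column index, then select.
cols = toy.columns
via_mask = toy[cols[cols != "Strength"]]

# Equivalent form: drop the target column by name.
via_drop = toy.drop(columns=["Strength"])

target_toy = toy["Strength"]
same = via_mask.equals(via_drop)
print(same)
```

Either form leaves the original frame untouched; which one to use is mostly a matter of readability.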

## Normalize the data 

In [9]:
# Normalize the data by subtracting the mean and dividing by the standard deviation.
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
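The normalization cell standardizes each predictor column to zero mean and unit standard deviation (a z-score). A minimal sketch on a made-up three-row frame (the values and the name `toy` are hypothetical, not taken from the dataset) shows what the expression does. Note that pandas' `std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, chosen so the arithmetic is easy to follow.
toy = pd.DataFrame({
    "Cement": [100.0, 200.0, 300.0],
    "Water": [150.0, 180.0, 210.0],
})

# Same expression as in the notebook: column-wise z-score.
toy_norm = (toy - toy.mean()) / toy.std()

# After normalization every column has mean ~0 and sample std ~1.
col_means = toy_norm.mean()
col_stds = toy_norm.std()
print(toy_norm)
```

When a held-out test set is involved, one common option is to compute the mean and standard deviation on the training rows only and reuse them for the test rows; the notebook instead uses statistics from the whole dataset.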


Let's save the number of predictors to `n_cols` since we will need this number when building our network.

In [11]:
n_cols = predictors.shape[1] # number of predictors

## Import Keras

In [12]:
import keras
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

## Build a Neural Network

In [13]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
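To get a feel for the size of this network, the number of trainable parameters can be worked out by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes `n_cols` is 8, matching the eight predictor columns shown earlier; the helper `dense_params` is a hypothetical name introduced here.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases for one fully connected layer."""
    return n_inputs * n_units + n_units

n_cols = 8  # assumption: the eight predictor columns in the notebook

hidden = dense_params(n_cols, 10)  # Dense(10, activation='relu')
output = dense_params(10, 1)       # Dense(1)
total = hidden + output

print(hidden, output, total)  # 90 11 101
```

With roughly a hundred parameters, this is a very small model relative to the 1,030 available samples.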


In [14]:
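# 1. Hold out 30% of the data for testing.
# Note: this splits the raw `predictors`, not `predictors_norm`,
# so the model below sees unnormalized inputs.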
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=42)

## Train and Test the Network

Let's call the function now to create our model.

In [15]:
# build the model
model = regression_model()

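# fit the model
# Note: this fits on the full dataset (predictors, target), not on
# X_train/y_train, so the rows in X_test are also seen during training.
model.fit(predictors, target, epochs=50, verbose=2)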

Epoch 1/50
33/33 - 0s - loss: 541553.4375 - 337ms/epoch - 10ms/step
Epoch 2/50
33/33 - 0s - loss: 237412.9844 - 35ms/epoch - 1ms/step
Epoch 3/50
33/33 - 0s - loss: 69646.9922 - 30ms/epoch - 896us/step
Epoch 4/50
33/33 - 0s - loss: 13878.2217 - 34ms/epoch - 1ms/step
Epoch 5/50
33/33 - 0s - loss: 4360.3677 - 29ms/epoch - 868us/step
Epoch 6/50
33/33 - 0s - loss: 3496.1743 - 27ms/epoch - 807us/step
Epoch 7/50
33/33 - 0s - loss: 3261.5271 - 30ms/epoch - 902us/step
Epoch 8/50
33/33 - 0s - loss: 3039.7195 - 31ms/epoch - 944us/step
Epoch 9/50
33/33 - 0s - loss: 2830.2217 - 36ms/epoch - 1ms/step
Epoch 10/50
33/33 - 0s - loss: 2618.1765 - 32ms/epoch - 958us/step
Epoch 11/50
33/33 - 0s - loss: 2420.7290 - 29ms/epoch - 884us/step
Epoch 12/50
33/33 - 0s - loss: 2230.5159 - 31ms/epoch - 951us/step
Epoch 13/50
33/33 - 0s - loss: 2054.4460 - 28ms/epoch - 843us/step
Epoch 14/50
33/33 - 0s - loss: 1888.8442 - 27ms/epoch - 827us/step
Epoch 15/50
33/33 - 0s - loss: 1737.3802 - 28ms/epoch - 850us/step
Epoc

<keras.callbacks.History at 0x7f081d6f1fa0>
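The `33/33` in the log is the number of batches per epoch. The `fit` call does not set `batch_size`, so, assuming the Keras default of 32, the step count can be checked with a little arithmetic. The helper name `steps_per_epoch` is hypothetical, and 721 is an approximate size for a 70% training split.

```python
import math

def steps_per_epoch(n_samples, batch_size=32):
    """Number of batches needed to cover n_samples once."""
    return math.ceil(n_samples / batch_size)

full_dataset = steps_per_epoch(1030)  # all rows
train_only = steps_per_epoch(721)     # roughly 70% of the rows

print(full_dataset, train_only)  # 33 23
```

The logged value of 33 is consistent with training on all 1,030 rows rather than on the 70% training split alone.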

## Evaluate the model on the test data.

In [16]:
model_evaluation = model.evaluate(X_test, y_test)
y_pred = model.predict(X_test)
model_evaluation



256.4826354980469

## The mean squared error between the predicted concrete strength and the actual concrete strength

In [17]:

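mean_square_error = mean_squared_error(y_test, y_pred)
# mean_square_error is a single number, so its mean is the value itself
# and its standard deviation is 0.
mean = np.mean(mean_square_error)
standard_deviation = np.std(mean_square_error)
print(mean, standard_deviation)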

256.48263699712 0.0
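The mean squared error is the average of the squared differences between actual and predicted values. The sketch below computes it by hand on three made-up values (all numbers here are hypothetical) and also shows why the standard deviation printed above is 0: a single number has no spread.

```python
import numpy as np

# Hypothetical targets and predictions.
y_true = np.array([3.0, 5.0, 7.0])
# Predictions shaped (n, 1), as a single-output network would return them.
y_hat_2d = np.array([[2.0], [5.0], [9.0]])

# Flatten first: subtracting (n, 1) from (n,) would broadcast to (n, n).
y_hat = y_hat_2d.ravel()

mse = np.mean((y_true - y_hat) ** 2)  # (1 + 0 + 4) / 3
spread = np.std(mse)                  # std of one number

print(mse, spread)
```

A meaningful standard deviation needs several MSE values, for example from repeated train/test splits.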


## List of 50 mean squared errors, with their mean and standard deviation
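One point worth keeping in mind for the loop in this section: it calls `fit` on the same `model` object in every iteration, and Keras continues training from the current weights. Later iterations therefore start from an already trained model, and their test rows may have been training rows in earlier splits. A minimal sketch of the alternative, building a fresh model per run, follows. To stay self-contained it uses a NumPy least-squares class as a stand-in for the Keras model, and synthetic data; `LeastSquaresStandIn` and `repeated_mse` are hypothetical names.

```python
import numpy as np

class LeastSquaresStandIn:
    """Stand-in for a Keras model: ordinary least squares with an intercept."""
    def fit(self, X, y):
        A = np.column_stack([X, np.ones(len(X))])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        A = np.column_stack([X, np.ones(len(X))])
        return A @ self.coef_

def repeated_mse(build_model, X, y, n_runs=5, test_frac=0.3):
    """Train a freshly built model on a new random split in each run."""
    errors = []
    for seed in range(n_runs):
        idx = np.random.default_rng(seed).permutation(len(X))
        n_test = int(round(test_frac * len(X)))
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        model = build_model()  # new model, so no weights carry over
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    errors = np.array(errors)
    return errors, errors.mean(), errors.std()

# Synthetic data: a linear signal plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=200)

errors, mean_mse, std_mse = repeated_mse(LeastSquaresStandIn, X, y)
print(mean_mse, std_mse)
```

In the notebook, the same pattern would amount to calling `regression_model()` inside the loop before `fit`, so that each of the 50 MSE values comes from an independently trained network.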

In [None]:
total_mean_squared_errors = 50
epochs = 50
mean_squared_errors = []
for i in range(0, total_mean_squared_errors):
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=i)
    # Note: the same `model` is reused, so training continues from the previous iteration's weights.
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    MSE = model.evaluate(X_test, y_test, verbose=0)
    print("MSE "+str(i+1)+": "+str(MSE))
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)
    mean_squared_errors.append(mean_square_error)

mean_squared_errors = np.array(mean_squared_errors)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print('\n')
print("Mean and standard deviation of " +str(total_mean_squared_errors) + " mean squared errors without normalized data. \n Total number of epochs for each training is: " +str(epochs) + "\n")
print("Mean: "+str(mean))
print("Standard Deviation: "+str(standard_deviation))

MSE 1: 150.06663513183594
MSE 2: 138.96234130859375
MSE 3: 113.8921890258789
MSE 4: 128.50930786132812
MSE 5: 143.475341796875
MSE 6: 109.77851867675781
MSE 7: 144.95388793945312
MSE 8: 100.97129821777344
MSE 9: 119.30418395996094
MSE 10: 127.28488159179688
MSE 11: 106.13741302490234
MSE 12: 116.47370910644531
MSE 13: 115.5986328125
MSE 14: 121.37665557861328
MSE 15: 116.42910766601562
MSE 16: 111.08041381835938
MSE 17: 105.651611328125
MSE 18: 99.1829605102539
MSE 19: 97.2935791015625
MSE 20: 128.48020935058594
MSE 21: 96.56719970703125
MSE 22: 102.92137908935547
MSE 23: 121.94683837890625
MSE 24: 103.62215423583984
MSE 25: 110.03074645996094
MSE 26: 100.817138671875
MSE 27: 130.73028564453125
MSE 28: 110.32652282714844
MSE 29: 109.99884796142578
MSE 30: 125.49868774414062
MSE 31: 131.3505401611328
MSE 32: 111.37294006347656
MSE 33: 109.42801666259766
MSE 34: 129.76463317871094
MSE 35: 123.4890365600586
MSE 36: 137.22572326660156
MSE 37: 120.76641845703125
MSE 38: 116.2818832397461
MS