
<h1>Regression Models with Keras: Part C</h1>




## Introduction

In this project, you will build a regression model using the Keras library to model the same concrete compressive strength data that we used in lab 3. For your convenience, the data can be found here again: https://cocl.us/concrete_data. To recap, the predictors in the concrete strength data include:

Cement, Blast Furnace Slag, Fly Ash, Water, Superplasticizer, Coarse Aggregate, Fine Aggregate

The four parts of the capstone:

A. Build a baseline model
B. Normalize the data
C. Increase the number of epochs
D. Increase the number of hidden layers

## Download and Clean The Dataset

Let's start by importing the pandas and NumPy libraries.

In [1]:
import pandas as pd
import numpy as np

Let's download the data and read it into a pandas dataframe.

In [2]:
#!wget -O concrete_data.csv 'https://cocl.us/concrete_data'

concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


Let's check the size of the data.

In [3]:
concrete_data.shape

(1030, 9)

There are 1030 samples to train our model on.

In [4]:
concrete_data.describe()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


Let's check the dataset for any missing values.

In [5]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

## Separate the data into predictors and target

In [6]:
concrete_data_columns = concrete_data.columns

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column

In [7]:
predictors.head()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360


In [8]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64
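
The boolean column mask used above works, but the same separation can also be written with `DataFrame.drop`. The following is a minimal sketch on a toy frame; the name `toy` and its three columns are assumptions for illustration, with values copied from a few of the rows shown earlier.

```python
import pandas as pd

# Toy stand-in for concrete_data: only three columns, three rows.
toy = pd.DataFrame({
    "Cement": [540.0, 332.5, 198.6],
    "Water": [162.0, 228.0, 192.0],
    "Strength": [79.99, 40.27, 44.30],
})

# drop() returns a new DataFrame without the target column.
toy_predictors = toy.drop(columns=["Strength"])
toy_target = toy["Strength"]

print(list(toy_predictors.columns))
print(toy_target.name)
```

Either form leaves the original dataframe unchanged, so the choice is mostly a matter of readability.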

## Normalize the data 

In [9]:
# Normalize the data by subtracting the mean and dividing by the standard deviation.
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,0.862735,-1.217079,-0.279597
1,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,1.055651,-1.217079,-0.279597
2,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,3.55134
3,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,5.055221
4,-0.790075,0.678079,-0.846733,0.488555,-1.038638,0.070492,0.647569,4.976069
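
The cell above computes the mean and standard deviation over the whole dataset. A common variant, sketched below under the assumption that the data is split first, computes those statistics on the training rows only and reuses them for the test rows, so the test set does not influence preprocessing. The toy data, the column choice, and the 7/3 split point are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values are random, not taken from the dataset.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Cement": rng.uniform(100, 540, size=10),
    "Age": rng.integers(1, 365, size=10).astype(float),
})
train, test = toy.iloc[:7], toy.iloc[7:]

# Compute the statistics on the training rows only.
train_mean = train.mean()
train_std = train.std()  # pandas uses the sample standard deviation (ddof=1)

train_norm = (train - train_mean) / train_std
test_norm = (test - train_mean) / train_std  # reuse the training statistics

print(train_norm.mean().round(6))
print(train_norm.std().round(6))
```

After this transformation each training column has mean 0 and standard deviation 1; the test columns will be close to, but generally not exactly at, those values.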


Let's save the number of predictors to `n_cols` since we will need this number when building our network.

In [11]:
n_cols = predictors.shape[1] # number of predictors

## Import Keras

In [12]:
import keras
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

## Build a Neural Network

In [13]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
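
To get a feel for the size of this network, the number of trainable parameters can be worked out by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes `n_cols` is 8 (the 9 columns of the dataframe minus `Strength`); `dense_params` is a helper introduced here for illustration, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

n_cols = 8  # assumption: 9 columns in the data minus the Strength target
hidden = dense_params(n_cols, 10)   # Dense(10) on the input features
output = dense_params(10, 1)        # Dense(1) on the hidden layer
total = hidden + output

print(hidden, output, total)  # 90 11 101
```

With roughly a hundred parameters and about a thousand samples, this is a very small model.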


In [14]:
# Hold out 30% of the data for testing (this split uses the unnormalized predictors)
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=42)

## Train and Test the Network

Repeat Part B but use 100 epochs this time for training.

How does the mean of the mean squared errors compare to that from Step B?
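
One way to obtain the list of mean squared errors that this question refers to is repeated random hold-out: split the data, fit a fresh model on the training rows only, score it on the test rows, and repeat. The sketch below illustrates that pattern with NumPy alone. The least-squares stand-in model, the synthetic data, and the helper names (`fit_linear`, `predict_linear`, `repeated_holdout_mse`) are all assumptions for illustration; in the notebook the stand-in would be replaced by a newly built Keras model in each repetition.

```python
import numpy as np

def fit_linear(X, y):
    """Stand-in model: ordinary least squares with an intercept."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply the fitted coefficients (last entry is the intercept)."""
    return np.column_stack([X, np.ones(len(X))]) @ coef

def repeated_holdout_mse(X, y, n_repeats=50, test_fraction=0.3):
    """Return one test MSE per random split, fitting a fresh model each time."""
    n_test = int(round(test_fraction * len(X)))
    errors = []
    for seed in range(n_repeats):
        order = np.random.default_rng(seed).permutation(len(X))
        test_idx, train_idx = order[:n_test], order[n_test:]
        coef = fit_linear(X[train_idx], y[train_idx])  # training rows only
        residual = y[test_idx] - predict_linear(coef, X[test_idx])
        errors.append(np.mean(residual ** 2))
    return np.array(errors)

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

mse_list = repeated_holdout_mse(X, y)
print(len(mse_list), mse_list.mean(), mse_list.std())
```

Fitting from scratch in every repetition keeps the 50 scores independent of one another, which is what makes their mean and standard deviation meaningful for comparing Part B with Part C.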


In [20]:
# build the model
model = regression_model()

# fit the model
model.fit(predictors, target, epochs=100, verbose=2)

Epoch 1/100
33/33 - 0s - loss: 8392.7881 - 311ms/epoch - 9ms/step
Epoch 2/100
33/33 - 0s - loss: 5910.5132 - 37ms/epoch - 1ms/step
Epoch 3/100
33/33 - 0s - loss: 4356.4512 - 36ms/epoch - 1ms/step
Epoch 4/100
33/33 - 0s - loss: 3263.5752 - 36ms/epoch - 1ms/step
Epoch 5/100
33/33 - 0s - loss: 2443.9670 - 29ms/epoch - 871us/step
Epoch 6/100
33/33 - 0s - loss: 1857.5073 - 30ms/epoch - 909us/step
Epoch 7/100
33/33 - 0s - loss: 1408.2562 - 28ms/epoch - 863us/step
Epoch 8/100
33/33 - 0s - loss: 1072.5763 - 28ms/epoch - 848us/step
Epoch 9/100
33/33 - 0s - loss: 814.2299 - 30ms/epoch - 907us/step
Epoch 10/100
33/33 - 0s - loss: 621.7742 - 30ms/epoch - 916us/step
Epoch 11/100
33/33 - 0s - loss: 485.8088 - 29ms/epoch - 879us/step
Epoch 12/100
33/33 - 0s - loss: 390.1598 - 29ms/epoch - 893us/step
Epoch 13/100
33/33 - 0s - loss: 315.8481 - 30ms/epoch - 905us/step
Epoch 14/100
33/33 - 0s - loss: 264.6430 - 29ms/epoch - 882us/step
Epoch 15/100
33/33 - 0s - loss: 228.6656 - 30ms/epoch - 900us/step
...

<keras.callbacks.History at 0x7f07f47ad940>
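
The `33/33` in the log is the number of batches per epoch. Keras uses a batch size of 32 when `fit` is called without `batch_size`, so 33 steps is what one would expect from fitting on all 1030 rows rather than on the 70% training split. The arithmetic can be checked with the short sketch below, which assumes the 30% hold-out rounds up to 309 rows.

```python
import math

batch_size = 32                  # Keras default when batch_size is not given
n_all = 1030                     # rows in the full dataset (concrete_data.shape)
n_test = -(-n_all * 3 // 10)     # 30% hold-out, rounded up (assumed): 309
n_train = n_all - n_test         # 721

steps_all = math.ceil(n_all / batch_size)
steps_train = math.ceil(n_train / batch_size)

print(steps_all)    # 33, matching the "33/33" in the log
print(steps_train)  # 23, what a fit on the training split alone would show
```

Because the test rows are part of what the model was fitted on here, the test error reported below is likely to be optimistic.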

## Evaluate the model on the test data.

In [21]:
model_evaluation = model.evaluate(X_test, y_test)
y_pred = model.predict(X_test)
model_evaluation



116.74239349365234

## The mean squared error between the predicted concrete strength and the actual concrete strength

In [17]:

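# MSE between the true and predicted strengths on the test set (a single scalar)
mean_square_error = mean_squared_error(y_test, y_pred)
# With only one MSE value, the mean is that value and the standard deviation is 0
mean = np.mean(mean_square_error)
standard_deviation = np.std(mean_square_error)
print(mean, standard_deviation)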

256.48263699712 0.0
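
Because `mean_squared_error` returns one scalar for the whole test set, the standard deviation printed above is necessarily 0. The following minimal sketch, using made-up numbers rather than the concrete data, computes the mean squared error by hand and shows that a spread only appears once several MSE values are collected.

```python
import numpy as np

# Made-up values for illustration.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

# Average of the squared differences: one number for this test set.
mse = np.mean((y_true - y_hat) ** 2)
print(mse)          # 1.3125
print(np.std(mse))  # 0.0, a single value has no spread

# A spread only appears once several MSE values are collected.
several = np.array([mse, 1.0, 2.0])
print(np.std(several) > 0)
```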


## List of 50 mean squared errors with report of mean and the standard deviation of the mean squared errors

In [19]:
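total_mean_squared_errors = 50  # number of train/test repetitions
epochs = 50  # training epochs per repetition
mean_squared_errors = []
for i in range(0, total_mean_squared_errors):
    # draw a new random 70/30 split for each repetition
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=i)
    # the same model instance is reused, so training continues from the previous repetition
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    MSE = model.evaluate(X_test, y_test, verbose=0)
    print("MSE "+str(i+1)+": "+str(MSE))
    # recompute the test MSE with scikit-learn and store it
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)
    mean_squared_errors.append(mean_square_error)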

mean_squared_errors = np.array(mean_squared_errors)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print('\n')
print("Mean and standard deviation of " +str(total_mean_squared_errors) + " mean squared errors with normalized data. \n Total number of epochs for each training is: " +str(epochs) + "\n")
print("Mean: "+str(mean))
print("Standard Deviation: "+str(standard_deviation))

MSE 1: 99.90746307373047
MSE 2: 120.39151000976562
MSE 3: 109.86820983886719
MSE 4: 155.37533569335938
MSE 5: 122.28929138183594
MSE 6: 114.38406372070312
MSE 7: 132.22802734375
MSE 8: 100.5581283569336
MSE 9: 121.26358795166016
MSE 10: 109.69779968261719
MSE 11: 106.46159362792969
MSE 12: 117.74054718017578
MSE 13: 114.32808685302734
MSE 14: 118.31655883789062
MSE 15: 108.62405395507812
MSE 16: 110.51055908203125
MSE 17: 109.41685485839844
MSE 18: 95.91179656982422
MSE 19: 118.18582153320312
MSE 20: 115.23384094238281
MSE 21: 106.79666900634766
MSE 22: 103.61561584472656
MSE 23: 107.50588989257812
MSE 24: 105.32545471191406
MSE 25: 111.09027099609375
MSE 26: 105.7374267578125
MSE 27: 122.43828582763672
MSE 28: 110.80480194091797
MSE 29: 110.12635040283203
MSE 30: 113.94083404541016
MSE 31: 136.80589294433594
MSE 32: 106.33724975585938
MSE 33: 99.46907043457031
MSE 34: 106.60433197021484
MSE 35: 110.61865234375
MSE 36: 92.91411590576172
MSE 37: 90.08121490478516
MSE 38: 74.878234863281