<h1 align=center><font size = 5>Regression Models with Keras</font></h1>

# Introduction

This notebook uses neural networks, built with the Keras library, to predict the strength of concrete samples from the amounts of the ingredients used to manufacture them. The data set also records each sample's age, which is used as an additional predictor. The ingredients include:

1. Cement

2. Blast Furnace Slag

3. Fly Ash

4. Water

5. Superplasticizer

6. Coarse Aggregate

7. Fine Aggregate

### Imports

In [1]:
import pandas as pd
import numpy as np
import keras
from sklearn.model_selection import train_test_split

Using TensorFlow backend.


Downloading the data and reading it into a <em>pandas</em> dataframe.

In [2]:
concrete_data = pd.read_csv('https://cocl.us/concrete_data')
concrete_data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


#### Checking the data for null values

In [3]:
concrete_data.shape

(1030, 9)

In [4]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The data contains no null values, so it can be used to build the model as is.

## A. Building a baseline model 

Now, split data into predictors and target

In [5]:
concrete_data_columns = concrete_data.columns

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] 

target = concrete_data['Strength'] 

predictors.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |
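
The same split can also be written with `DataFrame.drop`, which some readers find easier to scan than boolean indexing on the column index. The sketch below is only an illustration: it uses a tiny frame with three of the predictor columns and values copied from the first two rows shown earlier, and the names `df`, `predictors_demo` and `target_demo` are made up for this example.

```python
import pandas as pd

# Tiny illustrative frame; only a few columns of the real data set are reproduced here
df = pd.DataFrame({
    "Cement": [540.0, 540.0],
    "Water": [162.0, 162.0],
    "Age": [28, 28],
    "Strength": [79.99, 61.89],
})

# Every column except the target becomes a predictor
predictors_demo = df.drop(columns="Strength")
target_demo = df["Strength"]

print(list(predictors_demo.columns))
print(target_demo.name)
```

`drop` returns a new frame and leaves `df` unchanged, so the original data stays available for later steps.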


Train and test split

In [6]:
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.30, random_state=42)

#### Building the Neural Network

Imports

In [7]:
from keras.models import Sequential
from keras.layers import Dense

Neural Network Definition

In [10]:
def neural_net_1():
    #define model type
    model = Sequential()
    
    #define layers
    model.add(Dense(10, activation='relu', input_shape=(predictors.shape[1],)))
    model.add(Dense(1))
    
    #compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

In [11]:
#build the model
model_a = neural_net_1()

#### Training the neural network and predicting

In [13]:
from sklearn.metrics import mean_squared_error 

#empty array for mean squared errors
mses = np.array([])

#fit the model 49 times (range(1, 50) yields 49 iterations); each call to fit() continues training the same model
for i in range(1, 50):
    model_a.fit(X_train, y_train, epochs=50, verbose=0)
    y_prediction = model_a.predict(X_test)
    mses = np.append(mses, mean_squared_error(y_test, y_prediction)) #append this iteration's test MSE to the mses array
    
#show array of mean_squared_errors
mses

array([130.83089113, 110.98155522, 108.46251703, 115.40315931,
       112.52172297, 107.27585336, 106.71707301, 119.7590221 ,
       120.67684544, 108.72914417, 106.92177279, 112.41549205,
       107.74943856, 108.83328772, 108.05211213, 107.32676759,
       106.28460929, 104.5791858 , 110.7412529 , 107.71947386,
       111.23748202, 104.75059213, 107.09625143, 104.15168403,
       106.53688605, 107.52053511, 108.50796854, 113.39565149,
       105.78157318, 106.31715219, 104.82313882, 106.71982178,
       107.7718184 , 108.10859125, 117.55266853, 108.90198782,
       105.88845299, 106.75031431, 121.40224   , 105.73565739,
       110.19407588, 106.19267597, 110.30001912, 109.39220707,
       111.66414872, 109.82887795, 118.19887803, 107.90521356,
       128.36489534])
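
The loop above calls `fit` on the same model object in every iteration, so each pass continues training from the weights left by the previous one, and `range(1, 50)` runs 49 times rather than 50. If the goal is to measure how much the error varies between independent training runs, one option is to build a new model for each repetition. The following is a minimal sketch of that idea, not the notebook's method: `repeated_mse` and `MeanModel` are hypothetical names, and `MeanModel` is a trivial stand-in with a Keras-like `fit`/`predict` interface so the example runs without Keras.

```python
import numpy as np

def repeated_mse(build_model, X_train, y_train, X_test, y_test, n_repeats=50, epochs=50):
    """Train a freshly built model n_repeats times and return the test MSE of each run."""
    mses = []
    for _ in range(n_repeats):  # range(n_repeats) gives exactly n_repeats iterations
        model = build_model()   # new, untrained model for every repetition
        model.fit(X_train, y_train, epochs=epochs, verbose=0)
        y_pred = np.asarray(model.predict(X_test)).ravel()
        mses.append(float(np.mean((np.asarray(y_test) - y_pred) ** 2)))
    return np.array(mses)


class MeanModel:
    """Hypothetical stand-in that always predicts the training mean."""
    def fit(self, X, y, epochs=1, verbose=0):
        self.mean_ = float(np.mean(y))

    def predict(self, X):
        return np.full(len(X), self.mean_)


# Made-up data, only to exercise the function
X_tr, y_tr = np.zeros((4, 2)), np.array([1.0, 2.0, 3.0, 4.0])
X_te, y_te = np.zeros((2, 2)), np.array([2.0, 3.0])
demo_mses = repeated_mse(MeanModel, X_tr, y_tr, X_te, y_te, n_repeats=5)
print(demo_mses)
```

With the notebook's own objects, the call would look like `repeated_mse(neural_net_1, X_train, y_train, X_test, y_test)`, since `neural_net_1` takes no arguments and returns a compiled model.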

#### Error Mean and Standard Deviation

In [20]:
a_std = mses.std()
a_mean = mses.mean()
print("Error Mean: " + str(a_mean) + "\n"
     + "Standard Deviation: " + str(a_std))

Error Mean: 110.26474766385365
Standard Deviation: 5.752897282361342
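
One detail worth keeping in mind when reading these numbers: NumPy's `std()` divides by N by default (`ddof=0`), whereas pandas' `std()` divides by N - 1 (`ddof=1`). The short sketch below shows the difference on the first four MSE values from the array above, rounded to two decimals; the variable names are illustrative.

```python
import numpy as np

# First four MSE values from the array above, rounded for brevity
sample = np.array([130.83, 110.98, 108.46, 115.40])

population_std = sample.std()    # NumPy default: ddof=0, divides by N
sample_std = sample.std(ddof=1)  # divides by N - 1, which is the pandas default

print(population_std, sample_std)
```

With 49 values the two results differ by only about one percent, so the choice has little effect on the comparisons in this notebook.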


## B. Normalize the data

In [21]:
#normalize predictors
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
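
The cell above computes the mean and standard deviation over all rows before the train/test split, so the test rows contribute to the normalization statistics. A common alternative is to compute the statistics on the training rows only and reuse them for the test rows. The sketch below illustrates this with made-up values; the frames `train` and `test` and their contents are assumptions for the example, not part of the data set.

```python
import pandas as pd

# Made-up values for illustration; the column names mirror two of the predictors
train = pd.DataFrame({"Cement": [100.0, 200.0, 300.0], "Water": [150.0, 160.0, 170.0]})
test = pd.DataFrame({"Cement": [400.0], "Water": [180.0]})

# Statistics come from the training rows only
train_mean = train.mean()
train_std = train.std()  # pandas uses ddof=1 by default

train_norm = (train - train_mean) / train_std
test_norm = (test - train_mean) / train_std  # reuse the training statistics

print(train_norm)
print(test_norm)
```

In the notebook this would mean calling `train_test_split` on the unnormalized `predictors` first and normalizing `X_train` and `X_test` afterwards.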


Train and Test split

In [22]:
X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.30, random_state=42)

#### Building the Neural Network

In [23]:
model_b = neural_net_1()

#### Training the neural network and predicting

In [24]:
#empty array for mean squared errors
mses = np.array([])

#fit the model 49 times (range(1, 50) yields 49 iterations); each call to fit() continues training the same model
for i in range(1, 50):
    model_b.fit(X_train, y_train, epochs=50, verbose=0)
    y_prediction = model_b.predict(X_test)
    mses = np.append(mses, mean_squared_error(y_test, y_prediction)) #append this iteration's test MSE to the mses array
    
#show array of mean_squared_errors
mses

array([325.17263063, 155.51827003, 115.33071543,  85.68343154,
        66.00387749,  55.70353538,  50.89785844,  48.173642  ,
        45.84693773,  44.23304206,  42.85754159,  41.9230095 ,
        41.49512985,  40.75099865,  40.61631532,  39.88488444,
        39.45821113,  38.82131526,  38.59042761,  38.49993913,
        38.59302122,  38.08427122,  38.22232054,  38.47527019,
        38.07928719,  37.81338295,  37.64086929,  37.27165278,
        37.09089047,  36.95446951,  36.64917698,  36.25811387,
        36.33347328,  36.11571577,  35.91761216,  36.04759476,
        35.83671642,  35.62935958,  35.49319075,  35.38077437,
        35.39405073,  35.21587727,  35.35041657,  35.04844106,
        35.01456096,  35.29702197,  34.94858402,  34.90173098,
        34.6735748 ])

#### Error Mean and Standard Deviation

In [25]:
b_std = mses.std()
b_mean = mses.mean()
print("Error Mean: " + str(b_mean) + "\n"
     + "Standard Deviation: " + str(b_std))

Error Mean: 49.98353336509143
Standard Deviation: 44.98497452606134


#### Mean of the mean squared errors compared to those of Step A.

In [26]:
print("Mean of Step A: " + str(a_mean) + "\n" 
      + "Mean of Step B: " + str(b_mean) + "\n" 
      + "Difference: " + str(a_mean - b_mean))

Mean of Step A: 110.26474766385365
Mean of Step B: 49.98353336509143
Difference: 60.28121429876222


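In this case, normalizing the predictors improved accuracy: the mean test MSE dropped from about 110.26 (Step A) to about 49.98 (Step B). The standard deviation in Step B is large (about 44.98) mainly because the first few iterations produced much higher errors than the later ones.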

## C. Increase the number of epochs (to 100)

#### Building the model

In [27]:
model_c = neural_net_1()

#### Training the model and predicting

In [28]:
#empty array for mean squared errors
mses = np.array([])

#fit the model 49 times (range(1, 50) yields 49 iterations); each call to fit() continues training the same model
for i in range(1, 50):
    model_c.fit(X_train, y_train, epochs=100, verbose=0)
    y_prediction = model_c.predict(X_test)
    mses = np.append(mses, mean_squared_error(y_test, y_prediction)) #append this iteration's test MSE to the mses array
    
#show array of mean_squared_errors
mses

array([143.9444553 ,  94.27293691,  65.82066488,  56.90819515,
        51.05623367,  47.84341401,  45.95631329,  44.31305377,
        43.53113441,  43.11366533,  43.03323995,  42.60572082,
        41.11556213,  40.34625309,  40.14941352,  40.23205855,
        40.08514357,  39.82343634,  39.84066782,  39.76346331,
        39.61984997,  39.63769486,  39.37845375,  39.17054078,
        39.57033506,  39.60656613,  39.49247064,  39.57907421,
        39.21302982,  39.79438055,  39.35783694,  39.42789194,
        39.35604074,  39.47259175,  39.52146057,  39.89863893,
        39.65912259,  39.62552125,  39.53574849,  39.66268506,
        39.48544819,  39.5567682 ,  39.55838272,  39.49168988,
        39.27181829,  39.54669564,  39.31951543,  39.37070281,
        39.51733884])

#### Error Mean and Standard Deviation

In [29]:
c_std = mses.std()
c_mean = mses.mean()
print("Error Mean: " + str(c_mean) + "\n"
     + "Standard Deviation: " + str(c_std))

Error Mean: 44.68272081394716
Standard Deviation: 16.850767095629273


#### Mean of the mean squared errors compared to those of Step B

In [33]:
print("Error Mean of Step B: " + str(b_mean) + "\n" 
      + "Error Mean of Step C: " + str(c_mean) + "\n" 
      + "Difference: " + str(b_mean - c_mean))

Error Mean of Step B: 49.98353336509143
Error Mean of Step C: 44.68272081394716
Difference: 5.300812551144276


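In this case, increasing the number of epochs from 50 to 100 lowered the mean test MSE from about 49.98 (Step B) to about 44.68 (Step C). Note, however, that the last errors recorded in Step B (about 34.7) are lower than those in Step C (about 39.5), so the lower mean largely reflects the smaller errors in the first few iterations.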
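
As a rough scale check, the gap between the two means can be compared with the spread of the recorded errors. The sketch below copies the printed means and standard deviations of Steps B and C and expresses the difference in units of each standard deviation. It is only an informal comparison, not a significance test, and the ratio names are made up for this example.

```python
# Values copied from the printed outputs of Steps B and C
b_mean, b_std = 49.98353336509143, 44.98497452606134
c_mean, c_std = 44.68272081394716, 16.850767095629273

difference = b_mean - c_mean

# Express the gap in units of each step's standard deviation
ratio_vs_b = difference / b_std
ratio_vs_c = difference / c_std

print(round(ratio_vs_b, 3), round(ratio_vs_c, 3))
```

Both ratios are well below 1, which suggests the improvement in the mean is modest compared with the variation inside each run.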

## D. Increase the number of hidden layers (3 Layers)

#### Building the Neural Network


In [34]:
def neural_net_2():
    #define model type
    model = Sequential()
    
    #define layers
    model.add(Dense(10, activation='relu', input_shape=(predictors_norm.shape[1],)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    #compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

In [35]:
model_d = neural_net_2()
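
To get a feel for how much larger this network is than the baseline, the number of trainable parameters can be worked out by hand: a fully connected layer with bias terms has (inputs + 1) × units parameters. The helper below is a small sketch of that arithmetic; `dense_param_count` is a hypothetical name, and the input size of 8 is taken from the eight predictor columns shown in the tables above.

```python
def dense_param_count(n_inputs, layer_units):
    """Count weights and biases of a stack of fully connected layers with bias terms."""
    total = 0
    for units in layer_units:
        total += (n_inputs + 1) * units  # n_inputs weights plus one bias per unit
        n_inputs = units                 # this layer's outputs feed the next layer
    return total


n_predictors = 8  # the eight predictor columns

params_net_1 = dense_param_count(n_predictors, [10, 1])          # layout of neural_net_1
params_net_2 = dense_param_count(n_predictors, [10, 10, 10, 1])  # layout of neural_net_2
print(params_net_1, params_net_2)  # prints 101 321
```

The three-hidden-layer network therefore has roughly three times as many parameters as the single-hidden-layer one, which `model.summary()` in Keras should confirm.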

#### Training the neural network and predicting

In [36]:
#empty array for mean squared errors
mses = np.array([])

#fit the model 49 times (range(1, 50) yields 49 iterations); each call to fit() continues training the same model
for i in range(1, 50):
    model_d.fit(X_train, y_train, epochs=100, verbose=0)
    y_prediction = model_d.predict(X_test)
    mses = np.append(mses, mean_squared_error(y_test, y_prediction)) #append this iteration's test MSE to the mses array
    
#show array of mean_squared_errors
mses

array([117.51850997, 103.26610368,  52.8054775 ,  41.10100374,
        40.23719966,  39.04296453,  36.40980112,  35.04504526,
        34.18557459,  32.9127797 ,  32.71180527,  32.47373218,
        32.48670971,  32.31870811,  32.45243952,  33.46290119,
        31.64622386,  31.50300025,  31.7819505 ,  32.08344335,
        31.52229192,  31.51585774,  31.5616772 ,  31.40256344,
        31.15928937,  30.58040669,  31.50438185,  30.7392634 ,
        32.63060004,  31.38634256,  30.78114674,  31.17247712,
        30.93135465,  31.10702485,  30.63239131,  30.85147539,
        31.27997248,  30.97383459,  31.7048188 ,  30.91349703,
        30.84242545,  31.69270822,  30.75167605,  30.80009141,
        31.13431561,  31.32593404,  30.58650735,  30.39155666,
        30.60519386])

#### Error Mean and Standard Deviation

In [37]:
d_std = mses.std()
d_mean = mses.mean()
print("Error Mean: " + str(d_mean) + "\n"
     + "Standard Deviation: " + str(d_std))

Error Mean: 35.87604999058534
Standard Deviation: 15.879699125154993


#### Mean of the mean squared errors compared to those of Step C

In [39]:
print("Error Mean of Step C: " + str(c_mean) + "\n" 
      + "Error Mean of Step D: " + str(d_mean) + "\n" 
      + "Difference: " + str(c_mean - d_mean))

Error Mean of Step C: 44.68272081394716
Error Mean of Step D: 35.87604999058534
Difference: 8.80667082336182


In this case, increasing the number of hidden layers from one to three lowered the mean test MSE from about 44.68 (Step C) to about 35.88 (Step D).