<h1 align=center><font size = 5>Regression Models with Keras to Predict Concrete Strength v0.4</font></h1>

## Introduction

This notebook builds a regression model with Keras to predict the compressive strength of a concrete sample. We start by training for 50 epochs and will increase the number of epochs to 100 later.
We use the https://cocl.us/concrete_data dataset to train and test the model.
We evaluate the model using the mean squared error.
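
As a quick illustration of the metric, here is a minimal sketch of the mean squared error in plain Python. The function name `mse_manual` and the sample values are hypothetical and are not taken from the dataset; the notebook itself relies on Keras and scikit-learn for this computation.

```python
def mse_manual(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical strengths and predictions, for illustration only
y_true_demo = [30.0, 40.0, 50.0]
y_pred_demo = [32.0, 37.0, 50.0]

mse_demo = mse_manual(y_true_demo, y_pred_demo)
print(mse_demo)  # (2**2 + 3**2 + 0**2) / 3
```

Because the errors are squared, the result is in squared units of the target, and large individual errors weigh more heavily than small ones.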

## Preprocessing Dataset

Importing the <em>pandas</em> and NumPy libraries and the train_test_split function from scikit-learn.

In [0]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split


<strong>The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them. Ingredients include:</strong>

<strong>1. Cement</strong>

<strong>2. Blast Furnace Slag</strong>

<strong>3. Fly Ash</strong>

<strong>4. Water</strong>

<strong>5. Superplasticizer</strong>

<strong>6. Coarse Aggregate</strong>

<strong>7. Fine Aggregate</strong>

Let's download the data and read it into a <em>pandas</em> dataframe.

In [2]:
data = pd.read_csv('https://cocl.us/concrete_data')
data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


#### Finding the shape of the data

In [3]:
data.shape

(1030, 9)

So, the dataset contains 1030 samples and 9 columns: 8 predictors plus the target.

Next, we summarize the data and check for any missing values.




In [4]:
data.describe()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [5]:
data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The dataset looks clean: no null values were found. It is ready for modeling, after splitting it into predictors and target and then into training and testing sets.

#### Splitting data

The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.

In [0]:
data_columns = data.columns

X = data[data_columns[data_columns != 'Strength']] # Predictors: all columns except 'Strength'
y = data['Strength'] # Target: 'Strength' column


In [7]:
# Normalize the predictors (z-score) before splitting
X_norm = (X - X.mean()) / X.std()
X_norm.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,0.862735,-1.217079,-0.279597
1,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,1.055651,-1.217079,-0.279597
2,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,3.55134
3,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,5.055221
4,-0.790075,0.678079,-0.846733,0.488555,-1.038638,0.070492,0.647569,4.976069
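
The normalization above subtracts each column's mean and divides by its standard deviation (pandas uses the sample standard deviation by default). As a small check, the sketch below recomputes the first normalized Cement value by hand from the `describe()` statistics shown earlier. The helper name `z_score` is hypothetical.

```python
def z_score(value, mean, std):
    """Standardize a single value given the column mean and standard deviation."""
    return (value - mean) / std


# Values copied from the describe() output and the first data row above
cement_mean = 281.167864
cement_std = 104.506364
cement_row0 = 540.0

z = z_score(cement_row0, cement_mean, cement_std)
print(round(z, 4))
```

The result should agree with the Cement entry of row 0 in the normalized table (2.476712) up to rounding of the printed statistics.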


<a id="item2"></a>

In [0]:
# Splitting the dataset into training and testing data.
X_train, X_test, y_train, y_test = train_test_split(X_norm, y, test_size=0.3)

Let's do a quick sanity check of the predictors and the target dataframes of training set and testing set.

In [9]:
X_train.head()


Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
858,0.390714,0.870477,-0.846733,0.20758,0.300534,-0.281901,-0.805484,-0.279597
345,-0.645586,-0.856472,1.883083,-1.253489,0.668806,1.03636,0.035167,-0.501222
30,0.218476,0.024388,-0.846733,2.174405,-1.038638,-0.526262,-1.291914,5.055221
539,1.902584,-0.856472,-0.846733,0.488555,-1.038638,-0.472245,-0.765572,0.701883
150,0.360094,1.606458,-0.846733,-1.211343,1.355131,-1.553862,1.332313,0.163652


In [10]:
X_test.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
600,0.553384,-0.856472,-0.846733,0.722701,-1.038638,-0.063263,0.09254,-0.501222
850,-1.197706,1.959961,-0.846733,-0.073394,0.802723,0.631236,-0.942682,-0.279597
391,1.108374,-0.657119,1.622133,-0.682173,2.443208,-0.076124,-1.753399,-0.279597
870,-1.264687,0.754574,0.856472,0.535385,-0.034259,-1.040705,0.080068,-0.279597
905,-1.39865,-0.856472,1.747139,-0.073394,0.635327,-0.15329,0.391882,-0.279597


In [11]:
y_train.head()


858    52.42
345    33.73
30     55.26
539    54.32
150    66.10
Name: Strength, dtype: float64

In [12]:
y_test.head()

600    27.04
850    37.36
391    55.65
870    23.69
905    13.29
Name: Strength, dtype: float64

Let's save the number of predictors to *n_cols* since we will need this number when building our network.

In [0]:
n_cols = X_train.shape[1] # number of predictors

<a id="item1"></a>

<a id='item32'></a>

## Building Keras Model For Prediction

#### Importing the Keras library

In [14]:
import keras

Using TensorFlow backend.


Importing the Sequential model and Dense layer to create the MLP for training this model. 

In [0]:
from keras.models import Sequential
from keras.layers import Dense

<a id='item33'></a>

Let's define a function that defines our regression model for us so that we can conveniently call it to create our model.

In [0]:
# define regression model
# increasing the number of layers
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error',metrics=['mean_squared_error'])
    return model

The above function creates a model that has three hidden layers, each with 10 hidden units, followed by a single output unit.
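
To see how large this network is, the sketch below counts the trainable parameters of each Dense layer as (inputs × units) weights plus one bias per unit. The helper `dense_params` is hypothetical, and `n_cols = 8` is assumed from the 8 predictor columns in the data.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


n_cols = 8                    # number of predictors (all columns except 'Strength')
layer_units = [10, 10, 10, 1]  # three hidden layers and the output layer

counts = []
n_inputs = n_cols
for units in layer_units:
    counts.append(dense_params(n_inputs, units))
    n_inputs = units          # the next layer takes this layer's outputs as inputs

total_params = sum(counts)
print(counts, total_params)
```

These per-layer counts can be compared against the `Param #` column of the `model.summary()` output later in the notebook.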

<a id="item4"></a>

<a id='item34'></a>

## Train and Test the Network

Let's call the function now to create our model.

In [0]:
# build the model
model = regression_model()

Next, we will train the model using the *fit* method, which also validates it after every epoch. We will hold out 10% of the training data for validation and train the model for 50 epochs.
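
The number of samples used for fitting and for validation can be worked out ahead of time from the 70/30 split and the 10% validation hold-out. This is a rough sketch that assumes scikit-learn rounds the test set size up and Keras truncates the fitting portion to an integer; the variable names are illustrative only.

```python
import math

n_samples = 1030          # rows in the dataset
test_size = 0.3           # fraction passed to train_test_split
validation_split = 0.1    # fraction passed to model.fit

# Assumption: the test set size is rounded up, the rest is the training set
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# Assumption: Keras truncates the fitting portion and validates on the remainder
n_fit = int(n_train * (1 - validation_split))
n_val = n_train - n_fit

print(n_train, n_test, n_fit, n_val)
```

The last two numbers should match the "Train on ... samples, validate on ... samples" line that Keras prints at the start of training.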

In [18]:
# fit the model
model.fit(X_train, y_train, validation_split=0.1, epochs=50, verbose=2)

Train on 648 samples, validate on 73 samples
Epoch 1/50
 - 0s - loss: 1488.1482 - mean_squared_error: 1488.1482 - val_loss: 1624.4905 - val_mean_squared_error: 1624.4905
Epoch 2/50
 - 0s - loss: 1464.5896 - mean_squared_error: 1464.5896 - val_loss: 1594.0123 - val_mean_squared_error: 1594.0123
Epoch 3/50
 - 0s - loss: 1430.5731 - mean_squared_error: 1430.5731 - val_loss: 1549.3493 - val_mean_squared_error: 1549.3494
Epoch 4/50
 - 0s - loss: 1381.9798 - mean_squared_error: 1381.9799 - val_loss: 1484.7393 - val_mean_squared_error: 1484.7391
Epoch 5/50
 - 0s - loss: 1311.0159 - mean_squared_error: 1311.0159 - val_loss: 1391.2808 - val_mean_squared_error: 1391.2808
Epoch 6/50
 - 0s - loss: 1210.5580 - mean_squared_error: 1210.5580 - val_loss: 1257.1159 - val_mean_squared_error: 1257.1158
Epoch 7/50
 - 0s - loss: 1072.8677 - mean_squared_error: 1072.8678 - val_loss: 1079.7527 - val_mean_squared_error: 1079.7527
Epoch 8/50
 - 0s - loss: 899.6145 - mean_squared_error: 899.6146 - val_loss: 867

<keras.callbacks.callbacks.History at 0x7fa9f4879c88>

In [19]:
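import statistics
from sklearn.metrics import mean_squared_error

print(model.summary())
mse_results = []
# Note: the same model instance is reused, so each iteration continues
# training it for 50 more epochs rather than starting from fresh weights.
for _ in range(50):
    model.fit(X_train, y_train, epochs=50, verbose=0)
    test_predictions = model.predict(X_test)  # predictions on the held-out test set
    mse_results.append(mean_squared_error(y_test, test_predictions))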


Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 10)                90        
_________________________________________________________________
dense_2 (Dense)              (None, 10)                110       
_________________________________________________________________
dense_3 (Dense)              (None, 10)                110       
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 11        
Total params: 321
Trainable params: 321
Non-trainable params: 0
_________________________________________________________________
None


In [20]:
print("Mean of the list of mean square errors is % s" % (statistics.mean( mse_results ) ))
print("Standard deviation of the list of mean square errors is % s" % ( statistics.stdev( mse_results )))


Mean of the list of mean square errors is 47.670850713856666
Standard deviation of the list of mean square errors is 7.255248500768844
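
Since the mean squared error is expressed in squared units of the target, taking its square root gives a figure that is easier to compare with the Strength values themselves. A minimal sketch using the mean reported above:

```python
import math

mean_mse = 47.670850713856666   # mean of the MSE list printed above

# Root mean squared error: same units as the 'Strength' column
rmse = math.sqrt(mean_mse)
print(round(rmse, 2))
```

For context, the `describe()` output earlier reports a standard deviation of about 16.7 for Strength, so the typical prediction error is well below the spread of the target.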


In [21]:
mse_results

[94.13862015850026,
 55.11252167193764,
 52.47458266992788,
 52.86751677359567,
 52.92597948759507,
 52.52696349910552,
 52.52006661931197,
 51.467234985679255,
 49.615840309376445,
 49.19011198526588,
 49.03496452890965,
 47.0412501412634,
 46.18490823720615,
 45.52963635067293,
 46.09349560335964,
 46.51428524730684,
 45.389317379187865,
 44.85950133900051,
 44.75924368341714,
 47.29101867618457,
 45.141485851110104,
 46.94671758502724,
 44.94299984541767,
 45.24339900615229,
 45.516482594551206,
 44.640370350549546,
 44.965620618951775,
 45.032299633392824,
 45.005766036496794,
 45.70326219967758,
 45.15365999534378,
 45.40353345045647,
 44.4576571572652,
 44.897428777009694,
 44.703328638749554,
 44.74442893850414,
 45.97720442393892,
 45.43585587615038,
 45.150036506608885,
 44.547537048479136,
 44.532421915441326,
 44.90780111230876,
 44.49056081534697,
 46.31323158429457,
 45.161012107748775,
 46.44325444903427,
 45.29797706536988,
 46.32010871938631,
 45.73897268443216,
 45.191

#### Increasing the number of hidden layers decreased both the mean and the standard deviation of the mean squared errors, indicating a better fit on the dataset than the one-hidden-layer normalized model (v0.2).
The mean decreased by approximately 7 points, while the standard deviation decreased by approximately 15 points.

# Thanks for Reviewing the Project.