<h1 align=center><font size = 5>Regression Models with Keras to Predict Concrete Strength v0.2</font></h1>

## Introduction

This notebook builds a regression model with Keras to predict the compressive strength of concrete samples. We start by training it for 50 epochs. We will increase the epochs later to 100.
We use the dataset at https://cocl.us/concrete_data to train and test our model.
We also evaluate the model's error using the mean squared error (MSE).
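
For reference, the mean squared error averages the squared differences between the true and predicted values. The snippet below is a minimal NumPy sketch of that definition; the helper name `mse` and the three sample values are made up for illustration and are not taken from the dataset.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical strengths and predictions: errors are -2, 1 and 3,
# so the squared errors are 4, 1 and 9.
example = mse([30.0, 40.0, 50.0], [32.0, 39.0, 47.0])
print(example)
```

Because the errors are squared, the result is in squared units of the target, and large misses weigh more than small ones.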

## Preprocessing Dataset

Importing the <em>pandas</em> and NumPy libraries and the train_test_split function from scikit-learn.

In [0]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split


<strong>The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them. Ingredients include:</strong>

<strong>1. Cement</strong>

<strong>2. Blast Furnace Slag</strong>

<strong>3. Fly Ash</strong>

<strong>4. Water</strong>

<strong>5. Superplasticizer</strong>

<strong>6. Coarse Aggregate</strong>

<strong>7. Fine Aggregate</strong>

Let's download the data and read it into a <em>pandas</em> dataframe.

In [2]:
data = pd.read_csv('https://cocl.us/concrete_data')
data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


#### Finding the shape of the data

In [3]:
data.shape

(1030, 9)

So, there are 1030 samples in total, each with 8 predictors and one target column.

Let's look at the summary statistics and then check the dataset for missing values.




In [4]:
data.describe()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [5]:
data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The dataset looks clean: no null values were found in the data. It is therefore ready for modeling, after splitting it into predictors and target and then into training and testing sets, of course :)

#### Splitting data

The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.

In [0]:
data_columns = data.columns

X = data[data_columns[data_columns != 'Strength']] # Predictors: all columns except 'Strength'
y = data['Strength'] # Target: 'Strength' column


In [7]:
# Normalize the predictors (z-score: subtract the mean, divide by the standard deviation) before splitting
X_norm = (X - X.mean()) / X.std() 
X_norm.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,0.862735,-1.217079,-0.279597
1,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,1.055651,-1.217079,-0.279597
2,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,3.55134
3,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,5.055221
4,-0.790075,0.678079,-0.846733,0.488555,-1.038638,0.070492,0.647569,4.976069
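
As a quick check of the normalization, the first Cement value in the table above can be reproduced by hand from the `data.describe()` output. This is only a worked-arithmetic sketch: the three numbers are copied from the earlier outputs, and the variable names are chosen here for illustration.

```python
# Values copied from data.head() and data.describe() above.
cement_value = 540.0       # Cement in row 0
cement_mean = 281.167864   # mean of the Cement column
cement_std = 104.506364    # std of the Cement column (pandas uses the sample std, ddof=1)

# z-score: how many standard deviations the value lies from the column mean
z = (cement_value - cement_mean) / cement_std
print(round(z, 6))
```

The result agrees with the 2.476712 shown for Cement in row 0 of `X_norm`, up to rounding of the printed statistics.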


<a id="item2"></a>

In [0]:
# Splitting the dataset into training (70%) and testing (30%) data.
X_train, X_test, y_train, y_test = train_test_split(X_norm, y, test_size=0.3)
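
One possible variation, not what this notebook does: split first and compute the normalization statistics on the training rows only, so that no information from the test rows enters the preprocessing. The sketch below uses plain NumPy with synthetic data and a seeded generator so that it is self-contained and repeatable; the array sizes, the seed, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the split is repeatable

# Synthetic stand-in for the predictors: 100 rows, 8 columns (not the real data).
X_demo = rng.normal(loc=50.0, scale=10.0, size=(100, 8))

# Shuffle the row indices and hold out 30% of them for testing.
indices = rng.permutation(len(X_demo))
n_test = int(round(0.3 * len(X_demo)))
test_idx, train_idx = indices[:n_test], indices[n_test:]
X_demo_train, X_demo_test = X_demo[train_idx], X_demo[test_idx]

# Compute the statistics on the training rows only, then apply them to both sets.
mu = X_demo_train.mean(axis=0)
sigma = X_demo_train.std(axis=0, ddof=1)  # ddof=1 matches the pandas default
X_demo_train_norm = (X_demo_train - mu) / sigma
X_demo_test_norm = (X_demo_test - mu) / sigma

print(X_demo_train_norm.shape, X_demo_test_norm.shape)
```

With scikit-learn, passing `random_state` to `train_test_split` gives a repeatable split in the same spirit.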

Let's do a quick sanity check of the predictors and the target dataframes of training set and testing set.

In [9]:
X_train.head()


Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
616,-0.039881,-0.856472,-0.846733,0.441726,-1.038638,-0.063263,1.027983,4.976069
126,1.3763,0.375573,-0.846733,-1.314367,1.723404,-1.553862,1.415879,-0.279597
217,-0.869496,-0.856472,1.109609,-0.921002,0.618587,1.481353,0.361948,0.163652
596,-1.060872,0.945814,-0.846733,0.193532,-1.038638,0.690397,-0.614654,-0.612034
439,-1.027381,0.226058,1.651822,-0.433979,0.585108,0.440892,-0.336516,-0.279597


In [10]:
X_test.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
561,1.108374,-0.856472,-0.846733,0.193532,-1.038638,0.870452,-0.489928,-0.279597
151,1.154304,0.241126,-0.846733,-1.600025,0.869682,-0.335918,0.97934,0.163652
820,2.33318,-0.856472,-0.846733,0.348068,-1.038638,1.955927,-2.00285,3.55134
936,-0.42359,0.206355,0.270507,3.059476,-0.034259,-1.543573,-0.975111,-0.279597
494,1.012686,-0.624667,0.622086,-1.150465,1.355131,-0.449095,0.890784,0.163652


In [11]:
y_train.head()


616    33.70
126    60.29
217    38.56
596    10.73
439    37.81
Name: Strength, dtype: float64

In [12]:
y_test.head()

561    33.08
151    73.70
820    67.11
936    28.63
494    56.34
Name: Strength, dtype: float64

Let's save the number of predictors to *n_cols* since we will need this number when building our network.

In [0]:
n_cols = X_train.shape[1] # number of predictors

<a id="item1"></a>

<a id='item32'></a>

## Building Keras Model For Prediction

#### Importing the Keras library

In [14]:
import keras

Using TensorFlow backend.


Importing the Sequential model and the Dense layer to build the multilayer perceptron (MLP) used in this notebook.

In [0]:
from keras.models import Sequential
from keras.layers import Dense

<a id='item33'></a>

Let's define a function that defines our regression model for us so that we can conveniently call it to create our model.

In [0]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mean_squared_error'])
    return model

The above function creates a model with one hidden layer of 10 units and a single-unit output layer.

<a id="item4"></a>

<a id='item34'></a>

## Train and Test the Network

Let's call the function now to create our model.

In [0]:
# build the model
model = regression_model()

Next, we will train the model and validate it at the same time using the *fit* method. We will hold out 10% of the training data for validation and train the model for 50 epochs.

In [18]:
# fit the model
model.fit(X_train, y_train, validation_split=0.1, epochs=50, verbose=2)

Train on 648 samples, validate on 73 samples
Epoch 1/50
 - 0s - loss: 1558.4847 - mean_squared_error: 1558.4846 - val_loss: 1344.1198 - val_mean_squared_error: 1344.1199
Epoch 2/50
 - 0s - loss: 1546.8112 - mean_squared_error: 1546.8110 - val_loss: 1332.9739 - val_mean_squared_error: 1332.9739
Epoch 3/50
 - 0s - loss: 1535.0224 - mean_squared_error: 1535.0227 - val_loss: 1321.7864 - val_mean_squared_error: 1321.7864
Epoch 4/50
 - 0s - loss: 1522.8697 - mean_squared_error: 1522.8698 - val_loss: 1310.3120 - val_mean_squared_error: 1310.3120
Epoch 5/50
 - 0s - loss: 1510.0904 - mean_squared_error: 1510.0905 - val_loss: 1298.4649 - val_mean_squared_error: 1298.4648
Epoch 6/50
 - 0s - loss: 1496.7598 - mean_squared_error: 1496.7598 - val_loss: 1285.7398 - val_mean_squared_error: 1285.7397
Epoch 7/50
 - 0s - loss: 1482.3041 - mean_squared_error: 1482.3042 - val_loss: 1272.6742 - val_mean_squared_error: 1272.6741
Epoch 8/50
 - 0s - loss: 1467.1779 - mean_squared_error: 1467.1781 - val_loss: 1

<keras.callbacks.callbacks.History at 0x7fac598a1d68>
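
The `History` object returned by `fit` keeps the per-epoch metrics in its `history` attribute, a dictionary keyed by metric name (for example `"loss"` and `"val_loss"`). As a small sketch of how such a list could be inspected, the snippet below uses the validation losses of the first seven epochs copied from the log above instead of a live Keras run; the variable names are chosen for illustration.

```python
# Validation losses for epochs 1-7, copied from the training log above.
# In a live session the full list would come from history.history["val_loss"],
# where history is the object returned by model.fit(...).
val_loss = [1344.1198, 1332.9739, 1321.7864, 1310.3120,
            1298.4649, 1285.7398, 1272.6742]

# Epoch (1-based) with the lowest validation loss among those listed.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

# Total drop in validation loss over the listed epochs.
improvement = val_loss[0] - val_loss[-1]

print(f"Lowest validation loss so far at epoch {best_epoch}")
print(f"Validation loss dropped by {improvement:.4f} over these epochs")
```

In these epochs the validation loss is still falling steadily, which suggests the model has not yet converged at that point in training.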

In [19]:
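import statistics
from sklearn.metrics import mean_squared_error

print(model.summary())
mse_results = []
# Note: the same model keeps training across iterations (it is not re-created),
# so each pass adds 50 more epochs on top of the previous ones.
for _ in range(50):
    model.fit(X_train, y_train, epochs=50, verbose=0)
    test_predictions = model.predict(X_test)  # predictions on the held-out test set
    mse_results.append(mean_squared_error(y_test, test_predictions))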


Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 10)                90        
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 11        
Total params: 101
Trainable params: 101
Non-trainable params: 0
_________________________________________________________________
None
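
The parameter counts in the summary above can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The helper below is a small sketch of that arithmetic; the function name `dense_param_count` is made up for illustration, and 8 is the number of predictor columns in this dataset.

```python
def dense_param_count(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

n_predictors = 8  # all columns except 'Strength'

hidden_params = dense_param_count(n_predictors, 10)  # hidden layer with 10 units
output_params = dense_param_count(10, 1)             # single-unit output layer
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)
```

These values match the 90, 11 and 101 reported by `model.summary()`.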


In [20]:
print("Mean of the list of mean square errors is %s" % statistics.mean(mse_results))
print("Standard deviation of the list of mean square errors is %s" % statistics.stdev(mse_results))


Mean of the list of mean square errors is 54.19661305393201
Standard deviation of the list of mean square errors is 22.42585174977053
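
Since the MSE is in squared units of the target, taking its square root gives a rough sense of the typical error on the same scale as Strength. The snippet below is a small sketch of that conversion using the mean MSE printed above; note that the square root of a mean MSE is not the same as the mean of the per-run root mean squared errors, so treat it as an order-of-magnitude figure only.

```python
import math

# Mean of the MSE list, copied from the output above.
mean_mse = 54.19661305393201

# Root mean squared error: same units as the Strength column.
rmse = math.sqrt(mean_mse)
print(f"Approximate RMSE: {rmse:.2f}")
```

For context, the `describe()` output earlier shows Strength ranging from 2.33 to 82.6 with a standard deviation of about 16.7.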


In [21]:
mse_results

[164.55243627875203,
 137.86844551453586,
 105.79772063330702,
 80.69039831534535,
 66.20565386829337,
 57.218210240186714,
 52.79612517136417,
 50.545734235163025,
 49.979624991251065,
 49.73726597614001,
 49.3423833731727,
 48.84153274172893,
 48.7817684820911,
 48.31513781838075,
 48.51176916650391,
 48.657606103224985,
 48.56543548987,
 48.221598575271216,
 47.25082263633774,
 47.394563422556224,
 47.75721063546165,
 47.576782407477964,
 47.48385274395181,
 47.446536821928504,
 47.45045365402771,
 47.406788621527454,
 47.478678821610316,
 47.5851091015949,
 47.62197142478482,
 47.64598955310007,
 47.13798134039112,
 47.219086052305514,
 47.163751330514664,
 47.08569478648077,
 47.02174275005428,
 47.060137068405545,
 46.89273198873461,
 46.896609669714096,
 46.82593846812859,
 46.707396957210406,
 46.7486774820857,
 46.67962600837957,
 46.634792571868324,
 46.5199792202626,
 46.40965220195314,
 46.47091355785436,
 46.579531280890855,
 46.30321129293278,
 46.40367462834705,
 46.3419

#### With normalization, the mean squared error is lower than the MSE without normalization at the corresponding epochs (the model reached a minimum faster than the non-normalized version).
The mean of the MSE values decreased by approximately 4 points, while the standard deviation decreased by about 1 point.

# Thanks for Reviewing the Project.