<h1 align=center><font size = 5>Regression Models with Keras to Predict Concrete Strength v0.3</font></h1>

## Introduction

This notebook builds a regression model with Keras to predict the compressive strength of concrete samples. We start by training it for 50 epochs and later increase the number of epochs to 100.
We use the dataset at https://cocl.us/concrete_data to train and test our model.
We also evaluate the model using the mean squared error.
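
As a quick illustration of the metric, the mean squared error is the average of the squared differences between the true and the predicted values. The sketch below uses made-up numbers; the arrays `y_true` and `y_pred` are hypothetical and are not taken from the dataset.

```python
import numpy as np

# Hypothetical true and predicted strengths (illustrative values only)
y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.0, 5.0, 4.0])

# MSE = mean of the squared residuals
mse = np.mean((y_true - y_pred) ** 2)
print(mse)
```

A lower value means the predictions are closer to the targets; the value is expressed in squared units of the target.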

## Preprocessing Dataset

Import the <em>pandas</em> and NumPy libraries and the train_test_split function from scikit-learn.

In [0]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split


<strong>The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them. Ingredients include:</strong>

<strong>1. Cement</strong>

<strong>2. Blast Furnace Slag</strong>

<strong>3. Fly Ash</strong>

<strong>4. Water</strong>

<strong>5. Superplasticizer</strong>

<strong>6. Coarse Aggregate</strong>

<strong>7. Fine Aggregate</strong>

Let's download the data and read it into a <em>pandas</em> dataframe.

In [2]:
data = pd.read_csv('https://cocl.us/concrete_data')
data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


#### Finding the shape of the data

In [3]:
data.shape

(1030, 9)

So, the dataset contains 1030 samples in total for training and testing our model.

Next, we check for missing values in the dataset.




In [4]:
data.describe()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [5]:
data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The dataset looks clean: no null values are found in it. Hence, it is ready for modelling, after we split it into predictors and target and into training and testing sets, of course :)

#### Splitting data

The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.

In [0]:
data_columns = data.columns

X = data[data_columns[data_columns != 'Strength']] # Predictors: all columns except 'Strength'
y = data['Strength'] # Target: 'Strength' column
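
The same predictor/target selection can also be written with `drop`. The sketch below is only an illustration on a toy frame: the name `toy` and its values are hypothetical, and only the column layout mirrors the notebook.

```python
import pandas as pd

# Toy frame with hypothetical values; only the column layout mirrors the notebook
toy = pd.DataFrame({
    "Cement": [540.0, 332.5],
    "Water": [162.0, 228.0],
    "Strength": [79.99, 40.27],
})

# Boolean-mask selection, as in the cell above
cols = toy.columns
X_mask = toy[cols[cols != "Strength"]]

# Equivalent selection with drop()
X_drop = toy.drop(columns="Strength")
y_toy = toy["Strength"]

print(X_drop.columns.tolist())
```

Both forms return every column except the target, so the choice between them is a matter of readability.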


In [7]:
# Normalize the predictors (z-score) before splitting
X_norm = (X - X.mean()) / X.std()
X_norm.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,0.862735,-1.217079,-0.279597
1,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,1.055651,-1.217079,-0.279597
2,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,3.55134
3,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,5.055221
4,-0.790075,0.678079,-0.846733,0.488555,-1.038638,0.070492,0.647569,4.976069
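
To see where these numbers come from, the first row can be checked by hand. This is a minimal sketch that reuses the mean and standard deviation printed by `data.describe()` earlier; the variable names are chosen here for illustration.

```python
# Summary statistics copied from the data.describe() output
cement_mean, cement_std = 281.167864, 104.506364
age_mean, age_std = 45.662136, 63.169912

# First row of the raw data: Cement = 540.0, Age = 28
z_cement = (540.0 - cement_mean) / cement_std
z_age = (28 - age_mean) / age_std

print(round(z_cement, 4), round(z_age, 4))
```

Because the mean and standard deviation are computed on the full dataset, the test rows also influence the scaling. A stricter alternative would be to compute them on the training split only.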


<a id="item2"></a>

In [0]:
# Splitting the dataset into training and testing data.
X_train, X_test, y_train, y_test = train_test_split(X_norm, y, test_size=0.3)
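
The split sizes can be worked out by hand. The arithmetic below is a sketch that assumes the test count is rounded up and the remaining rows go to training.

```python
import math

n_samples = 1030   # rows reported by data.shape
test_size = 0.3

# Assumption: the test count is rounded up, the rest goes to training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

The training count agrees with the 648 + 73 = 721 samples reported in the training log. Passing a fixed `random_state` to `train_test_split` would make the shuffle reproducible between runs.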

Let's do a quick sanity check of the predictors and the target dataframes of training set and testing set.

In [9]:
X_train.head()


Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
838,-1.168999,0.870477,0.965852,-0.30754,1.472309,-0.256179,-0.668286,-0.279597
100,1.3763,0.375573,-0.846733,-1.314367,1.723404,-1.553862,1.415879,-0.612034
766,0.993548,-0.856472,-0.846733,0.20758,-1.038638,-0.088985,-0.131966,-0.501222
98,1.85474,0.520451,-0.846733,-0.021882,0.45119,-1.553862,0.098777,-0.612034
677,-1.714421,0.916838,-0.846733,0.488555,-1.038638,-1.10501,2.100623,-0.612034


In [10]:
X_test.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
781,0.141926,-0.856472,-0.846733,0.488555,-1.038638,1.441484,-0.107021,-0.612034
942,0.307466,0.824116,-0.846733,-0.143638,0.300534,-1.35966,0.628861,-0.279597
750,2.09396,-0.856472,-0.846733,0.863189,-1.038638,1.955927,-2.00285,-0.279597
97,0.89786,0.230695,-0.846733,-2.574071,2.878439,-1.553862,2.731735,-0.612034
895,-0.202551,0.314144,0.372074,-0.494857,0.635327,-0.474817,-0.131966,-0.279597


In [11]:
y_train.head()


838    27.68
100    49.20
766    27.92
98     55.60
677     7.68
Name: Strength, dtype: float64

In [12]:
y_test.head()

781    14.20
942    40.93
750    44.09
97     45.70
895    49.77
Name: Strength, dtype: float64

Let's save the number of predictors to *n_cols* since we will need this number when building our network.

In [0]:
n_cols = X_train.shape[1] # number of predictors

<a id="item1"></a>

<a id='item32'></a>

## Building Keras Model For Prediction

#### Importing the Keras library

In [14]:
import keras

Using TensorFlow backend.


Import the Sequential model and the Dense layer to build the multilayer perceptron (MLP) for this task.

In [0]:
from keras.models import Sequential
from keras.layers import Dense

<a id='item33'></a>

Let's define a function that defines our regression model for us so that we can conveniently call it to create our model.

In [0]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error',metrics=['mean_squared_error'])
    return model

The function above creates a model with one hidden layer of 10 units and a single-unit output layer.
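
The size of this network can be checked with a little arithmetic: a fully connected layer has one weight per input-unit pair plus one bias per unit. In the sketch below, `dense_params` is a hypothetical helper written for this illustration, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

n_cols = 8  # predictors: seven ingredients plus Age

hidden = dense_params(n_cols, 10)  # hidden layer
output = dense_params(10, 1)       # output layer
total = hidden + output

print(hidden, output, total)
```

These counts agree with the 90, 11, and 101 parameters listed by `model.summary()`.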

<a id="item4"></a>

<a id='item34'></a>

## Train and Test the Network

Let's call the function now to create our model.

In [0]:
# build the model
model = regression_model()

Next, we will train and validate the model at the same time using the *fit* method. We will hold out 10% of the training data for validation and train the model for 100 epochs.

In [18]:
# fit the model
model.fit(X_train, y_train, validation_split=0.1, epochs=100, verbose=2)

Train on 648 samples, validate on 73 samples
Epoch 1/100
 - 0s - loss: 1513.6796 - mean_squared_error: 1513.6796 - val_loss: 1654.4695 - val_mean_squared_error: 1654.4695
Epoch 2/100
 - 0s - loss: 1501.1776 - mean_squared_error: 1501.1777 - val_loss: 1641.5714 - val_mean_squared_error: 1641.5714
Epoch 3/100
 - 0s - loss: 1488.5067 - mean_squared_error: 1488.5067 - val_loss: 1627.9216 - val_mean_squared_error: 1627.9215
Epoch 4/100
 - 0s - loss: 1474.9964 - mean_squared_error: 1474.9965 - val_loss: 1613.3196 - val_mean_squared_error: 1613.3196
Epoch 5/100
 - 0s - loss: 1460.5449 - mean_squared_error: 1460.5449 - val_loss: 1598.1988 - val_mean_squared_error: 1598.1989
Epoch 6/100
 - 0s - loss: 1445.1330 - mean_squared_error: 1445.1331 - val_loss: 1581.6379 - val_mean_squared_error: 1581.6379
Epoch 7/100
 - 0s - loss: 1428.5757 - mean_squared_error: 1428.5756 - val_loss: 1563.7683 - val_mean_squared_error: 1563.7682
Epoch 8/100
 - 0s - loss: 1410.6631 - mean_squared_error: 1410.6631 - val

<keras.callbacks.callbacks.History at 0x7f0cd2e92f28>
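
The 648/73 split reported at the top of the log follows from the `validation_split` value. The sketch below assumes the number of samples used for fitting is truncated to an integer and the remainder is used for validation.

```python
n_train_total = 721       # 648 + 73, from the first line of the log
validation_split = 0.1

# Assumption: the training count is truncated to an integer,
# and the remaining samples are used for validation
n_fit = int(n_train_total * (1 - validation_split))
n_val = n_train_total - n_fit

print(n_fit, n_val)
```

Keras takes the validation samples from the end of the arrays passed to `fit`, before any shuffling.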

In [19]:
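import statistics
from sklearn.metrics import mean_squared_error

print(model.summary())

# Repeat training 50 times and record the test-set MSE after each round.
# Note: the same model keeps training, so its weights carry over between rounds.
mse_results = []
for _ in range(50):
    model.fit(X_train, y_train, epochs=100, verbose=0)
    test_predictions = model.predict(X_test)
    mse_results.append(mean_squared_error(y_test, test_predictions))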


Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 10)                90        
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 11        
Total params: 101
Trainable params: 101
Non-trainable params: 0
_________________________________________________________________
None


In [20]:
print("Mean of the list of mean square errors is %s" % statistics.mean(mse_results))
print("Standard deviation of the list of mean square errors is %s" % statistics.stdev(mse_results))


Mean of the list of mean square errors is 39.022622714479965
Standard deviation of the list of mean square errors is 8.959905421423958
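
Note that `statistics.stdev` returns the sample standard deviation (dividing by n - 1), while `statistics.pstdev` returns the population version. The sketch below shows the difference on a hypothetical list of values that is not taken from the results above.

```python
import statistics

# Hypothetical MSE values, for illustration only
toy_mse = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean_mse = statistics.mean(toy_mse)
sample_sd = statistics.stdev(toy_mse)        # divides by n - 1
population_sd = statistics.pstdev(toy_mse)   # divides by n

print(mean_mse, sample_sd, population_sd)
```

With 50 values the two estimates are close, but a few large early values, like the first entries of `mse_results`, can still dominate the spread.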


In [21]:
mse_results

[95.53623556287994,
 61.34297492437536,
 46.327768687157636,
 40.90152068044677,
 39.33184298467229,
 38.61157896147141,
 38.351934860393385,
 37.92625548262455,
 37.909942747636016,
 37.71272506029706,
 37.732639364847856,
 37.82143623245902,
 37.605890473300676,
 37.5806262853749,
 37.55202307528633,
 37.358913378942276,
 37.31249207314606,
 37.21626896141561,
 36.99477835770912,
 36.931318113511445,
 37.00337438998788,
 36.992837167442886,
 37.15422538820508,
 36.91035264204172,
 36.908915698558694,
 36.83421347729007,
 36.918593919154034,
 36.872694305801815,
 36.83024500937844,
 36.7336788880035,
 36.76207097313641,
 36.866004022932046,
 36.88593722536031,
 36.919894446985516,
 36.68794786289023,
 36.64977572703268,
 36.88014416015594,
 36.67713730612922,
 36.82538795966815,
 36.764673309735855,
 36.72093956899923,
 36.76491152064336,
 36.664822881808846,
 36.54199836997531,
 36.53771105530892,
 36.631775879078965,
 36.664538942882665,
 36.46916123600699,
 36.48105479171003,
 36.5

#### Increasing the epochs on this dataset results in a lower error and a better fit. (Sometimes even overfitting :o )
Compared with training for 50 epochs, the mean of the list of MSEs decreased by approx. 15 points, while the standard deviation shrank by 13.4 points.

# Thanks for Reviewing the Project.