<h1 align=center><font size = 5>Regression Models with Keras to Predict Concrete Strength </font></h1>

## Introduction

This notebook builds a regression model with Keras to predict the compressive strength of concrete samples. We start by training it for 50 epochs and will later increase this to 100.
We use the dataset at https://cocl.us/concrete_data to train and test our model.
We evaluate the model using the mean squared error.
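
For reference, mean squared error is the average of the squared differences between the true and the predicted values, so lower is better. A minimal sketch with NumPy follows; the `mse` helper and the numbers are made up for illustration and are not part of the notebook:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Illustrative values only (not taken from the dataset)
y_true = [30.0, 40.0, 50.0]
y_pred = [32.0, 37.0, 50.0]
print(mse(y_true, y_pred))  # (4 + 9 + 0) / 3
```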

## Preprocessing Dataset

We import the <em>pandas</em> and NumPy libraries and the train_test_split function from scikit-learn.

In [0]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split


<strong>The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them. Ingredients include:</strong>

<strong>1. Cement</strong>

<strong>2. Blast Furnace Slag</strong>

<strong>3. Fly Ash</strong>

<strong>4. Water</strong>

<strong>5. Superplasticizer</strong>

<strong>6. Coarse Aggregate</strong>

<strong>7. Fine Aggregate</strong>

Let's download the data and read it into a <em>pandas</em> dataframe.

In [2]:
data = pd.read_csv('https://cocl.us/concrete_data')
data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


#### Finding the shape of the data

In [3]:
data.shape

(1030, 9)

So, there are 1030 samples in total to train and test our model.

Let's look at the summary statistics and then check the dataset for missing values.

In [4]:
data.describe()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [5]:
data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The dataset looks super clean: no null values are found in it. Hence, it is ready for modelling, after splitting it into predictors and target and then into training and testing sets, of course :)

#### Splitting data

The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.

In [0]:
data_columns = data.columns

X = data[data_columns[data_columns != 'Strength']] # Predictors: all columns except 'Strength'
y = data['Strength'] # Target: 'Strength' column
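
An equivalent way to separate the predictors from the target is `DataFrame.drop`. The sketch below rebuilds by hand the five rows shown by `data.head()` (the `sample` frame exists only for this illustration) and checks that both approaches select the same columns:

```python
import pandas as pd

# The five rows printed by data.head(), rebuilt by hand for illustration
sample = pd.DataFrame({
    'Cement': [540.0, 540.0, 332.5, 332.5, 198.6],
    'Blast Furnace Slag': [0.0, 0.0, 142.5, 142.5, 132.4],
    'Fly Ash': [0.0, 0.0, 0.0, 0.0, 0.0],
    'Water': [162.0, 162.0, 228.0, 228.0, 192.0],
    'Superplasticizer': [2.5, 2.5, 0.0, 0.0, 0.0],
    'Coarse Aggregate': [1040.0, 1055.0, 932.0, 932.0, 978.4],
    'Fine Aggregate': [676.0, 676.0, 594.0, 594.0, 825.5],
    'Age': [28, 28, 270, 365, 360],
    'Strength': [79.99, 61.89, 40.27, 41.05, 44.30],
})

# Boolean-mask selection, as in the notebook
X_mask = sample[sample.columns[sample.columns != 'Strength']]
# Same selection with drop(), which reads a little more directly
X_drop = sample.drop(columns='Strength')
y_sample = sample['Strength']

print(X_drop.equals(X_mask))
print(X_drop.shape, y_sample.shape)
```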


<a id="item2"></a>

In [0]:
# Splitting the dataset into training (70%) and testing (30%) data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
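
Because `train_test_split` shuffles the rows, each run of the cell above yields a different split. A minimal sketch, assuming repeatable results are wanted, passes a fixed `random_state`; the seed value 42 and the placeholder arrays are arbitrary choices and not from the notebook:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as the concrete dataset
# (1030 rows, 8 predictors); the values themselves are meaningless
X_demo = np.arange(1030 * 8).reshape(1030, 8)
y_demo = np.arange(1030)

# A fixed seed makes the shuffle, and therefore the split, repeatable
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42)

print(X_tr.shape, X_te.shape)
print(np.array_equal(X_tr, X_tr2))
```

With 1030 rows and `test_size=0.3`, this leaves 721 rows for training and 309 for testing.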

Let's do a quick sanity check of the predictors and the targets in the training and testing sets.

In [8]:
X_train.head()


Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
260,212.6,0.0,100.4,159.4,10.4,1003.8,903.8,14
83,362.6,189.0,0.0,164.9,11.6,944.7,755.8,3
909,146.0,173.0,0.0,182.0,3.0,986.0,817.0,28
534,393.0,0.0,0.0,192.0,0.0,940.6,785.6,3
490,387.0,20.0,94.0,157.0,11.6,938.0,845.0,3


In [9]:
X_test.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
477,446.0,24.0,79.0,162.0,11.6,967.0,712.0,3
210,230.0,0.0,118.3,195.5,4.6,1029.4,758.6,14
513,424.0,22.0,132.0,168.0,8.9,822.0,750.0,7
831,154.0,144.0,112.0,220.0,10.0,923.0,658.0,28
457,251.4,0.0,118.3,192.9,5.8,1043.6,754.3,56


In [10]:
y_train.head()


260    25.37
83     35.30
909    23.74
534    19.20
490    34.77
Name: Strength, dtype: float64

In [11]:
y_test.head()

477    23.35
210    20.08
513    40.29
831    16.50
457    39.27
Name: Strength, dtype: float64

Let's save the number of predictors to *n_cols* since we will need this number when building our network.

In [0]:
n_cols = X_train.shape[1] # number of predictors

<a id="item1"></a>

<a id='item32'></a>

## Building Keras Model For Prediction

#### Importing the Keras library

In [13]:
import keras

Using TensorFlow backend.


We import the Sequential model and the Dense layer to build the multilayer perceptron (MLP) used in this notebook.

In [0]:
from keras.models import Sequential
from keras.layers import Dense

<a id='item33'></a>

Let's define a function that builds and compiles our regression model so that we can conveniently call it whenever we need a new model.

In [0]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mean_squared_error'])
    return model

The above function creates a model that has one hidden layer of 10 units, followed by an output layer with a single unit.
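
The parameter counts that `model.summary()` reports for this architecture follow from the layer sizes: a dense layer has one weight per input-unit pair plus one bias per unit. A small sketch of that arithmetic, assuming the 8 predictor columns of `X_train` (`dense_params` is a helper written only for this illustration):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

n_cols = 8  # number of predictor columns in X_train
hidden = dense_params(n_cols, 10)  # hidden layer with 10 units
output = dense_params(10, 1)       # output layer with a single unit
print(hidden, output, hidden + output)  # 90 11 101
```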

<a id="item4"></a>

<a id='item34'></a>

## Train and Test the Network

Let's call the function now to create our model.

In [0]:
# build the model
model = regression_model()

Next, we will train and validate the model at the same time using the *fit* method. We will hold out 10% of the training data for validation and train the model for 50 epochs.
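
With `validation_split`, Keras sets aside the last fraction of the samples passed to `fit`, taken before any shuffling, and does not train on them. The sketch below approximates that bookkeeping for the 721 training rows left by the 70/30 split; the exact rounding rule used inside Keras is an assumption here:

```python
n_train_rows = 721       # rows in X_train after the 70/30 split of 1030 samples
validation_split = 0.1

# Assumed rule: the first (1 - validation_split) share is used for fitting,
# the remainder is held out for validation
split_at = int(n_train_rows * (1 - validation_split))
n_fit = split_at
n_val = n_train_rows - split_at
print(n_fit, n_val)  # 648 73
```

These sizes agree with the "Train on 648 samples, validate on 73 samples" line in the training log.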

In [17]:
# fit the model
model.fit(X_train, y_train, validation_split=0.1, epochs=50, verbose=2)

Train on 648 samples, validate on 73 samples
Epoch 1/50
 - 0s - loss: 748695.6358 - mean_squared_error: 748695.5625 - val_loss: 626644.4923 - val_mean_squared_error: 626644.5000
Epoch 2/50
 - 0s - loss: 535679.8376 - mean_squared_error: 535679.8750 - val_loss: 438332.3048 - val_mean_squared_error: 438332.3125
Epoch 3/50
 - 0s - loss: 361926.2901 - mean_squared_error: 361926.2812 - val_loss: 280495.7440 - val_mean_squared_error: 280495.7500
Epoch 4/50
 - 0s - loss: 214782.7542 - mean_squared_error: 214782.7344 - val_loss: 150644.3793 - val_mean_squared_error: 150644.3750
Epoch 5/50
 - 0s - loss: 103910.1323 - mean_squared_error: 103910.1250 - val_loss: 61258.4265 - val_mean_squared_error: 61258.4258
Epoch 6/50
 - 0s - loss: 36540.5595 - mean_squared_error: 36540.5547 - val_loss: 17279.3753 - val_mean_squared_error: 17279.3750
Epoch 7/50
 - 0s - loss: 10419.3080 - mean_squared_error: 10419.3076 - val_loss: 5795.7761 - val_mean_squared_error: 5795.7759
Epoch 8/50
 - 0s - loss: 5335.3998 -

<keras.callbacks.callbacks.History at 0x7f92745bf8d0>
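
As the output above shows, `fit` returns a `History` object. A minimal sketch of keeping that object and reading the per-epoch losses from it is given below. It uses random placeholder data rather than the concrete dataset, and the array shapes, epoch count, and variable names are illustrative assumptions:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Random placeholder data: 100 samples, 8 predictors (not the concrete dataset)
rng = np.random.default_rng(0)
X_demo = rng.random((100, 8))
y_demo = rng.random(100)

# Same architecture as regression_model(), rebuilt here to stay self-contained
demo_model = Sequential()
demo_model.add(Dense(10, activation='relu', input_shape=(8,)))
demo_model.add(Dense(1))
demo_model.compile(optimizer='adam', loss='mean_squared_error')

# Keep the History object instead of discarding it
history = demo_model.fit(X_demo, y_demo, validation_split=0.1, epochs=3, verbose=0)

# history.history maps each tracked quantity to a list with one value per epoch
print(history.history['loss'])
print(history.history['val_loss'])
```

The two lists can then be plotted or compared to see whether the validation loss stops improving while the training loss keeps falling.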

In [18]:
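import statistics
from sklearn.metrics import mean_squared_error

print(model.summary())
mse_results = []
# Note: the same model keeps training, so each iteration adds 50 more epochs
for _ in range(50):
    model.fit(X_train, y_train, epochs=50, verbose=0)
    # Predict on the held-out test set and record the test MSE
    test_predictions = model.predict(X_test)
    mse_results.append(mean_squared_error(y_test, test_predictions))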


Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 10)                90        
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 11        
Total params: 101
Trainable params: 101
Non-trainable params: 0
_________________________________________________________________
None


In [19]:
print(f"Mean of the list of mean square errors is {statistics.mean(mse_results)}")
print(f"Standard deviation of the list of mean square errors is {statistics.stdev(mse_results)}")


Mean of the list of mean square errors is 58.45680421702066
Standard deviation of the list of mean square errors is 23.748629803496954


In [20]:
mse_results

[139.6727210916912,
 117.56598575384467,
 108.97119341989183,
 105.55911263681432,
 111.38562402687256,
 103.8560192200769,
 101.461786940363,
 85.52588200461496,
 61.48434906963855,
 59.074719039081415,
 73.29097601709664,
 54.94494182869279,
 51.04045537767606,
 49.83619128435677,
 49.79017483789452,
 49.23035396760285,
 50.15574992467089,
 47.289455086332936,
 49.69610079567766,
 57.720837306834355,
 45.813813613433474,
 47.30990430276002,
 45.72322232309048,
 46.578246425785096,
 45.64815854496747,
 45.597314145273934,
 45.38177170050724,
 47.591464187397385,
 45.4064773701017,
 45.26048290807247,
 52.56980508939106,
 48.03699869861012,
 48.46304285481455,
 47.4904693166964,
 44.292609427718254,
 45.68484828907662,
 44.81797193923187,
 45.70341335396679,
 46.77445646469428,
 44.54949286464819,
 44.35973569987803,
 46.13968125477412,
 45.6870024556592,
 57.90292827461673,
 45.26763994232453,
 50.96869212902154,
 44.271158100630544,
 44.08247226619603,
 44.42249180833807,
 43.4918154
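
Note that the loop keeps training the same model, so the 50 values are not independent runs: each MSE is measured after another 50 epochs on top of all the previous ones. A small sketch, using ten values copied from the list above, compares the first five iterations with five later ones to show how much the early iterations pull up the overall mean and standard deviation:

```python
import statistics

# Values copied from mse_results above: iterations 1-5 and 41-45
early = [139.6727210916912, 117.56598575384467, 108.97119341989183,
         105.55911263681432, 111.38562402687256]
late = [44.35973569987803, 46.13968125477412, 45.6870024556592,
        57.90292827461673, 45.26763994232453]

print("early mean:", statistics.mean(early), "stdev:", statistics.stdev(early))
print("late mean:", statistics.mean(late), "stdev:", statistics.stdev(late))
```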

# Thanks for Reviewing the Project.