<H1>Building a Regression Model in Keras</H1>

# Table of Contents


<font size = 3>
    
1. Download and Clean Dataset
2. Import Keras  
3. Build a Neural Network  
4. Train and Test the Network  
5. Evaluate the model on the test data and compute the mean squared error between the predicted and the actual concrete strength  
6. Create a list of 50 mean squared errors  
7. Report the mean and the standard deviation of the mean squared errors

</font>
</div>

<h2>Download and clean dataset</h2>

In [3]:
import pandas as pd
import numpy as np

<strong>The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them. Ingredients include:</strong>

<strong>1. Cement</strong>

<strong>2. Blast Furnace Slag</strong>

<strong>3. Fly Ash</strong>

<strong>4. Water</strong>

<strong>5. Superplasticizer</strong>

<strong>6. Coarse Aggregate</strong>

<strong>7. Fine Aggregate</strong>

**_Now we will download the data and load it into a pandas DataFrame._**

In [10]:
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


#### Let's check how many data points we have.

In [11]:
concrete_data.shape

(1030, 9)

So there are 1030 samples (rows) and 9 columns. With so few samples, we have to be careful not to overfit the training data.

In [12]:
concrete_data.describe()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


Now let's check the dataset for missing values.

In [13]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

So our data has no missing values.

**_Now we separate the predictor columns from the target column._**

In [72]:
concrete_columns= concrete_data.columns
predictor= concrete_data[concrete_columns[concrete_columns!= "Strength"]]
target = concrete_data["Strength"]

Now let's check both the predictors and the target.

In [73]:
predictor.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [74]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

**NORMALIZING THE DATA**

In [75]:
predictors_norm = (predictor - predictor.mean()) / predictor.std()
predictors_norm.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
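
As a quick check of the normalization step, the sketch below applies the same z-score formula, (value - mean) / std, to the Cement value of the first row, using the mean and standard deviation reported by `describe()` earlier. It uses only the standard library so it runs without the dataset; the helper name `z_score` and the toy column are made up for this illustration. pandas' `std()` defaults to the sample standard deviation (ddof=1), which corresponds to `statistics.stdev`.

```python
import statistics

def z_score(value, mean, std):
    """Standardize one value: subtract the mean, divide by the standard deviation."""
    return (value - mean) / std

# Cement column statistics copied from the describe() output
cement_mean = 281.167864
cement_std = 104.506364

z_first_row = z_score(540.0, cement_mean, cement_std)
print(round(z_first_row, 4))  # 2.4767, in line with the first normalized Cement value

# On a toy column (assumed data), standardizing gives mean 0 and sample std 1
toy = [2.0, 4.0, 6.0, 8.0]
toy_mean = statistics.mean(toy)
toy_std = statistics.stdev(toy)  # sample std (ddof=1), like pandas' default
toy_norm = [z_score(v, toy_mean, toy_std) for v in toy]
```

In the notebook the statistics are computed on the full dataset before splitting. Computing them on the training rows only is a common alternative that keeps test-set information out of training.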


_To find the number of predictors, which is the input size of the regression model:_

In [76]:
no_of_predictors= predictors_norm.shape[1]
no_of_predictors

8

<h2>Importing Keras</h2>

In [24]:
import keras

Using TensorFlow backend.


Let's import the rest of the packages from the Keras library that we will need to build our regression model.

In [26]:
from keras.models import Sequential
from keras.layers import Dense

<H1>Building the Neural Network for the regression model</H1>

In [77]:
def regression_model():
    model = Sequential()
    # One hidden layer with 10 nodes (ReLU) and an output layer with a single node
    model.add(Dense(10, activation='relu', input_shape=(no_of_predictors,)))
    model.add(Dense(1))

    # Compile with the Adam optimizer and mean squared error as the loss
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
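
To make the size of this network concrete, here is a minimal standard-library sketch (not Keras code) that counts the trainable parameters of an 8-input network with one 10-node ReLU hidden layer and a single linear output, and runs a forward pass on made-up weights. The helper names `dense_param_count`, `relu` and `forward` are hypothetical and exist only for this illustration.

```python
def dense_param_count(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

def relu(x):
    """Rectified linear unit: negative inputs become 0."""
    return max(0.0, x)

def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: ReLU hidden layer followed by a single linear output node."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(hidden_w, hidden_b)]
    return sum(w * h for w, h in zip(out_w, hidden)) + out_b

n_predictors = 8                                      # value of no_of_predictors in the notebook
hidden_params = dense_param_count(n_predictors, 10)   # 8*10 + 10 = 90
output_params = dense_param_count(10, 1)              # 10*1 + 1 = 11
total_params = hidden_params + output_params          # 101

# Tiny made-up example with 2 inputs and 2 hidden nodes
y = forward([1.0, -2.0],
            hidden_w=[[0.5, 0.5], [1.0, 0.0]], hidden_b=[0.0, 0.0],
            out_w=[2.0, 3.0], out_b=1.0)
print(total_params, y)  # 101 4.0
```

With only 101 parameters the model is small relative to the roughly 1000 samples, which helps with the overfitting concern raised earlier.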
    

**_1. First, import train_test_split and split the data into training and test sets._**

In [78]:
from sklearn.model_selection import train_test_split

In [79]:
X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=42)

In [80]:
X_train.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 196 | -0.827393 | -0.856472 | 0.723653 | -0.747734 | 0.216835 | 0.430603 | 1.650364 | -0.279597 |
| 631 | 0.419421 | -0.856472 | -0.846733 | 0.113922 | -1.038638 | 1.15854 | 0.117485 | -0.612034 |
| 81 | 0.360094 | 1.606458 | -0.846733 | -1.211343 | 1.355131 | -1.553862 | 1.332313 | -0.675355 |
| 526 | 0.74476 | -0.636257 | 1.356496 | -1.290952 | 0.785983 | -0.397651 | 0.341992 | -0.675355 |
| 830 | -1.140293 | 1.345678 | 1.465876 | -0.120224 | 2.141895 | -1.735203 | -0.406362 | -0.279597 |


In [81]:
y_train.head()

196    25.72
631    17.54
81     25.20
526    23.64
830    33.76
Name: Strength, dtype: float64
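
The shuffled row labels (196, 631, 81, ...) show that the split is random rather than sequential. The sketch below illustrates the idea with the standard library only: shuffle the row positions with a fixed seed, then hold out 30% of them. The function `shuffle_split` is a hypothetical helper, and it does not reproduce the exact rows that scikit-learn selects for `random_state=42`.

```python
import random

def shuffle_split(n_rows, test_size=0.3, seed=42):
    """Return (train_indices, test_indices) from a seeded shuffle of row positions."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_size)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = shuffle_split(1030)
print(len(train_idx), len(test_idx))  # 721 309
```

Fixing the seed makes the split reproducible, which is the role `random_state` plays in the call above.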

<h2>Training and Testing the Network</h2>

**_2. Train the model for 100 epochs._**

In [82]:
model= regression_model()

In [83]:
no_of_epochs= 100
model.fit(X_train, y_train, epochs=no_of_epochs, verbose=1)  # verbose=1 shows a progress bar for each epoch

Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78

<keras.callbacks.History at 0x7fd2703d9668>

**_3. Evaluating the model on the test data and computing the mean squared error between the predicted concrete strength and the actual concrete strength._**

In [84]:
loss= model.evaluate(X_test, y_test)
loss



167.7187526172033

Computing the mean squared error with scikit-learn:

In [85]:
from sklearn.metrics import mean_squared_error

In [86]:
y_pred=model.predict(X_test)

In [87]:
MSE = mean_squared_error(y_test, y_pred)
# MSE is a single number here, so its mean is the value itself and its standard deviation is 0
mean = np.mean(MSE)
standard_deviation = np.std(MSE)
print(mean, standard_deviation)

167.7187517306672 0.0
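
For reference, the mean squared error is the average of the squared differences between the actual and predicted values. The sketch below computes it by hand with the standard library. The targets are the first three Strength values shown earlier, the predictions are made up, and the function name `mse` is hypothetical. It also shows why the standard deviation printed above is 0.0: a single number has no spread.

```python
import statistics

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

actual = [79.99, 61.89, 40.27]   # first three Strength values from the dataset
predicted = [70.0, 60.0, 45.0]   # made-up predictions for illustration
error = mse(actual, predicted)

# The standard deviation of one value is always zero
spread = statistics.pstdev([error])
print(round(error, 3), spread)  # 41.915 0.0
```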


<h3>4. Create a list of 50 mean squared errors with 100 epochs.</h3>

In [89]:
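total_MSE = 50
epochs = 100
mean_squared_errors = []
# Note: the same model object is trained further in every iteration (it is not
# re-created), so each MSE reflects 100 more epochs of training than the previous one.
for i in range(0, total_MSE):
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    MSE = model.evaluate(X_test, y_test, verbose=0)
    print("MSE no. {} IS {}:".format(i+1,MSE))
    # Recompute the same test MSE with scikit-learn and store it
    y_pred = model.predict(X_test)
    MSE = mean_squared_error(y_test, y_pred)
    mean_squared_errors.append(MSE)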

mean_squared_errors = np.array(mean_squared_errors)


MSE no. 1 IS 51.57256294917134:
MSE no. 2 IS 44.86564153529294:
MSE no. 3 IS 43.00128704675964:
MSE no. 4 IS 42.21327493414524:
MSE no. 5 IS 42.09751842091384:
MSE no. 6 IS 41.34615369753544:
MSE no. 7 IS 40.420929411854175:
MSE no. 8 IS 39.10606364524866:
MSE no. 9 IS 38.47983234985747:
MSE no. 10 IS 38.68607643275585:
MSE no. 11 IS 38.10269716256645:
MSE no. 12 IS 37.89426817168695:
MSE no. 13 IS 38.010344755302356:
MSE no. 14 IS 37.604843935920194:
MSE no. 15 IS 37.674815890858476:
MSE no. 16 IS 37.727238571759564:
MSE no. 17 IS 37.740910903535614:
MSE no. 18 IS 37.43841913217094:
MSE no. 19 IS 37.32266521145225:
MSE no. 20 IS 37.54914884351218:
MSE no. 21 IS 37.10752325459205:
MSE no. 22 IS 37.05213791807107:
MSE no. 23 IS 37.123387660795046:
MSE no. 24 IS 36.993756093639384:
MSE no. 25 IS 36.99247128137878:
MSE no. 26 IS 36.8751409154108:
MSE no. 27 IS 37.204052625736374:
MSE no. 28 IS 36.96218897378175:
MSE no. 29 IS 36.73423914616162:
MSE no. 30 IS 36.48228911983157:
MSE no. 31 
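
The loop above calls `fit` on the same `model` object each time, so training accumulates across iterations, which is consistent with the steadily falling MSE values. If the goal is 50 independent estimates, one possible approach is to draw a new split and build a new model in every repeat. The sketch below shows that structure with the standard library only. `repeated_test_mse` and `MeanBaseline` are hypothetical names, and the stand-in model simply predicts the training mean. In the notebook, `build_model` would presumably be `regression_model`, with `fit` called using `epochs=100, verbose=0`.

```python
import random
import statistics

class MeanBaseline:
    """Hypothetical stand-in model that always predicts the training mean."""
    def fit(self, X, y):
        self.mean_ = statistics.mean(y)

    def predict(self, X):
        return [self.mean_ for _ in X]

def repeated_test_mse(build_model, X, y, n_repeats=50, test_size=0.3, seed=0):
    """Train a new model on a new random split in each repeat; return the test MSEs."""
    rng = random.Random(seed)
    n_rows = len(X)
    n_test = round(n_rows * test_size)
    errors = []
    for _ in range(n_repeats):
        order = list(range(n_rows))
        rng.shuffle(order)
        test_idx, train_idx = order[:n_test], order[n_test:]
        model = build_model()  # fresh, untrained model every repeat
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = model.predict([X[i] for i in test_idx])
        errors.append(sum((y[i] - p) ** 2 for i, p in zip(test_idx, preds)) / n_test)
    return errors

# Made-up data: 20 rows with one feature each
X_toy = [[float(i)] for i in range(20)]
y_toy = [2.0 * i + 1.0 for i in range(20)]
errors = repeated_test_mse(MeanBaseline, X_toy, y_toy, n_repeats=50)
print(len(errors))  # 50
```

Re-creating the model inside the loop keeps each MSE independent of the previous repeats, so the spread of the 50 values reflects variation from the split and the initialization rather than from additional training.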

<h3>5. Report the mean and the standard deviation of the mean squared errors.</h3>

In [90]:
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print('\n')
print("The mean of all the MSE for the normalized data and {} epochs is {}\n".format(epochs,mean))
print("Standard Deviation of all the MSE for the normalized data and {} epochs is {}: ".format(epochs,standard_deviation))



The mean of all the MSE for the normalized data and 100 epochs is 37.979715716544455

Standard Deviation of all the MSE for the normalized data and 100 epochs is 2.7075057106917626: 
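
One detail worth knowing when reporting the spread: `np.std` computes the population standard deviation by default (it divides by n), while the sample standard deviation divides by n - 1. The standard-library sketch below shows the difference on the first five MSE values printed by the loop (an excerpt, not all 50); the variable names are chosen for this illustration.

```python
import statistics

# First five MSE values printed by the loop (an excerpt, not all 50)
mse_excerpt = [51.57256294917134, 44.86564153529294, 43.00128704675964,
               42.21327493414524, 42.09751842091384]

mean_mse = statistics.mean(mse_excerpt)
population_std = statistics.pstdev(mse_excerpt)  # divides by n, like np.std's default (ddof=0)
sample_std = statistics.stdev(mse_excerpt)       # divides by n - 1, like np.std(..., ddof=1)

print(population_std < sample_std)  # True
```

With 50 values the two estimates differ only slightly, but it helps to state which one is being reported.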


<H1>THANK YOU</H1>