<H1>Building a Regression Model in Keras</H1>

# Table of Contents


<font size = 3>
    
1. Download and Clean Dataset
2. Import Keras  
3. Build a Neural Network  
4. Train and Test the Network  
5. Evaluate the Model on the Test Data and Compute the Mean Squared Error  
6. Create a List of 50 Mean Squared Errors  
7. Report the Mean and the Standard Deviation of the Mean Squared Errors  

</font>

<h2>Download and clean dataset</h2>

In [1]:
import pandas as pd
import numpy as np

<strong>The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them. Ingredients include:</strong>

<strong>1. Cement</strong>

<strong>2. Blast Furnace Slag</strong>

<strong>3. Fly Ash</strong>

<strong>4. Water</strong>

<strong>5. Superplasticizer</strong>

<strong>6. Coarse Aggregate</strong>

<strong>7. Fine Aggregate</strong>

**_Now we will download the data and load it into a pandas DataFrame_**

In [2]:
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


#### Let's check how many data points we have.

In [3]:
concrete_data.shape

(1030, 9)

So, there are 1,030 samples and 9 columns (8 predictors plus the target). Because the dataset is this small, we have to be careful not to overfit the training data.

In [4]:
concrete_data.describe()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


Now let's check the dataset for missing values.

In [5]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

So, our data has no missing values.

**_Now let's separate the predictor columns from the target column_**

In [6]:
concrete_columns= concrete_data.columns
predictor= concrete_data[concrete_columns[concrete_columns!= "Strength"]]
target = concrete_data["Strength"]

Now let's check both the predictors DataFrame and the target Series.

In [7]:
predictor.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [8]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

**NORMALIZING THE DATA**

In [9]:
predictors_norm = (predictor - predictor.mean()) / predictor.std()
predictors_norm.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
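
The normalization cell standardizes every column by subtracting its mean and dividing by its standard deviation. As a minimal sketch of that arithmetic for a single value, the snippet below reuses the Cement mean and standard deviation printed by `describe()`. The helper name `z_score` is an illustrative assumption and not part of the notebook.

```python
def z_score(value, mean, std):
    """Standardize one value: (value - mean) / std."""
    return (value - mean) / std

# Cement statistics copied from the describe() output above.
# pandas reports the sample standard deviation (ddof=1) by default.
cement_mean = 281.167864
cement_std = 104.506364

# First row of the dataset has Cement = 540.0
z = z_score(540.0, cement_mean, cement_std)
print(round(z, 4))
```

The result should agree, up to rounding, with the Cement entry in the first row of `predictors_norm.head()`.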


_Find the number of predictors, which is the input size of the regression model_

In [10]:
no_of_predictors= predictors_norm.shape[1]
no_of_predictors

8

<h2>Importing Keras</h2>

In [11]:
import keras

Using TensorFlow backend.


Let's import the rest of the packages from the Keras library that we will need to build our regression model.

In [12]:
from keras.models import Sequential
from keras.layers import Dense

<H1>Building the Neural Network for the regression model with 3 hidden layers, each having 10 nodes</H1>

In [13]:
def regression_model():
    model = Sequential()
    # three hidden layers with 10 nodes each, followed by an output layer with one node
    model.add(Dense(10, activation='relu', input_shape=(no_of_predictors,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))

    # use the adam optimizer and mean squared error as the loss
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
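
To get a feel for how small this network is, the sketch below counts the trainable weights and biases of a stack of fully connected layers with the sizes used above (8 inputs, three hidden layers of 10 nodes, 1 output). The helper `dense_param_count` is a hypothetical name introduced here for illustration; it is plain Python and does not call Keras.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        # one weight per input/output pair, plus one bias per output unit
        total += n_in * n_out + n_out
    return total

# 8 predictors, three hidden layers of 10 nodes, one output node
layer_sizes = [8, 10, 10, 10, 1]
total_params = dense_param_count(layer_sizes)
print(total_params)
```

If the arithmetic is right, this figure should match the total parameter count that `model.summary()` reports for the model built by `regression_model()`.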
    

**_1. First import `train_test_split`, then split the data into training (70%) and test (30%) sets_**

In [14]:
from sklearn.model_selection import train_test_split

In [15]:
X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=42)
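
Conceptually, a 70/30 split shuffles the row indices and holds out 30% of them for testing. The following is a rough sketch of that idea in plain Python; the function `split_indices` is an assumption made for illustration and does not reproduce scikit-learn's shuffling, so the selected rows will differ from those returned by `train_test_split`.

```python
import random

def split_indices(n_samples, test_size, seed):
    """Shuffle row indices and cut them into train and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = round(test_size * n_samples)
    return indices[n_test:], indices[:n_test]

# 1030 rows, 30% held out for testing
train_idx, test_idx = split_indices(1030, 0.3, seed=42)
print(len(train_idx), len(test_idx))
```

With 1030 rows, this leaves 721 rows for training and 309 for testing, and no row appears in both sets.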

In [16]:
X_train.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 196 | -0.827393 | -0.856472 | 0.723653 | -0.747734 | 0.216835 | 0.430603 | 1.650364 | -0.279597 |
| 631 | 0.419421 | -0.856472 | -0.846733 | 0.113922 | -1.038638 | 1.15854 | 0.117485 | -0.612034 |
| 81 | 0.360094 | 1.606458 | -0.846733 | -1.211343 | 1.355131 | -1.553862 | 1.332313 | -0.675355 |
| 526 | 0.74476 | -0.636257 | 1.356496 | -1.290952 | 0.785983 | -0.397651 | 0.341992 | -0.675355 |
| 830 | -1.140293 | 1.345678 | 1.465876 | -0.120224 | 2.141895 | -1.735203 | -0.406362 | -0.279597 |


In [17]:
y_train.head()

196    25.72
631    17.54
81     25.20
526    23.64
830    33.76
Name: Strength, dtype: float64

**<h2>Training and Testing the Network</h2>**

**_2. Train the model for 50 epochs_**

In [18]:
model= regression_model()

In [19]:
no_of_epochs= 50
model.fit(X_train, y_train, epochs= no_of_epochs, verbose=1) #verbose = 1, which includes both progress bar and one line per epoch.

Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x7f1ac15e8f98>

**_3. Evaluating the model on the test data and computing the mean squared error between the predicted concrete strength and the actual concrete strength._**

In [20]:
loss= model.evaluate(X_test, y_test)
loss



119.42559881117738

Computing the mean squared error with scikit-learn

In [21]:
from sklearn.metrics import mean_squared_error

In [22]:
y_pred=model.predict(X_test)

In [23]:
MSE = mean_squared_error(y_test, y_pred)
# MSE is a single scalar here, so its mean is the value itself
# and its standard deviation is 0.
mean = np.mean(MSE)
standard_deviation = np.std(MSE)
print(mean, standard_deviation)

119.42560288969909 0.0
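
The standard deviation printed above is 0.0 because only one MSE value has been computed so far. As a small sketch of what the metric does, the snippet below computes the mean squared error by hand on two made-up predictions and then shows that a single value has no spread. The function `mse` and the toy numbers are assumptions for illustration only.

```python
import statistics

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy values, not taken from the concrete dataset
y_true = [3.0, 5.0]
y_pred = [2.0, 7.0]

single_mse = mse(y_true, y_pred)          # ((1)^2 + (2)^2) / 2
spread = statistics.pstdev([single_mse])  # spread of one value
print(single_mse, spread)
```

A meaningful standard deviation needs several MSE values, which is what the next step collects.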


<h3>4. Create a list of 50 mean squared errors.</h3>

In [24]:
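total_MSE = 50  # number of train/evaluate rounds
epochs = 50     # epochs per round
mean_squared_errors = []
for i in range(0, total_MSE):
    # Note: the same model instance keeps training in every round (it is not
    # re-created and the data is not re-split), so these are not independent runs.
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    MSE = model.evaluate(X_test, y_test, verbose=0)
    print("MSE no. {} IS {}:".format(i+1,MSE))
    y_pred = model.predict(X_test)
    # same quantity as the evaluate() loss, recomputed with scikit-learn
    MSE = mean_squared_error(y_test, y_pred)
    mean_squared_errors.append(MSE)

mean_squared_errors = np.array(mean_squared_errors)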


MSE no. 1 IS 77.29472064663291:
MSE no. 2 IS 54.88505966532192:
MSE no. 3 IS 46.18125945156061:
MSE no. 4 IS 40.41512550428076:
MSE no. 5 IS 37.59701029459635:
MSE no. 6 IS 35.67461918704332:
MSE no. 7 IS 36.9023902423945:
MSE no. 8 IS 37.39774676819835:
MSE no. 9 IS 37.31124922372762:
MSE no. 10 IS 36.4053902734059:
MSE no. 11 IS 35.841359666250284:
MSE no. 12 IS 33.57525616864942:
MSE no. 13 IS 33.23331550178404:
MSE no. 14 IS 33.70248655671055:
MSE no. 15 IS 32.3936095808702:
MSE no. 16 IS 33.21037319640126:
MSE no. 17 IS 32.760383649165576:
MSE no. 18 IS 34.44945815003034:
MSE no. 19 IS 32.79441458199017:
MSE no. 20 IS 32.77433485506422:
MSE no. 21 IS 32.20488733458288:
MSE no. 22 IS 31.30714988708496:
MSE no. 23 IS 30.63528795458352:
MSE no. 24 IS 30.836309994694485:
MSE no. 25 IS 31.264220537105423:
MSE no. 26 IS 30.050636612481668:
MSE no. 27 IS 31.020517083819243:
MSE no. 28 IS 30.47375118539557:
MSE no. 29 IS 29.970853280866802:
MSE no. 30 IS 30.779569422157067:
MSE no. 31 IS 
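
The errors above fall steadily because one model keeps training across all rounds. If the goal is to measure run-to-run variability, an alternative is to draw a new split and build a new model in every round. The sketch below shows that loop structure in plain Python. The names `repeated_mse`, `split_indices`, and `mean_baseline` are hypothetical, and `mean_baseline` is only a stand-in for the network; with Keras, the `fit_predict` callable would instead create a fresh model with `regression_model()`, fit it, and return its predictions.

```python
import random
import statistics

def split_indices(n_samples, test_size, rng):
    """Shuffle row indices and return (train, test) index lists."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(test_size * n_samples)
    return indices[n_test:], indices[:n_test]

def repeated_mse(fit_predict, X, y, n_runs=50, test_size=0.3, seed=0):
    """Collect one test MSE per run, each with a new split and a new model."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_runs):
        train, test = split_indices(len(X), test_size, rng)
        # fit_predict must build a brand-new model on every call
        preds = fit_predict([X[i] for i in train],
                            [y[i] for i in train],
                            [X[i] for i in test])
        truth = [y[i] for i in test]
        errors.append(sum((t - p) ** 2 for t, p in zip(truth, preds)) / len(truth))
    return errors

def mean_baseline(X_train, y_train, X_test):
    """Stand-in model: always predicts the mean of the training targets."""
    mean_target = statistics.fmean(y_train)
    return [mean_target] * len(X_test)

# Toy data, not the concrete dataset
X = [[float(i)] for i in range(20)]
y = [2.0 * i for i in range(20)]
errors = repeated_mse(mean_baseline, X, y, n_runs=5)
print(len(errors))
```

Because each round starts from scratch, the resulting list reflects variation due to the split and the initialization rather than the effect of additional training.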

<h3>5. Report the mean and the standard deviation of the mean squared errors.</h3>

In [25]:
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print('\n')
print("The mean of all the MSE for the normalized data and having 3 hidden layers is {}\n".format(mean))
print("Standard Deviation of all the MSE for the normalized data and having 3 hidden layers is {}: ".format(standard_deviation))



The mean of all the MSE for the normalized data and having 3 hidden layers is 33.417440862674724

Standard Deviation of all the MSE for the normalized data and having 3 hidden layers is 7.869824409832897: 
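
Note that `np.std` divides by n by default (population standard deviation), whereas the sample standard deviation divides by n - 1. As a quick sketch of the difference, the snippet below summarizes the first five MSE values from the printed output above, rounded to two decimals, using only the standard library. The variable names are illustrative.

```python
import statistics

# First five MSE values from the output above, rounded to two decimals
mse_values = [77.29, 54.89, 46.18, 40.42, 37.60]

mean_mse = statistics.fmean(mse_values)
population_std = statistics.pstdev(mse_values)  # divides by n, like np.std's default
sample_std = statistics.stdev(mse_values)       # divides by n - 1

print(f"{mean_mse:.2f} +/- {population_std:.2f} (sample std: {sample_std:.2f})")
```

With 50 values the two estimates are close, but the sample version is always slightly larger.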


**<H1>THANK YOU</H1>**