<h1 align=center><font size = 5>Regression Models with Keras</font></h1>

In this project, I build a regression model with the Keras deep learning library. I then experiment with the number of training epochs and the number of hidden layers to see how these changes affect the model's performance.

## Download and Clean the Dataset

<strong>The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them. Ingredients include:</strong>

<strong>1. Cement</strong>

<strong>2. Blast Furnace Slag</strong>

<strong>3. Fly Ash</strong>

<strong>4. Water</strong>

<strong>5. Superplasticizer</strong>

<strong>6. Coarse Aggregate</strong>

<strong>7. Fine Aggregate</strong>

In [1]:
import numpy as np
import pandas as pd 

In [2]:
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


In [3]:
concrete_data.shape

(1030, 9)

In [4]:
concrete_data.describe()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [5]:
#Check for null values
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

## Split the data into Predictors and Target 

In [6]:
col = concrete_data.columns
concrete_columns = col[col!='Strength'] 

In [7]:
predictors = concrete_data[concrete_columns]

In [8]:
target = concrete_data['Strength']

In [9]:
predictors.head()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360


In [10]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64
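The same predictor/target split can also be written with `DataFrame.drop`. The sketch below uses a small toy frame (the name `toy` and its three rows are illustrative stand-ins, not the real dataset) so that it runs on its own.

```python
import pandas as pd

# Toy frame standing in for concrete_data (three illustrative rows only)
toy = pd.DataFrame({
    "Cement": [540.0, 332.5, 198.6],
    "Water": [162.0, 228.0, 192.0],
    "Strength": [79.99, 40.27, 44.30],
})

# Every column except the target becomes a predictor
toy_predictors = toy.drop(columns=["Strength"])
toy_target = toy["Strength"]

print(list(toy_predictors.columns))
print(toy_target.shape)
```

Either form gives the same result; `drop` just states the intent ("everything except Strength") in one call.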

## Normalize the Predictors

In [11]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,0.862735,-1.217079,-0.279597
1,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,1.055651,-1.217079,-0.279597
2,0.491187,0.795140,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,3.551340
3,0.491187,0.795140,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,5.055221
4,-0.790075,0.678079,-0.846733,0.488555,-1.038638,0.070492,0.647569,4.976069
...,...,...,...,...,...,...,...,...
1025,-0.045623,0.487998,0.564271,-0.092126,0.451190,-1.322363,-0.065861,-0.279597
1026,0.392628,-0.856472,0.959602,0.675872,0.702285,-1.993711,0.496651,-0.279597
1027,-1.269472,0.759210,0.850222,0.521336,-0.017520,-1.035561,0.080068,-0.279597
1028,-1.168042,1.307430,-0.846733,-0.279443,0.852942,0.214537,0.191074,-0.279597
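The normalization above is a z-score: each column has its mean subtracted and is divided by its standard deviation. As a quick sanity check, the sketch below applies the same expression to a toy frame (the values and the names `toy` and `toy_norm` are illustrative assumptions) and confirms that every column ends up with mean 0 and standard deviation 1.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the predictors (illustrative values only)
toy = pd.DataFrame({
    "Cement": [540.0, 332.5, 198.6, 266.0],
    "Age": [28.0, 270.0, 360.0, 90.0],
})

# Same expression as in the notebook: subtract the column mean,
# divide by the column standard deviation
toy_norm = (toy - toy.mean()) / toy.std()

col_means = toy_norm.mean()
col_stds = toy_norm.std()

print(np.allclose(col_means, 0.0))
print(np.allclose(col_stds, 1.0))
```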


# Part B

Here we import the required Keras modules and build a baseline model.

This model contains:

- One hidden layer of 10 nodes with a ReLU activation function

- For compilation, the Adam optimizer and the mean squared error loss function are used.
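To get a feel for the size of this network, the sketch below counts its trainable parameters by hand. It assumes 8 input features (the 8 predictor columns shown earlier); `dense_params` is a hypothetical helper written for this illustration, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

n_cols = 8  # assumption: the 8 predictor columns (7 ingredients + Age)

hidden = dense_params(n_cols, 10)  # 8*10 weights + 10 biases
output = dense_params(10, 1)       # 10 weights + 1 bias
total = hidden + output

print(hidden, output, total)
```

With so few parameters the baseline is deliberately small, which leaves room to observe the effect of extra epochs or layers later.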

In [12]:
import keras
from keras.models import Sequential
from keras.layers import Dense

Using TensorFlow backend.


In [13]:
def regression_model():
    # Build the model: one hidden layer of 10 ReLU units, one linear output
    n_cols = predictors_norm.shape[1]
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    # Compile with the Adam optimizer and mean squared error loss
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

This is what we are going to do next:

1. Randomly split the data into training and test sets, holding out 30% of the data for testing. I use the train_test_split helper function from scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength.

4. Repeat steps 1-3 50 times to create a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.

In [14]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

all_mse = []
for step in range(50):
    # Split the data: a fresh random 70/30 split on every repetition
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3)
    # Train a new model from scratch
    model = regression_model()
    model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50, verbose=0)
    # Evaluate the model on the test data
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print('MSE for {} step is: {}'.format(step + 1, mse))
    all_mse.append(mse)
print(all_mse)

MSE for 1 step is: 312.46630036611157
MSE for 2 step is: 292.61445059683075
MSE for 3 step is: 269.6607846510145
MSE for 4 step is: 331.9015688692918
MSE for 5 step is: 406.9262200009624
MSE for 6 step is: 388.63756979089806
MSE for 7 step is: 693.3047570020991
MSE for 8 step is: 327.71848207378673
MSE for 9 step is: 212.35751482137897
MSE for 10 step is: 380.8145473682204
MSE for 11 step is: 521.9743476101793
MSE for 12 step is: 457.0622539417305
MSE for 13 step is: 682.6585852070385
MSE for 14 step is: 367.44617531517093
MSE for 15 step is: 319.0091207904404
MSE for 16 step is: 553.711361936188
MSE for 17 step is: 512.093403482732
MSE for 18 step is: 584.7673014871624
MSE for 19 step is: 509.1496388759596
MSE for 20 step is: 517.7069986811024
MSE for 21 step is: 275.7664875725157
MSE for 22 step is: 658.6285598667354
MSE for 23 step is: 341.09528757925585
MSE for 24 step is: 594.6889879569343
MSE for 25 step is: 515.8383599815483
MSE for 26 step is: 284.5166917977887
MSE for 27 step 
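One caveat about the loop above: `predictors_norm` is computed from the full dataset before the split, so the test rows contribute to the mean and standard deviation used for scaling. A common variation, which is not what this notebook does, is to compute those statistics on the training rows only and reuse them for the test rows. The sketch below illustrates the idea on random toy data; it uses a NumPy permutation in place of `train_test_split` to stay dependency-light, and all names and values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Random toy predictors (illustrative only, not the concrete data)
toy = pd.DataFrame(
    rng.normal(loc=100.0, scale=20.0, size=(10, 2)),
    columns=["Cement", "Water"],
)

# 70/30 split by shuffling row positions
order = rng.permutation(len(toy))
n_test = int(round(0.3 * len(toy)))
X_test = toy.iloc[order[:n_test]]
X_train = toy.iloc[order[n_test:]]

# Scaling statistics come from the training rows only
train_mean = X_train.mean()
train_std = X_train.std()

X_train_norm = (X_train - train_mean) / train_std
X_test_norm = (X_test - train_mean) / train_std  # reuses training statistics

print(len(X_train_norm), len(X_test_norm))
```

Whether this changes the reported MSE noticeably for this dataset is not something the results above can tell us; it is a design choice to be aware of when comparing runs.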

Mean and Standard Deviation of All MSEs (Reported)

In [15]:
mean= np.mean(all_mse)
std = np.std(all_mse)
mean_50 = np.around(mean,decimals=4)
std_50 = np.around(std,decimals=4)
print('Mean of all 50 mean squared errors is : ', mean_50)
print('Standard Deviation of all 50 mean squared error is : ',std_50)

Mean of all 50 mean squared errors is :  392.9822
Standard Deviation of all 50 mean squared error is :  125.6754
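A small detail worth knowing when reading the figure above: `np.std` defaults to the population standard deviation (`ddof=0`), whereas the pandas `.std()` used earlier for normalization defaults to the sample standard deviation (`ddof=1`). The sketch below shows the difference on the first four MSE values from the run above, rounded to two decimals; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# First four MSE values from the run above, rounded to two decimals
scores = [312.47, 292.61, 269.66, 331.90]

pop_std = np.std(scores)             # NumPy default: ddof=0 (population)
sample_std = np.std(scores, ddof=1)  # sample standard deviation
pandas_std = pd.Series(scores).std() # pandas default: ddof=1

print(pop_std, sample_std, pandas_std)
```

With 50 repetitions the two conventions differ only slightly, but it helps to state which one a reported value uses.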
