


# Regression in Keras
We will be using the dataset provided in the assignment.

The dataset records the compressive strength of concrete samples, the volumes of the ingredients used to make them, and the age of each sample in days. The ingredients include:

1. Cement

2. Blast Furnace Slag

3. Fly Ash

4. Water

5. Superplasticizer

6. Coarse Aggregate

7. Fine Aggregate


## Load and Clean Dataset
<br>
<b>Import the required Python libraries.</b>

In [29]:
import pandas as pd
import numpy as np

<b>Import Keras</b>

In [30]:
import keras

from keras.models import Sequential
from keras.layers import Dense

<b>Import the scikit-learn utilities</b>

In [31]:
from sklearn.model_selection import train_test_split

<b>Let's read the dataset into a pandas DataFrame.</b>

In [32]:
concrete_data = pd.read_csv('concrete_data.csv')
concrete_data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


The first concrete sample contains 540 cubic meters of cement, 0 cubic meters of blast furnace slag, 0 cubic meters of fly ash, 162 cubic meters of water, 2.5 cubic meters of superplasticizer, 1040 cubic meters of coarse aggregate, and 676 cubic meters of fine aggregate. This mix, at 28 days old, has a compressive strength of 79.99 MPa.

<b>Check how many data points are in the dataset</b>

In [33]:
concrete_data.shape

(1030, 9)

Let's check for missing values in the dataset.

In [34]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The data in the dataset looks clean and is ready to be used to build our model.

<b>Summary of the dataset</b>

In [35]:
concrete_data.describe()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


<b>Splitting data into predictors and target</b>
    
The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.

In [36]:
concrete_data_columns = concrete_data.columns
predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column
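
An equivalent and slightly more direct way to separate the columns is `DataFrame.drop`. The sketch below uses a two-row frame copied from the `head()` output above; the names `sample`, `X`, and `y` are illustrative and not part of the notebook.

```python
import pandas as pd

# Two rows copied from the head() output above (illustrative subset).
sample = pd.DataFrame({
    "Cement": [540.0, 332.5],
    "Blast Furnace Slag": [0.0, 142.5],
    "Fly Ash": [0.0, 0.0],
    "Water": [162.0, 228.0],
    "Superplasticizer": [2.5, 0.0],
    "Coarse Aggregate": [1040.0, 932.0],
    "Fine Aggregate": [676.0, 594.0],
    "Age": [28, 270],
    "Strength": [79.99, 40.27],
})

X = sample.drop(columns="Strength")  # every column except the target
y = sample["Strength"]               # the target column

print(X.shape, y.shape)
```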

Let's do a quick check of the predictors and the target dataframes.

In [37]:
predictors.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [38]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

In [39]:
n_cols = predictors.shape[1] # number of predictors
n_cols

8

<b>Building our regression model.</b>
<br>It has one hidden layer with 10 neurons and a ReLU activation function. It uses the adam optimizer and the mean squared error as the loss function.

In [40]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
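
To make the architecture concrete, the following NumPy sketch reproduces the shape of the computation: 8 inputs, a 10-unit ReLU hidden layer, and a single linear output. The weights are random placeholders and the helper name `forward` is hypothetical; this illustrates what the two `Dense` layers compute, not how Keras implements them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden = 8, 10

# Placeholder parameters with the same shapes as the two Dense layers.
W1 = rng.normal(size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(x):
    """Hidden layer with ReLU, then a linear output unit."""
    hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU activation
    return hidden @ W2 + b2                # no activation: regression output

batch = rng.normal(size=(5, n_inputs))     # 5 made-up samples
predictions = forward(batch)
n_params = W1.size + b1.size + W2.size + b2.size

print(predictions.shape)  # (5, 1)
print(n_params)           # 101
```

The 101 parameters break down as 8×10 weights plus 10 biases for the hidden layer, and 10 weights plus 1 bias for the output layer.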

<b>Splitting the data into training and test sets by holding out 30% of the data for testing</b>

In [41]:
predictors_train, predictors_test, target_train, target_test = train_test_split(predictors, target, test_size=0.3, random_state=42)
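
Conceptually, a hold-out split shuffles the row indices once and reserves a fraction of them for testing. The sketch below shows the idea in NumPy; `holdout_split` is a hypothetical helper, and it is not the algorithm scikit-learn uses internally, so it will not reproduce the same rows as `train_test_split` for a given seed.

```python
import numpy as np

def holdout_split(n_samples, test_size=0.3, seed=42):
    """Return shuffled (train_idx, test_idx) index arrays."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_test = int(round(test_size * n_samples))
    return indices[n_test:], indices[:n_test]

# 1030 is the number of rows reported by concrete_data.shape.
train_idx, test_idx = holdout_split(1030)
print(len(train_idx), len(test_idx))  # 721 309
```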

Let's call the function now to create our model.

In [42]:
# build the model
model = regression_model()

Next, we will train the model for 50 epochs.

In [43]:
# fit the model
epochs = 50
model.fit(predictors_train, target_train, epochs=epochs, verbose=1)

Epoch 1/50
...
Epoch 50/50


<keras.callbacks.callbacks.History at 0x289c8e44668>

Next, we need to evaluate the model on the test data.

In [44]:
loss_val = model.evaluate(predictors_test, target_test)
target_pred = model.predict(predictors_test)
print("Loss value from test data : {}".format(loss_val))

Loss value from test data : 413.6012930762035


Next, we need to compute the mean squared error between the predicted concrete strength and the actual concrete strength.

Let's import the mean_squared_error function from Scikit-learn.

In [45]:
from sklearn.metrics import mean_squared_error

In [46]:
mse = mean_squared_error(target_test, target_pred)
# mse is a single number here, so its mean is the value itself and its standard deviation is 0
mean = np.mean(mse)
std = np.std(mse)
print("Mean : {}, Standard Deviation : {}".format(mean, std))

Mean : 413.6013007622767, Standard Deviation : 0.0
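
The mean squared error is the average of the squared residuals, so it can be checked by hand. The sketch below uses made-up numbers. It also shows a shape pitfall worth knowing: a network with a single output unit returns predictions as a column of shape `(n, 1)`, while the target is one-dimensional, and subtracting the two directly in NumPy broadcasts to an `(n, n)` matrix.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0])        # made-up targets, shape (3,)
y_pred = np.array([[2.0], [5.0], [9.0]])  # made-up predictions, shape (3, 1)

# Flatten the predictions so both arrays have shape (3,).
residuals = y_true - y_pred.ravel()
mse = np.mean(residuals ** 2)
print(mse)  # (1 + 0 + 4) / 3

# Without ravel(), broadcasting silently produces a 3x3 matrix.
print((y_true - y_pred).shape)  # (3, 3)

# A single MSE is one number, so its standard deviation is 0.
print(np.std(mse))  # 0.0
```

This is why the cell above reports a standard deviation of 0.0: a spread only becomes meaningful once several MSE values are collected.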


Create a list of 50 mean squared errors and report the mean and the standard deviation of those errors.
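
The procedure can be sketched independently of Keras. In the outline below, `repeated_holdout` and `LeastSquares` are hypothetical stand-ins (a plain least-squares fit replaces the neural network so the example runs with NumPy alone), and the data is synthetic. The point it illustrates is the structure: a fresh model is built for every repetition, so each MSE reflects training from scratch on that split.

```python
import numpy as np

class LeastSquares:
    """Stand-in model: ordinary least squares with an intercept."""
    def fit(self, X, y):
        A = np.column_stack([X, np.ones(len(X))])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_

def repeated_holdout(build_model, X, y, n_repeats=50, test_size=0.3):
    """Collect one test MSE per random split, with a new model each time."""
    mses = []
    for seed in range(n_repeats):
        idx = np.random.default_rng(seed).permutation(len(X))
        n_test = int(round(test_size * len(X)))
        test, train = idx[:n_test], idx[n_test:]
        model = build_model().fit(X[train], y[train])  # fresh model per split
        mses.append(np.mean((y[test] - model.predict(X[test])) ** 2))
    return np.array(mses)

# Synthetic data: a linear signal plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X @ np.arange(1.0, 9.0) + rng.normal(scale=0.5, size=200)

mses = repeated_holdout(LeastSquares, X, y)
print(len(mses), mses.mean(), mses.std())
```

With Keras, the same pattern would mean calling `regression_model()` inside the loop rather than reusing one `model` instance across repetitions.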

In [47]:
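no_of_mses = 50
epochs = 50
mse_list = []
for i in range(0, no_of_mses):
    # use a different random 70/30 split for each repetition
    predictors_train, predictors_test, target_train, target_test = train_test_split(predictors, target, test_size=0.3, random_state=i)
    # NOTE: the same model instance is reused, so each fit() continues from the
    # weights learned in earlier repetitions; to train from scratch on every
    # split, rebuild it here with model = regression_model()
    model.fit(predictors_train, target_train, epochs=epochs, verbose=0)
    loss_val = model.evaluate(predictors_test, target_test, verbose=0)
    print("{}. Mean squared error {} ".format(i+1, loss_val))
    # recompute the test MSE with scikit-learn and store it
    target_pred = model.predict(predictors_test)
    mse = mean_squared_error(target_test, target_pred)
    mse_list.append(mse)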

mse_array = np.array(mse_list)
mean = np.mean(mse_array)
std = np.std(mse_array)

print('\n*************************************************\n')
print("Mean and Standard deviation of {} mean squared errors without normalized data. Total number of epochs for each training is : {} ".format(no_of_mses, epochs))
print('\n *************************************************\n')
print("Mean : {} and Standard Deviation : {} ".format(mean,std))
print('\n=================================================\n')


1. Mean squared error 108.43270411074741 
2. Mean squared error 111.62944351739482 
3. Mean squared error 93.42067795046711 
4. Mean squared error 104.40515000920466 
5. Mean squared error 84.8316264723497 
6. Mean squared error 74.17859686694099 
7. Mean squared error 74.27852188653544 
8. Mean squared error 67.57202789158497 
9. Mean squared error 55.13566167146257 
10. Mean squared error 55.27983758673313 
11. Mean squared error 56.207173134516744 
12. Mean squared error 44.98101826393103 
13. Mean squared error 61.153713559641425 
14. Mean squared error 54.478682237149826 
15. Mean squared error 55.054148961039424 
16. Mean squared error 52.552983330291454 
17. Mean squared error 49.547627668164694 
18. Mean squared error 51.65475223133865 
19. Mean squared error 47.895432469142676 
20. Mean squared error 47.857161623760334 
21. Mean squared error 45.54852756636043 
22. Mean squared error 52.24282881357137 
23. Mean squared error 45.2308139245487 
24. Mean squared error 46.20223770