# Coursera assignment: Build a Regression Model in Keras

## A. Build a baseline model (5 marks)

Use the Keras library to build a neural network with the following:

- One hidden layer of 10 nodes, and a ReLU activation function

- Use the adam optimizer and the mean squared error as the loss function.

1. Randomly split the data into training and test sets, holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.

Submit your Jupyter Notebook with your code and comments.

### Program Start

The concrete strength dataset has the following predictors:

- Cement
- Blast Furnace Slag
- Fly Ash
- Water
- Superplasticizer
- Coarse Aggregate
- Fine Aggregate
- Age

Download the CSV file:

In [1]:
!wget -O concrete_data.csv https://cocl.us/concrete_data

--2020-04-09 03:44:54--  https://cocl.us/concrete_data
Resolving cocl.us (cocl.us)... 158.85.108.86, 158.85.108.83, 169.48.113.194
Connecting to cocl.us (cocl.us)|158.85.108.86|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv [following]
--2020-04-09 03:44:55--  https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv
Resolving s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)... 67.228.254.196
Connecting to s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)|67.228.254.196|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 58988 (58K) [text/csv]
Saving to: ‘concrete_data.csv’


2020-04-09 03:44:55 (1.31 MB/s) - ‘concrete_data.csv’ saved [58988/58988]



In [2]:
import pandas as pd
import numpy as np

In [3]:
df_concrete_data = pd.read_csv('concrete_data.csv')
print("The shape is" , df_concrete_data.shape)
df_concrete_data.head()

The shape is (1030, 9)


| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


In [4]:
df_concrete_data.describe()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


In [5]:
# check every column for missing (null) values before modelling
df_concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

No column contains missing values, so the dataframe can be used as is.

## Split data into Predictor and Target

In [6]:
concrete_data_columns = df_concrete_data.columns

df_predictor = df_concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
df_target = df_concrete_data['Strength'] # Strength column

In [7]:
df_predictor.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [8]:
df_target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

In [9]:
print("The shape of the predictor is", df_predictor.shape)

n_cols = df_predictor.shape[1] # number of predictors
print("Input shape:", n_cols)

The shape of the predictor is (1030, 8)
Input shape: 8


## Build model using Keras

In [10]:
import keras
from keras.models import Sequential
from keras.layers import Dense

Using TensorFlow backend.


In [11]:
# define a regression model

def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

## Train Test Split (holding 30% of the data for testing)

In [12]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df_predictor, df_target, test_size=0.3, random_state=42)

print("# of test samples :", X_test.shape[0])
print("# of training samples:",X_train.shape[0])

# of test samples : 309
# of training samples: 721
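
To make the hold-out step concrete, here is a minimal pure-Python sketch of a random 70/30 split. It is not Scikit-learn's implementation: the helper name `holdout_split` is hypothetical, and rounding the test size up is an assumption chosen so that 1030 rows give the 309/721 split printed above.

```python
import math
import random


def holdout_split(n_rows, test_fraction, seed):
    """Return (train_idx, test_idx) for a random hold-out split (illustrative helper)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)         # seeded shuffle keeps the split repeatable
    n_test = math.ceil(n_rows * test_fraction)   # assumption: test size is rounded up
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = holdout_split(1030, 0.3, seed=42)
print(len(train_idx), len(test_idx))
```

Changing the seed changes which rows land in each set but not the set sizes, which is why repeating the split with different seeds gives a distribution of test errors rather than a single number.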


## Train Model

In [13]:
# build the model
model = regression_model()
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 10)                90        
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 11        
Total params: 101
Trainable params: 101
Non-trainable params: 0
_________________________________________________________________
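
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The short sketch below reproduces the numbers for 8 predictors, 10 hidden nodes and 1 output; `dense_param_count` is a hypothetical helper written for this check, not a Keras function.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


hidden = dense_param_count(8, 10)   # 8 predictors -> 10 hidden nodes
output = dense_param_count(10, 1)   # 10 hidden nodes -> 1 output node
print(hidden, output, hidden + output)  # 90 11 101
```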


In [14]:
# Train the model with 50 epochs (fit)
epochs = 50
model.fit(X_train, y_train, epochs=epochs, verbose=1)

Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x7f1a5e0bb6d8>

## Evaluate The Model

In [15]:
scores = model.evaluate(X_test, y_test)  # returns the test loss, i.e. the MSE the model was compiled with
# scores = model.evaluate(X_test, y_test, verbose=0)  # same result, without the progress output
y_pred = model.predict(X_test)

scores



908.5862559099413

## Compute the mean squared error

In [16]:
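from sklearn.metrics import mean_squared_error

# MSE between the actual and the predicted strength for this single split
mean_square_error = mean_squared_error(y_test, y_pred)

# With only one MSE value, the mean is the value itself and the standard
# deviation is 0; the spread only becomes meaningful across repeated runs.
mean = np.mean(mean_square_error)
standard_deviation = np.std(mean_square_error)
print(mean, standard_deviation)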

908.5862444680388 0.0
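
For reference, the mean squared error is the average of the squared differences between actual and predicted values. The sketch below spells that out in plain Python; the function `mse` and the three sample values are made up for illustration and are not taken from the concrete data.

```python
def mse(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


actual = [3.0, 5.0, 7.0]      # made-up values
predicted = [2.0, 5.0, 9.0]   # made-up values
print(mse(actual, predicted))  # (1 + 0 + 4) / 3
```

Because the errors are squared, the MSE is expressed in squared units of the target, so taking its square root gives a figure that is easier to compare with the Strength column.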


## Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

In [17]:
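epochs = 50

total_mean_squared_errors = 50
mean_squared_errors = []

# Caveat: `model` is the network already trained above and is not re-created
# inside the loop, so its weights carry over from one split to the next.
# Calling regression_model() at the top of each iteration would make the runs independent.
for i in range(total_mean_squared_errors):
    # step 1: a new random 70/30 split per repetition (seeded by i for reproducibility)
    X_train, X_test, y_train, y_test = train_test_split(df_predictor, df_target, test_size=0.3, random_state=i)
    # step 2: train for 50 epochs
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    # step 3: test-set MSE, once from Keras (the compiled loss) and once from Scikit-learn
    MSE = model.evaluate(X_test, y_test, verbose=0)
    print("Mean Squared Errors "+str(i+1)+": "+str(MSE))
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)
    mean_squared_errors.append(mean_square_error)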


Mean Squared Errors 1: 298.3789247185667
Mean Squared Errors 2: 134.99038832087348
Mean Squared Errors 3: 109.17553066512913
Mean Squared Errors 4: 121.54464632793538
Mean Squared Errors 5: 121.36958821614583
Mean Squared Errors 6: 108.52132472559857
Mean Squared Errors 7: 139.07430457451582
Mean Squared Errors 8: 98.33745918150473
Mean Squared Errors 9: 119.21499495521718
Mean Squared Errors 10: 112.80297130597062
Mean Squared Errors 11: 107.09846496582031
Mean Squared Errors 12: 103.01061735184061
Mean Squared Errors 13: 115.70322788571849
Mean Squared Errors 14: 116.21136198074686
Mean Squared Errors 15: 108.96208474859836
Mean Squared Errors 16: 110.1343563042798
Mean Squared Errors 17: 104.38500601265423
Mean Squared Errors 18: 124.60925534936602
Mean Squared Errors 19: 96.07111941340672
Mean Squared Errors 20: 142.77576977381042
Mean Squared Errors 21: 121.81986135340817
Mean Squared Errors 22: 105.56562950850305
Mean Squared Errors 23: 107.37049665728819
Mean Squared Errors 24: 
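
Calling `fit` again on an existing Keras model continues training from its current weights, so whether the model is created inside or outside the loop changes what each repetition measures. The sketch below illustrates the difference with a stand-in class that only counts epochs; `EpochCounter`, `run_reused` and `run_fresh` are hypothetical names, and no real training takes place.

```python
class EpochCounter:
    """Stand-in for a model: it only records how many epochs it has been trained for."""

    def __init__(self):
        self.epochs_seen = 0

    def fit(self, epochs):
        # Training continues from the current state, so the count accumulates.
        self.epochs_seen += epochs


def run_reused(n_repeats, epochs):
    model = EpochCounter()            # created once, outside the loop
    history = []
    for _ in range(n_repeats):
        model.fit(epochs)
        history.append(model.epochs_seen)
    return history


def run_fresh(n_repeats, epochs):
    history = []
    for _ in range(n_repeats):
        model = EpochCounter()        # re-created for every repetition
        model.fit(epochs)
        history.append(model.epochs_seen)
    return history


print(run_reused(3, 50))  # [50, 100, 150]
print(run_fresh(3, 50))   # [50, 50, 50]
```

With a reused model, later repetitions have effectively trained for far more than 50 epochs and have already seen rows that end up in later test sets, so their errors are not comparable to a fresh 50-epoch run.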

## Report the mean and the standard deviation of the mean squared errors.

In [18]:
mean_squared_errors = np.array(mean_squared_errors)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print("Mean: "+str(mean))
print("Standard Deviation: "+str(standard_deviation))

Mean: 119.70534892302643
Standard Deviation: 27.74090062286817
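
One detail worth knowing when reading the standard deviation: `np.std` uses `ddof=0` by default, which is the population formula (divide by n), not the sample formula (divide by n - 1). The sketch below shows both with Python's `statistics` module; the four error values are made up for illustration and are not results from this notebook.

```python
import statistics

errors = [110.0, 120.0, 130.0, 140.0]  # made-up MSE values for illustration

mean = statistics.mean(errors)
population_std = statistics.pstdev(errors)  # divides by n, like np.std with its default ddof=0
sample_std = statistics.stdev(errors)       # divides by n - 1, like np.std(..., ddof=1)
print(mean, population_std, sample_std)
```

With 50 repetitions the two formulas differ by only about 1%, so either is reasonable as long as the report states which one was used.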
