## Download and Clean Dataset

Let's start by importing the <em>pandas</em> and NumPy libraries.

In [1]:
import pandas as pd
import numpy as np

We will be using the dataset provided in the assignment.

<strong>The dataset is about the compressive strength of different samples of concrete based on the amounts of the different ingredients that were used to make them. Ingredients include:</strong>

<strong>1. Cement</strong>

<strong>2. Blast Furnace Slag</strong>

<strong>3. Fly Ash</strong>

<strong>4. Water</strong>

<strong>5. Superplasticizer</strong>

<strong>6. Coarse Aggregate</strong>

<strong>7. Fine Aggregate</strong>

Let's read the dataset into a <em>pandas</em> dataframe.

In [2]:
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


So the first concrete sample contains 540 kg of cement, 0 kg of blast furnace slag, 0 kg of fly ash, 162 kg of water, 2.5 kg of superplasticizer, 1040 kg of coarse aggregate, and 676 kg of fine aggregate per cubic meter of mix. At 28 days old, this mix has a compressive strength of 79.99 MPa.

#### Let's check how many data points we have.

In [3]:
concrete_data.shape

(1030, 9)

So there are 1,030 samples available. Because the dataset is small, we have to be careful not to overfit the training data.

Let's look at summary statistics and check the dataset for any missing values.

In [4]:
concrete_data.describe()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


In [5]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64
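As a small illustration of the two checks above, the following sketch applies them to a toy frame that does contain a gap. The `toy` data is hypothetical and not taken from the concrete dataset. It shows that `isnull().sum()` counts the missing entries per column, while the `count` row of `describe()` counts only the non-null ones.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the concrete dataset.
toy = pd.DataFrame({
    "Cement": [540.0, np.nan, 332.5],
    "Water": [162.0, 162.0, 228.0],
})

# Number of NaN entries in each column.
missing_per_column = toy.isnull().sum()

# describe() counts only non-null values, so a gap lowers the count.
non_null_counts = toy.describe().loc["count"]

print(missing_per_column)
print(non_null_counts)
```

In the concrete dataset every column has a count of 1030 and zero nulls, which is consistent with the two outputs above.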

The data looks very clean and is ready to be used to build our model.

#### Split data into predictors and target

In [6]:
concrete_data_columns = concrete_data.columns
predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column

In [9]:
predictors.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [10]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

Normalize the predictors by subtracting each column's mean and dividing by its standard deviation.

In [11]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
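The normalization formula above can be sanity-checked on a toy frame: after the transform, each column should have a mean of about 0 and a standard deviation of about 1. This is a minimal sketch with hypothetical values in `toy`. Note that pandas' `std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical toy predictors, not taken from the concrete dataset.
toy = pd.DataFrame({
    "Cement": [540.0, 332.5, 198.6, 266.0],
    "Age": [28.0, 270.0, 365.0, 90.0],
})

# Same formula as in the notebook cell; std() defaults to ddof=1.
toy_norm = (toy - toy.mean()) / toy.std()

col_means = toy_norm.mean()  # expected to be ~0 for every column
col_stds = toy_norm.std()    # expected to be ~1 for every column
print(col_means)
print(col_stds)
```

Putting all predictors on a comparable scale keeps large-valued columns such as Coarse Aggregate from dominating the early gradient updates.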


In [12]:
n_cols = predictors_norm.shape[1] # number of predictors
n_cols

8

## Import Keras

In [13]:
import keras

In [14]:
from keras.models import Sequential
from keras.layers import Dense

In [15]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))

    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
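To get a feel for the size of this network, its trainable parameters can be counted by hand. The helper `dense_param_count` below is a hypothetical name introduced for illustration. It uses the standard rule that a fully connected layer has one weight per input-unit pair plus one bias per unit.

```python
def dense_param_count(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


n_cols = 8  # number of predictors, as computed earlier in the notebook

hidden_params = dense_param_count(n_cols, 10)  # 8*10 weights + 10 biases
output_params = dense_param_count(10, 1)       # 10 weights + 1 bias
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 90 11 101
```

With about a hundred parameters and roughly 700 training rows, the model is small relative to the data, which helps limit overfitting.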

In [16]:
from sklearn.model_selection import train_test_split

In [17]:
X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=42)

## Train and Test the Network

In [18]:
# build the model
model = regression_model()

Train the model for 50 epochs.

In [19]:
# fit the model
epochs = 50
model.fit(X_train, y_train, epochs=epochs, verbose=2)

Epoch 1/50
23/23 - 1s - loss: 1586.9479 - 843ms/epoch - 37ms/step
Epoch 2/50
23/23 - 0s - loss: 1567.8469 - 42ms/epoch - 2ms/step
Epoch 3/50
23/23 - 0s - loss: 1548.7368 - 34ms/epoch - 1ms/step
Epoch 4/50
23/23 - 0s - loss: 1529.8589 - 36ms/epoch - 2ms/step
Epoch 5/50
23/23 - 0s - loss: 1510.6420 - 45ms/epoch - 2ms/step
Epoch 6/50
23/23 - 0s - loss: 1491.4749 - 35ms/epoch - 2ms/step
Epoch 7/50
23/23 - 0s - loss: 1471.7322 - 36ms/epoch - 2ms/step
Epoch 8/50
23/23 - 0s - loss: 1451.4570 - 36ms/epoch - 2ms/step
Epoch 9/50
23/23 - 0s - loss: 1430.8984 - 55ms/epoch - 2ms/step
Epoch 10/50
23/23 - 0s - loss: 1409.3481 - 36ms/epoch - 2ms/step
Epoch 11/50
23/23 - 0s - loss: 1387.4720 - 36ms/epoch - 2ms/step
Epoch 12/50
23/23 - 0s - loss: 1364.9342 - 34ms/epoch - 1ms/step
Epoch 13/50
23/23 - 0s - loss: 1341.4254 - 46ms/epoch - 2ms/step
Epoch 14/50
23/23 - 0s - loss: 1317.5692 - 40ms/epoch - 2ms/step
Epoch 15/50
23/23 - 0s - loss: 1292.9131 - 38ms/epoch - 2ms/step
Epoch 16/50
23/23 - 0s - loss: 1

<keras.callbacks.History at 0x7ffa349cd5d0>
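The `23/23` shown for each epoch in the log is the number of batches per epoch. The arithmetic below is a sketch of where that number likely comes from. It assumes that the 30% test split is rounded up to a whole number of rows and that `model.fit` uses the Keras default batch size of 32, since no `batch_size` was passed.

```python
import math

n_samples = 1030   # rows reported by concrete_data.shape
test_size = 0.3    # fraction passed to train_test_split
batch_size = 32    # Keras default when batch_size is not given

# Assumption: the test count is the fraction of rows, rounded up.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# The last batch may be partial, hence the ceiling.
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_test, steps_per_epoch)  # 721 309 23
```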

In [21]:
value_loss = model.evaluate(X_test, y_test)
y_predict = model.predict(X_test)
print(value_loss)

395.0867004394531


Compute the mean squared error with scikit-learn as a cross-check on the loss reported by Keras.

In [22]:
from sklearn.metrics import mean_squared_error

In [23]:
mse = mean_squared_error(y_test, y_predict)
# mse is a single scalar here, so its mean is the value itself and its standard deviation is 0
mean = np.mean(mse)
standard_deviation = np.std(mse)
print(mean, standard_deviation)

395.08669702478545 0.0
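Two remarks on this output can be made concrete with a short sketch. First, taking the square root of the MSE gives an error in the same units as the target (MPa), which is easier to interpret. Second, the standard deviation of a single number is always 0, so it only becomes informative once several scores are available. The `scores` array below holds hypothetical values for illustration only.

```python
import numpy as np

mse = 395.08669702478545   # test MSE printed above
rmse = np.sqrt(mse)        # root mean squared error, in MPa

# The spread of one value is zero, which explains the 0.0 in the output.
std_single = np.std(mse)

# Hypothetical scores from several runs, for illustration only.
scores = np.array([30.0, 34.0, 26.0])

print(rmse, std_single)
print(scores.mean(), scores.std())
```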


In [28]:
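total_mse = 50  # number of random train/test splits to evaluate
epochs = 50
mse_list = []
for i in range(total_mse):
    # a different random_state gives a different 70/30 split on each pass
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=i)
    # note: model is not re-created here, so it keeps the weights learned on earlier splits;
    # call model = regression_model() at this point to train each split from scratch
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    MSE = model.evaluate(X_test, y_test, verbose=0)  # test loss (MSE) reported by Keras
    print(f"MSE {i+1}: {MSE}")
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)  # same metric via scikit-learn
    mse_list.append(mean_square_error)

mse_list = np.array(mse_list)
mean = np.mean(mse_list)
standard_deviation = np.std(mse_list)

print(f"\nMean: {mean}")
print(f"Standard Deviation: {standard_deviation}")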

MSE 1: 31.853252410888672
MSE 2: 34.54195785522461
MSE 3: 26.743545532226562
MSE 4: 29.86432647705078
MSE 5: 30.861968994140625
MSE 6: 33.47861099243164
MSE 7: 35.79701232910156
MSE 8: 29.702604293823242
MSE 9: 30.289812088012695
MSE 10: 30.99469757080078
MSE 11: 31.550439834594727
MSE 12: 28.356761932373047
MSE 13: 35.14707946777344
MSE 14: 35.28018569946289
MSE 15: 29.67169189453125
MSE 16: 25.742422103881836
MSE 17: 30.450796127319336
MSE 18: 31.746219635009766
MSE 19: 27.568599700927734
MSE 20: 29.252843856811523
MSE 21: 28.731002807617188
MSE 22: 28.430152893066406
MSE 23: 25.767080307006836
MSE 24: 28.80436134338379
MSE 25: 30.923486709594727
MSE 26: 30.69460105895996
MSE 27: 26.397815704345703
MSE 28: 26.275312423706055
MSE 29: 33.14794921875
MSE 30: 32.684696197509766
MSE 31: 29.152671813964844
MSE 32: 26.442453384399414
MSE 33: 26.002721786499023
MSE 34: 28.89815330505371
MSE 35: 29.155851364135742
MSE 36: 32.199039459228516
MSE 37: 26.915563583374023
MSE 38: 31.67724800109863
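These MSE values are far lower than the 395.09 from the single run, most likely because the same model instance keeps training across all the splits. Rows that land in a later test set have often been seen during earlier training, so the scores are optimistic. The sketch below shows one way to structure a repeated hold-out evaluation that builds a fresh model for every split. It is an illustration under stated assumptions: `LeastSquaresModel` and `repeated_holdout_mse` are hypothetical names, the data is synthetic, and a plain least-squares fit stands in for the Keras network so that the example runs with NumPy alone.

```python
import numpy as np


class LeastSquaresModel:
    """Hypothetical stand-in for the Keras network: ordinary least squares."""

    def fit(self, X, y):
        A = np.column_stack([X, np.ones(len(X))])  # append an intercept column
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        A = np.column_stack([X, np.ones(len(X))])
        return A @ self.coef_


def repeated_holdout_mse(make_model, X, y, n_repeats=50, test_size=0.3):
    """Return one test MSE per random split, training a fresh model each time."""
    n_test = int(np.ceil(test_size * len(X)))
    scores = []
    for seed in range(n_repeats):
        order = np.random.default_rng(seed).permutation(len(X))
        test_idx, train_idx = order[:n_test], order[n_test:]
        model = make_model()  # new, untrained model for every split
        model.fit(X[train_idx], y[train_idx])
        residuals = y[test_idx] - model.predict(X[test_idx])
        scores.append(np.mean(residuals ** 2))
    return np.array(scores)


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

scores = repeated_holdout_mse(LeastSquaresModel, X, y, n_repeats=10)
print(scores.mean(), scores.std())
```

In the notebook, the equivalent change would be to call `regression_model()` inside the loop before `model.fit`, so that each of the 50 scores comes from a network that has never seen its test rows.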