## Download and Clean Dataset

Let's start by importing the <em>pandas</em> and NumPy libraries.

In [1]:
import pandas as pd
import numpy as np

We will be using the dataset provided in the assignment.

<strong>The dataset is about the compressive strength of different samples of concrete based on the amounts of the different ingredients that were used to make them. Ingredients include:</strong>

<strong>1. Cement</strong>

<strong>2. Blast Furnace Slag</strong>

<strong>3. Fly Ash</strong>

<strong>4. Water</strong>

<strong>5. Superplasticizer</strong>

<strong>6. Coarse Aggregate</strong>

<strong>7. Fine Aggregate</strong>

Let's read the dataset into a <em>pandas</em> dataframe.

In [3]:
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


So the first concrete sample contains 540 kg of cement, 0 kg of blast furnace slag, 0 kg of fly ash, 162 kg of water, 2.5 kg of superplasticizer, 1040 kg of coarse aggregate, and 676 kg of fine aggregate per cubic meter of mix. At 28 days old, this mix has a compressive strength of 79.99 MPa.

#### Let's check how many data points we have.

In [4]:
concrete_data.shape

(1030, 9)

So there are 1030 samples with 9 columns each: 8 predictors plus the target. With so few samples, we have to be careful not to overfit the training data.

Let's look at the summary statistics and check the dataset for any missing values.

In [5]:
concrete_data.describe()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


In [6]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The data looks very clean and is ready to be used to build our model.
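To make the missing-value check concrete, here is a minimal sketch on a tiny hand-made frame. The `toy` frame and its values are illustrative assumptions, not rows from the concrete dataset; it only shows what `isnull().sum()` reports when a value really is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value, for illustration only.
toy = pd.DataFrame({
    "Cement": [540.0, 332.5, np.nan],
    "Water": [162.0, 228.0, 192.0],
})

missing_per_column = toy.isnull().sum()         # number of NaN entries per column
has_missing = bool(toy.isnull().values.any())   # True if any cell is NaN

print(missing_per_column)
print("Any missing values:", has_missing)
```

For the concrete dataset every count is zero, so no imputation or row dropping is needed.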

#### Split data into predictors and target

The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.

In [7]:
concrete_data_columns = concrete_data.columns
predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column
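An equivalent and arguably more readable way to select the predictors is `DataFrame.drop`. The sketch below uses a hypothetical two-row frame named `toy` in place of `concrete_data`, so the names `toy_predictors` and `toy_target` are illustrative only.

```python
import pandas as pd

# Hypothetical two-row frame standing in for concrete_data.
toy = pd.DataFrame({
    "Cement": [540.0, 332.5],
    "Age": [28, 270],
    "Strength": [79.99, 40.27],
})

toy_predictors = toy.drop(columns=["Strength"])  # every column except the target
toy_target = toy["Strength"]                     # the target as a Series

print(list(toy_predictors.columns))
print(toy_target.tolist())
```

Both approaches return a new object and leave the original frame unchanged.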

Let's do a quick sanity check of the predictors dataframe and the target series.

In [8]:
predictors.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [9]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

The last step is to normalize the predictors by subtracting the mean and dividing by the standard deviation of each column.

In [10]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
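As a quick check of the formula, the sketch below recomputes the first normalized Cement value from the mean and standard deviation reported by `describe()`, which should land on roughly the 2.476712 shown in the first row. It then applies the same expression to a small made-up frame (`toy` is hypothetical). Note that `DataFrame.std()` uses the sample standard deviation (`ddof=1`) by default, so each normalized column ends up with a sample standard deviation of 1.

```python
import numpy as np
import pandas as pd

# Worked check against the summary statistics printed by describe().
cement_value, cement_mean, cement_std = 540.0, 281.167864, 104.506364
z = (cement_value - cement_mean) / cement_std
print(round(z, 4))

# Same formula on a hypothetical toy frame.
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 60.0]})
toy_norm = (toy - toy.mean()) / toy.std()  # DataFrame.std() defaults to ddof=1

print(toy_norm.mean().tolist())  # each column mean is numerically zero
print(toy_norm.std().tolist())   # each column sample std is one
```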


In [11]:
n_cols = predictors_norm.shape[1] # number of predictors

## Import Keras

#### Let's go ahead and import the Keras library

In [12]:
import keras

In [13]:
from keras.models import Sequential
from keras.layers import Dense

In [14]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))

    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
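To see what this architecture amounts to, here is a rough NumPy sketch of the same shape of network: 8 inputs, one hidden layer of 10 ReLU units, and a single linear output. The weights are random placeholders rather than trained values, and the `forward` helper is a hypothetical name used only for illustration. The sketch also counts the trainable parameters by hand.

```python
import numpy as np

n_inputs = 8     # number of predictor columns in this dataset
n_hidden = 10    # units in the hidden Dense layer
n_outputs = 1    # single regression output

# Trainable parameters: weights plus biases for each layer.
hidden_params = n_inputs * n_hidden + n_hidden
output_params = n_hidden * n_outputs + n_outputs
total_params = hidden_params + output_params
print(hidden_params, output_params, total_params)

# Forward pass with random placeholder weights (not trained values).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(n_inputs, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_outputs)), np.zeros(n_outputs)

def forward(x):
    hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU activation
    return hidden @ W2 + b2                # linear output, no activation

batch = rng.normal(size=(5, n_inputs))     # five hypothetical normalized samples
print(forward(batch).shape)
```

With only 101 parameters the network is small, which fits the earlier concern about overfitting a dataset of about a thousand samples.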

In [16]:
from sklearn.model_selection import train_test_split

In [17]:
X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=42)
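The arithmetic behind this split is worth spelling out. The sketch below assumes scikit-learn's behaviour of rounding the test count up and the Keras default batch size of 32 (no `batch_size` is passed to `fit` in this notebook). Under those assumptions the step count matches the `23/23` counter Keras prints for each epoch.

```python
import math

n_samples = 1030      # rows reported by concrete_data.shape
test_fraction = 0.3   # test_size passed to train_test_split
batch_size = 32       # Keras default when batch_size is not given to fit

n_test = math.ceil(n_samples * test_fraction)  # scikit-learn rounds the test count up
n_train = n_samples - n_test
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_test, steps_per_epoch)
```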

## Train and Test the Network

In [18]:
# build the model
model = regression_model()

In [19]:
# fit the model
epochs = 50
model.fit(X_train, y_train, epochs=epochs, verbose=2)

Epoch 1/50
23/23 - 1s - loss: 1642.6193 - 726ms/epoch - 32ms/step
Epoch 2/50
23/23 - 0s - loss: 1625.6422 - 29ms/epoch - 1ms/step
Epoch 3/50
23/23 - 0s - loss: 1609.6696 - 31ms/epoch - 1ms/step
Epoch 4/50
23/23 - 0s - loss: 1593.8514 - 23ms/epoch - 996us/step
Epoch 5/50
23/23 - 0s - loss: 1578.4797 - 34ms/epoch - 1ms/step
Epoch 6/50
23/23 - 0s - loss: 1562.9274 - 24ms/epoch - 1ms/step
Epoch 7/50
23/23 - 0s - loss: 1547.4756 - 21ms/epoch - 915us/step
Epoch 8/50
23/23 - 0s - loss: 1531.6064 - 22ms/epoch - 940us/step
Epoch 9/50
23/23 - 0s - loss: 1515.4249 - 22ms/epoch - 966us/step
Epoch 10/50
23/23 - 0s - loss: 1498.5701 - 30ms/epoch - 1ms/step
Epoch 11/50
23/23 - 0s - loss: 1481.2640 - 24ms/epoch - 1ms/step
Epoch 12/50
23/23 - 0s - loss: 1462.7396 - 24ms/epoch - 1ms/step
Epoch 13/50
23/23 - 0s - loss: 1443.6896 - 20ms/epoch - 881us/step
Epoch 14/50
23/23 - 0s - loss: 1422.9419 - 23ms/epoch - 1ms/step
Epoch 15/50
23/23 - 0s - loss: 1401.2277 - 25ms/epoch - 1ms/step
Epoch 16/50
23/23 - 0s

<keras.callbacks.History at 0x7efee2fe52d0>

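Evaluate the model on the test data. Because the model was compiled with the mean squared error loss and no extra metrics, `model.evaluate` returns the test MSE.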

In [20]:
value_loss = model.evaluate(X_test, y_test)
y_predict = model.predict(X_test)
print(value_loss)

411.6815185546875


Compute the mean squared error with scikit-learn as a cross-check.

In [21]:
from sklearn.metrics import mean_squared_error

In [23]:
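mse = mean_squared_error(y_test, y_predict)  # a single scalar for this one split
mean = np.mean(mse)  # the mean of one value is the value itself
standard_deviation = np.std(mse)  # always 0.0 for a single value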
print(mean, standard_deviation)

411.68151272522397 0.0
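The standard deviation is 0.0 here because only one MSE value exists so far; a spread only becomes meaningful once several runs are collected. The sketch below uses made-up targets and predictions (`y_true` and `y_hat` are hypothetical) to show the MSE formula, the zero spread of a single number, and the root mean squared error, which is in the same units as the target. For reference, an MSE of 411.68 corresponds to an RMSE of about 20.3 MPa.

```python
import numpy as np

# Hypothetical targets and predictions, for illustration only.
y_true = np.array([79.99, 61.89, 40.27, 41.05])
y_hat = np.array([70.0, 60.0, 45.0, 38.0])

mse_single = np.mean((y_true - y_hat) ** 2)  # one scalar for the whole test set
rmse = np.sqrt(mse_single)                   # same units as the target (MPa)

print(mse_single, rmse)
print(np.std(mse_single))  # the spread of a single number is 0.0
```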


Repeat the split, train, and evaluate steps 50 times, collect the 50 mean squared errors, and report their mean and standard deviation.
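A minimal sketch of such a repeated-evaluation loop follows, assuming each repetition should start from a freshly built model so that no weights carry over between splits. To keep it self-contained, ordinary least squares on synthetic data stands in for the Keras network and the concrete dataset; `build_model`, `fit`, and `predict` are hypothetical helpers, not Keras functions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data standing in for predictors_norm and target.
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

def build_model():
    """Return a fresh, untrained model (here just an empty weight holder)."""
    return {"weights": None}

def fit(model, X_train, y_train):
    # Ordinary least squares stands in for training the neural network.
    A = np.column_stack([X_train, np.ones(len(X_train))])
    model["weights"], *_ = np.linalg.lstsq(A, y_train, rcond=None)

def predict(model, X_new):
    A = np.column_stack([X_new, np.ones(len(X_new))])
    return A @ model["weights"]

mse_list = []
for seed in range(50):
    # New random 70/30 split for every repetition.
    order = np.random.default_rng(seed).permutation(len(X))
    n_test = int(np.ceil(0.3 * len(X)))
    test_idx, train_idx = order[:n_test], order[n_test:]

    model = build_model()  # fresh model: nothing carries over between splits
    fit(model, X[train_idx], y[train_idx])
    errors = y[test_idx] - predict(model, X[test_idx])
    mse_list.append(np.mean(errors ** 2))

mse_array = np.array(mse_list)
print("Mean:", mse_array.mean())
print("Standard deviation:", mse_array.std())
```

Rebuilding the model inside the loop keeps each repetition independent: a model that is reused across splits will already have trained on samples that later appear in a test set, which tends to make the reported errors look better than they are.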

In [27]:
total_mse = 50
epochs = 100
mean_list = []
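for i in range(0, total_mse):
    # draw a new 70/30 split for each repetition
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=i)
    # note: the same model instance is reused, so training continues from the previous repetition's weights
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    MSE = model.evaluate(X_test, y_test, verbose=0)
    print("MSE "+str(i+1)+": "+str(MSE))
    y_pred = model.predict(X_test)
    # same quantity as MSE above, recomputed with scikit-learn
    mean_square_error = mean_squared_error(y_test, y_pred)
    mean_list.append(mean_square_error)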

mean_list = np.array(mean_list)
mean = np.mean(mean_list)
standard_deviation = np.std(mean_list)

print("Mean: "+str(mean))
print("Standard Deviation: "+str(standard_deviation))

MSE 1: 43.61479187011719
MSE 2: 45.24892044067383
MSE 3: 32.268672943115234
MSE 4: 34.62186813354492
MSE 5: 37.006141662597656
MSE 6: 36.75034713745117
MSE 7: 40.32994079589844
MSE 8: 30.128551483154297
MSE 9: 32.30052947998047
MSE 10: 31.251461029052734
MSE 11: 31.36478614807129
MSE 12: 31.694019317626953
MSE 13: 36.269752502441406
MSE 14: 36.453887939453125
MSE 15: 28.543113708496094
MSE 16: 27.857086181640625
MSE 17: 31.275650024414062
MSE 18: 31.021648406982422
MSE 19: 30.077129364013672
MSE 20: 32.27180862426758
MSE 21: 27.95002555847168
MSE 22: 29.337949752807617
MSE 23: 29.52682876586914
MSE 24: 28.214445114135742
MSE 25: 31.930389404296875
MSE 26: 32.66525650024414
MSE 27: 27.63722038269043
MSE 28: 29.50016975402832
MSE 29: 33.03107452392578
MSE 30: 32.064186096191406
MSE 31: 27.355655670166016
MSE 32: 27.09772491455078
MSE 33: 27.72898292541504
MSE 34: 29.490209579467773
MSE 35: 32.199092864990234
MSE 36: 36.55816650390625
MSE 37: 26.634899139404297
MSE 38: 31.284866333007812
