## Download and Clean Dataset

Let's start by importing the <em>pandas</em> and <em>NumPy</em> libraries.

In [36]:
import pandas as pd
import numpy as np

We will be using the dataset provided in the assignment.

<strong>The dataset describes the compressive strength of different concrete samples as a function of the quantities of the ingredients used to make them. Ingredients include:</strong>

<strong>1. Cement</strong>

<strong>2. Blast Furnace Slag</strong>

<strong>3. Fly Ash</strong>

<strong>4. Water</strong>

<strong>5. Superplasticizer</strong>

<strong>6. Coarse Aggregate</strong>

<strong>7. Fine Aggregate</strong>

Let's read the dataset into a <em>pandas</em> dataframe.

In [37]:
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


So the first concrete sample contains 540 kg of cement, 0 kg of blast furnace slag, 0 kg of fly ash, 162 kg of water, 2.5 kg of superplasticizer, 1040 kg of coarse aggregate, and 676 kg of fine aggregate per cubic meter of mix. This mix, at 28 days old, has a compressive strength of 79.99 MPa.

#### Let's check how many data points we have.

In [38]:
concrete_data.shape

(1030, 9)

So there are about 1000 samples to train our model on. With so few samples, we have to be careful not to overfit the training data.

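Let's look at the summary statistics and check the dataset for any missing values. A count of 1030 in every column of the describe() output already suggests that nothing is missing, and isnull().sum() confirms it.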

In [39]:
concrete_data.describe()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [41]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The data looks very clean and is ready to be used to build our model.
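To see what these two checks would report if values were missing, here is a minimal sketch on a made-up three-row frame. The frame `toy` and its values are hypothetical and are not part of the concrete dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing Cement value (not real data).
toy = pd.DataFrame({
    "Cement": [540.0, np.nan, 332.5],
    "Water": [162.0, 162.0, 228.0],
})

# isnull().sum() counts the missing entries in each column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# describe() only counts non-missing entries, so a count below the
# number of rows is another hint that something is missing.
cement_count = toy.describe().loc["count", "Cement"]
print(cement_count, "of", len(toy), "rows have a Cement value")
```

In our dataset both checks agree: every column has a count of 1030 and zero missing entries.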

#### Split data into predictors and target

The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.

In [42]:
concrete_data_columns = concrete_data.columns
predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column
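The same split can also be written with `DataFrame.drop`, which some readers may find easier to follow than the boolean column mask. The sketch below uses two rows copied from the head() output and keeps only three of the nine columns; the names `sample`, `predictors_alt` and `target_alt` are ours.

```python
import pandas as pd

# Two rows taken from the head() output; only three columns kept for brevity.
sample = pd.DataFrame({
    "Cement": [540.0, 332.5],
    "Age": [28, 270],
    "Strength": [79.99, 40.27],
})

predictors_alt = sample.drop(columns="Strength")  # every column except the target
target_alt = sample["Strength"]                   # the target column as a Series

print(list(predictors_alt.columns))
print(target_alt.tolist())
```

`drop` returns a new frame, so `sample` itself still contains the Strength column afterwards.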

Let's do a quick sanity check of the predictors dataframe and the target series.

In [43]:
predictors.head()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360


In [44]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

In [45]:
n_cols = predictors.shape[1] # number of predictors
n_cols

8

## Import Keras

#### Let's go ahead and import the Keras library

In [46]:
import keras

In [47]:
from keras.models import Sequential
from keras.layers import Dense

In [48]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))

    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
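To make the size of this network concrete, here is a minimal sketch that counts its trainable parameters by hand. It assumes the usual fully connected layout (one weight per input-unit pair plus one bias per unit); the helper `dense_params` is our own name, not part of Keras.

```python
def dense_params(n_inputs, n_units):
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * n_units + n_units

n_cols = 8                                # number of predictors found above
hidden_params = dense_params(n_cols, 10)  # hidden layer: 8*10 weights + 10 biases
output_params = dense_params(10, 1)       # output layer: 10*1 weights + 1 bias
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)
```

The total should agree with the parameter count reported by `model.summary()` for the model returned by `regression_model()`.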

In [49]:
from sklearn.model_selection import train_test_split

In [50]:
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=42)
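As a quick check on what `test_size=0.3` means for our 1030 samples, the arithmetic can be sketched as follows. We assume here that the test count is rounded up to a whole sample and the remainder goes to training.

```python
import math

n_samples = 1030   # rows in concrete_data
test_size = 0.3    # fraction held out for testing

n_test = math.ceil(test_size * n_samples)  # samples held out for evaluation
n_train = n_samples - n_test               # samples left for training

print(n_train, n_test)
```

Under that assumption the model trains on 721 samples and is evaluated on 309, which can be confirmed with `X_train.shape` and `X_test.shape`.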

## Train and Test the Network

In [51]:
model = regression_model()

Train the model for 50 epochs.


In [52]:
epochs = 50
model.fit(X_train, y_train, epochs=epochs, verbose=1)

Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x7f4c8f5f0ee0>

Next we need to evaluate the model on the test data.

In [53]:
value_loss = model.evaluate(X_test, y_test)
y_predict = model.predict(X_test)
print(value_loss)

766.6919555664062
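A loss of about 767 is hard to read because it is in squared units. As a rough sketch of how to put it in context, we can take the square root to get back to MPa and compare it with the spread of Strength reported by describe() earlier. The variable names are ours, and the comparison with the standard deviation is only a rule of thumb.

```python
import math

test_mse = 766.6919555664062   # loss printed by model.evaluate
strength_std = 16.705742       # std of Strength from the describe() output

test_rmse = math.sqrt(test_mse)     # error in the units of Strength (MPa)
ratio = test_rmse / strength_std    # error relative to the spread of the target

print(round(test_rmse, 2), round(ratio, 2))
```

With these numbers the root mean squared error is roughly 27.7 MPa, which is larger than the 16.7 MPa standard deviation of Strength. That suggests the network, at this stage, predicts no better than always guessing the mean strength.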


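Compute the mean squared error with scikit-learn. Because the model was compiled with a mean squared error loss, the result should match the loss returned by model.evaluate up to floating-point precision.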

In [54]:
from sklearn.metrics import mean_squared_error

In [55]:
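mse = mean_squared_error(y_test, y_predict)
# mse is a single number here, so its mean is the value itself
# and its standard deviation is always 0
mean = np.mean(mse)
standard_deviation = np.std(mse)
print("\n",mean, standard_deviation)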


 766.6918965492175 0.0
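The standard deviation of 0.0 is expected: a single value has no spread. The sketch below shows this with the standard library, next to a made-up list of three errors for contrast. The list `several` is hypothetical, and `statistics.pstdev` is used because it computes the population standard deviation, like the default of `np.std`.

```python
import statistics

single = [766.6918965492175]  # the one MSE value computed above
single_mean = statistics.mean(single)
single_std = statistics.pstdev(single)   # spread of a single value is 0.0

several = [40.0, 44.0, 48.0]  # hypothetical MSE values from repeated runs
several_mean = statistics.mean(several)
several_std = statistics.pstdev(several)  # population std, like np.std by default

print(single_mean, single_std)
print(several_mean, several_std)
```

A meaningful standard deviation therefore needs several mean squared errors, one per repetition.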


Create a list of 50 mean squared errors, one per train/test split, and report the mean and the standard deviation of these errors.

In [56]:
total_mse = 50
epochs = 50
mse_list = []
for i in range(0, total_mse):
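    # draw a new random 70/30 split for each repetition
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=i)
    # NOTE: model is the network created earlier and is not re-created here, so each
    # fit() call continues from the previous weights; call regression_model() inside
    # the loop to train every repetition from scratch
    model.fit(X_train, y_train, epochs=epochs, verbose=0)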
    MSE = model.evaluate(X_test, y_test, verbose=0)
    print("MSE "+str(i+1)+": "+str(MSE))
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)
    mse_list.append(mean_square_error)

mean_squared_errors = np.array(mse_list)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print("\n"+"Mean: "+str(mean))
print("Standard Deviation: "+str(standard_deviation))

MSE 1: 479.666259765625
MSE 2: 391.8470764160156
MSE 3: 240.0368194580078
MSE 4: 174.65576171875
MSE 5: 153.21290588378906
MSE 6: 100.96630096435547
MSE 7: 95.82848358154297
MSE 8: 63.980133056640625
MSE 9: 58.31065368652344
MSE 10: 50.848480224609375
MSE 11: 45.13578796386719
MSE 12: 44.296875
MSE 13: 49.43349075317383
MSE 14: 53.03451156616211
MSE 15: 43.50576400756836
MSE 16: 37.47605895996094
MSE 17: 43.242645263671875
MSE 18: 43.59335708618164
MSE 19: 40.94395446777344
MSE 20: 42.996700286865234
MSE 21: 40.4398307800293
MSE 22: 39.10750961303711
MSE 23: 39.997581481933594
MSE 24: 41.860355377197266
MSE 25: 42.929046630859375
MSE 26: 44.75600051879883
MSE 27: 42.03479766845703
MSE 28: 37.91429138183594
MSE 29: 47.31024169921875
MSE 30: 42.43468475341797
MSE 31: 42.110469818115234
MSE 32: 41.79135513305664
MSE 33: 40.460243225097656
MSE 34: 44.27375793457031
MSE 35: 42.02717971801758
MSE 36: 47.51178741455078
MSE 37: 42.50537109375
MSE 38: 49.02501678466797
MSE 39: 44.94594573974609