**Let's start by importing the pandas, NumPy, and Matplotlib libraries.**

In [23]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


In [24]:
from google.colab import files
uploaded = files.upload()

Saving concrete_data.csv to concrete_data (2).csv


We will be using the dataset provided in the assignment.

The dataset describes the compressive strength of different concrete samples as a function of the amounts of the ingredients used to make them, measured in kg per cubic meter of mixture. Ingredients include:

1. Cement

2. Blast Furnace Slag

3. Fly Ash

4. Water

5. Superplasticizer

6. Coarse Aggregate

7. Fine Aggregate

Let's read the dataset into a pandas DataFrame.

In [25]:
import io
df = pd.read_csv(io.BytesIO(uploaded['concrete_data.csv']))

In [26]:
print(df.shape)
df.head()

(1030, 9)


| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


So the first concrete sample contains 540 kg of cement, 0 kg of blast furnace slag, 0 kg of fly ash, 162 kg of water, 2.5 kg of superplasticizer, 1040 kg of coarse aggregate, and 676 kg of fine aggregate per cubic meter of mixture. This mix, at 28 days old, has a compressive strength of 79.99 MPa.

In [27]:
df.shape

(1030, 9)

So there are 1,030 samples in total to train and test our model on. Because this is a small dataset, we have to be careful not to overfit the training data.

Let's check the dataset for any missing values.

In [28]:
df.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64


The data looks very clean and is ready to be used to build our model.

**Split data into predictors and target**

**The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.**

In [29]:
target = np.array(df['Strength'])
target = target.reshape(-1,1)

In [30]:
# Keep every column except the target
predictors = df.drop(columns='Strength')

In [31]:
predictors.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


**Let's import scikit-learn in order to randomly split the data into training and test sets.**

In [32]:
from sklearn.model_selection import train_test_split

**Split the data into training and test sets, holding out 30% of the data for testing.**

In [42]:
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size = 0.3, random_state = 42)

In [43]:
X_train.shape

(721, 8)

In [44]:
y_train.shape

(721, 1)
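
To make the shapes above concrete, here is a minimal NumPy sketch of a shuffled hold-out split. The helper name `holdout_split` is hypothetical, and the sketch assumes the test share is rounded up, which reproduces the 721/309 row counts for 1,030 samples. The shuffled indices will not match the ones scikit-learn produces for `random_state = 42`.

```python
import math
import numpy as np

def holdout_split(n_samples, test_size=0.3, seed=42):
    """Return (train_idx, test_idx) for a shuffled hold-out split."""
    n_test = math.ceil(n_samples * test_size)  # round the test share up
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)      # random order of row indices
    return shuffled[n_test:], shuffled[:n_test]

train_idx, test_idx = holdout_split(1030)
print(len(train_idx), len(test_idx))
```

Fixing the seed makes the split repeatable, which is the role `random_state` plays in the cell above.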

In [46]:
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training set only, then apply the same transform to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
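
The scaling step can be sketched by hand in NumPy: the column means and standard deviations come from the training rows only and are then reused for the test rows. The helper names and the toy numbers below are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def fit_standardizer(X_train):
    """Compute per-column mean and standard deviation from the training data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)   # population standard deviation (ddof=0)
    std[std == 0] = 1.0         # leave constant columns unscaled
    return mean, std

def standardize(X, mean, std):
    """Apply (x - mean) / std column by column."""
    return (X - mean) / std

# Toy two-column data (illustrative values only)
X_train_demo = np.array([[540.0, 162.0], [332.5, 228.0], [198.6, 192.0]])
X_test_demo = np.array([[266.0, 228.0]])

mean, std = fit_standardizer(X_train_demo)
X_train_scaled = standardize(X_train_demo, mean, std)
X_test_scaled = standardize(X_test_demo, mean, std)  # reuses training statistics
print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

Reusing the training statistics keeps information about the test set from leaking into preprocessing.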

In [47]:
import keras
from keras.models import Sequential
from keras.layers import Dense 

Keras is running on top of the TensorFlow backend.

With the Keras imports in place, let's define and compile our regression model.

In [48]:
# define model
model = Sequential()

# create model
model.add(Dense(10, activation = 'relu', input_shape = (8,)))
model.add(Dense(1))

# compile model
model.compile(loss = 'mean_squared_error',
              optimizer = keras.optimizers.Adam(learning_rate=0.01))

**The above model has one hidden layer with 10 neurons and a ReLU activation function, followed by a single linear output neuron. It uses the Adam optimizer with a learning rate of 0.01 and the mean squared error as the loss function.**
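
As a rough illustration of what this architecture computes, the forward pass can be written in a few lines of NumPy. The weights below are random placeholders rather than the trained Keras weights, and the function name `forward` is an assumption made for this sketch.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    """One hidden ReLU layer followed by a linear output layer."""
    hidden = np.maximum(0.0, X @ W1 + b1)  # ReLU activation
    return hidden @ W2 + b2                # no activation on the output

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(8, 10))   # 8 inputs -> 10 hidden units
b1 = np.zeros(10)
W2 = rng.normal(scale=0.1, size=(10, 1))   # 10 hidden units -> 1 output
b2 = np.zeros(1)

X_demo = rng.normal(size=(5, 8))           # five made-up, already scaled samples
y_hat = forward(X_demo, W1, b1, W2, b2)
print(y_hat.shape)
```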

In [49]:
model.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_5 (Dense)              (None, 10)                90        
_________________________________________________________________
dense_6 (Dense)              (None, 1)                 11        
Total params: 101
Trainable params: 101
Non-trainable params: 0
_________________________________________________________________
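
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The helper name below is hypothetical.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden_params = dense_param_count(8, 10)   # 8 predictors -> 10 hidden neurons
output_params = dense_param_count(10, 1)   # 10 hidden neurons -> 1 output
total_params = hidden_params + output_params
print(hidden_params, output_params, total_params)  # 90 11 101
```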


**Next, we will train the model for 50 epochs.**

In [50]:
model.fit(X_train, y_train, epochs = 50, verbose = 1)

Epoch 1/50
...
Epoch 50/50


<keras.callbacks.callbacks.History at 0x7f6ca1bac828>

**Next, we need to evaluate the model on the test data.**

In [51]:
loss_val = model.evaluate(X_test, y_test)
loss_val



44.864296477975195

In [52]:
y_pred = model.predict(X_test)

Now we need to compute the mean squared error between the predicted concrete strength and the actual concrete strength.

Let's import the mean_squared_error function from scikit-learn.

In [53]:
from sklearn.metrics import mean_squared_error

In [54]:
mean_square_error = mean_squared_error(y_test, y_pred)
# A single run yields one MSE value, so its standard deviation is 0 by definition
mean = np.mean(mean_square_error)
standard_deviation = np.std(mean_square_error)
print(mean, standard_deviation)

44.86429665030754 0.0
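
For reference, the mean squared error is just the average of the squared differences between actual and predicted values. The sketch below computes it by hand on made-up numbers and also converts the test loss reported above into a root mean squared error, which is in the same units as the strength (MPa). The helper name `mse` and the demo values are assumptions for illustration.

```python
import math
import numpy as np

def mse(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up targets and predictions with errors of 4, -2 and 0
y_true_demo = [79.99, 61.89, 40.27]
y_pred_demo = [75.99, 63.89, 40.27]
demo_mse = mse(y_true_demo, y_pred_demo)   # (16 + 4 + 0) / 3

# Root mean squared error for the test loss reported by model.evaluate above
rmse = math.sqrt(44.864296477975195)
print(demo_mse, rmse)
```

A spread across runs only becomes meaningful once several MSE values have been collected.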


Create a list of 50 mean squared errors and report the mean and the standard deviation of the mean squared errors.
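
If each of the 50 errors is meant to be an independent estimate, the model has to start from fresh weights in every repetition; a model that is fitted again without being rebuilt continues from where the previous run ended. The sketch below shows that pattern with a hypothetical least-squares stand-in instead of a Keras network, on synthetic data, so that it runs with NumPy alone. With Keras, `build_model` would be a function that creates and compiles a new `Sequential` model.

```python
import numpy as np

class LeastSquaresStandIn:
    """Hypothetical stand-in for a freshly built model (fit/predict only)."""
    def fit(self, X, y):
        X1 = np.column_stack([X, np.ones(len(X))])   # add an intercept column
        self.coef_, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return self

    def predict(self, X):
        X1 = np.column_stack([X, np.ones(len(X))])
        return X1 @ self.coef_

def repeated_holdout_mse(build_model, X, y, n_runs=50, test_size=0.3):
    """Collect one test MSE per random split, rebuilding the model each time."""
    n_test = int(np.ceil(len(X) * test_size))
    errors = []
    for seed in range(n_runs):
        idx = np.random.default_rng(seed).permutation(len(X))
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        model = build_model()                        # new, untrained model
        model.fit(X[train_idx], y[train_idx])
        residuals = y[test_idx] - model.predict(X[test_idx])
        errors.append(float(np.mean(residuals ** 2)))
    return np.array(errors)

# Synthetic regression data (not the concrete dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

errors = repeated_holdout_mse(LeastSquaresStandIn, X_demo, y_demo, n_runs=50)
print(errors.mean(), errors.std())
```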

In [56]:
total_mean_squared_errors = 50  # number of train/test repetitions
epochs = 50
mean_squared_errors = []
for i in range(total_mean_squared_errors):
    # New random 70/30 split for each repetition (the features are left unscaled here)
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=i)
    # Note: `model` is not re-created, so training continues from the previous repetition's weights
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    MSE = model.evaluate(X_test, y_test, verbose=0)
    print(f"MSE {i + 1}: {MSE}")
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)
    mean_squared_errors.append(mean_square_error)

mean_squared_errors = np.array(mean_squared_errors)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print('\n')
print(f"Below is the mean and standard deviation of {total_mean_squared_errors} mean squared errors without normalized data. Total number of epochs for each training is: {epochs}\n")
print(f"Mean: {mean}")
print(f"Standard Deviation: {standard_deviation}")

MSE 1: 89.41072894840178
MSE 2: 91.37424009749033
MSE 3: 86.28222088366265
MSE 4: 87.45288937763102
MSE 5: 89.41414610236208
MSE 6: 89.48154701306981
MSE 7: 111.88950552832347
MSE 8: 83.65436228039195
MSE 9: 100.54051445525826
MSE 10: 98.43388860511162
MSE 11: 88.89543783934757
MSE 12: 76.17717813288124
MSE 13: 87.8869971982098
MSE 14: 92.8661318782078
MSE 15: 83.07697422064624
MSE 16: 84.80121887848986
MSE 17: 83.92685417990083
MSE 18: 83.64178017428006
MSE 19: 75.71739811573214
MSE 20: 90.80676449772609
MSE 21: 79.45960198559807
MSE 22: 86.38368175716462
MSE 23: 83.12835273619223
MSE 24: 85.12336261835685
MSE 25: 85.02880382846475
MSE 26: 96.61033901967663
MSE 27: 100.24842132025167
MSE 28: 87.8856613751754
MSE 29: 94.96006379852788
MSE 30: 92.45942389232056
MSE 31: 102.162035698258
MSE 32: 75.338090816362
MSE 33: 80.1502518638438
MSE 34: 95.76087082551135
MSE 35: 85.06674159769102
MSE 36: 94.39223149370608
MSE 37: 92.09910773613692
MSE 38: 94.03248262868344
MSE 39: 89.68582017522029