### Part A: **Build a baseline model**

Import the NumPy and pandas libraries.

In [1]:
import numpy as np
import pandas as pd

Let's read the data into a pandas dataframe.

In [2]:
data = pd.read_csv('concrete_data.csv')
data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.30 |


##### Split the data into training and test sets

In [3]:
data_columns = data.columns

predictors = data[data_columns[data_columns != 'Strength']] # all columns except Strength
target = data['Strength'] # only Strength column

In [4]:
predictors.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [5]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

In [6]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.30, random_state=42)

In [7]:
n_cols = predictors.shape[1] # number of predictors
print(n_cols)

8
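To make the 70/30 split concrete, here is a minimal pure-Python sketch of a shuffled hold-out split. It only illustrates the idea and is not scikit-learn's implementation: the helper name `holdout_indices` is hypothetical, rounding the test share up is an assumption, and the shuffle it produces will differ from the one `train_test_split` produces for the same seed.

```python
import math
import random

def holdout_indices(n_samples, test_size, seed):
    """Return (train_idx, test_idx) for a shuffled hold-out split."""
    n_test = math.ceil(n_samples * test_size)  # assumption: round the test share up
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # seeded, so the split is reproducible
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = holdout_indices(n_samples=20, test_size=0.25, seed=42)
print(len(train_idx), len(test_idx))  # 15 5
```

Calling the helper again with the same seed returns the same two index lists, which is the property `random_state=42` provides in the cell above.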


### Building a Regression Model

Import the Keras library.

In [8]:
import keras 

Using TensorFlow backend.


Let's import the rest of the packages from the Keras library that we will need to build our regression model.

In [9]:
from keras.models import Sequential
from keras.layers import Dense

Let's define a function that builds and compiles our regression model, so that we can conveniently call it whenever we need a new model.

In [10]:
# define regression model
def regression_model():
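    # create model: one hidden layer of 10 ReLU units, then a single linear output
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))

    # compile model with the Adam optimizer and mean squared error as the loss
    model.compile(optimizer='adam', loss='mean_squared_error')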
    return model
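As a quick sanity check on the size of this network, the sketch below counts its trainable parameters by hand. It assumes each `Dense` layer keeps its default bias term; the helper name `dense_param_count` is introduced here for illustration only.

```python
def dense_param_count(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units

n_cols = 8  # number of predictors printed earlier in the notebook
hidden = dense_param_count(n_cols, 10)  # 8*10 weights + 10 biases
output = dense_param_count(10, 1)       # 10 weights + 1 bias
print(hidden, output, hidden + output)  # 90 11 101
```

Calling `model.summary()` on the built model should report the same per-layer and total counts.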

### Train and Evaluate the Network

In [11]:
# build the model
model = regression_model()

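Next, we will train the model with the fit method and monitor its validation loss at the same time. With validation_split=0.3, Keras holds out the last 30% of the rows passed to fit for validation, and we will train the model for 50 epochs. Note that fit receives the full predictors and target here, not X_train and y_train, so the model is also trained on rows that belong to the test split.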

In [12]:
# fit the model
model.fit(predictors, target, validation_split=0.3, epochs=50, verbose=2)

Train on 721 samples, validate on 309 samples
Epoch 1/50
 - 0s - loss: 13795.2434 - val_loss: 9676.8000
Epoch 2/50
 - 0s - loss: 5170.4318 - val_loss: 7715.2289
Epoch 3/50
 - 0s - loss: 4681.2597 - val_loss: 7376.8979
Epoch 4/50
 - 0s - loss: 4406.4818 - val_loss: 6987.5006
Epoch 5/50
 - 0s - loss: 4154.2408 - val_loss: 6563.5857
Epoch 6/50
 - 0s - loss: 3923.3313 - val_loss: 6128.3593
Epoch 7/50
 - 0s - loss: 3701.9551 - val_loss: 5771.1440
Epoch 8/50
 - 0s - loss: 3500.8530 - val_loss: 5373.9006
Epoch 9/50
 - 0s - loss: 3285.1456 - val_loss: 5004.9109
Epoch 10/50
 - 0s - loss: 3084.3723 - val_loss: 4630.3224
Epoch 11/50
 - 0s - loss: 2887.2430 - val_loss: 4275.6813
Epoch 12/50
 - 0s - loss: 2695.3895 - val_loss: 3955.8096
Epoch 13/50
 - 0s - loss: 2519.2506 - val_loss: 3608.4172
Epoch 14/50
 - 0s - loss: 2336.4174 - val_loss: 3320.7865
Epoch 15/50
 - 0s - loss: 2172.7401 - val_loss: 3036.7290
Epoch 16/50
 - 0s - loss: 2013.4987 - val_loss: 2741.2249
Epoch 17/50
 - 0s - loss: 1855.885

<keras.callbacks.callbacks.History at 0x21bfc606dc8>
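The Keras documentation states that `validation_split` takes the validation rows from the end of the data, before any shuffling. The sketch below mimics that index arithmetic in plain Python. The helper name `validation_tail_split` is hypothetical, and truncating the training share to an integer is an assumption about the rounding, not a statement about Keras internals.

```python
def validation_tail_split(n_samples, validation_split):
    """Return (train_slice, val_slice), holding out the last rows for validation."""
    # assumption: the training share is truncated to an integer row count
    split_at = int(n_samples * (1.0 - validation_split))
    return slice(0, split_at), slice(split_at, n_samples)

rows = list(range(1000))
train_rows, val_rows = validation_tail_split(len(rows), 0.25)
print(len(rows[train_rows]), len(rows[val_rows]))  # 750 250
print(rows[val_rows][0])  # 750: validation starts right after the training rows
```

The log above, "Train on 721 samples, validate on 309 samples", is consistent with this kind of split applied to the 1030 rows of the dataset. Because the rows are not shuffled first, any ordering in the CSV file carries over into the validation set.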

In [13]:
model.evaluate(X_test, y_test, verbose=1)



383.1999774426704

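The value returned by evaluate is the loss on the test set, which here is the mean squared error. As a check, we now compute the mean squared error between the predicted concrete strength and the actual concrete strength ourselves.

Let's import the mean_squared_error function from Scikit-learn.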

In [14]:
from sklearn.metrics import mean_squared_error

In [15]:
y_pred = model.predict(X_test)

In [16]:
y_pred

array([[57.917442 ],
       [50.090668 ],
       [57.296593 ],
       [49.627205 ],
       [61.573402 ],
       [51.68225  ],
       [41.048195 ],
       [61.348915 ],
       [49.314583 ],
       [51.143333 ],
       [39.766747 ],
       [13.405937 ],
       [40.173492 ],
       [34.57502  ],
       [28.165886 ],
       [29.457558 ],
       [40.62139  ],
       [19.454712 ],
       [28.589165 ],
       [23.103767 ],
       [25.582916 ],
       [51.373344 ],
       [55.705406 ],
       [17.634804 ],
       [29.066414 ],
       [52.09037  ],
       [17.052635 ],
       [25.526833 ],
       [31.48954  ],
       [32.31048  ],
       [ 9.39048  ],
       [43.95951  ],
       [24.6436   ],
       [34.727074 ],
       [21.898201 ],
       [12.699135 ],
       [48.464188 ],
       [42.21615  ],
       [44.479942 ],
       [29.585625 ],
       [20.605698 ],
       [52.224915 ],
       [46.60746  ],
       [55.49897  ],
       [15.466438 ],
       [58.288834 ],
       [52.232193 ],
       [77.46

In [17]:
mse = mean_squared_error(y_test, y_pred)
print('mse: ', mse)

mse:  383.1999822079114
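This value agrees with the loss returned by `model.evaluate` to about seven significant digits, as expected, since both compute the same quantity; the small remaining difference is likely due to floating-point precision. For reference, here is a minimal sketch of the calculation in plain Python. The function name `mean_squared_error_manual` and the toy values are made up for illustration.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# toy values chosen for illustration, not taken from the dataset
toy_mse = mean_squared_error_manual([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(toy_mse)  # (1 + 0 + 4) / 3
```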


Let's repeat training and evaluation 50 times, collect the resulting 50 mean squared errors in a list, and report their mean and standard deviation.

In [19]:
n = 50        # number of training/evaluation rounds
epochs = 50   # training epochs per round
mean_squared_errors = []
for i in range(n):
    # random_state=42 reproduces the same 70/30 split in every round
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.30, random_state=42)
    # the existing model is reused, so each round continues training from the previous weights
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    mse = model.evaluate(X_test, y_test, verbose=0)  # Keras loss (MSE) on the test set
    print("mse " +str(i+1)+": ", mse)
    # recompute the same metric with Scikit-learn and store it
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)
    mean_squared_errors.append(mean_square_error)

mean_squared_errors = np.array(mean_squared_errors)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print("Mean : "+str(mean))
print("Standard Deviation : "+str(standard_deviation))

mse 1:  146.83918031365354
mse 2:  79.69963342857979
mse 3:  66.97847210782245
mse 4:  62.96653700646459
mse 5:  61.488126180704356
mse 6:  58.89988451713883
mse 7:  58.59510982229486
mse 8:  69.43359650300158
mse 9:  55.666722652595794
mse 10:  53.29456144166224
mse 11:  52.226677570528196
mse 12:  58.28093059008947
mse 13:  50.589997819326456
mse 14:  48.87046574466051
mse 15:  48.760212438392024
mse 16:  48.89734571882822
mse 17:  50.74771276183885
mse 18:  49.73261722700496
mse 19:  52.41838830497272
mse 20:  48.891177995305235
mse 21:  51.14467449249959
mse 22:  49.392465696365704
mse 23:  53.17185180318394
mse 24:  48.87644127657499
mse 25:  49.82361857096354
mse 26:  53.181613020912344
mse 27:  55.25504623956279
mse 28:  52.03930785046426
mse 29:  48.30930930819712
mse 30:  52.08312861124674
mse 31:  49.56491049362232
mse 32:  48.594852879595216
mse 33:  48.65456897772631
mse 34:  50.22946948449589
mse 35:  48.8505683207589
mse 36:  50.61722946166992
mse 37:  50.46784490443356
m
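Because the loop keeps fitting the same model on the same split, the printed values mostly trace continued training, which is why they fall from about 146.8 toward 49. If the goal is instead the spread over independent runs, one option is to build a fresh model and draw a new split in every round. The sketch below shows that pattern under stated assumptions: `summarize_runs`, `run_once`, and `fake_run` are hypothetical names, and a stand-in scoring function is used so the harness runs without Keras or the dataset.

```python
import random
import statistics

def summarize_runs(run_once, n_runs):
    """Call run_once(seed) for seeds 0..n_runs-1; return scores, mean, population std."""
    scores = [run_once(seed) for seed in range(n_runs)]
    return scores, statistics.mean(scores), statistics.pstdev(scores)

# Hypothetical run_once for this notebook (shown as comments, not executed here):
# def run_once(seed):
#     X_tr, X_te, y_tr, y_te = train_test_split(
#         predictors, target, test_size=0.30, random_state=seed)  # new split per run
#     fresh_model = regression_model()                            # new weights per run
#     fresh_model.fit(X_tr, y_tr, epochs=50, verbose=0)
#     return fresh_model.evaluate(X_te, y_te, verbose=0)

def fake_run(seed):
    """Stand-in that returns a made-up score so the harness can run without Keras."""
    return 50.0 + random.Random(seed).random()

scores, mean_score, std_score = summarize_runs(fake_run, n_runs=5)
print(len(scores), mean_score, std_score)
```

Passing the seed into both the split and the run keeps each round reproducible while still making the rounds differ from one another.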

In [20]:
print('mean squared errors: ', mean_squared_errors)

mean squared errors:  [146.83918245  79.69963497  66.97847155  62.96653828  61.48812669
  58.8998865   58.59511093  69.43359606  55.66672318  53.29456125
  52.226679    58.28093209  50.59000025  48.87046805  48.76021413
  48.89734832  50.74771428  49.73261921  52.41838958  48.8911796
  51.14467469  49.39246777  53.17185395  48.87644344  49.82361936
  53.18161414  55.2550478   52.03930971  48.30931096  52.08313034
  49.56491084  48.59485387  48.65457012  50.22947053  48.85057017
  50.6172296   50.46784639  50.82966265  48.87689269  49.4349342
  49.42382557  48.41162982  48.55203433  48.84395813  52.58652597
  48.95981567  48.39727508  51.70453508  49.67854363  49.64214327]


In [22]:
print("Mean : ",mean)
print("Standard Deviation : ", standard_deviation)

Mean :  54.577521522686155
Standard Deviation :  14.525859399922513
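One detail worth keeping in mind when reading the standard deviation: `np.std` divides by N by default (`ddof=0`), which gives the population standard deviation. The sample estimate, `np.std(x, ddof=1)`, divides by N - 1 and is slightly larger. The sketch below illustrates the difference with the standard library's `statistics` module, using the first five values of the printed array rounded to two decimals.

```python
import statistics

# first five values of the printed array, rounded to two decimals for brevity
first_five = [146.84, 79.70, 66.98, 62.97, 61.49]

population_std = statistics.pstdev(first_five)  # divides by N, like np.std(x)
sample_std = statistics.stdev(first_five)       # divides by N - 1, like np.std(x, ddof=1)
print(population_std, sample_std)
```

With all 50 values the two estimates differ by only about one percent, so the choice matters little here, but it is worth stating which one is reported.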
