Build a neural network with the following:

- One hidden layer of 10 nodes with a ReLU activation function.

- The adam optimizer, with mean squared error as the loss function.

1. Randomly split the data into training and test sets, holding out 30% of the data for testing.
2. Train the model on the training data for 50 epochs.
3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.
4. Repeat steps 1-3 50 times to create a list of 50 mean squared errors.
5. Report the mean and the standard deviation of the mean squared errors.
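
For reference, the metric in step 3 is the average of the squared differences between actual and predicted values. The sketch below spells that out in plain Python; the function name `mean_squared_error_sketch` and the example numbers are illustrative assumptions, and the notebook itself uses Scikit-learn's `mean_squared_error`.

```python
def mean_squared_error_sketch(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one value is required")
    squared_errors = [(actual - predicted) ** 2
                      for actual, predicted in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Made-up strengths, for illustration only.
example_mse = mean_squared_error_sketch([30.0, 40.0, 50.0], [32.0, 40.0, 47.0])
print(example_mse)  # (4 + 0 + 9) / 3
```

Because the errors are squared, a few badly predicted samples can dominate the result.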

In [2]:
import numpy as np
import pandas as pd

The dataset records the compressive strength of different samples of concrete, based on the volumes of the ingredients used to make them and the age of each sample. The predictors are:

1. Cement
2. Blast Furnace Slag
3. Fly Ash
4. Water
5. Superplasticizer
6. Coarse Aggregate
7. Fine Aggregate
8. Age

###### Retrieving Data from external source

In [3]:
#Download data and read it into pandas dataframe
data_concrete = pd.read_csv('https://cocl.us/concrete_data')
data_concrete.head()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


In [4]:
data_concrete.describe()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


###### Data Cleaning Process

In [5]:
data_concrete.isnull().sum() # Check whether any column contains null values. The data appears to be clean and ready for modelling.

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

In [6]:
#Split the data into predictors and target
data_concrete_columns = data_concrete.columns

predictors = data_concrete[data_concrete_columns[data_concrete_columns != 'Strength']] # all columns except Strength
target = data_concrete['Strength'] # Strength column


In [7]:
#Saving the number of columns to use in the model
n_cols=predictors.shape[1]


In [8]:
#Import the packages from the Keras library that are needed to build our regression model.
import keras
from keras.models import Sequential
from keras.layers import Dense

#Importing sklearn libraries for evaluating the model
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

from sklearn.model_selection import train_test_split





###### Build a Neural Network
Hidden layers - 1
Nodes - 10
Activation function - ReLU
Optimizer - adam
Loss function - mean squared error
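
As a rough sanity check on the size of this architecture, the sketch below counts the trainable parameters of a fully connected layer as weights plus biases. The helper `dense_param_count` is hypothetical, and `n_features = 8` assumes the eight predictor columns listed earlier in this notebook.

```python
def dense_param_count(n_inputs, n_units):
    """Parameters in a fully connected layer: one weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units


n_features = 8  # assumption: the eight predictor columns (everything except Strength)

hidden_params = dense_param_count(n_features, 10)  # hidden layer with 10 nodes
output_params = dense_param_count(10, 1)           # single output node
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)
```

Calling `model.summary()` on the compiled Keras model should report per-layer parameter counts that can be compared against these figures.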

In [9]:
mse_A = []

def regression_model():
    # Randomly split the data into training and test sets, holding out 30% of the data for testing.
    # No random_state is passed, so every call draws a different split.
    xtrain, xtest, ytrain, ytest = train_test_split(predictors, target, test_size=0.3)

    # Create the model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))  # one hidden layer with 10 nodes
    model.add(Dense(1))  # single output node for the predicted strength

    # Compile the model
    model.compile(optimizer='adam', loss='mean_squared_error')

    # Fit the model
    model.fit(xtrain, ytrain, epochs=50, verbose=0)

    # Predict on the test set and record the mean squared error
    ypred = model.predict(xtest)
    mse_A.append(mean_squared_error(ytest, ypred))

Train and Test the network

In [10]:
for i in range(50):
    regression_model()






In [11]:
# Convert each recorded MSE to a plain Python float before printing
res = [float(ele) for ele in mse_A]

print("Mean Square Errors in 50 iterations:", res)

Mean Square Errors in 50 iterations: [110.22171476690248, 106.4343698395193, 187.27090374125777, 787.753486386086, 119.64424828235562, 126.33565668468269, 105.91964109176035, 417.60892954814165, 2245.8019165048513, 224.97863183413776, 151.52514934259838, 113.14818978380707, 102.9524170083629, 266.4310357123616, 938.6453360425038, 266.05003488037573, 178.23567498687873, 119.05951381206422, 396.6946947070694, 122.18535367545171, 1599.4841702967053, 281.8785395633696, 105.42685555376404, 274.122310660639, 110.98175739600232, 96.8399830527336, 172.00195302086644, 1546.0809666901932, 161.42541674981075, 478.25697325332925, 243.64687320413842, 705.8004274383064, 125.72041683340251, 131.42980114636973, 110.09569535277807, 1451.6875405210744, 118.9533983973538, 120.52726877562203, 379.7011728515682, 73.69413356008675, 96.32452342266936, 118.67180801654608, 128.83318015286085, 121.90318980431196, 125.9197101456137, 439.24306496671926, 153.17135062292576, 225.9604741498732, 230.59930331766768, 1

In [12]:
print('mse_Mean: {:.2f}'.format(np.mean(mse_A)))
print('mse_StdDev: {:.2f}'.format(np.std(mse_A)))

mse_Mean: 343.09
mse_StdDev: 449.68
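
One detail worth keeping in mind when reading the figure above: `np.std` divides by n by default (`ddof=0`), which gives the population standard deviation. The sketch below contrasts that with the sample standard deviation (dividing by n - 1) using the standard library. The values in `sample_mses` are hypothetical and are not the results of this notebook.

```python
import statistics

# Hypothetical MSE values for illustration only; not the notebook's results.
sample_mses = [100.0, 120.0, 110.0, 400.0, 95.0]

mean_mse = statistics.mean(sample_mses)
population_std = statistics.pstdev(sample_mses)  # divides by n, like np.std(x) with its default ddof=0
sample_std = statistics.stdev(sample_mses)       # divides by n - 1, like np.std(x, ddof=1)

print('mean: {:.2f}'.format(mean_mse))
print('population std: {:.2f}'.format(population_std))
print('sample std: {:.2f}'.format(sample_std))
```

With 50 runs the two estimates differ only slightly, but it helps to state which one is being reported.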
