### Regression Model In Keras Part B


First, the appropriate libraries are imported, i.e. Pandas and NumPy.

In [1]:
import pandas as pd # pandas, for loading and handling the data
import numpy as np # NumPy, for numerical operations

Next, we load the CSV file into a pandas dataframe.

In [2]:
c_data = pd.read_csv('https://cocl.us/concrete_data') #This code reads the data from the csv file into a pandas dataframe
c_data.head() #We then display the first 5 rows of data

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


We then check the shape of the dataframe.

In [3]:
c_data.shape

(1030, 9)


Next, we check the data for errors by inspecting its summary statistics and counting missing values.

In [4]:
c_data.describe()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [5]:
c_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The data contains no missing values. The 'Strength' column is the quantity we want to predict from the values in the other columns, so we split the dataframe into two: the target ('Strength') and the predictors (all other columns).

In [6]:
c_data_columns = c_data.columns
predictors = c_data[c_data_columns[c_data_columns != 'Strength']] # creates pandas dataframe for all columns except Strength
target = c_data['Strength'] # Strength column in the data frame
predictors.head()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360


We can then view the first 5 values of the target.

In [7]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

## Normalize the data
We standardize each predictor by subtracting its column mean and dividing by its column standard deviation.

In [8]:
predictors = (predictors-predictors.mean())/predictors.std()
predictors.head()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,0.862735,-1.217079,-0.279597
1,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,1.055651,-1.217079,-0.279597
2,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,3.55134
3,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,5.055221
4,-0.790075,0.678079,-0.846733,0.488555,-1.038638,0.070492,0.647569,4.976069
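
As a minimal sketch of what this standardization does, the toy example below applies the same expression to a small made-up dataframe (the values and the name `toy` are illustrative assumptions, not taken from the concrete dataset). Each column should end up with a mean of 0 and a standard deviation of 1. Note that pandas' `std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Toy data, made up for illustration only
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 40.0, 50.0]})

# Same expression as in the cell above: column-wise z-score
toy_norm = (toy - toy.mean()) / toy.std()

print(toy_norm.mean().round(6))  # each column mean is (numerically) 0
print(toy_norm.std().round(6))   # each column standard deviation is 1
```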


We also store the number of predictors, since the model-building function defined later needs it to set the input shape.

In [9]:
n_cols = predictors.shape[1] # number of predictors

Next, we import Keras and the required classes.

In [10]:
import keras
from keras.models import Sequential
from keras.layers import Dense

## Regression Model

We then define our regression model: one hidden layer of 10 ReLU nodes followed by a single-node output layer.

In [11]:
# define regression model
def regression_model():
    # create model
    model = Sequential() # creates a sequential model
    model.add(Dense(10, activation='relu', input_shape=(n_cols,))) # hidden layer: 10 nodes with ReLU activation
    model.add(Dense(1)) # output layer: a single node for the predicted strength
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error') # Adam optimizer, with mean squared error as the loss to minimize
    return model
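
To make the architecture concrete, the sketch below counts the trainable parameters of this network by hand, assuming `n_cols = 8` (the nine columns of the dataframe minus 'Strength'). The helper `dense_params` is a hypothetical name introduced for illustration. In the notebook, `regression_model().summary()` should report the same total.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

n_cols = 8  # assumed: 9 dataframe columns minus the 'Strength' target

hidden = dense_params(n_cols, 10)  # 8*10 weights + 10 biases = 90
output = dense_params(10, 1)       # 10*1 weights + 1 bias = 11
total = hidden + output

print(hidden, output, total)  # 90 11 101
```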

## Training and Testing
We now split the data into a training set (70%) and a test set (30%), so that the model is trained on one portion and evaluated on the other.

In [12]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=4)
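
As a quick check on this split, the arithmetic below works out the resulting set sizes from the 1030 rows of the dataframe, assuming scikit-learn rounds the test set size up. It also gives the number of batches per epoch, assuming `fit` uses Keras's default batch size of 32 (no `batch_size` is passed in this notebook); the result matches the `23/23` shown in the training log.

```python
import math

n_samples = 1030   # rows in c_data, from c_data.shape
test_size = 0.3    # fraction passed to train_test_split
batch_size = 32    # assumed: Keras default when batch_size is not given

n_test = math.ceil(test_size * n_samples)          # 309
n_train = n_samples - n_test                       # 721
steps_per_epoch = math.ceil(n_train / batch_size)  # 23

print(n_train, n_test, steps_per_epoch)  # 721 309 23
```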

The model is then called and trained for 50 epochs.

In [13]:
R_model = regression_model()
R_model.fit(X_train, y_train, epochs=50, verbose=2)

Epoch 1/50
23/23 - 0s - loss: 1618.0070 - 421ms/epoch - 18ms/step
Epoch 2/50
23/23 - 0s - loss: 1602.9303 - 26ms/epoch - 1ms/step
Epoch 3/50
23/23 - 0s - loss: 1588.1389 - 32ms/epoch - 1ms/step
Epoch 4/50
23/23 - 0s - loss: 1573.2458 - 32ms/epoch - 1ms/step
Epoch 5/50
23/23 - 0s - loss: 1558.2402 - 31ms/epoch - 1ms/step
Epoch 6/50
23/23 - 0s - loss: 1542.6643 - 141ms/epoch - 6ms/step
Epoch 7/50
23/23 - 0s - loss: 1526.5453 - 25ms/epoch - 1ms/step
Epoch 8/50
23/23 - 0s - loss: 1509.7190 - 31ms/epoch - 1ms/step
Epoch 9/50
23/23 - 0s - loss: 1492.0255 - 31ms/epoch - 1ms/step
Epoch 10/50
23/23 - 0s - loss: 1473.6003 - 31ms/epoch - 1ms/step
Epoch 11/50
23/23 - 0s - loss: 1454.1224 - 34ms/epoch - 1ms/step
Epoch 12/50
23/23 - 0s - loss: 1433.7861 - 30ms/epoch - 1ms/step
Epoch 13/50
23/23 - 0s - loss: 1412.8386 - 32ms/epoch - 1ms/step
Epoch 14/50
23/23 - 0s - loss: 1391.0161 - 33ms/epoch - 1ms/step
Epoch 15/50
23/23 - 0s - loss: 1368.4532 - 30ms/epoch - 1ms/step
Epoch 16/50
23/23 - 0s - loss: 

<keras.callbacks.History at 0x222343ee7a0>

## Evaluation
Finally, we evaluate the model on the test set using the mean squared error (MSE).

In [14]:
from sklearn.metrics import mean_squared_error
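loss_value = R_model.evaluate(X_test, y_test) # test loss (MSE) reported by Keras
y_predicted = R_model.predict(X_test) # predictions on the test set
MSE = mean_squared_error(y_test, y_predicted) # the same MSE, computed with scikit-learn
std_dev = np.std(MSE) # MSE is a single number here, so its standard deviation is 0
mean = np.mean(MSE) # the mean of a single number is the number itself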

print("The mean was found to be:", mean)
print("The standard deviation was found to be:", std_dev)

The mean was found to be: 392.1530619136785
The standard deviation was found to be: 0.0
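
For reference, the mean squared error is the average of the squared differences between the true and predicted values. The sketch below computes it with NumPy on made-up numbers (the arrays are illustrative assumptions, not model output) and also shows why the standard deviation printed above is 0: `np.std` of a single number is always 0.

```python
import numpy as np

# Made-up values for illustration only
y_true = np.array([30.0, 45.0, 20.0, 60.0])
y_pred = np.array([28.0, 50.0, 21.0, 55.0])

# MSE = mean of the squared residuals
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 13.75

# A single scalar has no spread, so its standard deviation is 0
print(np.std(mse))  # 0.0
```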


We then repeat the training and evaluation 50 times and compute the mean and standard deviation of the resulting mean squared errors.

In [15]:
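Sum_MSE = 50 # number of iterations
MSE_List = []
for n in range(0, Sum_MSE):
    # random_state is fixed, so every iteration uses the same train/test split
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=4)
    # R_model is not re-created, so each fit call continues training the same model for another 50 epochs
    R_model.fit(X_train, y_train, epochs=50, verbose=0)
    MSE = R_model.evaluate(X_test, y_test, verbose=0) # test loss (MSE) reported by Keras
    print("Mean Squared Error",str(n+1),":",MSE)
    y_predicted = R_model.predict(X_test)
    MSE = mean_squared_error(y_test, y_predicted) # the same MSE, computed with scikit-learn
    MSE_List.append(MSE)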

MSE_List = np.array(MSE_List)
mean = np.mean(MSE_List)
std_dev = np.std(MSE_List)


print("The mean of the mean squared errors was determined to be:", mean)
print("The standard deviation of the mean squared errors was determined to be:", std_dev)
    

Mean Squared Error 1 : 188.77041625976562
Mean Squared Error 2 : 140.55567932128906
Mean Squared Error 3 : 107.03218841552734
Mean Squared Error 4 : 85.83992004394531
Mean Squared Error 5 : 67.27242279052734
Mean Squared Error 6 : 55.853736877441406
Mean Squared Error 7 : 50.62457275390625
Mean Squared Error 8 : 47.9051628112793
Mean Squared Error 9 : 46.195045471191406
Mean Squared Error 10 : 44.887996673583984
Mean Squared Error 11 : 44.192684173583984
Mean Squared Error 12 : 43.55779266357422
Mean Squared Error 13 : 43.20327377319336
Mean Squared Error 14 : 42.79388427734375
Mean Squared Error 15 : 42.20158767700195
Mean Squared Error 16 : 41.86423873901367
Mean Squared Error 17 : 41.60579299926758
Mean Squared Error 18 : 41.281402587890625
Mean Squared Error 19 : 40.9407958984375
Mean Squared Error 20 : 40.50836944580078
Mean Squared Error 21 : 40.44917678833008
Mean Squared Error 22 : 39.9564323425293
Mean Squared Error 23 : 39.79179763793945
Mean Squared Error 24 : 39.57030487060
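
The loop above keeps training the same `R_model` on the same split, which is why the printed error falls steadily from one iteration to the next. A common alternative is to draw a new split and fit a freshly initialized model on every iteration, so that the 50 errors are independent repetitions. The sketch below illustrates that pattern with NumPy only: `repeated_mse` and `fit_predict_linear` are hypothetical helpers, the data is synthetic, and an ordinary least-squares fit stands in for the Keras model. In the notebook, the stand-in would be replaced by a call to `regression_model()` followed by `fit` and `predict`.

```python
import numpy as np

def fit_predict_linear(X_train, y_train, X_test):
    """Stand-in model: ordinary least squares with an intercept."""
    A_train = np.column_stack([X_train, np.ones(len(X_train))])
    A_test = np.column_stack([X_test, np.ones(len(X_test))])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef

def repeated_mse(X, y, fit_predict, n_runs=50, test_size=0.3, seed=0):
    """Return one test MSE per run, each with a new split and a fresh fit."""
    rng = np.random.default_rng(seed)
    n_test = int(np.ceil(test_size * len(X)))
    errors = []
    for _ in range(n_runs):
        idx = rng.permutation(len(X))  # new random split every run
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        errors.append(np.mean((y[test_idx] - y_pred) ** 2))
    return np.array(errors)

# Synthetic data, made up for illustration only
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=200)

errors = repeated_mse(X, y, fit_predict_linear, n_runs=50)
print("mean MSE:", errors.mean())
print("std of MSE:", errors.std())
```

With this pattern the standard deviation reflects run-to-run variation across splits and fits, rather than the progress of a single model that is still training.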

## Conclusion
The mean MSE in Part B was lower than in Part A, and the standard deviation was slightly higher, indicating greater accuracy but slightly reduced precision.