# Keras Regression Model for Concrete Data - Part A
## André J Guardia

This notebook builds a regression model using the Keras library. The data can be found here: https://cocl.us/concrete_data.


The model is built according to the following specification and instructions:

### Model Specs

- One hidden layer of 10 nodes with a ReLU activation function.

- Use the adam optimizer and the mean squared error as the loss function.

### Instructions

1. Randomly split the data into training and test sets by holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

4. Repeat steps 1-3 50 times, i.e., create a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.

We start by installing the required libraries. If any of them are not already installed, uncomment the corresponding lines in the cell below:

In [67]:
#!pip install numpy 
#!pip install pandas
#!pip install tensorflow
!pip install -U scikit-learn

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com


We now import the libraries:

In [68]:
import pandas as pd
import numpy as np

Now we must download and import the data into a pandas dataframe:

In [69]:
df = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
df.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


We quickly check the data for missing values:

In [70]:
df.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

### Data Split into Predictors Dataframe and Target Dataframe

It is important to note that the target variable for this model will be the concrete strength. 

In [71]:
df_col = df.columns
predictors = df[df_col[df_col != 'Strength']] # Selecting all columns except Strength
target = df['Strength'] # Selecting only the Strength column

We quickly check that the columns selected are appropriate and correct:

In [72]:
predictors.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [73]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64
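
The same predictor/target split can also be written with `DataFrame.drop`, which some readers find easier to scan than boolean indexing on the column index. The sketch below uses a tiny frame named `toy` (a hypothetical name, with a few values copied from the rows shown above) rather than the full dataset:

```python
import pandas as pd

# Small illustrative frame; `toy` is a hypothetical name, not part of the notebook.
toy = pd.DataFrame({
    "Cement": [540.0, 332.5],
    "Age": [28, 270],
    "Strength": [79.99, 40.27],
})

# Everything except the target column becomes the predictors.
toy_predictors = toy.drop(columns="Strength")
# The target is the single Strength column.
toy_target = toy["Strength"]

print(list(toy_predictors.columns))
print(toy_target.tolist())
```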

We save the number of predictor columns in the unnormalized dataset, as the regression model requires this value for its input shape:

In [74]:
n_cols = predictors.shape[1]

## Building the Neural Network

We now focus on building the neural network using the Keras library. We start with the key imports:

In [75]:
import keras
from keras.models import Sequential
from keras.layers import Dense

We now define the regression model function:

In [76]:
def regression_model():
    
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

The function above creates a regression model with one hidden layer of 10 nodes using the ReLU activation function, followed by a single-node output layer. The model is compiled with the adam optimizer and the mean squared error as the loss function.
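
As a quick sanity check on the architecture, the number of trainable parameters can be worked out by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes the eight predictor columns shown earlier and the default bias term in each `Dense` layer; `dense_param_count` is a hypothetical helper, not a Keras function. The total should agree with what `model.summary()` reports for this model.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

n_inputs = 8  # assumed: the eight predictor columns shown in predictors.head()

hidden_params = dense_param_count(n_inputs, 10)  # 8 * 10 + 10
output_params = dense_param_count(10, 1)         # 10 * 1 + 1
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)
```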

### Training and Testing Network

We must now build the model by calling the function above.

In [77]:
model = regression_model()

In each of 50 iterations, the data is split randomly into training and test sets, holding out 30% of the data for testing. The model is trained on the training set for 50 epochs, and its mean squared error on the test set is saved.

In [88]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

mse = []  # one test-set MSE per iteration

for k in range(50):
    # New random 70/30 split on every iteration
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3)
    # `model` was built once above, so fit() continues from its current weights
    model.fit(X_train, y_train, epochs=50, verbose=2)
    y_prediction = model.predict(X_test)
    mse.append(mean_squared_error(y_test, y_prediction))

Epoch 1/50
23/23 - 0s - loss: 46.9262
Epoch 2/50
23/23 - 0s - loss: 47.0877
Epoch 3/50
23/23 - 0s - loss: 46.7086
Epoch 4/50
23/23 - 0s - loss: 47.0618
Epoch 5/50
23/23 - 0s - loss: 46.7935
Epoch 6/50
23/23 - 0s - loss: 49.4172
Epoch 7/50
23/23 - 0s - loss: 46.6672
Epoch 8/50
23/23 - 0s - loss: 46.8087
Epoch 9/50
23/23 - 0s - loss: 48.8991
Epoch 10/50
23/23 - 0s - loss: 46.3622
Epoch 11/50
23/23 - 0s - loss: 46.4406
Epoch 12/50
23/23 - 0s - loss: 46.5589
Epoch 13/50
23/23 - 0s - loss: 46.5613
Epoch 14/50
23/23 - 0s - loss: 46.7127
Epoch 15/50
23/23 - 0s - loss: 47.2225
Epoch 16/50
23/23 - 0s - loss: 47.3001
Epoch 17/50
23/23 - 0s - loss: 48.0855
Epoch 18/50
23/23 - 0s - loss: 46.5933
Epoch 19/50
23/23 - 0s - loss: 46.8897
Epoch 20/50
23/23 - 0s - loss: 46.1037
Epoch 21/50
23/23 - 0s - loss: 48.7916
Epoch 22/50
23/23 - 0s - loss: 47.5209
Epoch 23/50
23/23 - 0s - loss: 48.0031
Epoch 24/50
23/23 - 0s - loss: 48.1729
Epoch 25/50
23/23 - 0s - loss: 48.6568
Epoch 26/50
23/23 - 0s - loss: 48.
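
The loop above calls `fit` on one shared `model`, so each repetition resumes from the weights left by the previous one instead of starting over. If step 4 is meant to produce 50 independently trained models, one option is to build a fresh model inside the loop. The sketch below illustrates that pattern under stated assumptions: `repeated_holdout_mse` and `LeastSquaresStandIn` are hypothetical names that do not appear in the notebook, the split uses a NumPy permutation in place of `train_test_split`, and the stand-in model exists only so the example runs without Keras.

```python
import numpy as np

def repeated_holdout_mse(build_model, X, y, n_repeats=50, test_size=0.3, seed=0, **fit_kwargs):
    """Train a fresh model on a new random split each time; return the test MSEs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n_test = int(round(test_size * len(X)))
    scores = []
    for _ in range(n_repeats):
        idx = rng.permutation(len(X))
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        model = build_model()  # new, untrained model for this repetition
        model.fit(X[train_idx], y[train_idx], **fit_kwargs)
        # Flatten predictions so an (n, 1) output does not broadcast against y
        pred = np.asarray(model.predict(X[test_idx])).reshape(-1)
        scores.append(float(np.mean((y[test_idx] - pred) ** 2)))
    return scores

class LeastSquaresStandIn:
    """Hypothetical stand-in exposing the same fit/predict interface as a Keras model."""
    def fit(self, X, y):
        A = np.column_stack([X, np.ones(len(X))])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_

# Synthetic, noise-free linear data purely for demonstration
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + 3.0

demo_scores = repeated_holdout_mse(LeastSquaresStandIn, X_demo, y_demo, n_repeats=5)
print(len(demo_scores))
```

With the notebook's own names, the call would look like `repeated_holdout_mse(regression_model, predictors, target, epochs=50, verbose=0)`. Scores from such a run would not be directly comparable to the ones reported in this notebook, which come from the shared model.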

## Results

The mean and standard deviation of the 50 MSE values from the 50 training iterations are presented below:

In [89]:
MSE = pd.DataFrame()
MSE[' '] = mse

print('The Mean MSE for the 50 trained models is:', MSE.mean())
print('The Standard Deviation of the MSE for the 50 trained models is:', MSE.std())

The Mean MSE for the 50 trained models is:      48.272245
dtype: float64
The Standard Deviation of the MSE for the 50 trained models is:      4.482024
dtype: float64
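
The same two statistics can be computed directly with NumPy, which returns plain scalars instead of a one-element Series. One detail worth keeping in mind: pandas `std` defaults to the sample standard deviation (`ddof=1`), while NumPy `std` defaults to the population version (`ddof=0`), so `ddof=1` has to be passed explicitly to match the pandas output. The values in `example_scores` are made up for illustration and are not the notebook's results.

```python
import numpy as np

# Made-up illustrative values, not the MSEs from this notebook
example_scores = np.array([40.0, 45.0, 50.0, 55.0])

mean_score = example_scores.mean()
std_sample = example_scores.std(ddof=1)      # matches the pandas default
std_population = example_scores.std(ddof=0)  # NumPy default

print(mean_score, std_sample, std_population)
```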


Further summary statistics for the 50 MSE values are shown below:

In [90]:
MSE.describe()

| | |
|---|---|
| count | 50.0 |
| mean | 48.272245 |
| std | 4.482024 |
| min | 39.313979 |
| 25% | 44.772157 |
| 50% | 48.390026 |
| 75% | 50.332825 |
| max | 63.084291 |
