### PART B: Normalize the data 

#### Download and Clean Dataset

[Dataset Link](https://cocl.us/concrete_data)

The dataset records the compressive strength (in MPa) of different concrete samples, together with the quantity of each ingredient used to make them (in kg per m³ of mixture) and the age of each sample.

The predictors in the concrete strength data are the seven ingredients below plus the age of the sample; the target is **Strength**.

1. **Cement**

2. **Blast Furnace Slag**

3. **Fly Ash**

4. **Water**

5. **Superplasticizer**

6. **Coarse Aggregate**

7. **Fine Aggregate**

8. **Age** (in days)

In [1]:
# Import pandas to read the dataset and NumPy for numerical operations

import pandas as pd
import numpy as np

In [2]:
# Read the dataset into a pandas DataFrame

concrete_data = pd.read_csv("concrete_data.csv")
concrete_data.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


In [3]:
# Check the shape, missing values, and summary statistics of the dataset

print(f"Dataset shape is: {concrete_data.shape}")
print(concrete_data.isnull().sum())
concrete_data.describe()

Dataset shape is: (1030, 9)
Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64


|       | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


No column has missing values and every column has 1030 entries, so the data is ready to be used to build the model.

###### Split Dataset into Predictors and Target

In [4]:
concrete_data_columns = concrete_data.columns
predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column
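
Selecting every column except `Strength` through a boolean mask works; `DataFrame.drop(columns=...)` is a more direct pandas idiom for the same selection. A small sketch on a toy frame (values taken from the preview above, only three columns kept, so it is illustrative rather than the real data):

```python
import pandas as pd

# Toy stand-in for concrete_data: two rows, three columns only
df = pd.DataFrame({
    "Cement": [540.0, 332.5],
    "Water": [162.0, 228.0],
    "Strength": [79.99, 40.27],
})

# Boolean-mask selection, as in the cell above
cols = df.columns
predictors_mask = df[cols[cols != "Strength"]]

# Same result by dropping the target column by name
predictors_drop = df.drop(columns="Strength")
target = df["Strength"]

print(predictors_drop.equals(predictors_mask))  # True
```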

In [5]:
predictors.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [6]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

### PART B: Normalize the data (5 marks) 

- **Repeat Part A but use a normalized version of the data. Recall that one way to normalize the data is by subtracting the mean from the individual predictors and dividing by the standard deviation.**

**How does the mean of the mean squared errors compare to that from Step A?**
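
Written out, the normalization described in the task maps each value $x_{ij}$ (row $i$, predictor $j$) to a z-score built from that predictor's mean and standard deviation. pandas' `.std()` uses the $n-1$ denominator shown here:

$$
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j},
\qquad
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij},
\qquad
s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}
$$

For example, using the Cement mean and standard deviation from the summary statistics, the first Cement value becomes $(540.0 - 281.167864)/104.506364 \approx 2.4767$.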

###### Normalized version of the data

In [7]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
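
`predictors.std()` in the cell above computes the sample standard deviation (pandas' default `ddof=1`), while scikit-learn's `StandardScaler` divides by the population standard deviation (`ddof=0`). With 1030 rows the two results differ only by a factor of $\sqrt{1030/1029} \approx 1.0005$, so either works, but mixing them can cause small mismatches. The sketch below checks both conventions on a toy two-column frame (illustrative values from the preview, not the full dataset):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy stand-in for `predictors`: five rows, two columns from the preview
df = pd.DataFrame({
    "Cement": [540.0, 540.0, 332.5, 332.5, 198.6],
    "Water": [162.0, 162.0, 228.0, 228.0, 192.0],
})
n = len(df)

# pandas: DataFrame.std() defaults to the sample standard deviation (ddof=1)
z_pandas = (df - df.mean()) / df.std()

# scikit-learn: StandardScaler uses the population standard deviation (ddof=0)
z_sklearn = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)

# Both versions are centered at 0 and have unit spread under their own convention
print(z_pandas.mean().round(10).tolist())
print(z_pandas.std(ddof=1).round(10).tolist())
print(z_sklearn.std(ddof=0).round(10).tolist())

# They differ only by the constant factor sqrt(n / (n - 1))
print(np.allclose(z_sklearn.to_numpy(), z_pandas.to_numpy() * np.sqrt(n / (n - 1))))
```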


In [8]:
n_cols = predictors_norm.shape[1] # number of predictors
n_cols

8

###### Import Keras

In [9]:
# Keras model and layer classes used to build the regression network
from keras.models import Sequential
from keras.layers import Dense

###### Build a Neural Network

Let's write a function that builds and compiles the regression model, so that we can conveniently call it whenever we need a new model.

In [10]:
# Define our regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

The function above builds a model with one hidden layer of 10 neurons using the ReLU activation, followed by a single linear output neuron for the predicted strength. It is compiled with the Adam optimizer and mean squared error as the loss function.
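
To make the architecture concrete, here is a NumPy sketch of the computation such a model performs on a batch of rows: a ReLU hidden layer of 10 units, then one linear output. The weights are random placeholders, not trained Keras weights, and `mse` is a hypothetical helper mirroring the mean squared error loss. The parameter count, $8 \times 10 + 10 = 90$ for the hidden layer plus $10 \times 1 + 1 = 11$ for the output, is what `model.summary()` reports for `n_cols = 8`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cols, n_hidden = 8, 10

# Random placeholder weights (a trained Keras model would learn these)
W1 = rng.normal(size=(n_cols, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)


def forward(X):
    """Dense(10, relu) followed by Dense(1) for a batch X of shape (m, n_cols)."""
    hidden = np.maximum(0.0, X @ W1 + b1)  # ReLU hidden layer, shape (m, 10)
    return hidden @ W2 + b2                # linear output, shape (m, 1)


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((np.ravel(y_true) - np.ravel(y_pred)) ** 2))


X = rng.normal(size=(5, n_cols))  # five fake normalized rows
y = rng.normal(size=5)            # five fake targets
print(forward(X).shape)           # (5, 1)
print(mse(y, forward(X)))

n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)                   # 101
```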

Let's use scikit-learn's `train_test_split` to randomly split the data into training and test sets, holding out 30% of the rows for testing.

In [11]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=42)
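
In this notebook the mean and standard deviation used for normalization are computed on all 1030 rows before splitting, so the test rows also influence the scaling. A stricter, commonly used alternative is to compute the statistics on the training split only and reuse them for the test split. A minimal sketch, assuming synthetic data in place of the real `predictors` and `target`:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the raw (unnormalized) predictors and the target
rng = np.random.default_rng(42)
predictors = pd.DataFrame(rng.normal(300, 100, size=(100, 3)), columns=["Cement", "Water", "Age"])
target = pd.Series(rng.normal(35, 16, size=100), name="Strength")

X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    predictors, target, test_size=0.3, random_state=42
)

# Normalization statistics come from the training split only
train_mean = X_train_raw.mean()
train_std = X_train_raw.std()

# The same training statistics are applied to both splits
X_train = (X_train_raw - train_mean) / train_std
X_test = (X_test_raw - train_mean) / train_std
```

Only the training split is guaranteed to have mean 0 and standard deviation 1 here; the test split will be close to, but not exactly, standardized.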

###### Train and Test the Network

Let's call the function now to create our model.

In [12]:
# build the model

model = regression_model()

Next, we train the model for 50 epochs.

In [13]:
# fit the model

model.fit(X_train, y_train, epochs=50, verbose=2)

Epoch 1/50
23/23 - 2s - loss: 1571.2642 - 2s/epoch - 77ms/step
Epoch 2/50
23/23 - 0s - loss: 1555.2582 - 77ms/epoch - 3ms/step
Epoch 3/50
23/23 - 0s - loss: 1539.0620 - 66ms/epoch - 3ms/step
Epoch 4/50
23/23 - 0s - loss: 1522.6378 - 66ms/epoch - 3ms/step
Epoch 5/50
23/23 - 0s - loss: 1505.3832 - 68ms/epoch - 3ms/step
Epoch 6/50
23/23 - 0s - loss: 1487.7897 - 62ms/epoch - 3ms/step
Epoch 7/50
23/23 - 0s - loss: 1469.0372 - 95ms/epoch - 4ms/step
Epoch 8/50
23/23 - 0s - loss: 1449.6458 - 86ms/epoch - 4ms/step
Epoch 9/50
23/23 - 0s - loss: 1429.0688 - 89ms/epoch - 4ms/step
Epoch 10/50
23/23 - 0s - loss: 1407.0046 - 82ms/epoch - 4ms/step
Epoch 11/50
23/23 - 0s - loss: 1384.1653 - 85ms/epoch - 4ms/step
Epoch 12/50
23/23 - 0s - loss: 1359.7056 - 83ms/epoch - 4ms/step
Epoch 13/50
23/23 - 0s - loss: 1334.5549 - 70ms/epoch - 3ms/step
Epoch 14/50
23/23 - 0s - loss: 1307.5354 - 67ms/epoch - 3ms/step
Epoch 15/50
23/23 - 0s - loss: 1280.0247 - 69ms/epoch - 3ms/step
Epoch 16/50
23/23 - 0s - loss: 1251

<keras.callbacks.History at 0x2763c642190>
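
The `23/23` in the log is the number of mini-batches per epoch. With `test_size=0.3`, `train_test_split` puts 309 of the 1030 rows in the test set and 721 in the training set, and Keras uses a batch size of 32 when `fit` is called without `batch_size`, so each epoch takes 23 steps:

```python
import math

n_rows = 1030                      # rows in concrete_data
n_test = math.ceil(0.3 * n_rows)   # train_test_split rounds a float test_size up: 309
n_train = n_rows - n_test          # 721 training rows
batch_size = 32                    # Keras default when fit() gets no batch_size

steps_per_epoch = math.ceil(n_train / batch_size)  # the last batch is partial (17 rows)
print(n_test, n_train, steps_per_epoch)            # 309 721 23
```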

Next, we evaluate the model on the test data. Because the model was compiled with `loss='mean_squared_error'`, `evaluate` returns the test-set MSE.



In [14]:
loss_val = model.evaluate(X_test, y_test)
y_pred = model.predict(X_test)
loss_val




284.0368957519531

Now we need to compute the mean squared error between the predicted concrete strength and the actual concrete strength.

Let's import the `mean_squared_error` function from scikit-learn.

In [15]:
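from sklearn.metrics import mean_squared_error

# MSE between actual and predicted strength on the test set (matches loss_val up to float precision)
mean_square_error = mean_squared_error(y_test, y_pred)
# A single MSE value: its mean is the value itself and its standard deviation is 0.0.
# Mean and standard deviation become meaningful once many MSEs are collected in the next cell.
mean = np.mean(mean_square_error)
standard_deviation = np.std(mean_square_error)
print(mean, standard_deviation)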

284.03691914663034 0.0


Now repeat the split, train, and evaluate steps 50 times to build a list of 50 mean squared errors, then report their mean and standard deviation.

In [16]:
total_mean_squared_errors = 50
epochs = 50
mean_squared_errors = []
for i in range(0, total_mean_squared_errors):
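    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=i)
    # The same `model` object is reused, so each fit continues from the weights left by the previous split
    model.fit(X_train, y_train, epochs=epochs, verbose=0)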
    MSE = model.evaluate(X_test, y_test, verbose=0)
    print("MSE " + str(i+1) + ": " + str(MSE))
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)
    mean_squared_errors.append(mean_square_error)

mean_squared_errors = np.array(mean_squared_errors)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print('\n')
print("Below is the mean and standard deviation of " + str(total_mean_squared_errors) + " mean squared errors with normalized data. Total number of epochs for each training is: " + str(epochs) + "\n")
print("Mean: " + str(mean))
print("Standard Deviation: " + str(standard_deviation))

MSE 1: 144.49273681640625
MSE 2: 146.42945861816406
MSE 3: 103.2083740234375
MSE 4: 95.10758209228516
MSE 5: 87.09583282470703
MSE 6: 87.16259002685547
MSE 7: 88.88221740722656
MSE 8: 70.29978942871094
MSE 9: 76.50054931640625
MSE 10: 67.99784088134766
MSE 11: 70.08578491210938
MSE 12: 64.58624267578125
MSE 13: 73.62296295166016
MSE 14: 73.44638061523438
MSE 15: 56.86621856689453
MSE 16: 49.011985778808594
MSE 17: 50.75994110107422
MSE 18: 51.189029693603516
MSE 19: 44.312129974365234
MSE 20: 46.37179946899414
MSE 21: 41.09765625
MSE 22: 42.93537139892578
MSE 23: 39.371761322021484
MSE 24: 41.631385803222656
MSE 25: 43.61747741699219
MSE 26: 42.67311477661133
MSE 27: 41.09260559082031
MSE 28: 40.86280059814453
MSE 29: 46.31666946411133
MSE 30: 43.631805419921875
MSE 31: 42.321372985839844
MSE 32: 39.152557373046875
MSE 33: 38.008846282958984
MSE 34: 44.056907653808594
MSE 35: 41.40340805053711
MSE 36: 47.357669830322266
MSE 37: 42.862422943115234
MSE 38: 45.564453125
MSE 39: 42.9363784
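
The loop above calls `fit` on the same `model` every time, so the weights keep training across all 50 splits, and rows that served as training data in earlier splits can reappear in later test sets. That is a likely reason the MSEs fall steadily from about 144 to about 40. If each repetition is meant to be independent, a fresh model can be built for every split. A minimal sketch, assuming `build_model` is any zero-argument callable returning an object with `fit` and `predict` (for example `regression_model`); the helper name `repeated_mse` is hypothetical:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def repeated_mse(X, y, build_model, n_repeats=50, test_size=0.3, fit_kwargs=None):
    """Return an array of test-set MSEs, one per random split.

    A new model is built for every split, so no weights carry over
    from one repetition to the next.
    """
    fit_kwargs = fit_kwargs or {}
    errors = []
    for i in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=i)
        model = build_model()                    # fresh, untrained model for this split
        model.fit(X_tr, y_tr, **fit_kwargs)
        y_pred = np.ravel(model.predict(X_te))   # Keras returns shape (m, 1); flatten to (m,)
        errors.append(mean_squared_error(y_te, y_pred))
    return np.array(errors)
```

With the notebook's objects, a call would look like `repeated_mse(predictors_norm, target, regression_model, fit_kwargs={"epochs": 50, "verbose": 0})`, and the returned array's `.mean()` and `.std()` give the two summary numbers. Its values would differ from the list above because no training carries over between splits.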

###### Inference

- The mean of the mean squared errors decreases compared with that from Part A.
- Normalizing the input data puts all predictors on a comparable scale (mean 0, standard deviation 1), which here led to a lower test error.