### PART A: Build a baseline model 

#### Download and Clean Dataset

[Dataset Link](https://cocl.us/concrete_data)

The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them.

The predictors (ingredients) in the concrete strength data include:

1. **Cement**

2. **Blast Furnace Slag**

3. **Fly Ash**

4. **Water**

5. **Superplasticizer**

6. **Coarse Aggregate**

7. **Fine Aggregate**

In [1]:
# Importing Numpy and Pandas Libraries to read the dataset

import pandas as pd
import numpy as np

In [2]:
# Reading the dataset into a pandas DataFrame

concrete_data = pd.read_csv("concrete_data.csv")
concrete_data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


In [3]:
# Checking information about the dataset

print(f"Dataset shape is: {concrete_data.shape}")
print(concrete_data.isnull().sum())
concrete_data.describe()

Dataset shape is: (1030, 9)
Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64


| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


The dataset has 1030 rows, no missing values, and only numeric columns, so it is ready to be used to build a model.
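The checks above cover missing values and summary statistics. A minimal sketch of a slightly broader sanity check is shown below, run on a tiny made-up frame. The `quality_report` helper and the `toy` values are illustrative assumptions and are not part of the notebook; in practice the function would be called on `concrete_data`.

```python
import pandas as pd

def quality_report(df):
    """Summarise a few basic data-quality indicators for a DataFrame."""
    return {
        # total number of NaN cells across all columns
        "missing_values": int(df.isnull().sum().sum()),
        # rows that are exact copies of an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # columns that would need encoding before being fed to a network
        "non_numeric_columns": [
            c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])
        ],
    }

# Hypothetical toy data, only to demonstrate the call.
toy = pd.DataFrame({
    "Cement": [540.0, 540.0, 332.5, 540.0],
    "Water": [162.0, 162.0, 228.0, 162.0],
    "Strength": [79.99, 61.89, 40.27, 79.99],
})
report = quality_report(toy)
print(report)
```

Duplicate rows are not necessarily errors, but they are worth knowing about because the same sample can end up in both the training and the test split.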

## PART A: Build a baseline model (5 marks) 

Use the Keras library to build a neural network with the following:

- One hidden layer of 10 nodes, and a ReLU activation function

- Use the adam optimizer and the mean squared error as the loss function.

1. **Randomly split the data into training and test sets by holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.**

2. **Train the model on the training data using 50 epochs.**

3. **Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.**

4. **Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.**

5. **Report the mean and the standard deviation of the mean squared errors.**

Submit your Jupyter Notebook with your code and comments.
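The five steps above can be sketched as a small evaluation loop. The version below is only an outline of the protocol: it assumes a plain NumPy least-squares fit as a stand-in for the Keras network so that it runs without TensorFlow, and it uses synthetic data rather than the concrete dataset. The helper names (`split_indices`, `fit_and_score`, `repeated_evaluation`) are hypothetical. The point it illustrates is that every repetition draws a new split and fits a new model.

```python
import numpy as np

def split_indices(n_rows, test_fraction, rng):
    """Step 1: shuffle row indices and hold out `test_fraction` of them."""
    order = rng.permutation(n_rows)
    n_test = int(round(test_fraction * n_rows))
    return order[n_test:], order[:n_test]  # (train indices, test indices)

def fit_and_score(X_train, y_train, X_test, y_test):
    """Steps 2-3: fit a fresh model and return its test MSE."""
    # Assumption: ordinary least squares with a bias column replaces the network.
    A_train = np.column_stack([X_train, np.ones(len(X_train))])
    A_test = np.column_stack([X_test, np.ones(len(X_test))])
    weights, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    residuals = y_test - A_test @ weights
    return float(np.mean(residuals ** 2))

def repeated_evaluation(X, y, n_repeats=50, test_fraction=0.3):
    """Step 4: repeat split, fit, and score, collecting one MSE per run."""
    scores = []
    for seed in range(n_repeats):
        rng = np.random.default_rng(seed)  # a different split for every run
        train_idx, test_idx = split_indices(len(X), test_fraction, rng)
        scores.append(
            fit_and_score(X[train_idx], y[train_idx], X[test_idx], y[test_idx])
        )
    return np.array(scores)

# Synthetic data, purely illustrative.
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + data_rng.normal(scale=0.1, size=200)

mse_list = repeated_evaluation(X, y)
# Step 5: report the mean and standard deviation of the 50 errors.
print(f"mean MSE: {mse_list.mean():.4f}, std: {mse_list.std():.4f}")
```

With the Keras model, the body of `fit_and_score` would instead build a new network, call `fit` for 50 epochs, and score the predictions on the held-out rows.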

###### Split Dataset into Predictors and Target

In [4]:
concrete_data_columns = concrete_data.columns
predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column

In [5]:
predictors.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [6]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

In [7]:
n_cols = predictors.shape[1]
n_cols

8

###### Import Keras

In [8]:
import tensorflow

from tensorflow import keras

In [9]:
from keras.models import Sequential
from keras.layers import Dense

###### Build a Neural Network

Let's define a function that builds and compiles our regression model, so that we can conveniently call it whenever we need a new model.

In [10]:
# Define our regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

The above function creates a model that has one hidden layer with 10 neurons and a ReLU activation function. It uses the adam optimizer and the mean squared error as the loss function.
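To get a feel for the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes each `Dense` layer has one weight per input-unit pair plus one bias per unit, which is the default configuration; `dense_params` is a hypothetical helper written for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units

n_cols = 8  # number of predictor columns in the notebook

hidden = dense_params(n_cols, 10)  # 8 * 10 weights + 10 biases
output = dense_params(10, 1)       # 10 * 1 weights + 1 bias
total = hidden + output

print(hidden, output, total)       # 90 11 101
```

The total should agree with the parameter count reported by `model.summary()` for a model created with `regression_model()`.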

Let's import scikit-learn in order to randomly split the data into training and test sets.

In [11]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=42)

###### Train and Test the Network

Let's call the function now to create our model.

In [12]:
# build the model

model = regression_model()

Next, we will train the model for 50 epochs.

In [13]:
# fit the model

model.fit(X_train, y_train, epochs=50, verbose=1)

Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x27c4fbc9340>

We evaluate our model on the test dataset.

In [14]:
loss_val = model.evaluate(X_test, y_test)
y_pred = model.predict(X_test)
loss_val



834.64404296875

Now we need to compute the mean squared error between the predicted concrete strength and the actual concrete strength.

Let's import the mean_squared_error function from Scikit-learn

In [15]:
from sklearn.metrics import mean_squared_error

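# MSE between the actual and the predicted concrete strength
mean_square_error = mean_squared_error(y_test, y_pred)

# Only one MSE is available at this point, so the mean is that value
# and the standard deviation is 0.
mean = np.mean(mean_square_error)
standard_deviation = np.std(mean_square_error)
print(mean, standard_deviation)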

834.6440590868693 0.0
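This value agrees with the loss returned by `model.evaluate` up to floating-point precision, since the model was compiled with the mean squared error as its loss. As a rough sketch of what is being computed, the function below implements the metric directly in NumPy. The `mse` helper and the small arrays are illustrative assumptions; the notebook itself uses Scikit-learn's `mean_squared_error`.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    # A single-output network returns predictions of shape (n, 1); flatten to (n,)
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([[2.0], [5.0], [9.0]])  # column vector, like model.predict output

print(mse(y_true, y_pred))  # (1 + 0 + 4) / 3

# Without flattening, (3,) - (3, 1) broadcasts to a (3, 3) matrix,
# and the mean over that matrix is not the MSE.
print((y_true - y_pred).shape)
```

The shape mismatch is the main pitfall when computing the metric by hand on network predictions, which is one reason to prefer the library function.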


Create a list of 50 mean squared errors and report the mean and the standard deviation of the mean squared errors.

In [16]:
total_mean_squared_errors = 50
epochs = 50
mean_squared_errors = []
for i in range(total_mean_squared_errors):
    # New random 70/30 split for every repetition
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=i)
    # NOTE: `model` is the network already trained above and is not re-created here,
    # so each iteration continues training the same weights on a new split.
    # To make the 50 runs independent, call regression_model() at the start of each iteration.
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    MSE = model.evaluate(X_test, y_test, verbose=0)
    print(f"MSE {i + 1}: {MSE}")
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)
    mean_squared_errors.append(mean_square_error)

# Summarise the collected errors
mean_squared_errors = np.array(mean_squared_errors)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print('\n')
print("Below is the mean and standard deviation of " + str(total_mean_squared_errors) + " mean squared errors without normalized data. Total number of epochs for each training is: " + str(epochs) + "\n")
print("Mean: " + str(mean))
print("Standard Deviation: " + str(standard_deviation))

MSE 1: 182.8684844970703
MSE 2: 137.76898193359375
MSE 3: 113.6091079711914
MSE 4: 133.7347869873047
MSE 5: 127.13549041748047
MSE 6: 112.77537536621094
MSE 7: 142.32391357421875
MSE 8: 113.53304290771484
MSE 9: 148.361572265625
MSE 10: 110.90178680419922
MSE 11: 102.64631652832031
MSE 12: 101.43494415283203
MSE 13: 122.32654571533203
MSE 14: 121.1536865234375
MSE 15: 114.41082763671875
MSE 16: 105.7099609375
MSE 17: 104.52316284179688
MSE 18: 95.53245544433594
MSE 19: 94.68311309814453
MSE 20: 124.23605346679688
MSE 21: 97.1854476928711
MSE 22: 115.74805450439453
MSE 23: 123.714111328125
MSE 24: 105.0172119140625
MSE 25: 110.14683532714844
MSE 26: 101.20457458496094
MSE 27: 119.10635375976562
MSE 28: 109.84941101074219
MSE 29: 110.60637664794922
MSE 30: 121.09532928466797
MSE 31: 135.5966796875
MSE 32: 115.98591613769531
MSE 33: 102.268798828125
MSE 34: 118.87645721435547
MSE 35: 124.54346466064453
MSE 36: 125.6996078491211
MSE 37: 135.93345642089844
MSE 38: 117.10539245605469
MSE 39: