### PART C: Increase the number of epochs

#### Download and Clean Dataset

[Dataset Link](https://cocl.us/concrete_data)

The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them.

The predictors in the concrete strength data are the sample's `Age` and the following ingredients:

1. **Cement**

2. **Blast Furnace Slag**

3. **Fly Ash**

4. **Water**

5. **Superplasticizer**

6. **Coarse Aggregate**

7. **Fine Aggregate**

In [1]:
# Importing Numpy and Pandas Libraries to read the dataset

import pandas as pd
import numpy as np

In [2]:
# Reading the dataset into a pandas Dataframe

concrete_data = pd.read_csv("concrete_data.csv")
concrete_data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


In [3]:
# Checking information about the dataset

print(f"Dataset shape is: {concrete_data.shape}")
print(concrete_data.isnull().sum())
concrete_data.describe()

Dataset shape is: (1030, 9)
Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64


| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


The data has no missing values and is ready to be used to build the model.


###### Split Dataset into Predictors and Target

In [5]:
concrete_data_columns = concrete_data.columns
predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column

In [6]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

###### Normalised Version of Data

In [7]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
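As a quick check of the normalisation step, the sketch below applies the same formula to a small made-up array (the values are illustrative and not taken from the concrete dataset). It assumes the sample standard deviation (`ddof=1`), which is the default used by pandas `DataFrame.std()`; the helper name `zscore` is hypothetical.

```python
import numpy as np

def zscore(columns):
    """Standardise each column: subtract its mean, divide by its sample std."""
    columns = np.asarray(columns, dtype=float)
    mean = columns.mean(axis=0)
    # ddof=1 matches the pandas default used by predictors.std()
    std = columns.std(axis=0, ddof=1)
    return (columns - mean) / std

# Hypothetical toy data: 4 samples, 2 features on very different scales
toy = np.array([[100.0, 1.0],
                [200.0, 2.0],
                [300.0, 3.0],
                [400.0, 4.0]])
toy_norm = zscore(toy)
print(toy_norm.mean(axis=0))         # approximately 0 for each column
print(toy_norm.std(axis=0, ddof=1))  # approximately 1 for each column
```

After normalisation both toy columns become identical, which illustrates that the original scale of a feature no longer matters.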


In [8]:
n_cols = predictors_norm.shape[1]
n_cols

8

###### Import Keras

In [10]:
import tensorflow

from tensorflow import keras

from keras.models import Sequential
from keras.layers import Dense

In [11]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
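To get a feel for the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes the standard fully connected layer layout (one weight per input-unit pair plus one bias per unit); `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

n_cols = 8  # number of predictor columns, as computed earlier
hidden = dense_params(n_cols, 10)  # 8*10 + 10 = 90
output = dense_params(10, 1)       # 10*1 + 1  = 11
total = hidden + output
print(hidden, output, total)       # 90 11 101
```

Calling `model.summary()` on the Keras model should report the same totals if these assumptions hold.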

In [12]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=42)


###### Train and Test Network

In [13]:
# Build the model

model = regression_model()

## PART C: Increase the number of epochs (5 marks)

- **Repeat Part B but use 100 epochs this time for training.**

**How does the mean of the mean squared errors compare to that from Part B?**

In [14]:
# fit the model

model.fit(X_train, y_train, epochs=100, verbose=2)

Epoch 1/100
23/23 - 1s - loss: 1549.2274 - 1s/epoch - 47ms/step
Epoch 2/100
23/23 - 0s - loss: 1531.5012 - 55ms/epoch - 2ms/step
Epoch 3/100
23/23 - 0s - loss: 1513.5525 - 57ms/epoch - 2ms/step
Epoch 4/100
23/23 - 0s - loss: 1494.7628 - 57ms/epoch - 2ms/step
Epoch 5/100
23/23 - 0s - loss: 1475.5527 - 57ms/epoch - 2ms/step
Epoch 6/100
23/23 - 0s - loss: 1455.0990 - 61ms/epoch - 3ms/step
Epoch 7/100
23/23 - 0s - loss: 1434.2142 - 66ms/epoch - 3ms/step
Epoch 8/100
23/23 - 0s - loss: 1412.1907 - 76ms/epoch - 3ms/step
Epoch 9/100
23/23 - 0s - loss: 1389.6721 - 68ms/epoch - 3ms/step
Epoch 10/100
23/23 - 0s - loss: 1365.7859 - 62ms/epoch - 3ms/step
Epoch 11/100
23/23 - 0s - loss: 1341.0283 - 56ms/epoch - 2ms/step
Epoch 12/100
23/23 - 0s - loss: 1315.4617 - 48ms/epoch - 2ms/step
Epoch 13/100
23/23 - 0s - loss: 1288.6793 - 62ms/epoch - 3ms/step
Epoch 14/100
23/23 - 0s - loss: 1260.9147 - 52ms/epoch - 2ms/step
Epoch 15/100
23/23 - 0s - loss: 1232.4102 - 55ms/epoch - 2ms/step
Epoch 16/100
23/23 -

<keras.callbacks.History at 0x1a67c0b3dc0>
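The `23/23` in the training log is the number of batches per epoch. A rough sketch of where that number comes from, assuming the Keras default batch size of 32 (no `batch_size` was passed to `fit`) and that `train_test_split` rounds the test set size up:

```python
import math

n_samples = 1030   # rows in the dataset (from concrete_data.shape)
test_size = 0.3    # fraction passed to train_test_split
batch_size = 32    # Keras default when fit() is called without batch_size

n_test = math.ceil(test_size * n_samples)   # scikit-learn rounds the test set up
n_train = n_samples - n_test
steps_per_epoch = math.ceil(n_train / batch_size)
print(n_train, steps_per_epoch)  # 721 23
```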

Next, we need to evaluate the model on the test data.

In [15]:
loss_val = model.evaluate(X_test, y_test)
y_pred = model.predict(X_test)
loss_val



154.64654541015625

Now we need to compute the mean squared error between the predicted concrete strength and the actual concrete strength.

Let's import the mean_squared_error function from Scikit-learn.

In [16]:
from sklearn.metrics import mean_squared_error

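mean_square_error = mean_squared_error(y_test, y_pred)
# Only one MSE value exists at this point, so its mean is the value itself
# and its standard deviation is 0.
mean = np.mean(mean_square_error)
standard_deviation = np.std(mean_square_error)
print(mean, standard_deviation)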


154.64654947767298 0.0
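The value matches the loss returned by `model.evaluate` up to floating-point precision, because the model was compiled with `mean_squared_error` as its loss. For reference, here is a minimal NumPy sketch of the same metric; the helper `mse` and the sample values are hypothetical and only illustrate the formula.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    # Flatten both inputs so a (n, 1) prediction array lines up with (n,) targets
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical values, not taken from the concrete dataset
y_true = [2.0, 4.0]
y_pred = [[1.0], [6.0]]   # column vector, like the output of model.predict
print(mse(y_true, y_pred))  # (1 + 4) / 2 = 2.5
```

The flattening step matters when computing the metric by hand: subtracting an `(n, 1)` array from an `(n,)` array in NumPy broadcasts to an `(n, n)` matrix and gives a wrong result.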


Create a list of 50 mean squared errors and report the mean and the standard deviation of those errors.



In [17]:
total_mean_squared_errors = 50
epochs = 100
mean_squared_errors = []
# NOTE: `model` is the network built in cell 13 and is not re-created here,
# so each iteration continues training the same weights on a new split.
for i in range(total_mean_squared_errors):
    # Use the loop index as the seed so every run gets a different split
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=i)
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    MSE = model.evaluate(X_test, y_test, verbose=0)
    print("MSE "+str(i+1)+": "+str(MSE))
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)
    mean_squared_errors.append(mean_square_error)

# Summarise the collected errors
mean_squared_errors = np.array(mean_squared_errors)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print('\n')
print("Below is the mean and standard deviation of " +str(total_mean_squared_errors) + " mean squared errors with normalized data. Total number of epochs for each training is: " +str(epochs) + "\n")
print("Mean: "+str(mean))
print("Standard Deviation: "+str(standard_deviation))

MSE 1: 73.36727905273438
MSE 2: 69.91366577148438
MSE 3: 40.832733154296875
MSE 4: 42.681121826171875
MSE 5: 43.94369888305664
MSE 6: 44.93804168701172
MSE 7: 45.662193298339844
MSE 8: 34.23462677001953
MSE 9: 38.76713943481445
MSE 10: 38.632415771484375
MSE 11: 37.342342376708984
MSE 12: 35.85148239135742
MSE 13: 41.809329986572266
MSE 14: 43.283809661865234
MSE 15: 33.97501754760742
MSE 16: 30.650592803955078
MSE 17: 36.559059143066406
MSE 18: 35.90058135986328
MSE 19: 36.24504089355469
MSE 20: 35.655303955078125
MSE 21: 35.79656982421875
MSE 22: 30.951562881469727
MSE 23: 31.730566024780273
MSE 24: 30.43558120727539
MSE 25: 33.01444625854492
MSE 26: 34.875396728515625
MSE 27: 31.06159210205078
MSE 28: 32.93156051635742
MSE 29: 35.76521301269531
MSE 30: 34.47502136230469
MSE 31: 29.951499938964844
MSE 32: 29.04045295715332
MSE 33: 29.560293197631836
MSE 34: 31.33584976196289
MSE 35: 33.22304153442383
MSE 36: 39.587284088134766
MSE 37: 27.794214248657227
MSE 38: 33.91962432861328
MSE 
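The loop above keeps fitting the single `model` object created earlier, so each later run starts from weights that have already seen data from previous splits. If the intent is 50 independent runs of 100 epochs each, one option is to build a new model inside the loop. The sketch below shows that pattern in a framework-agnostic way; `repeated_mse` and `MeanBaseline` are hypothetical names, the split is done with NumPy rather than `train_test_split`, and the stand-in model only exists to make the example runnable without Keras.

```python
import numpy as np

def repeated_mse(build_model, X, y, n_runs=50, test_size=0.3, fit_kwargs=None):
    """Train a fresh model on n_runs random splits and collect the test MSEs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    fit_kwargs = fit_kwargs or {}
    n_test = int(round(test_size * len(X)))
    errors = []
    for seed in range(n_runs):
        # Shuffle row indices reproducibly, then hold out the first n_test rows
        idx = np.random.default_rng(seed).permutation(len(X))
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        model = build_model()  # new, untrained model for every split
        model.fit(X[train_idx], y[train_idx], **fit_kwargs)
        y_pred = np.asarray(model.predict(X[test_idx])).ravel()
        errors.append(float(np.mean((y[test_idx] - y_pred) ** 2)))
    errors = np.array(errors)
    return errors.mean(), errors.std(), errors

class MeanBaseline:
    """Hypothetical stand-in model that always predicts the training mean."""
    def fit(self, X, y, **kwargs):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

# Toy demonstration data, not the concrete dataset
X_demo = np.arange(40, dtype=float).reshape(20, 2)
y_demo = np.arange(20, dtype=float)
demo_mean, demo_std, demo_errors = repeated_mse(MeanBaseline, X_demo, y_demo, n_runs=5)
print(demo_mean, demo_std)
```

With the notebook's objects, a call such as `repeated_mse(regression_model, predictors_norm, target, fit_kwargs={"epochs": 100, "verbose": 0})` would follow the same pattern. The resulting numbers would differ from the output shown above, since every run would start from newly initialised weights.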

###### Inference

- The mean of the mean squared errors is close to that found in Part B (50 epochs).
- In this experiment, increasing the number of epochs from 50 to 100 did not lead to better results.