# Building a Regression Model with Keras: Predicting Concrete Compressive Strength from Multivariate Data

Author: [Arnau Gómez](https://www.arnaugomez.com)

Peer-graded Assignment: Build a Regression Model in Keras

Exercise C: Increase the number of epochs (5 marks)

## Initial setup

Install dependencies

In [1]:
# Python version: 3.11.9
%pip install --upgrade pip
%pip install pandas==2.2.3 keras==3.7.0 tensorflow==2.18.0 numpy==2.0.2 scikit-learn==1.6.0

Note: you may need to restart the kernel to use updated packages.


Import dependencies

In [2]:
import pandas as pd
import numpy as np
import keras

import warnings
warnings.simplefilter('ignore', FutureWarning)

Import data

In [3]:
filepath='https://cocl.us/concrete_data'
concrete_data = pd.read_csv(filepath)


concrete_data.head()

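| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|--------|--------------------|---------|-------|------------------|------------------|----------------|-----|----------|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.30 |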


Split data into predictors and target

In [4]:
predictors = concrete_data.drop(columns=['Strength'])
predictors.head()

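| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|--------|--------------------|---------|-------|------------------|------------------|----------------|-----|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |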


In [5]:
target = concrete_data['Strength']
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

Normalize the data by subtracting the mean and dividing by the standard deviation.

In [6]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|--------|--------------------|---------|-------|------------------|------------------|----------------|-----|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.795140 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.551340 |
| 3 | 0.491187 | 0.795140 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
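
As a sanity check on the z-score normalization, the sketch below applies the same formula to a small toy table and confirms that each column ends up with mean close to 0 and standard deviation close to 1. The `toy` DataFrame and its values are hypothetical stand-ins, not the real dataset. Note that pandas' `std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the concrete predictors.
toy = pd.DataFrame({
    "Cement": [540.0, 332.5, 198.6, 266.0],
    "Water": [162.0, 228.0, 192.0, 228.0],
})

# Same z-score formula as in the notebook; DataFrame.std() defaults to ddof=1.
toy_norm = (toy - toy.mean()) / toy.std()

# Each normalized column should have mean ~0 and sample std ~1.
col_means = toy_norm.mean()
col_stds = toy_norm.std()
print(col_means.round(6).tolist())
print(col_stds.round(6).tolist())
```

The same two checks could be run on `predictors_norm` to confirm the normalization behaved as intended.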


Compute number of input features

In [7]:
n_predictors = predictors.shape[1]

## Build the Keras model

In [8]:
from keras.models import Sequential
from keras.layers import Dense, Input

# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Input(shape=(n_predictors,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

# build the model
model = regression_model()

# display model summary
model.summary()
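
The summary output is not reproduced here, but the number of trainable parameters can be worked out by hand. A minimal sketch, assuming the eight predictor columns shown earlier (so `n_predictors` is 8); `dense_params` is a hypothetical helper, not a Keras function:

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

n_features = 8  # assumption: the eight predictor columns shown earlier

hidden = dense_params(n_features, 10)  # Dense(10, activation='relu')
output = dense_params(10, 1)           # Dense(1)
total_params = hidden + output

print(hidden, output, total_params)  # 90 11 101
```

With only 101 parameters the network is very small, which is consistent with the fast per-step training times in the log.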


## Train and test the model

Train and test it once:

In [9]:
model.fit(predictors_norm, target, validation_split=0.3, epochs=100, verbose=2)

Epoch 1/100
23/23 - 0s - 13ms/step - loss: 1705.5961 - val_loss: 1248.2408
Epoch 2/100
23/23 - 0s - 2ms/step - loss: 1689.5896 - val_loss: 1237.0475
Epoch 3/100
23/23 - 0s - 1ms/step - loss: 1673.6882 - val_loss: 1225.8470
Epoch 4/100
23/23 - 0s - 1ms/step - loss: 1656.9634 - val_loss: 1214.5431
Epoch 5/100
23/23 - 0s - 1ms/step - loss: 1639.6401 - val_loss: 1202.9633
Epoch 6/100
23/23 - 0s - 1ms/step - loss: 1621.1881 - val_loss: 1190.8662
Epoch 7/100
23/23 - 0s - 1ms/step - loss: 1601.4955 - val_loss: 1178.4128
Epoch 8/100
23/23 - 0s - 1ms/step - loss: 1580.3988 - val_loss: 1165.5964
Epoch 9/100
23/23 - 0s - 1ms/step - loss: 1558.0848 - val_loss: 1151.9738
Epoch 10/100
23/23 - 0s - 1ms/step - loss: 1533.9250 - val_loss: 1137.9211
Epoch 11/100
23/23 - 0s - 1ms/step - loss: 1508.2274 - val_loss: 1123.2195
Epoch 12/100
23/23 - 0s - 1ms/step - loss: 1481.0731 - val_loss: 1107.8225
Epoch 13/100
23/23 - 0s - 1ms/step - loss: 1452.1056 - val_loss: 1092.0026
Epoch 14/100
23/23 - 0s - 1ms/step - [output truncated]

<keras.src.callbacks.history.History at 0x11ee4d2d0>
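
The `History` object returned by `fit` keeps the per-epoch metrics in its `history` attribute, a dict of lists keyed by metric name. The sketch below assumes the return value had been captured in a hypothetical variable (for example `history = model.fit(...)`) and uses a plain dict filled with the first five epochs from the log above as a stand-in. `best_epoch` is an illustrative helper, not part of Keras.

```python
# Stand-in for `history.history`; values are the first five epochs logged above.
history_dict = {
    "loss": [1705.5961, 1689.5896, 1673.6882, 1656.9634, 1639.6401],
    "val_loss": [1248.2408, 1237.0475, 1225.8470, 1214.5431, 1202.9633],
}

def best_epoch(metrics: dict, key: str = "val_loss") -> tuple[int, float]:
    """Return the 1-based epoch with the lowest value of `key`, and that value."""
    values = metrics[key]
    idx = min(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]

epoch, value = best_epoch(history_dict)
print(f"Lowest val_loss so far: {value:.4f} at epoch {epoch}")
```

Because the validation loss is still falling at every logged epoch, the best epoch in this excerpt is simply the last one.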

Train and test it 50 times:

In [10]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Initialize list of mean squared errors
mse_list = []

# Repeat the process 50 times
for _ in range(50):
    # Split the data
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=None)
    
    # Build the model
    model = regression_model()
    
    # Train the model
    model.fit(X_train, y_train, epochs=100, verbose=0)
    
    # Predict on the test data
    y_pred = model.predict(X_test, verbose=0)
    
    # Compute mean squared error
    mse = mean_squared_error(y_test, y_pred)
    mse_list.append(mse)

print(mse_list)


[164.13018576548336, 163.93970739075806, 157.10082952076482, 155.07162197607636, 185.60367781516922, 143.7784289015595, 176.84046657604068, 190.38762596428705, 170.8662229557219, 170.91157760917704, 167.83522408905884, 164.16086619850032, 151.76301683874445, 159.5327550331351, 195.57727032620355, 132.5662930771953, 163.11123620145975, 153.25316402425068, 154.64237170441078, 168.81259921192932, 149.18104048886528, 211.31463842097386, 148.1868874141703, 172.66507463901286, 159.58744003542145, 181.13559442411506, 160.7115859319323, 159.83924537808912, 168.41776633599451, 187.04354855777942, 160.86929135947338, 153.20664925291024, 149.48695020193918, 201.72721597734116, 186.43801064558298, 176.62769853205842, 165.5836322649659, 177.03049537556245, 163.91888443183913, 157.2735102023235, 176.05126771636054, 170.72255176201477, 155.9165159258511, 157.95646334140858, 160.96676210129075, 169.32413941279273, 166.35435983685073, 175.64552367278552, 155.199964632281, 164.9722532519587]


## Report the mean and standard deviation of the mean squared errors

In [11]:
mean_mse = np.mean(mse_list)
std_mse = np.std(mse_list)
print(f'Mean MSE: {mean_mse}')
print(f'Standard Deviation of MSE: {std_mse}')

Mean MSE: 166.6648020540774
Standard Deviation of MSE: 14.741742504554889
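
Note that `np.std` computes the population standard deviation (`ddof=0`) by default. When the 50 runs are treated as a sample of possible splits, the sample standard deviation (`ddof=1`) and the standard error of the mean may be more informative. A minimal sketch, using a hypothetical five-value subset of the list above (rounded) rather than the full `mse_list`:

```python
import numpy as np

# Hypothetical subset: the first five MSE values from the list above, rounded.
sample_mse = np.array([164.13, 163.94, 157.10, 155.07, 185.60])

pop_std = np.std(sample_mse)             # ddof=0, the np.std default
sample_std = np.std(sample_mse, ddof=1)  # sample estimate, divides by n - 1

# Standard error of the mean: how precisely the mean MSE itself is estimated.
sem = sample_std / np.sqrt(len(sample_mse))

print(f"population std: {pop_std:.3f}, sample std: {sample_std:.3f}, SEM: {sem:.3f}")
```

With 50 runs the difference between the two standard deviations is small, so the reported figure would change only slightly.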


## How does the mean of the mean squared errors compare to that from Step B?

The mean MSE in this exercise (166.67) is lower than the mean MSE from Step B (349.82), a reduction of roughly 52%. This suggests that increasing the number of epochs from 50 to 100 further improved the model's performance. The standard deviation of the MSE is also much lower (14.74 versus 90.65), suggesting that the model's performance is more consistent across different random splits of the data.

| Metric | Step A | Step B | Step C |
|--------|--------|--------|------------------|
| Mean MSE | 416.851 | 349.821 | 166.665 |
| Standard Deviation of MSE | 595.123 | 90.653 | 14.742 |
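
The size of the change between steps can be quantified directly from the table. A minimal sketch, with the values copied from the table above; `relative_reduction` is a hypothetical helper introduced for illustration:

```python
# Values copied from the comparison table.
results = {
    "Step A": {"mean": 416.851, "std": 595.123},
    "Step B": {"mean": 349.821, "std": 90.653},
    "Step C": {"mean": 166.665, "std": 14.742},
}

def relative_reduction(before: float, after: float) -> float:
    """Fraction by which `after` is smaller than `before`."""
    return (before - after) / before

mean_drop = relative_reduction(results["Step B"]["mean"], results["Step C"]["mean"])
std_drop = relative_reduction(results["Step B"]["std"], results["Step C"]["std"])

print(f"Mean MSE reduction B -> C: {mean_drop:.1%}")
print(f"Std reduction B -> C: {std_drop:.1%}")
```

The same helper can be applied to Step A versus Step B to compare the two improvements.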
