# Building a Regression Model with Keras: Predicting Concrete Compressive Strength from Multivariate Data

Author: [Arnau Gómez](https://www.arnaugomez.com)

Peer-graded Assignment: Build a Regression Model in Keras

Exercise B: Normalize the data (5 marks) 

## Initial setup

Install dependencies

In [1]:
# Python version: 3.11.9
%pip install --upgrade pip
%pip install pandas==2.2.3 keras==3.7.0 tensorflow==2.18.0 numpy==2.0.2 scikit-learn==1.6.0

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


Import dependencies

In [2]:
import pandas as pd
import numpy as np
import keras

import warnings
warnings.simplefilter('ignore', FutureWarning)

Load the data

In [3]:
filepath = 'https://cocl.us/concrete_data'
concrete_data = pd.read_csv(filepath)


concrete_data.head()

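|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|--------|--------------------|---------|-------|------------------|------------------|----------------|-----|----------|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |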


Split data into predictors and target

In [4]:
predictors = concrete_data.drop(columns=['Strength'])
predictors.head()

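|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|--------|--------------------|---------|-------|------------------|------------------|----------------|-----|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |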


In [5]:
target = concrete_data['Strength']
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

Normalize the predictors by subtracting each column's mean and dividing by its standard deviation.

In [6]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|--------|--------------------|---------|-------|------------------|------------------|----------------|-----|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
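As a quick sanity check on the normalization step, here is a minimal sketch that applies the same z-score formula to a hypothetical two-column slice (the first five `Cement` and `Age` values shown earlier, so its statistics differ from those of the full dataset). It confirms that each normalized column ends up with mean about 0 and standard deviation about 1. It also shows one subtlety: pandas' `.std()` defaults to the sample standard deviation (`ddof=1`), while scikit-learn's `StandardScaler` uses the population standard deviation (`ddof=0`), so the two differ by a constant factor of `sqrt(n / (n - 1))`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in: first five rows of two predictor columns shown above
df = pd.DataFrame({
    "Cement": [540.0, 540.0, 332.5, 332.5, 198.6],
    "Age": [28, 28, 270, 365, 360],
})

# Same formula as the notebook cell; pandas' .std() uses ddof=1 by default
z_pandas = (df - df.mean()) / df.std()

# Each normalized column should have mean ~0 and sample std ~1
print(np.allclose(z_pandas.mean(), 0.0))  # True
print(np.allclose(z_pandas.std(), 1.0))   # True

# StandardScaler divides by the population std (ddof=0), so its output is
# larger than the pandas version by a constant factor sqrt(n / (n - 1))
z_sklearn = StandardScaler().fit_transform(df)
n = len(df)
print(np.allclose(z_sklearn, z_pandas.to_numpy() * np.sqrt(n / (n - 1))))  # True
```

With many rows the factor `sqrt(n / (n - 1))` is very close to 1, so either convention works for feeding the network.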


Compute the number of input features

In [7]:
n_predictors = predictors.shape[1]

## Build the Keras model

In [8]:
from keras.models import Sequential
from keras.layers import Dense, Input

# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Input(shape=(n_predictors,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

# build the model
model = regression_model()

# display model summary
model.summary()
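The printed output of `model.summary()` is not reproduced here, but the number of trainable parameters can be worked out by hand: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below assumes `n_predictors` is 8 (the eight ingredient and age columns; the leftmost column in the tables above is the DataFrame index, not a feature). `dense_param_count` is a hypothetical helper for illustration, not a Keras function.

```python
def dense_param_count(layer_sizes):
    """Return the number of trainable parameters in a stack of Dense layers.

    layer_sizes starts with the number of inputs, followed by the number of
    units in each Dense layer, in order.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # weights (n_in * n_out) plus one bias per unit (n_out)
        total += n_in * n_out + n_out
    return total

# Assumes 8 predictors: Input(8) -> Dense(10, relu) -> Dense(1)
print(dense_param_count([8, 10, 1]))  # 101
```

That is 8 × 10 + 10 = 90 parameters in the hidden layer and 10 × 1 + 1 = 11 in the output layer; under the same assumption, the total reported by `model.summary()` should also be 101.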


## Train and test the model

Train it once, holding out 30% of the data for validation:

In [9]:
model.fit(predictors_norm, target, validation_split=0.3, epochs=50, verbose=2)

Epoch 1/50
23/23 - 0s - 13ms/step - loss: 1731.4468 - val_loss: 1255.0121
Epoch 2/50
23/23 - 0s - 2ms/step - loss: 1713.4030 - val_loss: 1245.4978
Epoch 3/50
23/23 - 0s - 2ms/step - loss: 1695.7999 - val_loss: 1236.3209
Epoch 4/50
23/23 - 0s - 2ms/step - loss: 1678.5880 - val_loss: 1227.5881
Epoch 5/50
23/23 - 0s - 2ms/step - loss: 1661.3440 - val_loss: 1218.7772
Epoch 6/50
23/23 - 0s - 2ms/step - loss: 1643.9304 - val_loss: 1209.7920
Epoch 7/50
23/23 - 0s - 2ms/step - loss: 1625.6577 - val_loss: 1200.9196
Epoch 8/50
23/23 - 0s - 2ms/step - loss: 1607.2524 - val_loss: 1191.6821
Epoch 9/50
23/23 - 0s - 2ms/step - loss: 1587.4323 - val_loss: 1182.2162
Epoch 10/50
23/23 - 0s - 2ms/step - loss: 1567.3054 - val_loss: 1172.1654
Epoch 11/50
23/23 - 0s - 2ms/step - loss: 1545.6317 - val_loss: 1161.7627
Epoch 12/50
23/23 - 0s - 2ms/step - loss: 1522.5461 - val_loss: 1150.8761
Epoch 13/50
23/23 - 0s - 2ms/step - loss: 1498.4250 - val_loss: 1139.5253
Epoch 14/50
23/23 - 0s - 2ms/step - loss: 1472

<keras.src.callbacks.history.History at 0x15ecd2610>
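Note that `validation_split=0.3` does not draw a random sample. According to the Keras documentation, the validation data is taken from the last samples in the arrays passed to `fit`, before any shuffling. The sketch below mimics that behaviour with pandas so the held-out rows are easy to see. `keras_style_validation_split` is a hypothetical helper (our reading of the documented behaviour, not Keras code), and the toy data uses `validation_split=0.25` only so the arithmetic is exact.

```python
import math
import pandas as pd

def keras_style_validation_split(X, y, validation_split):
    """Hypothetical helper: hold out the LAST fraction of rows, in their
    original order, as Keras describes for fit(validation_split=...)."""
    split_at = int(math.floor(len(X) * (1.0 - validation_split)))
    return X.iloc[:split_at], X.iloc[split_at:], y.iloc[:split_at], y.iloc[split_at:]

# Toy data: 8 rows, hold out 25%
X = pd.DataFrame({"x": range(8)})
y = pd.Series(range(8))
X_tr, X_val, y_tr, y_val = keras_style_validation_split(X, y, 0.25)
print(X_val["x"].tolist())  # [6, 7]
```

So in the single run above, the validation loss is always measured on the same final block of rows, in whatever order the CSV stores them. The 50-run loop that follows uses `train_test_split`, which shuffles by default.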

Repeat the procedure 50 times, each time with a new random 70/30 train/test split and a freshly initialized model:

In [10]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Initialize list of mean squared errors
mse_list = []

# Repeat the process 50 times
for _ in range(50):
    # Split the data
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=None)
    
    # Build the model
    model = regression_model()
    
    # Train the model
    model.fit(X_train, y_train, epochs=50, verbose=0)
    
    # Predict on the test data
    y_pred = model.predict(X_test, verbose=0)
    
    # Compute mean squared error
    mse = mean_squared_error(y_test, y_pred)
    mse_list.append(mse)

print(mse_list)


[447.5086131574025, 268.57870797416757, 413.47829442915514, 281.7059535602799, 223.31971090343646, 405.57259836145545, 215.86714270893117, 282.6000935739751, 231.72949247610555, 298.03126274329054, 315.7538921783185, 390.3782890435606, 380.70186992669085, 305.33268885549563, 326.0161860037992, 404.52231443784916, 261.7712672920805, 314.99541707397816, 309.9796269918049, 436.8202400016809, 357.12161168047436, 241.5973463606945, 279.2813266923387, 290.8744544948272, 474.4292765435885, 432.9668269460551, 383.65210793438007, 349.71235086309315, 368.1564935304093, 350.79736250177785, 253.43575618032733, 317.4752008809047, 331.19631735314266, 698.9278068021634, 556.5946791358642, 343.4763889519286, 426.2921580416793, 247.72409507556492, 294.5180114819561, 347.53156504453347, 339.2646999027009, 406.1650795620335, 252.0881896978829, 331.24772744116257, 502.29230962110034, 377.1370405226866, 363.96918539278687, 370.2253406995328, 453.11271874368674, 235.12424510107078]
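Because the loop uses `random_state=None`, a rerun produces a different list of errors. The sketch below shows the same repeated-random-split protocol with scikit-learn's `ShuffleSplit` and a fixed seed, so the 50 splits are reproducible. It is a variant, not the notebook's method: it runs on synthetic data, swaps in `LinearRegression` as a lightweight stand-in for the Keras model, and fits the `StandardScaler` on each training split only, whereas the notebook normalizes the full dataset once before splitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import ShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the concrete dataset): 200 rows, 8 features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=200)

# 50 random 70/30 splits; a fixed random_state makes them reproducible
splitter = ShuffleSplit(n_splits=50, test_size=0.3, random_state=42)

mse_list = []
for train_idx, test_idx in splitter.split(X):
    # Stand-in model: the scaler is fitted on the training rows only,
    # so no test-set statistics leak into the normalization
    model = make_pipeline(StandardScaler(), LinearRegression())
    model.fit(X[train_idx], y[train_idx])
    y_pred = model.predict(X[test_idx])
    mse_list.append(mean_squared_error(y[test_idx], y_pred))

print(len(mse_list), np.mean(mse_list), np.std(mse_list))
```

With a Keras model, the weight initialization is also random; fixing it as well would require seeding the framework, which this sketch does not attempt.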


## Report the mean and standard deviation of the mean squared errors

In [11]:
mean_mse = np.mean(mse_list)
std_mse = np.std(mse_list)
print(f'Mean MSE: {mean_mse}')
print(f'Standard Deviation of MSE: {std_mse}')

Mean MSE: 349.82102669747604
Standard Deviation of MSE: 90.6532358242123
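`np.std` defaults to the population standard deviation (`ddof=0`), unlike the pandas `.std()` used for normalization (`ddof=1`). When comparing mean MSEs across steps, the standard error of the mean (sample standard deviation divided by the square root of the number of runs) indicates how precisely each 50-run average is estimated. This sketch uses a hypothetical four-value list only to illustrate the formulas.

```python
import numpy as np

# Hypothetical MSE values, used only to illustrate the formulas
mses = np.array([300.0, 350.0, 400.0, 350.0])

# np.std defaults to the population standard deviation (ddof=0);
# ddof=1 gives the sample standard deviation, matching pandas' default
pop_std = np.std(mses)
sample_std = np.std(mses, ddof=1)

# Standard error of the mean MSE across runs
sem = sample_std / np.sqrt(len(mses))

print(np.mean(mses), pop_std, sample_std, sem)
```

Applied to the reported standard deviation of about 90.65 over 50 runs, this gives a standard error of roughly 13 for the Step B mean MSE.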


## How does the mean of the mean squared errors compare to that from Step A?

The mean of the mean squared errors (MSE) in Step B (349.821) is lower than in Step A (416.851), a reduction of about 16%. This suggests that normalizing the predictors improved the model's performance. The standard deviation of the MSE also drops substantially, from 595.123 to 90.653, which indicates that the model's performance is far more consistent across different random splits of the data.

| Metric | Step A | Step B |
|--------|--------|-------------------|
| Mean MSE | 416.851 | 349.821 |
| Standard Deviation of MSE | 595.123 | 90.653 |
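The relative changes in the table can be computed directly from its values; this small sketch does exactly that and nothing more (`relative_reduction` is a hypothetical helper name).

```python
# Values copied from the comparison table above
step_a = {"mean_mse": 416.851, "std_mse": 595.123}
step_b = {"mean_mse": 349.821, "std_mse": 90.653}

def relative_reduction(before, after):
    """Fractional decrease from `before` to `after` (positive means lower)."""
    return (before - after) / before

mean_drop = relative_reduction(step_a["mean_mse"], step_b["mean_mse"])
std_drop = relative_reduction(step_a["std_mse"], step_b["std_mse"])

print(f"Mean MSE reduced by {mean_drop:.1%}")   # Mean MSE reduced by 16.1%
print(f"Std of MSE reduced by {std_drop:.1%}")  # Std of MSE reduced by 84.8%
```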
