# Building a Regression Model with Keras: Predicting Concrete Compressive Strength from Multivariate Data

Author: [Arnau Gómez](https://www.arnaugomez.com)

Peer-graded Assignment: Build a Regression Model in Keras

Exercise A: Build a baseline model (5 marks) 

## Initial setup

Install dependencies

In [12]:
# Python version: 3.11.9
%pip install --upgrade pip
%pip install pandas==2.2.3 keras==3.7.0 tensorflow==2.18.0 numpy==2.0.2 scikit-learn==1.6.0

Note: you may need to restart the kernel to use updated packages.
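
After restarting the kernel, it can be worth confirming that the pinned versions are the ones actually loaded. The sketch below is one possible check using only the standard library; the helper names `installed_version` and `find_mismatches` are hypothetical and not part of the assignment.

```python
from importlib import metadata

# Pinned versions copied from the %pip install line above.
PINNED = {
    "pandas": "2.2.3",
    "keras": "3.7.0",
    "tensorflow": "2.18.0",
    "numpy": "2.0.2",
    "scikit-learn": "1.6.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned):
    """Map each package whose installed version differs from its pin to (expected, found)."""
    mismatches = {}
    for name, expected in pinned.items():
        found = installed_version(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


# Print one line per package that does not match its pin (nothing if all match).
for name, (expected, found) in find_mismatches(PINNED).items():
    print(f"{name}: expected {expected}, found {found}")
```

An empty printout means every pinned package was found at the expected version.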


Import dependencies

In [13]:
import pandas as pd
import numpy as np
import keras

import warnings
warnings.simplefilter('ignore', FutureWarning)

Import data

In [14]:
filepath = 'https://cocl.us/concrete_data'
concrete_data = pd.read_csv(filepath)

concrete_data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


Split data into predictors and target

In [15]:
predictors = concrete_data.drop(columns=['Strength'])
predictors.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [16]:
target = concrete_data['Strength']
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

Compute number of input features

In [17]:
n_predictors = predictors.shape[1]

## Build the Keras model

In [18]:
from keras.models import Sequential
from keras.layers import Dense, Input

# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Input(shape=(n_predictors,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

# build the model
model = regression_model()

# display model summary
model.summary()
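
The summary output is not reproduced here, but the parameter count it reports can be worked out by hand: a `Dense` layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes `n_predictors` is 8 (the eight predictor columns shown earlier); `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units


n_predictors = 8  # assumption: the eight predictor columns shown earlier

hidden = dense_params(n_predictors, 10)  # Dense(10, activation='relu')
output = dense_params(10, 1)             # Dense(1)
total = hidden + output

print(hidden, output, total)  # 90 11 101
```

Under that assumption the baseline network has only 101 trainable parameters, which is why each epoch completes in a few milliseconds.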


## Train and test the model

Train it once, holding out 30% of the rows for validation:

In [19]:
model.fit(predictors, target, validation_split=0.3, epochs=50, verbose=2)

Epoch 1/50
23/23 - 0s - 21ms/step - loss: 1689.1893 - val_loss: 1274.7238
Epoch 2/50
23/23 - 0s - 2ms/step - loss: 1367.5935 - val_loss: 1028.1956
Epoch 3/50
23/23 - 0s - 2ms/step - loss: 1121.8281 - val_loss: 831.7171
Epoch 4/50
23/23 - 0s - 2ms/step - loss: 929.7089 - val_loss: 704.5690
Epoch 5/50
23/23 - 0s - 2ms/step - loss: 771.3909 - val_loss: 610.4437
Epoch 6/50
23/23 - 0s - 2ms/step - loss: 655.2761 - val_loss: 460.7195
Epoch 7/50
23/23 - 0s - 1ms/step - loss: 554.9186 - val_loss: 433.8589
Epoch 8/50
23/23 - 0s - 1ms/step - loss: 459.3813 - val_loss: 345.5811
Epoch 9/50
23/23 - 0s - 2ms/step - loss: 385.7492 - val_loss: 351.0993
Epoch 10/50
23/23 - 0s - 2ms/step - loss: 299.1420 - val_loss: 270.9643
Epoch 11/50
23/23 - 0s - 2ms/step - loss: 228.0839 - val_loss: 239.6441
Epoch 12/50
23/23 - 0s - 2ms/step - loss: 174.6632 - val_loss: 215.5782
Epoch 13/50
23/23 - 0s - 2ms/step - loss: 148.8073 - val_loss: 192.4835
Epoch 14/50
23/23 - 0s - 2ms/step - loss: 138.6071 - val_loss: 186.

<keras.src.callbacks.history.History at 0x1748b2590>
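
The `23/23` prefix in the training log is the number of batches per epoch. It can be reproduced with a little arithmetic, as sketched below. The row count of 1030 is an assumption based on the public concrete dataset and is not stated in this notebook; the batch size of 32 is the Keras default when `batch_size` is not passed. `steps_per_epoch` here is a hypothetical helper, not the Keras argument of the same name.

```python
import math


def steps_per_epoch(n_samples, batch_size=32):
    """Number of batches needed to see every sample once (last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)


n_rows = 1030                    # assumption: size of the public concrete dataset
n_held_out = n_rows * 30 // 100  # 30% held out for validation or testing
n_train = n_rows - n_held_out    # remaining 70% used for training

print(n_train, steps_per_epoch(n_train))        # 721 23
print(n_held_out, steps_per_epoch(n_held_out))  # 309 10
```

Note that `validation_split` takes the held-out rows from the end of the data before any shuffling, so this single run always validates on the same rows.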

Train and test it 50 times, each time on a new random 70/30 train/test split:

In [20]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Initialize list of mean squared errors
mse_list = []

# Repeat the process 50 times
for _ in range(50):
    # Split the data
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=None)
    
    # Build the model
    model = regression_model()
    
    # Train the model
    model.fit(X_train, y_train, epochs=50, verbose=0)
    
    # Predict on the test data
    y_pred = model.predict(X_test)
    
    # Compute mean squared error
    mse = mean_squared_error(y_test, y_pred)
    mse_list.append(mse)


10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
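
Because `random_state=None`, every iteration of the loop draws a different split, so the 50 errors vary both with the split and with the model's random initialization. The sketch below illustrates the idea of a random hold-out split with the standard library only. It is not scikit-learn's implementation, and `shuffle_split` and its `seed` parameter are hypothetical names introduced for illustration.

```python
import random


def shuffle_split(n_samples, test_size=0.3, seed=None):
    """Return (train_indices, test_indices) for one random hold-out split."""
    rng = random.Random(seed)      # seed=None gives a different split on each call
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_size)
    return indices[n_test:], indices[:n_test]


# A fixed seed makes a split repeatable; different seeds give different splits.
train_a, test_a = shuffle_split(10, seed=0)
train_b, test_b = shuffle_split(10, seed=0)
print(test_a == test_b)  # True

# Every row lands in exactly one of the two parts.
print(sorted(train_a + test_a) == list(range(10)))  # True
```

Passing the loop counter as the seed (for example `random_state=i` in `train_test_split`) would be one way to keep 50 distinct splits while making the experiment repeatable.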

## Report the mean and standard deviation of the mean squared errors

In [21]:
mean_mse = np.mean(mse_list)
std_mse = np.std(mse_list)
print(f'Mean MSE: {mean_mse}')
print(f'Standard Deviation of MSE: {std_mse}')

Mean MSE: 326.64285909721934
Standard Deviation of MSE: 398.3149510918722
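
A standard deviation larger than the mean suggests the errors are far from symmetric, possibly because a few runs ended with a much larger error than the rest. The sketch below uses made-up MSE values (hypothetical, not taken from the runs above) to show how a single large value produces this pattern, and to make explicit that `np.std` defaults to the population formula (`ddof=0`), which `statistics.pstdev` also computes.

```python
import math
import statistics

# Hypothetical MSE values: three similar runs and one that trained poorly.
mse_values = [120.0, 150.0, 135.0, 900.0]

mean_mse = statistics.fmean(mse_values)
pop_std = statistics.pstdev(mse_values)    # population formula, like np.std(x) with ddof=0
sample_std = statistics.stdev(mse_values)  # sample formula, like np.std(x, ddof=1)
median_mse = statistics.median(mse_values)

print(f"Mean MSE: {mean_mse}")
print(f"Median MSE: {median_mse}")
print(f"Population std: {pop_std:.2f}")
print(f"Sample std: {sample_std:.2f}")

# The two formulas differ only by a factor of sqrt(n / (n - 1)).
n = len(mse_values)
print(math.isclose(sample_std, pop_std * math.sqrt(n / (n - 1))))  # True
```

Reporting the median alongside the mean is one way to see whether a handful of outlier runs dominates the average.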
