## Keras Regression Model


A. Build a baseline model (5 marks)

Use the Keras library to build a neural network with the following:

- One hidden layer of 10 nodes, and a ReLU activation function

- Use the adam optimizer and the mean squared error  as the loss function.

1. Randomly split the data into a training and test sets by holding 30% of the data for testing. You can use the
train_test_split
helper function from Scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.



Ensure the <em>pandas</em> and the Numpy libraries are available to run this notebook



In [1]:
# Uncomment if your environment does not have the necessary libraries

#!pip install numpy==1.21.4
#!pip install pandas==1.3.4
#!pip install keras==2.1.6
import ssl
import pandas as pd
import numpy as np
from keras.src.utils.module_utils import tensorflow
from sklearn.preprocessing import StandardScaler
from sympy.physics.quantum.gate import normalized

#print(ssl.get_default_verify_paths())



### Ignore future and user warnings in my PyCharm environment

#### Feel free to ignore if running in a different environment.

In [2]:
import warnings
# Suppress UserWarning messages
warnings.filterwarnings("ignore", category=UserWarning)

# Suppress FutureWarning messages
warnings.filterwarnings("ignore", category=FutureWarning)

This assignment uses the concrete data set that was used during one of the labs.

<strong>The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them. Ingredients include:</strong>

<strong>1. Cement</strong>

<strong>2. Blast Furnace Slag</strong>

<strong>3. Fly Ash</strong>

<strong>4. Water</strong>

<strong>5. Superplasticizer</strong>

<strong>6. Coarse Aggregate</strong>

<strong>7. Fine Aggregate</strong>


Read the data file <em>pandas</em> dataframe and run some commands to understand the data.


In [3]:
# Ensure concrete data is available in the local directory where you are running this notebook or set the path accordingly.
# I had issues trying to access the file from the S3 Bucket so I saved it locally.
local = True
if local:
    filepath = 'concrete_data.csv'
else:
    filepath = 'https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv'

concrete_data = pd.read_csv(filepath)

concrete_data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


The first concrete sample has 540 cubic meter of cement, 0 cubic meter of blast furnace slag, 0 cubic meter of fly ash, 162 cubic meter of water, 2.5 cubic meter of superplaticizer, 1040 cubic meter of coarse aggregate, 676 cubic meter of fine aggregate. Such a concrete mix which is 28 days old, has a compressive strength of 79.99 MPa.


#### Let's check how many data points we have.


In [4]:
concrete_data.shape

(1030, 9)

There are approximately 1000 samples to train our model on. Because of relatively limited data set we have to be careful not to overfit the training data.


In [5]:
concrete_data.describe()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [6]:
# Check the dataset for any missing values.

concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The data appears clean and is ready to be used to build our model. We will now methodically go through steps 1 - 5 defined above.


#### 1. Randomly split the data into a training and test sets by holding 30% of the data for testing. Use the train_test_split helper function from Scikit-learn.


In [7]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from sklearn.preprocessing import StandardScaler
# Load dataset
df = pd.read_csv("concrete_data.csv")

# Split into input (X) and output (y)
X = df.drop(columns=["Strength"])  # Features
y = df["Strength"]  # Target variable

# Split data (70% training, 30% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 4)

#### 2. Train the model on the training data using 50 epochs.


In [8]:
# Define the neural network model
model = Sequential([
            Dense(10, activation='relu', input_shape=(X_train.shape[1],)),  # 10-node hidden layer
            Dense(1)  # Output layer
        ])

# Compile model
model.compile(optimizer=Adam(), loss='mean_squared_error')

# Train model (suppress verbose for faster execution)
model.fit(X_train, y_train, epochs = 50, verbose = 0)


<keras.src.callbacks.history.History at 0x1462c3810>

#### 3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.


In [9]:

# Make predictions
y_pred = model.predict(X_test).flatten()

# Compute MSE 
mse = mean_squared_error(y_test, y_pred)

# Print results
print("Mean Squared Error: {0:.2f}".format(mse))

[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
Mean Squared Error: 158.95


#### 4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

The number of repeats and number of epochs serve different purposes in training and evaluating the neural network. Below is a breif discussion on the relationship between the number of repeats and the number of epochs.

Number of Repeats (num_repeats = 50)
This refers to how many times the entire training and evaluation process is repeated.
Each repeat involves:
Splitting the data into training (70%) and testing (30%).
Creating a new neural network model from scratch.
Training the model on the training set.
Evaluating the model on the test set and recording the Mean Squared Error (MSE).
Since we repeat this 50 times, we get 50 different MSE values, which helps measure the model's stability and performance across different train-test splits.

Purpose: To evaluate how the model performs across multiple random train-test splits.

Number of Epochs (epochs=50)
An epoch is one complete pass through the entire training dataset.
During each epoch, the model:
Takes all training samples.
Performs forward and backward passes to adjust weights.
Updates model parameters using the optimizer.
Since we train for 50 epochs, the model sees the training data 50 times before evaluation.

Purpose: To allow the model to learn by adjusting weights over multiple passes through the data.

The code above is reused, but it is extended to handle a variety of scenarios (e.g. different number of repeats, change to the Keras model, and type of data (i.e. raw or normalized).


In [10]:
# Load dataset
df = pd.read_csv("concrete_data.csv")

# Split into input (X) and output (y)
X = df.drop(columns=["Strength"])  # Features
y = df["Strength"]  # Target variable

In [11]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
import tensorflow as tf

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

def runModel(X, y, num_repeats, num_epochs, num_hidden_layers):
    mse_list = []  # Store MSE values
    for i in range(num_repeats):
        # Split data (70% training, 30% testing)
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 5)

        # Define the neural network model
        model = Sequential()

        # Add the first layer with input shape
        model.add(Dense(10, activation = 'relu', input_shape = (X_train.shape[1],)))

        # Add additional hidden layers
        for _ in range(num_hidden_layers - 1):
            model.add(Dense(10, activation = 'relu'))

        # Add the output layer
        model.add(Dense(1))

        # Compile model
        model.compile(optimizer = Adam(), loss = 'mean_squared_error')

        # Train model (suppress verbose for faster execution)
        model.fit(X_train, y_train, epochs = num_epochs, verbose = 0)

        # Make predictions
        y_pred = model.predict(X_test).flatten()

        # Compute MSE and store it
        mse = mean_squared_error(y_test, y_pred)
        mse_list.append(mse)

    # Compute mean and standard deviation of MSEs
    mse_mean = np.mean(mse_list)
    mse_std = np.std(mse_list)

    return mse_list, mse_mean, mse_std

#### 5. Report the mean and the standard deviation of the mean squared errors


In [12]:
# Evaluate the results

# Input Parameters
num_repeats = 50
num_epochs = 50
num_hidden_layers = 1

# Call the function and capture the results
mse_list, mse_mean, mse_std = runModel(X, y, num_repeats, num_epochs, num_hidden_layers)

# Print the results
print("MSE List:", mse_list)
print("Mean MSE: {:.2f}".format(mse_mean))
print("Standard Deviation of MSE: {:.2f}".format(mse_std))

[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━

B. Normalize the data (5 marks)

Repeat Part A but use a normalized version of the data. Recall that one way to normalize the data is by subtracting the mean from the individual predictors and dividing by the standard deviation.

In [183]:
'''The expression below normalizes the data by subtracting the mean of each feature (predictor) and dividing by the standard deviation of 
each feature. This process is known as z-score normalization or standardization. Note: X_scaler = StandardScaler().fit(X).transform(X) 
also accomplishes the same result although there could be very small differences in the determined values due to floating point precision.'''
X_normalized = (X - np.mean(X, axis=0)) / np.std(X, axis=0, ddof=1)
# Input Parameters
num_repeats = 50
num_epochs = 50
num_hidden_layers = 1

# Now we can leverage the code written above to process the normalized data.
mse_list, mse_mean, mse_std = runModel(X_normalized, y, num_repeats, num_epochs, num_hidden_layers)

# Print the results
#print("MSE List:", mse_list)
print("Mean MSE: {:.2f}".format(mse_mean))
print("Standard Deviation of MSE: {:.2f}".format(mse_std))

[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━

How does the mean of the mean squared errors compare to that from Step A?

I ran the model using the normal data and also using a normalized version of the data and obtained the following results:

Input data: Mean MSE: 442.48 Standard Deviation of MSE: 624.46
Normalized data: Mean MSE: 383.18 Standard Deviation of MSE: 102.62

The standard deviation of the MSE is significantlylower for the normalized data which is an indicator that the model's performance is more consistent across different runs when using normalized data. Also, the mean MSE is also lower for the normalized data compared to the raw input data which suggests that normalization led to improved model performance in terms of the mean MSE.


C. Increase the number of epochs

Repeat Part B but use 100 epochs this time for training.

How does the mean of the mean squared errors compare to that from Step B?



In [184]:
X_normalized = (X - np.mean(X, axis=0)) / np.std(X, axis=0, ddof=1)

# Part C requires that the number of epochs be increased from 50 to 100
# Input Parameters
num_repeats = 50
num_epochs = 100
num_hidden_layers = 1

# Now we can leverage the code written above to process the normalized data.
mse_list, mse_mean, mse_std = runModel(X_normalized, y, num_repeats, num_epochs, num_hidden_layers)

# Print the results
#print("MSE List:", mse_list)
print("Mean MSE: {:.2f}".format(mse_mean))
print("Standard Deviation of MSE: {:.2f}".format(mse_std))

[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━

Increasing the number of epochs from 50 to 100 led to a significant drop in mean MSE from 383.18 to 166.79 and a decrease in standard deviation from 102.62 to 18.94. This indicates that the model has learned more effectively with additional training, resulting in better performance and more consistent outcomes across runs. More epochs allow the model to adjust its weights further, improving its ability to capture patterns in the data. However, while this is generally expected behavior, it's important to monitor for overfitting, where the model might start memorizing the training data rather than generalizing well to unseen data. Techniques like early stopping can help manage this risk. Overall, the improvements suggest that the model benefits from more training, but careful monitoring is essential to ensure continued generalization.

D. Increase the number of hidden layers

Repeat part B but use a neural network with the following instead:

- Three hidden layers, each of 10 nodes and ReLU activation function.

How does the mean of the mean squared errors compare to that from Step B?



In [185]:
# Input Parameters
num_repeats = 50
num_epochs = 50 # Part B called for 50 epochs so let's use 50 for Part D as well
num__hidden_layers = 3

mse_list, mse_mean, mse_std = runModel(X_normalized, y, num_repeats, num_epochs, num_hidden_layers)

# Print the results
print("Mean MSE: {:.2f}".format(mse_mean))
print("Standard Deviation of MSE: {:.2f}".format(mse_std))

[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━

Adding more hidden layers (3 layers with 10 nodes each) while keeping the number of epochs at 50 resulted in a slight decrease in mean MSE from 383.18 to 369.67, with the standard deviation remaining at 102. The additional layers did not help the model capture more complex patterns in the data given the very modest improvement in performance. The unchanged standard deviation indicates that the variability in model performance across runs is consistent, suggesting that the added complexity did not significantly affect the model's stability. While deeper models can potentially improve learning, they also require careful tuning of hyperparameters like learning rate and regularization to avoid overfitting. The results imply that while the model benefits from increased capacity, further adjustments or more training epochs might be needed to realize substantial performance gains.