# Neural Network Regression on Concrete Data

**Short description:**  
This notebook trains and evaluates simple neural network regression models to predict concrete compressive strength. It compares a shallow network (one hidden layer) and a deeper network (three hidden layers), computes mean squared error (MSE) across repeated train/test splits, and reports mean and standard deviation of MSE.

**Objectives**
- Load the Concrete dataset and separate predictors and target.
- Build and train neural network regressors with Keras (one-layer and three-layer architectures).
- Evaluate models by repeating random train/test splits multiple times and computing MSE distributions.
- Observe the effect of feature normalization and number of training epochs on model performance.

**Notice about documentation:**  
The original notebook submission (course assignment) was kept intact. I have **only modified documentation (comments, headings, markdown)** and made **minimal, necessary corrections** to ensure the notebook runs without errors. All rights related to the lab/workshop design and original exercise belong exclusively to **IBM Corporation**. This notebook includes additional documentation for clarity, but the intellectual property of the original exercise is retained by IBM.

---

## Table of contents

1. Dependencies & execution instructions  
2. Data loading & basic inspection  
3. Model definitions (one-layer and three-layer networks)  
4. Experiment function: repeated train/test splits and MSE collection  
5. Experiments: unnormalized vs normalized data; 50 vs 100 epochs  
6. Reporting results (mean/std of MSE)  
7. Notes & reproducibility


## 1) Dependencies & execution instructions

This section lists the setup steps and imports the required Python packages.  

**Recommended local execution steps:**

1. Create and activate a Python virtual environment:
   - `python -m venv venv`
   - `source venv/bin/activate` (macOS / Linux) or `venv\Scripts\activate` (Windows)
2. Install dependencies:
   - `pip install -r requirements.txt`
3. Launch Jupyter Notebook:
   - `jupyter notebook`
4. Open this notebook and run cells top-to-bottom.

**Note:** Training neural networks may take time depending on CPU/GPU availability.
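
Before running the notebook, it can help to confirm that the packages imported below are available. The following is a minimal sketch using only the standard library; the `REQUIRED` list and the `missing_packages` helper are illustrative names, not part of the original notebook.

```python
from importlib.util import find_spec

# Import names used by this notebook's import cell (assumed list; adjust as needed).
REQUIRED = ["pandas", "numpy", "tensorflow", "sklearn"]

def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_packages(REQUIRED)
print("Missing packages:", missing if missing else "none")
```

An empty result suggests the environment is ready; any listed name would need to be installed first.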

In [1]:
import pandas as pd
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error


## 2) Data loading & basic inspection

Load the concrete dataset from the provided URL, inspect the head and shape, split predictors and target, and display a few example rows.
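
The predictor/target split described above can also be written with `DataFrame.drop`. The sketch below is an alternative idiom, not the notebook's own code; it uses a tiny frame holding three rows and only four of the columns shown in the output of this section, so the names `sample`, `predictors_sample`, and `target_sample` are illustrative.

```python
import pandas as pd

# Three rows and a subset of columns copied from the notebook's displayed output.
sample = pd.DataFrame({
    "Cement": [540.0, 540.0, 332.5],
    "Water": [162.0, 162.0, 228.0],
    "Age": [28, 28, 270],
    "Strength": [79.99, 61.89, 40.27],
})

# Everything except the target column becomes a predictor.
predictors_sample = sample.drop(columns=["Strength"])
target_sample = sample["Strength"]

print(predictors_sample.shape, target_sample.shape)
```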

In [2]:
# load the dataset
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

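|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|--------|--------------------|---------|-------|------------------|------------------|----------------|-----|----------|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |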


In [101]:
concrete_data.shape

(1030, 9)

In [102]:
# the predictors and target variable are defined
concrete_data_columns = concrete_data.columns

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']]
target = concrete_data['Strength']

In [103]:
predictors.head()

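|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|--------|--------------------|---------|-------|------------------|------------------|----------------|-----|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |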


In [104]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

In [105]:
# the number of predictor variables is defined
n_cols = predictors.shape[1]

In [106]:
# function for computing the mean and standard deviation of the mse_list
def mean_std(mse_list):
    mse_mean = np.mean(mse_list)
    mse_std = np.std(mse_list)

    return mse_mean, mse_std
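
One detail of `mean_std` worth knowing: `np.std` defaults to the population formula (`ddof=0`), whereas the sample formula divides by `n - 1`. The sketch below contrasts the two on toy numbers that are not results from this notebook.

```python
import numpy as np

values = [1.0, 2.0, 3.0, 4.0]  # toy numbers, not MSE values from the notebook

population_std = np.std(values)       # ddof=0: divide by n (the default used by mean_std)
sample_std = np.std(values, ddof=1)   # ddof=1: divide by n - 1

print(population_std, sample_std)
```

With 50 repetitions the two estimates differ only slightly, so either choice is reasonable as long as it is applied consistently.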

## 3) Model definitions

Define two Keras Sequential models:
- `regression_model_1()` — one hidden Dense layer (10 units, ReLU) + output Dense(1).
- `regression_model_2()` — three hidden Dense layers (each 10 units, ReLU) + output Dense(1).

Both models are compiled with `adam` optimizer and `mean_squared_error` loss.
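
To get a feel for the size of the two architectures, the number of trainable parameters can be counted by hand: each Dense layer has `inputs * units` weights plus `units` biases. The helper `dense_param_count` below is an illustrative name, and the sketch assumes the eight predictor columns of this dataset.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a stack of fully connected layers.

    layer_sizes lists the width of each layer, input first and output last.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

n_inputs = 8  # the eight predictor columns of the concrete dataset

params_model_1 = dense_param_count([n_inputs, 10, 1])           # one hidden layer
params_model_2 = dense_param_count([n_inputs, 10, 10, 10, 1])   # three hidden layers
print(params_model_1, params_model_2)  # 101 321
```

Both networks are small, which is in keeping with a dataset of about a thousand rows.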

In [107]:
# function to create the neural network with one hidden layer of 10 nodes, ReLU activation function, adam optimizer and the mean squared error as the loss function
def regression_model_1():
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))

    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

In [None]:
# function to create the neural network with three hidden layers of 10 nodes, ReLU activation function, adam optimizer and the mean squared error as the loss function
def regression_model_2():
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))

    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

## 4) Experiment function

`model_results(X, y, epochs, model)`:
- Repeats the following 50 times:
  - Random train/test split (30% test)
  - Train the provided Keras model for `epochs`
  - Predict on test set and compute MSE
- Returns the list of 50 MSE values.

Note: Each repetition re-uses the same `model` instance passed in. If you want fully independent initializations per repetition, instantiate the model inside the loop (not changed here to preserve original code).
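
The variant mentioned in the note, with a fresh model per repetition, might look like the sketch below. It is not the notebook's code. To stay fast and self-contained it uses a hypothetical `LeastSquaresModel` stand-in, a NumPy-based split instead of `train_test_split`, and synthetic data. With Keras, one would pass a builder such as `regression_model_1` as `model_factory` and forward `epochs` to `fit`.

```python
import numpy as np

class LeastSquaresModel:
    """Hypothetical stand-in with fit/predict, used only to keep the sketch fast."""

    def fit(self, X, y):
        A = np.column_stack([X, np.ones(len(X))])   # append a bias column
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_

def repeated_mse(X, y, model_factory, n_repeats=50, test_size=0.3, seed=0):
    """Repeat random splits, building a fresh model for every repetition."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_size * len(X)))
    mse_list = []
    for _ in range(n_repeats):
        order = rng.permutation(len(X))
        test_idx, train_idx = order[:n_test], order[n_test:]
        model = model_factory()                     # nothing carries over between repetitions
        model.fit(X[train_idx], y[train_idx])
        residual = y[test_idx] - model.predict(X[test_idx])
        mse_list.append(float(np.mean(residual ** 2)))
    return mse_list

# Synthetic data, only to show that the function runs.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
mse_demo = repeated_mse(X_demo, y_demo, LeastSquaresModel, n_repeats=5)
print(len(mse_demo))
```

Passing a factory rather than a model instance is the design choice that makes each repetition independent.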

In [108]:
# function for splitting the data, training and evaluating the model, creating the list of mse, and doing it 50 times
# the arguments are: X (predictor variables), y (target variable), epochs (number of epochs for training the model), and model (either model 1 or 2)
def model_results(X, y, epochs, model):
    mse_list = []

    for i in range(50):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    
        model.fit(X_train, y_train, epochs=epochs, verbose=2)
    
        y_pred = model.predict(X_test)
        mse = mean_squared_error(y_test, y_pred)
        mse_list.append(mse)

    return mse_list

## 5) Experiments

Run experiments described in the original pipeline:
- Model 1 (one-layer) on original predictors, 50 epochs (mse_list_A).
- Normalize predictors and run Model 1, 50 epochs (mse_list_B).
- Normalize predictors and run Model 1, 100 epochs (mse_list_C).
- Model 2 (three-layer) on normalized predictors, 50 epochs (mse_list_D).

For each experiment compute mean and std of MSE values.
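
For reference, the quantities computed per experiment can be written out as follows. The symbols are notation introduced here, not taken from the original notebook: $K = 50$ is the number of repetitions, $n_k$ is the size of the test split in repetition $k$, and the standard deviation uses the population form that `np.std` applies by default.

```latex
\mathrm{MSE}_k = \frac{1}{n_k}\sum_{i=1}^{n_k}\bigl(y_i - \hat{y}_i\bigr)^2,
\qquad
\overline{\mathrm{MSE}} = \frac{1}{K}\sum_{k=1}^{K}\mathrm{MSE}_k,
\qquad
s = \sqrt{\frac{1}{K}\sum_{k=1}^{K}\bigl(\mathrm{MSE}_k - \overline{\mathrm{MSE}}\bigr)^2}
```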

### Model 1 without normalized predictors and 50 epochs

In [109]:
# creating the list of mse for model_1 and 50 epochs
model_1 = regression_model_1()
mse_list_A = model_results(predictors, target, 50, model_1)

Epoch 1/50
23/23 - 2s - 66ms/step - loss: 1898.9360
Epoch 2/50
23/23 - 0s - 4ms/step - loss: 739.7112
Epoch 3/50
23/23 - 0s - 4ms/step - loss: 668.6503
Epoch 4/50
23/23 - 0s - 4ms/step - loss: 630.7272
Epoch 5/50
23/23 - 0s - 4ms/step - loss: 599.1901
Epoch 6/50
23/23 - 0s - 4ms/step - loss: 569.2962
Epoch 7/50
23/23 - 0s - 4ms/step - loss: 543.8708
Epoch 8/50
23/23 - 0s - 3ms/step - loss: 518.1779
Epoch 9/50
23/23 - 0s - 4ms/step - loss: 494.1522
Epoch 10/50
23/23 - 0s - 4ms/step - loss: 467.1937
Epoch 11/50
23/23 - 0s - 4ms/step - loss: 441.3365
Epoch 12/50
23/23 - 0s - 3ms/step - loss: 418.0471
Epoch 13/50
23/23 - 0s - 3ms/step - loss: 395.4468
Epoch 14/50
23/23 - 0s - 4ms/step - loss: 374.4590
Epoch 15/50
23/23 - 0s - 4ms/step - loss: 353.1048
Epoch 16/50
23/23 - 0s - 4ms/step - loss: 334.0754
Epoch 17/50
23/23 - 0s - 4ms/step - loss: 313.6978
Epoch 18/50
23/23 - 0s - 4ms/step - loss: 294.8700
Epoch 19/50
23/23 - 0s - 4ms/step - loss: 280.0932
Epoch 20/50
23/23 - 0s - 4ms/step - loss: 268.044

### Normalization of predictors

In [111]:
# normalizing the data
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|--------|--------------------|---------|-------|------------------|------------------|----------------|-----|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |


### Model 1 with normalized predictors and 50 epochs

In [112]:
# creating the list of mse for model_1, 50 epochs and the normalized data
mse_list_B = model_results(predictors_norm, target, 50, model_1)

Epoch 1/50
23/23 - 0s - 4ms/step - loss: 1536.9999
Epoch 2/50
23/23 - 0s - 4ms/step - loss: 1469.4036
Epoch 3/50
23/23 - 0s - 3ms/step - loss: 1411.1653
Epoch 4/50
23/23 - 0s - 3ms/step - loss: 1354.5145
Epoch 5/50
23/23 - 0s - 3ms/step - loss: 1300.2220
Epoch 6/50
23/23 - 0s - 7ms/step - loss: 1247.5341
Epoch 7/50
23/23 - 0s - 3ms/step - loss: 1196.2292
Epoch 8/50
23/23 - 0s - 4ms/step - loss: 1145.7946
Epoch 9/50
23/23 - 0s - 3ms/step - loss: 1098.0558
Epoch 10/50
23/23 - 0s - 4ms/step - loss: 1050.2742
Epoch 11/50
23/23 - 0s - 4ms/step - loss: 1004.8798
Epoch 12/50
23/23 - 0s - 4ms/step - loss: 960.8311
Epoch 13/50
23/23 - 0s - 3ms/step - loss: 919.0916
Epoch 14/50
23/23 - 0s - 3ms/step - loss: 879.3350
Epoch 15/50
23/23 - 0s - 3ms/step - loss: 841.9765
Epoch 16/50
23/23 - 0s - 3ms/step - loss: 806.2346
Epoch 17/50
23/23 - 0s - 3ms/step - loss: 772.9826
Epoch 18/50
23/23 - 0s - 5ms/step - loss: 742.0641
Epoch 19/50
23/23 - 0s - 4ms/step - loss: 713.6810
Epoch 20/50
23/23 - 0s - 3ms/

### Model 1 with normalized predictors and 100 epochs

In [114]:
# creating the list of mse for model_1, 100 epochs and the normalized data
mse_list_C = model_results(predictors_norm, target, 100, model_1)

Epoch 1/100
23/23 - 0s - 4ms/step - loss: 33.1694
Epoch 2/100
23/23 - 0s - 4ms/step - loss: 32.8338
Epoch 3/100
23/23 - 0s - 3ms/step - loss: 32.8199
Epoch 4/100
23/23 - 0s - 3ms/step - loss: 32.7531
Epoch 5/100
23/23 - 0s - 4ms/step - loss: 32.7589
Epoch 6/100
23/23 - 0s - 4ms/step - loss: 32.7583
Epoch 7/100
23/23 - 0s - 3ms/step - loss: 32.7506
Epoch 8/100
23/23 - 0s - 3ms/step - loss: 32.6782
Epoch 9/100
23/23 - 0s - 3ms/step - loss: 32.6610
Epoch 10/100
23/23 - 0s - 4ms/step - loss: 32.6702
Epoch 11/100
23/23 - 0s - 4ms/step - loss: 32.6542
Epoch 12/100
23/23 - 0s - 3ms/step - loss: 32.6754
Epoch 13/100
23/23 - 0s - 4ms/step - loss: 32.5741
Epoch 14/100
23/23 - 0s - 3ms/step - loss: 32.5917
Epoch 15/100
23/23 - 0s - 4ms/step - loss: 32.5444
Epoch 16/100
23/23 - 0s - 4ms/step - loss: 32.5629
Epoch 17/100
23/23 - 0s - 4ms/step - loss: 32.5373
Epoch 18/100
23/23 - 0s - 7ms/step - loss: 32.5613
Epoch 19/100
23/23 - 0s - 3ms/step - loss: 32.4767
Epoch 20/100
23/23 - 0s - 4ms/step - los

### Model 2 with normalized predictors and 50 epochs

In [None]:
# creating the list of mse for model_2, 50 epochs and the normalized data
model_2 = regression_model_2()
mse_list_D = model_results(predictors_norm, target, 50, model_2)

Epoch 1/50
23/23 - 2s - 103ms/step - loss: 1532.3062
Epoch 2/50
23/23 - 0s - 3ms/step - loss: 1514.9410
Epoch 3/50
23/23 - 0s - 4ms/step - loss: 1493.9015
Epoch 4/50
23/23 - 0s - 4ms/step - loss: 1460.0490
Epoch 5/50
23/23 - 0s - 4ms/step - loss: 1402.4320
Epoch 6/50
23/23 - 0s - 3ms/step - loss: 1308.1195
Epoch 7/50
23/23 - 0s - 3ms/step - loss: 1164.8206
Epoch 8/50
23/23 - 0s - 4ms/step - loss: 984.0485
Epoch 9/50
23/23 - 0s - 4ms/step - loss: 783.5726
Epoch 10/50
23/23 - 0s - 4ms/step - loss: 606.3700
Epoch 11/50
23/23 - 0s - 3ms/step - loss: 483.8041
Epoch 12/50
23/23 - 0s - 3ms/step - loss: 408.6103
Epoch 13/50
23/23 - 0s - 3ms/step - loss: 361.8578
Epoch 14/50
23/23 - 0s - 3ms/step - loss: 329.2640
Epoch 15/50
23/23 - 0s - 3ms/step - loss: 301.2474
Epoch 16/50
23/23 - 0s - 3ms/step - loss: 278.4857
Epoch 17/50
23/23 - 0s - 3ms/step - loss: 259.5164
Epoch 18/50
23/23 - 0s - 4ms/step - loss: 243.7226
Epoch 19/50
23/23 - 0s - 4ms/step - loss: 229.6302
Epoch 20/50
23/23 - 0s - 3ms/step - loss: 

## 6) Reporting results

Print mean and standard deviation of the MSE lists for each experiment. These metrics summarize expected performance and variability across random splits.
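
When several experiments are compared, it can be convenient to print all summaries in one table. The sketch below is one possible helper; `summarize` and `toy_results` are illustrative names, and the toy lists are made-up numbers used only to exercise the function, not results from this notebook.

```python
import numpy as np

def summarize(results):
    """Return one formatted line per experiment: name, mean MSE, std of MSE."""
    lines = [f"{'experiment':<12}{'mean':>10}{'std':>10}"]
    for name, mse_list in results.items():
        lines.append(f"{name:<12}{np.mean(mse_list):>10.2f}{np.std(mse_list):>10.2f}")
    return lines

# Toy lists, only to show the output format.
toy_results = {"A": [1.0, 2.0, 3.0], "B": [2.0, 2.0, 2.0]}
for line in summarize(toy_results):
    print(line)
```

In the notebook, the dictionary would map labels to the actual lists, for example `{"A": mse_list_A, "B": mse_list_B}`.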

In [110]:
# calculating the mean and std for the mse of mse_list_A
mse_mean_A, mse_std_A = mean_std(mse_list_A)

print(f"Mean MSE: {mse_mean_A}")
print(f"Standard Deviation of MSE: {mse_std_A}")

Mean MSE: 58.16437494758243
Standard Deviation of MSE: 22.02944620289316


In [113]:
# calculating the mean and std for the mse of mse_list_B
mse_mean_B, mse_std_B = mean_std(mse_list_B)

print(f"Mean MSE: {mse_mean_B}")
print(f"Standard Deviation of MSE: {mse_std_B}")

Mean MSE: 47.936572431336856
Standard Deviation of MSE: 40.10708042710215


The mean MSE from B (47.94) is lower than the mean MSE from A (58.16). This is consistent with a benefit from normalizing the predictors, but the comparison is confounded: experiment B continues training the same `model_1` instance that experiment A had already trained, so the two runs differ in more than the normalization. The standard deviation in B (40.11) is also larger than in A (22.03).

Normalization puts all features on comparable scales. Without it, one feature might range over [1, 1000] while another ranges over [0, 1], and the larger-scale feature can dominate the weight updates during training.
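
The z-score normalization used in this notebook can be sketched in NumPy as below. The helper names `zscore_fit` and `zscore_apply` are illustrative, and the toy matrix is made up. `ddof=1` is chosen to match the sample standard deviation that pandas `.std()` computes. Separating the fit from the application is a design choice of this sketch: it would also allow computing the statistics on a training split only, which the original notebook does not do.

```python
import numpy as np

def zscore_fit(X_train):
    """Column means and sample standard deviations (ddof=1, as pandas .std() uses)."""
    return X_train.mean(axis=0), X_train.std(axis=0, ddof=1)

def zscore_apply(X, mean, std):
    """Center each column and scale it to unit standard deviation."""
    return (X - mean) / std

# Toy matrix with one large-scale and one small-scale column.
X_toy = np.array([[100.0, 0.1],
                  [500.0, 0.5],
                  [900.0, 0.9]])
mean, std = zscore_fit(X_toy)
X_scaled = zscore_apply(X_toy, mean, std)
print(X_scaled)
```

After scaling, both columns span the same range even though their raw scales differ by a factor of 1000.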

In [115]:
# calculating the mean and std for the mse of mse_list_C
mse_mean_C, mse_std_C = mean_std(mse_list_C)

print(f"Mean MSE: {mse_mean_C}")
print(f"Standard Deviation of MSE: {mse_std_C}")

Mean MSE: 33.18396910580179
Standard Deviation of MSE: 2.621427863414613


The mean MSE from C (33.18) is considerably lower than the mean MSE from B (47.94), and the standard deviation drops from 40.11 to 2.62. The 50 additional epochs per repetition are one likely factor. Because `model_1` is reused, however, experiment C also starts from the weights left by experiment B (its first logged loss is already 33.17), so the improvement cannot be attributed to the epoch count alone.

Increasing the number of epochs gives the network more passes over the training data, and each pass adjusts the weights and biases to reduce the loss (MSE). More epochs often improve the fit, up to the point where the model converges or begins to overfit.

With 50 epochs (experiment B), the model may not have fully converged: training can stop before the loss reaches a minimum.
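
The effect of stopping early can be illustrated on a much simpler problem. The sketch below swaps in full-batch gradient descent on a linear model with synthetic data. It is not the Adam-trained Keras network from this notebook, and the data, learning rate, and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))   # synthetic inputs on a standardized scale
y = X @ np.array([3.0, -2.0, 1.0]) + 5.0 + rng.normal(scale=0.5, size=200)

w, b = np.zeros(3), 0.0
learning_rate = 0.05
losses = []   # losses[k] is the training MSE after k + 1 epochs
for epoch in range(100):
    error = X @ w + b - y
    grad_w = 2.0 * (X.T @ error) / len(y)   # gradient of the MSE w.r.t. the weights
    grad_b = 2.0 * error.mean()             # gradient of the MSE w.r.t. the bias
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    losses.append(float(np.mean((X @ w + b - y) ** 2)))

print(f"after 50 epochs: {losses[49]:.4f}, after 100 epochs: {losses[99]:.4f}")
```

In this toy setting the loss keeps decreasing between epoch 50 and epoch 100, though by less and less as training approaches the minimum.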

In [118]:
# calculating the mean and std for the mse of mse_list_D
mse_mean_D, mse_std_D = mean_std(mse_list_D)

print(f"Mean MSE: {mse_mean_D}")
print(f"Standard Deviation of MSE: {mse_std_D}")

Mean MSE: 29.404595510174452
Standard Deviation of MSE: 11.153002784027382


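The mean MSE from D (29.40) is considerably lower than the mean MSE from B (47.94). Both experiments use normalized predictors and 50 epochs, so the two additional hidden layers of model 2 are a plausible cause. The starting points differ, though: `model_2` was freshly initialized for D, while `model_1` entered B with weights already trained on unnormalized data in A.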

Increasing the number of hidden layers allows the neural network to model more complex relationships within the data. With three hidden layers, the model can learn more hierarchical and abstract features compared to a single hidden layer.

Each additional layer allows the network to build upon the features learned in the previous layer, helping it capture nonlinear and intricate patterns in the data that a simpler network might miss.
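
How layers build on one another can be seen in a plain NumPy forward pass. The sketch below mirrors the layer sizes of the three-hidden-layer architecture but uses random, untrained weights and synthetic inputs, so it illustrates only the computation, not the trained model. The names `relu` and `forward` are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, layers):
    """Apply Dense layers in order: ReLU on hidden layers, linear output layer."""
    *hidden, output = layers
    for W, b in hidden:
        x = relu(x @ W + b)   # each hidden layer transforms the previous layer's output
    W, b = output
    return x @ W + b

rng = np.random.default_rng(0)
sizes = [8, 10, 10, 10, 1]   # 8 inputs, three hidden layers of 10 units, 1 output
layers = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes, sizes[1:])]

batch = rng.normal(size=(4, 8))        # four synthetic input rows
print(forward(batch, layers).shape)    # (4, 1)
```

Dropping the two middle entries from `sizes` gives the one-hidden-layer computation for comparison.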

## 7) Notes & reproducibility

- To ensure independent re-initialization of weights per repetition, instantiate the model inside the loop in `model_results`.
- If runtime is too long, reduce the number of repetitions or epochs for quicker iteration.
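
Another aspect of reproducibility is the randomness of the splits themselves. The sketch below shows the idea with a seeded NumPy shuffle; `split_indices` is an illustrative helper, not the notebook's code. With scikit-learn, the corresponding control is the `random_state` argument of `train_test_split`, and weight initialization in TensorFlow can be seeded with `tf.random.set_seed`.

```python
import numpy as np

def split_indices(n_samples, test_size=0.3, seed=None):
    """Return (train_idx, test_idx) from a seeded shuffle of range(n_samples)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(test_size * n_samples))
    return order[n_test:], order[:n_test]

train_a, test_a = split_indices(1030, seed=42)
train_b, test_b = split_indices(1030, seed=42)
print(np.array_equal(test_a, test_b))   # True: same seed, same split
print(len(train_a), len(test_a))        # 721 309
```

Using a different seed per repetition, for example the loop index, would keep the 50 splits distinct while making the whole experiment repeatable.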
