# CNN model training for all LAQN stations

## what this notebook does

This notebook trains convolutional neural network (CNN) models to predict air pollution levels across all London Air Quality Network (LAQN) stations, with one model for every station-pollutant combination in the dataset.

The goal is to compare network-wide CNN performance against the random forest results from `rf_training_laqn_all.ipynb`.

## why all stations instead of one?

The single station approach (EN5_NO2) was useful as a proof of concept, but the dissertation needs to show how the models perform across the entire LAQN network. This means training a separate model for each of the 141 station-pollutant combinations.


## structure of this notebook

| section | what it does |
|---------|-------------|
| 1 | setup and imports |
| 2 | load prepared data from ml_prep_all |
| 3 | understand the data shapes |
| 4 | identify all target columns |
| 5 | build CNN model function |
| 6 | set up callbacks |
| 7 | train models for all targets with checkpoints |
| 8 | load results from colab training |
| 9 | investigate broken models |
| 10 | baseline evaluation after exclusion |
| 11 | results summary and save |
| 12 | prediction visualisations |
| 13 | residual analysis |
| 14 | final summary |

## 1) Setup and Imports

Importing everything needed for CNN training: tensorflow/keras handles the neural network, numpy handles arrays, matplotlib and seaborn handle plotting, and scikit-learn provides the metrics.

In [None]:
#Standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import joblib
import warnings
import time
import gc
from collections import Counter
warnings.filterwarnings('ignore')

#Scikit learn for metrics r2, MSE, MAE
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

#Tensorflow and keras for CNN
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import models, layers
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

print(f'Tensorflow version: {tf.__version__}')

tensorflow version: 2.19.0

In [None]:
#paths
base_dir = Path.cwd().parent.parent / 'data' / 'laqn'
data_dir = base_dir / 'ml_prep_all'
output_dir = Path.cwd().parent.parent / 'data' / 'ml' / 'LAQN_all' / 'cnn_model'
output_dir.mkdir(parents=True, exist_ok=True)

print(f'loading data from: {data_dir}')
print(f'saving outputs to: {output_dir}')

### GPU availability

Checking whether a GPU is available. CNN training is faster on a GPU but still works on a CPU.

source: Use a GPU: Tensorflow Core (no date) TensorFlow. Available at: https://www.tensorflow.org/guide/gpu

In [None]:
#Check gpu availability
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    print(f'GPU available: {len(gpus)} device(s)')
    for gpu in gpus:
        print(f'  - {gpu.name}')
else:
    print('No GPU found, using CPU. Training will be slower but still works.')

no GPU found, using CPU training will be slower but still works


## 2) load prepared data

The data was prepared in `ml_prep_laqn_all.ipynb`. That notebook created sequences in which each sample holds 12 hours of history, used to predict the next hour for all stations.

### why 3D data for CNN?

Random forest needs flat 2D data: (samples, features). CNN needs 3D data: (samples, timesteps, features). The 3D shape lets the CNN learn patterns across time rather than treating each timestep as an independent feature.


| Data Shape | Model | Structure |
|------------|-------|----------|
| 2D | random forest | each row is a flat list of numbers with no structure |
| 3D | CNN | each sample is a grid where rows are hours and columns are features |
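
To make the difference concrete, here is a minimal sketch of how a 3D array can be flattened into the 2D form. The dummy array and the use of `reshape` are my own illustration; the random forest notebook may have prepared its flat data differently.

```python
import numpy as np

# Dummy data with the same window layout as this notebook:
# 4 samples, 12 timesteps, 145 features (values are placeholders).
timesteps, n_features = 12, 145
X_3d = np.zeros((4, timesteps, n_features))

# Flattening keeps every number but drops the (hour, feature) grid:
# each sample becomes one long row of 12 * 145 = 1740 values.
X_2d = X_3d.reshape(len(X_3d), -1)

print(f'3D shape for CNN: {X_3d.shape}')
print(f'2D shape for a flat model: {X_2d.shape}')
```

After flattening, a model has no built-in notion that column 0 and column 145 are the same feature one hour apart, which is the structure a Conv1D layer uses.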

In [None]:
#Load the 3d sequences for cnn
X_train = np.load(data_dir / 'X_train.npy')
X_val = np.load(data_dir / 'X_val.npy')
X_test = np.load(data_dir / 'X_test.npy')

y_train = np.load(data_dir / 'y_train.npy')
y_val = np.load(data_dir / 'y_val.npy')
y_test = np.load(data_dir / 'y_test.npy')

#Load feature_names and scaler
feature_names = joblib.load(data_dir / 'feature_names.joblib')
scaler = joblib.load(data_dir / 'scaler.joblib')

print('Data loaded successfully.')
print(f'\nShapes:')
print(f'X_train: {X_train.shape}')
print(f'X_val: {X_val.shape}')
print(f'X_test: {X_test.shape}')
print(f'y_train: {y_train.shape}')
print(f'y_val: {y_val.shape}')
print(f'y_test: {y_test.shape}')

    data loaded successfully

    shapes:
    X_train: (17107, 12, 145)
    X_val: (3656, 12, 145)
    X_test: (3657, 12, 145)
    y_train: (17107, 145)
    y_val: (3656, 145)
    y_test: (3657, 145)

## 3) Understanding the Shapes

X_train shape:

| Dimension | What it represents |
|-----------|--------------|
| first | number of samples (individual training examples) |
| second | timesteps (12 hours of history) |
| third | features (all station pollutant columns + temporal) |

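y_train has shape (samples, features): one row per sample, holding the value of every feature for the hour that follows the 12-hour window. Any single column can therefore be used as a prediction target.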
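
The windowing behind these shapes can be sketched as below. This is an assumption about how the sequences were built: `ml_prep_laqn_all.ipynb` is not shown here, and `make_sequences` is a hypothetical helper, not a function from that notebook.

```python
import numpy as np


def make_sequences(values, timesteps=12):
    """Turn an (hours, features) array into sliding windows.

    X[i] holds `timesteps` consecutive hours and y[i] is the hour
    immediately after that window. Requires more rows than `timesteps`.
    """
    values = np.asarray(values)
    n_windows = len(values) - timesteps
    X = np.stack([values[i:i + timesteps] for i in range(n_windows)])
    y = values[timesteps:]
    return X, y


# Toy example: 20 hours of 3 features.
hourly = np.arange(20 * 3).reshape(20, 3)
X, y = make_sequences(hourly, timesteps=12)

print(f'X shape: {X.shape}')  # (windows, timesteps, features)
print(f'y shape: {y.shape}')  # (windows, features)
```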

In [None]:
#Extract dimensions
n_samples, timesteps, n_features = X_train.shape

print(f'\ndata dimensions:')
print(f'  samples: {n_samples:,}')
print(f'  timesteps: {timesteps}')
print(f'  features: {n_features}')
print(f'\nfeature names ({len(feature_names)} total):')
print(f'  first 10: {feature_names[:10]}')
print(f'  last 10: {feature_names[-10:]}')

  data dimensions:
  samples: 17,107
  timesteps: 12
  features: 145

  feature names (145 total):
    first 10: ['BG1_NO2', 'BG1_SO2', 'BG2_NO2', 'BG2_PM10', 'BQ7_NO2', 'BQ7_O3', 'BQ7_PM10', 'BQ7_PM25', 'BQ9_PM10', 'BQ9_PM25']
    last 10: ['WAC_PM10', 'WM5_NO2', 'WM6_NO2', 'WM6_PM10', 'WMD_NO2', 'WMD_PM25', 'hour', 'day_of_week', 'month', 'is_weekend']

## 4) Identify All Target Columns

I need to identify which columns are pollutant predictions (targets) and which are temporal features. Temporal features such as hour and day_of_week are inputs only, not things we want to predict.

### Pollutant Naming Convention

Each target column follows the pattern: `{SiteCode}_{PollutantCode}`

Example:

| Example | Meaning |
|---------|--------|
| BG1_NO2 | NO2 at site BG1 |
| EN5_PM25 | PM2.5 at site EN5 |
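
A small sketch of how a column name can be split back into its two parts. `split_target` is a hypothetical helper for illustration; it splits on the last underscore, the same idea as the `rsplit('_', 1)` call used later in the training loop.

```python
def split_target(name):
    """Split '{SiteCode}_{PollutantCode}' on the last underscore."""
    site, sep, pollutant = name.rpartition('_')
    if not sep:
        # No underscore at all, so the name does not follow the convention.
        raise ValueError(f'not a site_pollutant column: {name!r}')
    return site, pollutant


for column in ['BG1_NO2', 'EN5_PM25']:
    site, pollutant = split_target(column)
    print(f'{column}: site={site}, pollutant={pollutant}')
```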

### The 6 regulatory pollutants:

| Pollutant | Code | UK Annual Limit |
|-----------|------|----------------|
| Nitrogen Dioxide | NO2 | 40 µg/m³ |
| PM2.5 Particulate | PM25 | 20 µg/m³ |
| PM10 Particulate | PM10 | 40 µg/m³ |
| Ozone | O3 | n/a |
| Sulphur Dioxide | SO2 | n/a |
| Carbon Monoxide | CO | n/a |


In [None]:
#Temporal vs pollutant columns
temporal_cols = ['hour', 'day_of_week', 'month', 'is_weekend']

#Get pollutant target columns everything except temporal
target_names = [name for name in feature_names if name not in temporal_cols]
target_indices = [i for i, name in enumerate(feature_names) if name not in temporal_cols]

#Create target mapping dictionary
target_mapping = {name: i for i, name in enumerate(feature_names) if name not in temporal_cols}

print(f'total features: {len(feature_names)}')
print(f'temporal features: {len(temporal_cols)}')
print(f'pollutant targets: {len(target_names)}')

#Count by pollutant type
pollutant_codes = ['NO2', 'PM25', 'PM10', 'O3', 'SO2', 'CO']
print(f'\nbreakdown by pollutant:')
for poll in pollutant_codes:
    count = len([n for n in target_names if n.endswith(f'_{poll}')])
    print(f'  {poll}: {count} stations')

    total features: 145
    temporal features: 4
    pollutant targets: 141

    breakdown by pollutant:
      NO2: 58 stations
      PM25: 24 stations
      PM10: 42 stations
      O3: 11 stations
      SO2: 4 stations
      CO: 2 stations


In [None]:
#Prepare y arrays with only pollutant targets (temporal columns excluded)
y_train_targets = y_train[:, target_indices]
y_val_targets = y_val[:, target_indices]
y_test_targets = y_test[:, target_indices]

print(f'target arrays prepared:')
print(f'  y_train_targets: {y_train_targets.shape}')
print(f'  y_val_targets: {y_val_targets.shape}')
print(f'  y_test_targets: {y_test_targets.shape}')
print(f'\nwill train {len(target_names)} separate CNN models')

    target arrays prepared:
      y_train_targets: (17107, 141)
      y_val_targets: (3656, 141)
      y_test_targets: (3657, 141)

    will train 141 separate CNN models


## 5) Build CNN Model Function

Building a function that creates CNN models. Using the best hyperparameters found from the single station tuning:

| parameter | value | why |
|-----------|-------|-----|
| filters_1 | 128 | more capacity to learn patterns |
| kernel_1 | 2 | short term patterns matter most |
| dropout | 0.1 | less regularisation needed |
| filters_2 | 64 | second layer with fewer filters |
| kernel_2 | 2 | consistent with first layer |
| dense_units | 50 | same as baseline |
| learning_rate | 0.001 | adam default works well |

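These parameters came from the Keras Tuner results in the single station CNN notebook. The function below uses one `kernel_size` argument for both conv layers, since kernel_1 and kernel_2 have the same value.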

source: Geron, A. (2023) Hands on machine learning with scikit learn, Keras and TensorFlow. Ch. 15.

In [None]:
def build_cnn_model(timesteps,
                    features, filters_1=128,
                    filters_2=64,
                    kernel_size=2,
                    dropout_rate=0.1,
                    dense_units=50,
                    learning_rate=0.001):
    """
    Build a 1D CNN for time series prediction.
    Based on tuned hyperparameters from single station experiment.

    params:
        timesteps: number of historical hours (12)
        features: number of input features
        filters_1: filters in first conv layer
        filters_2: filters in second conv layer
        kernel_size: size of convolutional kernel
        dropout_rate: dropout rate for regularisation
        dense_units: neurons in dense layer
        learning_rate: adam learning rate

    returns:
        compiled keras model
    """
    model = models.Sequential([
        #Input layer
        layers.Input(shape=(timesteps, features)),

        #first conv layer
        layers.Conv1D(
            filters=filters_1,
            kernel_size=kernel_size,
            activation='relu',
            padding='causal'
        ),
        layers.Dropout(dropout_rate),

        #second conv layer
        layers.Conv1D(
            filters=filters_2,
            kernel_size=kernel_size,
            activation='relu',
            padding='causal'
        ),
        layers.Dropout(dropout_rate),

        #Flatten and dense
        layers.Flatten(),
        layers.Dense(dense_units, activation='relu'),
        layers.Dropout(dropout_rate),

        #Output layer single value
        layers.Dense(1)
    ])

    model.compile(
        optimizer=Adam(learning_rate=learning_rate, clipnorm=1.0),
        loss='mse',
        metrics=['mae']
    )

    return model

In [None]:
#Test model creation
test_model = build_cnn_model(timesteps, n_features)
print(f'model created with {test_model.count_params():,} parameters')
test_model.summary()

    model created with 92,197 parameters
    Model: "sequential"

    ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
    ┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
    ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
    │ conv1d (Conv1D)                 │ (None, 12, 128)        │        37,248 │
    ├─────────────────────────────────┼────────────────────────┼───────────────┤
    │ dropout (Dropout)               │ (None, 12, 128)        │             0 │
    ├─────────────────────────────────┼────────────────────────┼───────────────┤
    │ conv1d_1 (Conv1D)               │ (None, 12, 64)         │        16,448 │
    ├─────────────────────────────────┼────────────────────────┼───────────────┤
    │ dropout_1 (Dropout)             │ (None, 12, 64)         │             0 │
    ├─────────────────────────────────┼────────────────────────┼───────────────┤
    │ flatten (Flatten)               │ (None, 768)            │             0 │
    ├─────────────────────────────────┼────────────────────────┼───────────────┤
    │ dense (Dense)                   │ (None, 50)             │        38,450 │
    ├─────────────────────────────────┼────────────────────────┼───────────────┤
    │ dropout_2 (Dropout)             │ (None, 50)             │             0 │
    ├─────────────────────────────────┼────────────────────────┼───────────────┤
    │ dense_1 (Dense)                 │ (None, 1)              │            51 │
    └─────────────────────────────────┴────────────────────────┴───────────────┘

    Total params: 92,197 (360.14 KB)

    Trainable params: 92,197 (360.14 KB)

    Non-trainable params: 0 (0.00 B)



### understanding the summary

the summary shows each layer, its output shape, and parameter count.

| term | meaning |
|------|--------|
| param # | number of learnable weights. more parameters = more capacity to learn, but also more risk of overfitting |
| output shape | (None, timesteps, filters). None is batch size, determined at runtime |
| total params | all weights the model will learn during training |

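The model has 92,197 trainable parameters. The count is dominated by the first convolution (37,248) and the dense layer (38,450), and it is small enough to train a separate model for each of the 141 targets.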
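
The parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas for Conv1D and Dense layers with a bias term; the helper names are my own.

```python
def conv1d_params(in_channels, filters, kernel_size):
    """Weights (in_channels * kernel_size * filters) plus one bias per filter."""
    return in_channels * kernel_size * filters + filters


def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units


timesteps, n_features = 12, 145

conv1 = conv1d_params(n_features, filters=128, kernel_size=2)
conv2 = conv1d_params(128, filters=64, kernel_size=2)
# Causal padding keeps all 12 timesteps, so Flatten yields 12 * 64 values.
flat_size = timesteps * 64
dense = dense_params(flat_size, units=50)
output = dense_params(50, units=1)
total = conv1 + conv2 + dense + output

print(f'conv1d:   {conv1:,}')
print(f'conv1d_1: {conv2:,}')
print(f'dense:    {dense:,}')
print(f'dense_1:  {output:,}')
print(f'total:    {total:,}')
```

Dropout and Flatten layers have no weights, which is why they show 0 in the summary.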

## 6) Set Up Training Callbacks

Callbacks control training behaviour.

| Callback | What it does | Why |
|----------|--------------|-----|
| EarlyStopping | stops when validation loss stops improving | prevents overfitting |
| ReduceLROnPlateau | reduces learning rate when stuck | helps find better minimum |

ModelCheckpoint is not used for each model because there are 141 targets. Instead, results are saved manually every N models.

source: Team, K. (no date) Keras Documentation: Callbacks. Available at: https://keras.io/api/callbacks/
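
To show what the ReduceLROnPlateau settings in the next cell allow, here is a simplified sketch of the learning rates that would be visited if every plateau triggered a reduction. It only reproduces the "multiply by factor, never go below min_lr" rule and ignores the patience logic, so it is an illustration rather than the Keras implementation.

```python
def lr_steps(initial_lr=0.001, factor=0.5, min_lr=0.00001):
    """List learning rates from initial_lr down to the min_lr floor."""
    rates = [initial_lr]
    while rates[-1] > min_lr:
        # Each reduction multiplies by `factor`, clipped at `min_lr`.
        rates.append(max(rates[-1] * factor, min_lr))
    return rates


rates = lr_steps()
for step, rate in enumerate(rates):
    print(f'after {step} reductions: {rate:.8f}')
```

With factor=0.5 the rate reaches the floor after seven reductions, so the floor only matters for models that plateau many times within the 50 epoch limit.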

In [None]:
def get_callbacks():
    """Create callbacks for training."""
    return [
        EarlyStopping(
            monitor='val_loss',
            patience=10,
            restore_best_weights=True,
            verbose=0
        ),
        ReduceLROnPlateau(
            monitor='val_loss',
            factor=0.5,
            patience=5,
            min_lr=0.00001,
            verbose=0
        )
    ]

print('Callbacks configured:')
print('Early stopping (patience=10)')
print('Reduce LR on plateau (factor=0.5, patience=5)')

  callbacks configured:
    - early stopping (patience=10)
    - reduce LR on plateau (factor=0.5, patience=5)


## 7) Train Models For All Targets

Training a separate CNN model for each target. This will take a while because:

| aspect | detail |
|--------|--------|
| targets | 141 station pollutant combinations |
| epochs per model | up to 50 (early stopping) |


### Checkpoint

Results are saved every 20 models in case something goes wrong, so that if the notebook crashes I don't lose everything. Since I ran this notebook on Colab, this approach did not work in practice.


source: Geron, A. (2023) Hands on machine learning with scikit learn, Keras and TensorFlow. Ch. 11.

**note:** Training was completed on Google Colab. The results loaded in section 8 were extracted manually from its output.
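
A minimal sketch of how checkpoint files could be used to resume an interrupted run, assuming the `checkpoint_N.csv` files are still on disk and contain a `target` column as written by the loop below. `load_completed_targets` is a hypothetical helper and was not part of the original training run.

```python
import csv
import re
from pathlib import Path


def load_completed_targets(checkpoint_dir):
    """Return the target names stored in the newest checkpoint_N.csv."""
    numbered = []
    for path in Path(checkpoint_dir).glob('checkpoint_*.csv'):
        match = re.fullmatch(r'checkpoint_(\d+)\.csv', path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    if not numbered:
        return set()  # nothing saved yet, start from the first target
    # Pick the file with the highest model count in its name.
    _, latest = max(numbered, key=lambda item: item[0])
    with latest.open(newline='') as handle:
        return {row['target'] for row in csv.DictReader(handle)}
```

In the training loop this could be used as `done = load_completed_targets(output_dir)` followed by `if target_name in done: continue`, with the earlier rows reloaded from the same CSV.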

In [None]:
#training configuration
BATCH_SIZE = 32
MAX_EPOCHS = 50
CHECKPOINT_EVERY = 20

print(f'training configuration:')
print(f'  batch size: {BATCH_SIZE}')
print(f'  max epochs: {MAX_EPOCHS}')
print(f'  checkpoint every: {CHECKPOINT_EVERY} models')
print(f'  total targets: {len(target_names)}')

  training configuration:
    batch size: 32
    max epochs: 50
    checkpoint every: 20 models
    total targets: 141

### Training Loop as Reference Only

The training loop below was run on Google Colab and took 5.74 hours to complete. The results are loaded from the output logs in the next section.

In [None]:
#Training loop reference code
results = []
all_models = {}

start_time = time.time()

print(f'Started at: {time.strftime("%Y-%m-%d %H:%M:%S")}')
print(f'Targets to train: {len(target_names)}')
print(f'Training samples: {n_samples:,}')
print(f'Features: {n_features}')
print('=' * 40)

for i, target_name in enumerate(target_names):
    target_idx = target_mapping[target_name]
    model_start = time.time()

    #Build model
    model = build_cnn_model(timesteps, n_features)

    #Train
    history = model.fit(
        X_train, y_train[:, target_idx],
        validation_data=(X_val, y_val[:, target_idx]),
        epochs=MAX_EPOCHS,
        batch_size=BATCH_SIZE,
        callbacks=get_callbacks(),
        verbose=0
    )

    #Evaluate on test set
    y_pred = model.predict(X_test, verbose=0).flatten()
    y_actual = y_test[:, target_idx]

    test_r2 = r2_score(y_actual, y_pred)
    test_rmse = np.sqrt(mean_squared_error(y_actual, y_pred))
    test_mae = mean_absolute_error(y_actual, y_pred)

    #Extract pollutant from target name
    parts = target_name.rsplit('_', 1)
    pollutant = parts[1] if len(parts) > 1 else 'unknown'

    results.append({
        'target': target_name,
        'pollutant': pollutant,
        'test_r2': test_r2,
        'test_rmse': test_rmse,
        'test_mae': test_mae,
        'epochs': len(history.history['loss'])
    })

    #store model
    all_models[target_name] = model

    #progress update: ETA uses the average time per model so far
    elapsed = time.time() - model_start
    remaining = len(target_names) - (i + 1)
    avg_per_model = (time.time() - start_time) / (i + 1)
    eta = (avg_per_model * remaining) / 60
    print(f'[{i+1:3d}/{len(target_names)}] {target_name:15s} | R2={test_r2:.3f} | Time={elapsed:.0f}s | ETA={eta:.0f}min')

    #checkpoint
    if (i + 1) % CHECKPOINT_EVERY == 0:
        checkpoint_df = pd.DataFrame(results)
        checkpoint_df.to_csv(output_dir / f'checkpoint_{i+1}.csv', index=False)
        print(f'   [Checkpoint saved at {i+1} models]')

    #memory cleanup
    tf.keras.backend.clear_session()
    gc.collect()

total_time = (time.time() - start_time) / 60
print('=' * 40)
print('Training complete!')
print(f'Total time: {total_time:.1f} minutes ({total_time/60:.2f} hours)')
print(f'Average per model: {total_time*60/len(target_names):.1f} seconds')


    ============================================================
    Started at: 2025-12-31 01:05:47
    Targets to train: 141
    Training samples: 17,107
    Features: 145
    ============================================================
    [  1/141] BG1_NO2         | R2=0.585 | Time=140s | ETA=326min
    [  2/141] BG1_SO2         | R2=0.656 | Time=119s | ETA=300min
    [  3/141] BG2_NO2         | R2=-1495952367336140489190212108288.000 | Time=134s | ETA=302min
    [  4/141] BG2_PM10        | R2=0.023 | Time=70s | ETA=265min
    [  5/141] BQ7_NO2         | R2=0.706 | Time=80s | ETA=247min
    [  6/141] BQ7_O3          | R2=0.910 | Time=172s | ETA=269min
    [  7/141] BQ7_PM10        | R2=0.760 | Time=108s | ETA=263min
    [  8/141] BQ7_PM25        | R2=0.758 | Time=115s | ETA=261min
    [  9/141] BQ9_PM10        | R2=0.730 | Time=133s | ETA=263min
    [ 10/141] BQ9_PM25        | R2=0.726 | Time=147s | ETA=267min
    [ 11/141] BT4_NO2         | R2=0.804 | Time=164s | ETA=273min
    [ 12/141] BT4_PM10        | R2=0.554 | Time=135s | ETA=273min
    [ 13/141] BT4_PM25        | R2=0.522 | Time=113s | ETA=268min
    [ 14/141] BT5_NO2         | R2=0.747 | Time=159s | ETA=271min
    [ 15/141] BT5_PM10        | R2=0.525 | Time=94s | ETA=265min
    [ 16/141] BT5_PM25        | R2=0.164 | Time=91s | ETA=258min
    [ 17/141] BT6_NO2         | R2=0.784 | Time=144s | ETA=259min
    [ 18/141] BT6_PM10        | R2=0.507 | Time=218s | ETA=267min
    [ 19/141] BT6_PM25        | R2=0.463 | Time=157s | ETA=268min
    [ 20/141] BT8_NO2         | R2=0.729 | Time=131s | ETA=266min
      [Checkpoint saved at 20 models]
    [ 21/141] BT8_PM10        | R2=0.551 | Time=182s | ETA=269min
    [ 22/141] BT8_PM25        | R2=0.579 | Time=111s | ETA=264min
    [ 23/141] BX1_NO2         | R2=0.788 | Time=171s | ETA=265min
    [ 24/141] BX1_O3          | R2=0.891 | Time=160s | ETA=265min
    [ 25/141] BX1_SO2         | R2=0.701 | Time=195s | ETA=268min
    [ 26/141] BX2_NO2         | R2=0.761 | Time=142s | ETA=266min
    [ 27/141] BX2_PM10        | R2=0.554 | Time=191s | ETA=267min
    [ 28/141] BX2_PM25        | R2=0.777 | Time=133s | ETA=264min
    [ 29/141] BY7_NO2         | R2=0.722 | Time=137s | ETA=262min
    [ 30/141] BY7_PM10        | R2=0.416 | Time=104s | ETA=257min
    [ 31/141] BY7_PM25        | R2=0.462 | Time=145s | ETA=255min
    [ 32/141] CD1_NO2         | R2=0.682 | Time=139s | ETA=253min
    [ 33/141] CD1_PM10        | R2=0.554 | Time=141s | ETA=251min
    [ 34/141] CD1_PM25        | R2=0.551 | Time=202s | ETA=252min
    [ 35/141] CE2_NO2         | R2=0.509 | Time=73s | ETA=246min
    [ 36/141] CE2_O3          | R2=0.887 | Time=228s | ETA=248min
    [ 37/141] CE2_PM10        | R2=0.466 | Time=64s | ETA=242min
    [ 38/141] CE2_PM25        | R2=0.701 | Time=91s | ETA=238min
    [ 39/141] CE3_NO2         | R2=0.378 | Time=66s | ETA=232min
    [ 40/141] CE3_PM10        | R2=0.434 | Time=179s | ETA=232min
      [Checkpoint saved at 40 models]
    [ 41/141] CE3_PM25        | R2=0.505 | Time=128s | ETA=229min
    [ 42/141] CR5_NO2         | R2=0.821 | Time=152s | ETA=228min
    [ 43/141] CR7_NO2         | R2=0.799 | Time=215s | ETA=228min
    [ 44/141] CR8_PM25        | R2=0.000 | Time=215s | ETA=229min
    [ 45/141] CW3_NO2         | R2=0.740 | Time=132s | ETA=226min
    [ 46/141] CW3_PM10        | R2=0.844 | Time=204s | ETA=226min
    [ 47/141] CW3_PM25        | R2=0.872 | Time=173s | ETA=225min
    [ 48/141] EA6_NO2         | R2=0.787 | Time=160s | ETA=223min
    [ 49/141] EA6_PM10        | R2=0.550 | Time=194s | ETA=222min
    [ 50/141] EA8_NO2         | R2=0.757 | Time=136s | ETA=219min
    [ 51/141] EA8_PM10        | R2=0.609 | Time=109s | ETA=216min
    [ 52/141] EI1_NO2         | R2=0.797 | Time=144s | ETA=214min
    [ 53/141] EI1_PM10        | R2=0.523 | Time=229s | ETA=214min
    [ 54/141] EI8_PM10        | R2=0.606 | Time=132s | ETA=211min
    [ 55/141] EN1_NO2         | R2=0.851 | Time=149s | ETA=209min
    [ 56/141] EN4_NO2         | R2=0.777 | Time=151s | ETA=206min
    [ 57/141] EN5_NO2         | R2=0.810 | Time=122s | ETA=203min
    [ 58/141] EN7_NO2         | R2=0.765 | Time=165s | ETA=202min
    [ 59/141] GB0_PM25        | R2=0.649 | Time=110s | ETA=198min
    [ 60/141] GB6_NO2         | R2=0.824 | Time=133s | ETA=196min
      [Checkpoint saved at 60 models]
    [ 61/141] GB6_O3          | R2=0.891 | Time=216s | ETA=195min
    [ 62/141] GB6_PM10        | R2=0.504 | Time=144s | ETA=192min
    [ 63/141] GN0_NO2         | R2=0.780 | Time=88s | ETA=189min
    [ 64/141] GN0_PM10        | R2=0.475 | Time=124s | ETA=186min
    [ 65/141] GN0_PM25        | R2=0.571 | Time=128s | ETA=183min
    [ 66/141] GN3_NO2         | R2=0.786 | Time=124s | ETA=180min
    [ 67/141] GN3_O3          | R2=0.884 | Time=190s | ETA=179min
    [ 68/141] GN3_PM10        | R2=0.334 | Time=127s | ETA=176min
    [ 69/141] GN3_PM25        | R2=0.706 | Time=111s | ETA=173min
    [ 70/141] GN4_NO2         | R2=0.802 | Time=152s | ETA=171min
    [ 71/141] GN4_PM10        | R2=0.537 | Time=168s | ETA=169min
    [ 72/141] GN5_NO2         | R2=0.691 | Time=102s | ETA=166min
    [ 73/141] GN5_PM10        | R2=0.559 | Time=168s | ETA=164min
    [ 74/141] GN6_NO2         | R2=0.761 | Time=197s | ETA=162min
    [ 75/141] GN6_PM10        | R2=0.607 | Time=107s | ETA=159min
    [ 76/141] GN6_PM25        | R2=-0.731 | Time=143s | ETA=157min
    [ 77/141] GR7_NO2         | R2=0.758 | Time=232s | ETA=156min
    [ 78/141] GR7_PM10        | R2=0.602 | Time=179s | ETA=154min
    [ 79/141] GR8_NO2         | R2=0.749 | Time=119s | ETA=151min
    [ 80/141] GR8_PM10        | R2=0.543 | Time=136s | ETA=149min
      [Checkpoint saved at 80 models]
    [ 81/141] GR9_NO2         | R2=0.839 | Time=191s | ETA=147min
    [ 82/141] GR9_PM10        | R2=0.592 | Time=213s | ETA=145min
    [ 83/141] GR9_PM25        | R2=0.582 | Time=140s | ETA=143min
    [ 84/141] HG1_NO2         | R2=0.734 | Time=174s | ETA=140min
    [ 85/141] HG4_NO2         | R2=0.860 | Time=138s | ETA=138min
    [ 86/141] HG4_O3          | R2=0.919 | Time=211s | ETA=136min
    [ 87/141] HP1_NO2         | R2=0.745 | Time=71s | ETA=133min
    [ 88/141] HP1_O3          | R2=0.902 | Time=134s | ETA=130min
    [ 89/141] HP1_PM10        | R2=0.803 | Time=169s | ETA=128min
    [ 90/141] HP1_PM25        | R2=0.847 | Time=210s | ETA=126min
    [ 91/141] HV1_NO2         | R2=0.774 | Time=134s | ETA=124min
    [ 92/141] HV3_NO2         | R2=0.733 | Time=112s | ETA=121min
    [ 93/141] HV3_PM10        | R2=0.777 | Time=231s | ETA=119min
    [ 94/141] IS2_NO2         | R2=0.715 | Time=158s | ETA=117min
    [ 95/141] IS2_PM10        | R2=0.273 | Time=91s | ETA=114min
    [ 96/141] IS6_NO2         | R2=0.775 | Time=161s | ETA=111min
    [ 97/141] IS6_PM10        | R2=0.310 | Time=228s | ETA=110min
    [ 98/141] KC1_CO          | R2=0.483 | Time=99s | ETA=107min
    [ 99/141] KC1_NO2         | R2=0.845 | Time=187s | ETA=105min
    [100/141] KC1_O3          | R2=0.895 | Time=153s | ETA=102min
      [Checkpoint saved at 100 models]
    [101/141] KC1_PM25        | R2=0.839 | Time=108s | ETA=99min
    [102/141] KC1_SO2         | R2=0.803 | Time=176s | ETA=97min
    [103/141] LB4_NO2         | R2=0.566 | Time=120s | ETA=94min
    [104/141] LB4_PM10        | R2=0.450 | Time=160s | ETA=92min
    [105/141] LB4_PM25        | R2=0.405 | Time=176s | ETA=90min
    [106/141] LB6_NO2         | R2=0.840 | Time=138s | ETA=87min
    [107/141] LB6_PM10        | R2=0.248 | Time=99s | ETA=84min
    [108/141] ME9_NO2         | R2=0.764 | Time=159s | ETA=82min
    [109/141] MY1_CO          | R2=0.830 | Time=144s | ETA=79min
    [110/141] MY1_NO2         | R2=0.722 | Time=126s | ETA=77min
    [111/141] MY1_O3          | R2=0.857 | Time=106s | ETA=74min
    [112/141] MY1_SO2         | R2=0.077 | Time=226s | ETA=72min
    [113/141] RI1_NO2         | R2=0.668 | Time=99s | ETA=69min
    [114/141] RI1_PM10        | R2=0.287 | Time=111s | ETA=67min
    [115/141] RI2_NO2         | R2=0.804 | Time=137s | ETA=64min
    [116/141] RI2_O3          | R2=0.906 | Time=151s | ETA=62min
    [117/141] RI2_PM10        | R2=0.453 | Time=107s | ETA=59min
    [118/141] SK5_NO2         | R2=0.836 | Time=156s | ETA=57min
    [119/141] SK5_PM10        | R2=0.634 | Time=133s | ETA=54min
    [120/141] TH2_NO2         | R2=0.830 | Time=177s | ETA=52min
      [Checkpoint saved at 120 models]
    [121/141] TH4_NO2         | R2=-243629731975496583525961826304.000 | Time=183s | ETA=50min
    [122/141] TH4_O3          | R2=-859351937836635163572058456064.000 | Time=171s | ETA=47min
    [123/141] TH4_PM10        | R2=-460376520887681331232768000000.000 | Time=101s | ETA=45min
    [124/141] TH4_PM25        | R2=0.000 | Time=201s | ETA=42min
    [125/141] TL4_NO2         | R2=0.630 | Time=171s | ETA=40min
    [126/141] TL5_NO2         | R2=0.420 | Time=116s | ETA=37min
    [127/141] TL6_NO2         | R2=0.696 | Time=112s | ETA=35min
    [128/141] TL6_PM25        | R2=0.241 | Time=68s | ETA=32min
    [129/141] WA7_NO2         | R2=0.530 | Time=131s | ETA=30min
    [130/141] WA7_PM10        | R2=0.398 | Time=109s | ETA=27min
    [131/141] WA9_PM10        | R2=0.608 | Time=178s | ETA=25min
    [132/141] WAA_NO2         | R2=0.701 | Time=144s | ETA=22min
    [133/141] WAA_PM10        | R2=0.527 | Time=112s | ETA=20min
    [134/141] WAB_NO2         | R2=0.798 | Time=86s | ETA=17min
    [135/141] WAB_PM10        | R2=0.378 | Time=66s | ETA=15min
    [136/141] WAC_PM10        | R2=0.539 | Time=102s | ETA=12min
    [137/141] WM5_NO2         | R2=0.802 | Time=148s | ETA=10min
    [138/141] WM6_NO2         | R2=0.632 | Time=120s | ETA=7min
    [139/141] WM6_PM10        | R2=-5526321308161753884842262528.000 | Time=221s | ETA=5min
    [140/141] WMD_NO2         | R2=0.534 | Time=87s | ETA=2min
      [Checkpoint saved at 140 models]
    [141/141] WMD_PM25        | R2=0.700 | Time=183s | ETA=0min
    ============================================================
    Training complete!
    Total time: 344.7 minutes (5.74 hours)
    Average per model: 146.7 seconds
    ============================================================

## 8) Load Results from Colab Training Output

The training was completed on Google Colab on 31 December 2025. The results below were copied manually from the training output log.

total training time: 344.7 minutes (5.74 hours)

average time per model: 146.7 seconds
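
The manual copying could also be done with a regular expression over the saved log text. This is a sketch that assumes the log lines keep the exact format printed by the training loop; `parse_training_log` is a hypothetical helper, and the results in the next cell were typed in by hand rather than produced this way.

```python
import re

# Matches lines such as: [  1/141] BG1_NO2         | R2=0.585 | Time=140s | ETA=326min
LOG_LINE = re.compile(r'\[\s*\d+/\d+\]\s+(\S+)\s+\|\s+R2=(-?\d+(?:\.\d+)?)\s+\|')


def parse_training_log(text):
    """Return (target_name, test_r2) tuples for every progress line."""
    parsed = []
    for line in text.splitlines():
        match = LOG_LINE.search(line)
        if match:  # checkpoint and separator lines do not match
            parsed.append((match.group(1), float(match.group(2))))
    return parsed


sample_log = """
    [  1/141] BG1_NO2         | R2=0.585 | Time=140s | ETA=326min
    [  3/141] BG2_NO2         | R2=-1495952367336140489190212108288.000 | Time=134s | ETA=302min
      [Checkpoint saved at 20 models]
"""
parsed = parse_training_log(sample_log)
for target, r2 in parsed:
    print(f'{target}: {r2:.3e}')
```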

In [None]:
#Results extracted manually from the Colab output; format: (target_name, test_r2)
colab_results = [
    ('BG1_NO2', 0.585),
    ('BG1_SO2', 0.656),
    ('BG2_NO2', -1.496e+30),
    ('BG2_PM10', 0.023),
    ('BQ7_NO2', 0.706),
    ('BQ7_O3', 0.910),
    ('BQ7_PM10', 0.760),
    ('BQ7_PM25', 0.758),
    ('BQ9_PM10', 0.730),
    ('BQ9_PM25', 0.726),
    ('BT4_NO2', 0.804),
    ('BT4_PM10', 0.554),
    ('BT4_PM25', 0.522),
    ('BT5_NO2', 0.747),
    ('BT5_PM10', 0.525),
    ('BT5_PM25', 0.164),
    ('BT6_NO2', 0.784),
    ('BT6_PM10', 0.507),
    ('BT6_PM25', 0.463),
    ('BT8_NO2', 0.729),
    ('BT8_PM10', 0.551),
    ('BT8_PM25', 0.579),
    ('BX1_NO2', 0.788),
    ('BX1_O3', 0.891),
    ('BX1_SO2', 0.701),
    ('BX2_NO2', 0.761),
    ('BX2_PM10', 0.554),
    ('BX2_PM25', 0.777),
    ('BY7_NO2', 0.722),
    ('BY7_PM10', 0.416),
    ('BY7_PM25', 0.462),
    ('CD1_NO2', 0.682),
    ('CD1_PM10', 0.554),
    ('CD1_PM25', 0.551),
    ('CE2_NO2', 0.509),
    ('CE2_O3', 0.887),
    ('CE2_PM10', 0.466),
    ('CE2_PM25', 0.701),
    ('CE3_NO2', 0.378),
    ('CE3_PM10', 0.434),
    ('CE3_PM25', 0.505),
    ('CR5_NO2', 0.821),
    ('CR7_NO2', 0.799),
    ('CR8_PM25', 0.000),
    ('CW3_NO2', 0.740),
    ('CW3_PM10', 0.844),
    ('CW3_PM25', 0.872),
    ('EA6_NO2', 0.787),
    ('EA6_PM10', 0.550),
    ('EA8_NO2', 0.757),
    ('EA8_PM10', 0.609),
    ('EI1_NO2', 0.797),
    ('EI1_PM10', 0.523),
    ('EI8_PM10', 0.606),
    ('EN1_NO2', 0.851),
    ('EN4_NO2', 0.777),
    ('EN5_NO2', 0.810),
    ('EN7_NO2', 0.765),
    ('GB0_PM25', 0.649),
    ('GB6_NO2', 0.824),
    ('GB6_O3', 0.891),
    ('GB6_PM10', 0.504),
    ('GN0_NO2', 0.780),
    ('GN0_PM10', 0.475),
    ('GN0_PM25', 0.571),
    ('GN3_NO2', 0.786),
    ('GN3_O3', 0.884),
    ('GN3_PM10', 0.334),
    ('GN3_PM25', 0.706),
    ('GN4_NO2', 0.802),
    ('GN4_PM10', 0.537),
    ('GN5_NO2', 0.691),
    ('GN5_PM10', 0.559),
    ('GN6_NO2', 0.761),
    ('GN6_PM10', 0.607),
    ('GN6_PM25', -0.731),
    ('GR7_NO2', 0.758),
    ('GR7_PM10', 0.602),
    ('GR8_NO2', 0.749),
    ('GR8_PM10', 0.543),
    ('GR9_NO2', 0.839),
    ('GR9_PM10', 0.592),
    ('GR9_PM25', 0.582),
    ('HG1_NO2', 0.734),
    ('HG4_NO2', 0.860),
    ('HG4_O3', 0.919),
    ('HP1_NO2', 0.745),
    ('HP1_O3', 0.902),
    ('HP1_PM10', 0.803),
    ('HP1_PM25', 0.847),
    ('HV1_NO2', 0.774),
    ('HV3_NO2', 0.733),
    ('HV3_PM10', 0.777),
    ('IS2_NO2', 0.715),
    ('IS2_PM10', 0.273),
    ('IS6_NO2', 0.775),
    ('IS6_PM10', 0.310),
    ('KC1_CO', 0.483),
    ('KC1_NO2', 0.845),
    ('KC1_O3', 0.895),
    ('KC1_PM25', 0.839),
    ('KC1_SO2', 0.803),
    ('LB4_NO2', 0.566),
    ('LB4_PM10', 0.450),
    ('LB4_PM25', 0.405),
    ('LB6_NO2', 0.840),
    ('LB6_PM10', 0.248),
    ('ME9_NO2', 0.764),
    ('MY1_CO', 0.830),
    ('MY1_NO2', 0.722),
    ('MY1_O3', 0.857),
    ('MY1_SO2', 0.077),
    ('RI1_NO2', 0.668),
    ('RI1_PM10', 0.287),
    ('RI2_NO2', 0.804),
    ('RI2_O3', 0.906),
    ('RI2_PM10', 0.453),
    ('SK5_NO2', 0.836),
    ('SK5_PM10', 0.634),
    ('TH2_NO2', 0.830),
    ('TH4_NO2', -2.436e+29),
    ('TH4_O3', -8.594e+29),
    ('TH4_PM10', -4.604e+29),
    ('TH4_PM25', 0.000),
    ('TL4_NO2', 0.630),
    ('TL5_NO2', 0.420),
    ('TL6_NO2', 0.696),
    ('TL6_PM25', 0.241),
    ('WA7_NO2', 0.530),
    ('WA7_PM10', 0.398),
    ('WA9_PM10', 0.608),
    ('WAA_NO2', 0.701),
    ('WAA_PM10', 0.527),
    ('WAB_NO2', 0.798),
    ('WAB_PM10', 0.378),
    ('WAC_PM10', 0.539),
    ('WM5_NO2', 0.802),
    ('WM6_NO2', 0.632),
    ('WM6_PM10', -5.526e+27),
    ('WMD_NO2', 0.534),
    ('WMD_PM25', 0.700)
]

print(f'loaded {len(colab_results)} results from colab training')

In [None]:
#create results dataframe
results = []
for target, r2 in colab_results:
    parts = target.rsplit('_', 1)
    pollutant = parts[1] if len(parts) > 1 else 'unknown'
    site = parts[0] if len(parts) > 1 else target

    results.append({
        'target': target,
        'site': site,
        'pollutant': pollutant,
        'test_r2': r2
    })

results_df = pd.DataFrame(results)

print('Results dataframe created')
print(f'\nShape: {results_df.shape}')
print(f'\nFirst 5 rows:')
print(results_df.head().to_string(index=False))

## 9) Investigation of Broken Models

Some models produced extremely negative R2 values (on the order of -1e27 to -1e29 in the results above), which indicates numerical issues rather than merely poor fits. Before continuing with the results analysis, I need to investigate and document these failures.
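One way values of this size can arise is sketched below. It assumes the test targets are nearly constant, and the numbers are made up rather than taken from the LAQN data. R2 divides the residual sum of squares by the total sum of squares of the true values, so a near zero denominator turns even a small error into a huge negative score.

```python
import numpy as np
from sklearn.metrics import r2_score

#Hypothetical scaled targets: the test series barely changes
y_true_flat = 0.5 + 1e-12 * np.arange(100)
#Hypothetical predictions that miss by roughly 0.1 everywhere
y_pred = np.full(100, 0.6)

#R2 = 1 - ss_res / ss_tot, computed by hand to show the tiny denominator
ss_res = np.sum((y_true_flat - y_pred) ** 2)
ss_tot = np.sum((y_true_flat - y_true_flat.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

r2_flat = r2_score(y_true_flat, y_pred)

#Exactly constant targets: ss_tot is zero, so the score is undefined and
#scikit-learn substitutes a finite value by default (force_finite=True)
y_true_const = np.full(100, 0.5)
r2_const = r2_score(y_true_const, y_pred)

print(f'ss_res = {ss_res:.3e}, ss_tot = {ss_tot:.3e}')
print(f'near constant target: R2 = {r2_flat:.3e}')
print(f'constant target:      R2 = {r2_const:.3f}')
```

If a test series were exactly constant, the default scikit-learn behaviour would report a finite score instead of minus infinity. That would be consistent with a value like the 0.000 listed for TH4_PM25, although this has not been verified here.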

In [None]:
#Identify broken models
print('Identifying broken models:')
print('=' * 40)

#Broken threshold: treat R2 < -10 as a numerical failure
broken_threshold = -10

broken_models = results_df[results_df['test_r2'] < broken_threshold].copy()
valid_models = results_df[results_df['test_r2'] >= broken_threshold].copy()

print(f'\nTotal Models:  {len(results_df)}')
print(f'Valid Models:  {len(valid_models)}')
print(f'Broken Models: {len(broken_models)}')

print('Broken model information:')
print('-' * 40)
print(broken_models[['target', 'pollutant', 'test_r2']].to_string(index=False))

In [None]:
#Investigate broken models by checking test set variance
print('Detailed investigation of broken models:')
print('=' * 40)

broken_targets = broken_models['target'].values

for target in broken_targets:
    print(f'\n>>> {target}')
    print('-' * 40)

    target_idx = target_mapping[target]

    #Check target data statistics
    y_train_target = y_train[:, target_idx]
    y_val_target = y_val[:, target_idx]
    y_test_target = y_test[:, target_idx]

    print(f'training   - min: {y_train_target.min():.6f}, max: {y_train_target.max():.6f}, '
          f'std: {y_train_target.std():.6f}')
    print(f'validation - min: {y_val_target.min():.6f}, max: {y_val_target.max():.6f}, '
          f'std: {y_val_target.std():.6f}')
    print(f'test       - min: {y_test_target.min():.6f}, max: {y_test_target.max():.6f}, '
          f'std: {y_test_target.std():.6f}')

    #Check for constant or near constant values
    if y_test_target.std() < 0.001:
        print('>>> very low variance in test set (std < 0.001)')

In [None]:
#Root cause analysis
print('Root cause analysis of broken models:')
print('=' * 40)

#Check broken models site
broken_sites = [t.rsplit('_', 1)[0] for t in broken_targets]
print(f'\nBroken model sites: {broken_sites}')

#Count failures per site to find sites with multiple broken models
site_counts = Counter(broken_sites)
multi_broken = {site: count for site, count in site_counts.items() if count > 1}

if multi_broken:
    print(f'\nSites with multiple broken models: {multi_broken}')
    print('This suggests data quality issues at these monitoring stations.')

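### Findings: Why these models broke

The broken models trace back to data quality issues at the TH4, BG2 and WM6 stations, where the test values are constant (see the key findings in section 14).
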
## 10) Baseline Evaluation After Exclusion

Evaluating the ... valid models, excluding the ... broken models.

In [None]:
#Baseline evaluation excluding broken models
print('CNN baseline evaluation excluding broken models:')
print('=' * 40)

print(f'\nValid models: {len(valid_models)} out of {len(results_df)}')
print(f'Broken models excluded: {len(broken_models)}')

print('Test set performance with valid models only:')
print('-' * 40)

print(f'\nMean R2:   {valid_models["test_r2"].mean():.4f}')
print(f'Median R2: {valid_models["test_r2"].median():.4f}')
print(f'Std R2:    {valid_models["test_r2"].std():.4f}')
print(f'Min R2:    {valid_models["test_r2"].min():.4f}')
print(f'Max R2:    {valid_models["test_r2"].max():.4f}')

In [None]:
#Performance by pollutant type
print('\nPerformance by pollutant type:')
print('=' * 40)

pollutant_summary = valid_models.groupby('pollutant').agg({
    'test_r2': ['mean', 'std', 'min', 'max', 'count']
}).round(4)

pollutant_summary.columns = ['r2_mean', 'r2_std', 'r2_min', 'r2_max', 'n_models']
pollutant_summary = pollutant_summary.sort_values('r2_mean', ascending=False)

print(pollutant_summary.to_string())

In [None]:
#Top/bottom performing models
print('\nTop 10 best performing targets by R2:')
print('-' * 50)
top_10 = valid_models.nlargest(10, 'test_r2')[['target', 'pollutant', 'test_r2']]
print(top_10.to_string(index=False))

print('\nBottom 10 worst performing targets by R2:')
print('-' * 50)
bottom_10 = valid_models.nsmallest(10, 'test_r2')[['target', 'pollutant', 'test_r2']]
print(bottom_10.to_string(index=False))

## 11) Results Summary and Save

Saving all results to CSV files for later analysis and comparison with random forest.
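A minimal sketch of how the saved results could later be lined up against the random forest results. The random forest dataframe, its `test_r2_rf` column and the target names are assumptions for illustration. The actual output of rf_training_laqn_all.ipynb may be laid out differently.

```python
import pandas as pd

#Stand-in for cnn_valid_results.csv, values are made up
cnn_df = pd.DataFrame({
    'target': ['AAA_NO2', 'AAA_O3', 'BBB_PM10'],
    'test_r2': [0.70, 0.90, 0.50],
})

#Hypothetical random forest results, 'test_r2_rf' is an assumed column name
rf_df = pd.DataFrame({
    'target': ['AAA_NO2', 'AAA_O3', 'CCC_NO2'],
    'test_r2_rf': [0.80, 0.92, 0.75],
})

#Inner join keeps only the targets that both model families trained
comparison = (
    cnn_df.rename(columns={'test_r2': 'test_r2_cnn'})
    .merge(rf_df, on='target', how='inner')
)

#Negative difference means the CNN scored below the random forest
comparison['r2_diff'] = comparison['test_r2_cnn'] - comparison['test_r2_rf']

print(comparison.to_string(index=False))
```

An inner join is used so that broken or missing targets on either side do not distort the comparison.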

In [None]:
#Save results
results_df.to_csv(output_dir / 'cnn_all_results.csv', index=False)
valid_models.to_csv(output_dir / 'cnn_valid_results.csv', index=False)
broken_models.to_csv(output_dir / 'cnn_broken_models.csv', index=False)
pollutant_summary.to_csv(output_dir / 'cnn_pollutant_summary.csv')

print('results saved:')
print(f'  - cnn_all_results.csv ({len(results_df)} models)')
print(f'  - cnn_valid_results.csv ({len(valid_models)} models)')
print(f'  - cnn_broken_models.csv ({len(broken_models)} models)')
print(f'  - cnn_pollutant_summary.csv')
print(f'\noutput directory: {output_dir}')

## 12) Prediction Visualisations

Plotting actual vs predicted values helps identify systematic errors or patterns the models miss.

**scatter plot interpretation:**

| pattern | meaning |
|---------|--------|
| points close to diagonal | good predictions |
| spread around line | prediction variance |
| curve away at high values | model underestimates pollution spikes |
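The patterns in the table can be reproduced with a short sketch. It uses synthetic data, because the trained models are not available locally. The arrays stand in for `y_test[:, target_idx]` and the model predictions, and the predictions are deliberately compressed at high values to mimic underestimated spikes.

```python
import numpy as np
import matplotlib.pyplot as plt

#Synthetic stand-ins for actual and predicted values (not LAQN data)
rng = np.random.default_rng(42)
y_actual = rng.uniform(0, 1, 500)
#Predictions fall further below the actual values as the level rises
y_pred = y_actual - 0.3 * y_actual ** 2 + rng.normal(0, 0.03, 500)

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(y_actual, y_pred, alpha=0.3, s=10)

#Diagonal y = x marks perfect predictions
lims = [min(y_actual.min(), y_pred.min()), max(y_actual.max(), y_pred.max())]
ax.plot(lims, lims, color='red', linestyle='--', label='perfect prediction')

ax.set_xlabel('actual')
ax.set_ylabel('predicted')
ax.set_title('actual vs predicted (synthetic example)')
ax.legend()
plt.tight_layout()
```

In this synthetic case the points bend below the diagonal at the upper end, which is the "curve away at high values" pattern.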

In [None]:
#R2 distribution histogram and boxplot by pollutant
#Models with R2 < 0 are excluded for better visualisation
valid_for_plot = valid_models[valid_models['test_r2'] >= 0].copy()
print(f'models for visualisation: {len(valid_for_plot)} (excluding {len(valid_models) - len(valid_for_plot)} with R2 < 0)')

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

#Histogram
axes[0].hist(valid_for_plot['test_r2'], bins=20, edgecolor='black', alpha=0.7, color='steelblue')
axes[0].axvline(valid_for_plot['test_r2'].mean(), color='red', linestyle='--',
                label=f'mean = {valid_for_plot["test_r2"].mean():.3f}')
axes[0].set_xlabel('Test R2')
axes[0].set_ylabel('Number of models')
axes[0].set_title('Distribution of test R2 (valid models)')
axes[0].legend()

#Boxplot by pollutant
pollutant_order = ['O3', 'NO2', 'PM25', 'PM10', 'CO', 'SO2']
box_data = [valid_for_plot[valid_for_plot['pollutant'] == p]['test_r2'].values
            for p in pollutant_order if p in valid_for_plot['pollutant'].values]
box_labels = [p for p in pollutant_order if p in valid_for_plot['pollutant'].values]

axes[1].boxplot(box_data, tick_labels=box_labels)
axes[1].set_xlabel('pollutant')
axes[1].set_ylabel('test R2')
axes[1].set_title('Test R2 by pollutant type')
axes[1].axhline(0.8, color='green', linestyle='--', alpha=0.5, label='R2 = 0.8')
axes[1].set_ylim(0, 1)
axes[1].legend()

plt.tight_layout()
plt.savefig(output_dir / 'cnn_r2_distribution.png', dpi=150, bbox_inches='tight')
plt.show()

print('Saved cnn_r2_distribution.png')

### Observations for R2 Distribution



## 13) Residual Analysis

Residuals are the difference between actual and predicted values. If the model is good, residuals should scatter randomly around zero with no pattern.

Residual = Actual - Predicted

Source: scikit-learn (no date) Effect of transforming the targets in regression model. Available at: https://scikit-learn.org/stable/auto_examples/compose/plot_transformed_target.html

**Note:** Since the trained models are not available locally, residual analysis would require retraining. This section shows the methodology that would be used.

In [None]:
#Residual analysis methodology reference code
print('Residual analysis methodology:')
print('=' * 40)
print('''
to perform residual analysis, the following code would be used
with a trained model:

```python
#Get predictions
y_pred = model.predict(X_test).flatten()
y_actual = y_test[:, target_idx]

#Calculate residuals
residuals = y_actual - y_pred

#Plot residuals
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

#Residuals vs predicted
axes[0].scatter(y_pred, residuals, alpha=0.3)
axes[0].axhline(y=0, color='r', linestyle='--')
axes[0].set_xlabel('predicted')
axes[0].set_ylabel('residual')
axes[0].set_title('residuals vs predicted')

#Residual histogram
axes[1].hist(residuals, bins=50, edgecolor='black')
axes[1].set_xlabel('residual')
axes[1].set_ylabel('frequency')
axes[1].set_title('residual distribution')
```

expected patterns:
- good model: residuals randomly scattered around zero
- underfitting: systematic patterns in residuals
- heteroscedasticity: residual spread increases with predicted value
''')
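The expected patterns can also be checked numerically. The sketch below uses synthetic predictions, since the trained models are not available locally. The two diagnostics are chosen here for illustration. The mean residual is a bias check. The correlation between absolute residuals and predicted values is a rough indicator of heteroscedasticity.

```python
import numpy as np

rng = np.random.default_rng(0)

#Synthetic predictions, with noise that grows with the predicted value
y_pred = rng.uniform(0.1, 1.0, 2000)
y_actual = y_pred + rng.normal(0, 1, 2000) * 0.1 * y_pred

#Residual = Actual - Predicted
residuals = y_actual - y_pred

#Bias check: an unbiased model has a mean residual near zero
mean_residual = residuals.mean()

#Spread check: a clearly positive correlation means the error grows with the prediction
spread_corr = np.corrcoef(y_pred, np.abs(residuals))[0, 1]

print(f'mean residual: {mean_residual:.4f}')
print(f'corr(|residual|, predicted): {spread_corr:.3f}')
```

A mean residual near zero together with a clearly positive spread correlation corresponds to the heteroscedasticity case in the list above.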

## 14) Final Summary

In [None]:
#Final summary
print('CNN model training summary all LAQN targets')
print('=' * 40)

print(f'\nDataset:')
print(f'  Training samples:   {X_train.shape[0]:,}')
print(f'  Validation samples: {X_val.shape[0]:,}')
print(f'  Test samples:       {X_test.shape[0]:,}')
print(f'  Features:           {X_train.shape[2]:,}')
print(f'  Timesteps:          {X_train.shape[1]}')

print(f'\nModels:')
print(f'  Total trained:      {len(results_df)}')
print(f'  Valid models:       {len(valid_models)}')
print(f'  Broken models:      {len(broken_models)} (excluded due to data quality issues)')

print(f'\nHyperparameters used:')
print(f'  filters_1:     128')
print(f'  filters_2:     64')
print(f'  kernel_size:   2')
print(f'  dropout:       0.1')
print(f'  dense_units:   50')
print(f'  learning_rate: 0.001')

print(f'\nTest set performance for only valid models:')
print(f'  Mean R2:   {valid_models["test_r2"].mean():.4f} (+/- {valid_models["test_r2"].std():.4f})')
print(f'  Median R2: {valid_models["test_r2"].median():.4f}')

#Best performing pollutant
best_poll = pollutant_summary['r2_mean'].idxmax()
best_poll_r2 = pollutant_summary.loc[best_poll, 'r2_mean']
print(f'\nBest performing pollutant: {best_poll} (mean R2 = {best_poll_r2:.4f})')

#Best performing individual model
best_idx = valid_models['test_r2'].idxmax()
best_target = valid_models.loc[best_idx, 'target']
best_r2 = valid_models.loc[best_idx, 'test_r2']
print(f'Best individual model: {best_target} (R2 = {best_r2:.4f})')

print(f'\noutputs saved to: {output_dir}')

### key findings

| finding | detail |
|---------|--------|
| O3 most predictable | strong diurnal photochemical cycle makes ozone easiest to predict |
| PM10 most variable | diverse local sources cause high variability between stations |
| 5 broken models | data quality issues at TH4, BG2, WM6 stations (constant test values) |
| CNN vs RF | CNN mean test R2 is clearly lower than random forest (0.633 vs 0.814) |

### comparison with random forest

| metric | random forest | CNN |
|--------|--------------|-----|
| mean test R2 | 0.814 | 0.633 |
| best pollutant | O3 | O3 |
| training time | ~2 hours | ~6 hours |

Random forest outperforms CNN on this dataset. Possible reasons:

1. strong temporal autocorrelation favours lag features (RF captures this directly)
2. CNN may need more data or deeper architecture for time series
3. hyperparameters tuned on single station may not generalise to all stations
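Reason 1 can be illustrated with array shapes. The sketch assumes the random forest sees the same 12 step window as the CNN, flattened into one row of lag features. The sample and feature counts are placeholders, not the real dataset dimensions.

```python
import numpy as np

#Placeholder sizes, only the 12 timesteps come from the notebook
n_samples, timesteps, n_features = 4, 12, 3

#CNN input layout: (samples, timesteps, features)
X_seq = np.arange(n_samples * timesteps * n_features, dtype=float)
X_seq = X_seq.reshape(n_samples, timesteps, n_features)

#Random forest input layout: each window flattened into one row
X_flat = X_seq.reshape(n_samples, timesteps * n_features)

#Column of feature f at timestep t in the flattened row
t, f = 11, 0
col = t * n_features + f

print(f'CNN input shape: {X_seq.shape}')
print(f'RF input shape:  {X_flat.shape}')
print(f'feature {f} at timestep {t} is column {col}')
```

After flattening, every lagged value is its own column that a tree can split on directly. A Conv1D layer with kernel_size 2 only combines adjacent hours in its first layer.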

### conclusion

CNN models achieve reasonable performance across the LAQN network but do not match the random forest results (mean test R2 of 0.633 vs 0.814). The 12 hour sequence length with the Conv1D architecture captures temporal patterns, but the simpler random forest approach with flattened lag features proves more effective for this hourly prediction task.