# CNN model training

## LAQN air quality prediction

## what does this notebook do?

this notebook trains a convolutional neural network (CNN) to predict nitrogen dioxide (NO2) levels using the same LAQN data that was used for random forest. the goal is to compare CNN performance against the random forest baseline (test R² = 0.814).

## 1. setup and imports

First, as usual, import everything. 
tensorflow/keras is the deep learning library. 
numpy handles arrays. 
matplotlib and seaborn make plots.

In [21]:
# std libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import joblib
import warnings
warnings.filterwarnings('ignore')

# scikit-learn for metrics r^2, MSE, MAE
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# tensorflow and keras for neural network
import tensorflow as tf
from tensorflow import keras
import platform

# keras models, layers and the Adam optimizer for the CNN model in section 4

from tensorflow.keras import models, layers
from tensorflow.keras.optimizers import Adam

        operating system: Darwin
        processor: arm
        tensorflow version: 2.16.2
        built with CUDA: False

        available devices:
        - PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')

In [7]:
# set paths relative to the current working directory; update these to match your folder structure
base_dir = Path.cwd().parent.parent / 'data' / 'laqn'
data_dir = base_dir / 'ml_prep' # where ml_prep saved the arrays
output_dir = base_dir / 'cnn_model' # where we save CNN outputs 
output_dir.mkdir(parents=True, exist_ok=True)

print(f'loading data from: {data_dir}')
print(f'saving outputs to: {output_dir}')

loading data from: /Users/burdzhuchaglayan/Desktop/data science projects/air-pollution-levels/data/laqn/ml_prep
saving outputs to: /Users/burdzhuchaglayan/Desktop/data science projects/air-pollution-levels/data/laqn/cnn_model



### GPU availability

- TensorFlow code, and tf.keras models will transparently run on a single GPU with no code changes required.
> Note: Use tf.config.list_physical_devices('GPU') to confirm that TensorFlow is using the GPU.
- The simplest way to run on multiple GPUs, on one or many machines, is using Distribution Strategies.

Source: *Use a GPU: TensorFlow Core* (no date) *TensorFlow*



In [2]:
# check GPU availability (adapted from the TensorFlow documentation)

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    print(f'GPU available: {len(gpus)} device(s)')
    for gpu in gpus:
        print(f'  - {gpu.name}')
else:
    print('no GPU found, using CPU (training will be slower but still works)')

no GPU found, using CPU (training will be slower but still works)


## 2. load prepared data

the data was prepared in the ml_prep notebook. it created sequences where each sample has 12 hours of history to predict the next hour. this is the same data random forest used, just in 3D shape instead of flattened.
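
the windowing idea can be sketched in NumPy. this is only an illustration, not the code from the ml_prep notebook: the helper name `make_sequences` and the toy array are assumptions made for the example.

```python
import numpy as np

def make_sequences(data, timesteps=12):
    """Turn an (hours, features) array into (X, y) pairs.

    X[i] holds `timesteps` consecutive hours; y[i] is the hour right after them.
    Hypothetical helper, for illustration only.
    """
    n_samples = len(data) - timesteps
    X = np.stack([data[i:i + timesteps] for i in range(n_samples)])
    y = data[timesteps:]
    return X, y

# toy data: 20 hours, 3 features (the real data has 39 features)
toy = np.arange(60, dtype=float).reshape(20, 3)
X, y = make_sequences(toy, timesteps=12)
print(X.shape, y.shape)  # (8, 12, 3) (8, 3)
```

each window ends one hour before its target, so `X[0]` covers hours 0 to 11 and `y[0]` is hour 12.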

### why 3D data for CNN?

random forest needs flat 2D data: (samples, features). CNN needs 3D data: (samples, timesteps, features). the 3D shape lets CNN learn patterns across time, not just treat each timestep as an independent feature.

think of it like this:
- 2D (random forest): each row is a list of 468 numbers with no structure
- 3D (CNN): each sample is a 12×39 grid where rows are hours and columns are features
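
a small sketch of the relationship between the two shapes, using a random stand-in array rather than the real `X_train`. it assumes the random forest input was produced by a plain row-major reshape, which is the usual way to flatten but is not shown in this notebook.

```python
import numpy as np

# small stand-in for X_train: 5 samples, 12 hours, 39 features
X_3d = np.random.default_rng(0).normal(size=(5, 12, 39))

# flatten for a tabular model: each sample becomes one row of 12 * 39 = 468 values
X_2d = X_3d.reshape(len(X_3d), -1)
print(X_2d.shape)  # (5, 468)

# row-major flattening puts hour 0's 39 features first, then hour 1's, and so on
first_hour_matches = np.array_equal(X_2d[0, :39], X_3d[0, 0])
second_hour_matches = np.array_equal(X_2d[0, 39:78], X_3d[0, 1])
print(first_hour_matches, second_hour_matches)  # True True
```

the values are identical in both layouts; only the 3D version tells the model which numbers belong to the same hour.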

In [14]:
# load the 3d sequences for cnn
X_train = np.load(data_dir / 'X_train.npy')
X_val = np.load(data_dir / 'X_val.npy')
X_test = np.load(data_dir / 'X_test.npy')

y_train = np.load(data_dir / 'y_train.npy')
y_val = np.load(data_dir / 'y_val.npy')
y_test = np.load(data_dir / 'y_test.npy')

# load feature_names and scaler
feature_names = joblib.load(data_dir / 'feature_names.joblib')
scaler = joblib.load(data_dir / 'scaler.joblib')

print('data loaded successfully')
print(f'\nshapes:')
print(f'X_train:{X_train.shape}')
print(f'X_val:{X_val.shape}')
print(f'X_test:{X_test.shape}')
print(f'y_train:{y_train.shape}')
print(f'y_val:{y_val.shape}')
print(f'y_test:{y_test.shape}')

data loaded successfully

shapes:
X_train:(9946, 12, 39)
X_val:(2131, 12, 39)
X_test:(2132, 12, 39)
y_train:(9946, 39)
y_val:(2131, 39)
y_test:(2132, 39)


### understanding the shapes

X_train shape is (9946, 12, 39). this means:

| dimension | value | what it represents |
|-----------|-------|-------------------|
| samples | 9,946 | individual training examples |
| timesteps | 12 | hours of history per sample |
| features | 39 | 24 NO2 + 8 PM10 + 3 O3 + 4 temporal |

y_train shape is (9946, 39): each target row contains all 39 features for the next hour.

for fair comparison with random forest, I focus on EN5_NO2 (first column) as the single target. this is the same station used in the RF training report. EN5 had the highest data coverage (99.6%) which makes it the most reliable target for evaluation.

the 3D shape is the key difference from random forest:
- random forest got flattened 2D: (9946, 468) where 468 = 12 × 39
- CNN keeps the 3D structure: (9946, 12, 39)

why does this matter? CNN can learn that hour 1 connects to hour 2 connects to hour 3. random forest just sees 468 separate numbers with no time relationship. this is why CNN might capture temporal patterns better.

In [9]:
# select the same single target as RF: EN5_NO2
target_idx = 0
target_name = feature_names[target_idx]

y_train_single = y_train[:, target_idx]
y_val_single = y_val[:, target_idx]
y_test_single = y_test[:, target_idx]

print(f'target feature: {target_name}')
print(f'y_train_single shape: {y_train_single.shape}')
print(f'y_val_single shape: {y_val_single.shape}')
print(f'y_test_single shape: {y_test_single.shape}')

target feature: EN5_NO2
y_train_single shape: (9946,)
y_val_single shape: (2131,)
y_test_single shape: (2132,)


## 3. understanding CNN for time series

before building the model, let me explain what a CNN actually does. this helps understand why certain choices are made.

### what is a convolutional neural network?
CNNs were designed for image recognition. A CNN slides a small "filter" (also called a kernel) across the input to detect patterns. For images, it finds edges, shapes and textures. For time series, it finds temporal patterns (Gilik, Ogrenci and Ozmen, 2021a).
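
to make the sliding filter concrete, here is a minimal NumPy sketch. the readings and the kernel weights are made up for illustration; in a real CNN the kernel weights are learned during training, not written by hand.

```python
import numpy as np

def slide_filter(series, kernel):
    """Slide `kernel` across `series`, taking a dot product at each position."""
    k = len(kernel)
    return np.array([np.dot(series[i:i + k], kernel)
                     for i in range(len(series) - k + 1)])

# made-up hourly readings: flat, then a sharp rise, then flat again
series = np.array([10., 10., 10., 20., 30., 30., 30.])

# a hand-written "rising trend" detector; a CNN would learn these weights
kernel = np.array([-1., 0., 1.])

response = slide_filter(series, kernel)
print(response)  # values: 0, 10, 20, 10, 0
```

the response is largest where the series rises fastest and zero where it is flat, which is the sense in which a filter "detects" a temporal pattern.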

### why did I choose a CNN for time series air quality?
Gilik, Ogrenci and Ozmen (2021a), in 'Air quality prediction using CNN+LSTM-based hybrid deep learning architecture', found that CNN can capture local temporal dependencies in pollution data. pollution at hour t depends heavily on hours t-1, t-2, t-3, and CNN's sliding filter naturally captures this. I will add a graph of this to my dissertation.
Given this finding, it makes sense that CNN's local pattern detection should capture this relationship while also learning multi-hour patterns that RF might miss.

source: 

Gilik, A., Ogrenci, A.S. and Ozmen, A. (2021a) ‘Air quality prediction using CNN+LSTM-based hybrid deep learning architecture’
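
one way to check how strongly hour t depends on hours t-1, t-2, t-3 is to measure the correlation at each lag. the sketch below uses a synthetic series, not the LAQN data, and the 0.8 persistence factor is an arbitrary assumption chosen only to produce a series with short-term memory.

```python
import numpy as np

rng = np.random.default_rng(42)

# synthetic stand-in for an hourly pollutant series (NOT the LAQN data):
# each value is 0.8 times the previous one plus random noise
n = 2000
series = np.zeros(n)
for t in range(1, n):
    series[t] = 0.8 * series[t - 1] + rng.normal()

def lag_correlation(x, lag):
    """Pearson correlation between x[t] and x[t - lag]."""
    return np.corrcoef(x[lag:], x[:-lag])[0, 1]

lag_corr = {lag: lag_correlation(series, lag) for lag in (1, 2, 3)}
for lag, r in lag_corr.items():
    print(f'lag {lag}h: r = {r:.2f}')
```

for a series like this the correlation falls as the lag grows. running the same function on a real station column would show how far back the useful history reaches.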

## 4. build baseline CNN model

Starting with a simple architecture. 
The logic is: simple first, add complexity only if needed. 
I will follow Géron's book, Hands-on machine learning...

### architecture choices explained

| layer | what it does | why we use it |
|-------|--------------|---------------|
| Conv1D | extracts temporal patterns | learns what combinations of hours predict next hour |
| MaxPooling1D | reduces sequence length | keeps important patterns, reduces computation (the baseline below downsamples with strides=2 in the first Conv1D instead) |
| Flatten | converts 2D to 1D | prepares for Dense layer |
| Dense | combines patterns | learns how to weight different patterns |
| Dropout | randomly turns off neurons | prevents overfitting |
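
two of the rows above are easy to show in plain NumPy. this is a simplified sketch of what max pooling and dropout do to an array, not the Keras implementation; the function names and the toy array are assumptions made for the example.

```python
import numpy as np

def max_pool_1d(x, pool_size=2):
    """Keep the largest value in each non-overlapping window along the time axis.

    x has shape (timesteps, channels); leftover timesteps at the end are dropped.
    """
    steps = (x.shape[0] // pool_size) * pool_size
    return x[:steps].reshape(-1, pool_size, x.shape[1]).max(axis=1)

def dropout(x, rate, rng):
    """Inverted dropout: zero a random fraction of values and rescale the rest."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

x = np.array([[1., 5.],
              [3., 2.],
              [0., 7.],
              [4., 6.]])  # 4 timesteps, 2 channels

pooled = max_pool_1d(x, pool_size=2)
print(pooled)  # rows: [3, 5] and [4, 7]

dropped = dropout(x, rate=0.2, rng=np.random.default_rng(0))
print(dropped)  # each entry is either 0 or the original value divided by 0.8
```

pooling halves the sequence length while keeping the strongest response in each window. dropout is only active during training; at prediction time the layer passes values through unchanged.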

### hyperparameters

- filters: how many different patterns to learn (like having multiple detectors)
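- kernel_size: how many timesteps each filter looks at. for example, Conv1D(14, kernel_size=1) looks at one timestep at a time, while the model below uses kernel_size=4 to look at four consecutive hours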
- pool_size: how much to compress after convolution
- dropout rate: fraction of neurons to turn off during training

source: https://www.geeksforgeeks.org/deep-learning/adam-optimizer/
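
the model below combines kernel_size=4, strides=2 and padding='causal'. the sketch here shows, for a single channel in NumPy, how those three settings turn 12 hours into 6 outputs. the moving-average kernel is a stand-in for learned weights, and the function is a simplified illustration rather than the Keras implementation.

```python
import numpy as np

def causal_conv1d(x, kernel, stride=1):
    """Single-channel causal convolution.

    Left-pads with len(kernel) - 1 zeros so the output at hour t only sees
    hours <= t, then takes every `stride`-th window.
    """
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])
    starts = range(0, len(x), stride)
    return np.array([np.dot(padded[s:s + k], kernel) for s in starts])

x = np.arange(1.0, 13.0)   # 12 hourly values: 1, 2, ..., 12
kernel = np.ones(4) / 4    # 4-hour moving average as a stand-in kernel

out_s1 = causal_conv1d(x, kernel, stride=1)
out_s2 = causal_conv1d(x, kernel, stride=2)
print(len(out_s1), len(out_s2))  # 12 6
```

with stride 1 the output keeps all 12 timesteps; with stride 2 every second window is skipped, which gives the length of 6 seen in the first Conv1D row of the model summary.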

In [22]:
def cnn_model(timesteps, features):
    """
    Build a 1D CNN for time series prediction.
    Based on Géron (2023, ch. 15) approach.
    
    params:
        timesteps: number of historical hours (12)
        features: number of input features (39)
    
    returns:
        compiled keras model
    """
    model = models.Sequential([
        # input layer explicit input shape
        layers.Input(shape=(timesteps, features)),
        
        # first conv layer uses strides=2 to downsample; Géron (2023): "the convolutional layer may help detect longer patterns"
        layers.Conv1D(
            filters=32,
            kernel_size=4,
            strides=2,
            activation='relu',
            padding='causal'  
        ),
        layers.Dropout(0.2),
        
        
        # second conv layer
        layers.Conv1D(
            filters=32,
            kernel_size=4,
            activation='relu',
            padding='causal'
        ),
        layers.Dropout(0.2),
        
        # flatten and dense for final prediction
        layers.Flatten(),
        layers.Dense(50, activation='relu'),
        layers.Dropout(0.2),
        
        # output layer: a single value for the EN5_NO2 prediction
        # (on a sequence input, Conv1D(filters=1, kernel_size=1) is equivalent to Dense(1))
        layers.Dense(1)
    ])
    
    # Adam optimiser: an efficient, robust algorithm that combines momentum and adaptive learning rates
    # source: https://www.geeksforgeeks.org/deep-learning/adam-optimizer/
    model.compile(
        optimizer=Adam(learning_rate=0.001),
        loss='mse',
        metrics=['mae']
    )
    
    return model

In [23]:
# get dimensions from data
timesteps = X_train.shape[1]  # 12
n_features = X_train.shape[2]  # 39

print(f'building CNN with:')
print(f'  timesteps: {timesteps}')
print(f'  features: {n_features}')

# build the model
model = cnn_model(timesteps, n_features)

# show architecture summary
model.summary()

building CNN with:
  timesteps: 12
  features: 39


Model: "sequential"

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ conv1d (Conv1D)                 │ (None, 6, 32)          │         5,024 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ (None, 6, 32)          │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv1d_1 (Conv1D)               │ (None, 6, 32)          │         4,128 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_1 (Dropout)             │ (None, 6, 32)          │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ flatten (Flatten)               │ (None, 192)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 50)             │         9,650 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_2 (Dropout)             │ (None, 50)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 1)              │            51 │
└─────────────────────────────────┴────────────────────────┴───────────────┘

    Total params: 18,853 (73.64 KB)
    Trainable params: 18,853 (73.64 KB)
    Non-trainable params: 0 (0.00 B)
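
the parameter counts in the summary can be checked by hand. the two helper functions below are written for this check only (they are not part of Keras) and use the standard formulas: weights plus one bias per filter or unit.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Weights (kernel_size * in_channels * filters) plus one bias per filter."""
    return kernel_size * in_channels * filters + filters

def dense_params(n_inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return n_inputs * units + units

counts = {
    'conv1d':   conv1d_params(kernel_size=4, in_channels=39, filters=32),
    'conv1d_1': conv1d_params(kernel_size=4, in_channels=32, filters=32),
    'dense':    dense_params(n_inputs=6 * 32, units=50),  # Flatten gives 6 * 32 = 192
    'dense_1':  dense_params(n_inputs=50, units=1),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f'{name}: {n:,}')
print(f'total: {total:,}')  # total: 18,853
```

the Dropout and Flatten layers have no weights, so they add nothing to the total. about half of the parameters sit in the first Dense layer, not in the convolutions.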