# **Air Pollution Forecasting Model**  
## **Hybrid LSTM-GRU-Transformer Model for PM2.5 Prediction**  
*This notebook implements a deep learning pipeline for forecasting air pollution levels, specifically PM2.5 concentrations. The approach integrates LSTM, GRU, and Transformer models to capture temporal dependencies in the data.*  

---
### **Objectives of This Notebook:**  
- Load and preprocess time-series air pollution data.  
- Build a hybrid model combining LSTM, GRU, and Transformer architectures.  
- Train the model using historical air pollution data.  
- Evaluate model performance using key metrics such as R², MAE, and RMSE.  
- Save the trained model for future use.  
---


## **Step 1: Import Required Libraries**  
To build and train the model, we import the following libraries:  
- **`os`**: For building file paths and creating directories.  
- **`pandas`** and **`numpy`**: For data loading, preprocessing, and array operations.  
- **`tensorflow.keras`**: For deep learning model creation.  
- **`sklearn.model_selection`**: For the train-test split.  
- **`sklearn.preprocessing`**: For feature scaling.  
- **`sklearn.metrics`**: For model evaluation.  

These libraries form the core framework for our machine learning pipeline.

In [1]:
import os
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.layers import (
    Input, LSTM, GRU, Dense, Concatenate, MultiHeadAttention,
    LayerNormalization, Dropout, GlobalAveragePooling1D
)
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error






## **Step 2: Load and Inspect the Dataset**  
The dataset contains air pollution time-series data with various pollutant concentrations.  
**Key Actions in this Step:**  
- Load the preprocessed dataset from a CSV file.  
- Display the first five rows to inspect its structure.  
- Identify feature columns that will be used for model training.

In [None]:
# Define paths
DATA_DIR = "./data"
MODEL_DIR = "./models"
INPUT_FILE = os.path.join(DATA_DIR, "Enhanced_Time-Series_Air_Pollution_Data_Revised.csv")
MODEL_FILE = os.path.join(MODEL_DIR, "base_model.keras")

# Load dataset
data = pd.read_csv(INPUT_FILE)

print(data.head(5))



    Timestamp  PM2.5 (µg/m³)  PM10 (µg/m³)  NO (µg/m³)  NO2 (µg/m³)  \
0  2017-01-01       230.5000      329.4500       14.22        11.21   
1  2017-01-02       229.6352      328.2442       14.22        11.21   
2  2017-01-03       228.7705      327.0383       14.22        11.21   
3  2017-01-04       227.9057      325.8325       14.22        11.21   
4  2017-01-05       227.0409      324.6267       14.22        11.21   

   NOx (ppb)  NH3 (µg/m³)  SO2 (µg/m³)  CO (mg/m³)  Ozone (µg/m³)  ...  \
0    25.4300          NaN          NaN      0.7400        56.5000  ...   
1    25.5190          NaN          NaN      0.7421        56.3816  ...   
2    25.6081          NaN          NaN      0.7441        56.2632  ...   
3    25.6971          NaN          NaN      0.7462        56.1449  ...   
4    25.7862          NaN          NaN      0.7483        56.0265  ...   

   PM2.5 (µg/m³)_rolling_mean  PM2.5 (µg/m³)_lag_1  PM10 (µg/m³)_rolling_mean  \
0                    230.5000             230.5
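The preview above shows `NaN` entries in some columns (for example NH3 and SO2), so it can help to count missing values before choosing features. The following is a minimal sketch on a toy frame; the column names mirror the preview, but the values are made up for illustration and `toy`, `missing_counts`, and `complete_columns` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real CSV; values are illustrative only.
toy = pd.DataFrame({
    "PM2.5 (µg/m³)": [230.5, 229.6, 228.8, 227.9],
    "NH3 (µg/m³)": [np.nan, np.nan, np.nan, 12.0],
    "SO2 (µg/m³)": [np.nan, np.nan, 5.0, 5.1],
})

# Count missing values per column, most affected columns first.
missing_counts = toy.isna().sum().sort_values(ascending=False)
print(missing_counts)

# Columns with no missing values can be passed to a scaler without imputation.
complete_columns = [col for col in toy.columns if toy[col].notna().all()]
print(complete_columns)
```

On the real dataset, the same two calls applied to `data` would show which columns need imputation or should be left out of the feature list.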

## **Step 3: Feature Selection and Data Normalization**  
In this step, we define input features and the target variable (`PM2.5`).  
**Transformations Applied:**  
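- **Min-Max Scaling:** Rescales the input features and the target to the range `[0, 1]` for stable model training. Both scalers are fitted on the full dataset before the split, so test-set minima and maxima influence the scaling.  
- **Reshaping Data:** Reshapes the inputs to `(samples, 1, features)`, the 3D layout that Keras LSTM/GRU layers expect. Each sample is therefore a sequence of a single timestep, and temporal context comes from the lag and rolling-mean features.  
- **Train-Test Split:** Splits the data chronologically (`shuffle=False`) into training (first 80%) and testing (last 20%) sets.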

In [3]:
# Define features and target
features = [
    'PM10 (µg/m³)', 'NOx (ppb)', 'CO (mg/m³)', 'Ozone (µg/m³)',
    'PM2.5 (µg/m³)_rolling_mean', 'PM2.5 (µg/m³)_lag_1',
    'PM10 (µg/m³)_rolling_mean', 'PM10 (µg/m³)_lag_1',
    'NOx (ppb)_rolling_mean', 'NOx (ppb)_lag_1',
    'CO (mg/m³)_rolling_mean', 'CO (mg/m³)_lag_1',
    'Ozone (µg/m³)_rolling_mean', 'Ozone (µg/m³)_lag_1'
]
target = 'PM2.5 (µg/m³)'

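# Normalize features and target to [0, 1] (scalers are fitted on the full dataset)
scaler_X = MinMaxScaler()
scaler_y = MinMaxScaler()

X = scaler_X.fit_transform(data[features])
y = scaler_y.fit_transform(data[target].values.reshape(-1, 1))
# Add a timestep axis of length 1: (samples, features) -> (samples, 1, features)
X = X.reshape(X.shape[0], 1, X.shape[1])

# Split chronologically (shuffle=False): first 80% for training, last 20% for testing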
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, shuffle=False)
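The cell above fits both scalers on the whole dataset and splits afterwards. A common alternative, sketched here on synthetic data, is to split first and fit the scalers on the training rows only, so that no test-set statistics enter the preprocessing. This is not what the notebook does; the `demo_*` names and the random values are assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data: 10 time-ordered rows with 3 features (illustrative values).
rng = np.random.default_rng(0)
demo_X_raw = rng.uniform(0.0, 100.0, size=(10, 3))
demo_y_raw = rng.uniform(0.0, 300.0, size=(10, 1))

# Chronological 80/20 split, performed before any scaling.
split_idx = int(len(demo_X_raw) * 0.8)
demo_X_train_raw, demo_X_test_raw = demo_X_raw[:split_idx], demo_X_raw[split_idx:]
demo_y_train_raw, demo_y_test_raw = demo_y_raw[:split_idx], demo_y_raw[split_idx:]

# Fit on the training rows only, then reuse the fitted scalers for the test rows.
demo_scaler_X = MinMaxScaler().fit(demo_X_train_raw)
demo_scaler_y = MinMaxScaler().fit(demo_y_train_raw)
demo_X_train = demo_scaler_X.transform(demo_X_train_raw)
demo_X_test = demo_scaler_X.transform(demo_X_test_raw)
demo_y_train = demo_scaler_y.transform(demo_y_train_raw)
demo_y_test = demo_scaler_y.transform(demo_y_test_raw)

# Add the single-timestep axis: (samples, features) -> (samples, 1, features).
demo_X_train = demo_X_train.reshape(demo_X_train.shape[0], 1, demo_X_train.shape[1])
demo_X_test = demo_X_test.reshape(demo_X_test.shape[0], 1, demo_X_test.shape[1])

print(demo_X_train.shape, demo_X_test.shape)
```

With this ordering, scaled test values can fall slightly outside `[0, 1]` when the test period contains values beyond the training range, which is expected.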



## **Step 4: Define the Hybrid LSTM-GRU-Transformer Model**  
We design a hybrid deep learning model with three parallel branches whose outputs are concatenated:  
- **LSTM Branch:** Captures long-term dependencies in the time series.  
- **GRU Branch:** Captures short-term dependencies while reducing computational cost.  
- **Transformer Encoder:** Applies multi-head self-attention followed by a feed-forward network, each with a residual connection and layer normalization, to model complex temporal relationships.  

**Key Layers Used:**  
- **`MultiHeadAttention`**: Performs the self-attention step in the Transformer branch.  
- **`Dropout` & `LayerNormalization`**: Dropout reduces overfitting; layer normalization stabilizes learning.  
- **`Dense` Layers**: Two fully connected layers (256 and 64 units) transform the concatenated branch outputs before the single-unit prediction layer.

In [4]:
# Define model
def define_model(input_shape):
    inputs = Input(shape=input_shape)
    lstm_branch = LSTM(256)(inputs)
    gru_branch = GRU(64)(inputs)
    
    def transformer_encoder(inputs, num_heads=4, key_dim=16, ff_dim=256, dropout=0.3):
        attn_output = MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(
            query=inputs, key=inputs, value=inputs
        )
        attn_output = Dropout(dropout)(attn_output)
        attn_output = LayerNormalization(epsilon=1e-6)(attn_output + inputs)
        
        ffn_output = Dense(ff_dim, activation='relu')(attn_output)
        ffn_output = Dense(inputs.shape[-1])(ffn_output)
        ffn_output = Dropout(dropout)(ffn_output)
        
        return LayerNormalization(epsilon=1e-6)(ffn_output + attn_output)
    
    transformer_branch = transformer_encoder(inputs)
    transformer_branch = GlobalAveragePooling1D()(transformer_branch)
    concat = Concatenate()([lstm_branch, gru_branch, transformer_branch])
    
    dense = Dense(256, activation='relu')(concat)
    dense = Dropout(0.3)(dense)
    dense = Dense(64, activation='relu')(dense)
    dense = Dropout(0.3)(dense)
    outputs = Dense(1)(dense)
    
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer=Adam(learning_rate=0.0001), loss='mse', metrics=['mae'])
    return model
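Because Step 3 reshapes every sample to a single timestep, it is worth seeing what self-attention computes in that case. The sketch below is a plain NumPy version of scaled dot-product attention with one head and no learned weights; it is an illustration, not the Keras implementation (`MultiHeadAttention` additionally applies learned projections to the query, key, value, and output). The 7-step window is a hypothetical input length that the notebook does not use.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Single-head attention for arrays shaped (timesteps, dim)."""
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v, weights

rng = np.random.default_rng(0)

# One timestep with 14 features: the per-sample shape produced in Step 3.
single_step = rng.normal(size=(1, 14))
out_single, w_single = scaled_dot_product_attention(single_step, single_step, single_step)
print(w_single.shape)  # one query attending to one key

# Hypothetical 7-step window, for comparison.
window = rng.normal(size=(7, 14))
out_window, w_window = scaled_dot_product_attention(window, window, window)
print(w_window.shape)  # each of the 7 steps gets a weight for every step
```

With a single timestep the softmax has only one entry, so the attention weight is always 1 and the output equals the value vector. In the model above, the Transformer branch therefore acts as a learned per-sample transformation of the features; attention across time would only come into play with multi-step input windows.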



## **Step 5: Train the Model**  
The model is compiled inside `define_model` with the Adam optimizer and Mean Squared Error (MSE) loss, and MAE is tracked as an additional metric.  

**Training Details:**  
- **Batch Size:** 16  
- **Epochs:** 20  
- **Optimizer:** Adam (learning rate = 0.0001)  
- **Validation Set:** The 20% test split is passed as `validation_data`, so the `val_loss` and `val_mae` values reported during training are computed on the test set.  

During training, the model learns from historical data to minimize the forecasting error.

In [5]:
# Train model
model = define_model((X_train.shape[1], X_train.shape[2]))
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=20, batch_size=16, verbose=1)



Epoch 1/20
4729/4729 ━━━━━━━━━━━━━━━━━━━━ 39s 6ms/step - loss: 0.0061 - mae: 0.0516 - val_loss: 5.3307e-04 - val_mae: 0.0149
Epoch 2/20
4729/4729 ━━━━━━━━━━━━━━━━━━━━ 28s 6ms/step - loss: 0.0010 - mae: 0.0211 - val_loss: 4.6257e-04 - val_mae: 0.0139
Epoch 3/20
4729/4729 ━━━━━━━━━━━━━━━━━━━━ 25s 5ms/step - loss: 8.3251e-04 - mae: 0.0186 - val_loss: 3.9448e-04 - val_mae: 0.0115
Epoch 4/20
4729/4729 ━━━━━━━━━━━━━━━━━━━━ 17s 4ms/step - loss: 7.4331e-04 - mae: 0.0175 - val_loss: 4.0186e-04 - val_mae: 0.0120
Epoch 5/20
4729/4729 ━━━━━━━━━━━━━━━━━━━━ 18s 4ms/step - loss: 7.0191e-04 - mae: 0.0168 - val_loss: 4.8881e-04 - val_mae: 0.0144
Epoch 6/20
4729/4729 ━━━━━━━━━━━━━━━━━━━━ 17s 4ms/step - loss: 6.8549e-04 - mae: 0.0163 - val_loss: 3.8687e-04 - val_mae: 0.0118
Epoch 7/20
4729/4729

<keras.src.callbacks.history.History at 0x197963dfbd0>
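The training cell uses the test split as validation data. If the validation metrics are later used for decisions such as early stopping or hyperparameter tuning, a separate validation slice keeps the test set untouched. The helper below is a minimal sketch of a chronological three-way split; `chronological_split`, its default fractions, and the synthetic arrays are assumptions for illustration and are not part of the notebook.

```python
import numpy as np

def chronological_split(X, y, val_fraction=0.1, test_fraction=0.2):
    """Split time-ordered arrays into train/validation/test blocks without shuffling."""
    n_samples = len(X)
    n_test = int(round(n_samples * test_fraction))
    n_val = int(round(n_samples * val_fraction))
    n_train = n_samples - n_val - n_test
    if n_train <= 0:
        raise ValueError("Fractions leave no samples for training.")
    val_end = n_train + n_val
    return (
        (X[:n_train], y[:n_train]),                  # oldest block: training
        (X[n_train:val_end], y[n_train:val_end]),    # middle block: validation
        (X[val_end:], y[val_end:]),                  # newest block: testing
    )

# Synthetic stand-in with the notebook's input layout: (samples, 1, features).
X_demo = np.arange(100 * 1 * 14, dtype=float).reshape(100, 1, 14)
y_demo = np.arange(100, dtype=float).reshape(-1, 1)

(train_X, train_y), (val_X, val_y), (test_X, test_y) = chronological_split(X_demo, y_demo)
print(train_X.shape, val_X.shape, test_X.shape)
```

The validation pair would then be passed to `model.fit` through `validation_data`, and the test pair would be used only in the evaluation step.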

## **Step 6: Save the Trained Model**  
Once the model has been trained, it is saved for future predictions.  
- The model is stored in `.keras` format under the `models` directory.  
- This allows reloading the model without retraining from scratch.

In [6]:
# Save model
os.makedirs(MODEL_DIR, exist_ok=True)  # create the directory if it does not exist
model.save(MODEL_FILE)
print(f"Model saved to {MODEL_FILE}")




Model saved to ./models\base_model.keras


## **Step 7: Evaluate Model Performance**  
To assess model effectiveness, we compute key performance metrics:  
- **R² Score (Coefficient of Determination):** Measures how well the model explains variance in PM2.5 levels.  
- **Mean Absolute Error (MAE):** Measures the average absolute difference between predicted and actual values.  
- **Root Mean Squared Error (RMSE):** Penalizes larger errors more than MAE.  

**Results:**  
- **Train R²:** 0.9423  
- **Test R²:** 0.9538  
- **Train MAE:** 0.0106, **Test MAE:** 0.0112  
- **Train RMSE:** 0.0197, **Test RMSE:** 0.0195  

These results indicate strong predictive accuracy on the held-out test split. MAE and RMSE are computed on the Min-Max-scaled target, so they are fractions of the target's range rather than values in µg/m³.

In [7]:
# Evaluate model
y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)

r2_train = r2_score(y_train, y_pred_train)
r2_test = r2_score(y_test, y_pred_test)
mae_train = mean_absolute_error(y_train, y_pred_train)
mae_test = mean_absolute_error(y_test, y_pred_test)
rmse_train = np.sqrt(mean_squared_error(y_train, y_pred_train))
rmse_test = np.sqrt(mean_squared_error(y_test, y_pred_test))

print(f"Train R²: {r2_train:.4f}, MAE: {mae_train:.4f}, RMSE: {rmse_train:.4f}")
print(f"Test R²: {r2_test:.4f}, MAE: {mae_test:.4f}, RMSE: {rmse_test:.4f}")

2365/2365 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step
592/592 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step
Train R²: 0.9423, MAE: 0.0106, RMSE: 0.0197
Test R²: 0.9538, MAE: 0.0112, RMSE: 0.0195
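The MAE and RMSE printed above are in scaled units. To report them in µg/m³, predictions and targets can be mapped back through the target scaler with `inverse_transform` before computing the metrics. The sketch below uses made-up target values and a fresh `demo_scaler_y`; in the notebook, `scaler_y`, `y_test`, and `y_pred_test` would play the same roles.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Illustrative PM2.5 targets in µg/m³ (made-up values with a range of 200).
y_true_raw = np.array([[50.0], [100.0], [150.0], [250.0]])
demo_scaler_y = MinMaxScaler().fit(y_true_raw)
y_true_scaled = demo_scaler_y.transform(y_true_raw)

# Pretend predictions in scaled space, each off by 0.01.
y_pred_scaled = y_true_scaled + 0.01

# Map both arrays back to the original units before computing the errors.
y_true_ugm3 = demo_scaler_y.inverse_transform(y_true_scaled)
y_pred_ugm3 = demo_scaler_y.inverse_transform(y_pred_scaled)

mae_ugm3 = mean_absolute_error(y_true_ugm3, y_pred_ugm3)
rmse_ugm3 = np.sqrt(mean_squared_error(y_true_ugm3, y_pred_ugm3))
print(f"MAE: {mae_ugm3:.2f} µg/m³, RMSE: {rmse_ugm3:.2f} µg/m³")
```

Min-Max scaling is a linear map, so MAE and RMSE in original units equal the scaled values multiplied by the target's range; here a scaled error of 0.01 corresponds to 2 µg/m³. R² is unaffected by this rescaling.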


## **Conclusion and Future Work**  
**Key Takeaways:**  
- The hybrid LSTM-GRU-Transformer model tracks PM2.5 levels closely, with a test R² of 0.9538.  
- The similar train and test scores (R² of 0.9423 and 0.9538) suggest that the model generalizes to the held-out portion of the data.  
- The model combines features extracted by the LSTM, GRU, and Transformer branches before the final dense layers.  

**Future Enhancements:**  
- Tune hyperparameters such as layer sizes, dropout rate, learning rate, and batch size to improve forecasting accuracy.  
- Incorporate additional meteorological variables for better predictions.  
- Extend the attention component, which currently consists of a single Transformer encoder block, for improved performance.