# Delhi AQI Fire Spike Analysis: Comprehensive Modeling Framework

## Part 1: Advection & Wind Transport Visualization

### 1.1 Simple Advection Model (Gaussian Plume + Wind Drift)
**Purpose:** Visualize how smoke particles from fires disperse and travel from Punjab to Delhi.

**Model Logic:**
```
Fire emission source (Punjab coordinates)
    ↓
Gaussian plume/puff dispersal (horizontal & vertical spread)
    ↓
Advection by hourly wind vectors (u,v components computed from wind speed and direction)
    ↓
Particle concentration grid (overlaid on map)
    ↓
Arrival at Delhi (temporal lag estimation)
```
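
The advection step needs wind as vector components, and a scalar speed alone is not enough to get them. A minimal sketch of the conversion, assuming the direction follows the meteorological convention (degrees clockwise from north, naming where the wind blows *from*); the function name `wind_to_uv` is illustrative:

```python
import numpy as np

def wind_to_uv(speed, direction_deg):
    """Convert wind speed and meteorological direction into components.

    direction_deg is the direction the wind blows FROM, in degrees
    clockwise from north. Returns (u, v): eastward and northward components
    in the same units as speed.
    """
    theta = np.deg2rad(direction_deg)
    u = -speed * np.sin(theta)
    v = -speed * np.cos(theta)
    return u, v

# A north-westerly wind (from 315 degrees) pushes air toward the south-east:
# u is positive (eastward) and v is negative (southward).
u, v = wind_to_uv(10.0, 315.0)
```

If the wind file stores the direction the wind blows *toward*, drop the two minus signs.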

**Implementation approach:**
- **Input:** Fire locations (lat/lon), FRP (fire intensity), hourly wind speed, wind direction (inferred or from reanalysis)
- **Processing:**
  1. Convert wind speed and direction to u/v components; if direction is missing, fall back to the seasonal prevailing direction or reanalysis u/v
  2. Create a 2D grid covering Punjab → Delhi corridor
  3. For each fire detection: initialize Gaussian source, evolve forward using advection-diffusion
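  4. Concentration of a puff released at (x₀, y₀): C(x, y, t) = Q / (4πDt) × exp(-[(x - x₀ - ut)² + (y - y₀ - vt)²] / (4Dt)), where Q is an emission strength proportional to FRP, D is the horizontal diffusivity, and (u, v) is the wind vector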
  5. Track particles reaching Delhi (threshold concentration)
- **Output:** Animated heatmap showing particle transport, concentration arrival at Delhi, time-lag histogram
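
The processing steps above can be sketched with plain NumPy. This is a simplified illustration, not a calibrated model: it assumes a uniform, constant wind, a flat kilometre grid with the fire cluster at the origin, and an arbitrary diffusivity `D`. The fire list, the Delhi offset, and all numeric values are made up for the example.

```python
import numpy as np

def puff_concentration(X, Y, t, x0, y0, u, v, Q, D):
    """2D Gaussian puff released at (x0, y0) at t=0 and carried by wind (u, v).

    X, Y, x0, y0 in km; t in hours; u, v in km/h; D in km^2/h.
    Q is an emission strength (here simply the FRP value).
    """
    xc = x0 + u * t                      # puff centre drifts with the wind
    yc = y0 + v * t
    r2 = (X - xc) ** 2 + (Y - yc) ** 2
    return Q / (4.0 * np.pi * D * t) * np.exp(-r2 / (4.0 * D * t))

# Illustrative setup (all values are assumptions for the sketch).
x = np.arange(-100.0, 401.0, 5.0)        # km east of the fire cluster
y = np.arange(-350.0, 101.0, 5.0)        # km north of the fire cluster
X, Y = np.meshgrid(x, y)
fires = [(0.0, 0.0, 50.0), (20.0, 10.0, 30.0)]   # (x0, y0, FRP)
u, v = 10.0, -8.0                        # km/h, blowing toward the south-east
D = 50.0                                 # km^2/h, assumed
delhi_x, delhi_y = 250.0, -200.0         # assumed offset of Delhi on this grid

def total_field(t):
    """Sum the puffs from every fire at time t (hours after release)."""
    return sum(puff_concentration(X, Y, t, x0, y0, u, v, frp, D)
               for x0, y0, frp in fires)

field_24h = total_field(24.0)            # one frame of the heatmap animation

# Concentration time series at Delhi, used for the arrival-lag estimate.
hours = np.arange(1, 73)
delhi_series = np.array([
    sum(puff_concentration(delhi_x, delhi_y, t, x0, y0, u, v, frp, D)
        for x0, y0, frp in fires)
    for t in hours
])
peak_hour = int(hours[np.argmax(delhi_series)])
```

Replacing the constant `(u, v)` with hourly values turns the straight-line drift into a piecewise path; the Lagrangian approach in 1.2 handles that case more naturally.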

**Assumptions & caveats:**
- Wind direction taken from reanalysis (ERA5) or seasonal climatology (prevailing north-westerly winds in Oct–Nov, which blow from Punjab toward Delhi)
- Particle lifetime ~24–72 hrs (depends on particle size, humidity)
- Simple model; doesn't include complex vertical mixing, chemical reactions, deposition

**Tools:**
- `scipy.ndimage` for Gaussian kernel / diffusion
- `folium` or `plotly` for interactive grid visualization
- `xarray` for gridded data (if using reanalysis)

---

### 1.2 Lagrangian Particle Tracking (Optional Advanced)
**Purpose:** More realistic transport simulation using backward/forward trajectories.

**Model:**
- Use HYSPLIT-style backward trajectory from Delhi on high-AQI days
- Or simplified forward tracking: for each fire, step the parcel position hourly by the wind vector (displacement = (u, v) × Δt)
- Accumulate residence time over Delhi grid cell
- Correlate high residence time → high AQI

**Data needed:** u,v wind components (reanalysis ERA5, MERRA2, or station-derived)
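
A minimal sketch of the simplified forward-tracking variant, assuming hourly `u`/`v` arrays in km/h, a flat kilometre grid, and a rectangular box standing in for the Delhi grid cell. The function names and the box coordinates are hypothetical.

```python
import numpy as np

def forward_trajectory(x0, y0, u_hourly, v_hourly, dt_hours=1.0):
    """Euler-step one air parcel through a sequence of hourly winds.

    Returns x and y arrays of length len(u_hourly) + 1, starting at (x0, y0).
    """
    x = x0 + np.cumsum(np.asarray(u_hourly) * dt_hours)
    y = y0 + np.cumsum(np.asarray(v_hourly) * dt_hours)
    return np.concatenate([[x0], x]), np.concatenate([[y0], y])

def residence_hours(x, y, box, dt_hours=1.0):
    """Hours the parcel spends inside box = (xmin, xmax, ymin, ymax)."""
    xmin, xmax, ymin, ymax = box
    inside = (x >= xmin) & (x <= xmax) & (y >= ymin) & (y <= ymax)
    return float(inside.sum() * dt_hours)

# Steady wind toward the south-east for 48 hours (illustrative values).
u = np.full(48, 10.0)
v = np.full(48, -8.0)
x, y = forward_trajectory(0.0, 0.0, u, v)

delhi_box = (230.0, 270.0, -220.0, -180.0)   # assumed 40 km x 40 km cell
hours_over_delhi = residence_hours(x, y, delhi_box)
```

Summing `residence_hours` over all fires on a given day (optionally weighted by FRP) gives the daily residence-time value to correlate with AQI.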

---

## Part 2: Machine Learning Models for AQI Spike Prediction

### 2.1 Time-Series Feature Engineering (Foundation)

**Target variable:**
- `AQI_spike` = 1 if AQI > 150 OR AQI increase > 50 points in 24h; else 0
- Alternatively: `AQI_magnitude` = raw AQI value (regression)
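
A small sketch of the binary target, assuming a daily AQI series with no missing days (so that a one-step difference is a 24-hour change). The function name and the sample values are illustrative.

```python
import pandas as pd

def make_spike_target(aqi, level=150, jump=50):
    """Return 1 where AQI exceeds `level` or rose by more than `jump` in a day."""
    rise = aqi.diff()   # NaN on the first day, which compares as False below
    return ((aqi > level) | (rise > jump)).astype(int)

aqi = pd.Series(
    [90, 120, 180, 140, 95, 149],
    index=pd.date_range("2023-10-28", periods=6, freq="D"),
)
spike = make_spike_target(aqi)
```

The last day is flagged even though it stays below 150, because it rose 54 points overnight.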

**Fire-based features (lag 0–5 days):**
- Daily fire count in Punjab (all fires, high-confidence only)
- Total daily FRP (sum of fire intensities)
- FRP weighted by distance from Delhi (inverse distance or Gaussian decay)
- Fire count by region (south-central Punjab most relevant)
- Max brightness/temperature on day t, t-1, t-2
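
The distance-weighted FRP feature could be built along these lines. The column names (`date`, `lat`, `lon`, `frp`), the Delhi coordinates (approximate city centre), and the 200 km decay length are assumptions for the sketch, not values taken from the data.

```python
import numpy as np
import pandas as pd

DELHI_LAT, DELHI_LON = 28.61, 77.21   # approximate city centre

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    r = 6371.0
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))

# Illustrative detections only.
fires = pd.DataFrame({
    "date": pd.to_datetime(["2023-11-01", "2023-11-01", "2023-11-02"]),
    "lat": [30.25, 31.63, 30.34],
    "lon": [75.84, 74.87, 76.39],
    "frp": [40.0, 25.0, 60.0],
})

fires["dist_km"] = haversine_km(fires["lat"], fires["lon"], DELHI_LAT, DELHI_LON)
length_scale_km = 200.0               # assumed Gaussian decay length
fires["frp_weighted"] = fires["frp"] * np.exp(-(fires["dist_km"] / length_scale_km) ** 2)

daily_fire = fires.groupby("date").agg(
    fire_count=("frp", "size"),
    frp_total=("frp", "sum"),
    frp_weighted=("frp_weighted", "sum"),
)
```

Swapping the Gaussian factor for `1 / dist_km` gives the inverse-distance variant mentioned above.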

**Wind-based features (lag 0–5 days):**
- Daily mean wind speed (station or grid)
- Daily wind variability (std dev)
- Wind speed on day t-1, t-2, t-3 (lagged momentum)
- Wind direction consistency (concentration of direction vectors)
- Advection index: fire count × wind speed (a rough proxy for transport potential)
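
Direction consistency cannot be computed as an ordinary standard deviation because angles wrap at 360°. One common choice, used here as an assumption, is the mean resultant length of the unit direction vectors: 1 means perfectly steady wind, 0 means directions cancel out. Column names and values are illustrative.

```python
import numpy as np
import pandas as pd

def direction_consistency(direction_deg):
    """Mean resultant length of wind directions (0 = scattered, 1 = steady)."""
    theta = np.deg2rad(np.asarray(direction_deg, dtype=float))
    return float(np.hypot(np.cos(theta).mean(), np.sin(theta).mean()))

# Two days of hourly data: a steady north-westerly day, then a rotating day.
hourly = pd.DataFrame({
    "time": pd.Timestamp("2023-11-01") + pd.to_timedelta(np.arange(48), unit="h"),
    "speed_kmh": np.r_[np.full(24, 12.0), np.full(24, 6.0)],
    "direction_deg": np.r_[np.full(24, 315.0), np.tile([0.0, 90.0, 180.0, 270.0], 6)],
})

daily_wind = hourly.groupby(hourly["time"].dt.floor("D")).agg(
    wind_mean=("speed_kmh", "mean"),
    wind_std=("speed_kmh", "std"),
    dir_consistency=("direction_deg", direction_consistency),
)
```

Lagged wind columns follow from `daily_wind["wind_mean"].shift(k)`, and the advection index is the daily fire count multiplied by `wind_mean` once the two tables share a date index.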

**Meteorological covariates:**
- Temperature, humidity, pressure (if available from station)
- Boundary layer height proxy (thermal instability)
- Season / day-of-year (harvest season Oct–Nov dominates)

**Autoregressive features:**
- AQI on day t-1, t-2, t-7 (week-ago baseline)
- AQI trend (t-1 vs t-3)
- Days since last high-AQI event
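
A sketch of the autoregressive block, assuming a gap-free daily series so that `shift(k)` means "k days ago". Every feature uses only values from before day t, which avoids leaking the target into its own predictors. The helper name and the 150 threshold are illustrative.

```python
import numpy as np
import pandas as pd

def autoregressive_features(aqi, high=150):
    """Lagged AQI, short-term trend, and days since the last high-AQI day."""
    out = pd.DataFrame(index=aqi.index)
    for lag in (1, 2, 7):
        out[f"aqi_lag{lag}"] = aqi.shift(lag)
    out["aqi_trend"] = aqi.shift(1) - aqi.shift(3)

    # Position of the most recent high-AQI day strictly before each day.
    day_no = pd.Series(np.arange(len(aqi)), index=aqi.index, dtype=float)
    last_high = day_no.where(aqi > high).ffill().shift(1)
    out["days_since_high"] = day_no - last_high
    return out

aqi = pd.Series(
    [100, 160, 120, 110, 170, 130, 125, 115, 105, 95],
    index=pd.date_range("2023-11-01", periods=10, freq="D"),
)
ar = autoregressive_features(aqi)
```

Rows before the first high-AQI day have no defined `days_since_high` and stay NaN.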

**Spatial features:**
- Fire density heatmap (fires per 0.5° grid cell over 7-day window)
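- Wind-fire alignment: cos(angle between the fire-to-Delhi bearing and the direction the wind is blowing toward); +1 means the wind carries smoke straight at Delhi, -1 straight away. Note that reported wind direction usually names where the wind comes from, so add 180° first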
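
The alignment feature can be sketched as follows, assuming wind direction is reported in the meteorological "from" convention. The function names and the Delhi coordinates (approximate city centre) are assumptions.

```python
import numpy as np

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2 (degrees from north)."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dl = np.radians(lon2 - lon1)
    east = np.sin(dl) * np.cos(p2)
    north = np.cos(p1) * np.sin(p2) - np.sin(p1) * np.cos(p2) * np.cos(dl)
    return (np.degrees(np.arctan2(east, north)) + 360.0) % 360.0

def wind_fire_alignment(fire_lat, fire_lon, wind_from_deg,
                        target_lat=28.61, target_lon=77.21):
    """Cosine of the angle between wind heading and the fire-to-target bearing."""
    to_target = bearing_deg(fire_lat, fire_lon, target_lat, target_lon)
    wind_toward = (wind_from_deg + 180.0) % 360.0   # "from" -> "toward"
    return np.cos(np.radians(wind_toward - to_target))

# A fire near Sangrur (illustrative coordinates) under two wind regimes.
aligned = wind_fire_alignment(30.25, 75.84, wind_from_deg=315.0)   # north-westerly
opposed = wind_fire_alignment(30.25, 75.84, wind_from_deg=135.0)   # south-easterly
```

A daily value can be formed as the FRP-weighted mean of the per-fire alignments.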

---

### 2.2 Candidate Model Architectures

#### **Model A: Gradient Boosting (XGBoost / LightGBM)** ⭐ Recommended for baseline
**Strengths:**
- Handles non-linear fire–wind–AQI relationships
- Natural feature importance interpretation
- Fast training, minimal hyperparameter tuning needed
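- Handles missing values natively (XGBoost and LightGBM learn a default branch for NaN), though gaps in lag features should still be checked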

**Setup:**
```
Input: [fire_count_t, frp_t, wind_speed_t, wind_speed_t-1, aqi_t-1, season, ...]
Output: AQI_t (regression) or AQI_spike_t (binary classification)
```

**Hyperparameters to tune:**
- Max tree depth: 3–6 (shallower trees limit overfitting on a dataset of roughly 1,400 rows)
- Learning rate: 0.01–0.1
- Lag window: 3–7 days (optimal likely 3–5)

**Expected performance (rough targets, to be confirmed on the 2024 hold-out):** R² ≈ 0.65–0.75 (regression); AUC ≈ 0.80–0.88 (spike classification)

---

#### **Model B: LSTM / Temporal Convolutional Network (TCN)** (Advanced)
**Strengths:**
- Captures long-range temporal dependencies (lag effects > 7 days)
- Consumes the raw input sequence directly (no manual lag engineering)
- Can learn nonlinear advection timing

**Architecture:**
```
Input sequence: [fire_count, frp, wind_speed, pollutants] over 14–21 days
↓
LSTM layers (64–128 units, 2–3 stacks)
↓
Dense layers (32–16 units, dropout 0.3)
↓
Output: AQI_t+1 (next-day prediction)
```
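
Before any LSTM or TCN can be trained, the daily table has to be cut into overlapping windows of shape `(samples, timesteps, features)`. A framework-independent sketch in NumPy; the function name is hypothetical and the toy arrays only exist to make the indexing visible.

```python
import numpy as np

def make_windows(features, target, window=14):
    """Build (samples, window, n_features) inputs and next-day targets.

    X[i] holds days i .. i+window-1; y[i] is the target on day i+window.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(features) - window
    X = np.stack([features[i:i + window] for i in range(n)])
    y = target[window:window + n]
    return X, y

# 20 days, 2 features per day (toy values).
features = np.arange(40, dtype=float).reshape(20, 2)
target = np.arange(20, dtype=float) * 10
X, y = make_windows(features, target, window=14)
```

Split into train and test by date before windowing, so that no test-period day appears inside a training window.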

**Pros & cons:**
- **Pro:** Automatic lag discovery, handles variable-length sequences
- **Con:** Requires more data (~1400 rows available; borderline), prone to overfitting, harder to interpret

**Expected performance:** Similar to XGBoost if well-tuned; marginal improvement if lag relationships are complex

---

#### **Model C: Linear/Regularized Models (Baseline + Interpretability)**
**Simple linear regression with LASSO/Ridge:**
```
AQI_t = β₀ + Σ βᵢ × (fire_count_{t-i}) + Σ γⱼ × (wind_speed_{t-j}) + ε
```

**Strengths:**
- Transparent, easy to explain
- Fast; only the regularization strength needs tuning
- Good baseline

**Expected R²:** 0.45–0.60 (adequate but XGBoost typically better)
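
A sketch of the regularized lag regression using scikit-learn's `LassoCV`, which picks the regularization strength by cross-validation (here with time-ordered folds). The data are synthetic, built so that fire count at lag 2 and same-day wind speed drive AQI; that structure is an assumption of the example, not a finding.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "fire_count": rng.poisson(20, n).astype(float),
    "wind_speed": rng.uniform(2.0, 20.0, n),
})
df["aqi"] = (80 + 3 * df["fire_count"].shift(2)
             - 2 * df["wind_speed"] + rng.normal(0, 5, n))

# Lagged predictors, matching the sums in the formula above.
X = pd.DataFrame({f"fire_lag{i}": df["fire_count"].shift(i) for i in range(4)})
X["wind_lag0"] = df["wind_speed"]
X["wind_lag1"] = df["wind_speed"].shift(1)

# Linear models cannot take NaN, so drop the rows lost to lagging.
data = pd.concat([X, df["aqi"]], axis=1).dropna()

model = make_pipeline(
    StandardScaler(),                          # puts coefficients on one scale
    LassoCV(cv=TimeSeriesSplit(n_splits=5)),
)
model.fit(data[X.columns], data["aqi"])
coefs = pd.Series(model[-1].coef_, index=X.columns)
```

Because the inputs are standardized, the magnitudes in `coefs` are directly comparable: irrelevant lags shrink toward zero while the driving lag stands out.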

---

#### **Model D: Hybrid Mechanistic-ML (Physics-Informed)**
**Idea:** Combine transport simulation + ML

**Approach:**
1. Run advection model → compute "particle arrival at Delhi" (probability / concentration on day t)
2. Feed particle arrival as feature into XGBoost (alongside fire count, wind speed)
3. Let ML learn residual patterns not captured by physics

**Expected benefit:** More interpretable, potentially better generalization (out-of-distribution fires)

---

### 2.3 Implementation Roadmap

**Step 1: Data preparation**
```
1. Parse dates in delhi_aqi, punjab_fire, wind_speed to a common daily key
2. Aggregate fire → daily count, daily FRP (by region if possible)
3. Aggregate wind → daily mean, std per station or state-wide mean
4. Merge the three daily tables on date
5. Handle missing values (treat days with no fire detections as zero; forward-fill wind only across single-day gaps)
6. Create lag features: aqi_t-1, aqi_t-2, ..., fire_count_t-1, ..., wind_speed_t-1, ...
7. Create target: aqi_spike (binary) or aqi_t (regression)
8. Train-test split: train on 2020–2023, test on 2024 (temporal split, roughly 80/20)
```
```
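
A compact pandas sketch of the preparation steps, run on tiny in-memory stand-ins for the three CSVs. All column names (`acq_date`, `datetime`, `speed_kmh`, and so on) are assumptions about the files, and the split date is scaled down to fit the toy data.

```python
import pandas as pd

# Stand-ins for delhi_aqi, punjab_fire and wind_speed (illustrative values).
aqi = pd.DataFrame({
    "date": ["2023-11-01", "2023-11-02", "2023-11-03", "2023-11-04"],
    "aqi": [210, 260, 305, 280],
})
fire = pd.DataFrame({
    "acq_date": ["2023-11-01", "2023-11-01", "2023-11-03"],
    "frp": [12.0, 30.0, 45.0],
})
wind = pd.DataFrame({
    "datetime": ["2023-11-01 00:00", "2023-11-01 12:00",
                 "2023-11-03 06:00", "2023-11-04 06:00"],
    "speed_kmh": [8.0, 12.0, 5.0, 7.0],
})

# Parse dates to one daily key before aggregating or merging.
aqi["date"] = pd.to_datetime(aqi["date"])
fire["date"] = pd.to_datetime(fire["acq_date"])
wind["date"] = pd.to_datetime(wind["datetime"]).dt.floor("D")

daily_fire = fire.groupby("date").agg(fire_count=("frp", "size"),
                                      frp_total=("frp", "sum"))
daily_wind = wind.groupby("date").agg(wind_mean=("speed_kmh", "mean"))

# Left-join onto the AQI calendar so every AQI day is kept.
df = aqi.set_index("date").join(daily_fire).join(daily_wind)
df[["fire_count", "frp_total"]] = df[["fire_count", "frp_total"]].fillna(0)
df["wind_mean"] = df["wind_mean"].ffill(limit=1)   # bridge single-day gaps only

# Lag features (assumes the AQI calendar has no missing days).
df["aqi_lag1"] = df["aqi"].shift(1)
df["fire_count_lag1"] = df["fire_count"].shift(1)

# Temporal split: everything up to a cut-off date trains, the rest tests.
train = df.loc[:"2023-11-03"]
test = df.loc["2023-11-04":]
```

Treating a day without detections as zero fires is itself an assumption: cloud cover can hide fires from the satellite, so it may be worth flagging those days rather than silently zero-filling.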

**Step 2: EDA + Feature correlation**
```
1. Correlation heatmap: fires vs aqi (by lag)
2. Lag plots: AQI vs fire count at lag 0–7 days
3. Wind direction rose + seasonal distribution
4. Fire season (Oct–Nov) vs non-season AQI/fire behavior
5. Identify "smoking gun" lags (e.g., fire_count_t-2 strongly predicts aqi_t)
```
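
The lag search in the EDA steps can be sketched as a simple lagged Pearson correlation. The data below are synthetic with a built-in two-day delay, purely so the function has something to detect; the helper name is hypothetical.

```python
import numpy as np
import pandas as pd

def lagged_correlation(driver, response, max_lag=7):
    """Correlation between response on day t and driver on day t-lag."""
    return pd.Series(
        {lag: response.corr(driver.shift(lag)) for lag in range(max_lag + 1)},
        name="corr",
    )

rng = np.random.default_rng(2)
n = 400
fire = pd.Series(rng.poisson(20, n).astype(float))
aqi = 100 + 3 * fire.shift(2) + pd.Series(rng.normal(0, 5, n))

corr_by_lag = lagged_correlation(fire, aqi)
best_lag = int(corr_by_lag.idxmax())
```

Real fire counts are strongly autocorrelated during harvest season, so expect a broad hump across neighbouring lags rather than the single sharp peak this independent toy series produces.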

**Step 3: Model training**
```
For each model (XGBoost, LSTM, Linear):
  1. Hyperparameter grid search (5-fold time-ordered CV on the train set, e.g. expanding window, so validation days always follow training days)
  2. Train on 2020–2023
  3. Evaluate on 2024 hold-out
  4. Compute R², RMSE, MAE (regression) or AUC, precision, recall (classification)
```
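
Shuffled k-fold would let a model validate on days that precede its training data. A short sketch of time-ordered folds with scikit-learn's `TimeSeriesSplit`, using a dummy 100-day index in place of the real training table:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_days = 100
X = np.arange(n_days).reshape(-1, 1)   # placeholder for the feature matrix

cv = TimeSeriesSplit(n_splits=5)
folds = list(cv.split(X))

for train_idx, val_idx in folds:
    # Each fold trains on an expanding window and validates on the days after it.
    print(f"train days 0-{train_idx[-1]}, validate days {val_idx[0]}-{val_idx[-1]}")
```

The same `cv` object can be passed to `GridSearchCV(..., cv=cv)` so the hyperparameter search respects time order.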

**Step 4: Interpretation & diagnostics**
```
1. Feature importance (XGBoost SHAP values)
2. Partial dependence plots (effect of fire_count on AQI, holding wind fixed)
3. Residual analysis (high-error days → exogenous factors?)
4. Temporal plot: predicted vs actual AQI 2024
```

---

## Part 3: Visualization & Delivery

### 3.1 Advection/Wind Visualization
**Interactive components:**
- 2D grid map (Punjab → Delhi) with fire markers (colored by FRP)
- Animated wind vectors at hourly intervals
- Gaussian plume overlay (concentration heatmap advancing with time)
- Arrival time histogram at Delhi (when particles reach threshold)
- Time-slider to show state at any hour

**Tools:** `plotly` (animation), `folium` (base map), `matplotlib` (static snapshots)

---

### 3.2 Model Performance Dashboard
- Train/test loss curves
- Feature importance bar chart (top 10 predictors)
- Confusion matrix (spike classification) or residual distribution (regression)
- Predicted vs actual AQI time series (2024 validation)
- ROC curve / AUC score

---

### 3.3 Case Study: High-Spike Event
Pick a notable event (e.g., Nov 2020 or Oct 2023 spike):
- Timeline of fires (map + FRP over 7-day window)
- Wind speed & direction over period
- Observed AQI rise
- Model prediction accuracy
- Advection animation showing particle transport

---

## Summary: Model Selection Guide

| **Goal** | **Recommended Model** | **Why** | **Effort** |
|----------|----------------------|--------|-----------|
| Understand transport | Gaussian plume + wind drift | Simple, interpretable, physically grounded | Low |
| Predict AQI (baseline) | XGBoost | Best accuracy-to-complexity ratio | Medium |
| Predict AQI (advanced) | LSTM or hybrid mech-ML | LSTM captures complex lags; the hybrid keeps a physically interpretable transport feature | High |
| Explain findings | Linear + SHAP from XGBoost | Transparent coefficients | Low |
| Production system | Ensemble (XGBoost + LSTM) | Robustness, redundancy | High |

---

## Data & Code Checklist

**Before you start:**
- [ ] Load & inspect all 3 CSVs (check date ranges, missing %, unique stations)
- [ ] Confirm fire date format matches wind & AQI dates
- [ ] Verify wind speed units (km/h ✓) and fire coordinates (lat/lon ✓)
- [ ] Identify key fire regions (south-central Punjab: Sangrur, Patiala, Barnala)
- [ ] Establish AQI spike threshold (150 or 50-point rise?)

**Recommended libraries:**
- `pandas`, `numpy`, `scikit-learn`
- `xgboost`, `lightgbm` (boosting)
- `tensorflow`/`keras` (LSTM)
- `folium`, `plotly` (visualization)
- `shap` (model interpretation)
- `scipy.ndimage`, `scipy.interpolate` (advection simulation)