# Weather conditions vs air pollution

This notebook investigates how meteorological conditions relate to air pollution levels, focusing on PM2.5 and PM10.

Using the cleaned dataset produced by `src/main.py` and stored in `data/process_data/weather_stage1_loaded.csv`, we:
- select key weather variables (temperature, rain, humidity, pressure, wind speed),
- compute their correlation with PM2.5 and PM10, and
- create scatter plots to visualise how pollution changes with different weather conditions.
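
The correlation step above can be narrowed to just the weather-vs-PM block of the matrix, which is easier to read than the full square table. The following is a minimal sketch, not the notebook's own method: the helper name `weather_pm_correlations` is hypothetical, the column names are the ones used in the cells below, and the demo frame is random illustrative data rather than the Kaggle dataset.

```python
import numpy as np
import pandas as pd

# Column names as used in this notebook.
WEATHER_COLS = ["temp_c", "rain_mm", "humidity_pct", "pressure_hpa", "wind_speed_mps"]
PM_COLS = ["pm25", "pm10"]


def weather_pm_correlations(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: Pearson correlation of each weather variable
    (rows) with each PM column (columns)."""
    corr = frame[WEATHER_COLS + PM_COLS].corr()
    # Keep only the weather-vs-PM block of the square correlation matrix.
    return corr.loc[WEATHER_COLS, PM_COLS]


# Random data for illustration only; replace `demo` with the loaded `df`.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.normal(size=(200, len(WEATHER_COLS) + len(PM_COLS))),
    columns=WEATHER_COLS + PM_COLS,
)

pm_corr = weather_pm_correlations(demo)
print(pm_corr.round(2))
```

On the real data, `weather_pm_correlations(df)` would return a 5×2 table with one row per weather variable.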


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

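# Load the stage-1 cleaned dataset and parse `timestamp` as datetime.
df = pd.read_csv(
    "../data/process_data/weather_stage1_loaded.csv",
    parse_dates=["timestamp"],
)

# Preview the first rows to check that the columns loaded as expected.
df.head()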


In [None]:
cols = [
    "temp_c",
    "rain_mm",
    "humidity_pct",
    "pressure_hpa",
    "wind_speed_mps",
    "pm25",
    "pm10",
]

# Work on a copy so later changes do not affect `df`.
df_weather = df[cols].copy()
df_weather.describe()


In [None]:
# Pairwise Pearson correlation (the pandas default); missing values are excluded pairwise.
corr = df_weather.corr()
corr


In [None]:
plt.figure(figsize=(5,4))
plt.scatter(df["temp_c"], df["pm25"], alpha=0.2)
plt.xlabel("Temperature (°C)")
plt.ylabel("PM2.5 (µg/m³)")
plt.title("PM2.5 vs Temperature")
plt.tight_layout()
plt.show()


In [None]:
plt.figure(figsize=(5,4))
plt.scatter(df["temp_c"], df["pm25"], alpha=0.2, label="PM2.5")
plt.scatter(df["temp_c"], df["pm10"], alpha=0.2, label="PM10")
plt.xlabel("Temperature (°C)")
plt.ylabel("Concentration (µg/m³)")
plt.title("PM vs Temperature")
plt.legend()
plt.tight_layout()
plt.show()


In [None]:
plt.figure(figsize=(5,4))
plt.scatter(df["wind_speed_mps"], df["pm25"], alpha=0.2)
plt.xlabel("Wind speed (m/s)")
plt.ylabel("PM2.5 (µg/m³)")
plt.title("PM2.5 vs Wind speed")
plt.tight_layout()
plt.show()


In [None]:
plt.figure(figsize=(5,4))
plt.scatter(df["wind_speed_mps"], df["pm10"], alpha=0.2)
plt.xlabel("Wind speed (m/s)")
plt.ylabel("PM10 (µg/m³)")
plt.title("PM10 vs Wind speed")
plt.tight_layout()
plt.show()

In [None]:
plt.figure(figsize=(5,4))
plt.scatter(df["rain_mm"], df["pm25"], alpha=0.2)
plt.xlabel("Rain (mm)")
plt.ylabel("PM2.5 (µg/m³)")
plt.title("PM2.5 vs Rain")
plt.tight_layout()
plt.show()


### Summary

- Correlations between the meteorological variables and PM2.5/PM10 are generally weak.
- Wind speed tends to be slightly negatively related to both PM2.5 and PM10.
- Rainfall shows no clear cleaning (washout) effect on PM in this dataset.
- Overall, the dataset does not show the strong weather–pollution patterns expected of real measurements,
  which is consistent with the idea that the Kaggle data might be synthetic.
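
One way to look more closely at the rainfall point is to compare typical PM levels on rainy and dry rows instead of reading a scatter plot. This is a rough sketch under stated assumptions: the helper `pm_by_rain` and its `threshold_mm` parameter are hypothetical, and the small demo frame is made-up data for illustration, not values from the dataset.

```python
import pandas as pd


def pm_by_rain(frame: pd.DataFrame, threshold_mm: float = 0.0) -> pd.DataFrame:
    """Hypothetical helper: median PM2.5 and PM10 for dry vs rainy rows.

    A row counts as rainy when `rain_mm` exceeds `threshold_mm`; rows with
    missing `rain_mm` fall into the dry group because NaN > x is False.
    """
    status = (
        (frame["rain_mm"] > threshold_mm)
        .map({True: "rainy", False: "dry"})
        .rename("rain_status")
    )
    return frame.groupby(status)[["pm25", "pm10"]].median()


# Made-up values for illustration only; replace `demo` with the loaded `df`.
demo = pd.DataFrame(
    {
        "rain_mm": [0.0, 0.0, 1.2, 3.5, 0.0, 0.4],
        "pm25": [30.0, 34.0, 20.0, 18.0, 32.0, 22.0],
        "pm10": [50.0, 55.0, 35.0, 30.0, 52.0, 38.0],
    }
)

summary = pm_by_rain(demo)
print(summary)
```

If rain had a washout effect, the rainy medians would sit clearly below the dry ones, as they do in the made-up demo; similar medians in both groups on the real data would match the conclusion above.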
