
# 🌍 Impact of Weather Conditions on Air Pollution Levels in London

**Author:** [Your Name]  
**Module:** COM7064 Programming for Data Science  
**Date:** [Insert Date]

---

## 1. Introduction

Air pollution is one of the leading environmental health risks globally. In cities such as London, daily weather variations can significantly influence air pollutant concentrations.  
This project investigates how **temperature, humidity, and wind speed** affect **PM2.5 levels** in London during 2023–2024.  
Data were obtained from **OpenAQ (air quality)** and the **UK Met Office (weather)**. The analysis follows the **CRISP-DM** framework, including data preparation, exploratory data analysis (EDA), and regression modelling.


## 2. Data Collection and Understanding

In [None]:

import pandas as pd

# Load datasets (replace with actual file paths)
air = pd.read_csv("london_air_quality.csv")
weather = pd.read_csv("london_weather.csv")

# Inspect datasets
display(air.head())
display(weather.head())
air.info()
weather.info()
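
Before merging, it can help to check each table's date coverage, duplicate dates, and share of missing values. The following is a minimal sketch, assuming each file has a `date` column as the later cells do. The helper name `summarise_table` and the tiny `demo` table are hypothetical and made up for illustration, not real OpenAQ data.

```python
import pandas as pd

def summarise_table(frame: pd.DataFrame, date_col: str = "date") -> pd.Series:
    """Return row count, date range, duplicate-date count and missing share per column."""
    dates = pd.to_datetime(frame[date_col])
    summary = {
        "rows": len(frame),
        "first_date": dates.min(),
        "last_date": dates.max(),
        "duplicate_dates": int(dates.duplicated().sum()),
    }
    # Fraction of missing values in every column (0.0 means complete)
    for col in frame.columns:
        summary[f"missing_{col}"] = float(frame[col].isna().mean())
    return pd.Series(summary)

# Made-up table, for illustration only
demo = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-02", "2023-01-02", "2023-01-04"],
    "PM2.5": [12.0, None, 9.5, 14.1],
})
print(summarise_table(demo))
```

Calling `summarise_table(air)` and `summarise_table(weather)` would show whether the two sources overlap in time and whether either needs de-duplicating before the merge.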


## 3. Data Cleaning and Preparation

In [None]:

# Convert date columns
air['date'] = pd.to_datetime(air['date'])
weather['date'] = pd.to_datetime(weather['date'])

# Merge datasets on date
df = pd.merge(air, weather, on='date', how='inner')

# Handle missing values by dropping every row that contains at least one
df = df.dropna()

# Optional: filter for 2023–2024 only
df = df[df['date'].between('2023-01-01', '2024-12-31')]

# Overview of merged dataset
df.describe()
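
An inner merge on `date` only behaves as intended when both tables have exactly one row per day. The sketch below assumes, purely for illustration, that the air-quality readings are hourly while the weather table is daily; whether that holds for the actual files is not stated here. It collapses the readings to daily means, then uses the `validate` and `indicator` options of `pd.merge` to confirm the join is one-to-one and to count unmatched days. All values are made up.

```python
import pandas as pd

# Made-up hourly readings and daily weather, for illustration only
hourly = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01 00:00", "2023-01-01 12:00", "2023-01-02 06:00"]),
    "PM2.5": [10.0, 14.0, 8.0],
})
daily_weather = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"]),
    "wind_speed": [3.2, 5.1, 4.0],
})

# Collapse to one row per calendar day by averaging the readings
daily_air = (
    hourly.assign(date=hourly["date"].dt.normalize())
          .groupby("date", as_index=False)["PM2.5"].mean()
)

# validate raises MergeError if either side has duplicate dates;
# indicator adds a _merge column showing where each row came from
merged = pd.merge(daily_air, daily_weather, on="date", how="outer",
                  validate="one_to_one", indicator=True)
print(merged["_merge"].value_counts())
```

Rows flagged `left_only` or `right_only` are the days an inner merge would silently drop.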


## 4. Exploratory Data Analysis (EDA)

In [None]:

import matplotlib.pyplot as plt
import seaborn as sns

# Trend of PM2.5 over time
plt.figure(figsize=(10,5))
sns.lineplot(x='date', y='PM2.5', data=df)
plt.title('Daily PM2.5 Levels in London (2023–2024)')
plt.show()

# Pairwise relationships
sns.pairplot(df[['PM2.5', 'temperature', 'humidity', 'wind_speed']])
plt.show()

# Correlation matrix
corr = df[['PM2.5', 'temperature', 'humidity', 'wind_speed']].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title('Correlation Between Weather and Air Pollution')
plt.show()
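
Daily PM2.5 series tend to be noisy, so a rolling mean can make the underlying trend easier to read. This is a minimal sketch on a made-up series. The 7-day window and the column name `PM2.5_7d` are assumptions, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Made-up daily series, for illustration only (not real measurements)
dates = pd.date_range("2023-01-01", periods=14, freq="D")
demo = pd.DataFrame({"date": dates, "PM2.5": np.arange(14, dtype=float)})

# Trailing 7-day rolling mean; the first six days have no full window and stay NaN
demo["PM2.5_7d"] = demo["PM2.5"].rolling(window=7).mean()
print(demo.tail(3))
```

The smoothed column can be passed to `sns.lineplot` in the same way as the raw `PM2.5` column.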


## 5. Statistical & Regression Analysis

In [None]:

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

# Select features and target
X = df[['temperature', 'humidity', 'wind_speed']]
y = df['PM2.5']

# Build regression model
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)

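# Evaluate model (in-sample: scored on the same data it was fitted to)
print("R²:", r2_score(y, y_pred))
# Take the square root manually; the `squared` argument was deprecated and later removed in newer scikit-learn releases
print("RMSE:", mean_squared_error(y, y_pred) ** 0.5)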

# Coefficients
coef_df = pd.DataFrame({'Variable': X.columns, 'Coefficient': model.coef_})
display(coef_df)
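
The scores above are computed on the same rows the model was fitted to, so they may overstate how well it predicts unseen days. One common alternative is a chronological hold-out: fit on the earlier part of the period and score on the later part. The sketch below uses synthetic data generated only for illustration. The 80/20 cut and the coefficients used to simulate `PM2.5` are assumptions and say nothing about the real London data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=n, freq="D"),
    "temperature": rng.normal(12, 5, n),
    "humidity": rng.uniform(40, 95, n),
    "wind_speed": rng.uniform(0, 10, n),
})
demo["PM2.5"] = (20 - 1.5 * demo["wind_speed"]
                 + 0.05 * demo["humidity"] + rng.normal(0, 2, n))

features = ["temperature", "humidity", "wind_speed"]
demo = demo.sort_values("date")
cut = int(len(demo) * 0.8)  # first 80% of days for training (assumed split)
train, test = demo.iloc[:cut], demo.iloc[cut:]

# Fit on the earlier days only, then score on the later, unseen days
held_out_model = LinearRegression().fit(train[features], train["PM2.5"])
test_pred = held_out_model.predict(test[features])

test_r2 = r2_score(test["PM2.5"], test_pred)
test_rmse = mean_squared_error(test["PM2.5"], test_pred) ** 0.5
print(f"Held-out R²: {test_r2:.3f}, RMSE: {test_rmse:.3f}")
```

Splitting by date rather than at random avoids training on days that come after the ones being predicted.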


### Correlation Significance Test

In [None]:

from scipy.stats import pearsonr

r, p = pearsonr(df['PM2.5'], df['humidity'])
print(f"Correlation between PM2.5 and Humidity: r = {r:.2f}, p = {p:.4f}")
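
The same test can be run for all three weather variables. Because several tests are run at once, a multiple-comparison correction is worth considering. The sketch below applies a Bonferroni threshold (significance level divided by the number of tests) to synthetic data generated only for illustration. The column name `significant_bonferroni` and the 0.05 level are assumptions.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Synthetic data, for illustration only
rng = np.random.default_rng(1)
n = 150
demo = pd.DataFrame({
    "temperature": rng.normal(12, 5, n),
    "humidity": rng.uniform(40, 95, n),
    "wind_speed": rng.uniform(0, 10, n),
})
demo["PM2.5"] = 20 - 1.5 * demo["wind_speed"] + rng.normal(0, 2, n)

# One Pearson test per weather variable
rows = []
for col in ["temperature", "humidity", "wind_speed"]:
    r, p = pearsonr(demo["PM2.5"], demo[col])
    rows.append({"variable": col, "r": r, "p_value": p})
results = pd.DataFrame(rows)

# Bonferroni: compare each p-value with alpha divided by the number of tests
alpha = 0.05
results["significant_bonferroni"] = results["p_value"] < alpha / len(results)
print(results)
```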



## 6. Evaluation & Data Story

- **Key Findings:**  
  - Which weather factors had the strongest influence on PM2.5?  
  - Were relationships positive or negative?  
  - Is the model accurate (based on R²)?  

- **Limitations:**  
  - Data cover only 2023–2024, so long-term trends cannot be assessed.  
  - Other pollutants and local emission sources or events (e.g., traffic) are not accounted for.  
  - Other environmental variables are not included.

- **Interpretation:**  
  Explain how these results could support policymakers or health experts.
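
For the first Key Findings question, raw regression coefficients are hard to compare because temperature, humidity, and wind speed are measured in different units. One option is to standardise the predictors first, so that each coefficient is the change in PM2.5 per one standard deviation of that variable. This is a minimal sketch on synthetic data generated only for illustration. The simulated effect sizes are assumptions, not findings.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data, for illustration only
rng = np.random.default_rng(2)
n = 200
demo = pd.DataFrame({
    "temperature": rng.normal(12, 5, n),
    "humidity": rng.uniform(40, 95, n),
    "wind_speed": rng.uniform(0, 10, n),
})
demo["PM2.5"] = (20 - 1.5 * demo["wind_speed"]
                 + 0.05 * demo["humidity"] + rng.normal(0, 2, n))

features = ["temperature", "humidity", "wind_speed"]

# Rescale each predictor to mean 0 and standard deviation 1
scaled = StandardScaler().fit_transform(demo[features])
std_model = LinearRegression().fit(scaled, demo["PM2.5"])

# Rank predictors by the absolute size of their standardised coefficient
ranking = pd.Series(std_model.coef_, index=features).sort_values(
    key=np.abs, ascending=False
)
print(ranking)
```

The sign of each entry answers the second question (positive or negative relationship), while the ordering gives a rough view of relative influence under a linear model.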



## 7. Conclusion

This study demonstrates that weather conditions, particularly wind speed and humidity, have measurable effects on air pollution levels in London.  
Wind speed generally shows an inverse relationship with PM2.5, indicating that stronger winds help disperse pollutants.  
These findings can assist local authorities in forecasting poor air quality days and informing the public.  

### Future Work:
- Include other pollutants (NO₂, O₃).  
- Extend analysis to multiple cities.  
- Apply time-series forecasting models (e.g., ARIMA).

---

**References (Harvard Style)**  
- OpenAQ (2024). *Open Air Quality Data Portal.* Available at: https://openaq.org  
- Met Office (2024). *Climate and Weather Data.* Available at: https://www.metoffice.gov.uk  
- WHO (2023). *Ambient Air Pollution: A Global Assessment.* World Health Organization.  
