Predictive Modelling of Air Quality in London

This notebook collects air quality data for **Bethnal Green** using the [ERG API](https://api.erg.ic.ac.uk/AirQuality/help).

- 📍 Location: Bethnal Green (site code: `BG1`)
- 📆 Date range: 1 Jan 2025 – 1 May 2025
- 💾 Output: JSON dataset for analysis


In [None]:
# Import required packages
import requests
import json
import os
import pandas as pd


The API is described at https://www.londonair.org.uk/LondonAir/API/

The available endpoints and codes are listed at https://api.erg.ic.ac.uk/AirQuality/help

In [None]:
# Define API query parameters
site_code = 'BG1'  # Bethnal Green
start_date = '2025-01-01'
end_date = '2025-05-01'

url = (
    'https://api.erg.ic.ac.uk/AirQuality/Data/Site/'
    f'SiteCode={site_code}/StartDate={start_date}/EndDate={end_date}/Json'
)

# Set up output file
output_dir = './data'
os.makedirs(output_dir, exist_ok=True)
filename = f'air_quality_{site_code}_{start_date}_to_{end_date}.json'
filepath = os.path.join(output_dir, filename)


Download the data and save it as JSON

In [None]:
# Fetch and save JSON from the ERG API
# A timeout stops the request from hanging indefinitely if the server does not respond
response = requests.get(url, timeout=60)

if response.status_code == 200:
    with open(filepath, 'w', encoding='utf-8') as f:
        json.dump(response.json(), f, indent=2)
    print(f"✅ Data saved to {filepath}")
else:
    print(f"❌ Failed to fetch data. Status code: {response.status_code}")


Load the saved JSON before converting it to a DataFrame

In [None]:
# Load and inspect JSON structure
with open(filepath, encoding='utf-8') as f:
    data = json.load(f)

# Check top-level keys to find the data section
list(data.keys())
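
Top-level keys alone may not show where the measurements sit. As a rough sketch, a small recursive helper can print the key names and value types a few levels down. The `describe_structure` function and the `sample` dictionary are illustrative only: the sample layout (an `AirQualityData` object holding a `Data` list of `@`-prefixed fields) is an assumption about the response, to be checked against the real `data`.

```python
def describe_structure(obj, depth=0, max_depth=3):
    """Return indented lines summarising keys and value types of nested JSON."""
    lines = []
    pad = "  " * depth
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            if depth < max_depth:
                lines.extend(describe_structure(value, depth + 1, max_depth))
    elif isinstance(obj, list) and obj:
        # Only the first element is described; the rest are assumed similar
        lines.append(f"{pad}[{len(obj)} items, first shown]")
        if depth < max_depth:
            lines.extend(describe_structure(obj[0], depth + 1, max_depth))
    return lines


# Hypothetical sample; the real response layout is an assumption
sample = {
    "AirQualityData": {
        "@SiteCode": "BG1",
        "Data": [
            {
                "@SpeciesCode": "NO2",
                "@MeasurementDateGMT": "2025-01-01 00:00:00",
                "@Value": "21.4",
            },
        ],
    }
}

structure_lines = describe_structure(sample)
print("\n".join(structure_lines))
```

With the real file, calling `describe_structure(data)` in place of `describe_structure(sample)` should show which key holds the list of measurements.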


Next Steps

- Explore the nested JSON structure (likely under a key like `'AirQualityData'`)
- Use `pd.json_normalize()` to flatten the records into a table
- Filter for key pollutants: NO₂, PM10, PM2.5
- Clean timestamps, remove nulls, and visualise
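
The steps above could be sketched roughly as follows. This is a minimal example on a hand-written `sample`, not on the real download: the `AirQualityData` / `Data` nesting, the `@SpeciesCode`, `@MeasurementDateGMT` and `@Value` field names, and the species code `PM25` for PM2.5 are all assumptions that should be verified against the saved JSON before reuse. Visualisation is left out.

```python
import pandas as pd

# Hypothetical sample mirroring the assumed response layout
sample = {
    "AirQualityData": {
        "@SiteCode": "BG1",
        "Data": [
            {"@SpeciesCode": "NO2", "@MeasurementDateGMT": "2025-01-01 00:00:00", "@Value": "21.4"},
            {"@SpeciesCode": "PM10", "@MeasurementDateGMT": "2025-01-01 00:00:00", "@Value": ""},
            {"@SpeciesCode": "PM25", "@MeasurementDateGMT": "2025-01-01 00:00:00", "@Value": "8.1"},
            {"@SpeciesCode": "O3", "@MeasurementDateGMT": "2025-01-01 00:00:00", "@Value": "40.0"},
        ],
    }
}

# Flatten the list of measurement records into a table
records = sample["AirQualityData"]["Data"]  # assumed location of the records
df = pd.json_normalize(records)

# Give the columns plain names (the '@' names are assumed, see above)
df = df.rename(columns={
    "@SpeciesCode": "species",
    "@MeasurementDateGMT": "timestamp",
    "@Value": "value",
})

# Keep only the pollutants of interest
df = df[df["species"].isin(["NO2", "PM10", "PM25"])].copy()

# Parse timestamps and numeric values; unparseable entries become NaT/NaN
df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")
df["value"] = pd.to_numeric(df["value"], errors="coerce")

# Drop rows with missing timestamps or values
df = df.dropna(subset=["timestamp", "value"]).reset_index(drop=True)

print(df)
```

Using `errors="coerce"` turns empty or malformed readings into missing values, so a single `dropna` handles all of them; whether dropping is preferable to interpolating gaps depends on the later modelling step.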