# Pollutant data trends

#### Notebook Purpose

This notebook displays the last month's worth of data for all pollutants for a chosen city.

# Brief explanation of Jupyter Notebook:

In case you are new to Jupyter notebooks, here is a quick explanation of the basic structure of a notebook.

#### Code Cells:

The notebook is made up of cells. A common cell type is a code cell, where you write and execute code. 

When you run the cell, the output (such as text, numbers, or plots) is displayed directly below the cell.

#### Markdown Cells:

Markdown cells allow you to write formatted text using Markdown syntax. You can include headers, bullet points, links, and even equations (using LaTeX). These are useful for adding explanations, notes, or documentation.

Example: You might see sections labeled "Install Packages", "Global Variables", or "Get Cities" in a notebook to organize information and code.

#### Interactive Output:

In addition to displaying text, code cells can show rich output such as plots (using libraries like matplotlib), tables, or even interactive widgets. This allows you to explore data visually within the same environment.

Example: A notebook could display a line chart or scatter plot right after running code that generates the data.

# Install packages

Before you run any other code cells, ensure all the packages are installed.

Do this by running the code cell below: click the play symbol in the margin on the left-hand side of the code cell.

Depending on what you are using to view this notebook, you may have to hover over the left side of the code cell in the margin to get the play symbol to appear. This is the case in Visual Studio.

**PLEASE RUN THE CODE CELL BELOW BEFORE ANYTHING ELSE**

In [None]:
%pip install ipywidgets
%pip install requests
%pip install matplotlib

# Global Variables

To connect to the VAirfy API, we need to tell the notebook where to send its requests. We do this by setting the `AIR_QUALITY_API_URL` variable below.

**By default** it is set to use the production API. However, if you have a local database and API set up and wish to use them, you can set the URL below.

**IF YOU ARE HAPPY WITH THE URL BELOW RUN THE CODE CELL BEFORE MOVING ON TO THE NEXT STEP**

In [2]:
AIR_QUALITY_API_URL = "http://64.225.143.231/api"
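If you switch between the production API and a local one often, one option is to read the URL from an environment variable and fall back to the default above. This is only a minimal sketch: the environment variable name `AIR_QUALITY_API_URL` and the helper `resolve_api_url` are assumptions made for illustration and are not part of the project.

```python
import os

DEFAULT_API_URL = "http://64.225.143.231/api"  # production URL from the cell above


def resolve_api_url(environ=os.environ, default=DEFAULT_API_URL):
    """Return the API base URL, preferring an environment override (hypothetical)."""
    url = environ.get("AIR_QUALITY_API_URL", "").strip()
    # Drop any trailing slash so that appending "/air-pollutant/..." stays clean.
    return (url or default).rstrip("/")


AIR_QUALITY_API_URL = resolve_api_url()
```

With no override set, the helper returns the production URL, so the rest of the notebook should behave the same as before.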

# Get Cities

Run the code cell below to load the list of cities from the locations CSV file.

Once you have run the code cell, a selection box should appear. Simply click on the city you want to use.

**PLEASE RUN THE CODE CELL BELOW BEFORE ANYTHING ELSE**

In [3]:
from ipywidgets import interact
import ipywidgets as widgets
import csv
cities = []
city: str
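def update_chosen_city(city_input: str):
    # Called by the dropdown whenever the selection changes.
    global city
    city = city_input
with open('../../deployment/database/CAMS_locations_V1.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)  # skip the first row (assumed to be the header)
    for row in csv_reader:
        cities.append(row[1])  # the city name is in the second column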

cities_widget = widgets.Dropdown(
    options=cities,
    value="Lima",
    description='Cities',
)
interact(update_chosen_city, city_input=cities_widget)

interactive(children=(Dropdown(description='Cities', index=112, options=('Dubai', 'Abu Dhabi', 'Kabul', 'Bueno…

<function __main__.update_chosen_city(city_input: str)>
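To see what the city-loading step does without needing the CSV file, here is a small sketch that runs the same logic on an in-memory sample. The sample rows, the column headers, and the helper name `read_city_names` are made up for illustration; only the "name in the second column" layout is taken from the cell above.

```python
import csv
import io

# Hypothetical stand-in for the locations file: a header row, then one row per
# city with the name in the second column (the column the cell above reads).
SAMPLE_CSV = "id,city,country\n1,Dubai,AE\n2,Lima,PE\n3,London,GB\n"


def read_city_names(csv_file):
    """Return the second column of every data row, skipping the header."""
    reader = csv.reader(csv_file, delimiter=",")
    next(reader, None)  # skip the header row; tolerate an empty file
    return [row[1] for row in reader if len(row) > 1]


sample_cities = read_city_names(io.StringIO(SAMPLE_CSV))
# Pick a default that is guaranteed to be one of the available options.
default_city = "Lima" if "Lima" in sample_cities else sample_cities[0]
print(sample_cities, default_city)
```

Checking that the default is in the list matters because the dropdown's initial value has to be one of its options.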

In [4]:
import requests
import os
from datetime import datetime
from dateutil.relativedelta import relativedelta

endpoint = AIR_QUALITY_API_URL + "/air-pollutant/measurements"

params = {
    "date_from": datetime.now() - relativedelta(months=1),
    "date_to": datetime.now(),
    "location_type": "city",
    "location_names": [city],  # use the city chosen in the dropdown above
    "api_source": "OpenAQ",
}

response = requests.get(endpoint, params=params)

print(response.status_code)
api_request_result = response.json()

200
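If you want to inspect the query before sending it, the sketch below builds a similar set of parameters and prints the resulting query string using only the standard library. It rests on a few assumptions: `timedelta(days=30)` stands in for `relativedelta(months=1)`, the dates are sent as ISO 8601 strings (the API may expect a different format), and `build_query` is a hypothetical helper name.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode


def build_query(city_name, now, days=30):
    """Build request parameters for a window of `days` ending at `now`."""
    return {
        # Assumption: ISO 8601 strings; the cell above passes datetime objects.
        "date_from": (now - timedelta(days=days)).isoformat(),
        "date_to": now.isoformat(),
        "location_type": "city",
        "location_names": [city_name],
        "api_source": "OpenAQ",
    }


query = build_query("London", datetime(2024, 7, 1, 12, 0, 0))
# doseq=True expands list values into repeated keys in the query string.
print(urlencode(query, doseq=True))
```

Passing a fixed `now` keeps both ends of the window consistent and makes the output reproducible.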


# Prepare Data to display

The code below prepares the data for display on the graphs. It groups the data by pollutant and, within each pollutant, by measuring station. Please ensure you run this before running the graph code cell below.

**PLEASE RUN THE CODE CELL BELOW BEFORE ANYTHING ELSE**

In [5]:
processed_data = {"no2":{},
                  "o3":{},
                  "pm2_5":{},
                  "pm10":{},
                  "so2":{}}

def generate_data_scafold():
    return {
        "values":[],
        "times":[]
    }

def update_processed_data(measurement, pollutant, site_name, measurement_date):
    if pollutant in measurement:
        if site_name not in processed_data[pollutant]:
            processed_data[pollutant][site_name] = generate_data_scafold()
        processed_data[pollutant][site_name]["values"].append(measurement[pollutant])
        processed_data[pollutant][site_name]["times"].append(measurement_date)

for measurement in api_request_result:
    site_name = measurement["site_name"]
    measurement_date = datetime.strptime(measurement["measurement_date"], '%Y-%m-%dT%H:%M:%SZ')
    # update_processed_data skips any pollutant that is missing from the measurement.
    for pollutant in processed_data:
        update_processed_data(measurement, pollutant, site_name, measurement_date)
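The grouping performed by the cell above can be illustrated on a few hand-written records. In this sketch the field names mirror those read by the cell, but the station names, values, and the helper `group_by_pollutant_and_site` are made up for illustration.

```python
from datetime import datetime

# Hand-written sample records (hypothetical stations and values).
sample_measurements = [
    {"site_name": "Station A", "measurement_date": "2024-07-01T10:00:00Z", "no2": 21.5, "o3": 40.0},
    {"site_name": "Station A", "measurement_date": "2024-07-01T11:00:00Z", "no2": 19.0},
    {"site_name": "Station B", "measurement_date": "2024-07-01T10:00:00Z", "no2": 30.2, "pm10": 12.0},
]
POLLUTANTS = ("no2", "o3", "pm2_5", "pm10", "so2")


def group_by_pollutant_and_site(measurements):
    """Group values and timestamps by pollutant, then by measuring station."""
    grouped = {pollutant: {} for pollutant in POLLUTANTS}
    for m in measurements:
        when = datetime.strptime(m["measurement_date"], "%Y-%m-%dT%H:%M:%SZ")
        for pollutant in POLLUTANTS:
            if pollutant in m:
                # Create the station entry on first use, then append to it.
                series = grouped[pollutant].setdefault(
                    m["site_name"], {"values": [], "times": []}
                )
                series["values"].append(m[pollutant])
                series["times"].append(when)
    return grouped


grouped = group_by_pollutant_and_site(sample_measurements)
print(grouped["no2"]["Station A"]["values"])  # [21.5, 19.0]
print(sorted(grouped["no2"]))                 # ['Station A', 'Station B']
```

Each pollutant ends up with one entry per station, and each entry holds two parallel lists that can be passed straight to a plotting call.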

# Generating the Graphs

The code cell below generates five graphs, one for each pollutant. Each line on a graph represents a measuring station.

In [None]:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

def find_max_graph_width(current_max_width, current_data_length):
    # Allow half an inch per data point and keep the widest value seen so far.
    return max(current_max_width, current_data_length * 0.5)

for pollutant in processed_data:
    fig = plt.figure()
    ax = fig.add_subplot()
    max_width = 0
    for key in processed_data[pollutant]:
        ax.plot(processed_data[pollutant][key]["times"], processed_data[pollutant][key]["values"])
        ax.xaxis.set_major_locator(mdates.HourLocator())
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d %H:%M:%S'))
        ax.set_xlabel('Date')
        ax.set_ylabel('Value µg/m³')
        max_width = find_max_graph_width(max_width, len(processed_data[pollutant][key]["times"]))
    fig.set_size_inches(max_width, 10)
    plt.title("A month's worth of " + pollutant + " data for " + city)
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right")
    plt.tight_layout()

plt.show()  # display
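The graph cell scales each figure with the number of samples. As a rough worked example, assuming the intent is half an inch of width per sample of the longest station series, the arithmetic looks like this (the name `figure_width_inches` is hypothetical):

```python
def figure_width_inches(sample_counts, inches_per_sample=0.5):
    """Width for the longest series, at a fixed allowance per sample."""
    return max(sample_counts, default=0) * inches_per_sample


print(figure_width_inches([24, 48]))  # 24.0
print(figure_width_inches([720]))     # 360.0
```

A month of hourly readings is roughly 720 samples, so this rule gives a figure about 360 inches wide. If that proves unwieldy in your viewer, a smaller allowance per sample may be more practical.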