### Download File

# Air Quality and Pollution Terms

## 1. NOx (Nitrogen Oxides)
- Refers to a group of gases composed of nitrogen and oxygen.
- The most common nitrogen oxides are nitrogen dioxide (NO2) and nitric oxide (NO).
- Produced from vehicle emissions, industrial processes, and combustion of fossil fuels.
- Contributes to the formation of smog and can have harmful effects on human health and the environment.

## 2. NO2 (Nitrogen Dioxide)
- A specific type of nitrogen oxide.
- Reddish-brown gas with a characteristic sharp, biting odor.
- Primarily produced from burning fossil fuels.
- A significant air pollutant that can irritate the respiratory system and is associated with health problems, including asthma and other lung diseases.

## 3. PM10 (Particulate Matter 10 micrometers or less)
- Refers to particulate matter that is 10 micrometers or smaller in diameter.
- Can include dust, pollen, soot, and smoke.
- Can be inhaled and may cause health problems, particularly respiratory ones, because the particles can penetrate the lungs.

## 4. PM2.5 (Particulate Matter 2.5 micrometers or less)
- Consists of finer particulate matter that is 2.5 micrometers or smaller.
- Originates from various sources, including vehicle emissions, industrial processes, and natural sources like wildfires.
- Particularly concerning for health as it can penetrate deep into the lungs and enter the bloodstream, leading to serious health effects, including cardiovascular and respiratory diseases.
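
The two size classes are nested: a particle small enough to count as PM2.5 also counts as PM10. A minimal sketch of that relationship, using a hypothetical helper `pm_classes` that is not part of the dataset or of any library:

```python
def pm_classes(diameter_um: float) -> list[str]:
    """Return the particulate-matter classes for a particle diameter in micrometers.

    Hypothetical helper, for illustration only.
    """
    classes = []
    if diameter_um <= 10:
        classes.append("PM10")
    if diameter_um <= 2.5:
        classes.append("PM2.5")
    return classes


print(pm_classes(1.0))   # ['PM10', 'PM2.5']
print(pm_classes(5.0))   # ['PM10']
print(pm_classes(50.0))  # []
```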

# Generate the dataset

Run all the cells, in order, to create a directory on your local machine containing the formatted dataset.



#### Installs

In [None]:
!pip install requests pandas scikit-learn openpyxl

#### Method for downloading the file


In [5]:
import requests
from tempfile import TemporaryDirectory

emissions_url_excel = "https://data.london.gov.uk/download/london-atmospheric-emissions-inventory--laei--2019/17d21cd1-892e-4388-9fea-b48c1b61ee3c/LAEI-2019-Emissions-Summary-including-Forecast.zip"

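def download_dataset(url):
    # The directory is deleted when the returned object is cleaned up or garbage-collected.
    tempdir = TemporaryDirectory(prefix="downloaded", suffix="datasets", dir=".")
    with requests.get(url) as response:
        response.raise_for_status()  # fail early on an HTTP error instead of saving an error page
        with open(f"{tempdir.name}/datasets.zip", "wb") as f:
            f.write(response.content)
    return tempdir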
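
`download_dataset` returns the `TemporaryDirectory` object rather than a plain path, so the downloaded files exist only as long as that object does. The sketch below illustrates the lifetime with a throwaway directory and no network access. The file name `example.txt` is a placeholder, and the directory is created in the system default location rather than in `.`.

```python
from pathlib import Path
from tempfile import TemporaryDirectory

tmp = TemporaryDirectory(prefix="downloaded", suffix="datasets")
path = Path(tmp.name)

(path / "example.txt").write_text("placeholder")  # hypothetical file
exists_before = path.exists()

tmp.cleanup()  # explicit removal; the same happens when the object is garbage-collected
exists_after = path.exists()

print(exists_before, exists_after)  # True False
```

Anything that should outlive the notebook session, such as an exported CSV, needs to be copied out of the directory before it is cleaned up.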

#### Extract the file

In [6]:
from zipfile import ZipFile

def unzip(path):
    with ZipFile(f"{path}/datasets.zip") as zipf:
        zipf.extractall(path)
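
Before extracting, it can help to check what the archive actually contains. The sketch below lists the Excel members of a small archive built in memory, so it runs without the download. The member names are placeholders and say nothing about the real LAEI archive layout.

```python
import io
from zipfile import ZipFile

# Build a small archive in memory so the example needs no download.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as zipf:
    zipf.writestr("summary/emissions.xlsx", b"placeholder")  # hypothetical member
    zipf.writestr("README.txt", b"placeholder")              # hypothetical member

# List the members without extracting anything.
with ZipFile(buffer) as zipf:
    excel_members = [name for name in zipf.namelist() if name.endswith(".xlsx")]

print(excel_members)  # ['summary/emissions.xlsx']
```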

#### Preparation

In [9]:
import pandas
from pathlib import Path

dir_ = download_dataset(emissions_url_excel)
unzip(dir_.name)
# Search only the download directory; rglob already recurses, so "*.xlsx" is enough.
files = Path(dir_.name).rglob("*.xlsx")
file = pandas.read_excel(next(files).as_posix(), sheet_name="Emissions by Grid ID")

#### Filling missing values with the mean

In [15]:
# Note: "n2o" is nitrous oxide, a different gas from the NO2 described above.
key_pollutants = ["nox", "n2o", "pm10", "pm2.5", "co2"]
filled_na_with_mean = file[file.Year < 2020].copy()

for column in key_pollutants:
    colmean = filled_na_with_mean[column].mean()
    filled_na_with_mean[column] = filled_na_with_mean[column].fillna(colmean)
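
To see what the loop does, here is the same mean imputation on a toy frame. The values are made up for illustration and are not taken from the LAEI data.

```python
import pandas

# Toy data with one missing value per column (made-up numbers).
toy = pandas.DataFrame({"nox": [1.0, None, 3.0], "pm10": [None, 4.0, 8.0]})

for column in ["nox", "pm10"]:
    colmean = toy[column].mean()  # missing values are skipped when averaging
    toy[column] = toy[column].fillna(colmean)

print(toy["nox"].tolist())   # [1.0, 2.0, 3.0]
print(toy["pm10"].tolist())  # [6.0, 4.0, 8.0]
```

Each gap is replaced by the mean of the observed values in its own column, so the column mean itself is unchanged by the fill.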

#### Exporting the file

In [16]:
group_columns = ["Year", "Sector", *key_pollutants]

(
    filled_na_with_mean[group_columns]
    .groupby(by=["Year", "Sector"])
    .sum()
    .reset_index()
    .to_csv(f"{dir_.name}/LAEI_2019_NA_FILLED_WITH_MEAN.csv", index=False)
)
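
The export collapses the per-grid rows into one row per year and sector. A minimal sketch of that aggregation on a toy frame, where the sector names and values are placeholders rather than actual LAEI entries:

```python
import pandas

# Toy data: two rows share the same (Year, Sector) pair and get summed together.
toy = pandas.DataFrame({
    "Year": [2016, 2016, 2016, 2019],
    "Sector": ["Transport", "Transport", "Domestic", "Transport"],  # hypothetical sectors
    "nox": [1.0, 2.0, 5.0, 4.0],
})

totals = toy.groupby(by=["Year", "Sector"]).sum().reset_index()

print(totals["Sector"].tolist())  # ['Domestic', 'Transport', 'Transport']
print(totals["nox"].tolist())     # [5.0, 3.0, 4.0]
```

`reset_index()` turns the `Year` and `Sector` group keys back into ordinary columns, which is why the CSV can then be written with `index=False`.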