# Air Quality

Air quality data from urban traffic stations in Spanish cities with more than 100,000 inhabitants are studied to estimate the effect of the COVID-19 lockdown on air quality.

The whole curation process is performed by the scripts in `src/`; this notebook only shows examples of the process for monitoring sites in the city of Madrid.

In [1]:
# Load packages
suppressMessages(library(saqgetr))
suppressMessages(library(lubridate))
suppressMessages(library(tidyverse))
suppressMessages(library(openxlsx))
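# plyr provides rbind.fill(); attached after tidyverse, it masks dplyr
# verbs such as mutate() and summarise()
suppressMessages(library(plyr))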
suppressMessages(library(openair))

In [2]:
# Working directory
setwd("AirQualityCOVID")

# Source some scripts 
source("src/curation/airQuality.R")

In [3]:
Sys.setlocale("LC_ALL", "es_ES.UTF-8")

### Main Variables

In [4]:
site_type <- "traffic"
site_area <- "urban"

start_dt <- ymd_hms("2013-01-01 00:00:00")
end_dt <- ymd_hms("2020-12-30 00:00:00")

pollutants <- c("no", "no2", "o3", "pm10", "pm2.5")

## Spanish stations information

In [5]:
# AQ stations in cities with more than 100000 inhabitants (Madrid only here)
sites.100mil <- read.xlsx("data/curation/estaciones-CA-JA.xlsx",
                          sheet="ciudades-100000-A") %>% 
                    filter(Municipio == "Madrid") %>%
                    select("Municipio", "Población",
                           "Estación.tráfico", "Código.estación") 

In [7]:
# Inside filter(), site_type and site_area refer to columns of the sites table
spain.sites <- get_saq_sites() %>%
    filter(country == "spain",
           site %in% sites.100mil$"Código.estación",
           site_type == "traffic",
           site_area == "urban",
           date_start <= start_dt) %>%
    select(site, site_name, latitude, longitude, elevation, 
           country, site_type, site_area, date_start, date_end)

In [8]:
sites.AQ <- merge(x = spain.sites,
                  y = sites.100mil,
                  by.x = "site", by.y="Código.estación",
                  all.x = TRUE) 

## Curation of Air Quality data

Air quality data were retained, pollutant by pollutant, only when observations were available for more than 3 years and at least 80% of the daily data were present between March 2020 and June 2020. These months cover the entire lockdown and de-escalation phases in Spain.

|   site   |  variable   |   start_yr   |    end_yr    | hv.min  | missing.wk | missing.mnth | missing.yr |
|----------|--------------|--------------|--------------|---------|------------|--------------|------------|
| es0001a  |     no2      |  01-01-2015  |  02-01-2015  |  TRUE   |     34     |      2       |     0      |
| es0001a  |     no       |  01-01-2015  |  02-01-2015  |  TRUE   |     40     |     12       |     1      |
| es0001a  |      o3      |  01-01-2015  |  02-01-2015  |  FALSE  |      4     |      0       |     0      |

| Parameter | Value |
|----|----|
| hv.min | TRUE |
| mss.yr | $< 5$ |
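
The coverage rule described above can be illustrated with a minimal sketch in base R. The actual check lives in `main.curation` under `src/`, which is not shown here, so the helper `has.min.coverage` and the toy series below are hypothetical, and treating the threshold as "at least 80%" (`>=`) is an assumption.

```r
# Hypothetical helper (not part of src/): TRUE when the share of days with a
# valid observation inside `period` reaches `min.share`.
has.min.coverage <- function(dates, values, period, min.share = 0.8) {
    all.days <- seq(period[1], period[2], by = "day")
    in.period <- dates >= period[1] & dates <= period[2]
    valid.days <- unique(dates[!is.na(values) & in.period])
    length(valid.days) / length(all.days) >= min.share
}

# Lockdown and de-escalation window used in this notebook
main.days <- as.Date(c("2020-03-01", "2020-06-30"))
all.days <- seq(main.days[1], main.days[2], by = "day")

# Toy daily series: the first n days are valid, the rest are missing
toy.series <- function(n) c(rep(10, n), rep(NA, length(all.days) - n))

cov.ok  <- has.min.coverage(all.days, toy.series(100), main.days)
cov.low <- has.min.coverage(all.days, toy.series(90), main.days)
print(c(cov.ok, cov.low))
```

The window spans 122 days, so under this reading a series needs at least 98 valid days to pass.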

### Curation Variables

In [9]:
#------------------------------
#      Curation Variables      
#------------------------------

hv.min.percent <- 0.8 # data > 80%
main.prd <- c(ymd_hms("2020-03-01 00:00:00"),
               ymd_hms("2020-06-30 00:00:00"))

pairs.st.pll <- do.call(rbind,
                        do.call(rbind,
                                lapply(pollutants, function(pll){
                                    lapply(levels(as.factor(sites.AQ$site)), c, pll)
                                }))
                       ) 
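
To make the nested `do.call`/`lapply` construction easier to follow, here is a small sketch on toy inputs. The site codes are placeholders, and the `expand.grid()` variant is only one possible alternative, not the approach used in `src/`.

```r
# Toy inputs; the site codes are placeholders, not real stations
toy.sites <- c("es0001a", "es0002a")
toy.pollutants <- c("no", "no2")

# Same nested construction as in the cell above
pairs.nested <- do.call(rbind,
                        do.call(rbind,
                                lapply(toy.pollutants, function(pll) {
                                    lapply(levels(as.factor(toy.sites)), c, pll)
                                })))

# Possible alternative: expand.grid() varies its first argument fastest,
# so listing the pollutant first yields rows grouped by site
pairs.grid <- unname(as.matrix(
    expand.grid(variable = toy.pollutants, site = toy.sites,
                stringsAsFactors = FALSE)[, c("site", "variable")]
))

print(pairs.nested)
print(all(pairs.nested == pairs.grid))
```

Both forms give a character matrix with one row per station-pollutant pair (site in the first column, pollutant in the second), which is the shape that `apply(..., 1, ...)` iterates over row by row.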

In [10]:
#------------------------------
#      Curation Process        
#------------------------------

curate.info <- do.call(rbind.fill,
                       apply(pairs.st.pll, 1, main.curation,
                             c(start_dt, end_dt), main.prd, hv.min.percent)
                      )
head(curate.info)

[1] "Downloading..."


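|   | site | variable | start_dt | end_dt | hv.min | mss.wk | mss.mnth | mss.yr |
|---|------|----------|----------|--------|--------|--------|----------|--------|
|   | `<fct>` | `<fct>` | `<date>` | `<date>` | `<lgl>` | `<int>` | `<int>` | `<int>` |
| 1 | es0115a | no   | 2013-01-01 | 2020-12-30 | FALSE | -1 | 0 | 0 |
| 2 | es0115a | no2  | 2013-01-01 | 2020-12-30 | FALSE | -1 | 0 | 0 |
| 3 | es0118a | no   | 2013-01-01 | 2020-12-30 | TRUE  | -1 | 0 | 0 |
| 4 | es0118a | no2  | 2013-01-01 | 2020-12-30 | TRUE  | -1 | 0 | 0 |
| 5 | es0118a | o3   | 2013-01-01 | 2020-12-30 | TRUE  | -1 | 0 | 0 |
| 6 | es0118a | pm10 | 2013-01-01 | 2020-12-30 | TRUE  | -1 | 0 | 0 |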


In [11]:
valid.info <- curate.info[curate.info$hv.min == TRUE,]
valid.info <- valid.info[valid.info$mss.yr < 5,]

# Left join: keep every valid site-pollutant pair and attach site metadata
checked_sitesAQ <- merge(x = valid.info %>%
                             select(site, variable),
                         y = sites.AQ,
                         by = "site", all.x = TRUE, all.y = FALSE) 

head(checked_sitesAQ)

|   | site | variable | site_name | latitude | longitude | elevation | country | site_type | site_area | date_start | date_end | Municipio | Población | Estación.tráfico |
|---|------|----------|-----------|----------|-----------|-----------|---------|-----------|-----------|------------|----------|-----------|-----------|------------------|
|   | `<fct>` | `<fct>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<dttm>` | `<dttm>` | `<chr>` | `<dbl>` | `<chr>` |
| 1 | es0118a | no    | ESCUELAS AGUIRRE | 40.42167 | -3.682222 | 672 | spain | traffic | urban | 2002-11-19 | 2021-10-10 23:00:00 | Madrid | 3266126 | ESCUELAS AGUIRRE |
| 2 | es0118a | no2   | ESCUELAS AGUIRRE | 40.42167 | -3.682222 | 672 | spain | traffic | urban | 2002-11-19 | 2021-10-10 23:00:00 | Madrid | 3266126 | ESCUELAS AGUIRRE |
| 3 | es0118a | o3    | ESCUELAS AGUIRRE | 40.42167 | -3.682222 | 672 | spain | traffic | urban | 2002-11-19 | 2021-10-10 23:00:00 | Madrid | 3266126 | ESCUELAS AGUIRRE |
| 4 | es0118a | pm10  | ESCUELAS AGUIRRE | 40.42167 | -3.682222 | 672 | spain | traffic | urban | 2002-11-19 | 2021-10-10 23:00:00 | Madrid | 3266126 | ESCUELAS AGUIRRE |
| 5 | es0118a | pm2.5 | ESCUELAS AGUIRRE | 40.42167 | -3.682222 | 672 | spain | traffic | urban | 2002-11-19 | 2021-10-10 23:00:00 | Madrid | 3266126 | ESCUELAS AGUIRRE |
| 6 | es0120a | no    | RAMÓN Y CAJAL    | 40.45167 | -3.677222 | 708 | spain | traffic | urban | 2002-01-01 | 2021-10-10 23:00:00 | Madrid | 3266126 | RAMÓN Y CAJAL |
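
The selection step can also be reproduced on toy data. The sketch below is illustrative only: the data frames `toy.info` and `toy.sites` are made up, and whether `main.curation` can ever return `NA` in `hv.min` is an assumption. If it can, plain logical indexing keeps an all-`NA` row, whereas wrapping the condition in `which()` drops it.

```r
# Made-up curation summary; values do not come from real stations
toy.info <- data.frame(
    site     = c("es0001a", "es0001a", "es0002a", "es0003a"),
    variable = c("no", "no2", "no2", "o3"),
    hv.min   = c(TRUE, FALSE, NA, TRUE),
    mss.yr   = c(0L, 0L, 1L, 6L),
    stringsAsFactors = FALSE
)

# Plain logical indexing: the NA in hv.min produces a row filled with NA
naive <- toy.info[toy.info$hv.min == TRUE, ]

# which() keeps only rows where the combined condition is exactly TRUE
strict <- toy.info[which(toy.info$hv.min & toy.info$mss.yr < 5), ]

# Left join with made-up site metadata, mirroring the merge() above
toy.sites <- data.frame(site = c("es0001a", "es0002a"),
                        site_name = c("SITE A", "SITE B"),
                        stringsAsFactors = FALSE)
checked <- merge(x = strict[, c("site", "variable")], y = toy.sites,
                 by = "site", all.x = TRUE)

print(nrow(naive))
print(checked)
```

With these toy values only the `es0001a`/`no` pair passes both conditions, so `checked` has a single row.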
