# Air Quality

Air quality data from urban traffic stations in Spanish cities with more than 100000 inhabitants are analysed to estimate the effect of the COVID-19 lockdown on air quality.

In [1]:
# Load packages
#suppressMessages(library(saqgetr))
#suppressMessages(library(lubridate))
#suppressMessages(library(tidyverse))
#suppressMessages(library(openxlsx))
#suppressMessages(library(plyr))
#suppressMessages(library(openair))

In [2]:
# Working directory
setwd("~/Repositories/AirQualityCOVID")

# Source some scripts 
source("src/Curation/airQuality.R")

### Main Variables

In [3]:
site_type <- "traffic"
site_area <- "urban"

start_dt <- ymd_hms("2013-01-01 00:00:00")
end_dt <- ymd_hms("2020-12-30 00:00:00")

pollutants <- c("no", "no2", "o3", "pm10", "pm2.5")

## Spanish stations information

In [4]:
# AQ stations in cities with more than 100000 inhabitants
sites.100mil <- read.xlsx("data/xlsx/estaciones-CA-JA.xlsx",
                          sheet="ciudades-100000-A") %>% 
                    select("Municipio", "Población",
                           "Estación.tráfico", "Código.estación") 

In [5]:
spain.sites <- get_saq_sites() %>%
    filter(country == "spain",
           site %in% sites.100mil$"Código.estación",
           site_type == "traffic",
           site_area == "urban",
           date_start <= start_dt) %>%
    select(site, site_name, latitude, longitude, elevation, 
           country, site_type, site_area, date_start, date_end)

In [6]:
sites.AQ <- merge(x = spain.sites,
                  y = sites.100mil,
                  by.x = "site", by.y="Código.estación",
                  all.x = TRUE) 

## Curation of Air Quality data

For each station, determine which pollutants lack enough data for the study. There must be data ($> 80\%$) during the period of interest (`1 March 2020` to `30 June 2020`) at daily resolution. Long intervals without data are also checked, using weekly, monthly and yearly resolution.

|   site   |  Pollutant   |   start_yr   |    end_yr    | hv.min  |   mss.wk   |   mss.mnth   |   mss.yr   |
|----------|--------------|--------------|--------------|---------|------------|--------------|------------|
| es0001a  |     no2      |  01-01-2015  |  02-01-2015  |  TRUE   |     34     |      2       |     0      |
| es0001a  |     no       |  01-01-2015  |  02-01-2015  |  TRUE   |     40     |     12       |     1      |
| es0001a  |      o3      |  01-01-2015  |  02-01-2015  |  FALSE  |      4     |      0       |     0      |

The complete study includes 67 stations, which would take too much computation time, so this notebook works with only 4 stations. The full calculation for all stations is done in an `R` script.

### Curation Variables

In [7]:
#------------------------------
#      Curation Variables      
#------------------------------

hv.min.percent <- 0.8 # data > 80%
main.prd <- c(ymd_hms("2020-03-01 00:00:00"),
               ymd_hms("2020-06-30 00:00:00"))

#sites.lv <- levels(as.factor(sites.AQ$site))[1:3]
sites.lv <- c("es0118a", "es1438a") # Big cities (Madrid and Barcelona)
sites.lv <- c(sites.lv, "es1580a", "es1340a") # small cities (Santander and Huelva)

pairs.st.pll <- do.call(rbind,
                        do.call(rbind,
                                lapply(pollutants, function(pll){
                                    lapply(sites.lv, c, pll)
                                }))
                       ) 
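The nested `do.call(rbind, ...)` above builds a two-column character matrix with one row per site/pollutant pair (site first, pollutant second). As a minimal sketch, the same set of pairs can probably be written more readably with `expand.grid`. The name `pairs.alt` is hypothetical and used only for this illustration.

```r
pollutants <- c("no", "no2", "o3", "pm10", "pm2.5")
sites.lv <- c("es0118a", "es1438a", "es1580a", "es1340a")

# expand.grid varies its first argument fastest, so listing the pollutants
# first gives all pollutants of one site before moving on to the next site.
grid <- expand.grid(pollutant = pollutants, site = sites.lv,
                    stringsAsFactors = FALSE)

# Reorder the columns to (site, pollutant) and convert to a character matrix
pairs.alt <- as.matrix(grid[, c("site", "pollutant")])

print(dim(pairs.alt))
print(head(pairs.alt))
```

With 4 sites and 5 pollutants this yields 20 pairs, which matches the 20 downloads triggered by the curation step.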

In [8]:
#------------------------------
#      Curation Process        
#------------------------------

curate.info <- do.call(rbind.fill,
                       apply(pairs.st.pll, 1, main.curation,
                             c(start_dt, end_dt), main.prd, hv.min.percent)
                      )
head(curate.info)

[1] "Downloading..."


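|   | site | Pollutant | start_yr | end_yr | hv.min | mss.wk | mss.mnth | mss.yr |
|---|------|-----------|----------|--------|--------|--------|----------|--------|
|   | `<chr>` | `<chr>` | `<date>` | `<date>` | `<lgl>` | `<int>` | `<int>` | `<int>` |
| 1 | es0118a | no | 2013-01-01 | 2020-12-30 | TRUE | -1 | 0 | 0 |
| 2 | es0118a | no2 | 2013-01-01 | 2020-12-30 | TRUE | -1 | 0 | 0 |
| 3 | es0118a | o3 | 2013-01-01 | 2020-12-30 | TRUE | -1 | 0 | 0 |
| 4 | es0118a | pm10 | 2013-01-01 | 2020-12-30 | TRUE | -1 | 0 | 0 |
| 5 | es0118a | pm2.5 | 2013-01-01 | 2020-12-30 | TRUE | -1 | 0 | 0 |
| 6 | es1438a | no | 2013-01-01 | 2020-12-30 | TRUE | -1 | 0 | 0 |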
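The curation call relies on a base R pattern: `apply(X, 1, FUN, ...)` passes each row of the matrix as the first argument of `FUN` and forwards the remaining arguments unchanged. When `FUN` returns a data frame, `apply` returns a list that can then be row-bound. The sketch below illustrates the pattern with a hypothetical stand-in, `fake.curation`, which is not the real `main.curation`, and uses base `rbind` instead of `plyr::rbind.fill` to stay self-contained.

```r
# Hypothetical stand-in for main.curation: returns a one-row data frame
# describing the (site, pollutant) pair it received. No data are downloaded.
fake.curation <- function(pair, period, main.period, min.fraction) {
    data.frame(site = pair[1],
               Pollutant = pair[2],
               start_yr = period[1],
               end_yr = period[2],
               stringsAsFactors = FALSE)
}

pairs <- rbind(c("es0118a", "no"),
               c("es0118a", "no2"))
period <- as.Date(c("2013-01-01", "2020-12-30"))
main.period <- as.Date(c("2020-03-01", "2020-06-30"))

# One call per row; the extra arguments are matched by position
rows <- apply(pairs, 1, fake.curation, period, main.period, 0.8)

# rows is a list of data frames; bind them into a single table
info <- do.call(rbind, rows)
print(info)
```

`rbind.fill` differs from `rbind` in that it tolerates data frames with different columns, filling the gaps with `NA`.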


### Filter Data by Parameters

| Parameter | Value |
|----|----|
| hv.min | TRUE |
| mss.yr | $< 5$ |

In [9]:
valid.info <- curate.info[curate.info$hv.min == TRUE,]
valid.info <- valid.info[valid.info$mss.yr < 5,]

checked_sitesAQ <- merge(x = valid.info %>%
                             select(site, Pollutant),
                         y = sites.AQ,
                         by = "site", all.x = TRUE, all.y = FALSE)

head(checked_sitesAQ)

|   | site | Pollutant | site_name | latitude | longitude | elevation | country | site_type | site_area | date_start | date_end | Municipio | Población | Estación.tráfico |
|---|------|-----------|-----------|----------|-----------|-----------|---------|-----------|-----------|------------|----------|-----------|-----------|------------------|
|   | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<dttm>` | `<dttm>` | `<chr>` | `<dbl>` | `<chr>` |
| 1 | es0118a | no | ESCUELAS AGUIRRE | 40.42167 | -3.682222 | 672 | spain | traffic | urban | 2002-11-19 | 2021-02-15 00:00:00 | Madrid | 3266128 | ESCUELAS AGUIRRE |
| 2 | es0118a | no2 | ESCUELAS AGUIRRE | 40.42167 | -3.682222 | 672 | spain | traffic | urban | 2002-11-19 | 2021-02-15 00:00:00 | Madrid | 3266128 | ESCUELAS AGUIRRE |
| 3 | es0118a | o3 | ESCUELAS AGUIRRE | 40.42167 | -3.682222 | 672 | spain | traffic | urban | 2002-11-19 | 2021-02-15 00:00:00 | Madrid | 3266128 | ESCUELAS AGUIRRE |
| 4 | es0118a | pm10 | ESCUELAS AGUIRRE | 40.42167 | -3.682222 | 672 | spain | traffic | urban | 2002-11-19 | 2021-02-15 00:00:00 | Madrid | 3266128 | ESCUELAS AGUIRRE |
| 5 | es0118a | pm2.5 | ESCUELAS AGUIRRE | 40.42167 | -3.682222 | 672 | spain | traffic | urban | 2002-11-19 | 2021-02-15 00:00:00 | Madrid | 3266128 | ESCUELAS AGUIRRE |
| 6 | es1340a | no | POZO DULCE | 37.25336 | -6.93514 | 18 | spain | traffic | urban | 1997-01-01 | 2021-02-15 01:00:00 | Huelva | 143663 | POZO DULCE |


### Save Data to CSV <a id="saveAQ"></a>

```R
write.csv(checked_sitesAQ, 
          "data/Curation/checked_AQ.csv", row.names=FALSE)
```