# Summary of the relevant information from each station

The goal of this code is to build a summary table with the many-to-many (n-n) relationship between air quality stations and meteorological stations, which also shows how much relevant information each station contributes. The table is then used to decide which stations contribute the most data to the study.

BECAUSE THE DATA SET IS LARGE AND A FULL RUN WOULD TAKE TOO LONG, THIS NOTEBOOK WORKS WITH ONLY 3 AIR QUALITY STATIONS AND NEVER WRITES TO OR READS FROM FILES (although the file-handling code is still available as Markdown blocks).

> [AIR QUALITY STATIONS](#sitesAQ)
>
>    * [Notebook configuration](#config)
>
>    * [Stations in Spain](#espanha)
>
>         1) [Geographic data of the stations](#estacionGeo)
>
>         2) [Station data from `saqgetr`](#estacionesSaqgetr)
>
>    * [Study of the station data](#stdio)
>
>    * [Save data to CSV](#saveAQ)
>
> [AEMET STATIONS](#sitesMto)
>
>    * [Save data to CSV](#saveMto)
>
> [RELEVANT INFORMATION](#countdata)
>
>    * [Get the air quality data](#getAQ)
>
>    * [Count the air quality values](#countAQ)
>
>    * [Get the meteorological data](#getMto)
>
>    * [Count the meteorological values](#countMto)
>
>    * [Group all the main data](#groupAll)


---
---

## AIR QUALITY STATIONS <a id="sitesAQ"></a>

### Notebook configuration <a id="config"></a>

We load all the packages needed to run the notebook.

In [1]:
# Loading
suppressMessages(library(saqgetr))
suppressMessages(library(tidyverse))
suppressMessages(library(lubridate))
suppressMessages(library(worldmet))
suppressMessages(library(readxl))

### Data parameters <a id="param"></a>

In [2]:
# pollutants to study
pollutants <- c("no", "no2", "o3", "pm10")

# start and end dates of the data collection period
start_dt <- ymd_hms("2015-01-01 00:00:00")
end_dt <- ymd_hms("2020-10-01 00:00:00")

# start date of the lockdown
lckdwn_strt <- ymd_hms("2020-03-14 00:00:00")

# Data files
# NO FILES ARE USED IN THIS NOTEBOOK
sitesAQ.fl <- NULL # "../data/csv/sitesAQ.csv"
dataAQ.fl <- NULL # "../data/csv/dataAQ.csv"

sitesMto.fl <- NULL # "../data/csv/sitesMto.csv"
dataMto.fl <- NULL # "../data/csv/dataMto.csv"
# years of meteorological data to download
years <- 2010:2020

final.fl <- NULL # "../data/csv/nn_sites.csv"

# Whether to save the downloaded data to files
save.data <- FALSE

### Stations in Spain <a id="espanha"></a>

#### Geographic data of the stations <a id="estacionGeo"></a>

We select the stations located in a municipality with a population of at least one hundred thousand inhabitants (`poblacion >= 100 000 hab`).

Sheet 8 (`ciudades-100000-A`) of the population workbook (`estaciones-CA-JA.xlsx`) lists, by name, all the traffic stations in cities with 100 000 inhabitants or more.
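
As a minimal sketch of the population criterion, assuming a table with one row per municipality (the column names and the `small.town` row are illustrative assumptions, not taken from the spreadsheet):

```R
# Toy table; column names and the "small.town" row are illustrative assumptions
cities <- data.frame(
    municipality = c("Bilbao", "Alcobendas", "small.town"),
    population   = c(346843, 117040, 24350),
    stringsAsFactors = FALSE
)

# Keep municipalities with at least 100 000 inhabitants
min.pop <- 100000
big.cities <- cities[cities$population >= min.pop, ]
print(big.cities$municipality)
```

In the notebook itself this selection is already done in the spreadsheet, so the cell only reads the pre-filtered sheet.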

In [3]:
file <- "../data/xlsx/estaciones-CA-JA.xlsx"
sheets <- c("todas", "traffic", "traffic-urban", "traffic-urban-2020",
            "traffic-suburban", "traffic-suburban-2020",
            "ciudades-100000", "ciudades-100000-A")
# xlsx files
sites.100mil <- read_excel(file, sheet=sheets[8])

In [4]:
head(sites.100mil)

Municipio,Población,Estación tráfico,Código estación,Nº estaciones Ecolog,Nº estaciones tráfico,Observaciones
<chr>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<chr>
A Coruña,245711,CORLAB 1,es1138a,4,1,"Es Riazor o Santa Margarida, revisar. Datos sólo hasta 2018. REVISAR"
Alcalá de Henares,195649,Alcalá de Henares,es1563a,0,1,
Alcobendas,117040,Alcobendas,es1564a,0,1,
Alicante,334887,Florida-Babel,es1915a,3,1,Urban background. Not traffic
Alicante,334888,ALACANT-EL PLÁ,es1635a,3,1,Esta es de tráfico
Almería,198533,MEDITERRÁNEO,es1393a,0,1,


#### Datos de las estaciones de `saqgetr` <a id="estacionesSaqgetr"></a>

We import the information on Spain's air quality stations from the `saqgetr` database and filter it according to the study criteria.

| Variable | Values |
|----------|--------|
| Pollutants | $NO$, $NO_2$, $O_3$, $PM_{10}$ |
| Start date | 1 January 2015 |
| End date | 1 October 2020 |
| Site type | traffic |
| Site area | urban |

In [5]:
# Get the air quality (AQ) data for Spain. This returns the codes of the
# air quality stations (941)
spain.sites <- get_saq_sites() %>%
    filter(country == "spain",
           site %in% sites.100mil$`Código estación`,
           site_type == "traffic",
           site_area == "urban",
           date_start <= start_dt,
           date_end >= end_dt) %>%
    select(site, site_name, latitude, longitude, elevation,
           country, site_type, site_area, date_start, date_end)
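
The two date conditions in the filter keep only stations whose record spans the whole study window. A minimal base R sketch of that check, using made-up site codes and dates:

```R
# Study window (same values as the notebook parameters)
start_dt <- as.POSIXct("2015-01-01 00:00:00", tz = "UTC")
end_dt   <- as.POSIXct("2020-10-01 00:00:00", tz = "UTC")

# Toy metadata; site codes and dates are made up for illustration
sites <- data.frame(
    site       = c("siteA", "siteB", "siteC"),
    date_start = as.POSIXct(c("1997-01-01", "2016-06-01", "2001-01-01"), tz = "UTC"),
    date_end   = as.POSIXct(c("2021-01-15", "2021-01-15", "2018-12-31"), tz = "UTC"),
    stringsAsFactors = FALSE
)

# A site is kept only if it started on or before start_dt
# and is still reporting on or after end_dt
covers <- sites$date_start <= start_dt & sites$date_end >= end_dt
print(sites$site[covers])
```

Here `siteB` starts too late and `siteC` ends too early, so only `siteA` passes.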

In [6]:
# ONLY 3 AIR QUALITY STATIONS ARE USED
spain.sites <- spain.sites[1:3, ]

In [7]:
removed.sites <- nrow(sites.100mil) - nrow(spain.sites)

print(paste(removed.sites, "stations have been removed"))
print(paste(nrow(spain.sites), "stations remain for the study"))

[1] "76 stations have been removed"
[1] "3 stations remain for the study"


We gather all the relevant information about the study stations into a single data.frame.

In [8]:
# Measurement processes of the selected sites for the pollutants of interest
sites.info <- get_saq_processes() %>%
    filter(site %in% spain.sites$site,
           variable %in% pollutants,
           date_start <= start_dt
           # date_end >= end_dt
          ) %>%
    select(process, site, variable, variable_long,
           period, unit, observation_count)

# Municipality, population and station name from the spreadsheet
sites.geo <- sites.100mil %>%
    select("Municipio", "Población",
           "Estación tráfico", "Código estación") %>%
    rename(site = "Código estación")

# Left joins: keep every process and attach the geographic and site metadata
sitesAQ <- merge(x = sites.info, y = sites.geo, by = "site", all.x = TRUE)
sitesAQ <- merge(x = sitesAQ, y = spain.sites, by = "site", all.x = TRUE)
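
To illustrate what `all.x = TRUE` does in these merges, here is a small sketch on toy tables (all names and values are illustrative): every row of the left table is kept, and sites without a match on the right get `NA`.

```R
# Toy tables; all values are illustrative
info <- data.frame(site = c("s1", "s1", "s2"),
                   variable = c("no2", "o3", "no2"),
                   stringsAsFactors = FALSE)
geo  <- data.frame(site = "s1",
                   city = "cityA",
                   stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of `info`; "s2" has no match, so city is NA
joined <- merge(x = info, y = geo, by = "site", all.x = TRUE)
print(joined)
```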

In [9]:
# Convert characters into factors
sitesAQ <- mutate_if(sitesAQ, is.character, as.factor)
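
The conversion matters because later cells iterate over `levels(sitesAQ$site)`, and a character column has no levels. A base R sketch of the same idea on a toy data frame (not the notebook's data):

```R
df <- data.frame(site = c("es0041a", "es0110a", "es0110a"),
                 count = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# A character column has no levels
print(is.null(levels(df$site)))

# Base R equivalent of converting every character column to a factor
is.chr <- vapply(df, is.character, logical(1))
df[is.chr] <- lapply(df[is.chr], as.factor)

# The unique station codes are now available as levels
print(levels(df$site))
```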

In [10]:
head(sitesAQ)

Unnamed: 0_level_0,site,process,variable,variable_long,period,unit,observation_count,Municipio,Población,Estación tráfico,site_name,latitude,longitude,elevation,country,site_type,site_area,date_start,date_end
Unnamed: 0_level_1,<fct>,<int>,<fct>,<fct>,<fct>,<fct>,<dbl>,<fct>,<dbl>,<fct>,<fct>,<dbl>,<dbl>,<dbl>,<fct>,<fct>,<fct>,<dttm>,<dttm>
1,es0041a,31140,no2,Nitrogen dioxide (air),day,ug.m-3,151,Bilbao,346843,María Diaz de Haro,DIRECCIÓN DE SALUD,43.25883,-2.94565,32,spain,traffic,urban,1986-04-01,2021-01-15 23:00:00
2,es0110a,31281,o3,Ozone (air),day,ug.m-3,4092,Bilbao,24350,Erandio,ERANDIO,43.30268,-2.97724,4,spain,traffic,urban,1997-01-01,2021-01-15 23:00:00
3,es0110a,31282,o3,Ozone (air),dymax,ug.m-3,4095,Bilbao,24350,Erandio,ERANDIO,43.30268,-2.97724,4,spain,traffic,urban,1997-01-01,2021-01-15 23:00:00
4,es0110a,31283,o3,Ozone (air),hour,ug.m-3,94149,Bilbao,24350,Erandio,ERANDIO,43.30268,-2.97724,4,spain,traffic,urban,1997-01-01,2021-01-15 23:00:00
5,es0110a,31284,o3,Ozone (air),hour8,ug.m-3,97891,Bilbao,24350,Erandio,ERANDIO,43.30268,-2.97724,4,spain,traffic,urban,1997-01-01,2021-01-15 23:00:00
6,es0110a,31285,no2,Nitrogen dioxide (air),day,ug.m-3,5324,Bilbao,24350,Erandio,ERANDIO,43.30268,-2.97724,4,spain,traffic,urban,1997-01-01,2021-01-15 23:00:00


### Save data to CSV <a id="saveAQ"></a>

```R
write.csv(sitesAQ, sitesAQ.fl, row.names=FALSE)
```

---
---

## AEMET STATIONS <a id="sitesMto"></a>

The meteorological stations closest to an air quality (AQ) station can be located on a map.

We get the information for the air quality stations in the study.

```R
sitesAQ <- read.csv(sitesAQ.fl)
head(sitesAQ)
```

We get the information for the 3 meteorological stations closest to each air quality station.

In [11]:
sitesMto <- data.frame()
sites.lv <- levels(sitesAQ$site)

for (i in seq_along(sites.lv)) {
    # coordinates of the current AQ station (first matching row)
    site.rows <- sitesAQ[sitesAQ$site == sites.lv[i], ]
    mto <- getMeta(lat = site.rows$latitude[1],
                   lon = site.rows$longitude[1],
                   end.year = "current",
                   n = 3, returnMap = FALSE)
    # tag each meteorological station with the AQ station it was found for
    mto$siteAQ <- sites.lv[i]
    sitesMto <- rbind(sitesMto, mto)
}
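
For intuition about the `dist` column returned for each station, the sketch below computes a great-circle distance with the haversine formula. This is an assumption made for illustration and not necessarily how `getMeta` computes its distances; `haversine.km` is a hypothetical helper name.

```R
# Great-circle distance in km (haversine formula, mean Earth radius 6371 km).
# Hypothetical helper, not part of worldmet.
haversine.km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
    to.rad <- pi / 180
    dlat <- (lat2 - lat1) * to.rad
    dlon <- (lon2 - lon1) * to.rad
    a <- sin(dlat / 2)^2 +
        cos(lat1 * to.rad) * cos(lat2 * to.rad) * sin(dlon / 2)^2
    2 * radius * asin(sqrt(a))
}

# AQ station es0041a and the BILBAO meteorological station
# (coordinates as shown in the notebook outputs)
d <- haversine.km(43.25883, -2.94565, 43.301, -2.911)
print(round(d, 1))
```

The result is a few kilometres, the same order of magnitude as the `dist` value reported for that pair.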

In [12]:
sitesMto <- mutate_if(sitesMto, is.character, as.factor)

In [13]:
head(sitesMto)

usaf,wban,station,ctry,st,call,latitude,longitude,elev(m),begin,end,code,longr,latr,dist,siteAQ
<fct>,<fct>,<fct>,<fct>,<fct>,<fct>,<dbl>,<dbl>,<dbl>,<date>,<date>,<fct>,<dbl>,<dbl>,<dbl>,<fct>
80250,99999,BILBAO,SP,,LEBB,43.301,-2.911,42.1,1973-01-01,2021-01-17,080250-99999,-0.05080653,0.755745,5.464009,es0041a
80800,99999,VITORIA,SP,,LEVT,42.883,-2.724,512.7,1973-01-01,2021-01-17,080800-99999,-0.04754277,0.7484495,45.503738,es0041a
80210,99999,SANTANDER,SP,,LEXJ,43.427,-3.82,4.9,1973-01-01,2021-01-17,080210-99999,-0.06667158,0.7579441,73.13704,es0041a
80250,99999,BILBAO,SP,,LEBB,43.301,-2.911,42.1,1973-01-01,2021-01-17,080250-99999,-0.05080653,0.755745,5.36354,es0110a
80800,99999,VITORIA,SP,,LEVT,42.883,-2.724,512.7,1973-01-01,2021-01-17,080800-99999,-0.04754277,0.7484495,50.995824,es0110a
80230,99999,SANTANDER,SP,,,43.483,-3.8,59.0,1973-01-01,2021-01-17,080230-99999,-0.06632251,0.7589215,69.437339,es0110a


### Save data to CSV <a id="saveMto"></a>

```R
write.csv(sitesMto, sitesMto.fl, row.names=FALSE)
```

---
---

## RELEVANT INFORMATION <a id="countdata"></a>

We compute the number of data points available for each station and store each station code together with that value in a table.

### Get the air quality data <a id="getAQ"></a>

```R
if (file.exists(sitesAQ.fl)) {
    sitesAQ <- read.csv(sitesAQ.fl, stringsAsFactors = TRUE)
}
```
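
Note that `file.exists()` does not accept `NULL`, so guards like this one would fail while the file paths are set to `NULL` as in this notebook. A minimal sketch of a safer check, where `has.file` is a hypothetical helper name:

```R
# Hypothetical helper: TRUE only for a non-NULL path that exists on disk
has.file <- function(path) {
    !is.null(path) && file.exists(path)
}

# Path disabled, as in this notebook
print(has.file(NULL))

# Path set and file present
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(site = "s1"), tmp, row.names = FALSE)
print(has.file(tmp))

# Clean up the temporary file
unlink(tmp)
```

With such a helper, the same cell could run unchanged whether or not the file paths are configured.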

```R
if (file.exists(dataAQ.fl)) {
    dataAQ <- read.csv(dataAQ.fl, stringsAsFactors = TRUE)
} else {
```

In [14]:
dataAQ <- get_saq_observations(site = levels(sitesAQ$site),
                             variable = pollutants,
                             valid_only = TRUE,
                             start = start_dt,
                             end = end_dt,
                             verbose = TRUE
                            )
dataAQ <- mutate_if(dataAQ, is.character, as.factor)

# Save data in a file for each station
# (`split.by.site` is a helper that is not defined in this notebook)
if (save.data) {
    print("Saving Data...")
    write.csv(dataAQ, dataAQ.fl, row.names=FALSE)

    split.by.site(dataAQ, site.lv="all",
                  folder="../data/csv/dataAQ/")
}
#}

2021-01-21 13:34:17.934 CET: Loading `air_quality_data_site_es0041a_2015.csv.gz`...

2021-01-21 13:34:18.297 CET: Loading `air_quality_data_site_es0041a_2016.csv.gz`...

2021-01-21 13:34:18.736 CET: Loading `air_quality_data_site_es0041a_2017.csv.gz`...

2021-01-21 13:34:19.359 CET: Loading `air_quality_data_site_es0041a_2018.csv.gz`...

2021-01-21 13:34:19.834 CET: Loading `air_quality_data_site_es0041a_2019.csv.gz`...

2021-01-21 13:34:20.438 CET: Loading `air_quality_data_site_es0041a_2020.csv.gz`...

2021-01-21 13:34:20.994 CET: Loading `air_quality_data_site_es0110a_2015.csv.gz`...

2021-01-21 13:34:21.955 CET: Loading `air_quality_data_site_es0110a_2016.csv.gz`...

2021-01-21 13:34:22.867 CET: Loading `air_quality_data_site_es0110a_2017.csv.gz`...

2021-01-21 13:34:23.944 CET: Loading `air_quality_data_site_es0110a_2018.csv.gz`...

2021-01-21 13:34:25.087 CET: Loading `air_quality_data_site_es0110a_2019.csv.gz`...

2021-01-21 13:34:26.009 CET: Loading `air_quality_data_site_es011

In [15]:
head(dataAQ)

date,date_end,site,variable,process,summary,validity,unit,value
<dttm>,<dttm>,<fct>,<fct>,<int>,<int>,<int>,<fct>,<dbl>
2017-01-01 01:00:00,2017-01-01 02:00:00,es0041a,pm10,115268,1,1,ug.m-3,39
2017-01-01 02:00:00,2017-01-01 03:00:00,es0041a,pm10,115268,1,1,ug.m-3,27
2017-01-01 03:00:00,2017-01-01 04:00:00,es0041a,pm10,115268,1,1,ug.m-3,18
2017-01-01 04:00:00,2017-01-01 05:00:00,es0041a,pm10,115268,1,1,ug.m-3,17
2017-01-01 05:00:00,2017-01-01 06:00:00,es0041a,pm10,115268,1,1,ug.m-3,17
2017-01-01 06:00:00,2017-01-01 07:00:00,es0041a,pm10,115268,1,1,ug.m-3,15


### Count the air quality values <a id="countAQ"></a>

In [16]:
# Count the observations of each air quality station
site.lv <- levels(sitesAQ$site)
numCount <- vapply(site.lv,
                   function(st) nrow(dataAQ[dataAQ$site == st, ]),
                   integer(1), USE.NAMES = FALSE)

nn.sitesAQ <- data.frame(siteAQ = site.lv,
                         countAQ = numCount)
rm(dataAQ, numCount)
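
As an alternative sketch, base R's `table()` gives the same kind of per-station count in one call, and also reports stations that have no observations at all. The data below is a toy example, not the notebook's `dataAQ`:

```R
# Toy observations; site codes are illustrative.
# "s3" is a known station with no observations.
obs <- data.frame(site = factor(c("s1", "s2", "s1", "s1"),
                                levels = c("s1", "s2", "s3")),
                  value = c(10, 12, 9, 11))

# table() counts rows per factor level, including levels with zero rows
counts <- as.data.frame(table(siteAQ = obs$site), responseName = "countAQ")
print(counts)
```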

In [17]:
nn.sitesAQ <- mutate_if(nn.sitesAQ, is.character, as.factor)

In [18]:
head(nn.sitesAQ)

Unnamed: 0_level_0,siteAQ,countAQ
Unnamed: 0_level_1,<fct>,<int>
1,es0041a,62431
2,es0110a,146228
3,es0115a,96260


### Get the meteorological data <a id="getMto"></a>

```R
if (file.exists(sitesMto.fl)) {
    sitesMto <- read.csv(sitesMto.fl, stringsAsFactors=TRUE)
}
```

```R
if (file.exists(dataMto.fl)) {
    dataMto <- read.csv(dataMto.fl, stringsAsFactors=TRUE)
} else {
```

In [19]:
# Get data from worldmet
dataMto <- importNOAA(code = levels(sitesMto$code),
                      year = years,
                      hourly = FALSE,
                      n.cores = 4,
                      quiet = FALSE,
                      path = NA
                     )
dataMto <- mutate_if(dataMto, is.character, as.factor)

# Save data in a file for each station
# (`split.by.site` is a helper that is not defined in this notebook)
if (save.data) {
    print("Saving Data...")
    write.csv(dataMto, dataMto.fl, row.names=FALSE)

    split.by.site(dataMto, site.lv="all",
                  folder="../data/csv/dataMto/")
}
#}

[1] "The following sites / years are missing:"
           code year date station
1  082200-99999 2010 <NA>    <NA>
2  082220-99999 2010 <NA>    <NA>
3  082200-99999 2011 <NA>    <NA>
4  082220-99999 2011 <NA>    <NA>
5  082200-99999 2012 <NA>    <NA>
6  082220-99999 2012 <NA>    <NA>
7  082200-99999 2013 <NA>    <NA>
8  082220-99999 2013 <NA>    <NA>
9  082200-99999 2014 <NA>    <NA>
10 082220-99999 2014 <NA>    <NA>
11 082200-99999 2015 <NA>    <NA>
12 082220-99999 2015 <NA>    <NA>
13 082200-99999 2016 <NA>    <NA>
14 082220-99999 2016 <NA>    <NA>


### Count the meteorological values <a id="countMto"></a>

In [20]:
# Both accumulators start with an NA placeholder row that is dropped below
countMto <- NA
codeMto <- NA
code.lv <- levels(dataMto$code)

# Columns that are not counted as meteorological variables
not.counted <- c("date", "station", "latitude", "longitude", "elev")

for (i in seq_along(code.lv)) {
    rows <- dataMto[dataMto$code == code.lv[i], ]
    codeMto <- rbind(codeMto, code.lv[i])

    # Fraction of non-missing values per column for this station
    countMto <- rbind(countMto,
                      colSums(!is.na(rows[, !colnames(rows) %in% not.counted]))
                      / nrow(rows))
    # Overwrite the "code" column with the number of rows for this station
    countMto[i+1, "code"] <- nrow(rows)
}
# Drop the placeholder row; "code" holds the row count, so rename it
countMto <- data.frame(countMto[complete.cases(countMto), ]) %>%
                rename(countMto = code)

codeMto <- data.frame(codeMto[complete.cases(codeMto), ])
colnames(codeMto) <- c("siteMto")

countMto <- cbind(codeMto, countMto)

# One row per (meteorological station, AQ station) pair with its coverage
nn.sitesMto <- merge(x = sitesMto %>%
                        rename(siteMto=code) %>%
                        select(siteMto, dist, siteAQ),
                     y = countMto,
                     by = "siteMto", all=TRUE
                    )

rm(dataMto, countMto, sitesMto, codeMto, rows, not.counted)
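
The core of this cell is a per-station coverage measure: for each variable, the share of rows that are not `NA`, plus the total row count. A compact base R sketch of that computation on toy data (codes and values are illustrative):

```R
# Toy data; codes and values are illustrative
met <- data.frame(code = c("A", "A", "A", "A", "B", "B"),
                  ws   = c(1.2, NA, 3.1, 2.0, 0.5, 0.7),
                  wd   = c(NA, NA, 180, 90, 270, NA),
                  stringsAsFactors = FALSE)
vars <- c("ws", "wd")

# Split the variable columns by station code
by.code <- split(met[vars], met$code)

# Share of non-NA values per variable, one row per station
frac <- do.call(rbind, lapply(by.code, function(d) colMeans(!is.na(d))))

coverage <- data.frame(siteMto  = rownames(frac),
                       countMto = vapply(by.code, nrow, integer(1)),
                       frac,
                       row.names = NULL)
print(coverage)
```

Station `A` has 4 rows with 3 valid `ws` values, so its `ws` coverage is 0.75.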

In [21]:
nn.sitesMto <- mutate_if(nn.sitesMto, is.character, as.factor)
head(nn.sitesMto)

Unnamed: 0_level_0,siteMto,dist,siteAQ,countMto,ws,wd,air_temp,atmos_pres,visibility,dew_point,⋯,cl_2,cl_3,cl,cl_1_height,cl_2_height,cl_3_height,pwc,precip_12,precip_6,precip
Unnamed: 0_level_1,<fct>,<dbl>,<fct>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,080210-99999,73.13704,es0041a,214667,0.9998975,0.7823885,0.9988913,0.2556704,0.4832694,0.9984068,⋯,0.3147433,0.0728896384,0.5931466,0.58035,0.312805415,0.07288498,0.0556024,0.005538811,0.00545496,0.10531195
2,080230-99999,69.437339,es0110a,63175,0.9987653,0.9949664,0.9999209,0.9994618,0.9828888,0.999541,⋯,0.01218837,0.0001424614,0.2994064,0.2641393,0.001060546,0.0001108033,0.09402454,0.079730906,0.146988524,0.30516818
3,080250-99999,5.464009,es0041a,223729,0.9999464,0.7066272,0.9988736,0.2520415,0.4516983,0.9986055,⋯,0.3538388,0.1144956622,0.6121647,0.5995691,0.352265464,0.1144867228,0.04258724,0.005265299,0.005305526,0.1021012
4,080250-99999,5.36354,es0110a,223729,0.9999464,0.7066272,0.9988736,0.2520415,0.4516983,0.9986055,⋯,0.3538388,0.1144956622,0.6121647,0.5995691,0.352265464,0.1144867228,0.04258724,0.005265299,0.005305526,0.1021012
5,080800-99999,45.503738,es0041a,242582,0.9998805,0.7156755,0.9980172,0.23106,0.4698288,0.9976132,⋯,0.33002449,0.0468006695,0.7187508,0.7064374,0.32842915,0.0467965471,0.01709937,0.004839601,0.004864335,0.09352714
6,080800-99999,50.995824,es0110a,242582,0.9998805,0.7156755,0.9980172,0.23106,0.4698288,0.9976132,⋯,0.33002449,0.0468006695,0.7187508,0.7064374,0.32842915,0.0467965471,0.01709937,0.004839601,0.004864335,0.09352714


---
---

## TABLE WITH THE N-N RELATIONSHIP <a id="groupAll"></a>

In [22]:
nn.sites <- merge(x = nn.sitesAQ,
                  y = nn.sitesMto,
                  by = "siteAQ", all = TRUE)
nn.sites <- mutate_if(nn.sites, is.character, as.factor)
head(nn.sites)

Unnamed: 0_level_0,siteAQ,countAQ,siteMto,dist,countMto,ws,wd,air_temp,atmos_pres,visibility,⋯,cl_2,cl_3,cl,cl_1_height,cl_2_height,cl_3_height,pwc,precip_12,precip_6,precip
Unnamed: 0_level_1,<fct>,<int>,<fct>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,es0041a,62431,080210-99999,73.13704,214667,0.9998975,0.7823885,0.9988913,0.2556704,0.4832694,⋯,0.3147433,0.0728896384,0.5931466,0.58035,0.312805415,0.07288498,0.0556024,0.005538811,0.00545496,0.10531195
2,es0041a,62431,080250-99999,5.464009,223729,0.9999464,0.7066272,0.9988736,0.2520415,0.4516983,⋯,0.3538388,0.1144956622,0.6121647,0.5995691,0.352265464,0.1144867228,0.04258724,0.005265299,0.005305526,0.1021012
3,es0041a,62431,080800-99999,45.503738,242582,0.9998805,0.7156755,0.9980172,0.23106,0.4698288,⋯,0.33002449,0.0468006695,0.7187508,0.7064374,0.32842915,0.0467965471,0.01709937,0.004839601,0.004864335,0.09352714
4,es0110a,146228,080230-99999,69.437339,63175,0.9987653,0.9949664,0.9999209,0.9994618,0.9828888,⋯,0.01218837,0.0001424614,0.2994064,0.2641393,0.001060546,0.0001108033,0.09402454,0.079730906,0.146988524,0.30516818
5,es0110a,146228,080250-99999,5.36354,223729,0.9999464,0.7066272,0.9988736,0.2520415,0.4516983,⋯,0.3538388,0.1144956622,0.6121647,0.5995691,0.352265464,0.1144867228,0.04258724,0.005265299,0.005305526,0.1021012
6,es0110a,146228,080800-99999,50.995824,242582,0.9998805,0.7156755,0.9980172,0.23106,0.4698288,⋯,0.33002449,0.0468006695,0.7187508,0.7064374,0.32842915,0.0467965471,0.01709937,0.004839601,0.004864335,0.09352714
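
The table is many-to-many because each air quality station is paired with several meteorological stations, and one meteorological station can serve several air quality stations. A small sketch with toy codes shows how the merge repeats rows:

```R
# Toy tables; codes, counts and distances are illustrative
aq  <- data.frame(siteAQ  = c("aq1", "aq2"),
                  countAQ = c(100L, 250L),
                  stringsAsFactors = FALSE)
mto <- data.frame(siteMto = c("m1", "m2", "m1", "m3"),
                  siteAQ  = c("aq1", "aq1", "aq2", "aq2"),
                  dist    = c(5.4, 45.5, 5.3, 69.4),
                  stringsAsFactors = FALSE)

# One output row per (siteAQ, siteMto) pair; countAQ repeats for each pair
nn <- merge(x = aq, y = mto, by = "siteAQ", all = TRUE)
print(nn)

# m1 is paired with both air quality stations
print(nn$siteAQ[nn$siteMto == "m1"])
```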


```R
write.csv(nn.sites, final.fl, row.names=FALSE)
```
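
Once the table exists, one possible way to use it is to pick a single meteorological station per air quality station. The sketch below applies a hypothetical criterion (the nearest station); it is not the author's selection rule, and a real choice would probably also weigh the coverage columns. The values are copied from the first rows of the table output.

```R
# Values copied from the notebook's table output (dist and countMto columns)
nn <- data.frame(
    siteAQ   = c("es0041a", "es0041a", "es0041a", "es0110a", "es0110a", "es0110a"),
    siteMto  = c("080210-99999", "080250-99999", "080800-99999",
                 "080230-99999", "080250-99999", "080800-99999"),
    dist     = c(73.13704, 5.464009, 45.503738, 69.437339, 5.36354, 50.995824),
    countMto = c(214667, 223729, 242582, 63175, 223729, 242582),
    stringsAsFactors = FALSE
)

# Hypothetical criterion: the nearest meteorological station per AQ station.
# Sort by station and distance, then keep the first row of each station.
nearest <- nn[order(nn$siteAQ, nn$dist), ]
nearest <- nearest[!duplicated(nearest$siteAQ), ]
print(nearest)
```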