# Summary of the relevant information provided by each station

The goal of this code is to obtain a summary table with the many-to-many (n-n) relationship between air-quality stations and meteorological stations, which also shows how much relevant information each station provides. From that table we expect to decide which stations contribute the most data to the study.

**Because this involves a large amount of data and running everything would take too long, this notebook works with only 3 air-quality stations and never reads from or writes to the data files (the code that does so is kept in Markdown format).**
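The n-n relationship pairs each air-quality station with several nearby meteorological stations, and each pair carries a distance. A minimal sketch of how such a table could be built, assuming great-circle (haversine) distances in km and a fixed number `k` of neighbours per air-quality station; the notebook itself reads `dist` from `sitesMto.csv` and does not show how it was computed. `haversine_km`, `nn_pairs` and the coordinates are hypothetical.

```r
# Great-circle (haversine) distance in km; inputs in decimal degrees
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  h <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(pmin(1, h)))
}

# Hypothetical n-n table: every AQ station paired with its k nearest Mto stations
nn_pairs <- function(aq, mto, k = 3) {
  rows <- lapply(seq_len(nrow(aq)), function(i) {
    d <- haversine_km(aq$latitude[i], aq$longitude[i],
                      mto$latitude, mto$longitude)
    idx <- order(d)[seq_len(min(k, nrow(mto)))]
    data.frame(siteAQ = aq$site[i], siteMto = mto$code[idx], dist = d[idx])
  })
  do.call(rbind, rows)
}

# Hypothetical coordinates, for illustration only
aq  <- data.frame(site = c("aq1", "aq2"),
                  latitude = c(43.26, 42.34), longitude = c(-2.93, -7.86))
mto <- data.frame(code = c("m1", "m2", "m3"),
                  latitude = c(43.30, 42.85, 43.43),
                  longitude = c(-2.91, -2.72, -3.82))

pairs <- nn_pairs(aq, mto, k = 2)
print(pairs)
```

Because one meteorological station can be among the nearest of several air-quality stations, the same `siteMto` may appear in more than one pair, which is what makes the relationship n-n rather than 1-n.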

> [Notebook configuration](#config)
>
> [AIR-QUALITY STATIONS](#sitesAQ)
>
>    * [Stations in Spain](#espanha)
>
>         1) [Geographic data of the stations](#estacionGeo)
>
>         2) [`saqgetr` station data](#estacionesSaqgetr)
>
>    * [Study of the station data](#stdio)
>
>    * [Save data to csv](#saveAQ)
>
> [AEMET STATIONS](#sitesMto)
>
>    * [Save data to csv](#saveMto)
>
> [RELEVANT INFORMATION](#countdata)
>
>    * [Get the air-quality data](#getAQ)
>
>    * [Count air-quality values](#countAQ)
>
>    * [Count meteorological values](#countMto)
>
>    * [Group all the main data](#groupAll)


### Notebook configuration <a id="config"></a>

We load all the packages needed to run the notebook.

In [41]:
# Loading
suppressMessages(library(saqgetr))
suppressMessages(library(tidyverse))
suppressMessages(library(lubridate))
suppressMessages(library(worldmet))
suppressMessages(library(openxlsx))

## Data parameters <a id="param"></a>

In [42]:
# pollutants to study
pollutants <- c("no", "no2", "o3", "pm10", "pm2.5")

# start and end dates of the data collection
start_dt <- ymd_hms("2010-01-01 00:00:00")
end_dt <- ymd_hms("2020-10-31 00:00:00")

# Data files
# File paths are disabled (NULL) in this notebook
sitesAQ.fl <- NULL   # "../data/csv/sitesAQ.csv"
dataAQ.fl <- NULL    # "../data/csv/dataAQ.csv"

sitesMto.fl <- NULL  # "../data/csv/sitesMto.csv"
dataMto.fl <- NULL   # "../data/csv/dataMto.csv"
years <- 2010:2020

final.fl <- NULL     # "../data/csv/nn_sites.csv"

# Save the air-quality data
save.data <- FALSE

# Summary table of the filters applied (one element per sheet of the final xlsx)
resum.table <- list()

## AIR-QUALITY STATIONS <a id="sitesAQ"></a>

### Stations in Spain <a id="espanha"></a>

#### Geographic data of the stations <a id="estacionGeo"></a>

We select the stations located in a town with a population of at least one hundred thousand inhabitants (population ≥ 100 000 inhabitants).

Sheet 8, `ciudades-100000-A`, of the population workbook (`estaciones-CA-JA.xlsx`) lists every traffic station in the cities with at least 100 000 inhabitants, together with its name.

In [43]:
file <- "../data/xlsx/estaciones-CA-JA.xlsx"
sheets <- c("todas", "traffic", "traffic-urban", "traffic-urban-2020",
            "traffic-suburban", "traffic-suburban-2020",
            "ciudades-100000", "ciudades-100000-A")
# Sheet 8 ("ciudades-100000-A"): traffic stations of the cities with >= 100 000 inhabitants
sites.100mil <- read.xlsx(file, sheet = sheets[8])

In [44]:
resum.table[[sheets[8]]] <- sites.100mil

In [45]:
head(sites.100mil)

| | Municipio | Población | Estación.tráfico | Código.estación | Nº.estaciones.Ecolog | Nº.estaciones.tráfico | Observaciones |
|---|---|---|---|---|---|---|---|
| | `<chr>` | `<dbl>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<chr>` |
| 1 | A Coruña | 245711 | CORLAB 1 | es1138a | 4 | 1 | Es Riazor o Santa Margarida, revisar. Datos sólo hasta 2018. REVISAR |
| 2 | Alcalá de Henares | 195649 | Alcalá de Henares | es1563a | 0 | 1 | |
| 3 | Alcobendas | 117040 | Alcobendas | es1564a | 0 | 1 | |
| 4 | Alicante | 334887 | Florida-Babel | es1915a | 3 | 1 | Urban background. Not traffic |
| 5 | Alicante | 334888 | ALACANT-EL PLÁ | es1635a | 3 | 1 | Esta es de tráfico |
| 6 | Almería | 198533 | MEDITERRÁNEO | es1393a | 0 | 1 | |


#### `saqgetr` station data <a id="estacionesSaqgetr"></a>

We import the information on Spain's air-quality stations from the `saqgetr` database and filter it according to the study criteria:

| Variable | Values |
|-------|-----------|
|Pollutants| $NO$, $NO_2$, $O_3$, $PM_{10}$, $PM_{2.5}$|
|Start date| 1 January 2010 |
|End date | 31 October 2020 |
| Site type | traffic |
| Site area | urban |
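The date criteria reduce to a row filter: besides matching type and area, a station must have started measuring on or before the start date and still be active on the end date. A base-R sketch on hypothetical metadata; the column names mirror those of `get_saq_sites()` as used below, and `study_start`, `study_end` and `meta` are illustrative names.

```r
# Study window (same values as start_dt / end_dt above)
study_start <- as.POSIXct("2010-01-01 00:00:00", tz = "UTC")
study_end   <- as.POSIXct("2020-10-31 00:00:00", tz = "UTC")

# Hypothetical station metadata
meta <- data.frame(
  site       = c("s1", "s2", "s3"),
  site_type  = c("traffic", "traffic", "background"),
  site_area  = c("urban", "urban", "urban"),
  date_start = as.POSIXct(c("2005-01-01", "2012-06-01", "2001-01-01"), tz = "UTC"),
  date_end   = as.POSIXct(c("2021-01-01", "2021-01-01", "2021-01-01"), tz = "UTC")
)

# Keep urban traffic stations whose record covers the whole window
kept <- subset(meta,
               site_type == "traffic" & site_area == "urban" &
                 date_start <= study_start & date_end >= study_end)
kept$site  # only "s1": s2 starts too late, s3 is not a traffic station
```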

In [46]:
# Get the air-quality stations of Spain; this yields the codes of the
# air-quality stations (941)
spain.sites <- get_saq_sites() %>%
    filter(country == "spain",
           site %in% sites.100mil$`Código.estación`,
           site_type == "traffic",
           site_area == "urban",
           date_start <= start_dt,
           date_end >= end_dt) %>%
    select(site, site_name, latitude, longitude, elevation,
           country, site_type, site_area, date_start, date_end)

In [47]:
removed.sites <- (nrow(sites.100mil) - 1) - nrow(spain.sites)

print(paste("Removed", removed.sites, "stations"))
print(paste(nrow(spain.sites), "stations remain for the study"))

[1] "Removed 13 stations"
[1] "65 stations remain for the study"


We gather all the relevant information about the study stations (their measurement processes for each pollutant) into a single data.frame.

In [8]:
# Measurement processes of the selected stations for the pollutants of interest
sites.info <- get_saq_processes() %>%
    filter(site %in% spain.sites$site,
           variable %in% pollutants,
           date_start <= start_dt
           # , date_end >= end_dt
           ) %>%
    select(process, site, variable, variable_long,
           period, unit, observation_count, date_start, date_end)

In [9]:
head(sites.info)

| process | site | variable | variable_long | period | unit | observation_count | date_start | date_end |
|---|---|---|---|---|---|---|---|---|
| `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dttm>` | `<dttm>` |
| 31140 | es0041a | no2 | Nitrogen dioxide (air) | day | ug.m-3 | 151 | 1987-04-01 | 1987-08-31 00:00:00 |
| 31281 | es0110a | o3 | Ozone (air) | day | ug.m-3 | 4092 | 1997-01-01 | 2009-12-31 00:00:00 |
| 31282 | es0110a | o3 | Ozone (air) | dymax | ug.m-3 | 4095 | 1997-01-01 | 2009-12-31 00:00:00 |
| 31283 | es0110a | o3 | Ozone (air) | hour | ug.m-3 | 94149 | 1997-01-01 | 2009-12-31 23:00:00 |
| 31284 | es0110a | o3 | Ozone (air) | hour8 | ug.m-3 | 97891 | 1997-01-01 | 2009-12-31 23:00:00 |
| 31285 | es0110a | no2 | Nitrogen dioxide (air) | day | ug.m-3 | 5324 | 1997-01-10 | 2011-12-31 00:00:00 |


In [10]:
# Preferred temporal resolution for each station and pollutant, finest first.
# saqgetr labels the daily maximum as "dymax" (see the sites.info output above).
periods <- c("hour", "hour8", "day", "dymax")

a <- data.frame(site = levels(as.factor(sites.info$site)))

for (st in a$site) {
    for (pll in levels(as.factor(sites.info$variable))) {
        # Processes of station `st` for pollutant `pll`
        proc <- sites.info[sites.info$site == st & sites.info$variable == pll, ]
        # First (finest) resolution available; skip the pollutant if none is
        res <- periods[periods %in% proc$period][1]
        if (is.na(res)) next

        df <- proc[proc$period == res, ]
        a[a$site == st, paste0(pll, ".resolucion")] <- res
        a[a$site == st, paste0(pll, ".count")]      <- max(df$observation_count)
        a[a$site == st, paste0(pll, ".start_dt")]   <- date(min(df$date_start))
        a[a$site == st, paste0(pll, ".end_dt")]     <- date(max(df$date_end))
    }
}
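The loop above keeps, for each station and pollutant, the finest temporal resolution available. Isolated as a small helper (the name `pick_resolution` is hypothetical), the rule is easy to test on its own; the period codes are the ones that appear in `sites.info$period`.

```r
# Resolutions in order of preference, as they appear in sites.info$period
priority <- c("hour", "hour8", "day", "dymax")

# First preferred resolution present in `available`, or NA if none is
pick_resolution <- function(available, priority) {
  hit <- priority[priority %in% available]
  if (length(hit) == 0) NA_character_ else hit[1]
}

pick_resolution(c("day", "dymax", "hour"), priority)  # "hour"
pick_resolution(c("dymax", "day"), priority)          # "day"
pick_resolution("year", priority)                     # NA
```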

In [11]:
sites.geo <- sites.100mil %>%
    select("Municipio", "Población",
           "Estación.tráfico", "Código.estación") %>%
    rename(site = "Código.estación")

In [12]:
head(sites.geo)

| | Municipio | Población | Estación.tráfico | site |
|---|---|---|---|---|
| | `<chr>` | `<dbl>` | `<chr>` | `<chr>` |
| 1 | A Coruña | 245711 | CORLAB 1 | es1138a |
| 2 | Alcalá de Henares | 195649 | Alcalá de Henares | es1563a |
| 3 | Alcobendas | 117040 | Alcobendas | es1564a |
| 4 | Alicante | 334887 | Florida-Babel | es1915a |
| 5 | Alicante | 334888 | ALACANT-EL PLÁ | es1635a |
| 6 | Almería | 198533 | MEDITERRÁNEO | es1393a |


In [13]:
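# Add municipality, population and station name to every station in `a`
# (all.y = TRUE keeps the stations without a match in the spreadsheet)
sitesAQ <- merge(x = sites.geo, y = a, by = "site", all.y = TRUE)
# Move Municipio and Población to the first columns
sitesAQ <- cbind(sitesAQ[, c("Municipio", "Población")],
                 sitesAQ[, -which(names(sitesAQ) %in% c("Municipio", "Población"))])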

In [14]:
head(sitesAQ[order(sitesAQ$Municipio),])

| | Municipio | Población | site | Estación.tráfico | no2.resolucion | no2.count | no2.start_dt | no2.end_dt | no.resolucion | no.count | ⋯ | o3.start_dt | o3.end_dt | pm10.resolucion | pm10.count | pm10.start_dt | pm10.end_dt | pm2.5.resolucion | pm2.5.count | pm2.5.start_dt | pm2.5.end_dt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<date>` | `<date>` | `<chr>` | `<dbl>` | ⋯ | `<date>` | `<date>` | `<chr>` | `<dbl>` | `<date>` | `<date>` | `<chr>` | `<dbl>` | `<date>` | `<date>` |
| 36 | Alcalá de Henares | 195649 | es1563a | Alcalá de Henares | hour | 103707 | 2001-01-01 | 2012-12-31 | hour | 103741 | ⋯ | 2001-01-01 | 2012-12-31 | hour | 103545 | 2001-01-02 | 2012-12-31 | | | | |
| 37 | Alcobendas | 117040 | es1564a | Alcobendas | hour | 103884 | 2001-01-01 | 2012-12-31 | hour | 103888 | ⋯ | 2001-01-01 | 2012-12-31 | hour | 103336 | 2001-01-02 | 2012-12-31 | | | | |
| 46 | Alicante | 334888 | es1635a | ALACANT-EL PLÁ | hour | 73990 | 2003-01-01 | 2012-12-31 | hour | 73989 | ⋯ | 2002-09-10 | 2012-12-31 | day | 2621 | 2003-01-01 | 2012-12-31 | hour | 8483.0 | 2009-01-01 | 2009-12-31 |
| 25 | Almería | 198533 | es1393a | MEDITERRÁNEO | hour | 65734 | 2005-01-01 | 2012-12-31 | hour | 65722 | ⋯ | 2002-01-01 | 2012-12-31 | hour | 16540 | 2005-01-01 | 2006-12-31 | day | 462.0 | 2009-01-05 | 2012-12-15 |
| 27 | Barcelona | 1636763 | es1438a | Barcelona (l'Eixample) | hour | 101289 | 1997-01-01 | 2012-12-31 | hour | 83143 | ⋯ | 1997-01-01 | 2012-12-31 | day | 1673 | 2002-01-16 | 2012-12-31 | | | | |
| 29 | Barcelona | 1636762 | es1480a | Gràcia-Sant Gervasi | hour | 116848 | 1997-11-04 | 2012-12-31 | hour | 104616 | ⋯ | 1997-11-04 | 2012-12-31 | day | 1967 | 2002-01-02 | 2012-12-31 | | | | |


In [15]:
resum.table[["traffic-urban-polutants"]] <- sitesAQ

## Station summary

In [16]:
sitesMto <- read.csv("../data/csv/sitesMto.csv") %>%
                select(station, code, dist, siteAQ)

In [17]:
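# Flag the meteorological stations assigned to more than one AQ station:
# `Repetido` holds the index of the repeated code (NA if the code is unique)
cd <- levels(as.factor(sitesMto$code))

for (i in seq_along(cd)) {
    if (nrow(sitesMto[sitesMto$code == cd[i], ]) > 1) {
        sitesMto[sitesMto$code == cd[i], "Repetido"] <- i
    }
}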

In [18]:
# Number of rows that reuse a meteorological station already listed
length(sitesMto$code) - length(levels(as.factor(sitesMto$code)))
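The previous cell counts how many rows reuse a meteorological station already assigned to another air-quality station. A sketch of the same count with `duplicated()`, on the codes shown in the `head(sitesMto)` output below; it matches `length(levels(as.factor(x)))` as long as the column has no `NA`s. `n_repeated_rows` and `repeated_codes` are illustrative names.

```r
# Mto codes from the head(sitesMto) output
codes <- c("080250-99999", "080800-99999", "080210-99999",
           "080250-99999", "080800-99999", "080230-99999")

# Rows whose code already appeared earlier in the vector
n_repeated_rows <- sum(duplicated(codes))  # 2

# Same number as total rows minus distinct codes
stopifnot(n_repeated_rows == length(codes) - length(unique(codes)))

# Codes used by more than one AQ station
repeated_codes <- unique(codes[duplicated(codes)])  # "080250-99999" "080800-99999"
```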

In [19]:
head(sitesMto)

| | station | code | dist | siteAQ | Repetido |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<dbl>` | `<chr>` | `<int>` |
| 1 | BILBAO | 080250-99999 | 5.464009 | es0041a | 10 |
| 2 | VITORIA | 080800-99999 | 45.503738 | es0041a | 20 |
| 3 | SANTANDER | 080210-99999 | 73.13704 | es0041a | 8 |
| 4 | BILBAO | 080250-99999 | 5.36354 | es0110a | 10 |
| 5 | VITORIA | 080800-99999 | 50.995824 | es0110a | 20 |
| 6 | SANTANDER | 080230-99999 | 69.437339 | es0110a | 9 |


In [20]:
# Add municipality, population and station name to every AQ-Mto pair
resum.Mto <- merge(x = sitesAQ %>%
                       select(Municipio, Población, site, Estación.tráfico),
                   y = sitesMto,
                   by.x = "site", by.y = "siteAQ", all.y = TRUE)

# Move Municipio and Población to the first columns
resum.Mto <- cbind(resum.Mto[, c("Municipio", "Población")],
                   resum.Mto[, -which(names(resum.Mto) %in% c("Municipio", "Población"))])

In [21]:
head(resum.Mto)

| | Municipio | Población | site | Estación.tráfico | station | code | dist | Repetido |
|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<int>` |
| 1 | Bilbao | 346843 | es0041a | María Diaz de Haro | BILBAO | 080250-99999 | 5.464009 | 10 |
| 2 | Bilbao | 346843 | es0041a | María Diaz de Haro | VITORIA | 080800-99999 | 45.503738 | 20 |
| 3 | Bilbao | 346843 | es0041a | María Diaz de Haro | SANTANDER | 080210-99999 | 73.13704 | 8 |
| 4 | Bilbao | 24350 | es0110a | Erandio | BILBAO | 080250-99999 | 5.36354 | 10 |
| 5 | Bilbao | 24350 | es0110a | Erandio | VITORIA | 080800-99999 | 50.995824 | 20 |
| 6 | Bilbao | 24350 | es0110a | Erandio | SANTANDER | 080230-99999 | 69.437339 | 9 |


In [22]:
resum.table[["estaciones meteo"]] <- resum.Mto

# N-N relationship between stations

Study of the stations according to the amount of data available, the number of missing values (`NaN`s) in the data and their relevance within the study.
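The meteorological columns of `nn_sites.csv` (`ws`, `wd`, `air_temp`, and so on) take values between 0 and 1. Assuming they are the fraction of non-missing records of each variable at each meteorological station (the notebook that produces the file is not shown here), they could be computed like this on a hypothetical hourly data frame `met`:

```r
# Hypothetical hourly records from one meteorological station
met <- data.frame(
  ws       = c(2.1, 3.4, NA, 1.0),
  wd       = c(180, NA, NA, 90),
  air_temp = c(12.5, 13.0, 12.8, 12.1)
)

# Fraction of non-missing values per variable: ws 0.75, wd 0.50, air_temp 1.00
completeness <- colMeans(!is.na(met))
completeness
```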

## Reading the data

In [23]:
nn.sites <- read.csv("../data/csv/nn_sites.csv", stringsAsFactors = FALSE)
head(nn.sites)

| | Municipio | siteAQ | countAQ | siteMto | dist | countMto | ws | wd | air_temp | atmos_pres | ⋯ | cl_2 | cl_3 | cl | cl_1_height | cl_2_height | cl_3_height | precip_12 | precip | precip_6 | pwc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<int>` | `<chr>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | ⋯ | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | Bilbao | es0041a | 66515 | 080250-99999 | 5.464009 | 223729 | 0.9999464 | 0.7066272 | 0.9988736 | 0.2520415 | ⋯ | 0.3538388 | 0.1144956622 | 0.6121647 | 0.5995691 | 0.352265464 | 0.1144867228 | 0.005265299 | 0.1021012 | 0.005305526 | 0.04258724 |
| 2 | Bilbao | es0041a | 66515 | 080800-99999 | 45.503738 | 242582 | 0.9998805 | 0.7156755 | 0.9980172 | 0.23106 | ⋯ | 0.33002449 | 0.0468006695 | 0.7187508 | 0.7064374 | 0.32842915 | 0.0467965471 | 0.004839601 | 0.09352714 | 0.004864335 | 0.01709937 |
| 3 | Bilbao | es0041a | 66515 | 080210-99999 | 73.13704 | 214667 | 0.9998975 | 0.7823885 | 0.9988913 | 0.2556704 | ⋯ | 0.3147433 | 0.0728896384 | 0.5931466 | 0.58035 | 0.312805415 | 0.07288498 | 0.005538811 | 0.10531195 | 0.00545496 | 0.0556024 |
| 4 | Bilbao | es0110a | 204740 | 080230-99999 | 69.437339 | 63175 | 0.9987653 | 0.9949664 | 0.9999209 | 0.9994618 | ⋯ | 0.01218837 | 0.0001424614 | 0.2994064 | 0.2641393 | 0.001060546 | 0.0001108033 | 0.079730906 | 0.30516818 | 0.146988524 | 0.09402454 |
| 5 | Bilbao | es0110a | 204740 | 080250-99999 | 5.36354 | 223729 | 0.9999464 | 0.7066272 | 0.9988736 | 0.2520415 | ⋯ | 0.3538388 | 0.1144956622 | 0.6121647 | 0.5995691 | 0.352265464 | 0.1144867228 | 0.005265299 | 0.1021012 | 0.005305526 | 0.04258724 |
| 6 | Bilbao | es0110a | 204740 | 080800-99999 | 50.995824 | 242582 | 0.9998805 | 0.7156755 | 0.9980172 | 0.23106 | ⋯ | 0.33002449 | 0.0468006695 | 0.7187508 | 0.7064374 | 0.32842915 | 0.0467965471 | 0.004839601 | 0.09352714 | 0.004864335 | 0.01709937 |


In [24]:
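# For each threshold, count how many station pairs exceed it in each column.
# Column 6 (countMto) is an absolute number of records, not a fraction,
# so in practice every pair exceeds the thresholds in that column.
df <- data.frame()
percent <- c(0.5, 0.7, 0.8, 0.9)

for (nm in names(nn.sites)[6:ncol(nn.sites)]) {
    for (i in seq_along(percent)) {
        df[i, nm] <- sum(nn.sites[, nm] > percent[i])
    }
}

row.names(df) <- paste0(">", percent * 100, "%")
head(df)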

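| | countMto | ws | wd | air_temp | atmos_pres | visibility | dew_point | RH | ceil_hgt | cl_1 | cl_2 | cl_3 | cl | cl_1_height | cl_2_height | cl_3_height | precip_12 | precip | precip_6 | pwc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` |
| >50% | 159 | 159 | 159 | 159 | 46 | 66 | 159 | 159 | 26 | 44 | 0 | 0 | 44 | 43 | 0 | 0 | 0 | 14 | 0 | 1 |
| >70% | 159 | 159 | 142 | 159 | 38 | 33 | 146 | 146 | 3 | 7 | 0 | 0 | 7 | 7 | 0 | 0 | 0 | 14 | 0 | 1 |
| >80% | 159 | 152 | 111 | 159 | 38 | 33 | 135 | 135 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 1 |
| >90% | 159 | 142 | 60 | 151 | 38 | 33 | 133 | 133 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 |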
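The double loop that builds `df` counts, for every threshold, how many station pairs are strictly above it in each column. The same table can be obtained in one vectorised step with `sapply()` and `colSums()`; a sketch on hypothetical fractions (`frac`, `thresholds` and `counts` are illustrative names):

```r
# Hypothetical completeness fractions (rows = station pairs)
frac <- data.frame(ws = c(0.99, 0.95, 0.60),
                   wd = c(0.70, 0.85, 0.40))
thresholds <- c(0.5, 0.7, 0.8, 0.9)

# One row per threshold: number of pairs strictly above it in each column
counts <- t(sapply(thresholds, function(p) colSums(frac > p)))
rownames(counts) <- paste0(">", round(thresholds * 100), "%")
counts
```

Note that a value exactly equal to the threshold (0.70 for `wd`) is not counted, because the comparison is `>`.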


In [25]:
resum.table[["Porcentaje estaciones Meteo"]] <- df

## Meteorological variables

In [26]:
# Keep the identification columns and the first seven meteorological variables (ws to RH)
nn.sites <- nn.sites[, 1:13]

In [27]:
# Drop the pairs with a value <= 0 in any column from `dist` onwards,
# i.e. a meteorological variable with no data at all (missing values are ignored)
bad <- rowSums(nn.sites[, 5:ncol(nn.sites)] <= 0, na.rm = TRUE) > 0
nn.sites <- nn.sites[!bad, ]

In [28]:
head(nn.sites)

| | Municipio | siteAQ | countAQ | siteMto | dist | countMto | ws | wd | air_temp | atmos_pres | visibility | dew_point | RH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<int>` | `<chr>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | Bilbao | es0041a | 66515 | 080250-99999 | 5.464009 | 223729 | 0.9999464 | 0.7066272 | 0.9988736 | 0.2520415 | 0.4516983 | 0.9986055 | 0.9985429 |
| 2 | Bilbao | es0041a | 66515 | 080800-99999 | 45.503738 | 242582 | 0.9998805 | 0.7156755 | 0.9980172 | 0.23106 | 0.4698288 | 0.9976132 | 0.9975266 |
| 3 | Bilbao | es0041a | 66515 | 080210-99999 | 73.13704 | 214667 | 0.9998975 | 0.7823885 | 0.9988913 | 0.2556704 | 0.4832694 | 0.9984068 | 0.9983742 |
| 4 | Bilbao | es0110a | 204740 | 080230-99999 | 69.437339 | 63175 | 0.9987653 | 0.9949664 | 0.9999209 | 0.9994618 | 0.9828888 | 0.999541 | 0.999541 |
| 5 | Bilbao | es0110a | 204740 | 080250-99999 | 5.36354 | 223729 | 0.9999464 | 0.7066272 | 0.9988736 | 0.2520415 | 0.4516983 | 0.9986055 | 0.9985429 |
| 6 | Bilbao | es0110a | 204740 | 080800-99999 | 50.995824 | 242582 | 0.9998805 | 0.7156755 | 0.9980172 | 0.23106 | 0.4698288 | 0.9976132 | 0.9975266 |


In [30]:
# Rows of resum.Mto whose (AQ site, Mto code) pair survived the filter
resum.table[["Estaciones datos ws-RH"]] <- semi_join(resum.Mto, nn.sites,
                                                     by = c("site" = "siteAQ",
                                                            "code" = "siteMto"))

## Data ≥ 80%

In [31]:
percnt <- 0.8
columns <- 7:13  # 7:ncol(nn.sites), i.e. ws to RH
print(columns)

# Number of meteorological variables with at least 80% of data in each pair
nn.sites$numVar <- as.integer(rowSums(nn.sites[, columns] >= percnt))

[1]  7  8  9 10 11 12 13
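The `numVar` column is a row-wise count of how many of the seven meteorological columns reach the threshold. `rowSums()` over a logical comparison gives that count without an explicit loop; a sketch on hypothetical values (`frac3`, `thr` and `num_var` are illustrative names):

```r
# Hypothetical completeness fractions for three station pairs
frac3 <- data.frame(ws = c(0.99, 0.95, 0.60),
                    wd = c(0.70, 0.85, 0.40),
                    RH = c(0.98, 0.80, 0.90))
thr <- 0.8

# Number of variables per pair with at least 80% of data (>=, so 0.80 counts)
num_var <- as.integer(rowSums(frac3 >= thr))  # 2 3 1
```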


In [32]:
head(nn.sites)

| | Municipio | siteAQ | countAQ | siteMto | dist | countMto | ws | wd | air_temp | atmos_pres | visibility | dew_point | RH | numVar |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<int>` | `<chr>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` |
| 1 | Bilbao | es0041a | 66515 | 080250-99999 | 5.464009 | 223729 | 0.9999464 | 0.7066272 | 0.9988736 | 0.2520415 | 0.4516983 | 0.9986055 | 0.9985429 | 4 |
| 2 | Bilbao | es0041a | 66515 | 080800-99999 | 45.503738 | 242582 | 0.9998805 | 0.7156755 | 0.9980172 | 0.23106 | 0.4698288 | 0.9976132 | 0.9975266 | 4 |
| 3 | Bilbao | es0041a | 66515 | 080210-99999 | 73.13704 | 214667 | 0.9998975 | 0.7823885 | 0.9988913 | 0.2556704 | 0.4832694 | 0.9984068 | 0.9983742 | 4 |
| 4 | Bilbao | es0110a | 204740 | 080230-99999 | 69.437339 | 63175 | 0.9987653 | 0.9949664 | 0.9999209 | 0.9994618 | 0.9828888 | 0.999541 | 0.999541 | 7 |
| 5 | Bilbao | es0110a | 204740 | 080250-99999 | 5.36354 | 223729 | 0.9999464 | 0.7066272 | 0.9988736 | 0.2520415 | 0.4516983 | 0.9986055 | 0.9985429 | 4 |
| 6 | Bilbao | es0110a | 204740 | 080800-99999 | 50.995824 | 242582 | 0.9998805 | 0.7156755 | 0.9980172 | 0.23106 | 0.4698288 | 0.9976132 | 0.9975266 | 4 |


In [33]:
# For each AQ station, keep the closest meteorological station among those
# reaching the global maximum of numVar (stations that never reach it are dropped)
dfs <- data.frame()
stations <- levels(as.factor(nn.sites$siteAQ))
max.var <- max(nn.sites$numVar)

for (st in stations) {
    cand <- nn.sites[nn.sites$siteAQ == st & nn.sites$numVar == max.var, ]
    if (nrow(cand) >= 1) {
        dfs <- rbind(dfs, cand[order(cand$dist)[1], ])
    }
}

In [34]:
head(dfs)

| | Municipio | siteAQ | countAQ | siteMto | dist | countMto | ws | wd | air_temp | atmos_pres | visibility | dew_point | RH | numVar |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<int>` | `<chr>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` |
| 4 | Bilbao | es0110a | 204740 | 080230-99999 | 69.437339 | 63175 | 0.9987653 | 0.9949664 | 0.9999209 | 0.9994618 | 0.9828888 | 0.999541 | 0.999541 | 7 |
| 26 | Ourense | es1096a | 405864 | 080440-99999 | 61.344307 | 55321 | 0.9926429 | 0.9699752 | 0.9999819 | 0.9934564 | 0.9898411 | 0.9961859 | 0.9961859 | 7 |
| 29 | Vigo | es1137a | 396299 | 080440-99999 | 25.92304 | 55321 | 0.9926429 | 0.9699752 | 0.9999819 | 0.9934564 | 0.9898411 | 0.9961859 | 0.9961859 | 7 |
| 34 | León | es1161a | 285796 | 080530-99999 | 83.016388 | 50372 | 0.9919598 | 0.9536846 | 0.9999603 | 0.9997221 | 0.9913444 | 0.9986897 | 0.9986897 | 7 |
| 40 | Lleida | es1225a | 275014 | 081710-99999 | 2.317033 | 60878 | 0.9999014 | 0.9875653 | 0.9999179 | 0.9970104 | 0.9873682 | 0.9999179 | 0.9999014 | 7 |
| 46 | Bilbao | es1244a | 302453 | 080270-99999 | 73.118721 | 31430 | 0.9997455 | 0.984728 | 0.9999364 | 0.9998409 | 0.9999364 | 0.9967547 | 0.9967229 | 7 |
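This selection keeps, for each air-quality station, the closest meteorological station among those reaching the *global* maximum of `numVar`; air-quality stations whose best candidate falls short of that maximum are dropped. A sketch of the same rule with `split()` on hypothetical pairs (`cands` and `best_per_station` are illustrative names):

```r
# Hypothetical candidate pairs
cands <- data.frame(
  siteAQ  = c("a1", "a1", "a1", "a2", "a2"),
  siteMto = c("m1", "m2", "m3", "m1", "m4"),
  dist    = c(5.5, 45.0, 69.4, 12.0, 3.0),
  numVar  = c(4, 7, 7, 4, 5)
)

best_per_station <- function(nn) {
  top <- max(nn$numVar)  # global maximum over all pairs
  picks <- lapply(split(nn, nn$siteAQ), function(g) {
    g <- g[g$numVar == top, ]
    if (nrow(g) == 0) NULL else g[which.min(g$dist), ]
  })
  do.call(rbind, picks)  # NULL entries (stations without candidates) are dropped
}

best <- best_per_station(cands)
best  # one row: a1 with m2, the closer of its two pairs with numVar == 7; a2 is dropped
```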


In [35]:
# Rows of resum.Mto whose (AQ site, Mto code) pair was selected in dfs
resum.table[["Meteo >80% ws-RH"]] <- semi_join(resum.Mto, dfs,
                                               by = c("site" = "siteAQ",
                                                      "code" = "siteMto"))

## Only one station per city

In [36]:
head(dfs)

| | Municipio | siteAQ | countAQ | siteMto | dist | countMto | ws | wd | air_temp | atmos_pres | visibility | dew_point | RH | numVar |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<int>` | `<chr>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` |
| 4 | Bilbao | es0110a | 204740 | 080230-99999 | 69.437339 | 63175 | 0.9987653 | 0.9949664 | 0.9999209 | 0.9994618 | 0.9828888 | 0.999541 | 0.999541 | 7 |
| 26 | Ourense | es1096a | 405864 | 080440-99999 | 61.344307 | 55321 | 0.9926429 | 0.9699752 | 0.9999819 | 0.9934564 | 0.9898411 | 0.9961859 | 0.9961859 | 7 |
| 29 | Vigo | es1137a | 396299 | 080440-99999 | 25.92304 | 55321 | 0.9926429 | 0.9699752 | 0.9999819 | 0.9934564 | 0.9898411 | 0.9961859 | 0.9961859 | 7 |
| 34 | León | es1161a | 285796 | 080530-99999 | 83.016388 | 50372 | 0.9919598 | 0.9536846 | 0.9999603 | 0.9997221 | 0.9913444 | 0.9986897 | 0.9986897 | 7 |
| 40 | Lleida | es1225a | 275014 | 081710-99999 | 2.317033 | 60878 | 0.9999014 | 0.9875653 | 0.9999179 | 0.9970104 | 0.9873682 | 0.9999179 | 0.9999014 | 7 |
| 46 | Bilbao | es1244a | 302453 | 080270-99999 | 73.118721 | 31430 | 0.9997455 | 0.984728 | 0.9999364 | 0.9998409 | 0.9999364 | 0.9967547 | 0.9967229 | 7 |


In [37]:
# Keep a single AQ station per municipality: the one with the most AQ records
municipios <- levels(as.factor(dfs[, "Municipio"]))
unique.cty <- data.frame()

for (mun in municipios) {
    sm.city <- dfs[dfs$Municipio == mun, ]
    unique.cty <- rbind(unique.cty, sm.city[which.max(sm.city$countAQ), ])
}

unique.cty

| | Municipio | siteAQ | countAQ | siteMto | dist | countMto | ws | wd | air_temp | atmos_pres | visibility | dew_point | RH | numVar |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<int>` | `<chr>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` |
| 137 | Alicante | es1635a | 305134 | 083590-99999 | 2.586339 | 47506 | 0.9971583 | 0.9841283 | 0.9998948 | 0.9995369 | 0.9951375 | 0.9998316 | 0.9998316 | 7 |
| 46 | Bilbao | es1244a | 302453 | 080270-99999 | 73.118721 | 31430 | 0.9997455 | 0.984728 | 0.9999364 | 0.9998409 | 0.9999364 | 0.9967547 | 0.9967229 | 7 |
| 145 | Castellón de la Plana | es1834a | 295845 | 082860-99999 | 5.553443 | 49122 | 0.9996336 | 0.9972925 | 0.9999186 | 0.9996539 | 0.9940149 | 0.9947885 | 0.9947885 | 7 |
| 155 | Elche | es1849a | 300692 | 083590-99999 | 22.447105 | 47506 | 0.9971583 | 0.9841283 | 0.9998948 | 0.9995369 | 0.9951375 | 0.9998316 | 0.9998316 | 7 |
| 63 | Gijón | es1271a | 420559 | 080140-99999 | 3.13695 | 16220 | 0.9996917 | 0.8863132 | 0.9999383 | 0.9996301 | 0.9996301 | 0.9996917 | 0.9996917 | 7 |
| 105 | Granada | es1560a | 223676 | 084170-99999 | 67.704765 | 48487 | 0.9998969 | 0.9805721 | 0.9999175 | 0.9980613 | 0.9954833 | 0.9992369 | 0.9992369 | 7 |
| 69 | Huelva | es1340a | 153259 | 083830-99999 | 3.665939 | 47958 | 0.9993327 | 0.994808 | 0.9999791 | 0.9993119 | 0.993536 | 0.999854 | 0.9998332 | 7 |
| 115 | Las Palmas de Gran Canaria | es1573a | 355339 | 600200-99999 | 87.398704 | 42552 | 0.9992715 | 0.9896127 | 0.999906 | 0.999248 | 0.9854766 | 0.9996945 | 0.999671 | 7 |
| 34 | León | es1161a | 285796 | 080530-99999 | 83.016388 | 50372 | 0.9919598 | 0.9536846 | 0.9999603 | 0.9997221 | 0.9913444 | 0.9986897 | 0.9986897 | 7 |
| 40 | Lleida | es1225a | 275014 | 081710-99999 | 2.317033 | 60878 | 0.9999014 | 0.9875653 | 0.9999179 | 0.9970104 | 0.9873682 | 0.9999179 | 0.9999014 | 7 |
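Both Bilbao stations in the earlier `dfs` output passed the previous filter; this step keeps only the one with the largest `countAQ` in each municipality. A compact equivalent of the loop using `split()` and `which.max()`, on a few rows taken from the outputs above (`sel` and `one_per_city` are illustrative names):

```r
# A few rows taken from the dfs output
sel <- data.frame(
  Municipio = c("Bilbao", "Bilbao", "Vigo"),
  siteAQ    = c("es0110a", "es1244a", "es1137a"),
  countAQ   = c(204740, 302453, 396299)
)

# For each municipality, the row with the most AQ records
one_per_city <- do.call(rbind, lapply(split(sel, sel$Municipio),
                                      function(g) g[which.max(g$countAQ), ]))
one_per_city$siteAQ  # "es1244a" "es1137a"
```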


In [38]:
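# Sort by total number of records (AQ + Mto), descending
unique.cty <- unique.cty[order(-(unique.cty$countAQ + unique.cty$countMto)), ]

# Rows of resum.Mto whose (AQ site, Mto code) pair was kept for each city
resum.table[["1-estacionXciudad"]] <- semi_join(resum.Mto, unique.cty,
                                                by = c("site" = "siteAQ",
                                                       "code" = "siteMto"))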
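Selecting rows with two separate `%in%` tests, one per column, can keep pairs that were never selected together; matching on the `(site, code)` pair avoids that. A toy illustration with hypothetical codes, using base `merge()` for the pairwise match:

```r
# Selected (AQ, Mto) pairs
selected <- data.frame(siteAQ  = c("A", "B"),
                       siteMto = c("m1", "m2"))

# All candidate pairs in a summary table
candidates <- data.frame(site = c("A", "A", "B", "B"),
                         code = c("m1", "m2", "m1", "m2"))

# Two independent %in% tests also keep the cross pairs (A, m2) and (B, m1)
loose <- candidates[candidates$code %in% selected$siteMto &
                    candidates$site %in% selected$siteAQ, ]

# Matching on the pair keeps only the selected combinations
strict <- merge(candidates, selected,
                by.x = c("site", "code"), by.y = c("siteAQ", "siteMto"))
```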

In [40]:
write.xlsx(resum.table, "../data/xlsx/resumen-3.xlsx")