# Temperature Data Extraction

## Abstract

This notebook summarizes how the **temperature** data for the period 2014-2022 were prepared for five Spanish cities:

* Madrid
* Barcelona
* Sevilla
* Valencia
* Bilbao

These cities were selected for their economic importance, population, and geographic distribution, which together capture reasonably well the temperature variations experienced across the Iberian Peninsula over a given period.

The temperature data were obtained from the **Copernicus Climate Data Store** website. Copernicus is an initiative of the European Commission and the European Space Agency to build an autonomous Earth observation system that monitors the environment, how environmental changes affect it, the origin of those changes, and their influence on people's lives.

The data request is made at the following link:

https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-land?tab=form


## 0. Installing and importing libraries

To read the **.grib** files returned by the **Copernicus** website, we install the following packages.

In [1]:
%pip install xarray

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip install eccodes

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [3]:
%pip install cfgrib

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [4]:
import requests
import json
import numpy as np
import datetime
import string
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt

## 1. Lectura de ficheros

We read one of the files to see what information it provides and which fields we will keep.

In [116]:
ds_dataframe = xr.open_dataset('temperature_datasets/madrid_2014_2022.grib', engine='cfgrib').to_dataframe()

In [117]:
ds_dataframe

time,step,latitude,longitude,number,surface,valid_time,t2m
2013-12-31,0 days 01:00:00,40.41,-3.71,0,0.0,2013-12-31 01:00:00,
2013-12-31,0 days 02:00:00,40.41,-3.71,0,0.0,2013-12-31 02:00:00,
2013-12-31,0 days 03:00:00,40.41,-3.71,0,0.0,2013-12-31 03:00:00,
2013-12-31,0 days 04:00:00,40.41,-3.71,0,0.0,2013-12-31 04:00:00,
2013-12-31,0 days 05:00:00,40.41,-3.71,0,0.0,2013-12-31 05:00:00,
...,...,...,...,...,...,...,...
2022-02-28,0 days 20:00:00,40.41,-3.71,0,0.0,2022-02-28 20:00:00,284.110107
2022-02-28,0 days 21:00:00,40.41,-3.71,0,0.0,2022-02-28 21:00:00,282.633789
2022-02-28,0 days 22:00:00,40.41,-3.71,0,0.0,2022-02-28 22:00:00,281.246094
2022-02-28,0 days 23:00:00,40.41,-3.71,0,0.0,2022-02-28 23:00:00,280.459717
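
In the table above, each row's `valid_time` equals its `time` plus its `step`. The following minimal sketch reproduces that relationship on a few synthetic rows (the `demo` frame is illustrative and is not part of the dataset):

```python
import pandas as pd

# Synthetic rows mimicking the index layout shown above (not real ERA5-Land data).
demo = pd.DataFrame({
    "time": pd.to_datetime(["2013-12-31", "2013-12-31", "2022-02-28"]),
    "step": pd.to_timedelta([1, 24, 23], unit="h"),
})

# valid_time is the reference time plus the step offset.
demo["valid_time"] = demo["time"] + demo["step"]
print(demo)
```

This is why a reference date of 2013-12-31 can still carry the first observation of 2014: a 24-hour step lands on 2014-01-01 00:00:00.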


To make the fields of the resulting dataframe easier to read, we write it to a .csv file and read it back, which turns the index levels into ordinary columns.

In [118]:
ds_dataframe.to_csv(r'temperature_datasets/madrid_2014_2022.csv', index=True)

In [119]:
ds_dataframe = pd.read_csv('temperature_datasets/madrid_2014_2022.csv')
ds_dataframe

,time,step,latitude,longitude,number,surface,valid_time,t2m
0,2013-12-31,0 days 01:00:00,40.41,-3.71,0,0.0,2013-12-31 01:00:00,
1,2013-12-31,0 days 02:00:00,40.41,-3.71,0,0.0,2013-12-31 02:00:00,
2,2013-12-31,0 days 03:00:00,40.41,-3.71,0,0.0,2013-12-31 03:00:00,
3,2013-12-31,0 days 04:00:00,40.41,-3.71,0,0.0,2013-12-31 04:00:00,
4,2013-12-31,0 days 05:00:00,40.41,-3.71,0,0.0,2013-12-31 05:00:00,
...,...,...,...,...,...,...,...,...
71563,2022-02-28,0 days 20:00:00,40.41,-3.71,0,0.0,2022-02-28 20:00:00,284.11010
71564,2022-02-28,0 days 21:00:00,40.41,-3.71,0,0.0,2022-02-28 21:00:00,282.63380
71565,2022-02-28,0 days 22:00:00,40.41,-3.71,0,0.0,2022-02-28 22:00:00,281.24610
71566,2022-02-28,0 days 23:00:00,40.41,-3.71,0,0.0,2022-02-28 23:00:00,280.45972
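
A CSV file stores only text, so date and duration columns come back as plain strings after this round trip. As an alternative to converting them later, the sketch below parses them while reading. It is not the approach used in this notebook, and the two CSV rows are synthetic, written in the same layout as the file above:

```python
import io
import pandas as pd

# Two synthetic rows in the same layout as the CSV above (values are illustrative).
csv_text = """time,step,latitude,longitude,number,surface,valid_time,t2m
2013-12-31,0 days 01:00:00,40.41,-3.71,0,0.0,2013-12-31 01:00:00,
2013-12-31,1 days 00:00:00,40.41,-3.71,0,0.0,2014-01-01 00:00:00,277.84985
"""

# parse_dates converts the listed columns to datetime while reading.
demo = pd.read_csv(io.StringIO(csv_text), parse_dates=["time", "valid_time"])
# step is a duration rather than a date, so it needs a separate conversion.
demo["step"] = pd.to_timedelta(demo["step"])
print(demo.dtypes)
```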


We inspect the information for each column of the dataframe.

In [120]:
ds_dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 71568 entries, 0 to 71567
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   time        71568 non-null  object 
 1   step        71568 non-null  object 
 2   latitude    71568 non-null  float64
 3   longitude   71568 non-null  float64
 4   number      71568 non-null  int64  
 5   surface     71568 non-null  float64
 6   valid_time  71568 non-null  object 
 7   t2m         71545 non-null  float64
dtypes: float64(4), int64(1), object(3)
memory usage: 4.4+ MB


We check that the latitude and longitude each take a single value.

In [121]:
ds_dataframe['latitude'].unique()

array([40.41])

In [122]:
ds_dataframe['longitude'].unique()

array([-3.71])

We observe that the temperature data start on 01/01/2014 and that the temperature values (t2m) are given in kelvin.

In [123]:
ds_dataframe.loc[ds_dataframe['time'] == '2013-12-31']

,time,step,latitude,longitude,number,surface,valid_time,t2m
0,2013-12-31,0 days 01:00:00,40.41,-3.71,0,0.0,2013-12-31 01:00:00,
1,2013-12-31,0 days 02:00:00,40.41,-3.71,0,0.0,2013-12-31 02:00:00,
2,2013-12-31,0 days 03:00:00,40.41,-3.71,0,0.0,2013-12-31 03:00:00,
3,2013-12-31,0 days 04:00:00,40.41,-3.71,0,0.0,2013-12-31 04:00:00,
4,2013-12-31,0 days 05:00:00,40.41,-3.71,0,0.0,2013-12-31 05:00:00,
5,2013-12-31,0 days 06:00:00,40.41,-3.71,0,0.0,2013-12-31 06:00:00,
6,2013-12-31,0 days 07:00:00,40.41,-3.71,0,0.0,2013-12-31 07:00:00,
7,2013-12-31,0 days 08:00:00,40.41,-3.71,0,0.0,2013-12-31 08:00:00,
8,2013-12-31,0 days 09:00:00,40.41,-3.71,0,0.0,2013-12-31 09:00:00,
9,2013-12-31,0 days 10:00:00,40.41,-3.71,0,0.0,2013-12-31 10:00:00,
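
Since `t2m` is reported in kelvin, a conversion to degrees Celsius only requires subtracting 273.15. The sketch below illustrates this on two values copied from the tables above plus one missing value; the column name `t2m_celsius` is hypothetical and is not used elsewhere in this notebook:

```python
import pandas as pd

KELVIN_OFFSET = 273.15  # 0 °C expressed in kelvin

# Sample values copied from the tables above, plus one missing reading.
demo = pd.DataFrame({"t2m": [284.110107, 282.633789, None]})

# Missing values stay missing after the subtraction.
demo["t2m_celsius"] = demo["t2m"] - KELVIN_OFFSET
print(demo)
```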


We keep only the data from 01/01/2014 onwards by dropping the rows with missing values.

In [124]:
ds_dataframe = ds_dataframe.dropna().reset_index(drop=True)
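
Before dropping rows, it can be useful to confirm where the missing values are. The sketch below, on a synthetic frame with the same pattern, counts missing `t2m` values per reference date and then drops only the rows where `t2m` is missing. The `subset` argument is an assumption about intent: in this notebook `t2m` is the only column with nulls, so a plain `dropna()` gives the same result.

```python
import pandas as pd

# Synthetic frame with the same pattern: missing t2m only on the leading day.
demo = pd.DataFrame({
    "time": ["2013-12-31", "2013-12-31", "2013-12-31", "2014-01-01"],
    "t2m": [None, None, 277.84985, 277.74854],
})

# Count missing values per reference date before dropping them.
missing_per_day = demo.loc[demo["t2m"].isna(), "time"].value_counts()
print(missing_per_day)

# Drop only rows where t2m is missing, then renumber the index.
cleaned = demo.dropna(subset=["t2m"]).reset_index(drop=True)
print(cleaned)
```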

We now split the **valid_time** column into separate dates and times. We start by converting it to a valid **datetime** type.

In [125]:
ds_dataframe['valid_time'] = pd.to_datetime(ds_dataframe['valid_time'])

From the valid_time column we can create two columns: **date** and **hour**.

In [126]:
# Extract the calendar date from each timestamp.
ds_dataframe['date'] = ds_dataframe['valid_time'].dt.date

In [127]:
# Extract the time of day from each timestamp.
ds_dataframe['hour'] = ds_dataframe['valid_time'].dt.time

In [128]:
ds_dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 71545 entries, 0 to 71544
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   time        71545 non-null  object        
 1   step        71545 non-null  object        
 2   latitude    71545 non-null  float64       
 3   longitude   71545 non-null  float64       
 4   number      71545 non-null  int64         
 5   surface     71545 non-null  float64       
 6   valid_time  71545 non-null  datetime64[ns]
 7   t2m         71545 non-null  float64       
 8   date        71545 non-null  object        
 9   hour        71545 non-null  object        
dtypes: datetime64[ns](1), float64(4), int64(1), object(4)
memory usage: 5.5+ MB


Since we only need the temperature values over time, we simplify the dataframe.

In [129]:
# Drop the columns that are no longer needed.
ds_dataframe = ds_dataframe.drop(columns=['time', 'step', 'latitude', 'longitude', 'number', 'surface', 'valid_time'])

In [130]:
ds_dataframe = ds_dataframe[['date', 'hour', 't2m']]

In [131]:
ds_dataframe

,date,hour,t2m
0,2014-01-01,00:00:00,277.84985
1,2014-01-01,01:00:00,277.74854
2,2014-01-01,02:00:00,277.80054
3,2014-01-01,03:00:00,277.86353
4,2014-01-01,04:00:00,277.95654
...,...,...,...
71540,2022-02-28,20:00:00,284.11010
71541,2022-02-28,21:00:00,282.63380
71542,2022-02-28,22:00:00,281.24610
71543,2022-02-28,23:00:00,280.45972
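
The steps in this section can be gathered into a single function so that they are applied identically to each file. The following is a minimal sketch under stated assumptions: the function name `prepare_temperature` is hypothetical, and the input is assumed to be a CSV with the same columns as the Madrid file above. It is demonstrated on two synthetic rows rather than on real data.

```python
import io
import pandas as pd

def prepare_temperature(csv_source):
    """Return a date/hour/t2m frame from a CSV laid out like the one above."""
    df = pd.read_csv(csv_source)
    # Keep only rows that carry a temperature value.
    df = df.dropna(subset=["t2m"]).reset_index(drop=True)
    # Split the timestamp into calendar date and time of day.
    valid_time = pd.to_datetime(df["valid_time"])
    df["date"] = valid_time.dt.date
    df["hour"] = valid_time.dt.time
    return df[["date", "hour", "t2m"]]

# Synthetic two-row input; real use would pass a file path instead.
sample = io.StringIO(
    "time,step,latitude,longitude,number,surface,valid_time,t2m\n"
    "2013-12-31,0 days 23:00:00,40.41,-3.71,0,0.0,2013-12-31 23:00:00,\n"
    "2013-12-31,1 days 00:00:00,40.41,-3.71,0,0.0,2014-01-01 00:00:00,277.84985\n"
)
result = prepare_temperature(sample)
print(result)
```

The same function could then be called once per city; only the Madrid file name appears in this section, so the paths for the other cities are left to the reader.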
