We extract the data from three different sources:

* Copernicus: source of the climate data
* Information systems of the Júcar river basin authority (Cuenca Hidrográfica del Júcar): source of the river and canal data and of the Albufera data
* CEDEX (Centro de Estudios y Experimentación de Obras Públicas): source of the reservoir data for the Júcar basin

CEDEX also publishes daily data for rivers and canals. To decide which series to use, we make a short comparison between the CEDEX river data and the river data from the basin's information systems.

In [1]:
import pandas as pd

# Tables

## Date table

In [2]:
start_date = "1900-01-01"
end_date = "2024-11-15"
dates = pd.date_range(start=start_date, end=end_date, freq='D')

# Create a DataFrame with the dates
df_date = pd.DataFrame({'date': dates})

# Generate the integer identifier as yyyymmdd
df_date['date_id'] = df_date['date'].dt.strftime('%Y%m%d').astype(int)

df_date.to_csv('df_date.csv', index = False)
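
As a small illustration of the `date_id` encoding used above, the sketch below builds the key for a handful of dates and decodes it back. The name `df_demo` and the short date range are only for illustration and are not part of the pipeline.

```python
import pandas as pd

# Toy range crossing a leap day; the real table spans 1900-01-01 to 2024-11-15.
dates = pd.date_range(start="2024-02-27", end="2024-03-02", freq="D")
df_demo = pd.DataFrame({"date": dates})

# Same encoding as the date table: year, month and day packed into one integer.
df_demo["date_id"] = df_demo["date"].dt.strftime("%Y%m%d").astype(int)

# Decode the integer key back into a timestamp.
decoded = pd.to_datetime(df_demo["date_id"].astype(str), format="%Y%m%d")

# Because the year comes first, the key sorts in the same order as the date.
is_sorted = df_demo["date_id"].is_monotonic_increasing
```

Putting the year first is what makes the integer key sortable chronologically, which a `ddmmyyyy` layout would not be.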

## Júcar river basin data

https://www.chj.es/es-es/medioambiente/sistemasdeinformacion/Paginas/Sistemasdeinformacion.aspx

### Rivers and canals

In [3]:
df_rios_canales = pd.read_csv('Extracción/Cuenca hidrográfica Juca SI/F2796_Rios_y_Canales_ROEA/F2796_D2_Serie día.csv', encoding='latin1', sep=';')
df_rios_canales = df_rios_canales.rename(columns = {'Cód. CHJ' : 'id_station','Fecha' : 'date','Cantidad (hm³)' : 'quantity_hm3'})
df_rios_canales = df_rios_canales[['id_station', 'date','quantity_hm3']]
df_rios_canales['date'] = pd.to_datetime(df_rios_canales['date'], format='%d-%m-%Y %H:%M:%S')
df_rios_canales = df_rios_canales.dropna()

df_rios_canales_id = pd.read_csv('Extracción/Cuenca hidrográfica Juca SI/F2796_Rios_y_Canales_ROEA/F2796_M0_Todas.csv', encoding='UTF-8',index_col = 0)
df_rios_canales_id = df_rios_canales_id.rename(columns = {'Cód. Estación' : 'id_station','Altitud (m)':'Altitud'})
df_rios_canales_info = df_rios_canales_id[['id_station','Estación de Aforo', 'Cód. ROEA', 'Tipo',
                                           'Cód. Munic.', 'Municipio', 'Cód. SE', 'Sistema de Explotación','Altitud']]
df_rios_canales_id = df_rios_canales_id[['id_station','latitude','longitude']]

# Create a column of unique pixels (one location_id per distinct coordinate pair)
unique_pixels = df_rios_canales_id[['latitude', 'longitude']].drop_duplicates().reset_index(drop=True)
unique_pixels['location_id'] = range(len(unique_pixels))
df_rios_canales_id = pd.merge(df_rios_canales_id, unique_pixels, on=['latitude', 'longitude'], how='left')
df_rios_canales = pd.merge(df_rios_canales, df_rios_canales_id[['id_station', 'location_id']], on = 'id_station', how = 'left')
df_rios_canales_info = pd.merge(df_rios_canales_info, df_rios_canales_id[['id_station', 'location_id']], on = 'id_station', how = 'left')
df_rios_canales_info.rename(columns={
    'id_station': 'id_station',
    'Estación de Aforo': 'EstacióndeAforo',
    'Cód. ROEA': 'CodROEA',
    'Tipo': 'Tipo',
    'Cód. Munic.': 'CodMunic',
    'Municipio': 'Municipio',
    'Cód. SE': 'CodSE',
    'Sistema de Explotación': 'SistemadeExplotación',
    'Altitud': 'Altitud',
    'location_id': 'location_id'
}, inplace=True)

df_rios_canales_id = df_rios_canales_id.drop('id_station', axis = 1)
df_rios_canales = df_rios_canales.drop('id_station', axis = 1)
df_rios_canales['quantity_hm3'] = df_rios_canales['quantity_hm3'].str.replace(',','.').astype('float')
# Replace the date with its date_id key
df_rios_canales = pd.merge(df_rios_canales, df_date, on = 'date', how = 'left').drop('date', axis = 1)
df_rios_canales_id.to_csv('df_rios_canales_id.csv',index = False)
df_rios_canales.to_csv('df_rios_canales.csv',index = False)
df_rios_canales_info.to_csv('df_rios_canales_info.csv',index = False)
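
The cell above converts decimal commas with `str.replace` after loading. As a possible alternative, `pd.read_csv` accepts a `decimal` argument that parses such values while reading. The sketch below uses a made-up two-row sample with the same column headers; the station code `8001` and the values are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical sample mimicking the layout of the daily series file.
raw = (
    "Cód. CHJ;Fecha;Cantidad (hm³)\n"
    "8001;01-10-2020 00:00:00;0,125\n"
    "8001;02-10-2020 00:00:00;1,5\n"
)

# decimal="," makes the parser read "0,125" as the float 0.125.
df_demo = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")
df_demo["Fecha"] = pd.to_datetime(df_demo["Fecha"], format="%d-%m-%Y %H:%M:%S")
```

Whether this works on the real file depends on the column containing only numeric text, which is an assumption here.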

## CEDEX data

https://www.cedex.es/comunicacion/noticias/cedex-tiene-disposicion-publica-datos-anuario-aforos-desde-1912

#### Reservoirs

* ref_ceh: reservoir identifier
* fecha: date (day/month/year) on which the value was recorded
* reserva: daily reserve (hm3)
* salida: mean daily outflow (m3/s)
* tipo: measurement type identifier (1 or 2). Type 1: the reserve is measured at the end of the day, so in the balance the next day's reserve is obtained by adding the next day's inflows to the reserve and subtracting the next day's outflows: R(DAY 2) = R(DAY 1) + E(DAY 2) - S(DAY 2). Type 2: the reserve is measured at the start of the day, so the next day's reserve is obtained by adding the same day's inflows to the reserve and subtracting the same day's outflows.
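
To make the two measurement types concrete, here is a minimal sketch that rearranges the balance to recover the implied daily inflow. The function name `implied_inflow_hm3` and the sample numbers are hypothetical. The Type 2 formula, E(d) = R(d+1) - R(d) + S(d), is an assumption derived from the description above, and the outflow is converted from m3/s to hm3/day.

```python
import pandas as pd

SECONDS_PER_DAY = 86_400
M3_PER_HM3 = 1_000_000

def implied_inflow_hm3(reserva_hm3, salida_m3s, tipo):
    """Daily inflow (hm3) implied by the reserve balance, for tipo 1 or 2."""
    # Mean outflow in m3/s converted to a daily volume in hm3.
    salida_hm3 = salida_m3s * SECONDS_PER_DAY / M3_PER_HM3
    if tipo == 1:
        # End-of-day reserve: E(d) = R(d) - R(d-1) + S(d)
        delta = reserva_hm3.diff()
    elif tipo == 2:
        # Start-of-day reserve (assumed): E(d) = R(d+1) - R(d) + S(d)
        delta = reserva_hm3.shift(-1) - reserva_hm3
    else:
        raise ValueError("tipo must be 1 or 2")
    return delta + salida_hm3

# Invented three-day series for one reservoir.
reserva = pd.Series([100.0, 101.0, 100.5])
salida = pd.Series([10.0, 10.0, 10.0])

inflow_tipo1 = implied_inflow_hm3(reserva, salida, tipo=1)
inflow_tipo2 = implied_inflow_hm3(reserva, salida, tipo=2)
```

With Type 1 the first day has no inflow estimate, and with Type 2 the last day has none, because each needs a neighbouring reserve value.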

In [4]:
df_embalses = pd.read_csv('Extracción/CEDEX - CH Jucar/afliqe.csv',sep = ';')
df_embalses = df_embalses.rename(columns = {'reserva' : 'quantity_hm3','ref_ceh':'id_station','fecha' : 'date'})
df_embalses['date'] = pd.to_datetime(df_embalses['date'], format='%d/%m/%Y')
df_embalses = df_embalses[['id_station','date','quantity_hm3']]
df_embalses_id = pd.read_csv('Extracción/CEDEX - CH Jucar/Transformados/df_embalses_cedex_id.csv', index_col = 0)
df_embalses_id = df_embalses_id.rename(columns = {'ref_ceh':'id_station'})
df_embalses_id = df_embalses_id.drop('nom_embalse', axis = 1)
# Create a column of unique pixels, continuing the location_id numbering after the rivers and canals
unique_pixels = df_embalses_id[['latitude', 'longitude']].drop_duplicates().reset_index(drop=True)
unique_pixels['location_id'] = range(df_rios_canales_id['location_id'].max()+1, df_rios_canales_id['location_id'].max() + len(unique_pixels)+1)

df_embalses_id = pd.merge(df_embalses_id, unique_pixels, on=['latitude', 'longitude'], how='left')
df_embalses_info = pd.read_excel('Extracción/Cuenca hidrográfica Juca SI/F2797_Embalses_ROEA/F2797_M0_Embalses_ROEA.xlsx')
df_embalses_info = df_embalses_info.rename(columns = {'Cód. Embalse' : 'id_station'})
df_embalses_info = pd.merge(df_embalses_info, df_embalses_id[['id_station', 'location_id']], on = 'id_station')
df_embalses_info = df_embalses_info.rename(columns={
    'id_station': 'id_station',
    'Embalse': 'Embalse',
    'Cód. ROEA': 'CodROEA',
    'Cód. Presa principal': 'CodPresaprincipal',
    'Presa Principal': 'PresaPrincipal',
    'Vol. Útil (hm³)': 'VolUtil_hm3',
    'Cód. Munic.': 'CodMunic',
    'Municipio': 'Municipio',
    'Cód. Prov.': 'CodProv',
    'Provincia': 'Provincia',
    'Cód. SE': 'CodSE',
    'Sistema de Explotación': 'SistemadeExplotación',
    'Cauce': 'Cauce',
    'Cód. Masa Superf. PHJ22': 'CodMasaSuperfPHJ22',
    'Masa Superficial PHJ22': 'MasaSuperficialPHJ22',
    'location_id': 'location_id'
})
df_embalses = pd.merge(df_embalses, df_embalses_id[['id_station', 'location_id']], on = 'id_station')
df_embalses_id = df_embalses_id.drop('id_station', axis = 1)
df_embalses = df_embalses.drop('id_station', axis = 1)
# Replace the date with its date_id key
df_embalses = pd.merge(df_embalses, df_date, on = 'date', how = 'left').drop('date', axis = 1)
# Save to CSV
df_embalses_id.to_csv('df_embalses_id.csv',index = False)
df_embalses.to_csv('df_embalses.csv',index = False)
df_embalses_info.to_csv('df_embalses_info.csv',index = False)
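
Each source continues the `location_id` numbering where the previous one stopped. The sketch below wraps that pattern in a hypothetical helper, `assign_location_ids`, applied to invented coordinates. It is not part of the notebook, only a compact illustration of the offset logic.

```python
import pandas as pd

def assign_location_ids(df_coords, start):
    """Return the distinct (latitude, longitude) pairs with ids start, start+1, ..."""
    unique = (
        df_coords[["latitude", "longitude"]]
        .drop_duplicates()
        .reset_index(drop=True)
    )
    unique["location_id"] = range(start, start + len(unique))
    return unique

# Invented coordinates: two river stations share one location.
rivers = pd.DataFrame({"latitude": [39.5, 39.5, 40.1],
                       "longitude": [-0.5, -0.5, -1.2]})
reservoirs = pd.DataFrame({"latitude": [39.0, 38.7],
                           "longitude": [-0.9, -1.0]})

river_ids = assign_location_ids(rivers, start=0)
# The next source starts right after the largest id used so far.
reservoir_ids = assign_location_ids(
    reservoirs, start=river_ids["location_id"].max() + 1
)
```

Note that with this scheme the same coordinate pair appearing in two different sources would receive two different ids, since uniqueness is only checked within each source.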

## Copernicus

In [14]:
df_copernicus = pd.read_csv('Extracción/Copernicus - climate ERA5/df_copernicus_limpio.csv')

df_copernicus['date'] = pd.to_datetime(df_copernicus['date'])

df_copernicus_id = df_copernicus[['latitude', 'longitude']].drop_duplicates().reset_index(drop=True)

# Assign a unique ID to each coordinate pair, continuing after the reservoir IDs
df_copernicus_id['location_id'] = range(df_embalses_id['location_id'].max()+1, df_embalses_id['location_id'].max() + len(df_copernicus_id)+1)

df_copernicus = df_copernicus.merge(df_copernicus_id, on=['latitude', 'longitude'], how='left').drop(['latitude', 'longitude'],axis = 1)
df_copernicus = pd.merge(df_copernicus, df_date, on = 'date', how = 'left').drop('date', axis = 1)
df_copernicus.to_csv('df_copernicus.csv', index = False)
df_copernicus_id.to_csv('df_copernicus_id.csv', index = False)

## Aemet

In [15]:
df_aemet1 = pd.read_csv('Extracción/Aemet/2004-02-29_2024-09-30.csv')
df_aemet2 = pd.read_csv('Extracción/Aemet/1990-01-01_2004-02-19.csv')
df_aemet3 = pd.read_csv('Extracción/Aemet/1980-01-01_1989-12-01.csv')
df_aemet_info =  pd.read_csv('Extracción/Aemet/estaciones_aemet_id.csv')

df_aemet = pd.concat([df_aemet1, df_aemet2, df_aemet3])
df_aemet = df_aemet.drop([ 'horatmin', 'horatmax','hrMedia','hrMax', 'horaHrMax', 'horaHrMin', 'horaPresMax','horaPresMin'], axis = 1)
decimal_columns = ['tmed', 'prec', 'tmin', 'tmax',  'velmedia', 'racha',  'presMax', 'presMin', 'sol']
df_aemet['prec'] = df_aemet['prec'].replace('Ip', '0') 
df_aemet['prec'] = df_aemet['prec'].replace('Acum', '9999999999')

for col in decimal_columns:
    df_aemet[col] = df_aemet[col].str.replace(',', '.').astype(float)

df_aemet_id = df_aemet_info[['latitude', 'longitude']].drop_duplicates().reset_index(drop=True)
df_aemet_id['location_id'] = range(df_copernicus['location_id'].max()+1, df_copernicus['location_id'].max() + len(df_aemet_id)+1)
df_aemet_info = df_aemet_info.merge(df_aemet_id, on=['latitude', 'longitude'], how='left').drop(['latitude', 'longitude'],axis = 1)
df_aemet = df_aemet.merge(df_aemet_info[['indicativo', 'location_id']], on=['indicativo'], how='left').drop(['indicativo'],axis = 1)

max_value = df_aemet['prec'].max()
second_max_value = df_aemet['prec'][df_aemet['prec'] != max_value].max()
df_aemet['prec'] = df_aemet['prec'].replace(max_value, second_max_value)
df_aemet = df_aemet.rename(columns = {'fecha' : 'date'})
df_aemet['date'] = pd.to_datetime(df_aemet['date'])
# Merge with the date table to replace the date with its date_id key
df_aemet = pd.merge(df_aemet,df_date, on = 'date', how = 'left').drop('date', axis = 1)
df_aemet = df_aemet.drop(['nombre', 'provincia'],axis = 1)
df_aemet.to_csv('df_aemet.csv',index = False)
df_aemet_id.to_csv('df_aemet_id.csv',index = False)
df_aemet_info.to_csv('df_aemet_info.csv',index = False)
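
The cell above maps the `Ip` marker to zero and replaces `Acum` with a large sentinel that is later swapped for the second-largest value. One possible alternative, shown here only as a sketch on invented values, is to let `pd.to_numeric` turn any non-numeric marker into `NaN`, so those days are flagged as missing instead of being assigned a value.

```python
import pandas as pd

# Invented sample of raw precipitation strings with both text markers.
prec_raw = pd.Series(["0,0", "Ip", "12,4", "Acum", "3,1"])

# "Ip" is mapped to zero, as in the cell above.
prec = prec_raw.replace("Ip", "0")

# Convert decimal commas, then coerce anything non-numeric ("Acum") to NaN.
prec = pd.to_numeric(prec.str.replace(",", ".", regex=False), errors="coerce")

n_missing = int(prec.isna().sum())
```

This changes the meaning of the `Acum` rows relative to the notebook, so it is a design choice to weigh rather than a drop-in replacement.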

### locations_id table

In [16]:
df_rios_canales_id['Type'] = 'Rio'
df_embalses_id['Type'] = 'Embalse'
df_copernicus_id['Type'] = 'Copernicus'
df_aemet_id['Type'] = 'Aemet'

In [17]:
locations_id = pd.concat([df_rios_canales_id,df_embalses_id,df_copernicus_id,df_aemet_id])
locations_id.to_csv('locations_id.csv',index = False)

### Pixels table

In [18]:
df_copernicus_id = pd.read_csv('df_copernicus_id.csv')
df_rios_canales_id = pd.read_csv('df_rios_canales_id.csv')
df_embalses_id = pd.read_csv('df_embalses_id.csv')
df_aemet_id = pd.read_csv('df_aemet_id.csv')

##### Euclidean distance method:

In [17]:
from scipy.spatial.distance import cdist
import numpy as np

# Get the coordinates of each dataset
coords_copernicus = df_copernicus_id[['latitude', 'longitude']].to_numpy()
coords_rios_canales = df_rios_canales_id[['latitude', 'longitude']].to_numpy()
coords_embalses = df_embalses_id[['latitude', 'longitude']].to_numpy()
coords_aemet = df_aemet_id[['latitude', 'longitude']].to_numpy()

# Compute the Euclidean distances (in degrees) between each dataset and Copernicus
dist_coper_rios = cdist(coords_rios_canales, coords_copernicus, metric='euclidean')
dist_coper_embalses = cdist(coords_embalses, coords_copernicus, metric='euclidean')
dist_coper_aemet = cdist(coords_aemet, coords_copernicus, metric='euclidean')

# Find the index of the nearest Copernicus location for each point
closest_coper_rios = np.argmin(dist_coper_rios, axis=1)
closest_coper_embalses = np.argmin(dist_coper_embalses, axis=1)
closest_coper_aemet = np.argmin(dist_coper_aemet, axis=1)

# Build DataFrames linking each dataset's locations to their nearest Copernicus location
df_closest_rios = pd.DataFrame({
    'location_id_rios_canales': df_rios_canales_id['location_id'].values,  # River and canal IDs
    'location_id_copernicus': df_copernicus_id['location_id'].values[closest_coper_rios],  # Matching Copernicus IDs
    'latitude': df_copernicus_id['latitude'].values[closest_coper_rios],  # Latitudes of the nearest locations
    'longitude': df_copernicus_id['longitude'].values[closest_coper_rios]  # Longitudes of the nearest locations
})

df_closest_embalses = pd.DataFrame({
    'location_id_embalses': df_embalses_id['location_id'].values,  # Reservoir IDs
    'location_id_copernicus': df_copernicus_id['location_id'].values[closest_coper_embalses],  # Matching Copernicus IDs
    'latitude': df_copernicus_id['latitude'].values[closest_coper_embalses],  # Latitudes of the nearest locations
    'longitude': df_copernicus_id['longitude'].values[closest_coper_embalses]  # Longitudes of the nearest locations
})

df_closest_aemet = pd.DataFrame({
    'location_id_aemet': df_aemet_id['location_id'].values,  # AEMET station IDs
    'location_id_copernicus': df_copernicus_id['location_id'].values[closest_coper_aemet],  # Matching Copernicus IDs
    'latitude': df_copernicus_id['latitude'].values[closest_coper_aemet],  # Latitudes of the nearest locations
    'longitude': df_copernicus_id['longitude'].values[closest_coper_aemet]  # Longitudes of the nearest locations
})

In [18]:
df_pixeles_cercanos = pd.merge(df_closest_rios, df_closest_embalses, on = ['location_id_copernicus','latitude', 'longitude'], how = 'outer')
df_pixeles_cercanos = pd.merge(df_pixeles_cercanos, df_closest_aemet, on = ['location_id_copernicus','latitude', 'longitude'], how = 'outer')
df_pixeles_cercanos = df_pixeles_cercanos[[ 'location_id_copernicus', 'location_id_embalses', 'location_id_aemet','location_id_rios_canales']]

In [20]:
df_pixeles_cercanos.to_csv('df_pixeles_cercanos.csv',index = False) 
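
Euclidean distance on latitude/longitude treats one degree of longitude as equal to one degree of latitude, which is only approximately true away from the equator. As a hedged alternative, the sketch below does the same nearest-point matching with a great-circle (haversine) distance in NumPy. The function `haversine_km`, the toy grid and the toy stations are all illustrative assumptions, not the notebook's data.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, broadcastable arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Toy Copernicus-like grid and two toy stations.
grid = pd.DataFrame({"location_id": [10, 11, 12, 13],
                     "latitude": [39.0, 39.0, 39.5, 39.5],
                     "longitude": [-1.0, -0.5, -1.0, -0.5]})
stations = pd.DataFrame({"location_id": [0, 1],
                         "latitude": [39.1, 39.45],
                         "longitude": [-0.6, -0.95]})

# Distance matrix of shape (n_stations, n_grid) via broadcasting.
dist = haversine_km(
    stations["latitude"].to_numpy()[:, None],
    stations["longitude"].to_numpy()[:, None],
    grid["latitude"].to_numpy()[None, :],
    grid["longitude"].to_numpy()[None, :],
)
nearest = np.argmin(dist, axis=1)

matched = pd.DataFrame({
    "location_id_station": stations["location_id"].to_numpy(),
    "location_id_copernicus": grid["location_id"].to_numpy()[nearest],
})
```

On a regular grid over a small region both metrics will often pick the same pixel, so the difference matters mainly for points near pixel boundaries.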

##### Pixel area method

In [19]:
# Define the Copernicus pixel size in degrees
pixel_size = 0.5

# Compute the bounds of each Copernicus pixel
df_copernicus_id['north'] = df_copernicus_id['latitude'] + (pixel_size / 2)
df_copernicus_id['south'] = df_copernicus_id['latitude'] - (pixel_size / 2)
df_copernicus_id['east'] = df_copernicus_id['longitude'] + (pixel_size / 2)
df_copernicus_id['west'] = df_copernicus_id['longitude'] - (pixel_size / 2)

# Check whether a point lies inside a pixel
def is_within_pixel(lat, lon, pixel):
    return (
        pixel['south'] <= lat <= pixel['north'] and
        pixel['west'] <= lon <= pixel['east']
    )

# For each point, find the pixel whose area contains it (points outside every pixel are skipped)
def find_matching_pixels(df_points, df_copernicus, point_id_col):
    matches = []
    for _, point in df_points.iterrows():
        for _, pixel in df_copernicus.iterrows():
            if is_within_pixel(point['latitude'], point['longitude'], pixel):
                matches.append({
                    point_id_col: point['location_id'],
                    'location_id_copernicus': pixel['location_id'],
                    'latitude': pixel['latitude'],
                    'longitude': pixel['longitude']
                })
                break  # A point can belong to only one pixel
    return pd.DataFrame(matches)

# Apply the function to each dataset
df_closest_rios = find_matching_pixels(df_rios_canales_id, df_copernicus_id, 'location_id_rios_canales')
df_closest_embalses = find_matching_pixels(df_embalses_id, df_copernicus_id, 'location_id_embalses')
df_closest_aemet = find_matching_pixels(df_aemet_id, df_copernicus_id, 'location_id_aemet')

In [20]:
df_pixeles_cercanos = pd.merge(df_closest_rios, df_closest_embalses, on = ['location_id_copernicus','latitude', 'longitude'], how = 'outer')
df_pixeles_cercanos = pd.merge(df_pixeles_cercanos, df_closest_aemet, on = ['location_id_copernicus','latitude', 'longitude'], how = 'outer')
df_pixeles_cercanos = df_pixeles_cercanos[[ 'location_id_copernicus', 'location_id_embalses', 'location_id_aemet','location_id_rios_canales']]
df_pixeles_cercanos.to_csv('df_pixeles_cercanos.csv',index = False) 
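
The nested `iterrows` loops above compare every point with every pixel one row at a time. A vectorised variant with the same intent can be sketched with NumPy broadcasting, as below. The helper `match_points_to_pixels` and the toy data are hypothetical. Like the loop, it keeps the first pixel that contains a point and drops points that fall in no pixel.

```python
import numpy as np
import pandas as pd

def match_points_to_pixels(df_points, df_pixels, pixel_size, point_id_col):
    """Match each point to the first pixel whose square area contains it."""
    half = pixel_size / 2
    lat = df_points["latitude"].to_numpy()[:, None]
    lon = df_points["longitude"].to_numpy()[:, None]
    p_lat = df_pixels["latitude"].to_numpy()[None, :]
    p_lon = df_pixels["longitude"].to_numpy()[None, :]

    # inside[i, j] is True when point i lies within pixel j (bounds inclusive).
    inside = (np.abs(lat - p_lat) <= half) & (np.abs(lon - p_lon) <= half)
    has_match = inside.any(axis=1)
    # argmax returns the first True per row; keep only rows with a match.
    first = inside.argmax(axis=1)[has_match]

    return pd.DataFrame({
        point_id_col: df_points["location_id"].to_numpy()[has_match],
        "location_id_copernicus": df_pixels["location_id"].to_numpy()[first],
        "latitude": df_pixels["latitude"].to_numpy()[first],
        "longitude": df_pixels["longitude"].to_numpy()[first],
    })

# Toy grid with 0.5 degree pixels and three points, the last outside the grid.
pixels = pd.DataFrame({"location_id": [10, 11, 12, 13],
                       "latitude": [39.0, 39.0, 39.5, 39.5],
                       "longitude": [-1.0, -0.5, -1.0, -0.5]})
points = pd.DataFrame({"location_id": [0, 1, 2],
                       "latitude": [39.1, 39.45, 45.0],
                       "longitude": [-0.6, -0.95, 3.0]})

result = match_points_to_pixels(points, pixels, 0.5, "location_id_station")
```

The boolean matrix has one entry per point and pixel pair, so memory use grows with the product of the two table sizes.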

# Database

## SQL

In [21]:
import sqlite3
import pandas as pd
import os

With all the tables built, we now assemble a relational database with the following schema:

![title](Schema.png)

In [22]:
df_pixeles_cercanos = pd.read_csv('df_pixeles_cercanos.csv')
df_copernicus = pd.read_csv('df_copernicus.csv')
df_embalses = pd.read_csv('df_embalses.csv')
df_embalses_info = pd.read_csv('df_embalses_info.csv')
df_rios_canales = pd.read_csv('df_rios_canales.csv')
df_rios_canales_info = pd.read_csv('df_rios_canales_info.csv')
df_aemet = pd.read_csv('df_aemet.csv')
df_aemet_info = pd.read_csv('df_aemet_info.csv')
df_date = pd.read_csv('df_date.csv')
locations_id = pd.read_csv('locations_id.csv')

In [23]:
file_path = 'aguaCHJucar.db'

if os.path.exists(file_path):
    os.remove(file_path)
conn = sqlite3.connect(file_path)
cursor = conn.cursor()

# Enable foreign keys
cursor.execute("PRAGMA foreign_keys = ON;")

# Corrected SQL script
sql_script = '''
CREATE TABLE locations_id (
  latitude REAL,
  longitude REAL,
  location_id INTEGER PRIMARY KEY,
  Type TEXT
);

CREATE TABLE df_date (
  date TIMESTAMP,
  date_id INTEGER PRIMARY KEY
);

CREATE TABLE df_pixeles_cercanos (
  location_id_copernicus INTEGER,
  location_id_embalses INTEGER,
  location_id_aemet INTEGER,
  location_id_rios_canales INTEGER,
  FOREIGN KEY (location_id_copernicus) REFERENCES locations_id (location_id),
  FOREIGN KEY (location_id_embalses) REFERENCES locations_id (location_id),
  FOREIGN KEY (location_id_aemet) REFERENCES locations_id (location_id),
  FOREIGN KEY (location_id_rios_canales) REFERENCES locations_id (location_id)
);

CREATE TABLE df_copernicus (
  total_precipitation REAL,
  skin_temperature REAL,
  evaporation REAL,
  runoff REAL,
  snowfall REAL,
  soil_water_l1 REAL,
  soil_water_l2 REAL,
  soil_water_l3 REAL,
  soil_water_l4 REAL,
  high_vegetation_cover REAL,
  low_vegetation_cover REAL,
  type_high_vegetation REAL,
  type_low_vegetation REAL,
  location_id INTEGER,
  date_id INTEGER,
  FOREIGN KEY (location_id) REFERENCES locations_id (location_id),
  FOREIGN KEY (date_id) REFERENCES df_date (date_id)
);

CREATE TABLE df_embalses (
  quantity_hm3 REAL,
  location_id INTEGER,
  date_id INTEGER,
  FOREIGN KEY (location_id) REFERENCES locations_id (location_id),
  FOREIGN KEY (date_id) REFERENCES df_date (date_id)
);

CREATE TABLE df_embalses_info (
  id_station INTEGER,
  Embalse TEXT,
  CodROEA INTEGER,
  CodPresaprincipal TEXT,
  PresaPrincipal TEXT,
  VolUtil_hm3 REAL,
  CodMunic INTEGER,
  Municipio TEXT,
  CodProv INTEGER,
  Provincia TEXT,
  CodSE INTEGER,
  SistemadeExplotación TEXT,
  Cauce TEXT,
  CodMasaSuperfPHJ22 TEXT,
  MasaSuperficialPHJ22 TEXT,
  location_id INTEGER,
  FOREIGN KEY (location_id) REFERENCES locations_id (location_id)
);

CREATE TABLE df_rios_canales (
  quantity_hm3 REAL,
  location_id INTEGER,
  date_id INTEGER,
  FOREIGN KEY (location_id) REFERENCES locations_id (location_id),
  FOREIGN KEY (date_id) REFERENCES df_date (date_id)
);

CREATE TABLE df_rios_canales_info (
  id_station INTEGER,
  EstacióndeAforo TEXT,
  CodROEA INTEGER,
  Tipo TEXT,
  CodMunic INTEGER,
  Municipio TEXT,
  CodSE REAL,
  SistemadeExplotación TEXT,
  Altitud TEXT,
  location_id INTEGER,
  FOREIGN KEY (location_id) REFERENCES locations_id (location_id)
);

CREATE TABLE df_aemet (
  altitud INTEGER,
  tmed REAL,
  prec REAL,
  tmin REAL,
  tmax REAL,
  dir REAL,
  velmedia REAL,
  racha REAL,
  horaracha TEXT,
  hrMin REAL,
  presMax REAL,
  presMin REAL,
  sol REAL,
  location_id INTEGER,
  date_id INTEGER,
  FOREIGN KEY (location_id) REFERENCES locations_id (location_id),
  FOREIGN KEY (date_id) REFERENCES df_date (date_id)
);

CREATE TABLE df_aemet_info (
  provincia TEXT,
  altitud INTEGER,
  indicativo TEXT,
  nombre TEXT,
  indsinop REAL,
  location_id INTEGER,
  FOREIGN KEY (location_id) REFERENCES locations_id (location_id)
);
'''

# Run the SQL script
cursor.executescript(sql_script)

# Commit the changes
conn.commit()

# Close the connection
conn.close()

In [24]:
conn = sqlite3.connect('aguaCHJucar.db')
df_pixeles_cercanos.to_sql('df_pixeles_cercanos', conn, if_exists='append', index=False)
df_copernicus.to_sql('df_copernicus', conn, if_exists='append', index=False)
df_embalses.to_sql('df_embalses', conn, if_exists='append', index=False)
df_embalses_info.to_sql('df_embalses_info', conn, if_exists='append', index=False)
df_rios_canales.to_sql('df_rios_canales', conn, if_exists='append', index=False)
df_rios_canales_info.to_sql('df_rios_canales_info', conn, if_exists='append', index=False)
df_aemet.to_sql('df_aemet', conn, if_exists='append', index=False)
df_aemet_info.to_sql('df_aemet_info', conn, if_exists='append', index=False)
df_date.to_sql('df_date', conn, if_exists='append', index=False)
locations_id.to_sql('locations_id', conn, if_exists='append', index=False)
conn.close()
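
One detail worth keeping in mind: in SQLite, `PRAGMA foreign_keys` applies per connection and is off by default in standard builds. The connection opened for loading above does not set it, so the foreign keys are presumably not checked during the load, and the parent tables (`df_date`, `locations_id`) are inserted last. The sketch below, on a reduced in-memory schema with invented rows, shows what enforcement looks like when the pragma is enabled and parent rows are inserted first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Must be set on every new connection that should enforce foreign keys.
conn.execute("PRAGMA foreign_keys = ON;")

# Reduced version of the schema: one parent table and one child table.
conn.executescript("""
CREATE TABLE locations_id (
  location_id INTEGER PRIMARY KEY,
  Type TEXT
);
CREATE TABLE df_embalses (
  quantity_hm3 REAL,
  location_id INTEGER,
  FOREIGN KEY (location_id) REFERENCES locations_id (location_id)
);
""")

# Parent row first, then a child row that references it.
conn.execute("INSERT INTO locations_id VALUES (1, 'Embalse')")
conn.execute("INSERT INTO df_embalses VALUES (12.5, 1)")

# A child row pointing at a missing location is rejected.
try:
    conn.execute("INSERT INTO df_embalses VALUES (3.0, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

fk_enabled = conn.execute("PRAGMA foreign_keys").fetchone()[0]
n_rows = conn.execute("SELECT COUNT(*) FROM df_embalses").fetchone()[0]
conn.close()
```

Applied to the full database, this would mean enabling the pragma on the loading connection and writing `locations_id` and `df_date` before the tables that reference them.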

In [6]:
conn = sqlite3.connect('aguaCHJucar.db')

cursor = conn.cursor()

query = "SELECT * FROM df_copernicus JOIN df_date ON df_date.date_id = df_copernicus.date_id;"

df = pd.read_sql_query(query, conn)


conn.close()
df

Unnamed: 0,total_precipitation,skin_temperature,evaporation,runoff,snowfall,soil_water_l1,soil_water_l2,soil_water_l3,soil_water_l4,high_vegetation_cover,low_vegetation_cover,type_high_vegetation,type_low_vegetation,location_id,date_id,date,date_id.1
0,0.000000e+00,278.61860,-0.000795,0.000000,0.000000,2.488437,2.523067,2.075987,1.413376,0.369385,0.629484,19.0,1.0,176,19600101,1960-01-01,19600101
1,7.335329e-07,278.85257,-0.000642,0.000000,0.000000,2.459801,2.496525,2.078841,1.413689,0.369385,0.629484,19.0,1.0,176,19600102,1960-01-02,19600102
2,2.325780e-05,278.63947,-0.000538,0.000001,0.000002,2.439879,2.477760,2.082399,1.414040,0.369385,0.629484,19.0,1.0,176,19600103,1960-01-03,19600103
3,0.000000e+00,278.10742,-0.000742,0.000000,0.000000,2.415260,2.455587,2.083358,1.414391,0.369385,0.629484,19.0,1.0,176,19600104,1960-01-04,19600104
4,0.000000e+00,278.21744,-0.000759,0.000000,0.000000,2.384617,2.427243,2.074848,1.414734,0.369385,0.629484,19.0,1.0,176,19600105,1960-01-05,19600105
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61100,1.561467e-05,297.98180,-0.002948,0.000023,0.000000,1.572445,1.498507,1.725455,1.833000,0.459558,0.540442,19.0,1.0,203,19650724,1965-07-24,19650724
61101,6.989036e-04,297.88626,-0.002722,0.000055,0.000000,1.478015,1.478247,1.715775,1.832184,0.459558,0.540442,19.0,1.0,203,19650725,1965-07-25,19650725
61102,2.530530e-03,294.35486,-0.002550,0.000122,0.000000,1.571812,1.486924,1.719815,1.831428,0.459558,0.540442,19.0,1.0,203,19650726,1965-07-26,19650726
61103,4.566765e-04,293.97090,-0.002378,0.000040,0.000000,1.547944,1.492321,1.720400,1.830818,0.459558,0.540442,19.0,1.0,203,19650727,1965-07-27,19650727


In [234]:
df_copernicus

Unnamed: 0,total_precipitation,skin_temperature,evaporation,runoff,snowfall,soil_water_l1,soil_water_l2,soil_water_l3,soil_water_l4,high_vegetation_cover,low_vegetation_cover,type_high_vegetation,type_low_vegetation,location_id,date_id
0,0.000000e+00,278.61860,-0.000795,0.000000,0.000000,2.488437,2.523067,2.075987,1.413376,0.369385,0.629484,19.0,1.0,176,19600101
1,7.335329e-07,278.85257,-0.000642,0.000000,0.000000,2.459801,2.496525,2.078841,1.413689,0.369385,0.629484,19.0,1.0,176,19600102
2,2.325780e-05,278.63947,-0.000538,0.000001,0.000002,2.439879,2.477760,2.082399,1.414040,0.369385,0.629484,19.0,1.0,176,19600103
3,0.000000e+00,278.10742,-0.000742,0.000000,0.000000,2.415260,2.455587,2.083358,1.414391,0.369385,0.629484,19.0,1.0,176,19600104
4,0.000000e+00,278.21744,-0.000759,0.000000,0.000000,2.384617,2.427243,2.074848,1.414734,0.369385,0.629484,19.0,1.0,176,19600105
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61100,1.561467e-05,297.98180,-0.002948,0.000023,0.000000,1.572445,1.498507,1.725455,1.833000,0.459558,0.540442,19.0,1.0,203,19650724
61101,6.989036e-04,297.88626,-0.002722,0.000055,0.000000,1.478015,1.478247,1.715775,1.832184,0.459558,0.540442,19.0,1.0,203,19650725
61102,2.530530e-03,294.35486,-0.002550,0.000122,0.000000,1.571812,1.486924,1.719815,1.831428,0.459558,0.540442,19.0,1.0,203,19650726
61103,4.566765e-04,293.97090,-0.002378,0.000040,0.000000,1.547944,1.492321,1.720400,1.830818,0.459558,0.540442,19.0,1.0,203,19650727
