[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/casadoj/efas_hydro.git/HEAD?urlpath=%2Fdoc%2Ftree%2F.%2Fnotebook%2Fglofas_calibration.ipynb)

# GloFAS calibration time series

This notebook creates a CSV file with the daily discharge time series needed for the GloFAS calibration. First, it extracts the gauging stations available in the Hydrological Data Management Service (HDMS), and then it downloads the discharge time series for those stations.

As a result, it produces two ZIP files (_stations.zip_, _timeseries.zip_) that contain, respectively, the station metadata (as a shapefile and as a CSV file) and a CSV file with the discharge time series for the selected stations and period.

In [None]:
import pandas as pd
from datetime import datetime
from tqdm.auto import tqdm
from pathlib import Path
import shutil

from efashydro.stations import get_stations, plot_stations
from efashydro.timeseries import get_timeseries

## Configuration 

In the cell below, fill in `USER` and `PASSWORD` with your credentials, and define the filters for both stations and time series.

In [None]:
# HDMS API configuration
USER = 'xxxxxxxx'
PASSWORD = 'yyyyyyyy'

# station filters
KIND = 'river'
COUNTRY_ID = 'PT'
PROVIDER_ID = None

# time series filters
SERVICE = 'nhoperational24hw'
VARIABLE = ['D']
START = datetime(1980, 1, 1)
END = datetime(2024, 1, 1)
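Credentials typed directly into a notebook can end up in a shared or committed copy. As an alternative, the sketch below reads them from environment variables and only prompts when they are missing. The variable names `HDMS_USER` and `HDMS_PASSWORD` and the helper `load_credentials` are assumptions made for this example; they are not defined by `efashydro`.

```python
import os
from getpass import getpass


def load_credentials(env=None, prompt=True):
    """Return (user, password) taken from the environment.

    HDMS_USER and HDMS_PASSWORD are hypothetical variable names chosen
    for this sketch. If either is missing and `prompt` is True, the
    missing value is asked for interactively.
    """
    env = os.environ if env is None else env
    user = env.get('HDMS_USER')
    password = env.get('HDMS_PASSWORD')
    if user and password:
        return user, password
    if not prompt:
        raise RuntimeError('HDMS_USER and HDMS_PASSWORD must both be set')
    # ask only for the values that were not found in the environment
    user = user or input('HDMS user: ')
    password = password or getpass('HDMS password: ')
    return user, password
```

With this helper, the first two assignments of the configuration cell could become `USER, PASSWORD = load_credentials()`.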

## `get_stations()`

The following cell extracts the stations in the database that pass the filters `KIND`, `COUNTRY_ID` and `PROVIDER_ID` defined in the configuration section.

In [None]:
stations = get_stations(
    user=USER, 
    password=PASSWORD, 
    kind=KIND,
    country_id=COUNTRY_ID,
    provider_id=PROVIDER_ID
)
print(f'Metadata for {len(stations)} stations were extracted')

plot_stations(
    geometry=stations.geometry,
    area=stations.CATCH_SKM,
    # extent=[-10, 4.5, 35.5, 44]
)

### Export

In [None]:
path_stations = Path('./stations/')
path_stations.mkdir(parents=True, exist_ok=True)
filename = 'stations'
if COUNTRY_ID is not None:
    filename += f'_{COUNTRY_ID}'
if PROVIDER_ID is not None:
    filename += f'_{PROVIDER_ID}'

# as shapefile
stations.to_file(path_stations / f'{filename}.shp')
# as CSV
stations.drop('geometry', axis=1).to_csv(path_stations / f'{filename}.csv')

# compress the stations folder
# (named `zip_path` to avoid shadowing the standard-library module `zipfile`)
zip_path = shutil.make_archive('stations', 'zip', path_stations)
print(f'You can now download the compressed file {zip_path} from the file browser.')

The result is a `geopandas.GeoDataFrame` of stations and their metadata. As a `geopandas` object, each station carries its geographical location, so the table can be exported to a shapefile for use in GIS software.
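A shapefile is not a single file: GIS software needs at least the `.shp`, `.shx` and `.dbf` parts, and usually a `.prj` file to know the coordinate reference system. The sketch below checks that the mandatory parts made it into the archive before you download it. The helper `missing_shapefile_parts` is a hypothetical name introduced for this example.

```python
from pathlib import PurePosixPath
from zipfile import ZipFile

# mandatory components of a shapefile
REQUIRED_PARTS = ('.shp', '.shx', '.dbf')


def missing_shapefile_parts(zip_path, stem):
    """Return the mandatory extensions that are absent from the archive
    for the shapefile whose file name (without extension) is `stem`."""
    with ZipFile(zip_path) as archive:
        names = archive.namelist()
    # extensions found in the archive for files called `stem`
    present = {
        PurePosixPath(name).suffix.lower()
        for name in names
        if PurePosixPath(name).stem == stem
    }
    return [ext for ext in REQUIRED_PARTS if ext not in present]
```

In this notebook, `missing_shapefile_parts('stations.zip', filename)` should return an empty list when the export above succeeded.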

## `get_timeseries()`

The cell below extracts the time series for the stations selected above. In the configuration section, you must have set up the `SERVICE`, `VARIABLE`, and period of interest (`START`, `END`).

In [None]:
time_series = {}
for efas_id in tqdm(stations.index, desc='Load timeseries'):
    time_series[efas_id] = get_timeseries(
        user=USER,
        password=PASSWORD,
        station_id=efas_id,
        service=SERVICE,
        variable=VARIABLE, 
        start=START,
        end=END
    )
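The loop above sends one request per station, so a single failing request would interrupt the whole download (assuming `get_timeseries` raises an exception on failure, which this notebook does not confirm). The sketch below shows one way to keep going and record which stations failed. `fetch_all` is a hypothetical helper, and `fetch` stands for any function that takes a station ID and returns its time series.

```python
def fetch_all(station_ids, fetch):
    """Call `fetch(station_id)` for every station.

    Returns two dictionaries: the successful results and the errors,
    both keyed by station ID.
    """
    results, failures = {}, {}
    for station_id in station_ids:
        try:
            results[station_id] = fetch(station_id)
        except Exception as error:
            # broad on purpose: the exception types raised by the API
            # are not documented in this notebook
            failures[station_id] = error
    return results, failures
```

Here, `fetch` could be `lambda efas_id: get_timeseries(user=USER, password=PASSWORD, station_id=efas_id, service=SERVICE, variable=VARIABLE, start=START, end=END)`, and `station_ids` could be `tqdm(stations.index)` to keep the progress bar.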

The result is a dictionary of `pandas.DataFrame` objects, where each key is a station ID and each value is the time series available for that station. These data frames could be saved as individual CSV files, for instance.
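As an illustration of the per-station option, the sketch below writes one CSV file per station. `save_per_station` is a hypothetical helper. It assumes that a station without data is represented by `None` or by an empty data frame, which may not match what `get_timeseries` actually returns.

```python
from pathlib import Path


def save_per_station(time_series, out_dir):
    """Write each station's data frame to `<out_dir>/<station_id>.csv`
    and return the list of files written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for station_id, df in time_series.items():
        # assumption: stations without data come back as None or empty
        if df is None or df.empty:
            continue
        csv_path = out_dir / f'{station_id}.csv'
        df.to_csv(csv_path)
        written.append(csv_path)
    return written
```

For example, `save_per_station(time_series, './timeseries_per_station/')` would create one file per station in a separate folder, leaving the combined export below untouched.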

### Export

In [None]:
path_timeseries = Path('./timeseries/')
path_timeseries.mkdir(parents=True, exist_ok=True)
filename = 'discharge'
if COUNTRY_ID is not None:
    filename += f'_{COUNTRY_ID}'
if PROVIDER_ID is not None:
    filename += f'_{PROVIDER_ID}'

# concatenate all the time series
ts_list = []
for efas_id, df in time_series.items():
    df.columns = [efas_id]
    ts_list.append(df)
ts_df = pd.concat(ts_list, axis=1)

# make sure the time series cover the whole period (days without data become NaN)
dates = pd.date_range(START, END, freq='D')
ts_df = ts_df.reindex(dates)
# name the index in every case, so the CSV header is always the same
ts_df.index.name = 'time'

# save as CSV file
ts_df.to_csv(path_timeseries / f'{filename}.csv')

# compress the time series folder
# (named `zip_path` to avoid shadowing the standard-library module `zipfile`)
zip_path = shutil.make_archive('timeseries', 'zip', path_timeseries)
print(f'You can now download the compressed file {zip_path} from the file browser.')
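Stations with few observations end up as mostly empty columns in the exported CSV, because `pandas` writes missing values as empty cells by default. A quick way to see how much data each station contributes is sketched below using only the standard library. `station_coverage` is a hypothetical helper, and it assumes the layout produced above: dates in the first column and one column per station.

```python
import csv


def station_coverage(csv_path):
    """Return {station_id: fraction of rows with a value} for a CSV whose
    first column is the date and whose other columns are stations."""
    with open(csv_path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        station_ids = header[1:]
        counts = dict.fromkeys(station_ids, 0)
        n_rows = 0
        for row in reader:
            n_rows += 1
            for station_id, value in zip(station_ids, row[1:]):
                if value.strip():  # empty cell means missing value
                    counts[station_id] += 1
    if n_rows == 0:
        return {station_id: 0.0 for station_id in station_ids}
    return {station_id: n / n_rows for station_id, n in counts.items()}
```

Calling `station_coverage(path_timeseries / f'{filename}.csv')` returns a dictionary keyed by the station IDs as they appear in the CSV header (as strings).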