# ResOpsBR: select reservoirs and study period
***

**Author:** Chus Casado Rodríguez<br>
**Date:** 18-07-2025<br>

**Introduction:**<br>
This notebook reads all the attributes and time series in the dataset and selects the reservoirs appropriate for testing the different reservoir routines. Several conditions need to be met for a reservoir to be selected:

1. It must contain observed time series of the variables `inflow`, `storage` and `outflow`.
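2. The longest period without gaps in those three time series must span at least 4 years.
3. The bias between the observed inflow and outflow time series, computed as the ratio of mean outflow to mean inflow, must be between 0.7 and 1.3.

The thresholds for these conditions are specified in the YAML configuration file, which can also set a minimum catchment area, storage capacity and degree of regulation.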
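As an illustration of the second condition, the sketch below finds the longest run of time steps in which none of the series has a gap. It is written in plain Python to stay dependency-free, and `longest_complete_run` is a hypothetical helper: it is not the `define_period` function used later in the notebook, whose actual implementation may differ.

```python
import math
from datetime import date, timedelta
from itertools import groupby


def longest_complete_run(dates, *series):
    """Return (start, end) of the longest run of dates where no series is NaN.

    Hypothetical helper for illustration only; it is not `define_period`.
    Returns (None, None) when every date has at least one gap.
    """
    # a time step is complete when none of the series has a NaN on it
    complete = [not any(math.isnan(v) for v in values) for values in zip(*series)]
    best, best_len, pos = (None, None), 0, 0
    # walk through consecutive runs of complete / incomplete time steps
    for is_complete, group in groupby(complete):
        n = len(list(group))
        if is_complete and n > best_len:
            best, best_len = (dates[pos], dates[pos + n - 1]), n
        pos += n
    return best


nan = float('nan')
dates = [date(2000, 1, 1) + timedelta(days=i) for i in range(10)]
inflow = [1, 2, nan, 4, 5, 6, 7, nan, 9, 10]
storage = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
outflow = [1, 2, 3, 4, 5, 6, nan, 8, 9, 10]

start, end = longest_complete_run(dates, inflow, storage, outflow)
print(start, end)
```

In this toy example the gap-free runs last 2, 3 and 2 days, so the middle one (4 to 6 January 2000) is returned. A reservoir would then be kept only if the returned period were long enough.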

In [1]:
import numpy as np
import pandas as pd
from tqdm.auto import tqdm
from pathlib import Path
import pickle

from lisfloodreservoirs.utils import DatasetConfig
from lisfloodreservoirs import read_attributes, read_timeseries
from lisfloodreservoirs.utils.timeseries import define_period

## Configuration

In [2]:
cfg = DatasetConfig('config_ResOpsBR_v11.yml')

PATH_OUT = cfg.PATH_RESOPS / cfg.VERSION / 'selection'
PATH_OUT.mkdir(parents=True, exist_ok=True)
print(f'Selected reservoirs and periods will be saved in:\n\t{PATH_OUT}\n')

variables = ['inflow', 'storage', 'outflow']

Selected reservoirs and periods will be saved in:
	Z:\nahaUsers\casadje\datasets\reservoirs\ResOpsBR\v1.1\selection



## Data

### Attributes

In [5]:
# import all tables of attributes
attributes = read_attributes(
    cfg.PATH_ATTRS, 
    reservoirs=None,
    index_col='GDW_ID'
)
print(f'{attributes.shape[0]} reservoirs in the attribute tables')

# # keep only reservoirs with all observed variables
# mask = attributes[[var.upper() for var in variables]].all(axis=1)
# attributes = attributes[mask]
# attributes.sort_index(axis=0, inplace=True)
# print('{0} reservoirs include observed time series for all variables: {1}'.format(
#     attributes.shape[0],
#     ', '.join(variables)
# ))

# keep reservoirs that comply with the catchment area and total storage conditions
if cfg.MIN_AREA is not None:
    mask_area = attributes.CATCH_SKM >= cfg.MIN_AREA
    attributes = attributes[mask_area]
    print('{0} reservoirs comply with the minimum catchment area: {1} km²'.format(
        attributes.shape[0],
        cfg.MIN_AREA
    ))
if cfg.MIN_VOL is not None:
    mask_volume = attributes.CAP_MCM >= cfg.MIN_VOL
    attributes = attributes[mask_volume]
    print('{0} reservoirs comply with the minimum storage capacity: {1} hm3'.format(
        attributes.shape[0],
        cfg.MIN_VOL
    ))

143 reservoirs in the attribute tables
142 reservoirs comply with the minimum catchment area: 50 km²
131 reservoirs comply with the minimum storage capacity: 10 hm3


### Time series

In [None]:
# read time series
timeseries = read_timeseries(
    cfg.PATH_TS / 'csv',
    attributes.index, 
)
print(f'{len(timeseries)} reservoirs with timeseries\n')

# keep only reservoirs with all variables
timeseries = {
    ID: ts for ID, ts in timeseries.items()
    if len(ts.columns.intersection(variables)) == len(variables)
}
attributes = attributes.loc[list(timeseries)]
print(f'{len(timeseries)} reservoirs with time series for all variables\n')

# remove reservoirs with an excessively low degree of regulation
# (storage capacity divided by the mean annual inflow volume)
if cfg.MIN_DOR is not None:
    seconds_per_year = 365 * 24 * 3600
    dor = pd.Series({
        gdw_id: attributes.loc[gdw_id, 'CAP_MCM'] * 1e6 / (ts.inflow.mean() * seconds_per_year)
        for gdw_id, ts in timeseries.items()
    }, name='DOR')
    mask_dor = dor > cfg.MIN_DOR
    attributes = attributes[mask_dor]
    timeseries = {gdw_id: ts for gdw_id, ts in timeseries.items() if mask_dor[gdw_id]}
    print('{0} reservoirs comply with the minimum degree of regulation: {1}'.format(
        attributes.shape[0],
        cfg.MIN_DOR
    ))
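
The degree of regulation computed in the cell above is the ratio between the storage capacity and the mean annual inflow volume. The worked example below reproduces that arithmetic for a single, made-up reservoir. It assumes that the capacity is given in hm³ (million m³) and the inflow in m³/s, which is what the unit conversion in the cell suggests; `degree_of_regulation` is a hypothetical helper, not part of `lisfloodreservoirs`.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def degree_of_regulation(capacity_hm3: float, mean_inflow_m3s: float) -> float:
    """Storage capacity divided by the mean annual inflow volume (dimensionless).

    Assumes capacity in hm3 and inflow in m3/s (illustrative helper).
    """
    capacity_m3 = capacity_hm3 * 1e6
    annual_inflow_m3 = mean_inflow_m3s * SECONDS_PER_YEAR
    return capacity_m3 / annual_inflow_m3


# made-up reservoir: 100 hm3 of capacity and a mean inflow of 10 m3/s
dor = degree_of_regulation(capacity_hm3=100, mean_inflow_m3s=10)
print(f'{dor:.3f}')  # 0.317
```

A value of about 0.32 means that the reservoir can store roughly a third of the water it receives in an average year.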

## Selection

In [9]:
bias = {}
periods = {}
for gdw_id, ts in tqdm(timeseries.items(), desc='select reservoirs', total=len(timeseries)):
    
    # select study period
    start, end = define_period(ts[variables + ['precip_point', 'evapo_point']])
    if pd.isna(start) or pd.isna(end):
        print(f'{gdw_id:>4} discarded for lack of records')
        continue
    duration = (end - start) / np.timedelta64(1, 'D')
    if duration >= cfg.MIN_YEARS * 365:
        ts = ts.loc[start:end]
    else:
        print(f'{gdw_id:>4} discarded for lack of records:\t{duration:.0f} days')
        continue
        
    # bias between inflow and outflow
    bias[gdw_id] = ts.outflow.mean() / ts.inflow.mean()
    if (1 - cfg.TOL_BIAS) <= bias[gdw_id] <= (1 + cfg.TOL_BIAS):
        # save periods
        periods[str(gdw_id)] = {
            'start_dates': [pd.Timestamp(start)],
            'end_dates': [pd.Timestamp(end)]
        }
    else:
        print(f'{gdw_id:>4} discarded for excessive bias:\t{bias[gdw_id]:.2f}')
    
print(f'\n{len(periods)} reservoirs selected')

select reservoirs:   0%|          | 0/58 [00:00<?, ?it/s]

 200 discarded for excessive bias:	0.64
1144 discarded for lack of records:	726 days
1191 discarded for excessive bias:	1.45
1196 discarded for lack of records:	806 days
6876 discarded for lack of records:	1185 days
6888 discarded for lack of records:	244 days
7274 discarded for lack of records:	462 days
7420 discarded for lack of records:	1231 days

50 reservoirs selected
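
The bias test applied in the loop can be checked by hand with a small sketch. The series below are made up, the tolerance of 0.3 is taken from the 0.7 to 1.3 range stated in the introduction, and `within_bias_tolerance` is a hypothetical helper rather than a function of the library.

```python
from statistics import mean


def within_bias_tolerance(inflow, outflow, tol=0.3):
    """Return the outflow/inflow bias and whether it lies in [1 - tol, 1 + tol]."""
    bias = mean(outflow) / mean(inflow)
    return bias, (1 - tol) <= bias <= (1 + tol)


inflow = [10, 12, 8]  # mean inflow: 10

# outflow far below inflow: the reservoir would be discarded
bias_low, ok_low = within_bias_tolerance(inflow, [6, 7, 6.2])

# outflow balancing inflow: the reservoir would be kept
bias_ok, ok_ok = within_bias_tolerance(inflow, [9, 11, 10])

print(f'{bias_low:.2f} {ok_low}')
print(f'{bias_ok:.2f} {ok_ok}')
```

The first case has a bias of about 0.64 and is rejected, like reservoir 200 in the output above; the second has a bias of 1 and passes.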


### Export

In [10]:
# export list of selected reservoirs
with open(PATH_OUT / 'reservoirs.txt', 'w') as f:
    for gdw_id in periods.keys():
        f.write(f'{gdw_id}\n')

In [11]:
# export selected study period
with open(PATH_OUT / 'periods.pkl', 'wb') as f:
    pickle.dump(periods, f)
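
A later notebook would need to read these two files back. The sketch below shows one possible round trip in a temporary folder. The reservoir ID and dates are invented, and `datetime` objects stand in for the `pd.Timestamp` values stored by the notebook so that the example runs without extra dependencies.

```python
import pickle
import tempfile
from datetime import datetime
from pathlib import Path

# made-up selection with the same structure as `periods` in the notebook
periods = {
    '123': {
        'start_dates': [datetime(2001, 1, 1)],
        'end_dates': [datetime(2008, 12, 31)],
    }
}

with tempfile.TemporaryDirectory() as tmp:
    path_out = Path(tmp)

    # export, as in the two cells above
    with open(path_out / 'reservoirs.txt', 'w') as f:
        for gdw_id in periods:
            f.write(f'{gdw_id}\n')
    with open(path_out / 'periods.pkl', 'wb') as f:
        pickle.dump(periods, f)

    # read the list of reservoir IDs (one per line) and the study periods
    selected = (path_out / 'reservoirs.txt').read_text().split()
    with open(path_out / 'periods.pkl', 'rb') as f:
        loaded = pickle.load(f)

print(selected)
print(loaded['123']['start_dates'][0])
```

Note that the IDs are written as strings, so they may need converting back to integers before indexing the attribute tables.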