# Select reservoirs and study period
***

**Author:** Chus Casado Rodríguez<br>
**Date:** 03-06-2025<br>

**Introduction:**<br>
This notebook reads all the attributes and time series in the dataset and selects the reservoirs suitable for testing the different reservoir routines. A reservoir is selected only if it meets the following conditions:

1. It must contain observed time series of the variables `inflow`, `storage` and `outflow`.
2. The longest period without gaps in those three time series must span at least 4 years.
3. The bias between the observed inflow and outflow time series, computed as the ratio of mean outflow to mean inflow, must be between 0.7 and 1.3.

The thresholds for these conditions are specified in the YML configuration file. The same file can also set a minimum catchment area, a minimum storage capacity and a minimum degree of regulation, which are applied before the selection.
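As an illustration of condition 2, the sketch below finds the longest run of consecutive rows in which all columns have values. The helper `longest_gapless_period` and the toy data are hypothetical; this is not the implementation of `define_period` in `lisfloodreservoirs`, only one way such a search could work.

```python
import numpy as np
import pandas as pd


def longest_gapless_period(df: pd.DataFrame):
    """Return (start, end) of the longest run of rows without missing values.

    Hypothetical helper for illustration only. Returns (NaT, NaT) if every
    row has at least one missing value.
    """
    complete = df.notna().all(axis=1)
    if not complete.any():
        return pd.NaT, pd.NaT
    # every incomplete row increases the counter, so all complete rows
    # between two gaps share the same run identifier
    run_id = (~complete).cumsum()[complete]
    longest = run_id.value_counts().idxmax()
    dates = run_id.index[run_id == longest]
    return dates[0], dates[-1]


# toy daily data: 10 days with a single gap in `storage` on the 4th day
idx = pd.date_range('2000-01-01', periods=10, freq='D')
toy = pd.DataFrame(
    {
        'inflow': np.arange(10, dtype=float),
        'storage': np.arange(10, dtype=float),
        'outflow': np.arange(10, dtype=float),
    },
    index=idx,
)
toy.loc[idx[3], 'storage'] = np.nan

start, end = longest_gapless_period(toy)
duration_days = (end - start) / pd.Timedelta(days=1)
print(start.date(), end.date(), duration_days)  # 2000-01-05 2000-01-10 5.0
```

A reservoir would then pass the length check if `duration_days` is at least the configured number of years times 365.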

In [1]:
import numpy as np
import pandas as pd
from tqdm.auto import tqdm
import pickle

from lisfloodreservoirs.utils import DatasetConfig
from lisfloodreservoirs import read_attributes, read_timeseries
from lisfloodreservoirs.utils.timeseries import define_period

## Configuration

In [2]:
cfg = DatasetConfig('config_ResOpsUS_v21.yml')

PATH_OUT = cfg.PATH_RESOPS / cfg.VERSION / 'selection'
PATH_OUT.mkdir(parents=True, exist_ok=True)
print(f'Selected reservoirs and periods will be saved in:\n\t{PATH_OUT}\n')

variables = ['inflow', 'storage', 'outflow']

Selected reservoirs and periods will be saved in:
	Z:\nahaUsers\casadje\datasets\reservoirs\ResOpsUS\v2.1\selection



## Data

### Attributes

In [3]:
# import all tables of attributes
attributes = read_attributes(cfg.PATH_ATTRS, reservoirs=None)
print(f'{attributes.shape[0]} reservoirs in the attribute tables')

# keep only reservoirs with all observed variables
mask = attributes[[var.upper() for var in variables]].all(axis=1)
attributes = attributes[mask]
attributes.sort_index(axis=0, inplace=True)
print('{0} reservoirs include observed time series for all variables: {1}'.format(
    attributes.shape[0],
    ', '.join(variables)
))

# keep reservoirs that comply with the catchment area and total storage conditions
if cfg.MIN_AREA is not None:
    mask_area = attributes.CATCH_SKM >= cfg.MIN_AREA
    attributes = attributes[mask_area]
    print('{0} reservoirs comply with the minimum catchment area: {1} km²'.format(
        attributes.shape[0],
        cfg.MIN_AREA
    ))
if cfg.MIN_VOL is not None:
    mask_volume = attributes.CAP_MCM >= cfg.MIN_VOL
    attributes = attributes[mask_volume]
    print('{0} reservoirs comply with the minimum storage capacity: {1} hm3'.format(
        attributes.shape[0],
        cfg.MIN_VOL
    ))

677 reservoirs in the attribute tables
284 reservoirs include observed time series for all variables: inflow, storage, outflow
268 reservoirs comply with the minimum catchment area: 50 km²
268 reservoirs comply with the minimum storage capacity: 10 hm3
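The attribute filters above can be reproduced on a toy table. The sketch assumes that the columns `INFLOW`, `STORAGE` and `OUTFLOW` are 0/1 flags marking the availability of each observed variable; the reservoir IDs, the index name and the values are made up for illustration.

```python
import pandas as pd

variables = ['inflow', 'storage', 'outflow']

# hypothetical attribute table: availability flags plus catchment area
toy_attrs = pd.DataFrame(
    {
        'INFLOW': [1, 1, 0],
        'STORAGE': [1, 0, 1],
        'OUTFLOW': [1, 1, 1],
        'CATCH_SKM': [120.0, 30.0, 500.0],
    },
    index=pd.Index([101, 102, 103], name='GRAND_ID'),
)

# a reservoir is kept only if every flag is set...
has_all = toy_attrs[[var.upper() for var in variables]].all(axis=1)
# ...and its catchment area reaches the (assumed) minimum of 50 km²
large_enough = toy_attrs['CATCH_SKM'] >= 50

selected = toy_attrs[has_all & large_enough]
print(selected.index.tolist())  # [101]
```

The notebook applies the masks one after another instead of combining them, which makes it possible to report how many reservoirs survive each step.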


### Time series

In [4]:
# read time series
timeseries = read_timeseries(
    cfg.PATH_TS / 'csv',
    attributes.index
)
print(f'{len(timeseries)} reservoirs with timeseries\n')

268 reservoirs with timeseries



In [5]:
# remove reservoirs with an excessively low degree of regulation (DOR):
# storage capacity (hm³ converted to m³) divided by the mean annual inflow volume
# (the formula assumes that inflow is given in m³/s)
if cfg.MIN_DOR is not None:
    seconds_per_year = 365 * 24 * 3600
    dor = pd.Series(
        {
            grand_id: attributes.loc[grand_id, 'CAP_MCM'] * 1e6 / (ts.inflow.mean() * seconds_per_year)
            for grand_id, ts in timeseries.items()
        },
        name='DOR'
    )
    mask_dor = dor > cfg.MIN_DOR
    attributes = attributes[mask_dor]
    timeseries = {grand_id: ts for grand_id, ts in timeseries.items() if mask_dor[grand_id]}
    print('{0} reservoirs comply with the minimum degree of regulation: {1}'.format(
        attributes.shape[0],
        cfg.MIN_DOR
    ))

254 reservoirs comply with the minimum degree of regulation: 0.08
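The degree of regulation used in the cell above is the storage capacity divided by the mean annual inflow volume. A minimal worked example follows, assuming capacity in hm³ and inflow in m³/s (as the unit conversions in the cell imply); the function name and the numbers are hypothetical.

```python
import pandas as pd

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s


def degree_of_regulation(capacity_mcm: float, inflow_m3s: pd.Series) -> float:
    """Storage capacity divided by the mean annual inflow volume (dimensionless)."""
    capacity_m3 = capacity_mcm * 1e6
    mean_annual_inflow_m3 = inflow_m3s.mean() * SECONDS_PER_YEAR
    return capacity_m3 / mean_annual_inflow_m3


# toy reservoir: 31.536 hm³ of capacity and a mean inflow of 10 m³/s
inflow = pd.Series([8.0, 12.0, 10.0])
dor = degree_of_regulation(31.536, inflow)
print(f'{dor:.2f}')  # 0.10
```

With the threshold of 0.08 reported in the output above, this toy reservoir would be kept, since it can store about 10% of its mean annual inflow.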


## Selection

In [6]:
bias = {}
periods = {}
for grand_id, ts in tqdm(timeseries.items(), desc='select reservoirs', total=len(timeseries)):
    
    # select study period
    # start, end = define_period(ts[['inflow', 'precip_point', 'evapo_point']])
    start, end = define_period(ts[variables + ['precip_point', 'evapo_point']])
    # start, end = define_period(ts.inflow)
    if pd.isna(start) or pd.isna(end):
        print(f'{grand_id:>4} discarded for lack of records')
        continue
    duration = (end - start) / np.timedelta64(1, 'D')
    if duration >= cfg.MIN_YEARS * 365:
        ts = ts.loc[start:end]
    else:
        print(f'{grand_id:>4} discarded for lack of records:\t{duration:.0f} days')
        continue
        
    # bias between inflow and outflow
    bias[grand_id] = ts.outflow.mean() / ts.inflow.mean()
    if (1 - cfg.TOL_BIAS) <= bias[grand_id] <= (1 + cfg.TOL_BIAS):
        # save periods
        periods[str(grand_id)] = {
            'start_dates': [pd.Timestamp(start)],
            'end_dates': [pd.Timestamp(end)]
        }
    else:
        print(f'{grand_id:>4} discarded for excessive bias:\t{bias[grand_id]:.2f}')

print(f'\n{len(periods)} reservoirs selected')

 111 discarded for lack of records:	348 days
 114 discarded for excessive bias:	0.46
 135 discarded for lack of records:	1 days
 138 discarded for lack of records:	2 days
 144 discarded for lack of records:	1362 days
 158 discarded for lack of records:	1325 days
 163 discarded for excessive bias:	0.00
 173 discarded for lack of records:	381 days
 185 discarded for excessive bias:	0.62
 190 discarded for lack of records:	1115 days
 203 discarded for lack of records:	124 days
 210 discarded for excessive bias:	0.48
 223 discarded for lack of records:	863 days
 295 discarded for lack of records:	975 days
 299 discarded for lack of records
 320 discarded for lack of records:	882 days
 338 discarded for lack of records:	513 days
 347 discarded for lack of records:	10 days
 374 discarded for lack of records:	1445 days
 382 discarded for lack of records:	439 days
 385 discarded for lack of records:	1456 days
 386 discarded for excessive bias:	0.33
 437 discarded for lack of records:	391 days
 470 
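The bias test in the selection loop compares the mean outflow with the mean inflow over the study period. The sketch below isolates that check; the function name is hypothetical, and the tolerance of 0.3 is an assumption inferred from the 0.7 to 1.3 range stated in the introduction.

```python
import pandas as pd


def check_bias(inflow: pd.Series, outflow: pd.Series, tol: float):
    """Return the outflow/inflow bias and whether it lies within 1 ± tol."""
    bias = outflow.mean() / inflow.mean()
    return bias, (1 - tol) <= bias <= (1 + tol)


inflow = pd.Series([10.0, 10.0, 10.0, 10.0])

# outflow slightly lower than inflow: accepted
bias_ok, keep_ok = check_bias(inflow, pd.Series([9.0, 9.0, 9.0, 9.0]), tol=0.3)

# less than half of the inflow is released: rejected
bias_low, keep_low = check_bias(inflow, pd.Series([4.0, 5.0, 4.0, 5.0]), tol=0.3)

print(f'{bias_ok:.2f} {keep_ok}')    # 0.90 True
print(f'{bias_low:.2f} {keep_low}')  # 0.45 False
```

A bias far from 1 means that the observed inflow and outflow volumes do not balance over the period, which suggests unrecorded abstractions, losses or measurement problems.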

### Export

In [7]:
# export list of selected reservoirs
with open(PATH_OUT / 'reservoirs.txt', 'w') as f:
    for grand_id in periods.keys():
        f.write(f'{grand_id}\n')

In [8]:
# export selected study period
with open(PATH_OUT / 'periods.pkl', 'wb') as f:
    pickle.dump(periods, f)
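For reference, the two exported files can be read back as sketched below. The example writes a made-up `periods` dictionary with the same structure to a temporary directory so that it is self-contained; the reservoir ID and the dates are placeholders, not results from this notebook.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd

# placeholder dictionary with the same structure as `periods`
toy_periods = {
    '101': {
        'start_dates': [pd.Timestamp('2000-01-01')],
        'end_dates': [pd.Timestamp('2005-12-31')],
    }
}

with tempfile.TemporaryDirectory() as tmp:
    path_out = Path(tmp)

    # write the files as the export cells do
    with open(path_out / 'reservoirs.txt', 'w') as f:
        for grand_id in toy_periods:
            f.write(f'{grand_id}\n')
    with open(path_out / 'periods.pkl', 'wb') as f:
        pickle.dump(toy_periods, f)

    # read them back: IDs are stored as text, one per line
    reservoirs = [int(line) for line in (path_out / 'reservoirs.txt').read_text().split()]
    with open(path_out / 'periods.pkl', 'rb') as f:
        loaded = pickle.load(f)

print(reservoirs)                      # [101]
print(loaded['101']['start_dates'][0])  # 2000-01-01 00:00:00
```

Note that the keys of the pickled dictionary are strings, so an integer ID has to be converted with `str()` before looking up its period.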