# Select reservoirs and study period
***

**Author:** Chus Casado Rodríguez<br>
**Date:** 02-08-2024<br>

**Introduction:**<br>
This notebook reads all the attributes and time series in the dataset and selects the reservoirs suitable for testing the different reservoir routines. A reservoir is selected only if it meets all of the following conditions:

1. Its catchment area, storage capacity, degree of regulation and degree of disruptivity are at least the minimum values defined in the configuration file.
2. It contains observed time series of the variables `inflow`, `storage` and `outflow`. <font color='red'>Not applied in ResOpsES: the corresponding filter is commented out in the code below.</font>
3. The longest period without gaps in those three time series is at least 8 years long.
4. The bias between the inflow and outflow time series, computed as the ratio of mean outflow to mean inflow over that period, is between 0.7 and 1.3.
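`define_period` is imported from `lisfloodreservoirs` and its implementation is not shown in this notebook. To illustrate the gap-free condition, the sketch below finds the longest run of consecutive days on which all variables have data. `longest_complete_period` and `duration_days` are hypothetical helpers, not the package's functions, and they assume a daily `DatetimeIndex` with one row per day (gaps appear as NaN rows).

```python
import numpy as np
import pandas as pd


def longest_complete_period(df: pd.DataFrame) -> tuple:
    """Return the first and last date of the longest run of consecutive rows
    in which every column of `df` has a value, or (NaT, NaT) if there is none.

    Hypothetical helper: assumes a daily DatetimeIndex with one row per day.
    """
    # True on days where all variables are observed
    complete = df.notna().all(axis=1)
    if not complete.any():
        return pd.NaT, pd.NaT
    # a new run starts wherever `complete` changes value; cumsum labels each run
    run_id = (complete != complete.shift(fill_value=False)).cumsum()
    # number of days in each complete run, then the label of the longest one
    lengths = run_id[complete].value_counts()
    best = lengths.idxmax()
    dates = df.index[(run_id == best).to_numpy()]
    return dates[0], dates[-1]


def duration_days(start, end) -> float:
    """Length of a period in days, computed the same way as in the selection cell."""
    return (end - start) / np.timedelta64(1, 'D')
```

The selection cell then compares this duration with `cfg.MIN_YEARS * 365` to decide whether the period is long enough.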

```python
import numpy as np
import pandas as pd
from tqdm.auto import tqdm
from pathlib import Path
import yaml
import pickle

from lisfloodreservoirs import read_attributes, read_timeseries
from lisfloodreservoirs.utils import DatasetConfig
from lisfloodreservoirs.utils.timeseries import define_period #,create_demand
```

## Configuration

```python
cfg = DatasetConfig('config_dataset.yml')

# output directory for the list of selected reservoirs and their study periods
PATH_OUT = cfg.PATH_RESOPS / cfg.VERSION / 'selection'
PATH_OUT.mkdir(parents=True, exist_ok=True)
print(f'Selected reservoirs and periods will be saved in:\n\t{PATH_OUT}\n')

# inflow series used in the gap and bias checks, plus the other two variables
inflow_var = 'inflow_efas5_bc'
variables = [inflow_var, 'storage', 'outflow']
```

```
Selected reservoirs and periods will be saved in:
	Z:\nahaUsers\casadje\datasets\reservoirs\ResOpsES\v3.0\selection
```

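Outside the original environment, `config_dataset.yml` and `DatasetConfig` may not be available. The class below is a hypothetical stand-in that lists the configuration attributes this notebook reads. Its threshold values are inferred from the printed output and from the introduction (8 years, bias between 0.7 and 1.3), not read from the YAML file, and the paths are placeholders to adjust before use.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class StandInConfig:
    """Hypothetical stand-in for DatasetConfig with the attributes used here.

    Defaults are inferred from this notebook's output, not from config_dataset.yml.
    """
    PATH_RESOPS: Path = Path('path/to/ResOpsES')    # placeholder
    VERSION: str = 'v3.0'                           # as in the output path above
    PATH_ATTRS: Path = Path('path/to/attributes')   # placeholder
    PATH_TS: Path = Path('path/to/time_series')     # placeholder
    MIN_AREA: Optional[float] = 25     # km², None disables the filter
    MIN_VOL: Optional[float] = 10      # hm³, None disables the filter
    MIN_DOR: Optional[float] = 0.08    # None disables the filter
    MIN_DOD: Optional[float] = 0.06    # m, None disables the filter
    MIN_YEARS: float = 8               # minimum length of the gap-free period
    TOL_BIAS: float = 0.3              # accepted bias range: [1 - TOL_BIAS, 1 + TOL_BIAS]
```

For example, `StandInConfig(MIN_DOD=None)` reproduces a configuration in which the degree-of-disruptivity filter is skipped.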
## Data

### Attributes

```python
# read the combined table of attributes, indexed by GRAND_ID
# (alternative: read_attributes(PATH_DATA / 'attributes', reservoirs=None))
attributes = pd.read_csv(cfg.PATH_ATTRS / 'combined.csv', index_col='GRAND_ID')
print(f'{attributes.shape[0]} reservoirs in the attribute tables')

# # keep only reservoirs with all observed variables (not applied in ResOpsES)
# mask = pd.concat([attributes[var.upper()] == 1 for var in variables], axis=1).all(axis=1)
# attributes = attributes[mask]
# attributes.sort_index(axis=0, inplace=True)
# print('{0} reservoirs include observed time series for all variables: {1}'.format(attributes.shape[0],
#                                                                                 ', '.join(variables)))

# keep reservoirs that meet the minimum catchment area, storage capacity,
# degree of regulation and degree of disruptivity defined in the configuration
if cfg.MIN_AREA is not None:
    mask_area = attributes.CATCH_SKM >= cfg.MIN_AREA
    attributes = attributes[mask_area]
    print('{0} reservoirs comply with the minimum catchment area: {1} km²'.format(attributes.shape[0],
                                                                                  cfg.MIN_AREA))
if cfg.MIN_VOL is not None:
    mask_volume = attributes.CAP_MCM >= cfg.MIN_VOL
    attributes = attributes[mask_volume]
    print('{0} reservoirs comply with the minimum storage capacity: {1} hm3'.format(attributes.shape[0],
                                                                                    cfg.MIN_VOL))

if cfg.MIN_DOR is not None:
    mask_dor = attributes.DOR >= cfg.MIN_DOR
    attributes = attributes[mask_dor]
    print('{0} reservoirs comply with the minimum degree of regulation: {1}'.format(attributes.shape[0],
                                                                                    cfg.MIN_DOR))

if cfg.MIN_DOD is not None:
    mask_dod = attributes.DOD_M >= cfg.MIN_DOD
    attributes = attributes[mask_dod]
    print('{0} reservoirs comply with the minimum degree of disruptivity: {1} m'.format(attributes.shape[0],
                                                                                        cfg.MIN_DOD))
```

```
206 reservoirs in the attribute tables
206 reservoirs comply with the minimum catchment area: 25 km²
206 reservoirs comply with the minimum storage capacity: 10 hm3
177 reservoirs comply with the minimum degree of regulation: 0.08
144 reservoirs comply with the minimum degree of disruptivity: 0.06 m
```

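The four threshold blocks above follow the same pattern, so they can also be written as a single loop over a mapping from attribute column to minimum value. This is a hypothetical refactor (`apply_min_thresholds` is not part of `lisfloodreservoirs`). As in the original, a `None` threshold skips the filter, and rows with a missing value in a filtered column are dropped because `NaN >= x` evaluates to `False`.

```python
import pandas as pd


def apply_min_thresholds(attributes: pd.DataFrame, thresholds: dict) -> pd.DataFrame:
    """Keep the rows whose value is >= the threshold in every listed column.

    `thresholds` maps column name -> minimum value; None skips that column.
    """
    for column, min_value in thresholds.items():
        if min_value is None:
            continue
        attributes = attributes[attributes[column] >= min_value]
        print(f'{len(attributes)} reservoirs comply with {column} >= {min_value}')
    return attributes
```

Applied to this notebook, the mapping would be `{'CATCH_SKM': cfg.MIN_AREA, 'CAP_MCM': cfg.MIN_VOL, 'DOR': cfg.MIN_DOR, 'DOD_M': cfg.MIN_DOD}`. Keeping the thresholds in one dictionary makes it easier to add or remove criteria without duplicating the filtering code.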
### Time series

```python
# read time series
timeseries = read_timeseries(cfg.PATH_TS / 'csv',
                             attributes.index)
print(f'{len(timeseries)} reservoirs with timeseries\n')
```

```
144 reservoirs with timeseries
```

## Selection

```python
bias = {}
periods = {}
for grand_id, ts in tqdm(timeseries.items(), desc='select reservoirs', total=len(timeseries)):

    # select the study period with define_period (gap-free records in all variables)
    start, end = define_period(ts[variables])
    if pd.isna(start) or pd.isna(end):  # pd.isna handles NaN, None and NaT alike
        print(f'{grand_id:>4} discarded for lack of records')
        continue
    duration = (end - start) / np.timedelta64(1, 'D')
    if duration >= cfg.MIN_YEARS * 365:
        ts = ts.loc[start:end]
    else:
        print(f'{grand_id:>4} discarded for lack of records:\t{duration:.0f} days')
        continue

    # bias between inflow and outflow: ratio of their means over the study period
    bias[grand_id] = ts.outflow.mean() / ts[inflow_var].mean()
    if (1 - cfg.TOL_BIAS) <= bias[grand_id] <= (1 + cfg.TOL_BIAS):
        # save the study period of the selected reservoir
        periods[grand_id] = {
            'start': start,
            'end': end
        }
    else:
        print(f'{grand_id:>4} discarded for excessive bias:\t{bias[grand_id]:.2f}')

print(f'\n{len(periods)} reservoirs selected')
```

```
2656 discarded for lack of records:	1915 days
2661 discarded for lack of records:	839 days
2664 discarded for lack of records:	2098 days
2667 discarded for lack of records:	2098 days
2677 discarded for lack of records:	2901 days
2699 discarded for excessive bias:	0.64
2812 discarded for excessive bias:	1.50
2821 discarded for lack of records:	1460 days
2824 discarded for excessive bias:	0.66
2833 discarded for excessive bias:	0.61
2839 discarded for excessive bias:	0.69
2855 discarded for excessive bias:	1.30
2866 discarded for excessive bias:	0.62
2878 discarded for excessive bias:	0.69
2880 discarded for excessive bias:	0.68
2895 discarded for lack of records:	2098 days
3479 discarded for lack of records:	2098 days
3485 discarded for lack of records:	2561 days
3492 discarded for lack of records:	2098 days
3504 discarded for lack of records:	2545 days
6829 discarded for lack of records:	2190 days
7052 discarded for excessive bias:	0.27
7056 discarded for lack of records:	2678 days

121 reservoirs selected
```

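The bias criterion can be isolated in two small functions so that it can be checked by hand. This sketch mirrors the cell above; the function names are hypothetical, and the tolerance of 0.3 is assumed from the 0.7 to 1.3 range given in the introduction. For example, a mean inflow of 10 and a mean outflow of 6.4 give a bias of 0.64, which falls outside that range.

```python
import pandas as pd


def inflow_outflow_bias(ts: pd.DataFrame, inflow_col: str = 'inflow_efas5_bc') -> float:
    """Ratio of mean outflow to mean inflow over the rows of `ts`.

    pandas' mean skips NaN by default, as in the selection cell.
    """
    return ts['outflow'].mean() / ts[inflow_col].mean()


def within_tolerance(bias: float, tol: float) -> bool:
    """True if the bias lies in the closed interval [1 - tol, 1 + tol]."""
    return (1 - tol) <= bias <= (1 + tol)


# worked example with synthetic values: mean inflow 10, mean outflow 6.4 -> bias 0.64
example = pd.DataFrame({'inflow_efas5_bc': [10.0, 10.0, 10.0, 10.0],
                        'outflow': [6.0, 7.0, 6.0, 6.6]})
example_bias = inflow_outflow_bias(example)
print(f'{example_bias:.2f}', within_tolerance(example_bias, tol=0.3))  # prints: 0.64 False
```

The log rounds the bias to two decimals, so a value slightly above 1.3 (e.g. 1.304) prints as `1.30` even though it is rejected; this is the likely explanation for reservoir 2855 in the output above.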
### Export

```python
# export list of selected reservoirs
with open(PATH_OUT / 'reservoirs.txt', 'w') as f:
    for grand_id in periods.keys():
        f.write(f'{grand_id}\n')
```

```python
# export selected study period
with open(PATH_OUT / 'periods.pkl', 'wb') as f:
    pickle.dump(periods, f)
```
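A downstream notebook can read the selection back as sketched below. `load_selection` is a hypothetical helper; it assumes the GRAND_IDs are integers (adapt the conversion if they are stored as strings). Since `pickle.load` can execute arbitrary code, it should only be used on files you produced yourself.

```python
import pickle
from pathlib import Path


def load_selection(path_out: Path):
    """Read the list of selected GRAND_IDs and the dict of study periods.

    Assumes GRAND_IDs are integers written one per line in reservoirs.txt.
    """
    with open(path_out / 'reservoirs.txt') as f:
        reservoirs = [int(line) for line in f if line.strip()]
    with open(path_out / 'periods.pkl', 'rb') as f:
        periods = pickle.load(f)
    return reservoirs, periods
```

Each reservoir's time series can then be trimmed to its study period with `ts.loc[periods[grand_id]['start']:periods[grand_id]['end']]`.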