# Reservoir calibration with SCE-UA
***

**Autor:** Chus Casado<br>
**Date:** 23-06-2024<br>

**Introduction:**<br>
This code calibrates a reservoir model using the genetic algorithm SCE-UA (Shuffle Complex Evolution-University of Arizona) ([Duan et al., 2023](https://link.springer.com/article/10.1007/BF00939380)).

The setup of the calibration (reservoir model, target variable(s), etc) is defined in the _config*.yml_ file.

**To do:**<br>
* [ ] Make sure that all the input data is defined in the configuration file.
* [ ] Convert this notebook into a executable script with several arguments:
    * [ ] `--config-file` to define the configuration file
    * [ ] `--id` to define a list of reservoirs to be calibrated. By default it would include all the reservoirs in the input data.
    * [ ] `--overwrite` to allow for overwriting the results of previous calibrations. By default is False, so the calibration would skip a reservoir if already calibrated with the same setup.
* [ ] Should we apply split sampling or not? If yes, what would be the best parameter set, the one with the best performance in the calibration or the validation set?
    
All these arguments can be added to the configuration file, instead.

**Questions:**<br>

In [1]:
import sys
sys.path.append('../../src/')
import os
os.environ['USE_PYGEOS'] = '0'
import numpy as np
import pandas as pd
import geopandas as gpd
from datetime import datetime, timedelta
import spotpy
# from spotpy.objectivefunctions import kge
from tqdm.auto import tqdm
from pathlib import Path
import yaml

from lisfloodreservoirs.models import get_model
from lisfloodreservoirs.calibration import get_calibrator
from lisfloodreservoirs.utils.metrics import KGEmod

## Configuration

In [2]:
with open('config_linear_2var.yml', 'r', encoding='utf8') as ymlfile:
    cfg = yaml.load(ymlfile, Loader=yaml.FullLoader)

### Paths
# PATH_GLOFAS = Path(cfg['paths']['GloFAS'])
PATH_RESOPS = Path(cfg['paths']['ResOpsUS'])
# PATH_GRAND = Path(cfg['paths']['GRanD'])

### Reservoir model
MODEL = cfg['simulation']['model'].lower()
MODEL_CFG = cfg['simulation'].get('config', {})

# calibration
ALGORITHM = cfg['calibration']['algorithm'].lower()
TARGET = cfg['calibration']['target']
MAX_ITER = cfg['calibration'].get('max_iter', 1000)
COMPLEXES = cfg['calibration'].get('COMPLEXES', 4)
TRAIN_SIZE = cfg['calibration'].get('TRAIN_SIZE', 0.7)
# # sequential mode
# parallel = "seq"  

# results will be saved in this path
PATH_OUT = Path('./') / MODEL / 'calibration' / ALGORITHM
if len(TARGET) == 1:
    PATH_OUT = PATH_OUT / 'univariate' / TARGET[0]
elif len(TARGET) == 2:
    PATH_OUT /= 'bivariate'
else:
    print('ERROR. Only univariate or bivariate calibrations are supported')
    sys.exit()
PATH_OUT.mkdir(parents=True, exist_ok=True)
print(f'Results will be saved in {PATH_OUT}')

Results will be saved in linear\calibration\sceua\bivariate


In [3]:
variables = ['inflow', 'storage', 'outflow']

## Data

### Attributes

In [4]:
# import all tables of attributes
path_attrs = PATH_RESOPS / 'attributes'
try:
    attributes = pd.concat([pd.read_csv(file, index_col='GRAND_ID') for file in path_attrs.glob('*.csv')], axis=1, join='inner')
except Exception as e:
    raise ValueError('ERROR while reading attribute tables: {}'.format(e)) from e
print(f'{attributes.shape[0]} reservoirs in the attribute tables')

# keep only reservoirs with all observed variables
mask = pd.concat([attributes[var.upper()] == 1 for var in variables], axis=1).all(axis=1)
attributes = attributes[mask]
print('{0} reservoirs include observed timeseris for all variables: {1}'.format(attributes.shape[0],
                                                                                *variables))

118 reservoirs in the attribute tables
63 reservoirs include observed timeseris for all variables: inflow


#### Time series

Time series of reservoirs simulated in GloFAS, as the GloFAS simulated inflow will be used as the forcing of the reservoir module.

In [5]:
path_ts = PATH_RESOPS / 'time_series' / 'csv'
timeseries = {}
for grand_id in tqdm(attributes.index, desc='reading time series'):
    file = path_ts / f'{grand_id}.csv'
    if file.is_file():
        ts = pd.read_csv(file, parse_dates=True, index_col='date')
    else:
        print(f"File {file} doesn't exist")
        continue
    # select columns associated with variables of interest
    select_columns = [col for col in ts.columns if col.split('_')[0] in variables]
    ts = ts[select_columns]
    if not ts.columns.str.contains('glofas').any():
        print(f'{grand_id} does not contain GloFAS simulated time series')
        continue
    # invert normalization
    capacity = attributes.loc[grand_id, 'CAP_MCM'] * 1e6
    ts *= capacity
    ts.iloc[:, ts.columns.str.contains('inflow')] /= (24 * 3600)
    ts.iloc[:, ts.columns.str.contains('outflow')] /= (24 * 3600)
    # save time series
    timeseries[grand_id] = ts
    
print(f'{len(timeseries)} reservoirs with timeseries')

reading time series:   0%|          | 0/63 [00:00<?, ?it/s]

41 does not contain GloFAS simulated time series
182 does not contain GloFAS simulated time series
185 does not contain GloFAS simulated time series
600 does not contain GloFAS simulated time series
59 reservoirs with timeseries


## Calibration

In [18]:
for ID in tqdm(attributes.index): #[393]):
    
    # file where the calibration results will be saved
    dbname = f'{PATH_OUT}/{ID:03}_samples'
    if os.path.isfile(dbname + '.csv'):
        print(f'The file {dbname}.csv already exists.')
        continue   

    ## TIME SERIES
    
    try:
        # observed time series
        obs = timeseries[ID][ts.columns.intersection(variables)].copy()
        obs[obs < 0] = np.nan
        # GloFAS simulated time series
        glofas = timeseries[ID][[f'{var}_glofas' for var in variables]]
        glofas.columns = variables
        start, end = glofas.first_valid_index(), glofas.last_valid_index()
        
        # split sampling
        start_obs = max([obs[var].first_valid_index() for var in ['storage', 'outflow']])
        end_obs = min([obs[var].last_valid_index() for var in ['storage', 'outflow']])
        cal_days = timedelta(days=np.floor((end_obs - start_obs).days * TRAIN_SIZE))
        start_cal = end_obs - cal_days

        # define train and test time series
        x_train = glofas.inflow[start_cal:end_obs]
        y_train = obs.loc[start_cal:end_obs, ['storage', 'outflow']]
        x_test = glofas.inflow[start:start_cal]
        y_test = obs.loc[start_obs:start_cal, ['storage', 'outflow']]
    except Exception as e:
        print(f'ERROR. The time series of reservoir {grand_id} could not be set up\n', e)
        continue

    ## SET UP SPOTPY
    
    try:
        # storage limits (m3)
        Vtot, Vmin = attributes.loc[grand_id, ['CAP_MCM', 'Vmin']]
        Vtot *= 1e6
        Vmin *= Vtot
        # outflow limits (m3/s)
        Qmin = attributes.loc[grand_id, 'Qmin']

        # initialize the calibration setup of the LISFLOOD reservoir routine
        setup = get_calibrator(MODEL,
                               inflow=x_train,
                               storage=y_train.storage, 
                               outflow=y_train.outflow,
                               Vmin=Vmin,
                               Vtot=Vtot,
                               Qmin=Qmin,
                               target=TARGET,
                               obj_func=KGEmod)

        # define the sampling method
        sceua = spotpy.algorithms.sceua(setup, dbname=dbname, dbformat='csv', save_sim=False)
    except Exception as e:
        print(f'ERROR. The SpotPY set up of reservoir {ID} could not be done\n', e)
        continue
        
    ## CALIBRATION
    
    try:
        # start the sampler
        sceua.sample(MAX_ITER, ngs=COMPLEXES, kstop=3, pcento=0.01, peps=0.1)
    except Exception as e:
        print(f'ERROR. While sampling the reservoir {ID}\n', e)
        continue

    ### VALIDATION
    
    # read CSV of results
    try:
        results = pd.read_csv(f'{dbname}.csv')
        results.index.name = 'iteration'
        parcols = [col for col in results.columns if col.startswith('par')]
    except Exception as e:
        print(f'ERROR while reading results form reservoir {ID}\n', e)
        continue
    
    # compute validation KGE of each simulation and overwrite CSV file
    try:       
        results['like_val'] = np.nan
        for i in tqdm(results.index):
            sim = setup.simulation(pars=results.loc[i, parcols],
                                   inflow=x_test,
                                   storage_init=y_test.storage[0])
            results.loc[i, 'like_val'] = np.sqrt(np.sum([(1 - KGEmod(y_test[var], sim[var])[0])**2 for var in TARGET]))
        results.to_csv(f'{dbname}.csv', index=False, float_format='%.8f')
    except Exception as e:
        print(f'ERROR while computing KGE for the validation period in reservoir {ID}\n', e)
    
    # select optimal parameters (best validation) and export them
    try:
        best_iter = results.like_val.idxmin() # results.like1.idxmin()
        parvalues = {col[3:]: float(results.loc[best_iter, col]) for col in parcols}
        with open(f'{PATH_OUT}/{ID:03}_optimal_parameters.yml', 'w') as file:
            yaml.dump(parvalues, file)
    except Exception as e:
        print(f'ERROR while searching for optimal parameters in reservoir {ID}\n', e)
        continue
    
    # simulate the whole observed period with the optimal parameterization
    try:       
        if MODEL.lower() == 'linear':
            kwargs = {'Vmin': Vmin, 'Vtot': Vtot, 'Qmin': Qmin, 'T': parvalues['T']}
        elif MODEL.lower() == 'lisflood':
            Vf = parvalues['FFf'] * Vtot
            Vn = Vmin + parvalues['alpha'] * (Vf - Vmin)
            Vn_adj = Vn + parvalues['beta'] * (Vf - Vn)
            Qf = setup.inflow.quantile(parvalues['QQf'])
            Qn = parvalues['gamma'] * Qf
            k = parvalues['k']
            kwargs = {'Vmin': Vmin, 'Vn': Vn, 'Vn_adj': Vn_adj, 'Vf': Vf, 'Vtot': Vtot, 'Qmin': Qmin, 'Qn': Qn, 'Qf': Qf}
        else:
            raise ValueError(f'Model {MODEL} is not supported')
        res = get_model(MODEL, **kwargs)  
        sim = res.simulate(glofas.inflow[start_obs:end_obs],
                           obs.storage[start_obs])
        
        # performance
        performance = pd.DataFrame(index=['KGE', 'alpha', 'beta', 'rho'], columns=obs.columns)
        for var in performance.columns:
            try:
                performance[var] = KGEmod(obs[var], sim[var])
            except:
                continue
        file_out = PATH_OUT / f'{ID:03}_performance.csv'
        performance.to_csv(file_out, float_format='%.3f')
        
        res.scatter(sim,
                    obs,
                    norm=False,
                    title=ID,
                    # save=PATH_OUT / f'{ID:03}_scatter.jpg'
                   )
        
        res.lineplot({'GloFAS': glofas, 'cal': sim},
                     obs,
                     # save=PATH_OUT / f'{ID:03}_lineplot.jpg'
                    )
    except Exception as e:
        print(f'ERROR while simulating with optimal parameters in reservoir {ID}\n', e)

  0%|          | 0/63 [00:00<?, ?it/s]

ERROR. The time series of reservoir 7313 could not be set up
 41
Initializing the  Shuffled Complex Evolution (SCE-UA) algorithm  with  1000  repetitions
The objective function will be minimized
Starting burn-in sampling...


  0%|          | 0/9714 [00:00<?, ?it/s]

Initialize database...
['csv', 'hdf5', 'ram', 'sql', 'custom', 'noData']
* Database file 'linear\calibration\sceua\bivariate/131_samples.csv' created.


  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

4 of 1000, minimal objective function=0.852406, time remaining: 00:08:30


  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

8 of 1000, minimal objective function=0.852406, time remaining: 00:09:24


  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

11 of 1000, minimal objective function=0.852406, time remaining: 00:09:47


  0%|          | 0/9714 [00:00<?, ?it/s]

Burn-in sampling completed...
Starting Complex Evolution...
ComplexEvo loop #1 in progress...


  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

19 of 1000, minimal objective function=0.770481, time remaining: 00:09:36


  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

28 of 1000, minimal objective function=0.770481, time remaining: 00:08:40


  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

36 of 1000, minimal objective function=0.768849, time remaining: 00:08:10


  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

ComplexEvo loop #2 in progress...


  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

50 of 1000, minimal objective function=0.768643, time remaining: 00:07:26


  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

58 of 1000, minimal objective function=0.768643, time remaining: 00:07:13


  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

65 of 1000, minimal objective function=0.768643, time remaining: 00:07:01


  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

73 of 1000, minimal objective function=0.767358, time remaining: 00:06:50
ComplexEvo loop #3 in progress...


  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

81 of 1000, minimal objective function=0.76733, time remaining: 00:06:41


  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

89 of 1000, minimal objective function=0.76733, time remaining: 00:06:35


  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

97 of 1000, minimal objective function=0.76733, time remaining: 00:06:29


  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

105 of 1000, minimal objective function=0.76733, time remaining: 00:06:22
Objective function convergence criteria is now being updated and assessed...
Updated convergence criteria: 0.197888
ComplexEvo loop #4 in progress...


  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

  0%|          | 0/9714 [00:00<?, ?it/s]

KeyboardInterrupt: 