# ResOpsBR: time series
***

**Author:** Chus Casado<br>
**Date:** 27-11-2024<br>

**Introduction:**<br>
This code creates the time series for the reservoirs in ResOpsBR. The time series combine observed records from ANA with simulations from GloFASv4. For each reservoir, the combined time series is exported in both CSV and NetCDF formats.

Records are cleaned to avoid errors:
* Outliers in the **storage** time series are filtered by comparison with a moving median (7-day window). If the relative difference between a storage value and the moving median exceeds a threshold, the value is removed. This procedure is encapsulated in the function `lisfloodreservoirs.utils.timeseries.clean_storage()`.
* Outliers in the **inflow** time series are removed using two conditions: one based on the gradient, and the other on an inflow estimated from the water balance. A value is removed only when both conditions are met. Since the inflow time series cannot contain missing values when used in the reservoir simulation, gaps of up to 7 days are filled by linear interpolation. This procedure is encapsulated in the function `lisfloodreservoirs.utils.timeseries.clean_inflow()`.
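
The two cleaning steps above can be illustrated with a minimal sketch. It is not the code of `clean_storage()` or `clean_inflow()`: the helper names `filter_storage_outliers` and `fill_short_gaps`, the centred window and the default threshold of 0.2 are assumptions made for illustration, and the gradient and water-balance checks on the inflow are omitted.

```python
import numpy as np
import pandas as pd


def filter_storage_outliers(storage: pd.Series, window: int = 7, error_thr: float = 0.2) -> pd.Series:
    """Set to NaN the values whose relative difference with a centred moving
    median exceeds `error_thr`. Hypothetical helper, not the library function."""
    median = storage.rolling(window, center=True, min_periods=1).median()
    rel_diff = (storage - median).abs() / median
    return storage.where(rel_diff <= error_thr)


def fill_short_gaps(inflow: pd.Series, max_gap: int = 7) -> pd.Series:
    """Linearly interpolate only the gaps of at most `max_gap` consecutive
    missing values. Hypothetical helper, not the library function."""
    is_nan = inflow.isna()
    # label each run of consecutive equal flags and count the NaNs in it
    run_id = (is_nan != is_nan.shift()).cumsum()
    run_length = is_nan.astype(int).groupby(run_id).transform('sum')
    # interpolate everything between valid values, then keep only short gaps
    interpolated = inflow.interpolate(method='linear', limit_area='inside')
    fill = is_nan & (run_length <= max_gap)
    return inflow.where(~fill, interpolated)
```

Interpolating first and masking afterwards avoids the partial filling of long gaps that `interpolate(limit=7)` alone would produce.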

**To do:**<br>
* [x] Plot time series
* [x] Make sure that there aren't negative values in the time series, nor zeros in storage.
* [ ] Check the quality of the data by closing the mass balance when possible. <font color='steelblue'>I've used the mass balance to identify errors in the inflow time series (function `clean_inflow`).</font>
* [ ] Fill in the inflow time series with the mass balance, if possible. <font color='steelblue'>I've filled in gaps of up to 7 days in the inflow time series with linear interpolation (function `clean_inflow`).</font>

In [6]:
import numpy as np
import pandas as pd
import xarray as xr
# from datetime import datetime, timedelta
from tqdm.auto import tqdm
from copy import deepcopy

from lisfloodreservoirs.utils import DatasetConfig
from lisfloodreservoirs import read_attributes
from lisfloodreservoirs.utils.plots import plot_resops, reservoir_analysis, compare_flows
from lisfloodreservoirs.utils.timeseries import clean_storage, clean_inflow, time_encoding, quantile_mapping

from utils_br import plot_timeseries_BR

## Configuration

In [2]:
cfg = DatasetConfig('config_dataset.yml')

print(f'Time series will be saved in {cfg.PATH_TS}')

Time series will be saved in Z:\nahaUsers\casadje\datasets\reservoirs\ResOpsBR\v1.0\time_series


## Data

### Attributes


In [3]:
# import all tables of attributes
attributes = read_attributes(cfg.PATH_ATTRS)
# map the ANA (SAR) ID to the GRanD ID; `Series.iteritems()` was removed in pandas 2.0
map_ana_grand = {sar_id: grand_id for grand_id, sar_id in attributes['SAR_ID'].items()}
print(f'{attributes.shape[0]} reservoirs in the attribute tables')

100 reservoirs in the attribute tables


### Time series
#### ResOpsBR

In [4]:
# read time series
timeseries = {}
for sar_id, grand_id in tqdm(map_ana_grand.items()):
    file = cfg.PATH_OBS_TS / f'{sar_id}.csv'
    if file.is_file():

        ts = pd.read_csv(file, parse_dates=['date'], index_col='date')
        # convert the filling percentage into a fraction [-] and a volume [hm3]
        ts['volume_pct'] /= 100
        ts['volume_mcm'] = ts['volume_pct'] * attributes.loc[grand_id, 'CAP_MCM']
        # make sure there aren't gaps in the dates
        dates = pd.date_range(ts.index.min(), ts.index.max(), freq='D')
        if len(dates) > ts.shape[0]:
            ts = ts.reindex(dates)
            ts.index.name = 'date'

        # rename columns
        rename_cols = {
            'volume_mcm': 'storage',
            'level_m': 'elevation',
            'inflow_cms': 'inflow',
            'outflow_cms': 'outflow'
        }
        ts.rename(columns=rename_cols, inplace=True)
        
        # remove negative values
        ts[ts < 0] = np.nan
        # clean outliers in storage
        clean_storage(ts.storage, w=7, error_thr=.2, inplace=True)
        
        # trim time series to the period with inflow, storage and outflow
        mask_availability = ts[['inflow', 'storage', 'outflow']].notnull().all(axis=1)
        if mask_availability.sum() == 0:
            continue
        start, end = ts[mask_availability].first_valid_index(), ts[mask_availability].last_valid_index()
        start, end = max(cfg.START, start), min(cfg.END, end)
        attributes.loc[grand_id, ['TIME_SERIES_START', 'TIME_SERIES_END']] = start, end
        ts = ts.loc[start:end]
        
        # save (the time series was already trimmed above)
        timeseries[grand_id] = ts
    else:
        print(f'File not found: {file}')
print(f'Time series were imported for {len(timeseries)} reservoirs')

Time series were imported for 59 reservoirs


##### **Plot timeseries**

In [5]:
PATH_PLOTS = cfg.PATH_TS / 'plots'
PATH_PLOTS.mkdir(exist_ok=True)

for grand_id, ts in tqdm(timeseries.items()):
    max_storage = {
        'GRanD': attributes.loc[grand_id, 'CAP_MCM'],
        # 'BR': 
    }
    max_elevation = {
        'GRanD': attributes.loc[grand_id, 'ELEV_MASL'],
        # 'BR': 
    }
    title = '{0} - {1}'.format(grand_id, attributes.loc[grand_id, 'DAM_NAME'])
    plot_timeseries_BR(
        ts.storage,
        ts.elevation,
        ts[['outflow_turbine_cms', 'outflow_spillway_cms']],
        max_storage,
        max_elevation,
        # zlim=(attributes.loc[grand_id, 'NAME_MASL'] - attributes.loc[grand_id, 'DAM_HGT_M'] * 1.2, None),
        title=title,
        save=PATH_PLOTS / f'{grand_id}.jpg'
    )

In [6]:
# grand_id = 1363 #1349 #1347 #1333
# ts = timeseries[grand_id]

# plot_resops(ts.storage, ts.elevation, outflow=ts.outflow,
#             capacity=attributes.loc[grand_id, ['NAME_MCM', 'NAMO_MCM']],
#             level=attributes.loc[grand_id, ['NAME_MASL', 'NAMO_MASL']])

# plot_resops(ts.storage, ts.area, outflow=ts.outflow,
#             capacity=attributes.loc[grand_id, ['NAME_MCM', 'NAMO_MCM']],
#             # level=attributes.loc[grand_id, ['NAME_MASL', 'NAMO_MASL']]
#            )

In [7]:
# convert to xarray.Dataset
xarray_list = []
for key, df in timeseries.items():
    ds = xr.Dataset.from_dataframe(df)
    ds = ds.assign_coords(GRAND_ID=key)
    xarray_list.append(ds)
obs = xr.concat(xarray_list, dim='GRAND_ID')

#### GloFAS

##### Inflow 

In [8]:
# import GloFAS simulation
sim = xr.open_dataset(cfg.PATH_SIM_TS / 'inflow.nc')
sim = sim.rename({'time': 'date', 'ID': 'GRAND_ID', 'dis': 'inflow'})

# bias correct
for grand_id in sim.GRAND_ID.data:

    # skip reservoirs without observed time series
    if grand_id not in timeseries:
        continue
    ts = timeseries[grand_id]

    # the net inflow requires both observed storage and outflow
    if not {'storage', 'outflow'}.issubset(ts.columns):
        continue

    inflow = sim['inflow'].sel(GRAND_ID=grand_id).to_pandas()
    inflow.name = 'inflow'

    # compute net inflow [m3/s] from the water balance: ΔS / Δt + outflow
    ΔS = ts.storage.diff().values # hm3/day
    net_inflow = ΔS * 1e6 / (24 * 3600) + ts.outflow
    net_inflow[net_inflow < 0] = 0
    net_inflow.name = 'net_inflow'

    # bias correct simulated inflow
    inflow_bc = quantile_mapping(obs=net_inflow,
                                 sim=inflow)
    inflow_bc.name = 'inflow_bc'

    # # plot raw vs bias-corrected inflow
    # compare_flows(ts.storage, ts.outflow, inflow, inflow_bc)

    # overwrite the raw simulation with the bias-corrected inflow
    sim['inflow'].loc[{'GRAND_ID': grand_id}] = inflow_bc.values
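
Quantile mapping replaces each simulated value with the observed value that has the same non-exceedance probability. The sketch below is a plain empirical version, assuming 101 evenly spaced quantiles and linear interpolation between them. `empirical_quantile_mapping` is a hypothetical helper and may differ from `lisfloodreservoirs.utils.timeseries.quantile_mapping`.

```python
import numpy as np


def empirical_quantile_mapping(obs, sim, n_quantiles: int = 101) -> np.ndarray:
    """Map simulated values onto the observed distribution.
    Hypothetical helper, not the library function."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    probs = np.linspace(0, 1, n_quantiles)
    # paired quantiles of both distributions, ignoring missing values
    q_obs = np.nanquantile(obs, probs)
    q_sim = np.nanquantile(sim, probs)
    # interpolate between the paired quantiles; NaNs in `sim` stay NaN
    return np.interp(sim, q_sim, q_obs)
```

The mapping is monotonic, so it preserves the ranking of the simulated values and only changes their magnitude.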

##### Meteo

In [9]:
# load meteorological time series
path_meteo_areal = cfg.PATH_RESOPS / 'ancillary' / 'catchstats'
variables = [x.stem for x in path_meteo_areal.iterdir() if x.is_dir()]
meteo_areal = xr.Dataset({
    var: xr.open_mfdataset(f'{path_meteo_areal}/{var}/*.nc')[f'{var}_mean'].compute()
    for var in variables
})
meteo_areal['time'] = meteo_areal['time'] - np.timedelta64(24, 'h') # WARNING!! One day lag compared with LISFLOOD

# keep catchments in the attributes
IDs = list(attributes.index.intersection(meteo_areal.id.data))
meteo_areal = meteo_areal.sel(id=IDs)

# rename dimensions ('id' becomes the GRanD ID) and variables
meteo_areal = meteo_areal.rename({
    'id': 'GRAND_ID',
    'time': 'date',
    'e0': 'evapo_areal',
    'tp': 'precip_areal',
    '2t': 'temp_areal'
})

## Prepare dataset

### Convert units

In [10]:
if cfg.NORMALIZE:

    # reservoir attributes used to normalize the dataset
    area_sm = xr.DataArray.from_series(attributes.AREA_SKM) * 1e6 # m2
    capacity_cm = xr.DataArray.from_series(attributes.CAP_MCM) * 1e6 # m3
    catchment_sm = xr.DataArray.from_series(attributes.CATCH_SKM) * 1e6 # m2
    
    # Observed timeseries
    # -------------------
    for var in list(obs.data_vars): # copy of the names, as variables are added in the loop
        # convert variables in hm3 to fraction of reservoir capacity [-]
        if var in ['storage', 'evaporation']:
            obs[f'{var}_norm'] = obs[var] * 1e6 / capacity_cm
        # convert variables in m3/s to fraction of reservoir capacity [-]
        elif var in ['inflow', 'outflow']:
            obs[f'{var}_norm'] = obs[var] * 24 * 3600 / capacity_cm

    # Simulated timeseries
    # -------------------
    for var in list(sim.data_vars): # copy of the names, as variables are added in the loop
        # convert variables in hm3 to fraction of reservoir capacity [-]
        if var.split('_')[0] in ['storage']:
            sim[f'{var}_norm'] = sim[var] * 1e6 / capacity_cm
        # convert variables in m3/s to fraction of reservoir capacity [-]
        elif var.split('_')[0] in ['inflow', 'outflow']:
            sim[f'{var}_norm'] = sim[var] * 24 * 3600 / capacity_cm
            
    # Catchment meteorology
    # ---------------------
    # convert areal evaporation and precipitation from mm to fraction filled
    for var in ['evapo', 'precip']:
        meteo_areal[f'{var}_areal_norm'] = meteo_areal[f'{var}_areal'] * catchment_sm * 1e-3 / capacity_cm       
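
As a worked check of the unit conversions above, the following sketch reproduces them for scalar values. The function names are hypothetical; the factors are the same as in the cell above: 1 hm3 = 1e6 m3, 1 km2 = 1e6 m2, 1 mm = 1e-3 m and 1 day = 86400 s.

```python
SECONDS_PER_DAY = 24 * 3600


def flow_to_capacity_fraction(flow_cms: float, capacity_mcm: float) -> float:
    """Daily volume of a flow in m3/s as a fraction of the reservoir capacity in hm3."""
    return flow_cms * SECONDS_PER_DAY / (capacity_mcm * 1e6)


def depth_to_capacity_fraction(depth_mm: float, catchment_skm: float, capacity_mcm: float) -> float:
    """Volume of a water depth in mm over the catchment in km2 as a fraction
    of the reservoir capacity in hm3."""
    volume_m3 = depth_mm * 1e-3 * catchment_skm * 1e6
    return volume_m3 / (capacity_mcm * 1e6)
```

For instance, a flow of 100 m3/s sustained for one day amounts to 8.64 hm3, which is 1 % of a reservoir of 864 hm3.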

### Export

In [11]:
path_csv = cfg.PATH_TS / 'csv'
path_csv.mkdir(parents=True, exist_ok=True)
path_nc = cfg.PATH_TS / 'netcdf'
path_nc.mkdir(parents=True, exist_ok=True)

for grand_id in tqdm(attributes.index, desc='Exporting time series'):

    # skip reservoirs without observed time series
    if grand_id not in obs.GRAND_ID.data:
        continue

    # concatenate time series
    ds = obs.sel(GRAND_ID=grand_id).drop_vars('GRAND_ID')
    if grand_id in sim.GRAND_ID.data:
        ds = xr.merge((ds, sim.sel(GRAND_ID=grand_id).drop_vars('GRAND_ID')))
    if grand_id in meteo_areal.GRAND_ID.data:
        ds = xr.merge((ds, meteo_areal.sel(GRAND_ID=grand_id).drop_vars('GRAND_ID')))

    # # delete empty variables
    # for var in list(ds.data_vars):
    #     if (ds[var].isnull().all()):
    #         del ds[var]
        
    # trim time series to the observed period
    start, end = attributes.loc[grand_id, ['TIME_SERIES_START', 'TIME_SERIES_END']]
    ds = ds.sel(date=slice(start, end))

    # create time series of temporal attributes
    ds['year'] = ds.date.dt.year
    ds['month'] = ds.date.dt.month
    ds['month_sin'], ds['month_cos'] = time_encoding(ds['month'], period=12)
    ds['weekofyear'] = ds.date.dt.isocalendar().week
    ds['woy_sin'], ds['woy_cos'] = time_encoding(ds['weekofyear'], period=52)
    ds['dayofyear'] = ds.date.dt.dayofyear
    ds['doy_sin'], ds['doy_cos'] = time_encoding(ds['dayofyear'], period=365)
    ds['dayofweek'] = ds.date.dt.dayofweek
    ds['dow_sin'], ds['dow_cos'] = time_encoding(ds['dayofweek'], period=7)

    # export CSV
    # ..........
    ds.to_pandas().to_csv(path_csv / f'{grand_id}.csv')

    # export NetCDF
    # .............
    ds.to_netcdf(path_nc / f'{grand_id}.nc')
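
The temporal attributes above are encoded as sine and cosine pairs so that the end of a cycle sits next to its start (December next to January, Sunday next to Monday). The sketch below shows one common formulation, assuming that `time_encoding` works along these lines; `cyclic_encoding` is a hypothetical name. Under this formulation the period must equal the number of distinct values in the cycle, for example 12 for months and 7 for days of the week.

```python
import numpy as np


def cyclic_encoding(values, period: float):
    """Return the sine and cosine of a periodic variable.
    Hypothetical helper, not the library function."""
    angle = 2 * np.pi * np.asarray(values, dtype=float) / period
    return np.sin(angle), np.cos(angle)
```

With a period shorter than the cycle, two different values collapse onto the same point: for days of the week numbered 0 to 6, a period of 6 would give Monday and Sunday identical encodings.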
