## eWaterCycle API

- API does not need to be perfect in any way.
- Notebook could present a few options for the design, or each option could get its own notebook.
- Notebook should generate forcing, run WFLOW, and analyse the result (based on the preprocessing, full run and analysis notebooks) and "use" the new API
- Notebook should be an example in the eWaterCycle package repo
- Notebook does not have to actually work
- Data should remain private (especially the GRDC data)
- Notebook only pretends to run a single catchment with a single forcing for a single year.

## Classes

- `CFG`: global config
- `ForcingData`: container for forcing output / forcing data
- `Model`: Model runner
- `ModelData`: container for model output / model data

In [None]:
from ewatercycle import CFG

## Setup

- where are the data?
- where are the files?
- model specific settings (time period, location/catchment, name of variable)

In [None]:
CFG.load_from_file('~/.ewatercycle/config.yaml')

# path to raw forcing data
# path to input parameter set (model specific)
# path to shape file of each catchment
# path to work directory (temporary directory)

# Expected contents of CFG after loading config.yaml (illustration only):
# {
#     'raw_forcing_data': '/path/to/data',
#     'output_directory': '~/work_directory',  # forcing data / result of model run
#     'shapefiles': '/path/to/shapefiles',  # Meuse -> Meuse.shp
#     'station_ids': {'Meuse': ...},  # mapping of catchment to station id
#     'WFLOW': {
#         'config_file': '/path/to/model/specific/config_file',
#         'docker_container': '/location/of/docker/container',
#     },
#     'LISFLOOD': {...},
#     'MARRMOT': {...},
#     'grdc_data': '/path/to/grdc/data',
# }
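A minimal sketch of how `CFG` could behave, assuming it is a plain dictionary built from the parsed YAML file (parsing itself, e.g. with PyYAML's `yaml.safe_load`, is left out so the sketch needs only the standard library). The class name `Config`, the `from_mapping` constructor and the `model_settings` helper are hypothetical.

```python
from pathlib import Path
from typing import Any, Mapping


class Config(dict):
    """Hypothetical global configuration object: a dict with a few helpers."""

    # Keys holding filesystem paths (taken from the example structure above)
    PATH_KEYS = ("raw_forcing_data", "output_directory", "shapefiles", "grdc_data")

    @classmethod
    def from_mapping(cls, mapping: Mapping[str, Any]) -> "Config":
        """Build a config from already-parsed YAML content, expanding '~' in paths."""
        cfg = cls(mapping)
        for key in cls.PATH_KEYS:
            if key in cfg:
                cfg[key] = Path(cfg[key]).expanduser()
        return cfg

    def model_settings(self, model: str) -> dict:
        """Return the model-specific section, e.g. 'wflow' -> CFG['WFLOW']."""
        return dict(self[model.upper()])
```

Keeping model sections under upper-case keys lets `model='wflow'` (as used elsewhere in this notebook) select the `WFLOW` block without a separate lookup table.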

## Preprocessing

Note: the simple mapping of `catchment='Meuse'` will need something more explicit. Where exactly a catchment ends is not always straightforward, so different models and modellers disagree.
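One possible way to make the catchment explicit is to pass an object that names both the boundary and the outlet instead of a bare string. The `Catchment` class and its fields below are hypothetical, a sketch of the idea rather than a proposed final API.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Catchment:
    """Hypothetical explicit catchment definition (instead of catchment='Meuse')."""

    name: str
    shapefile: Path          # exact boundary chosen by this model/modeller
    outlet_station_id: str   # gauging station that defines where the catchment ends

    def __post_init__(self):
        # Fail early if the boundary is not given as a shapefile
        if Path(self.shapefile).suffix != ".shp":
            raise ValueError(f"expected a .shp file, got {self.shapefile}")
```

Because the boundary and the outlet are both explicit, two modellers who disagree about where the Meuse ends can each pass their own `Catchment` without changing the `forcing.generate` call.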

In [None]:
from ewatercycle import forcing

# forcing.generate takes a single forcing dataset
forcing_output = forcing.generate(
    model='wflow',
    forcing='ERA5',
    start_year=1990,
    end_year=2000,
    catchment='Meuse',
)
forcing_output
# <ForcingData for ERA5>

# For multiple forcings:
forcing_output_dict = {}

# use a loop variable other than `forcing`, which would shadow the module
for forcing_name in ('ERA5', 'ERA-Interim'):
    output = forcing.generate(
        model='wflow',
        forcing=forcing_name,
        start_year=1990,
        end_year=2000,
        catchment='Meuse',
    )
    forcing_output_dict[forcing_name] = output

forcing_output_dict
# {
#     'ERA5':
#         <ForcingData for ERA5>
#     'ERA-Interim':
#         <ForcingData for ERA-Interim>
# }

In [None]:
forcing_output.location
# path to forcing output
forcing_output.start_year
# 1990
forcing_output.end_year
# 2000
forcing_output.forcing
# 'ERA5'
forcing_output.model
# 'wflow'
forcing_output.catchment
# Meuse
forcing_output.region_extent
# {
#     'start_longitude': 0,
#     'end_longitude': 6.75,
#     'start_latitude': 47.25,
#     'end_latitude': 52.5,
# }
forcing_output.visualize(variable='pr')
# Visualize forcing data on a map
# Interactive slider to go through the timestamps
# Plot border of shapefile on the image
forcing_output.plot_timeseries()
# e.g. similar to https://hyperspy.org
forcing_output.log
# show log output
forcing_output.recipe_output
# Return recipe output from esmvaltool api to access citation info, provenance, etc.

## (Calibration)

## Running the model

Notes:
 - Model class/initialize/setup should match PyMT.
 - The shortcuts to create a model from a forcing and vice-versa are nice-to-haves.
 - Support a more explicit run loop in addition to the single-line run method.
 - The start and end time should be set in the setup rather than the run method (as most models expect this info in the configuration file)
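A sketch of the explicit run loop next to the single-line `run`, assuming the model exposes the standard BMI methods `update`, `get_current_time`, `get_end_time` and `finalize` (as a BMI model reached through grpc4bmi would). `ToyModel` is a made-up stand-in so the loop runs without Docker; start and end time are fixed when it is set up, matching the note above.

```python
class ToyModel:
    """Hypothetical stand-in for a BMI model; only the methods the loop uses."""

    def __init__(self, start_time: float, end_time: float, time_step: float = 1.0):
        # Start and end time are fixed at setup, not passed to run()
        self._time = start_time
        self._end = end_time
        self._dt = time_step
        self.values = []

    def get_current_time(self) -> float:
        return self._time

    def get_end_time(self) -> float:
        return self._end

    def update(self) -> None:
        self._time += self._dt
        self.values.append(self._time * 0.5)  # placeholder output

    def finalize(self) -> None:
        pass


def run(model) -> list:
    """Single-line run implemented on top of the explicit loop."""
    while model.get_current_time() < model.get_end_time():
        model.update()  # users wanting control write this loop themselves
    model.finalize()
    return model.values
```

Building `run` on top of the same loop users would write by hand keeps both styles consistent: the one-liner is only a convenience.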

In [None]:
from ewatercycle.wflow import Model

# model works with a specific forcing to keep it simple
# parallel processing can be performed in a util function, e.g. ewatercycle.parallel_run(model, forcings=...)
# using e.g. a ThreadPoolExecutor https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor

# setup -> copy data / config to work directory
# create grpc4bmi directories
model = Model.setup(
    model='wflow',
    forcing_data=forcing_output,
)

# shortcut constructors to create a model directly from `forcing_output`

model = forcing_output.to_model()
# or
model = Model.from_forcing_data(forcing_output)

# `.run` starts the docker container and runs update func
# saves data to hard drive as netcdf file
model_output = model.run(
    spinup_years=5,
    start_year=1995,
    end_year=2000,
    variable='RiverRunoff',
)

model_output
# <ModelData for ERA5>
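The comments in the cell above mention a `parallel_run` utility built on `ThreadPoolExecutor`. A minimal sketch, assuming the per-forcing work is wrapped in a callable that takes a forcing name and returns its result (so the signature differs from the hypothetical `parallel_run(model, forcings=...)` in the comment). Threads are assumed to be enough because each run would mostly wait on its Docker container.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def parallel_run(run_one: Callable[[str], T], forcings: Iterable[str],
                 max_workers: int = 4) -> Dict[str, T]:
    """Call `run_one` for every forcing name in a thread pool; results keyed by name."""
    forcings = list(forcings)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order, so zip pairs them correctly
        results = executor.map(run_one, forcings)
        return dict(zip(forcings, results))
```

The returned dictionary has the same shape as `forcing_output_dict`, e.g. `parallel_run(lambda name: ..., ['ERA5', 'ERA-Interim'])`.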

In [None]:
# comparison
model_output_dict = {}

for forcing_name, forcing_data in forcing_output_dict.items():
    model = forcing_data.to_model()
    output = model.run(
        spinup_years=5,
        start_year=1995,
        end_year=2000,
        variable='RiverRunoff',
    )
    model_output_dict[forcing_name] = output

model_output_dict
# {
#     'ERA5':
#         <ModelData for ERA5>
#     'ERA-Interim':
#         <ModelData for ERA-Interim>
# }

`ModelData` mimics `ForcingData` for some of the attributes.
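One way to share these attributes is a common dataclass base. The base class name `_RunInfo`, the field layout and the `variable` field are assumptions for illustration; the `<ForcingData for ERA5>` style representation follows the outputs shown in this notebook.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict


@dataclass
class _RunInfo:
    """Attributes shared by ForcingData and ModelData (hypothetical layout)."""

    location: Path
    start_year: int
    end_year: int
    forcing: str
    model: str
    catchment: str

    def __repr__(self) -> str:
        # dataclass keeps this __repr__ because the class defines it explicitly
        return f"<{type(self).__name__} for {self.forcing}>"


@dataclass(repr=False)  # repr=False so the inherited __repr__ is not replaced
class ForcingData(_RunInfo):
    region_extent: Dict[str, float] = field(default_factory=dict)


@dataclass(repr=False)
class ModelData(_RunInfo):
    variable: str = "RiverRunoff"
```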

Note: is it possible to create this structure while doing a more explicit loop in a nice way?

In [None]:
model_output.location
# Location of output data (netcdf)
model_output.log
# output for the model run
model_output.to_dataframe()
# read netcdf into a pandas dataframe
model_output.to_xarray()
# read netcdf with xarray
model_output.start_year
# 1990
model_output.end_year
# 2000
model_output.forcing
# 'ERA5'
model_output.model
# 'wflow'
model_output.catchment
# Meuse

How to access BMI variables?

Note: BMI functions should be available on the model object directly.

In [None]:
model.bmi.get_output_var_names()
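Putting BMI functions on the model object directly could be done by forwarding unknown attributes to the wrapped BMI client. The class name `BmiForwardingModel` is hypothetical, and the wrapped object is assumed to be something like a grpc4bmi client.

```python
class BmiForwardingModel:
    """Hypothetical model wrapper that exposes BMI methods on the model itself."""

    def __init__(self, bmi):
        # e.g. a grpc4bmi client connected to the model's container (assumption)
        self.bmi = bmi

    def __getattr__(self, name):
        # Only called when normal lookup fails, so the wrapper's own attributes
        # win; everything else (get_output_var_names, get_value, ...) is forwarded.
        if name == "bmi":
            # avoid infinite recursion if self.bmi has not been set yet
            raise AttributeError(name)
        return getattr(self.bmi, name)

    def __dir__(self):
        # include the BMI methods in notebook tab completion
        return sorted(set(super().__dir__()) | set(dir(self.bmi)))
```

With this, `model.get_output_var_names()` and `model.bmi.get_output_var_names()` return the same thing.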

## Analyzing the results

Note: the station ID may have to be passed explicitly rather than taken from a setting in the config.

In [None]:
from ewatercycle.observations import get_data

# station id is obtained from CFG / global settings
obs_timeseries = get_data(
    'grdc',
    start_year=1990,
    end_year=2000,
    catchment='Meuse',
)  # -> pandas.DataFrame

### hydrograph

Some models also need some additional processing to get the data in the form of a timeseries required to calculate the hydrograph.

In [None]:
from ewatercycle.plot import hydrograph

simulated_data = model_output.to_dataframe()

# some models need some additional processing to get a timeseries of the simulated data

from ewatercycle.utils import guess_outlet_gridpoint
simulated_timeseries = guess_outlet_gridpoint(simulated_data, station_id='id', padding=5)

hydrograph(
    simulated=simulated_timeseries,
    observed=obs_timeseries,
    # ... further plotting options
)  # generate matplotlib plot
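The heuristic below (take the grid cell with the largest mean discharge within `padding` cells of the station's grid cell) is an assumption about what `guess_outlet_gridpoint` might do. This sketch takes grid indices `row`/`col` instead of a `station_id`, so looking up the station coordinates and converting them to grid indices is assumed to happen elsewhere.

```python
import numpy as np


def guess_outlet_gridpoint(discharge: np.ndarray, row: int, col: int,
                           padding: int = 5) -> np.ndarray:
    """Return the time series of the highest-mean-flow cell near (row, col).

    discharge: array with shape (time, lat, lon).
    row, col: grid indices closest to the gauging station.
    padding: number of cells searched on each side of (row, col).
    """
    _, n_lat, n_lon = discharge.shape
    # Clip the search window to the grid
    r0, r1 = max(row - padding, 0), min(row + padding + 1, n_lat)
    c0, c1 = max(col - padding, 0), min(col + padding + 1, n_lon)
    window = discharge[:, r0:r1, c0:c1]
    # Assume the outlet is the cell with the largest mean discharge in the window
    mean_flow = np.nanmean(window, axis=0)
    i, j = np.unravel_index(np.nanargmax(mean_flow), mean_flow.shape)
    return window[:, i, j]
```

Searching a window rather than using the nearest cell accounts for coarse model grids where the simulated river may not pass through the cell that contains the station.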

`hydrograph` can also take lists of simulated and observed data.

In [None]:
hydrograph(
    simulated=[simulated_timeseries, ..., ...],
    observed=[obs_timeseries, ..., ...],
    # ... further plotting options
)  # generate matplotlib plot
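A minimal sketch of how `hydrograph` could accept either a single series or a list, and draw optional forcing on an inverted twin axis, assuming every input is a pandas `Series` with a `DatetimeIndex`. The function body, the `_as_list` helper and the twin-axis layout are assumptions for illustration; only the keyword names `simulated`, `observed`, `forcing` and `axis` come from this notebook.

```python
from typing import Optional, Sequence, Union

import matplotlib.pyplot as plt
import pandas as pd

SeriesOrList = Union[pd.Series, Sequence[pd.Series]]


def _as_list(data: Optional[SeriesOrList]) -> list:
    """Accept a single series, a list of series, or None."""
    if data is None:
        return []
    if isinstance(data, pd.Series):
        return [data]
    return list(data)


def hydrograph(simulated: SeriesOrList, observed: SeriesOrList,
               forcing: Optional[SeriesOrList] = None, axis=None):
    """Plot simulated and observed discharge, optionally with precipitation on top."""
    if axis is None:
        _, axis = plt.subplots()
    for series in _as_list(simulated):
        axis.plot(series.index, series.to_numpy(),
                  label=f"simulated {series.name or ''}".strip())
    for series in _as_list(observed):
        axis.plot(series.index, series.to_numpy(), linestyle="--",
                  label=f"observed {series.name or ''}".strip())
    axis.set_ylabel("discharge")
    axis.legend(loc="upper left")

    forcing_list = _as_list(forcing)
    if forcing_list:
        # Precipitation hangs from the top of the plot on an inverted twin y-axis
        precip_axis = axis.twinx()
        for series in forcing_list:
            precip_axis.plot(series.index, series.to_numpy(), alpha=0.5)
        precip_axis.invert_yaxis()
        precip_axis.set_ylabel("precipitation")
    return axis.figure
```

Accepting an existing `axis` lets the caller own the figure (for example to save it), while the default creates a new one.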

Sometimes we also want to show the forcing precipitation data at the top of the plot; this argument can also be a list.

In [None]:
forcing_data = forcing_output.variables['pr']  # iris.cube.Cube

# some models need some additional processing to get a timeseries of the precipitation

from ewatercycle.utils import catchment_statistics
forcing_timeseries = catchment_statistics(forcing_data, catchment='Meuse', statistics='sum')

hydrograph(
    simulated=[simulated_timeseries, ..., ...],
    observed=[obs_timeseries, ..., ...],
    forcing=[forcing_timeseries, ..., ...],
    # ... further plotting options
)  # generate matplotlib plot
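A sketch of the reduction `catchment_statistics` might perform, assuming the forcing field is available as a NumPy array of shape (time, lat, lon) and the catchment has already been rasterised into a boolean (lat, lon) mask (rasterising the shapefile is out of scope here). The signature differs from the call above, which takes an iris cube and a catchment name. Note that for precipitation in mm/day, a mean (ideally area-weighted) is often more meaningful than a plain sum over cells.

```python
import numpy as np


def catchment_statistics(field: np.ndarray, mask: np.ndarray,
                         statistics: str = "sum") -> np.ndarray:
    """Reduce a (time, lat, lon) field to one value per time step over a catchment.

    mask: boolean (lat, lon) array, True for cells inside the catchment.
    """
    reducers = {"sum": np.nansum, "mean": np.nanmean, "max": np.nanmax}
    if statistics not in reducers:
        raise ValueError(f"unknown statistics {statistics!r}")
    # Boolean indexing on the last two axes gives shape (time, cells_in_catchment)
    inside = field[:, mask]
    return reducers[statistics](inside, axis=1)
```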

### metrics

Use hydrostats to calculate metrics

https://hydrostats.readthedocs.io/en/stable/Metrics.html#

https://hydrostats.readthedocs.io/en/stable/ref_table.html

In [None]:
from hydrostats import metrics

# both series must cover the same time steps
metrics.nse(
    simulated_array=simulated_timeseries,
    observed_array=obs_timeseries,
)
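As a cross-check for the hydrostats result, the Nash-Sutcliffe efficiency is NSE = 1 - Σ(sim - obs)² / Σ(obs - mean(obs))², where 1 is a perfect fit and 0 means the simulation is no better than the observed mean. The NumPy version below is a sketch; the function name `nash_sutcliffe` and the choice to drop time steps with missing values are assumptions, and both inputs are assumed to be aligned on the same dates already.

```python
import numpy as np


def nash_sutcliffe(simulated, observed) -> float:
    """NSE = 1 - sum((sim - obs)^2) / sum((obs - mean(obs))^2)."""
    sim = np.asarray(simulated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    if sim.shape != obs.shape:
        raise ValueError("simulated and observed must have the same shape")
    # Drop time steps where either series is missing
    ok = ~(np.isnan(sim) | np.isnan(obs))
    sim, obs = sim[ok], obs[ok]
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))
```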

### save/load hydrograph results

In [None]:
from ewatercycle.utils import export_hydrograph_data, import_hydrograph_data

# saves the data using pandas.DataFrame.to_csv (or otherwise)

export_hydrograph_data(
    'my_output.csv',
    simulated=[simulated_timeseries, ..., ...],
    observed=[obs_timeseries, ..., ...],
    forcing=[forcing_timeseries, ..., ...],
)

# loads the data using pandas.read_csv (or otherwise)
# (avoid unpacking into `forcing`, which would shadow the `forcing` module)

simulated_list, observed_list, forcing_list = import_hydrograph_data('my_output.csv')
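A minimal sketch of the export/import pair using one long-format CSV (columns `kind`, `member`, `time`, `value`), assuming each input is a pandas `Series` indexed by time. The column layout and the `KINDS` constant are assumptions; the original only says `pandas.DataFrame.to_csv` "or otherwise" would be used.

```python
from typing import List, Tuple

import pandas as pd

KINDS = ("simulated", "observed", "forcing")


def export_hydrograph_data(path, **series_lists: List[pd.Series]) -> None:
    """Write lists of time series to one long-format CSV (kind, member, time, value)."""
    frames = []
    for kind in KINDS:
        for member, series in enumerate(series_lists.get(kind, [])):
            frames.append(pd.DataFrame({
                "kind": kind,        # which argument the series came from
                "member": member,    # position within that list
                "time": series.index,
                "value": series.to_numpy(),
            }))
    if not frames:
        raise ValueError("nothing to export")
    pd.concat(frames, ignore_index=True).to_csv(path, index=False)


def import_hydrograph_data(path) -> Tuple[List[pd.Series], List[pd.Series], List[pd.Series]]:
    """Read the CSV back into (simulated, observed, forcing) lists of series."""
    table = pd.read_csv(path, parse_dates=["time"])
    lists = []
    for kind in KINDS:
        subset = table[table["kind"] == kind]
        lists.append([
            group.set_index("time")["value"]
            for _, group in subset.groupby("member", sort=True)
        ])
    return tuple(lists)
```

A single long table keeps series of different lengths in one file without padding, at the cost of losing the original series names.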

In [None]:
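import matplotlib.pyplot as plt

# plt.subplots returns (figure, axis); plt.figure returns only a figure
fig, axis = plt.subplots()

hydrograph(
    simulated=[simulated_timeseries, ..., ...],
    observed=[obs_timeseries, ..., ...],
    forcing=[forcing_timeseries, ..., ...],
    # ... further plotting options
    axis=axis,
)  # draw the hydrograph on the given matplotlib axis

fig.savefig('output.jpg')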