# eWaterCycle platform tutorial

This is a short tutorial on the eWaterCycle platform. It introduces the platform's concepts and shows how they are generally used.

## Glossary

- Experiment: A notebook running one or more hydrological models and producing a scientific result
- Model: Software implementation of an algorithm. Note this excludes data required for this model.
- Forcing: All time-dependent data needed to run a model that is not affected by the model.
- Model Parameters: Fixed parameters (depth of river, land use, irrigation channels, dams). Considered constant during a model run.
- Parameter Set: File-based collection of parameters for a certain model, resolution, and possibly area.
- Model instance: A single running instance of a model, including all data required, and with a current state.

To create a model instance, we need to select:
- A model
- Forcing for this model
- Parameters for this model. Some models come with default parameters; most must be explicitly given a parameter set upon creation of an instance. Parameters can be overridden in the setup function.
- A configuration file for the model. This will be generated by the eWaterCycle platform, possibly from a template.
- A container for the model. eWaterCycle will find it for you.
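
The selections listed above can be pictured as a single record. The sketch below is purely illustrative: `ModelInstanceSpec`, its field names, and `effective_parameters` are hypothetical and not part of the eWaterCycle API. It only shows how defaults and overrides might combine.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ModelInstanceSpec:
    """Hypothetical record of what is selected to create a model instance."""

    model: str                           # name of the model software, e.g. 'wflow'
    forcing: str                         # location of the forcing data
    parameter_set: Optional[str] = None  # None means: rely on the model's defaults
    overrides: Dict[str, Any] = field(default_factory=dict)  # values overridden at setup

    def effective_parameters(self, defaults: Dict[str, Any]) -> Dict[str, Any]:
        """Merge default parameters with overrides; overrides win."""
        merged = dict(defaults)
        merged.update(self.overrides)
        return merged


spec = ModelInstanceSpec(model="wflow", forcing="/path/to/forcing", overrides={"soil_depth": 9})
params = spec.effective_parameters({"soil_depth": 5, "some_parameter": 45})
```

The configuration file and the container are left out on purpose, since the platform is expected to provide both.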


## Configuration

To be able to find all needed data and models, eWaterCycle comes with a configuration object. This configuration contains system settings for eWaterCycle (which container technology to use, where the data is located, etc.). In general, these should not need to be changed by the user for a specific experiment.

In [None]:
from ewatercycle import CFG

## Setup

- where are the data?
- where are the files?
- model specific settings (time period, location/catchment, name of variable)

In [None]:
CFG.load_from_file('~/.ewatercycle/config.yaml')

# The collection of data needed by the platform (forcings, parameter sets, Singularity images, GRDC data) is usually managed by the system administrator. As a user you should not need to touch this.

# Illustration of the settings held in CFG (all paths are placeholders):
CFG = {
    'forcing_data': '/Path/to/data',
    'output_directory': '~/work_directory',  # forcing data / result of model run
    'parameter_files': '/path/to/parameters',
    'singularity_containers_dir': '/path/to/containers/',
    'container_engine': 'docker',
    'grdc_data': '/Path/to/grdc/data',
}
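
The example settings above mix absolute paths with the `~` shorthand. Below is a minimal sketch of how such settings could be normalised before use, assuming they are available as a plain dictionary. `expand_paths` and `PATH_KEYS` are hypothetical helpers, not eWaterCycle functions.

```python
import os

# Keys whose values are file system locations (taken from the example settings).
PATH_KEYS = (
    "forcing_data",
    "output_directory",
    "parameter_files",
    "singularity_containers_dir",
    "grdc_data",
)


def expand_paths(settings: dict) -> dict:
    """Return a copy of settings with '~' expanded in path-like entries."""
    expanded = dict(settings)
    for key in PATH_KEYS:
        if key in expanded:
            expanded[key] = os.path.expanduser(expanded[key])
    return expanded


settings = expand_paths({"output_directory": "~/work_directory", "container_engine": "docker"})
```

Entries that are not paths, such as `container_engine`, pass through unchanged.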

## Preprocessing

Note: the simple mapping of `catchment='Meuse'` will need something more explicit. Where exactly a catchment ends is not always straightforward, so different models and modellers disagree.

In [None]:
from ewatercycle import forcing

# forcing.generate takes a single forcing dataset.
# In addition to ESMValTool, it also runs any auxiliary steps.
# If a model needs a bounding box for forcing (like PCRGLOBWB), this would ideally be calculated from the shape.
forcing = forcing.generate(
    target_model='wflow',
    target_location='/path/to/forcing/output',  # optional, otherwise generated
    dataset='ERA-Interim',  # example of a more advanced case: forcing.findCMIPData(mip=6, exp=historical)
    start_time="2021-05-07T13:32:00Z",
    end_time="2021-05-07T13:32:00Z",
    shape='/path/to/shapefile.shp',
    model_specific_options={
        'wflow_dem_file': '/some/path/to/dem.dem',
        'hype_catchment_delineation': '/some/other/shapefile',
    },
)

forcing.location
# /some/path

forcing
# <ForcingData for ERA-Interim>
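
The comments in the cell above note that a bounding box would ideally be calculated from the shape. A minimal sketch of that calculation follows, assuming the catchment outline is already available as a list of (longitude, latitude) vertices; reading the shapefile itself is left out. `bounding_box` is a hypothetical helper and the outline is made up.

```python
from typing import Iterable, Tuple


def bounding_box(vertices: Iterable[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) for (lon, lat) vertices."""
    lons, lats = zip(*vertices)
    return min(lons), min(lats), max(lons), max(lats)


# Made-up outline, not a real catchment.
outline = [(4.5, 49.0), (6.0, 48.5), (6.5, 50.5), (5.0, 51.5)]
bbox = bounding_box(outline)
```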


In [None]:
forcing.location
# path to forcing output
forcing.start_time
# datetime()
forcing.end_time
# datetime()
forcing.dataset
# 'ERA-Interim'
forcing.target_model
# 'wflow'
forcing.shape
# Shape()
forcing.plot()
# some matplotlib
forcing.interactive_plot()
# some geoviews/widget/thingy
forcing.log
# show log output
forcing.recipe_output
# Return recipe output from esmvaltool api to access citation info, provenance, etc.

## (Calibration)

## Running the model

Notes:
 - Model class/initialize/setup should match PyMT.
 - The shortcuts to create a model from a forcing and vice-versa are nice-to-haves.
 - Support a more explicit run loop in addition to the single-line run method.
 - The start and end time should be set in the setup rather than the run method (as most models expect this info in the configuration file)
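
The difference between an explicit run loop and a single-line run method can be sketched with a toy model. `ToyModel` below is a hypothetical stand-in with placeholder "physics"; it is not the eWaterCycle or PyMT model class and only illustrates the two calling styles, with start and end time fixed at setup.

```python
class ToyModel:
    """Hypothetical stand-in for a model with BMI-style time stepping."""

    def __init__(self):
        self.time = 0.0
        self.end_time = 0.0
        self.time_step = 1.0
        self.storage = 0.0

    def setup(self, start_time: float, end_time: float) -> None:
        # Start and end time are fixed here rather than in run().
        self.time = start_time
        self.end_time = end_time

    def update(self) -> None:
        # Advance one time step; the state change is a placeholder.
        self.storage += 0.5
        self.time += self.time_step

    def run(self) -> None:
        # Single-line convenience: step until the end time is reached.
        while self.time < self.end_time:
            self.update()


# Explicit loop: the caller can inspect the state after every step.
explicit = ToyModel()
explicit.setup(start_time=0.0, end_time=3.0)
storage_per_step = []
while explicit.time < explicit.end_time:
    explicit.update()
    storage_per_step.append(explicit.storage)

# Single-line run: same end state, but no access to intermediate values.
single = ToyModel()
single.setup(start_time=0.0, end_time=3.0)
single.run()
```

The explicit loop is what makes it possible to track a variable at every time step, which the single-line form hides.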

In [1]:
from ewatercycle.models import Wflow

Wflow.available_parameter_sets

# {"doi:39393993/3939393", "Global 30 min parameters", "/some/path"}
# {"doi:39393993/3939393", "Global 05 min parameters", "/some/path/too"}

Wflow.available_versions
# ["2019.1", "2020.1"]

parameter_set = '/some/parameter/set'
forcing = '/some/forcing/path'

# version: mandatory
# parameter set: optional (e.g. MARRMOT does not need one)
# forcing: in theory optional, in practice mandatory
model_instance = Wflow(version='2019.1', parameter_set=parameter_set, forcing=forcing)

model_instance.parameters()
#parameter_set: None
#forcing: None
#soil_depth: 9
#start_time=2018
#end_time=2020

# setup -> copy data / config to work directory
# create grpc4bmi directories
# complains about incompatible version and parameter set and forcing
model_instance.setup(
    some_parameter=45,
    land_mask='/some/land/mask.dem',  # if outside of mounts, add a mount, or copy into working dir, or :'(
    soil_depth=9,
    starttime="2021-05-07T13:32:00Z",
    endtime="2021-05-07T13:32:00Z",
    # how long this list is is up to the model
)

In [None]:
discharge = model_instance.get_value_as_xarray('Discharge')

In [None]:
discharge = model_instance.get_value_at_location('Discharge', latitude=model_latitude, longitude=model_longitude, method='nearest')
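
What `method='nearest'` could mean in the call above is a lookup of the grid cell whose centre lies closest to the requested coordinates. The sketch below assumes a regular latitude/longitude grid, so each axis can be searched independently. `nearest_cell` is a hypothetical helper, and the grid and values are made up.

```python
from typing import Sequence, Tuple


def nearest_cell(lats: Sequence[float], lons: Sequence[float],
                 latitude: float, longitude: float) -> Tuple[int, int]:
    """Return (row, column) of the grid cell whose centre is closest."""
    row = min(range(len(lats)), key=lambda i: abs(lats[i] - latitude))
    col = min(range(len(lons)), key=lambda j: abs(lons[j] - longitude))
    return row, col


# Made-up 3 x 3 grid of cell centres and discharge values.
grid_lats = [50.0, 50.5, 51.0]
grid_lons = [5.0, 5.5, 6.0]
discharge_grid = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]

row, col = nearest_cell(grid_lats, grid_lons, latitude=50.6, longitude=5.9)
value = discharge_grid[row][col]
```

On a projected or irregular grid a proper distance calculation would be needed instead of this per-axis search.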

In [None]:
import pandas as pd

# model_latitude and model_longitude are the coordinates of the station of interest
records = []
while model_instance.time < model_instance.end_time:

    # Update the model (takes a few seconds per timestep)
    model_instance.update()

    # Track discharge at station location
    discharge = model_instance.get_value_at_location('Discharge', latitude=model_latitude, longitude=model_longitude, method='nearest')
    records.append({"time": model_instance.time, "discharge": discharge})

    # Show progress
    print(model_instance.time, end="\r")  # "\r" returns to the start of the line, so the next timestamp overwrites this one

# Build the time series once the run has finished
data_frame = pd.DataFrame(records, columns=["time", "discharge"]).set_index("time")

In [None]:
model_instance.bmi.get_output_var_names()

## Analyzing the results

Note: The station ID may have to be given explicitly rather than taken from a setting/config.

### hydrograph

Some models also need some additional processing to get the data in the form of a timeseries required to calculate the hydrograph.
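
One example of such processing is aggregating sub-daily model output to daily means. The sketch below is an assumption about what this step could look like, using made-up values and only the standard library; the actual processing depends on the model.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Made-up sub-daily discharge values (time stamp, m3/s).
samples = [
    (datetime(2021, 5, 7, 0), 10.0),
    (datetime(2021, 5, 7, 12), 14.0),
    (datetime(2021, 5, 8, 0), 20.0),
    (datetime(2021, 5, 8, 12), 22.0),
]

# Group the values by calendar day.
values_per_day = defaultdict(list)
for timestamp, value in samples:
    values_per_day[timestamp.date()].append(value)

# One mean value per calendar day, in chronological order.
daily_mean = {day: mean(values) for day, values in sorted(values_per_day.items())}
```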

In [None]:
# See the other notebook in the tech paper.

`hydrograph` can take a list of simulated data and observed data.

Sometimes we also want to show the forcing precipitation data at the top of the plot; this can also be a list.

In [None]:
# See the other notebook in the tech paper.

### metrics

Use hydrostats to calculate metrics:

https://hydrostats.readthedocs.io/en/stable/Metrics.html#

https://hydrostats.readthedocs.io/en/stable/ref_table.html
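
To make concrete what such a metric computes, here is a hand-written sketch of the Nash-Sutcliffe efficiency (NSE), one of the metrics hydrostats provides. The function `nash_sutcliffe` is written for illustration only and the values are made up; in practice the hydrostats implementation would be used.

```python
from typing import Sequence


def nash_sutcliffe(simulated: Sequence[float], observed: Sequence[float]) -> float:
    """NSE = 1 - sum((sim - obs)^2) / sum((obs - mean(obs))^2)."""
    if len(simulated) != len(observed):
        raise ValueError("simulated and observed must have the same length")
    obs_mean = sum(observed) / len(observed)
    residual = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    variance = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - residual / variance


observed = [1.0, 2.0, 3.0, 4.0]
perfect = nash_sutcliffe(observed, observed)                 # simulation equals observation
mean_only = nash_sutcliffe([2.5, 2.5, 2.5, 2.5], observed)   # simulation is the observed mean
```

An NSE of 1 indicates a perfect match, while 0 means the simulation is no better than always predicting the observed mean.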

### save/load hydrograph results
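
One simple way to store and reload hydrograph data is a CSV file. The sketch below is an assumption about what such a step could look like, not the platform's method; the file name, column names, and values are made up.

```python
import csv
import tempfile
from pathlib import Path

# Made-up results: (ISO date, simulated, observed) in m3/s.
rows = [
    ("2021-05-07", 12.0, 11.5),
    ("2021-05-08", 21.0, 19.0),
]

path = Path(tempfile.mkdtemp()) / "hydrograph.csv"

# Save: a header row followed by one line per time step.
with path.open("w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["time", "simulated", "observed"])
    writer.writerows(rows)

# Load: convert the numeric columns back to float.
with path.open(newline="") as handle:
    loaded = [
        (record["time"], float(record["simulated"]), float(record["observed"]))
        for record in csv.DictReader(handle)
    ]
```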