In [1]:
# Suppress distracting outputs in this notebook
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
import logging
logging.basicConfig(level=logging.WARN)
logger = logging.getLogger('esmvalcore')
logger.setLevel(logging.WARNING)

![image](https://github.com/eWaterCycle/ewatercycle/raw/main/docs/examples/logo.png)

# User guide

This user manual will explain how the eWaterCycle Python package can be used to perform hydrological experiments. We will walk through the following chapters:

- parameter sets
- forcing data
- model instances
- using observations
- analysis

Each of these chapters corresponds to a so-called "subpackage" of the eWaterCycle Python package. Before we continue, however, we will briefly explain the configuration file.

**Configuration**

To be able to find all needed data and models, eWaterCycle comes with a configuration object. This configuration contains system settings for eWaterCycle (which container technology to use, where the data is located, etc.). In general these should not need to be changed by the user for a specific experiment, and ideally a user would never need to touch this configuration on a properly managed system. However, it is good to know that it is there.

You can see the default configuration on your system like so:

In [2]:
from ewatercycle import CFG
CFG

Config({'container_engine': 'singularity',
        'esmvaltool_config': None,
        'ewatercycle_config': PosixPath('/home/peter/.config/ewatercycle/ewatercycle.yaml'),
        'grdc_location': None,
        'output_dir': PosixPath('/home/peter/ewatercycle/ewatercycle/docs/examples'),
        'parameter_sets': {'lisflood_fraser': {'config': 'lisflood_fraser/settings_lat_lon-Run.xml',
                                               'directory': 'lisflood_fraser',
                                               'doi': 'N/A',
                                               'supported_model_versions': {'20.10'},
                                               'target_model': 'lisflood'},
                           'pcrglobwb_rhinemeuse_30min': {'config': 'pcrglobwb_rhinemeuse_30min/setup_natural_test.ini',
                                                          'directory': 'pcrglobwb_rhinemeuse_30min',
                                                          'doi': 'https://doi.org/10.52

Note: a path on the local filesystem is always denoted as "dir" (short for directory), instead of folder, path, or location. The term "location" in particular can be confusing in the context of geospatial modeling.

It is also possible to store and load custom configuration files. For more information, see [system setup](https://ewatercycle.readthedocs.io/en/latest/system_setup.html#configure-ewatercycle).

## Parameter sets

Parameter sets are an essential part of many hydrological models, and for the eWaterCycle package as well.

In [3]:
import ewatercycle.parameter_sets

The default [system setup](https://ewatercycle.readthedocs.io/en/latest/system_setup.html#download-example-parameter-sets) includes a number of example parameter sets that can be used directly. System administrators can also add parameter sets that are globally available to all users. In the future, we're hoping to add functionality to fetch new parameter sets using a DOI as well.

To see the available parameter sets:

In [4]:
ewatercycle.parameter_sets.available_parameter_sets()

('lisflood_fraser', 'pcrglobwb_rhinemeuse_30min', 'wflow_rhine_sbm_nc')

Since most parameter sets are model specific, you can filter the results as well:

In [5]:
ewatercycle.parameter_sets.available_parameter_sets(target_model='wflow')

('wflow_rhine_sbm_nc',)
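
The filter above can be pictured as a lookup over the `parameter_sets` entry of the configuration. The following is a minimal sketch under that assumption, not the library's implementation: `PARAMETER_SETS` and `filter_parameter_sets` are hypothetical names, and the `target_model` of `pcrglobwb_rhinemeuse_30min` is assumed to be `pcrglobwb`.

```python
# Hypothetical registry that mirrors the 'parameter_sets' entry of the
# configuration; only the field needed for filtering is kept.
PARAMETER_SETS = {
    "lisflood_fraser": {"target_model": "lisflood"},
    "pcrglobwb_rhinemeuse_30min": {"target_model": "pcrglobwb"},  # assumed
    "wflow_rhine_sbm_nc": {"target_model": "wflow"},
}


def filter_parameter_sets(registry, target_model=None):
    """Return parameter set names, optionally restricted to one model."""
    return tuple(
        name
        for name, info in registry.items()
        if target_model is None or info["target_model"] == target_model
    )


print(filter_parameter_sets(PARAMETER_SETS, target_model="wflow"))
# ('wflow_rhine_sbm_nc',)
```

Calling the helper without `target_model` returns all three names, in the order they were registered.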

Once you have found a suitable parameter set, you can load it and see some more details:

In [6]:
parameters = ewatercycle.parameter_sets.get_parameter_set('wflow_rhine_sbm_nc')
print(parameters)

Parameter set
-------------
name=wflow_rhine_sbm_nc
directory=/home/peter/ewatercycle/ewatercycle/docs/examples/parameter-sets/wflow_rhine_sbm_nc
config=/home/peter/ewatercycle/ewatercycle/docs/examples/parameter-sets/wflow_rhine_sbm_nc/wflow_sbm_NC.ini
doi=N/A
target_model=wflow
supported_model_versions={'2020.1.1'}


You can also access individual attributes of the parameter set:

In [7]:
parameters.supported_model_versions

{'2020.1.1'}

Should you wish to configure your own parameter set (e.g. for PCRGlobWB in this case), this is also possible:

In [8]:
custom_parameter_set = ewatercycle.parameter_sets.ParameterSet(
    name="custom_parameter_set",
    directory="/home/peter/ewatercycle/ewatercycle/docs/examples/parameter-sets/pcrglobwb_rhinemeuse_30min",
    config="/home/peter/ewatercycle/ewatercycle/docs/examples/parameter-sets/pcrglobwb_rhinemeuse_30min/setup_natural_test.ini",
    target_model="pcrglobwb",
    doi="https://doi.org/10.5281/zenodo.1045339",
    supported_model_versions={'setters'}
)

<div class="alert alert-info">

As you can see, an eWaterCycle parameter set is defined fully by a directory and a configuration file. The configuration file typically informs the model about the structure of the parameter set (e.g. "what is the filename of the land use data"). It is possible to change these settings later, when [setting up the model](#Models).

</div>

## Forcing data

eWaterCycle can load or generate forcing data for a model using the `forcing` module. 

In [2]:
import ewatercycle.forcing

### Existing forcing from external source

We first show how existing forcing data can be loaded with eWaterCycle. The wflow example parameter set already includes forcing data that was generated manually by the scientists at Deltares.

In [10]:
forcing = ewatercycle.forcing.load_foreign(
    directory = parameters.directory,
    target_model = "wflow",
    start_time = '1991-01-01T00:00:00Z',
    end_time = '1991-12-31T00:00:00Z',
    shape = None,
    forcing_info = dict(
        # Additional information about the external forcing data needed for the model configuration
        netcdfinput = "inmaps.nc",
        Precipitation = "/P",
        EvapoTranspiration = "/PET",
        Temperature = "/TEMP"
    )
)
print(forcing)

Forcing data for Wflow
----------------------
Directory: /home/peter/ewatercycle/ewatercycle/docs/examples/parameter-sets/wflow_rhine_sbm_nc
Start time: 1991-01-01T00:00:00Z
End time: 1991-12-31T00:00:00Z
Shapefile: None
Additional information for model config:
  - netcdfinput: inmaps.nc
  - Precipitation: /P
  - Temperature: /TEMP
  - EvapoTranspiration: /PET
  - Inflow: None


As you can see, the forcing consists of a generic part which is the same for all eWaterCycle models, and a model-specific part (`forcing_info`). If you're familiar with wflow, you might recognize that the model-specific settings map directly to wflow configuration settings. 
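
The split between a generic part and a model-specific part can be sketched with a small data class. This is an illustration only, not how eWaterCycle stores forcing: `ForcingSketch`, `parse_time` and `n_days` are hypothetical names, and the timestamps are assumed to always use the `YYYY-MM-DDTHH:MM:SSZ` form shown above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed fixed timestamp layout


def parse_time(value: str) -> datetime:
    """Parse a UTC timestamp such as '1991-01-01T00:00:00Z'."""
    return datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)


@dataclass
class ForcingSketch:
    # Generic part: the same fields for every model
    directory: str
    start_time: str
    end_time: str
    shape: Optional[str] = None
    # Model-specific part: free-form settings passed on to the model config
    forcing_info: dict = field(default_factory=dict)

    def __post_init__(self):
        if parse_time(self.end_time) < parse_time(self.start_time):
            raise ValueError("end_time must not be before start_time")

    @property
    def n_days(self) -> int:
        """Number of calendar days covered, counting both end points."""
        delta = parse_time(self.end_time) - parse_time(self.start_time)
        return delta.days + 1


sketch = ForcingSketch(
    directory="parameter-sets/wflow_rhine_sbm_nc",
    start_time="1991-01-01T00:00:00Z",
    end_time="1991-12-31T00:00:00Z",
    forcing_info={"netcdfinput": "inmaps.nc", "Precipitation": "/P"},
)
print(sketch.n_days)  # 365
```

Keeping the model-specific settings in a plain dictionary means the generic fields can be validated in one place, whichever model the forcing is meant for.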

### Generating forcing data

In most cases, you will not have access to tailor-made forcing data, and manually pre-processing existing datasets can be quite a pain. eWaterCycle includes a forcing generator that can do all the required steps to go from the available datasets (ERA5, ERA-Interim, etc) to whatever format the models require. This is done through [ESMValTool recipes](https://docs.esmvaltool.org/en/latest/recipes/recipe_hydrology.html). For some models (e.g. lisflood) additional computations are done, as some steps require data and/or code that is not available to ESMValTool.

Apart from some standard parameters (start time, datasets, etc.), the forcing generator sometimes requires additional model-specific options. For our wflow example case, we need to pass the DEM file to the ESMValTool recipe as well. All model-specific options are listed in the [API documentation](https://ewatercycle.readthedocs.io/en/latest/apidocs/ewatercycle.forcing.html#ewatercycle.forcing.generate).

<div class="alert alert-info">
<p>ESMValTool configuration</p>
    
<p>As eWaterCycle relies on ESMValTool for processing forcing data, configuration for forcing is mostly deferred to the ESMValTool configuration file. Which ESMValTool configuration file to use can be specified in the <a href=https://ewatercycle.readthedocs.io/en/latest/system_setup.html#configure-esmvaltool>system setup</a>.</p>
</div>



In [3]:
forcing = ewatercycle.forcing.generate(
    target_model='wflow', 
    dataset='ERA5',
    start_time="1990-01-01T00:00:00Z",
    end_time="1990-01-31T00:00:00Z",
    shape='./examples/data/Rhine/Rhine.shp',
    model_specific_options = {
        'dem_file' : './examples/wflow_rhine_sbm_nc/staticmaps/wflow_dem.map',
    }
)
print(forcing)

Forcing data for Wflow
----------------------
Directory: /home/sarah/GitHub/ewatercycle/docs/examples/recipe_wflow_20210719_145723/work/wflow_daily/script
Start time: 1990-01-01T00:00:00Z
End time: 1990-01-31T00:00:00Z
Shapefile: /home/sarah/GitHub/ewatercycle/docs/examples/data/Rhine/Rhine.shp
Additional information for model config:
  - netcdfinput: wflow_ERA5_Rhine_1990_1990.nc
  - Precipitation: /pr
  - Temperature: /tas
  - EvapoTranspiration: /pet
  - Inflow: None


Generated forcing is automatically saved to the ESMValTool output directory. A `yaml` file is stored there as well, such that you can easily reload the forcing later without having to generate it again.

`ewatercycle_forcing.yaml`:

```yaml
!WflowForcing
start_time: '1990-01-01T00:00:00Z'
end_time: '1990-01-31T00:00:00Z'
shape:
netcdfinput: wflow_ERA5_Rhine_1990_1990.nc
Precipitation: /pr
EvapoTranspiration: /pet
Temperature: /tas
Inflow:
```

In [None]:
reloaded_forcing = ewatercycle.forcing.load(
    directory='/path/to/forcing/output/'
)

## Models

### Creating, setting up, and initializing a model instance

Now that we have a forcing and a parameter set, we can create a model instance. When creating an instance we have to select the version of the model to use, a parameter set, and a forcing; these are combined into a single instance object. The instance can then be inspected for available parameters and their default values, set up (which creates the configuration file, a work_dir containing the input files, and the container), and initialized to prepare the model for running.

The way models are created, set up, and initialized matches PyMT as much as possible. There is currently no 'run' method, as this convenience method makes it harder for users to build a more advanced use case from a simple example.
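
Without a 'run' method, advancing a model comes down to a loop around `update`. The sketch below shows that pattern with a stand-in class, so it runs without any container or model. `StandInModel` and `run_until_end` are hypothetical names, and the halving "storage" is a toy process rather than hydrology.

```python
class StandInModel:
    """Hypothetical stand-in with the same update/get-value rhythm.

    This is NOT an eWaterCycle model; it only mimics the call pattern.
    """

    def __init__(self, start_time, end_time, time_step=1.0):
        self.time = start_time
        self.end_time = end_time
        self.time_step = time_step
        self.storage = 10.0

    def update(self):
        # Toy process: the storage halves every time step
        self.storage *= 0.5
        self.time += self.time_step

    def get_value(self, name):
        if name != "storage":
            raise KeyError(name)
        return self.storage


def run_until_end(model, variable):
    """Advance the model step by step and record one variable."""
    series = []
    while model.time < model.end_time:
        model.update()
        series.append((model.time, model.get_value(variable)))
    return series


print(run_until_end(StandInModel(0.0, 3.0), "storage"))
# [(1.0, 5.0), (2.0, 2.5), (3.0, 1.25)]
```

Writing the loop out by hand keeps every step visible, which makes it easy to add extra work between updates, such as setting a variable or stopping early.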

### Model versions

eWaterCycle supports a number of models, and new ones can be added using a straightforward process. Each model is represented by a Python class (e.g. `ewatercycle.models.Wflow`). Several versions of a model may be available. To help with reproducibility, the version of a model must always be specified when creating a model instance. A class attribute (`available_versions`) is available to retrieve the list of model versions.
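
One way to picture the mandatory version is a class that lists its versions and rejects anything else at construction time. This is a sketch under that assumption, not the eWaterCycle implementation: `ModelSketch` is a hypothetical class, and the version strings are only example values.

```python
class ModelSketch:
    """Hypothetical model class that insists on a known version."""

    # Class-level attribute, so it can be read without creating an instance
    available_versions = ("2019.1", "2020.1")

    def __init__(self, version):
        if version not in self.available_versions:
            raise ValueError(
                f"Unsupported version {version!r}; "
                f"choose one of {self.available_versions}"
            )
        self.version = version


print(ModelSketch.available_versions)  # ('2019.1', '2020.1')
model = ModelSketch(version="2020.1")
print(model.version)  # 2020.1
```

Because the version has no default value, every script records which model version it was written for.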


In [None]:
import ewatercycle.models

# Wflow is a class inside the ewatercycle.models module, so import the module
ewatercycle.models.Wflow.available_versions
# ["2019.1", "2020.1"]

In [None]:
import ewatercycle.models

#A parameter set and forcing object must be loaded or generated before instantiating a model.

#Creates a model instance from a model version, parameter_set, and forcing
#version = mandatory
#parameter set = optional (e.g. MARRMOT does not need one)
#forcing = mandatory for now. At some point we may want to support feeding forcing to a model
#while running it using set_variable('temperature') and such.
#`parameters` is the parameter set loaded in the "Parameter sets" section above.
model_instance = ewatercycle.models.Wflow(version='2019.1', parameter_set=parameters, forcing=forcing)

#using the parameters property all parameter defaults can be obtained.
#this is possibly a subset of everything that can be configured in the config file of model, and up to the
#creator of the model class to implement. For non-science settings such as logging settings, or file names
# for all parameter files it does not always make sense to expose these.
model_instance.parameters
#soil_depth: 9 (defaults to value in parameter set e.g. in a config template)
#start_time="2021-05-07T13:32:00Z" (defaults to start of forcing)
#end_time="2021-05-07T13:32:00Z" (defaults to end of forcing)
#not all parameter files can be set, but for some it may make sense.
#land_mask='some/land/mask/in/the/parameter_set.dem'

#the setup function does the following:
#- Creates a directory which serves as the current working directory for the model instance
#- Creates a configuration file in this working directory based on the settings
#- Creates a container instance for the exact version of the model requested
#- Makes the forcing, parameter set and working directory available to the container using mounts.
#- If a model cannot cope with forcing and parameter set outside the working directory, they are copied
#  to the working directory instead.
#- Input is mounted read-only, the working directory is mounted read-write.
#- Setup will complain about incompatible model version, parameter_set, and forcing.
cfg_file, work_dir = model_instance.setup(
    land_mask='/some/land/mask.dem', # if outside of mounts, add a mount, or copy into working dir, or :'(
    soil_depth=9,
    start_time="2021-05-07T13:32:00Z",
    end_time="2021-05-07T13:32:00Z"
)

#After setup but before initialize everything is good to go, but nothing has been run yet. This is
#an opportunity to inspect the generated configuration file, and make any changes manually that could not be
#done through the setup method. Splitting these also makes it easier to run initialize in parallel in case a
#lot of models are created simultaneously (e.g. when calibrating a model)

# To modify, open it in an editor and save
print(cfg_file)

#This function will initialize the model using the files created above. For some models this can take some time.
model_instance.initialize(cfg_file)
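
The first two setup steps (creating a working directory and writing a configuration file into it) can be sketched with the standard library alone. Everything here is an assumption made for illustration: `setup_sketch`, the `settings` section, the `wflow_<timestamp>` directory name and the `model.ini` file name are hypothetical, and the container and mount steps are left out entirely.

```python
import configparser
import tempfile
from datetime import datetime
from pathlib import Path


def setup_sketch(base_dir, template, **overrides):
    """Create a working directory and write a config file into it.

    `template` holds default settings; keyword arguments override them.
    Returns (cfg_file, work_dir) as strings.
    """
    # A timestamped name keeps separate setups apart
    work_dir = Path(base_dir) / f"wflow_{datetime.now():%Y%m%d_%H%M%S}"
    work_dir.mkdir(parents=True)

    config = configparser.ConfigParser()
    config.read_dict(template)
    for key, value in overrides.items():
        config["settings"][key] = str(value)

    cfg_file = work_dir / "model.ini"
    with cfg_file.open("w") as handle:
        config.write(handle)
    return str(cfg_file), str(work_dir)


with tempfile.TemporaryDirectory() as base_dir:
    cfg_file, work_dir = setup_sketch(
        base_dir,
        {"settings": {"soil_depth": 5}},  # defaults, e.g. from a parameter set
        soil_depth=9,
        start_time="2021-05-07T13:32:00Z",
    )
    # Read the file back to inspect what was written
    written = configparser.ConfigParser()
    written.read(cfg_file)
    settings = dict(written["settings"])

print(settings["soil_depth"])  # 9
```

Returning the path of the configuration file, rather than its contents, leaves room to inspect or edit the file before anything reads it.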

### Running a model and getting output

Once initialized, a model instance can be used by calling functions for running a single timestep (`update`), setting variables, and getting variables. Besides the rather low-level BMI functions, several convenience functions are also available. These return objects that make sense for the type of values returned, such as a pandas DataFrame or an xarray DataArray or Dataset.

#### TODO Merge these cells below
In the loop, only use get_value_at_coords to keep track of a single outlet point.
At the end, show get_value_as_xarray to show you can also get the whole field

In [None]:
import xarray

#example storing all output of a certain field.
output = []
while model_instance.time < model_instance.end_time:

    # Update the model (takes a few seconds per timestep)
    model_instance.update()

    #store entire discharge field
    discharge = model_instance.get_value_as_xarray('Discharge')
    output.append(discharge)

    # Show progress
    print(model_instance.time, end="\r")  # "\r" clears the output before printing the next timestamp

result = xarray.merge(output)

result
#xarray with full output of discharge field

In [None]:
#example storing a timeseries for a single location of a certain field

#some location of interest within the model
output_latitude = 55.4
output_longitude = 20.0

# Collect timestamps and discharge values in plain lists; the
# "Observations" section below turns them into a DataFrame.
timestamps = []
model_output = []
while model_instance.time < model_instance.end_time:

    # Update the model (takes a few seconds per timestep)
    model_instance.update()

    # Track discharge at station location
    discharge = model_instance.get_value_at_location('Discharge', latitude=output_latitude, longitude=output_longitude, method='nearest')
    timestamps.append(model_instance.time)
    model_output.append(discharge)

    # Show progress
    print(model_instance.time, end="\r")  # "\r" clears the output before printing the next timestamp

model_output
#list of all discharge values over time for a single location

## Observations

In [None]:
#Read GRDC data

import ewatercycle.observation.grdc

grdc_station_id = 4147380

#This function automatically fetches the location of the GRDC data from the configuration file
#Start and end dates are fetched from the model instance
observations = ewatercycle.observation.grdc.get_grdc_data(
    station_id=grdc_station_id,
    start_date=model_instance.start_time.date(),
    end_date=model_instance.end_time.date(),
)
observations
#xarray containing grdc data

In [None]:
#Combine simulated and observed discharge into a single dataframe

import pandas
 
simulated_discharge_df = pandas.DataFrame(
    {'simulation': model_output}, index=timestamps
)
observations_df = observations.streamflow.to_dataframe().rename(
    columns={'streamflow': 'observation'}
)
discharge = simulated_discharge_df.join(observations_df)
discharge

#table with simulated and observed discharge

## Analysis
Once a model has run, we can analyse the result. For this example we will assume a DataFrame was created with values over time for a certain location for which GRDC station data is available.

In [None]:
#Plot hydrograph

import ewatercycle.analysis

#todo: also add forcing to this hydrograph in some smart way
ewatercycle.analysis.hydrograph(
    discharge=discharge,
    reference='observation',
)

#nice hydrograph

In [None]:
#finalize the model.
#as a side effect also destroys the container
model_instance.finalize()