![ewatercycle logo](https://raw.githubusercontent.com/eWaterCycle/ewatercycle/main/docs/examples/logo.png)

# PCRGlobWB example use case

This example shows how the [PCRGlobWB model](https://globalhydrology.nl/research/models/pcr-globwb-2-0/) can be used within the eWaterCycle system. It is assumed you have already seen [this tutorial notebook](../../example_model_run_HBV.ipynb) explaining how to run the simple HBV model for the River Leven at Newby Bridge. 

The PCRGlobWB model is an example of a distributed model, where fluxes and stores in the water balance are calculated for grid cells (often also called pixels). This requires both the forcing data and any parameters to be spatially distributed as well. Depending on the complexity of the model, these datasets can take up a lot of memory.

Here we will run PCRGlobWB for Great Britain and extract discharge data at the location of the River Leven again, to compare with the HBV model run. We will also demonstrate how to interact with the state of the model during runtime, showcasing the benefit of using the Basic Model Interface (BMI) when building experiments with models.

In [None]:
# This cell is only used to suppress some distracting output messages
import warnings

warnings.filterwarnings("ignore", category=UserWarning)

In [None]:
import matplotlib.pyplot as plt
from cartopy import crs
from cartopy import feature as cfeature
from rich import print
import pandas as pd
import xarray as xr
from pathlib import Path
from datetime import datetime
from ipywidgets import IntProgress
from IPython.display import display
import fiona
import shapely.geometry
from pyproj import Geod

import ewatercycle.forcing
import ewatercycle.models
import ewatercycle.parameter_sets

In [None]:
station_latitude = 54.26935849558577  # Newby Bridge location from Google Maps
station_longitude = -2.9710855713537745

In [None]:
camelsgb_id = "camelsgb_73010"
forcing_path = Path.home() / "forcing" / camelsgb_id

prepared_forcing_path_caravan_central = forcing_path / "caravan"
# forcing_path = "forcing/camelsgb_73010/caravan"
shapeFile = prepared_forcing_path_caravan_central / f"{camelsgb_id}.shp"

pcr_glob_directory = Path("/data/shared/parameter-sets/pcrglobwb_global")
prepared_PCRGlob_forcing = Path("/data/datasets/egu/forcing/UK/work/diagnostic/script")

## Loading a parameter set

For this example we have prepared and hosted a global parameter set made by Utrecht University. For each model run, a "clone map" needs to be specified to delineate the region of interest. The config file has many options, one of which is the location of this clone map.

Note that this is very specific to PCRGlobWB. For complex (and legacy) models like PCRGlobWB, one needs to know quite detailed information about the model before being able to run it. However, using eWaterCycle does reduce the time needed for setting up the model and getting it to run.

In [None]:
parameter_set = ewatercycle.parameter_sets.ParameterSet(
    name="custom_parameter_set",
    directory=pcr_glob_directory,
    config="./pcrglobwb_uk_05min.ini",
    target_model="pcrglobwb",
    supported_model_versions={"setters"},
)

In [None]:
print(parameter_set)

## Load forcing data

For this example case, the forcing is generated in [this separate notebook](generate_forcing.ipynb). This is common practice when generating forcing takes considerable (CPU, memory, disk) resources.

In the cell below, we load the pre-generated forcing. Note that, in contrast with HBV, PCRGlobWB only needs temperature and precipitation as forcing inputs. HBV also needs potential evaporation, whereas PCRGlobWB calculates potential and actual evaporation as part of its update step.

In [None]:
forcing = ewatercycle.forcing.sources["PCRGlobWBForcing"].load(
    directory=prepared_PCRGlob_forcing,
)

print(forcing)

## Setting up the model

Note that the model version and the parameter set version should be compatible.

In [None]:
pcrglob = ewatercycle.models.PCRGlobWB(
    parameter_set=parameter_set,
    forcing=forcing
)

print(pcrglob)

In [None]:
pcrglob.version

eWaterCycle exposes a selected set of configurable parameters. These can be modified in the `setup()` method.

In [None]:
print(pcrglob.parameters)

Calling `setup()` will start up the model container. Be careful with calling it multiple times!

In [None]:
cfg_file, cfg_dir = pcrglob.setup(
    # end_time="1997-08-31T00:00:00Z",
    end_time="2000-08-31T00:00:00Z",  # takes about 22 minutes when alone on server
    max_spinups_in_years=0
)

cfg_file, cfg_dir

In [None]:
print(pcrglob.parameters)

# Copy the parameters into a dict so they can be looked up by name
dates = dict(pcrglob.parameters)

# Convert ISO 8601 strings to datetime objects
start_time = datetime.strptime(dates['start_time'], '%Y-%m-%dT%H:%M:%SZ')
end_time = datetime.strptime(dates['end_time'], '%Y-%m-%dT%H:%M:%SZ')

# Calculate the number of days between the two dates
delta = end_time - start_time
print(f"Number of days: {delta.days}")
number_of_days = delta.days

Note that the parameters have been changed. A new config file that incorporates these updated parameters has been generated as well. If you want to see or modify any additional model settings, you can access this file directly. When you're ready, pass the path to the config file to `initialize()`.

In [None]:
pcrglob.initialize(cfg_file)
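
For reference, the config file mentioned above is an INI file, so it can be inspected with Python's standard `configparser` module. The sketch below is only an illustration: it writes a small stand-in file to a temporary directory, and the section name, option names and values in it are hypothetical examples, not copied from the real generated file. To look at the real file, read `cfg_file` instead.

```python
import configparser
import tempfile
from pathlib import Path

# Hypothetical stand-in for the generated config file. The section and
# option names and values are illustrative assumptions only.
example_ini = """\
[globalOptions]
cloneMap = cloneMaps/example_clone.map
startTime = 1997-08-01
endTime = 2000-08-31
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    example_cfg_file = Path(tmp_dir) / "example.ini"
    example_cfg_file.write_text(example_ini)

    config = configparser.ConfigParser()
    config.optionxform = str  # keep option names case-sensitive
    config.read(example_cfg_file)

# Collect all sections and their options in a plain nested dict
settings = {section: dict(config[section]) for section in config.sections()}

for section, options in settings.items():
    print(f"[{section}]")
    for key, value in options.items():
        print(f"  {key} = {value}")
```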

We prepare a small dataframe in which we can store the discharge output from the model:

In [None]:
time = pd.date_range(pcrglob.start_time_as_isostr, pcrglob.end_time_as_isostr)
timeseries = pd.DataFrame(
    index=pd.Index(time, name="time"), columns=["PCRGlobWB: Leven"]
)

timeseries.head()

## Running the model

Simply running the model from start to end is straightforward. At each time step we can retrieve information from the model.

In [None]:
# An object to show a progress bar, since this can take a while:
f = IntProgress(min=0, max=number_of_days) # instantiate the bar
display(f) # display the bar

while pcrglob.time < pcrglob.end_time:
    pcrglob.update()

    # Track discharge at station location
    discharge_at_station = pcrglob.get_value_at_coords(
        "discharge", lat=[station_latitude], lon=[station_longitude]
    )
    time = pcrglob.time_as_isostr
    # Use .loc to write into the dataframe itself (avoids chained assignment)
    timeseries.loc[time, "PCRGlobWB: Leven"] = discharge_at_station[0]

    # Show progress
    # print(time,end='\r')  # "\r" clears the output before printing the next timestamp

    # Update progress bar
    f.value += 1


## Interacting with the model

PCRGlobWB exposes many variables. Just a few of them are shown here:

In [None]:
list(pcrglob.output_var_names)[-15:-5]

Model fields can be fetched as xarray objects (or as flat numpy arrays using `get_value()`):

In [None]:
da = pcrglob.get_value_as_xarray("discharge")
da.thin(5)  # only show every 5th value in each dim
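
To make the relation between a flat array and a gridded field more concrete, the sketch below looks up the value of the grid cell nearest to a coordinate. It is only an illustration on a made-up 3 x 4 grid and assumes row-major ordering of the flat array; the helper names are hypothetical and this is not how eWaterCycle implements `get_value_at_coords()`.

```python
def nearest_index(axis_values, target):
    """Return the index of the axis value closest to target."""
    return min(range(len(axis_values)), key=lambda i: abs(axis_values[i] - target))


def value_at_coords(flat_values, lats, lons, lat, lon):
    """Look up the value of the nearest cell in a row-major flat grid."""
    row = nearest_index(lats, lat)
    col = nearest_index(lons, lon)
    return flat_values[row * len(lons) + col]


# Made-up grid: 3 latitudes x 4 longitudes, values 0..11 in row-major order
example_lats = [55.0, 54.5, 54.0]
example_lons = [-3.5, -3.0, -2.5, -2.0]
example_flat = list(range(12))

# Nearest cell to a point close to Newby Bridge: row 1 (54.5), column 1 (-3.0)
example_value = value_at_coords(
    example_flat, example_lats, example_lons, lat=54.27, lon=-2.97
)
print(example_value)
```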

Xarray makes it very easy to plot the data. In the figure below, we add a cross at the location where we collected the discharge at every time step: the River Leven at Newby Bridge.

In [None]:
fig = plt.figure(dpi=120)
ax = fig.add_subplot(111, projection=crs.PlateCarree())
da.plot(ax=ax, cmap="GnBu")

# Overlay ocean, rivers and coastlines
ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.RIVERS, color="k")
ax.coastlines()

# Add a red cross marker at the location of the Leven River at Newby Bridge
ax.scatter(station_longitude, station_latitude, s=250, c="r", marker="x", lw=2)

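We can get (or set) the values at custom points as well, as was done for the station location during the run. The resulting discharge time series looks like this: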

In [None]:
# Extra
timeseries.plot()

Of course, we want to compare this to both the observations and the result of the HBV model.


In [None]:
camelsgb_forcing = ewatercycle.forcing.sources['CaravanForcing'].load(directory=prepared_forcing_path_caravan_central)
xr_camelsgb_forcing = xr.open_dataset(camelsgb_forcing['Q'])
xr_hbv_model_output = xr.open_dataset('~/river_discharge_data.nc')

# flux_out_Q unit conversion factor from mm/day to m3/s
conversion_mmday2m3s = 1 / (1000 * 86400)
shape = fiona.open(camelsgb_forcing.shape)
poly = [shapely.geometry.shape(p["geometry"]) for p in shape][0]
geod = Geod(ellps="WGS84")
poly_area, poly_perimeter = geod.geometry_area_perimeter(poly)
catchment_area_m2 = abs(poly_area)
print(catchment_area_m2)

xr_camelsgb_forcing["Q"] = xr_camelsgb_forcing["Q"] * conversion_mmday2m3s * catchment_area_m2
xr_hbv_model_output["Modelled_discharge"] = (
    xr_hbv_model_output["Modelled_discharge"] * conversion_mmday2m3s * catchment_area_m2
)
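
The conversion used above turns a water depth per day over the catchment into a volume per second: divide by 1000 to go from mm to m, multiply by the catchment area in m2, and divide by 86400 seconds per day. A minimal worked sketch of that arithmetic follows; the function name and the round example area are hypothetical and are not the real catchment area.

```python
SECONDS_PER_DAY = 86400
MM_PER_M = 1000


def mm_per_day_to_m3_per_s(depth_mm_per_day, area_m2):
    """Convert an area-averaged depth in mm/day to a discharge in m3/s."""
    return depth_mm_per_day / MM_PER_M * area_m2 / SECONDS_PER_DAY


# Hypothetical round number: 86.4 km2, chosen so that 1 mm/day is about 1 m3/s
example_area_m2 = 86.4e6
example_discharge = mm_per_day_to_m3_per_s(1.0, example_area_m2)
print(f"{example_discharge:.3f} m3/s")
```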


In [None]:
timeseries.plot()
xr_camelsgb_forcing["Q"].plot(label="Observed discharge")
xr_hbv_model_output['Modelled_discharge'].plot(label="Modelled discharge HBV")
plt.ylabel("Discharge [m3/s]")  # all series were converted to m3/s above
plt.xlabel("Day")
plt.legend()

This doesn't look too good for PCRGlobWB. For this small area we are only looking at about ten pixels, and most likely there is a big mismatch between the pixels that drain through the outlet in PCRGlobWB and the actual catchment. Please ask Rolf for details.
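
To get a feeling for how coarse the grid is relative to a small catchment, the sketch below estimates the area of a single grid cell at the latitude of Newby Bridge. It assumes a 5 arc-minute resolution (inferred only from the name of the config file, `pcrglobwb_uk_05min.ini`) and a spherical Earth with a mean radius of 6371 km, so the result is a rough approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def cell_area_km2(lat_center_deg, resolution_deg):
    """Approximate area of a lat-lon grid cell on a sphere, in km2."""
    half = resolution_deg / 2
    lat_south = math.radians(lat_center_deg - half)
    lat_north = math.radians(lat_center_deg + half)
    dlon = math.radians(resolution_deg)
    return EARTH_RADIUS_KM**2 * dlon * (math.sin(lat_north) - math.sin(lat_south))


assumed_resolution_deg = 5 / 60  # assumption: 5 arc-minute grid
example_cell_area = cell_area_km2(54.27, assumed_resolution_deg)
print(f"Approximate cell area: {example_cell_area:.1f} km2")
```

Dividing the catchment area computed earlier (`catchment_area_m2 / 1e6`, in km2) by this cell area gives a rough count of the cells involved.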

## Cleaning up

Models usually perform some "wrap up tasks" at the end of a model run, such as writing the last outputs to disk and releasing memory. In the case of eWaterCycle, another important teardown task is destroying the container in which the model was running. This can free up a lot of resources on your system. Therefore it is good practice to always call `finalize()` when you're done with an experiment.

In [None]:
pcrglob.finalize()
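
One way to make sure `finalize()` is always called, even when a run fails halfway, is to wrap the run in `try`/`finally`. The sketch below shows the pattern with a small stub class; `StubModel` is a hypothetical stand-in and not part of eWaterCycle.

```python
class StubModel:
    """Hypothetical stand-in for a model object; not part of eWaterCycle."""

    def __init__(self, n_steps):
        self.n_steps = n_steps
        self.step = 0
        self.finalized = False

    def update(self):
        # Simulate a failure at the third update call
        if self.step == 2:
            raise RuntimeError("simulated failure during update")
        self.step += 1

    def finalize(self):
        self.finalized = True


stub = StubModel(n_steps=5)
try:
    while stub.step < stub.n_steps:
        stub.update()
except RuntimeError as err:
    print(f"Run stopped early: {err}")
finally:
    # Runs whether or not the loop completed, so resources are released
    stub.finalize()

print(stub.finalized)
```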