# Part 3: Example application

By now, you should have looked through [Part 1](IntroductionToMetric.ipynb) and [Part 2](IntroductionToResidual.ipynb) of the introductory notebook series. These introduced the umami `Metric` and `Residual` classes. 

## Scope

In this application we will use umami alongside the [terrainbento](https://terrainbento.readthedocs.io/en/latest/) package. Terrainbento will be used to define a landscape evolution model, the details of which will be defined below. 

We will define a "synthetic truth" model evaluation with a specific set of input parameters, and then do a grid search in which we let two of those parameters vary. In this way we will explore which statistics for model-data comparison do best at identifying the "true" parameters. 

If you have comments or questions about the notebooks, the best place to get help is through [GitHub Issues](https://github.com/TerrainBento/umami/issues).

To begin, we import necessary modules. 

In [None]:
import warnings
warnings.filterwarnings('ignore')

from io import StringIO
from itertools import product

import numpy as np

import pandas as pd

import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

from plotnine import *

import holoviews as hv
hv.notebook_extension('matplotlib')

from landlab import imshow_grid
from terrainbento import Basic
from umami import Metric, Residual

## Step 1: Define the truth model

We begin by writing an input string that specifies the terrainbento model. We will use the simplest terrainbento model, called Basic. 

It evolves topography using stream power and linear diffusion and has the following governing equation:

$\frac{\partial \eta}{\partial t} = -KQ^{1/2}S + D\nabla^2 \eta$

where $K$ and $D$ are parameters, $Q$ is discharge, $S$ is local slope (positive downward), and $\eta$ is the topography. See the [model Basic documentation](https://terrainbento.readthedocs.io/en/latest/source/terrainbento.derived_models.model_basic.html) for additional information. 
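
To make the two terms concrete, here is a minimal NumPy sketch that evaluates the right-hand side of this equation on the interior nodes of a one-dimensional profile. It is an illustration only and not how terrainbento solves the equation; the function name `elevation_rate`, the 1D setup, and the sample values are assumptions made for this example.

```python
import numpy as np


def elevation_rate(eta, Q, dx, K, D):
    """Evaluate -K * Q**0.5 * S + D * d2(eta)/dx2 on interior nodes.

    Hypothetical helper for illustration: a 1D profile whose outlet is
    at index 0, so "downslope" points toward lower indices.
    """
    eta = np.asarray(eta, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Local slope, positive downward (toward the outlet at index 0).
    S = (eta[1:-1] - eta[:-2]) / dx
    # Centered second difference approximates the Laplacian in 1D.
    laplacian = (eta[2:] - 2.0 * eta[1:-1] + eta[:-2]) / dx**2
    return -K * np.sqrt(Q[1:-1]) * S + D * laplacian


# A uniformly sloping profile: the diffusion term vanishes, so only the
# stream power term lowers the surface.
x = np.arange(0.0, 500.0, 100.0)   # node positions (m)
eta = 0.01 * x                     # constant slope of 0.01
Q = np.full_like(x, 1.0e4)         # uniform discharge (made-up value)
rate = elevation_rate(eta, Q, dx=100.0, K=0.0005, D=0.1)
```

On a curved profile the Laplacian term would be nonzero, smoothing convex and concave parts of the surface.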

In this input file we also specify that the model runs with time steps of 500 yr on a grid of shape (50, 80) with a cell spacing of 100 m. The initial condition sets all nodes to an elevation of 100 m, with random noise added to the core nodes. During the run, the boundary handler lowers node number 40 at a constant rate, for a total drop of 100 m over the course of the simulation. 
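
The numbers in this paragraph can be checked with a few lines of arithmetic. The sketch below uses the truth run's duration of $10^4$ yr, which is set later in this notebook; the variable names are chosen for this example only.

```python
step = 500.0          # clock step (yr)
duration = 1.0e4      # duration of the truth run (yr)
total_drop = 100.0    # total lowering of node 40 (m)

n_steps = duration / step               # number of model time steps
lowering_rate = total_drop / duration   # m/yr, magnitude only
drop_per_step = lowering_rate * step    # m lowered in each step

n_nodes = 50 * 80                             # nodes in a (50, 80) raster grid
extent_m = ((50 - 1) * 100, (80 - 1) * 100)   # (rows, columns) extent in m
```

Note that the input file writes the rate with a leading minus sign, so the value passed to the boundary handler is negative.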

Note that a few places in the input file have curly braces around a name. These are as follows:
* Two input parameters, `{duration}` and `{water_erodibility}`, are modified using [`str.format`](https://docs.python.org/3/library/stdtypes.html#str.format). In this way we set the values for the "truth" model run and vary the parameters in a grid search numerical experiment.
* We set the `{lowering_rate}` based on the value for duration so that 100 m of lowering occurs during the simulation duration. 
* We also format the `{name}` of output files in order to prevent Windows file permissions errors. 
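
The templating pattern described in this list can be tried on its own. The following sketch uses a shortened stand-in for the input string (the `template` text is made up for this example) and shows how `str.format` fills the placeholders and how `StringIO` turns the result into a file-like object.

```python
from io import StringIO

# A short stand-in for the full input string (illustrative only).
template = """
clock:
    stop: {duration}
water_erodibility: {water_erodibility}
"""

filled = template.format(duration=1e4, water_erodibility=0.0005)

# StringIO wraps the text in a file-like object, so code that expects an
# open file can read it without anything being written to disk.
file_like = StringIO(filled)
first_lines = file_like.read().strip().splitlines()
```

Literal braces in a template would need to be doubled (`{{` and `}}`) to survive `str.format`.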

In [None]:
spec_string = """
# Create the Clock.
clock:
    start: 0
    step: 500
    stop: {duration}

# Create the Grid
grid: 
    RasterModelGrid: 
        - [50, 80]
        - xy_spacing: 100
        - fields: 
            node: 
                topographic__elevation:
                    random:
                        where: CORE_NODE
                    constant:
                        value: 100
                        
# Set up Boundary Handlers
boundary_handlers: 
    SingleNodeBaselevelHandler: 
        outlet_id: 40
        lowering_rate: -{lowering_rate}

# Parameters that control output.
output_interval: 1e3
save_first_timestep: True
output_prefix: 
    simple_application.{name}.
fields: 
    - topographic__elevation

# Parameters that control process and rates.
water_erodibility: {water_erodibility}
m_sp: 0.5
n_sp: 1.0
regolith_transport_parameter: 0.1
"""

Next we instantiate the "truth" model and run it. 

In [None]:
truth_duration = 1e4
truth_water_erodibility = 0.0005

lowering_rate = 100 / truth_duration

truth_params = StringIO(
    spec_string.format(duration=truth_duration,
                       water_erodibility=truth_water_erodibility,
                       lowering_rate=lowering_rate,
                       name="truth"))
np.random.seed(42)
truth = Basic.from_file(truth_params)
truth.run()

The [holoviews](https://holoviews.org) package provides capabilities to visualize the model run. 

In [None]:
ds = truth.to_xarray_dataset(time_unit='years', space_unit='meters')
hvds_topo = hv.Dataset(ds.topographic__elevation)
topo = hvds_topo.to(hv.Image, ['x', 'y'],
                    label='Truth').options(interpolation='bilinear',
                                           cmap='viridis',
                                           colorbar=True)
topo

You can see that in this model a drainage basin incises into the 100 m high topography. This makes sense as we have dropped the elevation of node 40 by 100 m over the simulation. 

Before moving on, we close the xarray dataset and remove the output netcdf files. 

In [None]:
ds.close()
truth.remove_output_netcdfs()

## Step 2: Define the basis for model-data comparison

We consider six different statistics for model-data comparison, each defined in the following code block (which serves as our input file):

* z_me : the mean of `topographic__elevation`.
* z_p10 : the 10th percentile of `topographic__elevation`.
* z_wsmean : the mean of `topographic__elevation` *within* the watershed that drains to node 40.
* ksw_z : the Kolmogorov-Smirnov test statistic for `topographic__elevation` *within* the watershed that drains to node 40.
* ksw_da : the Kolmogorov-Smirnov test statistic for `drainage_area` *within* the watershed that drains to node 40.
* ksw_s : the Kolmogorov-Smirnov test statistic for `topographic__steepest_slope` *within* the watershed that drains to node 40.
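
As a rough picture of what the first two statistics measure, the sketch below compares the mean and the 10th percentile of two small elevation arrays with NumPy. The arrays are made up, and the sign convention (model value minus data value) is an assumption for this example; the umami documentation gives the exact definition.

```python
import numpy as np

# Made-up elevations (m) standing in for two model grids.
model_z = np.array([90.0, 95.0, 100.0, 105.0, 110.0])
data_z = np.array([80.0, 90.0, 100.0, 110.0, 120.0])

# Assumed convention for this sketch: residual = model value - data value.
mean_residual = model_z.mean() - data_z.mean()
p10_residual = np.percentile(model_z, 10) - np.percentile(data_z, 10)
```

Here the two means agree while the 10th percentiles do not, which hints at why several statistics are compared rather than one.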

Consider reading the API documentation for the [kstest_watershed](https://umami.readthedocs.io/en/latest/umami.calculations.residual.ks_test.html#umami.calculations.residual.kstest.kstest_watershed) calculation.
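
For intuition about the last three statistics, the sketch below computes a two-sample Kolmogorov-Smirnov statistic directly with NumPy, assuming the standard definition (the largest absolute difference between the two empirical cumulative distribution functions). The helper `ks_statistic` is hypothetical and is not umami's implementation.

```python
import numpy as np


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # The gap can only change at observed values, so check all of them.
    pooled = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    return np.max(np.abs(cdf_a - cdf_b))


same = ks_statistic([1, 2, 3], [1, 2, 3])            # identical samples
shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])   # partly overlapping
disjoint = ks_statistic([0, 1], [10, 11])            # no overlap at all
```

The statistic ranges from 0 for identical distributions to 1 for samples that do not overlap, so it is bounded regardless of the units of the field.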

In [None]:
residual_string = """
z_me:
    _func: aggregate
    method: mean
    field: topographic__elevation
z_p10:
    _func: aggregate
    method: percentile
    field: topographic__elevation
    q: 10
z_wsmean:
    _func: watershed_aggregation
    field: topographic__elevation
    method: mean
    outlet_id: 40
ksw_z:
    _func: kstest_watershed
    outlet_id: 40
    field: topographic__elevation
ksw_da:
    _func: kstest_watershed
    outlet_id: 40
    field: drainage_area
ksw_s:
    _func: kstest_watershed
    outlet_id: 40
    field: topographic__steepest_slope
"""

## Step 3: Create and run the grid search experiment

In this example, we will use a grid search to highlight how the misfit values calculated by umami vary across parameter space. 

We consider values for `duration` between $10^{3}$ and $10^{5}$ yr and values for $K$ (`water_erodibility`) between $10^{-4}$ and $10^{-2}$.

With a resolution of 10, we evaluate $10^2=100$ simulations. Feel free to change the resolution value, though note that it will impact the run time of this notebook. 

In [None]:
resolution = 10
durations = np.logspace(3, 5, num=resolution)
water_erodibilitys = np.logspace(-4, -2, num=resolution)
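
To see what this grid looks like before running any models, the following sketch builds the same two axes and enumerates the parameter pairs. The names `erodibilities`, `pairs`, `n_runs`, and `ratio` are introduced for this example only.

```python
from itertools import product

import numpy as np

resolution = 10
durations = np.logspace(3, 5, num=resolution)
erodibilities = np.logspace(-4, -2, num=resolution)

# product() yields every combination, varying the last axis fastest.
pairs = list(product(durations, erodibilities))
n_runs = len(pairs)   # resolution**2 combinations

# Neighboring values differ by a constant factor, not a constant step.
ratio = durations[1] / durations[0]   # 10 ** (2 / 9)
```

Because the run count grows with the square of `resolution`, doubling the resolution roughly quadruples the run time.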

We evaluate each pair of duration and water erodibility and save the model output as a dictionary. With the line 

    #np.random.seed(42)

commented out, each evaluation uses a different random seed. Feel free to uncomment this line to see how the results change if the *exact same* random seed is used for each model integration. 
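
The effect of reseeding can be seen in isolation with NumPy's global random state. This sketch only demonstrates the seeding behavior itself; the array names are made up for this example, and how the model draws its initial noise is not shown here.

```python
import numpy as np

np.random.seed(42)
first = np.random.rand(5)

np.random.seed(42)          # reseeding restarts the same sequence
second = np.random.rand(5)

third = np.random.rand(5)   # no reseed: the sequence simply continues
```

`first` and `second` are identical, while `third` differs, which is the difference between reseeding before every run and seeding only once.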

In [None]:
out = {}
for i, (duration,
        water_erodibility) in enumerate(product(durations,
                                                water_erodibilitys)):
    lowering_rate = 100 / duration
    test_params = StringIO(
        spec_string.format(duration=duration,
                           water_erodibility=water_erodibility,
                           lowering_rate=lowering_rate,
                           name=i))
    #np.random.seed(42)
    test = Basic.from_file(test_params)
    test.run()

    test.remove_output_netcdfs()

    residual = Residual(test.grid, truth.grid)
    residual.add_from_file(StringIO(residual_string))
    residual.calculate()

    values = {name: residual.value(name) for name in residual.names}
    out[(duration, water_erodibility)] = values

## Step 4: Compile output, inspect, and plot

Next we will convert the output into a [pandas](http://pandas.pydata.org) dataframe and inspect it. The dataframe has a two-level index, `duration` and `water_erodibility`, and six columns, one for each of the six statistics we defined above. 

In [None]:
df = pd.DataFrame.from_dict(out, orient="index")
df.index.names = ["duration", "water_erodibility"]
df.head()
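
The same construction can be tried on a small dictionary. In the sketch below the values are made up and only two of the six statistics are included; it shows how tuple keys become a two-level index that can then be used for lookups.

```python
import pandas as pd

# Made-up values keyed by (duration, water_erodibility) tuples.
toy_out = {
    (1000.0, 0.0001): {"z_me": 1.5, "ksw_z": 0.4},
    (1000.0, 0.001): {"z_me": -0.5, "ksw_z": 0.2},
    (10000.0, 0.0001): {"z_me": 0.1, "ksw_z": 0.05},
}

toy_df = pd.DataFrame.from_dict(toy_out, orient="index")
toy_df.index.names = ["duration", "water_erodibility"]

# Rows can then be looked up by the parameter pair.
row = toy_df.loc[(1000.0, 0.001)]
```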

In order to plot the results easily, we will use [plotnine](http://plotnine.readthedocs.io), which provides a [ggplot](http://ggplot2.tidyverse.org) implementation in python. We will also need to [melt](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.melt.html) the dataframe from wide format to long format. 

Inspecting the melted dataframe, we can see that it has columns for `duration`, `water_erodibility`, the name of the output variable, its value, and the associated squared residual. 

In [None]:
df_melt = df.reset_index().melt(id_vars=["duration", "water_erodibility"])
df_melt["squared_residual"] = df_melt.value**2
df_melt.head()
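
For readers new to `melt`, here is the same reshaping applied to a two-row frame with made-up values. Each statistic column is stacked into a `variable`/`value` pair, so the long frame has one row per parameter pair and statistic.

```python
import pandas as pd

# Made-up wide-format results: one row per parameter pair.
wide = pd.DataFrame({
    "duration": [1000.0, 10000.0],
    "water_erodibility": [0.001, 0.001],
    "z_me": [2.0, -3.0],
    "ksw_z": [0.5, 0.1],
})

long = wide.melt(id_vars=["duration", "water_erodibility"])
# Squaring removes the sign, so over- and under-prediction count alike.
long["squared_residual"] = long["value"] ** 2
```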

We will make two plots, the first of which plots the three Kolmogorov-Smirnov test statistics. The white dot indicates the location of the "truth". 

You can see that there is a zone of low misfit in the region of the truth parameters, but that good fits can be found elsewhere. We can also see that there is a correlation between `water_erodibility` and `duration`. 

In [None]:
p1 = (ggplot(df_melt[df_melt.variable.str.startswith("ksw")],
             aes(x="duration", y="water_erodibility",
                 fill="squared_residual")) + geom_tile() +
      geom_point(aes(x=truth_duration, y=truth_water_erodibility)) +
      scale_fill_continuous(limits=[0.001, 1], trans="log10") +
      facet_wrap("~variable") + theme_bw() + scale_x_log10() +
      scale_y_log10() + coord_equal())
print(p1)

Finally we plot the three statistics that relate to the topographic elevation. 

In [None]:
p2 = (
    ggplot(df_melt[df_melt.variable.str.startswith("z")],
           aes(x="duration", y="water_erodibility", fill="squared_residual")) +
    geom_tile() + scale_fill_continuous(limits=[0.001, 1000], trans="log10") +
    geom_point(aes(x=truth_duration, y=truth_water_erodibility)) +
    facet_wrap("~variable") + theme_bw() + scale_x_log10() + scale_y_log10() +
    coord_equal())
print(p2)
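
One possible way to summarize such panels numerically, which is not part of the analysis above, is to find the parameter pair with the smallest squared residual for each statistic. The sketch below does this on made-up numbers for a 2 x 2 grid; the frame `toy` and its values are purely illustrative.

```python
import pandas as pd

# Made-up long-format results for two statistics on a 2 x 2 grid.
toy = pd.DataFrame({
    "duration": [1e3, 1e3, 1e4, 1e4] * 2,
    "water_erodibility": [1e-4, 1e-3, 1e-4, 1e-3] * 2,
    "variable": ["ksw_z"] * 4 + ["z_me"] * 4,
    "squared_residual": [0.30, 0.20, 0.01, 0.05, 9.0, 0.5, 4.0, 0.2],
})

# Row label of the smallest squared residual within each statistic.
best_rows = toy.groupby("variable")["squared_residual"].idxmin()
best = toy.loc[best_rows].set_index("variable")
```

Comparing each row of `best` with the truth parameters would indicate which statistics recover them, keeping in mind that a single minimum hides how broad the low-misfit zone is.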

## Next steps

The next step is the final notebook in the four part introductory series: [Part 4: Application using the Discretized Misfit calculation](DiscretizedMisfit.ipynb).