### 1. PEST++ Installation

The PEST++ developers do a great job describing the installation process, so we won't cover it here.

Get the latest release of PEST++ for your operating system: https://github.com/usgs/pestpp/releases

Follow the installation instructions: https://github.com/usgs/pestpp/blob/master/documentation/cmake.md

### 2. Set up the calibration files

In order to use PEST++, we need to run through what is, perhaps, a common loop in model calibration:

1. Initialize the model with initial conditions.
2. Run the model, write the results.
3. Compare the results to observations.
4. Propose a new, hopefully better, set of parameters.
5. Run the model with the new parameters, write the results.
6. Repeat steps 3 through 5.

And so on, until we are satisfied with the performance of the model. 
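To make the loop concrete, here is a toy sketch with a one-parameter stand-in for a model and a naive least-squares update as the "propose" step. Everything in it (the `model` function, the observations, the update rule) is made up for illustration; PEST++'s iterative ensemble smoother proposes parameters very differently, using a whole ensemble of parameter sets.

```python
import numpy as np


def model(k, x):
    """Toy stand-in for a model run: output scales linearly with parameter k."""
    return k * x


def calibrate(obs, x, k0, max_iter=10, tol=1e-8):
    """Run the initialize / run / compare / propose loop for one parameter."""
    k = k0  # step 1: initial condition
    for _ in range(max_iter):
        resid = obs - model(k, x)          # steps 2 and 3: run, compare to observations
        phi = float(np.sum(resid ** 2))    # objective function: sum of squared residuals
        if phi < tol:                      # "satisfied with the performance"
            break
        # step 4: propose a new k (a least-squares step, exact for this linear toy)
        k = k + np.dot(resid, x) / np.dot(x, x)
    return k
```

For this linear toy the update is exact, so the loop stops right after the first proposal; real models rarely cooperate that nicely.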

The purpose of the SWIM calibration approach and this tutorial is to set up a system where the model and the calibration software can operate with minimal interaction. All we need SWIM to do is take the proposed parameters, use them in a model run, and write the results in a convenient format in a convenient place. All we need the calibration software to do is compare the model results to observations, determine how to tweak the parameters we've told it are 'tunable', and write a new parameter proposal in a convenient format in a convenient place. If we succeed in building such a system, and maintain independence between the calibration software and the model, we should be able to change one without changing the other. In theory, this separation makes development easier.

The `calibration` package in SWIM provides what we need in three modules:

1. `build_pp_files.py` uses several functions to build the files that control PEST++ behavior:
   - The function `build_pest` builds the main `.pst` control file, which defines the eight tunable SWIM model parameters `'aw', 'rew', 'tew', 'ndvi_alpha', 'ndvi_beta', 'mad', 'swe_alpha'`, and `'swe_beta'`. These are the three soil water holding capacity parameters (`'aw', 'rew', 'tew'`), the coefficients that control the relationship between remote-sensing-based NDVI and the model transpiration rate parameter `Kcb` (`'ndvi_alpha', 'ndvi_beta'`), the control on when soil water deficit begins to reduce the transpiration rate (`'mad'`), and the two coefficients that determine the snow melt rate (`'swe_alpha'`, `'swe_beta'`). Through the external tables it lists, the `.pst` file also holds the observation data, which we derived from SNODAS (SWE) and SSEBop (ETf), along with estimates of the noise we believe is in the data. Finally, the `.pst` file points to the Python script (`custom_forward_run.py`) that `pestpp-ies` calls to run the model; `pestpp-ies` is the PEST++ implementation of the Iterative Ensemble Smoother, the algorithm we'll use.
2. `custom_forward_run.py` has a single, simple function (`run`) that uses a system call to execute a SWIM script that runs the model, much like how we've run it ourselves previously. You will need to modify `custom_forward_run.py` to use your machine's paths.
3. `run_pest.py` is the module that we launch, and that starts PEST++ running. This will also need to be modified to use your machine's path.

The actual flow of code execution during calibration is a little confusing, because we use a Python script (`run_pest.py`) to run a command line executable (`'pestpp-ies'`), which itself then executes `custom_forward_run.py` to finally run our Python SWIM code! I know!
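For orientation, here is a hypothetical sketch of what a minimal `custom_forward_run.py` could look like. It is not the tutorial's file: the script path, the workspace path, and the `--project_ws` flag are placeholders we made up, so check the real file in your project before copying anything. The chain it sits in is `run_pest.py` → `pestpp-ies` → `python custom_forward_run.py` → SWIM.

```python
"""Hypothetical minimal custom_forward_run.py (a sketch, not the SWIM file).

Call chain: run_pest.py -> pestpp-ies -> python custom_forward_run.py -> SWIM
"""
import subprocess
import sys

# Placeholder paths: edit these for your machine.
SWIM_SCRIPT = '/path/to/swim-rs/run/run_mp.py'
PROJECT_WS = '/path/to/swim-rs/tutorials/2_Fort_Peck'


def build_command(script, project_ws):
    """Assemble the command that launches the SWIM run script."""
    # The '--project_ws' flag is an assumption for illustration only.
    return [sys.executable, script, '--project_ws', project_ws]


def run():
    """Run SWIM once; pestpp-ies invokes this file via 'python custom_forward_run.py'."""
    # check=True raises CalledProcessError if SWIM fails, so this script exits
    # with a non-zero status instead of quietly pretending the run succeeded.
    subprocess.run(build_command(SWIM_SCRIPT, PROJECT_WS), check=True)
```

As an actual forward-run script, the file would end by calling `run()`, for example under `if __name__ == '__main__':`. Keeping the command assembly in its own function makes it easy to print and test the exact command before handing control to PEST++.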


In [1]:
import json
import os
import sys

from tqdm import tqdm
import numpy as np
import pandas as pd
import geopandas as gpd

root = os.path.abspath('../../..')
sys.path.append(root)

from prep.prep_plots import preproc

from calibrate.pest_builder import PestBuilder
from swim.config import ProjectConfig

from calibrate.run_pest import run_pst

Let's instantiate our `ProjectConfig` object:

In [2]:
project = '2_Fort_Peck'
project_ws = os.path.join(root, 'tutorials', project)
if not os.path.isdir(project_ws):
    # fall back to treating the current working directory as the repository root
    root = os.path.abspath('')
    project_ws = os.path.join(root, 'tutorials', project)

config_path = os.path.join(project_ws, 'config.toml')

config = ProjectConfig()
config.read_config(config_path, project_ws)



Config: /home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/config.toml
CALIBRATION OFF
FORECAST OFF


In [3]:
# write the observed data to files within project workspace (tutorial directory)
preproc(config_path, project_ws)

Writing observations to file...


Config: /home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/config.toml
CALIBRATION OFF
FORECAST OFF

US-FPe
preproc ETf mean: 0.14
preproc SWE mean: 0.01

Prepped 1 fields input


### The PestBuilder object

We use the `PestBuilder` class to do (almost) everything we need to set up for calibration. `PestBuilder` needs our project workspace and the path to our configuration file, plus the custom Python script we need to point it to:

In [7]:
py_script = os.path.join(project_ws, 'custom_forward_run.py')

builder = PestBuilder(project_ws=project_ws, config_file=config_path, use_existing=False, python_script=py_script)



Config: /home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/config.toml
CALIBRATION OFF
FORECAST OFF


The `PestBuilder` class initializes by reading the configuration with `ProjectConfig` and the sample plot data with the `SamplePlots` object.

Next, we build the `.pst` file. The method `build_pest` will erase the existing `pest` directory if there is one! It then copies everything from `project_ws` into the `pest` directory, which is nice because from then on it only manipulates copies. The method builds the `2_Fort_Peck.pst` file, which is the only argument PEST++ needs to run the problem.

Note that during the processing of the ETf data, we wrote a table (e.g., `etf_inv_irr_ct.csv`) that simply marks the image capture dates. In `build_pest()`, the observations are given weight 1.0 on these dates and weight 0.0 on non-capture dates, so we don't use interpolated ETf values for calibration. The idea is to evaluate the objective function only on capture dates, giving the model the freedom to behave like a soil water balance model on the in-between dates.
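A minimal sketch of that weighting, assuming we already have the capture dates as a list. This is an illustration, not `build_pest`'s code, and `capture_weights` is a hypothetical helper. PEST-style objective functions multiply each residual by its weight, so a weight of 0.0 means interpolated dates contribute nothing to the fit.

```python
import numpy as np
import pandas as pd


def capture_weights(dates, capture_dates):
    """Weight 1.0 on image capture dates, 0.0 on interpolated (non-capture) dates."""
    idx = pd.DatetimeIndex(dates)
    is_capture = idx.isin(pd.DatetimeIndex(capture_dates))
    return pd.Series(np.where(is_capture, 1.0, 0.0), index=idx, name='weight')
```

For example, `capture_weights(pd.date_range('2020-06-01', periods=5), ['2020-06-02', '2020-06-05'])` gives weights of 1.0 on the two capture dates and 0.0 on the other three.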

The `custom_forward_run.py` script is used by PEST++ to launch our model. This script is important to get right: it must ensure the model writes its output to the correct location, so PEST++ can find it and use it to improve the parameter set.

Good. Now, let's build the `.pst` control file for our calibration project.

In [8]:
# Build the pest control file
# It will copy everything from the project_ws into a new 'pest' directory
builder.build_pest()

2025-01-27 17:04:32.397568 starting: opening PstFrom.log for logging
2025-01-27 17:04:32.397668 starting PstFrom process
2025-01-27 17:04:32.397711 starting: setting up dirs
2025-01-27 17:04:32.398200 starting: copying original_d '/home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck' to new_d '/home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/pest'
2025-01-27 17:04:32.680439 finished: copying original_d '/home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck' to new_d '/home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/pest' took: 0:00:00.282239
2025-01-27 17:04:32.680860 finished: setting up dirs took: 0:00:00.283149
2025-01-27 17:04:32.680966 starting: adding constant type m style parameters for file(s) ['params.csv']
2025-01-27 17:04:32.681045 starting: loading list-style /home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/pest/params.csv
2025-01-27 17:04:32.681091 starting: reading list-style file: /home/dgketchum/PycharmProjects/swi


See that all the data from `project_ws` are now copied to the 'pest' directory at `swim-rs/tutorials/2_Fort_Peck/pest`, including the data folder, the other steps of this tutorial, etc. We also see the new files that were built:

In [6]:
original_files = [f for f in sorted(os.listdir(builder.pest_dir)) if os.path.isfile(os.path.join(builder.pest_dir, f))]
original_files

['2_Fort_Peck.pst',
 '2_fort_peck.insfile_data.csv',
 '2_fort_peck.obs_data.csv',
 '2_fort_peck.par_data.csv',
 '2_fort_peck.pargp_data.csv',
 '2_fort_peck.tplfile_data.csv',
 'config.toml',
 'custom_forward_run.py',
 'etf_US-FPe.ins',
 'mult2model_info.csv',
 'p_aw_US-FPe_0_constant.csv.tpl',
 'p_mad_US-FPe_0_constant.csv.tpl',
 'p_ndvi_alpha_US-FPe_0_constant.csv.tpl',
 'p_ndvi_beta_US-FPe_0_constant.csv.tpl',
 'p_rew_US-FPe_0_constant.csv.tpl',
 'p_swe_alpha_US-FPe_0_constant.csv.tpl',
 'p_swe_beta_US-FPe_0_constant.csv.tpl',
 'p_tew_US-FPe_0_constant.csv.tpl',
 'params.csv',
 'swe_US-FPe.ins']

Check out the files. We see the PEST++ control file, several CSV files holding the parameter, parameter group, observation, and template/instruction file tables, our Python run script, and the `.tpl` and `.ins` files that tell PEST++ where to write the parameter values and how to read the observations from the model output. `params.csv` holds our default parameter values and initial estimates of the soil parameters from the soils database.
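The template mechanism is simple enough to sketch. A PEST template file starts with a `ptf` line naming a marker character, and PEST++ replaces each marked span with the current parameter value, fitted to the span's width; instruction files (`.ins`, starting with `pif`) work in the opposite direction, telling PEST++ how to read values back out of the model output. The sketch below fills a made-up one-parameter template. The real `.tpl` files written by `build_pest` are laid out differently, and `fill_template` is a hypothetical helper, not part of PEST++ or SWIM.

```python
import re


def fill_template(tpl_text, values):
    """Write parameter values into PEST-style template (.tpl) text."""
    header, _, body = tpl_text.partition('\n')
    tag, delim = header.split()
    if tag.lower() != 'ptf':
        raise ValueError('template must start with "ptf <marker>"')
    d = re.escape(delim)
    marker = re.compile(d + '([^' + d + ']+)' + d)

    def repl(match):
        width = len(match.group(0))  # the value must fit where the marker was
        name = match.group(1).strip()
        text = f'{values[name]:.6g}'
        if len(text) > width:
            raise ValueError(f'value for {name!r} does not fit in {width} characters')
        return text.rjust(width)

    return marker.sub(repl, body)
```

With `tpl = "ptf ~\nparameter,value\naw,~   aw    ~\n"`, `fill_template(tpl, {'aw': 150.0})` returns the two data lines with `150` right-aligned in the 11 characters the marker occupied, so the file's column layout is preserved.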

The PEST++ version 2 control file is succinct; it delegates the details of parameter groups, parameters, observations, model input templates, and output instructions to the external files it lists.

In [7]:
with open(builder.pst_file, 'r') as f: 
    print(f.read())

pcf version=2
* control data keyword
pestmode                                 estimation
noptmax                                 0
svdmode                                 1
maxsing                          10000000
eigthresh                           1e-06
eigwrite                                1
* parameter groups external
2_fort_peck.pargp_data.csv
* parameter data external
2_fort_peck.par_data.csv
* observation data external
2_fort_peck.obs_data.csv
* model command line
python custom_forward_run.py
* model input external
2_fort_peck.tplfile_data.csv
* model output external
2_fort_peck.insfile_data.csv
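To see how this external-file layout works, here is a minimal reader for the version-2 format, written against the file printed above: each `* section` header is followed by its lines. The helpers are hypothetical; pyemu's `Pst` class is the full-featured way to read and edit these files.

```python
def parse_pst_v2(text):
    """Split a version-2 PEST control file into {section name: [lines]}."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith('pcf'):
        raise ValueError('not a PEST control file')
    sections, current = {}, None
    for line in lines[1:]:
        line = line.strip()
        if not line:
            continue
        if line.startswith('*'):
            current = line.lstrip('* ').strip()  # e.g. 'control data keyword'
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def control_keywords(sections):
    """Read the 'control data keyword' section into a {keyword: value} dict."""
    out = {}
    for line in sections.get('control data keyword', []):
        parts = line.split(None, 1)
        out[parts[0]] = parts[1] if len(parts) > 1 else ''
    return out
```

For example, after `with open(builder.pst_file) as f: sections = parse_pst_v2(f.read())`, `sections['model command line']` holds the forward-run command and `control_keywords(sections)['noptmax']` holds the iteration count.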



Once we have the control file built, we use the `build_localizer` method to write a localizer matrix file (`loc.mat`) that matches the 'observations' from SNODAS and SSEBop to the parameters we want to tune. We tune the SWE parameters `swe_alpha` and `swe_beta` using only the SNODAS data, and the other parameters using the SSEBop ETf data. The localizer matrix specifies that for PEST++.
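Conceptually, the localizer is a 0/1 matrix. PEST++ expects its rows to be observation (or observation group) names and its columns parameter (or parameter group) names; a 0 entry stops that observation from updating that parameter. Here is a sketch of the matrix for our setup as a pandas DataFrame, with hypothetical observation group names (the real names come from the `.pst` file and may differ). `build_localizer` takes care of writing the actual `loc.mat` file.

```python
import pandas as pd


def build_localizer_frame(obs_groups, params, allowed):
    """Return an obs-group x parameter matrix: 1.0 where a group may update a parameter."""
    loc = pd.DataFrame(0.0, index=obs_groups, columns=params)
    for group, group_params in allowed.items():
        loc.loc[group, group_params] = 1.0
    return loc


params = ['aw', 'rew', 'tew', 'ndvi_alpha', 'ndvi_beta', 'mad', 'swe_alpha', 'swe_beta']
swe_params = ['swe_alpha', 'swe_beta']
et_params = [p for p in params if p not in swe_params]

# Hypothetical group names: SNODAS SWE updates only the snow parameters,
# SSEBop ETf updates everything else.
loc = build_localizer_frame(['obs_swe', 'obs_etf'], params,
                            {'obs_swe': swe_params, 'obs_etf': et_params})
print(loc)
```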

We will also do a minimal model run to ensure the `pest` folder has all the files it will need when we use it as the base for a parallelized run. We do this with the `dry_run` method.

Finally, we run `write_control_settings`, which sets the model up for many runs over three iterations.

In [8]:
builder.build_localizer()
builder.dry_run()
builder.write_control_settings(noptmax=3, reals=20)

noptmax:0, npar_adj:8, nnz_obs:4100
run():pestpp-ies /home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/pest/2_Fort_Peck.pst
pestpp-ies /home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/pest/2_Fort_Peck.pst


             pestpp-ies: a GLM iterative ensemble smoother

                   by the PEST++ development team


version: 5.2.7
binary compiled on Dec 12 2023 at 13:33:23

started at 01/27/25 13:56:21
...processing command line: ' pestpp-ies /home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/pest/2_Fort_Peck.pst'
...using serial run manager

using control file: "/home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/pest/2_Fort_Peck.pst"
in directory: "/home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/pest"
on host: "dgketchum-r"

processing control file /home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/pest/2_Fort_Peck.pst

Note: 'NOPTMAX' == 0, switching to forgiveness mode when checking inputs

noptmax = 0, resetti

The control file settings have been changed. The `noptmax` (number of optimization iterations) was increased to 3, with 20 model 'realizations' (runs) per cycle. Once we get the calibration running smoothly, increase the `reals` parameter to a larger number, perhaps 100. We can also see the addition of the `loc.mat` localizer file.
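Comparing the two printouts, `write_control_settings` changed `noptmax` from 0 to 3 and added the `ies_localizer` and `ies_num_reals` keywords. The tutorial's route to a bigger ensemble is to call `write_control_settings` again with a larger `reals`. Purely to show what that edit amounts to in the file, here is a hypothetical text-level helper; it is a sketch, not how `PestBuilder` does it.

```python
def set_control_keyword(pst_text, key, value):
    """Set (or add) a keyword in the '* control data keyword' section of a v2 .pst."""
    lines = pst_text.splitlines()
    start = lines.index('* control data keyword')
    # The section ends at the next '*' header, or at the end of the file.
    end = next((i for i in range(start + 1, len(lines)) if lines[i].startswith('*')),
               len(lines))
    new_line = f'{key:<30} {value}'
    for i in range(start + 1, end):
        if lines[i].split(None, 1)[:1] == [key]:
            lines[i] = new_line  # replace an existing keyword
            break
    else:
        lines.insert(end, new_line)  # append at the end of the section
    return '\n'.join(lines) + '\n'
```

For example, `set_control_keyword(text, 'ies_num_reals', 100)` rewrites the existing `ies_num_reals` line, and a keyword that is not yet present is appended just before the next section header.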

In [9]:
with open(builder.pst_file, 'r') as f:
    print(f.read())

pcf version=2
* control data keyword
pestmode                                 estimation
noptmax                                 3
svdmode                                 1
maxsing                          10000000
eigthresh                           1e-06
eigwrite                                1
ies_localizer                  loc.mat
ies_num_reals                  20
* parameter groups external
2_fort_peck.pargp_data.csv
* parameter data external
2_fort_peck.par_data.csv
* observation data external
2_fort_peck.obs_data.csv
* model command line
python custom_forward_run.py
* model input external
2_fort_peck.tplfile_data.csv
* model output external
2_fort_peck.insfile_data.csv



**Congratulations** if you've made it this far. There is a lot going on in this project, and staying organized while preparing to harness a powerful tool like PEST++ is a significant achievement!

Let's see if we can improve SWIM through calibration. We're using multiprocessing; feel free to change `workers` to suit your machine.

Run the calibration launcher:

In [12]:
workers = 6

run_pst(builder.pest_dir,
        'pestpp-ies',
        builder.pst_file,
        num_workers=workers,
        worker_root=builder.workers_dir,
        master_dir=builder.master_dir,
        cleanup=True,
        verbose=True)

rmtree: /home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/workers/worker_1
rmtree: /home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/workers/worker_0
master:pestpp-ies /home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/pest/2_Fort_Peck.pst /h :5005 in /home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/master


             pestpp-ies: a GLM iterative ensemble smoother

                   by the PEST++ development team


version: 5.2.7
binary compiled on Dec 12 2023 at 13:33:23

started at 01/27/25 16:42:56
...processing command line: ' pestpp-ies /home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/pest/2_Fort_Peck.pst /h :5005'
...using panther run manager in master mode using port 5005

using control file: "/home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/pest/2_Fort_Peck.pst"
in directory: "/home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/master"
on host: "dgketchum-r"

processing control file /home/dgk

Error: [('/home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/pest/2_Fort_Peck.rns', '/home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/workers/worker_1/2_Fort_Peck.rns', "[Errno 2] No such file or directory: '/home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/pest/2_Fort_Peck.rns'")]

...saved obs+noise observation ensemble (obsval + noise realizations) to  /home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/pest/2_Fort_Peck.obs+noise.csv
...using subset in lambda testing, number of realizations used in subset testing:  4
...subset how:  RANDOM
...centering on ensemble mean vector
...running initial ensemble of size 20
    running model 20 times
    starting at 01/27/25 16:42:58

    waiting for agents to appear...


PANTHER progress
   avg = average model run time in minutes
   runs(C = completed | F = failed | T = timed out)
   agents(R = running | W = waiting | U = unavailable)
--------------------------------------------------------------------------------
01/27 16:43:40 mn:0.21  runs(C3    |F0    |T0    ) agents(R1   |W0   |U0   ) 0  

If it runs, you will see a progress line like `01/05 11:35:11 mn:0.16  runs(C5   |F0    |T0    ) agents(R1   |W0   |U0   ) 0`. The `C` stands for 'completed', and if it increases, PEST++ is running. Go get a coffee.
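If you capture the master's output to a log during a long run, the run counts are easy to pull out of those progress lines; a rising `F` count is your cue to jump to the debugging tips below. Here is a small sketch based only on the line format shown above (`run_counts` is a hypothetical helper):

```python
import re

# Matches e.g. 'runs(C20   |F0    |T0    )' in a PANTHER progress line.
PROGRESS = re.compile(r'runs\(C(\d+)\s*\|F(\d+)\s*\|T(\d+)\s*\)')


def run_counts(line):
    """Return (completed, failed, timed_out) from a PANTHER progress line, or None."""
    m = PROGRESS.search(line)
    return tuple(int(g) for g in m.groups()) if m else None
```

Applied to the progress line from our run above, it returns `(3, 0, 0)`: three runs completed, none failed or timed out.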

Let's assume it didn't run.

### Debugging Tips

 - If you never saw the panther (the `PANTHER progress` output), then `pestpp-ies` was probably never executed. Make sure you can run `pestpp-ies` from the command line in any directory on your machine. You may need to point to the executable with a full path, like `/home/skywalker/software/pestpp-ies`, or a path that ends with the '.exe' extension if you are on a Windows machine. In that case, update the command above so that `run_pest.py`'s `run_pst` receives the correct executable.
 - If you never saw the panther and got a Python error traceback, read it carefully. It's tricky to get the interface to work, because we launch `run_pest.py`, which launches `pestpp-ies`, which launches `custom_forward_run.py`, which finally runs the model with `run/run_mp.py`.

   A good debugging approach is to work from the bottom up: first get `run_mp.optimize_fields` to run from arguments provided under `if __name__ == '__main__':` in `run_mp.py`. Then get `run_mp.py` to run by launching the `custom_forward_run.py` located in your 'pest' directory. Then try running `run/run_pest.py` with arguments provided under `if __name__ == '__main__':`. Rest assured, we are looking for a simpler code flow that doesn't sacrifice flexibility.

 - Try running the command `pestpp-ies 2_Fort_Peck.pst` from the 'pest' folder. This runs the model serially (one run at a time) and can rule out problems with the 'pest' folder's files and structure. If this works, the problem is likely with the `run_pst` function in `run/run_pest.py`. Double-check the paths and arguments, and try launching it from `run/run_pest.py` instead of from this notebook.
   
 - If you saw the panther, then `pestpp-ies` ran. Great, you are close. The traceback (the error messages in the output) is very informative, but the last error is likely not the one you need to track down. It's common to see something like
    ```
    thread processing instruction file raised an exception: InstructionFile error in file 'swe_US-FPe.ins' : output file'pred/pred_swe_US-FPe.np' not found
    ```

This interrupted PEST++'s execution of the realization, but likely wasn't the true cause. SWIM's prediction file `pred_swe_US-FPe.np` is missing because SWIM never completed its run, because SWIM itself hit an error. When we run `run_pst`, we launch the program in each worker's directory (one per `workers`), and each of these is a copy of the `pest` directory. Set `cleanup` to `False` above, run the code again, and look for errors in, e.g., `workers/worker_0/panther_worker.rec`.
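With `cleanup=False`, a quick sweep over the worker directories can save some clicking. This hypothetical helper greps each `workers/worker_*/panther_worker.rec` for lines mentioning errors or exceptions, and checks whether the expected prediction files exist in each worker's `pred` folder. The directory layout is taken from the output above; the keyword matching is a crude heuristic.

```python
from pathlib import Path


def scan_workers(workers_dir, pred_names=()):
    """Report error-looking lines in each panther_worker.rec and any missing pred files."""
    report = {}
    for worker in sorted(Path(workers_dir).glob('worker_*')):
        if not worker.is_dir():
            continue
        rec = worker / 'panther_worker.rec'
        errors = []
        if rec.exists():
            text = rec.read_text(errors='replace')
            errors = [ln.strip() for ln in text.splitlines()
                      if 'error' in ln.lower() or 'exception' in ln.lower()]
        missing = [name for name in pred_names if not (worker / 'pred' / name).exists()]
        report[worker.name] = {'errors': errors, 'missing_pred': missing}
    return report
```

For example, `scan_workers(builder.workers_dir, pred_names=['pred_swe_US-FPe.np'])` lists, per worker, the suspicious log lines and whether SWIM wrote its SWE prediction.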

Often, the problem is that SWIM isn't writing its output to the `pred` folder in each worker.

These are just a few ideas. As always, the key to debugging is reading the hints in the traceback and moving up the code operation chain until the problem is found. Science says 9/10 errors are due to paths not being set correctly.
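In that spirit, here is a hypothetical pre-flight check for the path problems described above: is `pestpp-ies` findable, and are the control file and forward-run script where PEST++ expects them?

```python
import os
import shutil


def check_paths(pest_dir, pst_name, exe='pestpp-ies'):
    """Return a list of common path-related setup problems (empty if none found)."""
    problems = []
    # shutil.which accepts a bare name (searched on PATH) or a full path.
    if shutil.which(exe) is None:
        problems.append(f'{exe!r} not found; pass its full path instead')
    for name in (pst_name, 'custom_forward_run.py'):
        if not os.path.isfile(os.path.join(pest_dir, name)):
            problems.append(f'missing {name} in {pest_dir}')
    return problems
```

For example, `check_paths(builder.pest_dir, os.path.basename(builder.pst_file))` should return an empty list before you launch `run_pst`; if `pestpp-ies` is not on your PATH, pass its full path as `exe`.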
 


Once we get a successful run, we see many more files in the 'pest' directory, but what we want are the calibrated parameters we'll need to run SWIM in forecast mode (i.e., a calibrated run of the model). They should be in the 'pest' directory, though in some cases they may end up in 'master', the directory where pyemu coordinated the parallel PEST++ run:

In [None]:
[f for f in sorted(os.listdir(builder.pest_dir)) if '.par.csv' in f]

...saved obs+noise observation ensemble (obsval + noise realizations) to  /home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/pest/2_Fort_Peck.obs+noise.csv
...using subset in lambda testing, number of realizations used in subset testing:  4
...subset how:  RANDOM
...centering on ensemble mean vector
...running initial ensemble of size 20
    running model 20 times
    starting at 01/27/25 13:56:37

    waiting for agents to appear...


PANTHER progress
   avg = average model run time in minutes
   runs(C = completed | F = failed | T = timed out)
   agents(R = running | W = waiting | U = unavailable)
--------------------------------------------------------------------------------
01/27 14:00:49 mn:0.21  runs(C20   |F0    |T0    ) agents(R0   |W1   |U0   ) 0   

   20 runs complete :  0 runs failed
   0.206 avg run time (min) : 4.2 run mgr time (min)
   1 agents connected


...saved initial obs ensemble to /home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/pest/2_Fo

Make sure these exist. There is a parameter file for each iteration: the initial ensemble ('0') and one for each of the three optimization iterations we specified with `noptmax`. Each file has a row for each realization and a column for each tunable parameter. This is the valuable data we will examine in the next step.
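A sketch for loading those ensembles with pandas, assuming the `<case>.<iteration>.par.csv` naming (e.g., `2_Fort_Peck.3.par.csv`) with realization names in the first column; `load_par_ensembles` is a hypothetical helper. Point it at 'master' instead if that is where your files landed.

```python
import glob
import os
import re

import pandas as pd


def load_par_ensembles(pest_dir, case):
    """Load '<case>.<iteration>.par.csv' files into {iteration: DataFrame}."""
    pattern = re.compile(re.escape(case) + r'\.(\d+)\.par\.csv$')
    ensembles = {}
    for path in glob.glob(os.path.join(pest_dir, f'{case}.*.par.csv')):
        m = pattern.search(os.path.basename(path))
        if m:
            # rows are realizations (first column), columns are parameters
            ensembles[int(m.group(1))] = pd.read_csv(path, index_col=0)
    return dict(sorted(ensembles.items()))
```

For example, `ensembles = load_par_ensembles(builder.pest_dir, '2_Fort_Peck')` followed by `ensembles[max(ensembles)].describe()` summarizes the spread of each parameter in the final iteration.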

This workflow benefits from a powerful machine; the more workers you can employ, the faster the optimization will run.