# Calibration Tutorial

## Step 3: Preparing to Calibrate

Now we'll see whether calibration can improve the model's performance.

The main calibration tool used in SWIM, PEST++, has been developed over many years by many clever and diligent developers. They've done us a huge favor by writing thorough documentation that covers the highly varied functionality of PEST++. The materials are worth checking out, whether you want a cursory look or a deep dive. Several standout resources are the following:

1. The PEST Manual, 4th ed., Doherty, J., 2002: https://www.epa.gov/sites/default/files/documents/PESTMAN.PDF. This covers PEST, the predecessor of PEST++, but does a great job explaining how we might estimate parameters given observations and a model.
2. The GMDSI tutorial notebooks. These are applications of PEST++ using the groundwater modeling software MODFLOW and the modern Python interface to PEST++, pyemu: https://github.com/gmdsi/GMDSI_notebooks
3. The PEST++ User's Manual (https://github.com/usgs/pestpp/blob/master/documentation/pestpp_users_manual.md).
4. Calibration and Uncertainty Analysis for Complex Environmental Models. Doherty, J., 2015. See https://pesthomepage.org/pest-book.


### 1. PEST++ Installation

The PEST++ developers do a great job describing the installation process, so we won't cover it here.

Get the latest release of PEST++ for your operating system: https://github.com/usgs/pestpp/releases

Follow the installation instructions: https://github.com/usgs/pestpp/blob/master/documentation/cmake.md
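Once the installation is done, it can help to confirm that the executable is visible from Python before going further. The following is a minimal sketch using only the standard library; it assumes the executable is named `pestpp-ies` (the name used later in this tutorial) and that you have added it to your `PATH`. The helper `find_executable` is a hypothetical name used only for this illustration.

```python
import shutil


def find_executable(name="pestpp-ies"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


exe_path = find_executable()
if exe_path is None:
    print("pestpp-ies was not found on PATH; check your installation.")
else:
    print(f"pestpp-ies found at: {exe_path}")
```

If the executable is not found, the calibration launch later in the tutorial will fail, so this is worth checking first.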

### 2. Set up the calibration files

To use PEST++, we need to run through a loop that is common in model calibration:

1. Initialize the model with initial conditions.
2. Run the model, write the results.
3. Compare the results to observations.
4. Propose a new, hopefully better, set of parameters.
5. Run the model with the new parameters, write the results.
6. Repeat steps 3-5.

And so on, until we are satisfied with the performance of the model. 
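To make the loop concrete, here is a toy sketch of it in plain Python. This is not how PEST++ works internally: the "model" is a straight line, the observations are synthetic, and new parameters are proposed by random perturbation, keeping only improvements. All names (`run_model`, `misfit`, `propose`) are hypothetical and exist only for this illustration.

```python
import random


def run_model(params):
    """Toy stand-in for a model: a straight line y = a * x + b."""
    return [params["a"] * x + params["b"] for x in range(10)]


def misfit(simulated, observed):
    """Sum of squared differences between simulated and observed values."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


def propose(params, rng, step=0.1):
    """Propose new parameters by adding a small random perturbation to each."""
    return {k: v + rng.gauss(0.0, step) for k, v in params.items()}


rng = random.Random(0)
observed = run_model({"a": 2.0, "b": 1.0})  # synthetic "observations"

# Steps 1-3: initialize, run the model, compare to observations
params = {"a": 0.0, "b": 0.0}
initial = misfit(run_model(params), observed)
best = initial

for _ in range(2000):
    candidate = propose(params, rng)                 # step 4: propose
    score = misfit(run_model(candidate), observed)   # steps 5 and 3: run, compare
    if score < best:                                 # keep only improvements
        params, best = candidate, score

print(f"initial misfit: {initial:.2f}, final misfit: {best:.4f}")
```

The structure is the point here: the model only needs to accept parameters and return results, and the "calibrator" only needs to compare results to observations and propose something new.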

The purpose of the SWIM calibration approach and this tutorial is to set up a system where the model and the calibration software can operate with minimal interaction. All we need the model to do is take the proposed parameters and use them in a model run, and write the results in a convenient format in a convenient place. All we need the calibration software to do is to compare the model results to observations, determine how to tweak the parameters we've told it are 'tunable', and write a new parameter proposal in a convenient format in a convenient place. 
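The idea of two programs that communicate only through files can be sketched as follows. This is an illustration of the pattern, not the actual SWIM or PEST++ file formats: the file names (`params.csv`, `results.txt`), the parameter values, and the toy calculation are all assumptions made for the example. Only the parameter names `aw` and `mad` come from SWIM.

```python
import csv
import os
import tempfile


def write_params(path, params):
    """The 'calibration software' side: write a parameter proposal to disk."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "value"])
        for name, value in params.items():
            writer.writerow([name, value])


def read_params(path):
    """The 'model' side: read the proposed parameters back in."""
    with open(path, newline="") as f:
        return {row["name"]: float(row["value"]) for row in csv.DictReader(f)}


def model_step(param_file, result_file):
    """Toy 'model run': read parameters, compute a value, write the result."""
    params = read_params(param_file)
    result = params["aw"] * params["mad"]  # arbitrary stand-in calculation
    with open(result_file, "w") as f:
        f.write(f"{result}\n")


workdir = tempfile.mkdtemp()
param_file = os.path.join(workdir, "params.csv")
result_file = os.path.join(workdir, "results.txt")

write_params(param_file, {"aw": 150.0, "mad": 0.5})  # made-up values
model_step(param_file, result_file)

with open(result_file) as f:
    result = float(f.read())
print(result)
```

Because the two sides share only files in an agreed location, neither needs to know anything about how the other is implemented.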

The `calibration` package in SWIM provides three modules that build what we need:

1. `build_pp_files.py` uses several functions to build the files that control PEST++ behavior:
   - The function `build_pest` builds the main `.pst` control file, which defines the eight tunable SWIM model parameters:
     - `'aw'`, `'rew'`, `'tew'`: three soil water holding capacity parameters.
     - `'ndvi_alpha'`, `'ndvi_beta'`: the coefficients that control the relationship between remote-sensing-based NDVI and the model transpiration rate parameter `Kcb`.
     - `'mad'`: the control on when soil water deficit begins to impact the transpiration rate.
     - `'swe_alpha'`, `'swe_beta'`: the two coefficients that determine the melting rate of snow.
   - The `.pst` file also contains the observation data, which we have derived from SNODAS (SWE) and SSEBop (ETf), along with estimates of the noise we believe is in the data.
   - Finally, the `.pst` file points to the Python file that the `pestpp-ies` command calls to run the model. `pestpp-ies` runs the PEST++ implementation of the Iterative Ensemble Smoother, the algorithm we'll use.
2. `custom_forward_run.py` has a single, simple function (`run`) that uses a system call to execute a SWIM script that runs the model, much like how we've run it ourselves previously. You will need to modify `custom_forward_run.py` to use your machine's paths.
3. `run_pest.py` is the module that we launch to start PEST++. This will also need to be modified to use your machine's paths.
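As a quick reference, the eight tunable parameters and their roles can be summarized in a small dictionary. The parameter names and groupings come from the description above; the dictionary itself (`PARAMETER_GROUPS`) is only an illustrative summary and is not part of the SWIM code.

```python
# Illustrative summary only; not a SWIM data structure.
PARAMETER_GROUPS = {
    "soil water holding capacity": ["aw", "rew", "tew"],
    "NDVI to Kcb relationship": ["ndvi_alpha", "ndvi_beta"],
    "onset of soil water deficit effects": ["mad"],
    "snow melt rate": ["swe_alpha", "swe_beta"],
}

# Flatten the groups into a single list of tunable parameter names
tunable = [name for group in PARAMETER_GROUPS.values() for name in group]
print(len(tunable), tunable)
```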

The actual flow of code execution during calibration is a little confusing, because we use a Python script (`run_pest.py`) to run a command line executable (`pestpp-ies`), which itself then executes `custom_forward_run.py` to finally run our Python SWIM code! I know!
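The last link in that chain, a Python function that launches the model as a separate process, might look roughly like the sketch below. This is not the actual contents of `custom_forward_run.py`: it assumes `subprocess.run` as the system call, and the child process is a one-line placeholder standing in for the SWIM model script, which you would replace with the real script path on your machine.

```python
import subprocess
import sys


def run():
    """Hypothetical forward run: launch a child Python process.

    The '-c' command is a placeholder for the SWIM model script.
    """
    cmd = [sys.executable, "-c", "print('model run complete')"]
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout.strip()


output = run()
print(output)
```

Using `check=True` makes a failed model run raise an error in the forward-run script, so a crash in the model is not silently ignored.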


```python
import json
import os
import sys

from tqdm import tqdm
import numpy as np
import pandas as pd
import geopandas as gpd

home = os.path.expanduser('~')
root = os.path.join(home, 'PycharmProjects', 'swim-rs')
sys.path.append(root)

from prep.prep_plots import preproc
```

Let's just build a `dict` with the paths we're going to need:

```python
# We'll refer repeatedly to our 'pest' directory for all PEST++-related work,
# so let's define it at the top level:
project_ws = os.path.join(root, 'tutorials', '2_Fort_Peck')
cal_dir = os.path.join(project_ws, 'pest')
data = os.path.join(project_ws, 'data')

# for convenience, we put all the paths we'll need in a dict
PATHS = {'prepped_input': os.path.join(data, 'prepped_input.json'),
         'input_ts_out': os.path.join(data, 'input_timeseries'),
         '_pst': 'fort_peck.pst',
         'exe_': 'pestpp-ies',
         'm_dir': os.path.join(cal_dir, 'master'),
         'p_dir': os.path.join(cal_dir, 'pest'),
         'w_dir': os.path.join(cal_dir, 'workers'),
         'obs': os.path.join(project_ws, 'obs'),
         'python_script': os.path.join(root, 'calibrate', 'custom_forward_run.py')}

# create the observations directory if it doesn't already exist
os.makedirs(PATHS['obs'], exist_ok=True)
```

```python
# write the observed data to files within the 'pest' directory
shapefile_path = os.path.join(data, 'gis', 'flux_fields.shp')
gdf = gpd.read_file(shapefile_path)
FEATURE_ID = 'field_1'

# use the following for a full station network extract:
# stations = gdf[FEATURE_ID].tolist()

# use this for just Fort Peck
stations = ['US-FPe']

preproc(stations, PATHS['input_ts_out'], project_ws)
```


```text
US-FPe
preproc ETf mean: 0.14

FileNotFoundError: [Errno 2] No such file or directory: '/home/dgketchum/PycharmProjects/swim-rs/tutorials/2_Fort_Peck/pest/obs/obs_etf_US-FPe.np'
```
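The traceback above names a file under `pest/obs`, while the earlier cell only creates an `obs` directory directly under the project workspace. One plausible reading, and it is an assumption since the internals of `preproc` aren't shown here, is that `preproc` tries to write into a `pest/obs` directory that doesn't exist yet. If that is the cause, creating the directory before calling `preproc` should help. The sketch below demonstrates the idea with a temporary directory standing in for `project_ws` and a placeholder file write standing in for `preproc`.

```python
import os
import tempfile

# Temporary stand-in for the tutorial's `project_ws`
project_ws = tempfile.mkdtemp()

# Create the directory named in the traceback before anything writes to it
obs_dir = os.path.join(project_ws, "pest", "obs")
os.makedirs(obs_dir, exist_ok=True)

# Placeholder write, standing in for what `preproc` is assumed to do
target = os.path.join(obs_dir, "obs_etf_US-FPe.np")
with open(target, "w") as f:
    f.write("placeholder\n")

print(os.path.isfile(target))
```

In the tutorial itself, the equivalent step would be calling `os.makedirs(os.path.join(cal_dir, 'obs'), exist_ok=True)` before `preproc`, again assuming the missing directory is the cause.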