# Python Glacier Evolution individual glacier calibration demo

Brandon Tober, David Rounce<br>
Carnegie Mellon University<br>
20241220<br> <br>
The objective of this notebook is to demonstrate the various calibration options of PyGEM. 
Regardless of the calibration option selected, a *<rgi_glacier_number>-modelprms_dict.json* file should be created within **~/pygem_data/Output/calibration/** which can then be used to run simulations. See [PyGEM's readthedocs](https://pygem.readthedocs.io/en/latest/calibration_options.html) for details on the various calibration options.
<br> <br>
The calibration options are each demonstrated below. Note, we recommend using the [MCMC](#MCMC-target) calibration option 
(Rounce et al. [2020a](https://www.cambridge.org/core/journals/journal-of-glaciology/article/quantifying-parameter-uncertainty-in-a-largescale-glacier-evolution-model-using-bayesian-inference-application-to-high-mountain-asia/61D8956E9A6C27CC1A5AEBFCDADC0432), [2020b](https://www.frontiersin.org/articles/10.3389/feart.2019.00331/full), [2023](https://www.science.org/doi/10.1126/science.abo1324)) as this enables the user to quantify the uncertainty associated with the model parameters in the simulations; however, it is very computationally expensive. The methods from [Huss and Hock (2015)](https://www.frontiersin.org/articles/10.3389/feart.2015.00054/full) provide a computationally cheap alternative. 

Note, in this notebook, any PyGEM Python scripts that are called will be preceded by ```!```. In Jupyter Notebook, the ```!``` character is used to execute shell commands directly from a notebook cell. When you prefix a command with ```!```, Jupyter sends it to the system's command-line shell for execution, instead of interpreting it as Python code. Python variables passed as command-line arguments are wrapped in braces (e.g., ```{arg}```) so that their values are substituted into the shell command. If executing a given PyGEM script directly from one's terminal, remove the ```!``` character and replace each ```{arg}``` with the actual value.
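
As a rough illustration of how the brace interpolation maps onto an ordinary command line, the sketch below assembles the same kind of call in plain Python. The helper name `build_calibration_cmd` is hypothetical (it is not part of PyGEM); the flag names are the ones used in the calibration cells later in this notebook.

```python
import shlex


def build_calibration_cmd(glac_no, yr0, yr1, gcm, calib_opt):
    """Assemble a run_calibration call as a list of strings (hypothetical helper)."""
    return [
        "run_calibration",
        "-rgi_glac_number", str(glac_no),
        "-ref_startyear", str(yr0),
        "-ref_endyear", str(yr1),
        "-ref_gcm_name", gcm,
        "-option_calibration", calib_opt,
    ]


cmd = build_calibration_cmd("1.16201", 2000, 2019, "ERA5", "HH2015")
# shlex.join renders the list as the single line one would type in a terminal
print(shlex.join(cmd))
```

From a standalone script, a list like `cmd` could be handed to `subprocess.run(cmd, check=True)` instead of using the ```!``` prefix.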

Also note that some useful information for any PyGEM script can be displayed by running
```!script -h```

In [None]:
### imports ###
import os, sys, glob, json
# pygem imports
import pygem.setup.config as config
# check for config
config.ensure_config()
# read the config
pygem_prms = config.read_config()   # NOTE: ensure that your root path in ~/PyGEM/config.yaml points to
                                    # the appropriate location. If any errors occur, check this first.
rootpath=pygem_prms['root']

In [None]:
# take a look how to call the run_calibration script:
!run_calibration -h

In [None]:
# specify glacier number (RGI6 id), we'll run Kaskawulsh in the Canadian Yukon
glac_no = 1.16201
# specify some variables that will remain constant for each calibration option
yr0=2000    # reference period startyear
yr1=2019    # reference period endyear
gcm='ERA5'  # reference period GCM

## HH2015
The calibration option **HH2015** follows the calibration steps from [Huss and Hock (2015)](https://www.frontiersin.org/articles/10.3389/feart.2015.00054/full). Specifically, the precipitation factor is initially adjusted between 0.8-2.0. If agreement between the observed and modeled mass balance is not reached, then the degree-day factor of snow is adjusted between 1.75-4.5 mm $d^{-1}$ $K^{-1}$. Note that the ratio of the degree-day factor of ice to snow is set to 2, so both parameters are adjusted simultaneously. Lastly, if agreement is still not achieved, then the temperature bias is adjusted.
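
To make the sequence of steps concrete, here is a minimal sketch of a sequential calibration of this kind. It is not PyGEM's implementation: `toy_mass_balance` is an invented stand-in for the mass balance model, the starting values and the temperature bias bounds are assumptions, and the ice degree-day factor is left out for brevity. Only the precipitation factor and degree-day factor ranges come from the text above.

```python
def toy_mass_balance(kp, ddfsnow, tbias):
    """Toy glacier-wide mass balance (m w.e. yr^-1); NOT PyGEM's model."""
    accumulation = 1.5 * kp                  # assumed 1.5 m w.e. yr^-1 of precipitation
    pdd = max(0.0, 400.0 + 150.0 * tbias)    # assumed positive degree-day sum (K d)
    melt = 1e-3 * ddfsnow * pdd              # ddfsnow in mm d^-1 K^-1 -> m w.e.
    return accumulation - melt


def bisect(f, lo, hi, tol=1e-6, max_iter=100):
    """Find x in [lo, hi] with f(x) ~ 0, assuming f changes sign on the interval."""
    flo = f(lo)
    if abs(flo) < tol:
        return lo
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if abs(fmid) < tol:
            return mid
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def calibrate_sequentially(mb_obs, kp_bnds=(0.8, 2.0), ddf_bnds=(1.75, 4.5),
                           tbias_bnds=(-10.0, 10.0)):
    """Adjust kp, then ddfsnow, then tbias until the toy model matches mb_obs."""
    params = {"kp": 1.0, "ddfsnow": 3.0, "tbias": 0.0}  # assumed starting values
    for name, bnds in (("kp", kp_bnds), ("ddfsnow", ddf_bnds), ("tbias", tbias_bnds)):
        def misfit(x, name=name):
            return toy_mass_balance(**{**params, name: x}) - mb_obs

        lo_val, hi_val = misfit(bnds[0]), misfit(bnds[1])
        if lo_val * hi_val <= 0:
            # the observation is reachable within the bounds: solve and stop
            params[name] = bisect(misfit, *bnds)
            return params
        # otherwise keep the bound with the smaller misfit and try the next parameter
        params[name] = bnds[0] if abs(lo_val) < abs(hi_val) else bnds[1]
    return params


params = calibrate_sequentially(mb_obs=-0.5)
print(params, toy_mass_balance(**params))
```

In this toy setup an observation of -0.5 m w.e. yr$^{-1}$ cannot be matched by the precipitation factor alone, so the factor is pinned at its lower bound and the degree-day factor is adjusted next.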

In [None]:
# call run calibration
calib_opt = 'HH2015'
!run_calibration -rgi_glac_number {glac_no} -ref_startyear {yr0} -ref_endyear {yr1} -ref_gcm_name {gcm} -option_calibration {calib_opt}

In [None]:
# check the output - the parameter dictionary output should now have an `HH2015` key
glacier_str = str(glac_no)
reg = glacier_str.split('.')[0].zfill(2)
calib_path = f"{pygem_prms['root']}/Output/calibration/{reg}/{glacier_str}-modelprms_dict.json"

with open(calib_path, 'r') as f:
    modelprms_dict = json.load(f)

print(modelprms_dict['HH2015'])

## HH2015mod
The calibration option **HH2015mod** is a modification of the calibration steps from [Huss and Hock (2015)](https://www.frontiersin.org/articles/10.3389/feart.2015.00054/full) and is used to generate the prior distributions for the MCMC methods [(Rounce et al. 2020a)](https://www.cambridge.org/core/journals/journal-of-glaciology/article/quantifying-parameter-uncertainty-in-a-largescale-glacier-evolution-model-using-bayesian-inference-application-to-high-mountain-asia/61D8956E9A6C27CC1A5AEBFCDADC0432). Since the MCMC methods use degree-day factors of snow based on previous studies, only the precipitation factor and temperature bias are calibrated. The precipitation factor varies from 0.5-3 and, if agreement between the observed and modeled mass balance is not reached, the temperature bias is then varied. Note that the limits on the precipitation factor are based on a rough estimate of the precipitation factors needed for the modeled winter mass balance of reference glaciers to match the observations.

In [None]:
# call run_calibration
calib_opt = 'HH2015mod'
!run_calibration -rgi_glac_number {glac_no} -ref_startyear {yr0} -ref_endyear {yr1} -ref_gcm_name {gcm} -option_calibration {calib_opt}

In [None]:
# check the output - the parameter dictionary output should now have an `HH2015mod` key
with open(calib_path, 'r') as f:
    modelprms_dict = json.load(f)

print(modelprms_dict['HH2015mod'])

## Emulator - applying HH2015mod
The calibration option **emulator** creates an independent emulator for each glacier that is derived by performing 100 present-day simulations based on randomly sampled model parameter sets and then fitting a Gaussian Process to these parameter-response pairs. This emulator replaces the mass balance model within the MCMC sampler; tests showed that this reduces the computational expense by two orders of magnitude.
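
The idea can be sketched with a small, dependency-free Gaussian Process regression. Everything here is illustrative: `toy_mass_balance` stands in for the real model, the parameter ranges, kernel, length scale and noise term are assumptions, and only two parameters are sampled. PyGEM's own emulator is built differently; this only shows the sample-then-fit pattern described above.

```python
import math
import random


def toy_mass_balance(kp, tbias):
    """Stand-in for the full mass balance model (m w.e. yr^-1); purely illustrative."""
    return 1.5 * kp * (1.0 - 0.1 * tbias) - 1.2 - 0.45 * tbias


def rbf(a, b, length=0.5):
    """Squared-exponential kernel between two scaled parameter vectors."""
    return math.exp(-0.5 * sum((x - y) ** 2 for x, y in zip(a, b)) / length ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


KP_BNDS, TBIAS_BNDS = (0.5, 3.0), (-2.0, 2.0)  # assumed sampling ranges


def scale(kp, tbias):
    """Map both parameters to [0, 1] so a single length scale suits them."""
    return ((kp - KP_BNDS[0]) / (KP_BNDS[1] - KP_BNDS[0]),
            (tbias - TBIAS_BNDS[0]) / (TBIAS_BNDS[1] - TBIAS_BNDS[0]))


# 1) run the "expensive" model for 100 randomly sampled parameter sets
rng = random.Random(0)
samples = [(rng.uniform(*KP_BNDS), rng.uniform(*TBIAS_BNDS)) for _ in range(100)]
X = [scale(*s) for s in samples]
y = [toy_mass_balance(*s) for s in samples]
y_mean = sum(y) / len(y)

# 2) fit the Gaussian Process: solve (K + noise * I) alpha = y - mean
noise = 1e-4
K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
     for i, a in enumerate(X)]
alpha = solve(K, [v - y_mean for v in y])


def emulate(kp, tbias):
    """GP posterior mean: a cheap replacement for toy_mass_balance."""
    x = scale(kp, tbias)
    return y_mean + sum(a * rbf(x, xi) for a, xi in zip(alpha, X))


print(emulate(1.2, 0.5), toy_mass_balance(1.2, 0.5))
```

Once fitted, each call to `emulate` costs only a weighted sum over the 100 training points, which is why a sampler that needs many thousands of evaluations benefits from it.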

```{note}
Note: The 'emulator' calibration option needs to be run before the 'MCMC' option.
```

In [None]:
# call run_calibration
calib_opt = 'emulator'
!run_calibration -rgi_glac_number {glac_no} -ref_startyear {yr0} -ref_endyear {yr1} -ref_gcm_name {gcm} -option_calibration {calib_opt}

In [None]:
# check the output - the parameter dictionary output should now have an `emulator` key
with open(calib_path, 'r') as f:
    modelprms_dict = json.load(f)

print(modelprms_dict['emulator'])

```{note}
Note: The 'emulator' calibration option stores all simulations used to derive the emulator in ~/pygem_data/Output/emulator/sims/
```

In [None]:
%%bash -s "$rootpath" "$reg" "$glac_no"
head $1/Output/emulator/sims/$2/$3-100_emulator_sims.csv

<a id="MCMC-target"></a>

## Bayesian inference calibration using Markov Chain Monte Carlo methods
The calibration option **MCMC** is the recommended option. Details of the methods are provided by Rounce et al. ([2020a](https://www.cambridge.org/core/journals/journal-of-glaciology/article/quantifying-parameter-uncertainty-in-a-largescale-glacier-evolution-model-using-bayesian-inference-application-to-high-mountain-asia/61D8956E9A6C27CC1A5AEBFCDADC0432), [2023](https://www.science.org/doi/10.1126/science.abo1324)). In short, Bayesian inference is performed using Markov Chain Monte Carlo (MCMC) methods, which require a mass balance observation (including the uncertainty represented by a standard deviation) and prior distributions. In an ideal world, we would have enough data to use broad prior distributions (e.g., uniform distributions), but unfortunately the model is overparameterized, meaning there are infinitely many parameter sets that give a perfect fit. We therefore must use an empirical Bayes approach, in which a simple optimization scheme (the **HH2015mod** calibration option) generates the prior distributions at the regional scale, and these prior distributions are then used for the Bayesian inference. The prior distribution for the degree-day factor is based on previous data ([Braithwaite 2008](https://www.cambridge.org/core/journals/journal-of-glaciology/article/temperature-and-precipitation-climate-at-the-equilibriumline-altitude-of-glaciers-expressed-by-the-degreeday-factor-for-melting-snow/6C2362F61B7DE7F153247A039736D54C)), while the temperature bias and precipitation factor are derived using a simple optimization scheme based on each RGI Order 2 subregion. The temperature bias prior is a normal distribution and the precipitation factor prior is a gamma distribution, which ensures positivity. Glacier-wide winter mass balance data ([WGMS 2020](https://wgms.ch/data_databaseversions/)) are used to determine a reasonable upper-level constraint for the precipitation factor for the simple optimization scheme. 
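
A minimal random-walk Metropolis sketch shows how the pieces fit together: a gamma prior on the precipitation factor, a normal prior on the temperature bias, and a normal likelihood around the observed mass balance. The toy model, the prior hyperparameters, the step size and the burn-in length are all assumptions chosen for illustration. PyGEM derives its priors regionally and its sampler is not reproduced here.

```python
import math
import random


def toy_mass_balance(kp, tbias):
    """Stand-in for the emulator / mass balance model (m w.e. yr^-1)."""
    return 1.5 * kp - 1.2 - 0.45 * tbias


def log_normal_pdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_gamma_pdf(x, shape, rate):
    """Gamma log-density; returns -inf for x <= 0, which enforces positivity."""
    if x <= 0:
        return -math.inf
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def log_posterior(kp, tbias, mb_obs, mb_sigma):
    # assumed priors: kp ~ Gamma(shape=2, rate=1), tbias ~ Normal(0, 1)
    log_prior = log_gamma_pdf(kp, 2.0, 1.0) + log_normal_pdf(tbias, 0.0, 1.0)
    if log_prior == -math.inf:
        return -math.inf
    return log_prior + log_normal_pdf(toy_mass_balance(kp, tbias), mb_obs, mb_sigma)


def metropolis(mb_obs, mb_sigma, n_steps=20000, step=0.2, seed=0):
    """Random-walk Metropolis over (kp, tbias); returns the full chain."""
    rng = random.Random(seed)
    kp, tbias = 1.0, 0.0
    logp = log_posterior(kp, tbias, mb_obs, mb_sigma)
    chain = []
    for _ in range(n_steps):
        kp_new = kp + rng.gauss(0.0, step)
        tbias_new = tbias + rng.gauss(0.0, step)
        logp_new = log_posterior(kp_new, tbias_new, mb_obs, mb_sigma)
        # accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            kp, tbias, logp = kp_new, tbias_new, logp_new
        chain.append((kp, tbias))
    return chain


chain = metropolis(mb_obs=-0.5, mb_sigma=0.2)
posterior = chain[2000:]  # discard an assumed burn-in period
mean_mb = sum(toy_mass_balance(kp, tb) for kp, tb in posterior) / len(posterior)
print(mean_mb)
```

Because many (precipitation factor, temperature bias) pairs reproduce the same mass balance, the samples spread along a ridge in parameter space. That is the overparameterization the priors are meant to tame.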

The MCMC methods thus require several steps. These steps can be skipped if the **emulator** calibration option has already been run for the region of interest and regional priors have been defined - if the file *~/pygem_data/Output/calibration/priors_region.csv* exists.<br>
1. Run the calibration with <em>option_calibration = 'emulator'</em> (as shown in the emulator section above). This creates an emulator that speeds up the simulations within the MCMC methods and provides the initial calibration from which the regional priors are generated. Run this initial calibration:
    ```
    run_calibration -option_calibration emulator
    ```
2. The regional priors are then determined by running the following:
    ```
    run_mcmc_priors
    ```
    This will output a .csv file that has the distributions for the temperature bias and precipitation factors for each Order 2 RGI subregion. This file is located in `~/pygem_data/Output/calibration/`
    <br><br>
3. Once the regional priors are set, the MCMC methods can be performed.  
    ```
    run_calibration -option_calibration MCMC
    ```
    In order to reduce the file size, the parameter sets are thinned by a factor of 10. This is reasonable given the correlation between subsequent parameter sets during the Markov Chain, but can be adjusted if thinning is not desired (change `[calib]['MCMC_params']['thin_interval']` to 1 in the *~/PyGEM/config.yaml* configuration file).
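
The effect of thinning can be illustrated on a synthetic, strongly autocorrelated chain. The AR(1) series below is an assumption standing in for real MCMC output; `thin_interval` mirrors the configuration key mentioned above.

```python
import random


def lag1_autocorr(x):
    """Lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return cov / var


# synthetic chain in which each sample depends strongly on the previous one
rng = random.Random(0)
chain = [0.0]
for _ in range(9999):
    chain.append(0.9 * chain[-1] + rng.gauss(0.0, 1.0))

thin_interval = 10
thinned = chain[::thin_interval]  # keep every 10th parameter set

print(len(chain), len(thinned))
print(lag1_autocorr(chain), lag1_autocorr(thinned))
```

The thinned chain is a tenth of the size, and its neighbouring samples are much less correlated, so relatively little information is lost.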

```{note}
Note: 'MCMC_fullsim' is another calibration option that runs full model simulations within the MCMC methods instead of using the emulator. It is computationally very expensive but allows one to assess the emulator's impact on the MCMC methods.
```

In [None]:
# to demonstrate the MCMC calibration steps, we'll select a small region - this may still take some time if OGGM's preprocessed data is not yet downloaded
# recall we need to first run the 'emulator' calibration option for an entire RGI region to develop regional priors
# note this cell can be skipped if the desired region has already been calibrated using 'emulator' option
# we'll also run in parallel to speed things up a bit
calib_opt = 'emulator'
region = 6
num_cores=8     # change depending on how many cores you have/want to utilize
!run_calibration -rgi_region01 {region} -ref_startyear {yr0} -ref_endyear {yr1} -ref_gcm_name {gcm} -option_calibration {calib_opt} -ncores {num_cores}

In [None]:
!run_mcmc_priors -h

In [None]:
# next, develop regional priors from 'emulator' calibration
!run_mcmc_priors -rgi_region01 6 -v -p # optionally remove the '-p' and '-v' flags

Take a quick look at the resulting regional priors dataset:

In [None]:
%%bash -s "$rootpath"
head $1/Output/calibration/priors_region.csv

In [None]:
# now one may run MCMC calibration, which will use the developed mass balance emulator for each glacier, as well as the regional priors
# note, this can take a long time to run - performance is sped up dramatically on a supercomputer with many cores 
# to demonstrate, we'll simply cut the chain_length down - 10,000 to 20,000 samples is advisable for proper calibration and chain convergence
calib_opt = 'MCMC'
chain_length = 100
!run_calibration -rgi_region01 {region} -ref_startyear {yr0} -ref_endyear {yr1} -ref_gcm_name {gcm} -option_calibration {calib_opt} -chain_length {chain_length} -ncores {num_cores}

## Next: you are now ready to run simulations
See the *run_simulation.ipynb* Jupyter Notebook for a demonstration.