# Gradient checks

It is best practice to do gradient checks before and after gradient-based optimization.

1. Find suitable tolerances to use during optimization. Importantly, test your gradients using the settings you will use later on.
2. At an optimum, the gradient entries should be close to 0, except for parameters at active bounds.
3. Gradient checks can help you identify inconsistencies and errors, especially when using custom gradient calculation or objectives.
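Point 2 can be checked automatically once the gradient at an optimum is available. The sketch below uses a hypothetical helper, `stationarity_violations`, which is not part of pyPESTO. It assumes a minimization problem with box constraints. Interior parameters should have a gradient near 0. A parameter at its lower bound may have a positive gradient, and a parameter at its upper bound may have a negative one. The tolerances `atol` and `bound_tol` are arbitrary illustrative choices.

```python
import numpy as np


def stationarity_violations(grad, x, lb, ub, atol=1e-3, bound_tol=1e-8):
    """Flag gradient entries that are inconsistent with a local minimum of a
    bound-constrained minimization problem (hypothetical helper)."""
    grad, x, lb, ub = (np.asarray(a, dtype=float) for a in (grad, x, lb, ub))
    at_lb = np.abs(x - lb) <= bound_tol  # parameters at an active lower bound
    at_ub = np.abs(x - ub) <= bound_tol  # parameters at an active upper bound
    interior = ~(at_lb | at_ub)

    violation = np.zeros(grad.shape, dtype=bool)
    # Interior parameters: the gradient should be close to 0.
    violation[interior] = np.abs(grad[interior]) > atol
    # At a lower bound, a negative gradient means the objective could still
    # decrease by increasing the parameter; the reverse holds at an upper bound.
    violation[at_lb] = grad[at_lb] < -atol
    violation[at_ub] = grad[at_ub] > atol
    return violation


lb, ub = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
x_opt = [0.5, 0.0, 1.0]  # interior, at lower bound, at upper bound
print(stationarity_violations([1e-5, 2.0, -3.0], x_opt, lb, ub))  # [False False False]
print(stationarity_violations([0.1, -2.0, 3.0], x_opt, lb, ub))  # [ True  True  True]
```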

Here we show how to use the gradient check methods implemented in pyPESTO. These methods compare the objective gradient with a finite differences (FD) approximation. There is a trade-off between the quality of the approximation (better for small step sizes) and numerical noise (amplified by small step sizes), so it is recommended to try different FD step sizes.
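The trade-off can be illustrated independently of pyPESTO. The minimal sketch below uses a forward difference of `exp(x)` at `x = 1`, where the exact derivative is known. The truncation error shrinks roughly proportionally to the step size. The round-off error grows roughly like machine epsilon divided by the step size. For well-scaled functions, forward differences are therefore typically most accurate at intermediate step sizes, around the square root of machine epsilon (about 1e-8).

```python
import math


def forward_difference(f, x, eps):
    """Forward finite-difference approximation of the derivative f'(x)."""
    return (f(x + eps) - f(x)) / eps


x0 = 1.0
exact = math.exp(x0)  # the derivative of exp is exp itself

step_sizes = [1e-1, 1e-4, 1e-7, 1e-10, 1e-13]
errors = {eps: abs(forward_difference(math.exp, x0, eps) - exact) for eps in step_sizes}

for eps, err in errors.items():
    print(f"eps = {eps:.0e}: absolute error = {err:.1e}")
```

Both the largest and the smallest step sizes give much larger errors than `1e-7`. For a simulation-based objective, the noise is usually dominated by the ODE solver tolerances rather than by machine precision, which typically shifts the best step size upwards.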


```python
import numpy as np
import pandas as pd
import seaborn as sns

import benchmark_models_petab as models
import pypesto.optimize as optimize
import pypesto.petab

# fix NumPy's global random seed so that the sampled startpoints are reproducible
np.random.seed(2)
```

### Set up an example problem

Create the pyPESTO problem and generate random parameter vectors using the problem's startpoint sampling method.

```python
%%capture

model_name = "Boehm_JProteomeRes2014"
petab_problem = models.get_problem(model_name)

importer = pypesto.petab.PetabImporter(petab_problem)
pypesto_problem = importer.create_problem(verbose=False)
```

```python
startpoints = pypesto_problem.get_startpoints(n_starts=4)
```

### Gradient check before optimization

Perform a gradient check at one of the random parameter vectors. `check_grad` compares the objective gradient with the gradient approximated by finite differences (FD). You can modify the FD step size via the argument `eps`.

```python
pypesto_problem.objective.check_grad(
    x=startpoints[0],
    eps=1e-5,  # default
    verbosity=0,
)
```

| parameter | grad | fd_f | fd_b | fd_c | fd_err | abs_err | rel_err |
|---|---:|---:|---:|---:|---:|---:|---:|
| Epo_degradation_BaF3 | 2220912000.0 | 2220538000.0 | 2220430000.0 | 2220484000.0 | 108935.5 | 427995.7 | 0.0001927489 |
| k_exp_hetero | 70.22117 | -3842651.0 | -1103638.0 | -2473145.0 | 2739014.0 | 2473215.0 | 1.000028 |
| k_exp_homo | 84022.13 | -1905737.0 | 1543799.0 | -180969.2 | 3449536.0 | 264991.4 | 1.46429 |
| k_imp_hetero | 28258500.0 | 33018330.0 | 29310400.0 | 31164370.0 | 3707935.0 | 2905869.0 | 0.09324331 |
| k_imp_homo | 5956.185 | 376171.9 | -1238330.0 | -431079.1 | 1614502.0 | 437035.3 | 1.013817 |
| k_phos | -2433153000.0 | -2430767000.0 | -2432919000.0 | -2431843000.0 | 2152539.0 | 1309842.0 | 0.0005386211 |
| sd_pSTAT5A_rel | -5862715000000.0 | -5862577000000.0 | -5862854000000.0 | -5862715000000.0 | 277160200.0 | 2296.471 | 3.917077e-10 |
| sd_pSTAT5B_rel | -47591220000.0 | -47586540000.0 | -47595900000.0 | -47591220000.0 | 9363916.0 | 5.641838 | 1.185479e-10 |
| sd_rSTAT5A_rel | 14.15492 | 3586157.0 | -3586108.0 | 24.41406 | 7172266.0 | 10.25914 | 0.4202142 |


Explanation of the gradient check result columns:

- `grad`: Objective gradient
- `fd_f`: FD forward difference
- `fd_b`: FD backward difference
- `fd_c`: Approximation of FD central difference (reusing the information from `fd_f` and `fd_b`)
- `fd_err`: Deviation between forward and backward differences `fd_f`, `fd_b`
- `abs_err`: Absolute error between `grad` and the central FD gradient `fd_c`
- `rel_err`: Relative error between `grad` and the central FD gradient `fd_c`
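The columns listed above can be reproduced for a toy objective with a short NumPy/pandas sketch. The helper `gradient_check_table` and the toy functions are hypothetical and are not pyPESTO's implementation. The normalization `rel_err = abs_err / (|fd_c| + eps)` is an assumption inferred from the tables in this notebook. For example, in the `eps = 1e-6` check, `sd_rSTAT5A_rel` has `fd_c = 0` and `rel_err = 14.15492 / 1e-6`.

```python
import numpy as np
import pandas as pd


def gradient_check_table(fun, grad_fun, x, eps=1e-5, names=None):
    """Compute the gradient check columns per parameter (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    f0 = fun(x)
    grad = np.asarray(grad_fun(x), dtype=float)
    rows = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        f_plus, f_minus = fun(x + step), fun(x - step)
        fd_f = (f_plus - f0) / eps  # forward difference
        fd_b = (f0 - f_minus) / eps  # backward difference
        fd_c = (f_plus - f_minus) / (2 * eps)  # central difference = mean of fd_f and fd_b
        abs_err = abs(grad[i] - fd_c)
        rows.append({
            "grad": grad[i], "fd_f": fd_f, "fd_b": fd_b, "fd_c": fd_c,
            "fd_err": abs(fd_f - fd_b),
            "abs_err": abs_err,
            # assumed normalization, consistent with the tables in this notebook
            "rel_err": abs_err / (abs(fd_c) + eps),
        })
    index = names if names is not None else [f"x{i}" for i in range(x.size)]
    return pd.DataFrame(rows, index=index)


def objective(x):
    return np.sum(x**2) + x[0] * x[1]


def gradient(x):
    return 2 * x + np.array([x[1], x[0]])


x = np.array([1.0, -2.0])
print(gradient_check_table(objective, gradient, x, names=["a", "b"]))

# A gradient that misses the cross term x[0] * x[1] shows large abs_err and rel_err:
print(gradient_check_table(objective, lambda x: 2 * x, x, names=["a", "b"]))
```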

If your parameter vector also contains fixed parameters, `check_grad` may fail due to a dimension mismatch. Use the helper method `Problem.get_reduced_vector` to obtain the reduced vector that contains only the free (estimated) parameters.
Here, we also set a smaller FD step size, `eps = 1e-6`, and observe that the errors change considerably. For example, the relative error for `k_imp_hetero` increases from about 0.09 to 7.5:

```python
parameter_vector = pypesto_problem.get_reduced_vector(startpoints[0])

pypesto_problem.objective.check_grad(
    x=parameter_vector,
    eps=1e-6,
    verbosity=0,
)
```

| parameter | grad | fd_f | fd_b | fd_c | fd_err | abs_err | rel_err |
|---|---:|---:|---:|---:|---:|---:|---:|
| Epo_degradation_BaF3 | 2220912000.0 | 2246010000.0 | 2218966000.0 | 2232488000.0 | 27043460.0 | 11575810.0 | 0.005185163 |
| k_exp_hetero | 70.22117 | 20994870.0 | -36944340.0 | -7974731.0 | 57939210.0 | 7974802.0 | 1.000009 |
| k_exp_homo | 84022.13 | 2855469.0 | -28021480.0 | -12583010.0 | 30876950.0 | 12667030.0 | 1.006677 |
| k_imp_hetero | 28258500.0 | -4105957.0 | 10725340.0 | 3309692.0 | 14831300.0 | 24948810.0 | 7.538104 |
| k_imp_homo | 5956.185 | 29872560.0 | -4359131.0 | 12756710.0 | 34231690.0 | 12750760.0 | 0.9995331 |
| k_phos | -2433153000.0 | -2414397000.0 | -2440591000.0 | -2427494000.0 | 26193850.0 | 5659110.0 | 0.002331256 |
| sd_pSTAT5A_rel | -5862715000000.0 | -5862666000000.0 | -5862765000000.0 | -5862715000000.0 | 98720210.0 | 1063.561 | 1.814109e-10 |
| sd_pSTAT5B_rel | -47591220000.0 | -47555250000.0 | -47627190000.0 | -47591220000.0 | 71941890.0 | 67.60034 | 1.420437e-09 |
| sd_rSTAT5A_rel | 14.15492 | 35861080.0 | -35861080.0 | 0.0 | 71722170.0 | 14.15492 | 14154920.0 |


The method `check_grad_multi_eps` calls `check_grad` multiple times with different FD step sizes. For each parameter, it reports the result for the step size with the smallest error.
You can supply a list of FD step sizes to be tested via the `multi_eps` argument (or use the defaults). Use the `label` argument to select which error is minimized: the FD error (`fd_err`), the absolute error (`abs_err`), or the relative error (`rel_err`).
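The per-parameter selection of the step size can be sketched independently of pyPESTO as follows. This is not pyPESTO's implementation. The helpers are hypothetical, only central differences are computed, and the `multi_eps` values are illustrative rather than pyPESTO's defaults. For each parameter, the row with the smallest value in the `label` column is kept.

```python
import numpy as np
import pandas as pd


def central_difference_errors(fun, grad, x, eps):
    """Central-difference errors for every parameter at one step size."""
    x = np.asarray(x, dtype=float)
    rows = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        fd_c = (fun(x + step) - fun(x - step)) / (2 * eps)
        abs_err = abs(grad[i] - fd_c)
        rows.append({"parameter": f"x{i}", "eps": eps, "fd_c": fd_c,
                     "abs_err": abs_err, "rel_err": abs_err / (abs(fd_c) + eps)})
    return rows


def best_eps_per_parameter(fun, grad, x, multi_eps=(1e-1, 1e-3, 1e-5, 1e-7), label="rel_err"):
    """For each parameter, keep the result of the step size that minimizes `label`."""
    df = pd.DataFrame([row for eps in multi_eps
                       for row in central_difference_errors(fun, grad, x, eps)])
    best_rows = df.groupby("parameter")[label].idxmin()  # row label of the minimum per parameter
    return df.loc[best_rows].set_index("parameter")


def objective(x):
    return x[0] ** 2 + x[1] ** 3


x = np.array([1.0, 1.0])
grad = np.array([2.0, 3.0])  # exact gradient at x
print(best_eps_per_parameter(objective, grad, x))
```

For the cubic term, the largest step size `1e-1` has a truncation error of about 0.01, so a smaller step size is selected for `x1`.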

```python
gc = pypesto_problem.objective.check_grad_multi_eps(
    x=parameter_vector,
    verbosity=0,
    label='rel_err',  # default
)
```

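Use pandas' styling methods (`DataFrame.style`) to visualise the results of the gradient check. Values in the styled tables are rounded to six decimal places, so entries shown as `0.0` (including some step sizes `eps`) are small values, not necessarily zeros. For example: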

```python
def highlight_value_above_threshold(x, threshold=1):
    """Return one CSS style per cell: dark orange text if the value exceeds `threshold`."""
    return ["color: darkorange" if xi > threshold else "" for xi in x]


def highlight_gradient_check(gc: pd.DataFrame):
    """Style a gradient check table: flag large FD errors and color-code errors and step sizes.

    The `eps` column is only present in the output of `check_grad_multi_eps`.
    """
    return gc.style.apply(
        highlight_value_above_threshold, subset=["fd_err"],
    ).background_gradient(
        cmap=sns.light_palette("purple", as_cmap=True), subset=["abs_err"],
    ).background_gradient(
        cmap=sns.light_palette("red", as_cmap=True), subset=["rel_err"],
    ).background_gradient(
        cmap=sns.color_palette("viridis", as_cmap=True), subset=["eps"],
    )


highlight_gradient_check(gc)
```

| parameter | grad | fd_f | fd_b | fd_c | fd_err | abs_err | rel_err | eps |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Epo_degradation_BaF3 | 2220911980.111206 | 2218815574.707031 | 2223017339.599609 | 2220916457.15332 | 4201764.892578 | 4477.042115 | 2e-06 | 0.001 |
| k_exp_hetero | 70.221175 | 501.23291 | 265.585938 | 383.409424 | 235.646973 | 313.188249 | 0.816638 | 0.1 |
| k_exp_homo | 84022.128076 | 85081.540527 | 82042.426758 | 83561.983643 | 3039.11377 | 460.144434 | 0.005507 | 0.1 |
| k_imp_hetero | 28258498.729311 | 28309819.091797 | 28160520.996094 | 28235170.043945 | 149298.095703 | 23328.685366 | 0.000826 | 0.001 |
| k_imp_homo | 5956.185241 | 5343.374023 | 6553.217773 | 5948.295898 | 1209.84375 | 7.889343 | 0.001326 | 0.1 |
| k_phos | -2433152761.910034 | -2435951032.958984 | -2430378865.966797 | -2433164949.462891 | 5572166.992188 | 12187.552857 | 5e-06 | 0.001 |
| sd_pSTAT5A_rel | -5862715285801.674 | -5862576707983.398 | -5862853868212.891 | -5862715288098.145 | 277160229.492188 | 2296.470703 | 0.0 | 1e-05 |
| sd_pSTAT5B_rel | -47591221503.14722 | -47586539550.78124 | -47595903466.796875 | -47591221508.789055 | 9363916.015625 | 5.641838 | 0.0 | 1e-05 |
| sd_rSTAT5A_rel | 14.154924 | 35874.755859 | -35847.167969 | 13.793945 | 71721.923828 | 0.360978 | 0.026167 | 0.001 |


There are consistently large discrepancies between forward and backward FD and a large relative error for the parameter `k_exp_hetero`.  

Ideally, all gradients agree, but large FD errors can occur, especially at less smooth points of the objective such as (local) optima.
It is therefore recommended to check gradients at many random points and to look for parameters with consistently large errors.
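One way to look for such parameters is to collect the gradient check tables from several points and count how often each parameter exceeds an error threshold. The pandas sketch below does this. The helper `consistently_bad_parameters`, the threshold of `1e-2`, and the minimum fraction of 0.75 are illustrative choices, not part of pyPESTO. The input values are made up.

```python
import pandas as pd


def consistently_bad_parameters(checks, label="rel_err", threshold=1e-2, min_fraction=0.75):
    """Return the parameters whose `label` exceeds `threshold` in at least
    `min_fraction` of the given gradient checks (hypothetical helper)."""
    # One row per parameter, one column per gradient check.
    errors = pd.concat([check[label] for check in checks], axis=1)
    fraction_bad = (errors > threshold).mean(axis=1)
    return fraction_bad[fraction_bad >= min_fraction].index.tolist()


# Made-up relative errors at four random points, for illustration only.
checks = [
    pd.DataFrame({"rel_err": [1e-5, 0.9, 0.2]}, index=["p1", "p2", "p3"]),
    pd.DataFrame({"rel_err": [2e-6, 1.1, 1e-4]}, index=["p1", "p2", "p3"]),
    pd.DataFrame({"rel_err": [3e-5, 0.8, 3e-5]}, index=["p1", "p2", "p3"]),
    pd.DataFrame({"rel_err": [1e-6, 0.05, 2e-6]}, index=["p1", "p2", "p3"]),
]
print(consistently_bad_parameters(checks))  # ['p2']
```

In this notebook, `checks` could hold the tables returned by `check_grad_multi_eps` for each sampled startpoint. Those tables are indexed by parameter name and contain a `rel_err` column.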

Below we perform a gradient check for another random point and observe small errors:

```python
parameter_vector = startpoints[1]

gc = pypesto_problem.objective.check_grad_multi_eps(
    x=parameter_vector,
    verbosity=0,
    label='rel_err',  # default
)
highlight_gradient_check(gc)
```

| parameter | grad | fd_f | fd_b | fd_c | fd_err | abs_err | rel_err | eps |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Epo_degradation_BaF3 | 0.000266 | 0.000299 | 0.000238 | 0.000268 | 6.1e-05 | 2e-06 | 2.4e-05 | 0.1 |
| k_exp_hetero | 0.000174 | 0.00017 | 0.000176 | 0.000173 | 7e-06 | 1e-06 | 7e-06 | 0.1 |
| k_exp_homo | -0.000254 | -0.00027 | -0.000237 | -0.000253 | 3.2e-05 | 0.0 | 1e-06 | 0.1 |
| k_imp_hetero | 0.125483 | 0.125345 | 0.125628 | 0.125486 | 0.000283 | 3e-06 | 2.6e-05 | 0.001 |
| k_imp_homo | 0.153206 | 0.153031 | 0.15337 | 0.153201 | 0.000339 | 6e-06 | 3.6e-05 | 0.001 |
| k_phos | -0.280337 | -0.28066 | -0.280019 | -0.280339 | 0.000641 | 2e-06 | 8e-06 | 0.001 |
| sd_pSTAT5A_rel | -2344.11583 | -2344.060317 | -2344.17133 | -2344.115824 | 0.111014 | 7e-06 | 0.0 | 1e-05 |
| sd_pSTAT5B_rel | -122250603.677674 | -122247788.796946 | -122253418.645635 | -122250603.721291 | 5629.848689 | 0.043617 | 0.0 | 1e-05 |
| sd_rSTAT5A_rel | -31780.671353 | -31779.938564 | -31781.404465 | -31780.671515 | 1.465902 | 0.000162 | 0.0 | 1e-05 |


### Gradient check after optimization

Next, we run a multi-start optimization and perform a gradient check at the best local optimum found.

```python
%%capture

result = optimize.minimize(
    problem=pypesto_problem,
    optimizer=optimize.ScipyOptimizer(),
    n_starts=4,
)
```

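At (local) optima, the gradient is close to zero, but the objective can be strongly curved. At a steep minimum, the forward and backward differences have opposite signs, and their magnitudes grow with the curvature and the step size. Therefore, `fd_err` is expected to be high even if the objective gradient is correct.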
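To see why, consider a steep one-dimensional quadratic `f(x) = c * x**2` at its minimum `x = 0`. This is a toy stand-in, not the Boehm model. The forward and backward differences are `+c * eps` and `-c * eps`, so `fd_err = 2 * c * eps`, which grows with the curvature. The central difference is exact for a quadratic and stays at 0.

```python
def finite_differences(f, x, eps):
    """Forward, backward and central differences of a scalar function."""
    fd_f = (f(x + eps) - f(x)) / eps
    fd_b = (f(x) - f(x - eps)) / eps
    fd_c = (f(x + eps) - f(x - eps)) / (2 * eps)
    return fd_f, fd_b, fd_c


def steep_objective(x, curvature=1e6):
    """Quadratic with its minimum at x = 0, where the exact gradient is 0."""
    return curvature * x**2


eps = 1e-3
fd_f, fd_b, fd_c = finite_differences(steep_objective, 0.0, eps)
fd_err = abs(fd_f - fd_b)
print(fd_f, fd_b, fd_c, fd_err)  # roughly 1000, -1000, 0 and 2000
```

A large `fd_err` at an optimum therefore does not by itself indicate a wrong gradient. It is more informative to compare `grad` with the central difference, that is, to look at `abs_err` and `rel_err`.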

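At the local optimum shown below, the forward and backward FD for `sd_pSTAT5B_rel` have opposite signs and large magnitudes, resulting in a substantial `fd_err`. The central difference nevertheless agrees well with the objective gradient (small `abs_err`).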

```python
# parameter vector at the local optimum, obtained from optimization
parameter_vector = pypesto_problem.get_reduced_vector(result.optimize_result[0].x)

gc = pypesto_problem.objective.check_grad_multi_eps(
    x=parameter_vector,
    verbosity=0,
    label='rel_err',  # default
)
highlight_gradient_check(gc)
```

| parameter | grad | fd_f | fd_b | fd_c | fd_err | abs_err | rel_err | eps |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Epo_degradation_BaF3 | -0.045478 | 1.18147 | -1.275828 | -0.047179 | 2.457299 | 0.001701 | 0.036844 | 0.001 |
| k_exp_hetero | 0.028032 | 0.031503 | 0.025003 | 0.028253 | 0.0065 | 0.000221 | 0.001725 | 0.1 |
| k_exp_homo | -0.006328 | 0.030595 | -0.039241 | -0.004323 | 0.069836 | 0.002005 | 0.603263 | 0.001 |
| k_imp_hetero | -0.062421 | 1.423755 | -1.548763 | -0.062504 | 2.972518 | 8.3e-05 | 0.00135 | 0.001 |
| k_imp_homo | -0.015293 | 0.189569 | -0.222109 | -0.01627 | 0.411677 | 0.000977 | 0.064003 | 0.001 |
| k_phos | 0.016825 | 0.61239 | -0.577662 | 0.017364 | 1.190052 | 0.00054 | 0.029381 | 0.001 |
| sd_pSTAT5A_rel | -0.005559 | -0.492414 | 0.48131 | -0.005552 | 0.973724 | 7e-06 | 0.001335 | 1e-05 |
| sd_pSTAT5B_rel | -0.001795 | -4877.105482 | 4877.101901 | -0.001791 | 9754.207383 | 5e-06 | 0.002716 | 0.0 |
| sd_rSTAT5A_rel | 0.006739 | -48.764286 | 48.777771 | 0.006743 | 97.542058 | 3e-06 | 0.000501 | 0.0 |


### How to "fix" my gradients?

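- Find suitable simulation tolerances (e.g., the relative and absolute tolerances of the ODE solver), and use the same settings for the gradient check and the optimization.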

Specific to the PEtab-AMICI pipeline:

- Check the simulation logs for warnings and errors.
- Consider switching between forward and adjoint sensitivity algorithms.
