I am using the adjoint sensitivity analysis functionality of sunodes, and sporadically I get segmentation faults during the backward pass. Unfortunately I could not reproduce this with a small example so far, but the coredump seems to indicate that CVAfindIndex tries to access checkpoints that do not exist, relatively near t0 of the forward problem, in a region where the solver is making (ridiculously?) small steps.
It seems that CVAfindIndex is trying to find a checkpoint for t = 161.33623519238427 but the largest of the 600 entries in ca_mem->dt_mem has only t = 161.33623519238293.
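To put the two numbers above in perspective, here is a small check of how far apart they are in double precision. This is only arithmetic on the two values quoted from the coredump; the variable names are made up for illustration, and `math.ulp` needs Python 3.9 or newer.

```python
import math

# The two times quoted from the coredump.
t_requested = 161.33623519238427       # time CVAfindIndex is looking for
t_largest_stored = 161.33623519238293  # largest t among the stored entries

# Absolute gap between the requested time and the last stored time.
gap = t_requested - t_largest_stored

# Spacing between adjacent doubles near t = 161.
ulp = math.ulp(t_largest_stored)

# How many representable doubles the gap spans.
gap_in_ulps = gap / ulp

print(f"gap = {gap:.3e}, ulp = {ulp:.3e}, gap/ulp = {gap_in_ulps:.1f}")
```

The gap is on the order of 1e-12, which is a few dozen ULPs at this magnitude, so the requested time is a distinct value past the stored data rather than a one-bit rounding difference.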
The details of how I'm using sundials are somewhat hidden in a Python wrapper and PyMC3 (I'm sampling the parameter space with a Hamiltonian sampler), but here is a rough outline of what I'm doing:
- Initialize forward and backward solvers with polynomial interpolation and checkpoints every 600 steps
- Repeat (a lot):
  - Change user_data
  - Call CVodeReInit and CVodeAdjReInit
  - Run forward solver
  - Call CVodeReInitB, CVodeQuadReInitB and CVodeB to solve repeatedly, as the adjoint rhs is not continuous.
The t of the segfault is nowhere near the discontinuities of the rhs, the first one of those is at t ~ 12000.
I think I figured out what the problem is:
Let's assume there is only one backward problem.
At the beginning of the loop that advances all the backward problems, ck_mem is initialized so that ck_mem->ck_t0 < cvB_mem->cv_mem->cv_tn < ck_mem->ck_t1.
The solver sets ck_mem->ck_t0 as the stop time and advances the backward problem. If the solver reaches that stop time (so cvB_mem->cv_tout == ck_mem->ck_t0), then cvB_mem->cv_mem->cv_tn will still be larger than the stop time by a small amount, since it assumes (incorrectly in this case) that it cannot compute the rhs at the stop time itself.
In the next step, after advancing the checkpoint, the invariant from above no longer holds, and CVStep will continue at cv_tn, so that CVAfindIndex will access out-of-bounds memory when looking for a step with t >= ck_mem->ck_t1.
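The failure mode described above can be illustrated with a toy model. This is not the actual CVAfindIndex code: `find_index` is a hypothetical stand-in that returns the position of the first stored time >= t, and the stored times are made-up values around the ones from the coredump.

```python
from bisect import bisect_left


def find_index(times, t):
    """Toy lookup (hypothetical): index of the first stored time >= t."""
    return bisect_left(times, t)


# Made-up forward data for one checkpoint section, ending at ck_t1.
ck_t1 = 161.33623519238293
times = [ck_t1 - 3e-12, ck_t1 - 2e-12, ck_t1 - 1e-12, ck_t1]

# Invariant holds: tn lies inside the section, so the lookup is in bounds.
tn_ok = ck_t1 - 1.5e-12
i_ok = find_index(times, tn_ok)

# Invariant broken: tn is slightly larger than ck_t1 of the current section.
tn_bad = 161.33623519238427
i_bad = find_index(times, tn_bad)

print(i_ok, i_ok < len(times))    # valid index into the stored data
print(i_bad, i_bad < len(times))  # one past the end of the stored data
```

In Python the second lookup just returns an index equal to `len(times)`; an unchecked array access at that index in C would read past the end of the stored data, which would match the segfault.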
Wouldn't it be better to compute a few more points when re-integrating the forward problem so that the checkpoint data sections overlap slightly? Then the solver would not have to integrate right up to the stop time in all but the last checkpoint sections. That might also lower interpolation errors somewhat I guess.
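A minimal sketch of the overlap idea, again as a toy model and not SUNDIALS code. The helper `store_section`, its `overlap` parameter and the time grid are all hypothetical; the point is only that a few extra stored points beyond t1 keep the lookup in bounds when tn is slightly past t1.

```python
from bisect import bisect_left


def store_section(grid, t0, t1, overlap=0):
    """Stored times in [t0, t1], plus `overlap` extra points beyond t1."""
    inside = [t for t in grid if t0 <= t <= t1]
    beyond = [t for t in grid if t > t1][:overlap]
    return inside + beyond


grid = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # made-up forward step times
t0, t1 = 2.0, 4.0                           # one checkpoint section
tn = t1 + 1e-9                              # backward solver slightly past t1

plain = store_section(grid, t0, t1)              # no overlap
padded = store_section(grid, t0, t1, overlap=2)  # two extra points past t1

idx_plain = bisect_left(plain, tn)    # first stored time >= tn, if any
idx_padded = bisect_left(padded, tn)

print(idx_plain, len(plain))    # index equals the length: nothing to read
print(idx_padded, len(padded))  # index is valid: an overlap point covers tn
```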
@balos1 Not sure who to ping, I hope this is alright.
I just ran into an example where I think this bug leads to silently incorrect results. I'd really appreciate it if someone who knows the code could have a look.
The source for the solver calls is here: https://github.com/aseyboldt/sunode/blob/master/sunode/solver.py#L365
I can also provide the coredump if that is helpful.