ConditionalDimension leads to kernel segfault #1829

mwcvitkovic opened this issue Jan 31, 2022 · 6 comments

The following minimal reproducible example kills the kernel in a Jupyter notebook when nsnaps equals 1, 2, 4, 5, 50, 53, or 54. It runs fine when nsnaps equals 3, 51, or 52.

```python
from devito import Grid, TimeFunction, Eq, Operator, solve, ConditionalDimension

num_timesteps = 500  # time_M used for the run
nsnaps = 54          # crashes for 1, 2, 4, 5, 50, 53, 54; runs for 3, 51, 52

grid = Grid(shape=(100, 100))

# For snapshotting: copy p into psave once every `factor` time steps
factor = round(num_timesteps / nsnaps)
time_subsampled = ConditionalDimension('t_sub', parent=grid.time_dim, factor=factor)
psave = TimeFunction(name='psave', grid=grid, save=nsnaps, time_dim=time_subsampled)

# Base simulation
p = TimeFunction(name='p', grid=grid, space_order=2, time_order=2)

# Source[0, 0, 0] = 1.

weq = Eq(0.5**2 * p.laplace - p.dt2)
step = Eq(p.forward, solve(weq, p.forward))
op = Operator([step] + [Eq(psave, p)])
summary = op.apply(time=num_timesteps, dt=1e-2)
```

Is there an obvious reason for this?

System info

Python env

MacBook Pro (2021)
Apple M1 Max
32 GB memory

mwcvitkovic changed the title "Snapshotting kills kernel" to "Snapshotting kills jupyter kernel" on Jan 31, 2022
FabioLuporini commented Jan 31, 2022

This is a segfault due to an out-of-bounds (OOB) access to psave.

Devito should have detected in the Python layer that you'll go OOB with the given arguments and raised a suitable exception, rather than letting the kernel die in that horrible way once in C-land. Unfortunately, we aren't doing this kind of check yet. This method probably needs to be overridden in ConditionalDimension (external contributions always welcome!). There should be an open issue about this.
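For illustration, here is a plain-Python sketch of the bounds arithmetic such a Python-layer check would need. This is not Devito code. check_save_bounds is a hypothetical helper, and it assumes the generated loop writes psave[time // factor] on every iteration where time % factor == 0, for time in [time_m, time_M].

```python
# Hypothetical pre-flight check, NOT part of Devito's API.
# Assumption: iteration `time` writes psave[time // factor] whenever
# time % factor == 0, for time_m <= time <= time_M.
def check_save_bounds(time_m: int, time_M: int, factor: int, save: int) -> None:
    """Raise ValueError if a guarded write would index past the save buffer."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    # Largest multiple of factor that does not exceed time_M
    last_t = (time_M // factor) * factor
    if last_t < time_m:
        return  # no iteration in [time_m, time_M] writes a snapshot
    last_index = last_t // factor
    if last_index >= save:
        raise ValueError(
            f"time_M={time_M} with factor={factor} writes index {last_index}, "
            f"but save={save} only allows indices 0..{save - 1}; "
            f"use save >= {last_index + 1}"
        )


# Numbers from this thread: time_m=1, time_M=500, nsnaps=54 -> factor=9
try:
    check_save_bounds(1, 500, 9, 54)
except ValueError as exc:
    print(exc)
```

With the thread's numbers the check raises and reports that save would need to be at least 56, whereas factor=10 with save=51 passes.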

Anyway, here's the explanation.

Take nsnaps=54.

You then have:

In [1]: psave.shape
Out[1]: (54, 100, 100)

You will save one snapshot every factor time iterations. Also, the computation starts at time_m=1 and runs until time_M=500. For confirmation:

In [3]: op.arguments(time=num_timesteps, dt=1e-2)['time_m']
Out[3]: 1

In [4]: op.arguments(time=num_timesteps, dt=1e-2)['time_M']
Out[4]: 500

Note that op.apply(time=X, ...) is equivalent to op.apply(time_M=X, ...), and in your case X=num_timesteps=500.

With nsnaps=54 you get factor = round(500 / 54) = 9, and 500/factor = 500/9 = 55.55..., i.e. more snapshot writes than the 54 slots psave has.

So before reaching the end of the computation, you'll attempt to write to e.g. psave[495 / 9][...] = psave[55][...], while psave only has valid indices 0 to 53. That is an OOB access, which causes a segfault.
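The arithmetic above can be checked with a small pure-Python model of the subsampled write. It assumes (not confirmed in this thread) that the generated code guards the write with time % factor == 0 and indexes psave with time // factor, and that time runs from time_m=1 to time_M=500. written_indices is a hypothetical helper. Under that model, the sweep at the end reproduces exactly the crash/no-crash pattern reported at the top of the issue.

```python
# Pure-Python model of the guarded snapshot write (assumed semantics):
#   for time in [time_m, time_M]:
#       if time % factor == 0: psave[time // factor] = p
def written_indices(time_m, time_M, factor):
    """Map each iteration that writes a snapshot to the psave index it targets."""
    return {t: t // factor for t in range(time_m, time_M + 1) if t % factor == 0}


num_timesteps = 500  # time_M; time_m is 1
nsnaps = 54
factor = round(num_timesteps / nsnaps)
writes = written_indices(1, num_timesteps, factor)
oob = {t: i for t, i in writes.items() if i >= nsnaps}
print(factor, oob)  # 9 {486: 54, 495: 55}

# Sweep the nsnaps values reported in the issue
for n in (1, 2, 3, 4, 5, 50, 51, 52, 53, 54):
    f = round(num_timesteps / n)
    last = max(written_indices(1, num_timesteps, f).values())
    print(n, f, last, "OOB" if last >= n else "ok")
```

Under this model, any nsnaps with num_timesteps // factor >= nsnaps goes out of bounds, which is why 3, 51 and 52 happen to work while their neighbours do not.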

Hope this helps.

I think I'll add an FAQ and keep the issue open until this behaviour is properly caught in Python.

Updated FAQ, see here:

I'm updating this issue's title accordingly.

FabioLuporini changed the title "Snapshotting kills jupyter kernel" to "ConditionalDimension leads to kernel segfault" on Jan 31, 2022
Understood - thanks!

I'm not sure what your preferences are for closing issues vs. leaving them open, so feel free to close it if you want. I'm good on my end.

If you are happy with the resolution, closing it yourself is fine. Glad your issue is answered.

Actually, I suggest leaving it open, or opening a new one that describes the underlying issue. I thought there was one already, but I can't find it, so I guess I was wrong.

