Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ConditionalDimension leads to kernel segfault #1829

Closed
mwcvitkovic opened this issue Jan 31, 2022 · 6 comments
Closed

ConditionalDimension leads to kernel segfault #1829

mwcvitkovic opened this issue Jan 31, 2022 · 6 comments
Labels

Comments

@mwcvitkovic
Copy link

The following minimal reproducible example kills the kernel in a jupyter notebook when nsnaps equals 1, 2, 4, 5, 50, 53, or 54. It runs when nsnaps equals 3, 51, or 52.

from devito import Grid, TimeFunction, Eq, Operator, solve, ConditionalDimension

num_timesteps=500
nsnaps=54

grid = Grid(shape=(100, 100))

# For snapshotting
factor = round(num_timesteps / nsnaps)
time_subsampled = ConditionalDimension('t_sub', parent=grid.time_dim, factor=factor)
psave = TimeFunction(name='psave', grid=grid, save=nsnaps, time_dim=time_subsampled)

# Base simulation
p = TimeFunction(name='p', grid=grid, space_order=2, time_order=2)

# Source
p.data[0, 0, 0] = 1.

weq = Eq(0.5**2 * p.laplace - p.dt2)
step = Eq(p.forward, solve(weq, p.forward))
op = Operator([step] + [Eq(psave, p)])
summary = op.apply(time=num_timesteps, dt=1e-2)

Is there an obvious reason for this?

System info

Python env

absl-py 1.0.0 pypi_0 pypi
appnope 0.1.2 py38h50d1736_1 conda-forge
argon2-cffi 20.1.0 py38h5406a74_2 conda-forge
async_generator 1.10 py_0 conda-forge
attrs 20.3.0 pyhd3deb0d_0 conda-forge
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.1 py_0 conda-forge
bleach 3.2.3 pyh44b312d_0 conda-forge
boto 2.49.0 pypi_0 pypi
boto3 1.16.58 pypi_0 pypi
botocore 1.19.58 pypi_0 pypi
brotlipy 0.7.0 py38h9ed2024_1003
bzip2 1.0.8 h0d85af4_4 conda-forge
ca-certificates 2021.10.8 h033912b_0 conda-forge
certifi 2021.10.8 py38h50d1736_1 conda-forge
cffi 1.14.3 py38h2125817_2
chardet 3.0.4 py38hecd8cb5_1003
cirpy 1.0.2 pypi_0 pypi
conda 4.11.0 py38h50d1736_0 conda-forge
conda-package-handling 1.7.2 py38h22f3db7_0
cryptography 3.2.1 py38hbcfaee0_1
cycler 0.10.0 py_2 conda-forge
dbus 1.13.18 h18a8e69_0
decorator 4.4.2 py_0 conda-forge
defusedxml 0.6.0 py_0 conda-forge
entrypoints 0.3 pyhd8ed1ab_1003 conda-forge
expat 2.2.10 hb1e8313_2
ffmpeg 4.2.2 h97e5cf8_0
flatbuffers 2.0 pypi_0 pypi
freetype 2.10.4 h4cff582_1 conda-forge
gettext 0.19.8.1 h1f1d5ed_1 conda-forge
glib 2.55.0 0 conda-forge
gmp 6.1.2 h0a44026_1000 conda-forge
gnutls 3.6.13 hc269f14_0 conda-forge
hashids 1.3.1 pypi_0 pypi
icu 64.2 h6de7cb9_1 conda-forge
idna 2.10 py_0
importlib-metadata 3.4.0 py38h50d1736_0 conda-forge
importlib_metadata 3.4.0 hd8ed1ab_0 conda-forge
ipykernel 5.4.3 py38h9bb44b7_0 conda-forge
ipython 7.21.0 py38h9bb44b7_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
ipywidgets 7.6.3 pyhd3deb0d_0 conda-forge
jax 0.2.25 pypi_0 pypi
jaxdf 0.1.0 pypi_0 pypi
jaxlib 0.1.74 pypi_0 pypi
jedi 0.18.0 py38h50d1736_2 conda-forge
jinja2 2.11.2 pyh9f0ad1d_0 conda-forge
jmespath 0.10.0 pypi_0 pypi
jpeg 9d hbcb3906_0 conda-forge
jsonschema 3.2.0 py_2 conda-forge
jupyter 1.0.0 py38h50d1736_5 conda-forge
jupyter_client 6.1.11 pyhd8ed1ab_1 conda-forge
jupyter_console 6.2.0 py_0 conda-forge
jupyter_core 4.7.0 py38h50d1736_1 conda-forge
jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge
jupyterlab_widgets 1.0.0 pyhd8ed1ab_1 conda-forge
kiwisolver 1.3.1 py38h23ab428_0
lame 3.100 h35c211d_1001 conda-forge
lcms2 2.12 h577c468_0 conda-forge
libblas 3.9.0 7_openblas conda-forge
libcblas 3.9.0 7_openblas conda-forge
libcxx 10.0.0 1
libedit 3.1.20191231 h1de35cc_1
libffi 3.3 hb1e8313_2
libgfortran 5.0.0 9_3_0_h6c81a4c_16 conda-forge
libgfortran5 9.3.0 h6c81a4c_16 conda-forge
libiconv 1.15 h0b31af3_1006 conda-forge
liblapack 3.9.0 7_openblas conda-forge
libopenblas 0.3.12 openmp_h54245bb_1 conda-forge
libopus 1.3.1 hc929b4f_1 conda-forge
libpng 1.6.37 h7cec526_2 conda-forge
libsodium 1.0.18 hbcb3906_1 conda-forge
libtiff 4.2.0 h9da4c3f_0
libvpx 1.7.0 h378b8a2_0
libwebp-base 1.2.0 hbcf498f_0 conda-forge
llvm-openmp 11.0.1 h7c73e74_0 conda-forge
lz4-c 1.9.2 h4a8c4bd_1 conda-forge
markupsafe 1.1.1 py38h5406a74_3 conda-forge
matplotlib 3.3.4 py38h50d1736_0 conda-forge
matplotlib-base 3.3.4 py38h8b3ea08_0
memory-profiler 0.58.0 pypi_0 pypi
mistune 0.8.4 py38h5406a74_1003 conda-forge
nbclient 0.5.1 py_0 conda-forge
nbconvert 6.0.7 py38h50d1736_3 conda-forge
nbformat 5.1.2 pyhd8ed1ab_1 conda-forge
ncurses 6.2 h0a44026_1
nest-asyncio 1.4.3 pyhd8ed1ab_0 conda-forge
nettle 3.4.1 h3efe00b_1002 conda-forge
notebook 6.2.0 py38h50d1736_0 conda-forge
numpy 1.19.5 py38h6ced74f_1 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openh264 2.1.1 hd174df1_0 conda-forge
openssl 1.1.1l h0d85af4_0 conda-forge
opt-einsum 3.3.0 pypi_0 pypi
packaging 20.8 pyhd3deb0d_0 conda-forge
pandas 1.0.1 py38h4f17bb1_0 conda-forge
pandoc 2.11.4 h35c211d_0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
parso 0.8.1 pyhd8ed1ab_0 conda-forge
pcre 8.44 h4a8c4bd_0 conda-forge
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 8.1.2 py38h83525de_0 conda-forge
pip 21.3.1 pypi_0 pypi
prometheus_client 0.9.0 pyhd3deb0d_0 conda-forge
prompt-toolkit 3.0.11 pyha770c72_0 conda-forge
prompt_toolkit 3.0.11 hd8ed1ab_0 conda-forge
psutil 5.8.0 pypi_0 pypi
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pycosat 0.6.3 py38h1de35cc_1
pycparser 2.20 py_2
pygments 2.7.4 pyhd8ed1ab_0 conda-forge
pyopenssl 19.1.0 pyhd3eb1b0_1
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyqt 5.9.2 py38h655552a_2
pyrsistent 0.17.3 py38h5406a74_2 conda-forge
pysocks 1.7.1 py38_1
python 3.8.5 h26836e1_1
python-dateutil 2.8.1 py_0 conda-forge
python.app 2 py38_10
python_abi 3.8 1_cp38 conda-forge
pytz 2020.5 pyhd8ed1ab_0 conda-forge
pyzmq 20.0.0 py38h23ab428_1
qt 5.9.7 h8cf7e54_3 conda-forge
qtconsole 5.0.2 pyhd8ed1ab_0 conda-forge
qtpy 1.9.0 py_0 conda-forge
readline 8.0 h1de35cc_0
requests 2.24.0 py_0
ruamel_yaml 0.15.87 py38haf1e3a3_1
s3transfer 0.3.4 pypi_0 pypi
scipy 1.7.3 pypi_0 pypi
selenium 3.141.0 pypi_0 pypi
selenium-wire 3.0.5 pypi_0 pypi
send2trash 1.5.0 py_0 conda-forge
setuptools 50.3.1 py38hecd8cb5_1
sip 4.19.8 py38h0a44026_0
six 1.15.0 py38hecd8cb5_0
sqlite 3.33.0 hffcf06c_0
stem 1.8.0 pypi_0 pypi
tbselenium 0.5.3 pypi_0 pypi
terminado 0.9.2 py38h50d1736_0 conda-forge
testpath 0.4.4 py_0 conda-forge
tk 8.6.10 hb0a8c7a_0
toripchanger 1.1.4 pypi_0 pypi
tornado 6.1 py38h5406a74_1 conda-forge
tqdm 4.51.0 pyhd3eb1b0_0
traitlets 5.0.5 py_0 conda-forge
typing-extensions 4.0.1 pypi_0 pypi
urllib3 1.25.11 py_0
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
webencodings 0.5.1 py_1 conda-forge
wheel 0.35.1 pyhd3eb1b0_0
widgetsnbextension 3.5.1 py38h50d1736_4 conda-forge
x264 1!157.20191217 h1de35cc_0
xz 5.2.5 h1de35cc_0
yaml 0.2.5 haf1e3a3_0
zeromq 4.3.3 hb1e8313_3
zipp 3.4.0 py_0 conda-forge
zlib 1.2.11 h1de35cc_3
zstd 1.4.5 h41d2c2f_0

Hardware

MacBook Pro (2021)
Apple M1 Max
32 GB memory

@mwcvitkovic mwcvitkovic changed the title Snapshotting kills kernel Snapshotting kills jupyter kernel Jan 31, 2022
@FabioLuporini
Copy link
Contributor

FabioLuporini commented Jan 31, 2022

This is a segfault due to an out-of-bounds (OOB) access to psave

The fact that you'll go OOB with the given arguments should have been detected in the Python layer by Devito, and a suitable exception should have been raised, rather than letting the kernel die in that horrible way once in C-land. However we aren't unfortunately doing this kind of checks, yet. This method probably needs to be overridden in ConditionalDimension (external contributions always welcome!). There should be an open issue about this.

Anyway, explanation.

Take nsnaps=54.

You then that

In [1]: psave.shape
Out[1]: (54, 100, 100)

You will save one snapshot every factor time iterations. Also, the computation starts at time_m=1 and runs until time_M=500. For confirmation:

In [3]: op.arguments(time=num_timesteps, dt=1e-2)['time_m']
Out[3]: 1

In [4]: op.arguments(time=num_timesteps, dt=1e-2)['time_M']
Out[4]: 500

note that op.apply(time=X ...) is syntactically equivalent to op.apply(time_M=X ...), and in your case X=num_timesteps=500.

And you see that 500/factor = 500/9 = 55.55...

So before reaching the end of the computation, you'll attempt to write to e.g. psave[498 / 9][...] = psave[55] ... that is an OOB access, which causes a segfault

Hope this helps.

I think I'll add an FAQ and keep the issue open until this behaviour gets properly caught in Python

@FabioLuporini
Copy link
Contributor

Updated FAQ, see here: https://github.com/devitocodes/devito/wiki/FAQ#why-does-my-operator-kernel-die-suddenly

I'm updating this issue's title accordingly

@FabioLuporini FabioLuporini changed the title Snapshotting kills jupyter kernel ConditionalDimension leads to kernel segfault Jan 31, 2022
@mwcvitkovic
Copy link
Author

Understood - thanks!

@mwcvitkovic
Copy link
Author

Not sure your preferences for closing issues vs. leaving them open, so feel free to close it if you want. I'm good on my end.

@mloubout
Copy link
Contributor

If you are happy with the resolution, closing it yourself is fine. Glad your issue is answered.

@FabioLuporini
Copy link
Contributor

Actually, I suggest to leave it open, or open a new one that describes the underlying issue, as I thought there was one already, but I can't find it, so I guess I was wrong

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants