Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Low-level profiler issue on OSX #2608

Open
pentschev opened this issue Apr 11, 2019 · 8 comments
Open

Low-level profiler issue on OSX #2608

pentschev opened this issue Apr 11, 2019 · 8 comments

Comments

@pentschev
Copy link
Member

pentschev commented Apr 11, 2019

@mrocklin encountered the following issue on OSX when attempting to use the low-level profiler:

In [1]: from dask.distributed import Client

In [2]: client = Client(low_level_profiler=True)

In [3]: import dask.array as da

In [4]: x = da.random.random((10000, 10000), chunks=(1000, 1000))

In [5]: (x + x.T - x.mean(axis=0)).sum().compute()
libunwind: malformed __unwind_info at 0x116CF648C bad second level page  # <--- here
Out[5]: 50003173.58741013

The error message comes from LLVM's libunwind, this is possible to fail also on Linux if we attempt using LLVM.

It's not clear at the moment if this is a libunwind bug, stacktrace or from Dask distributed.

@mrocklin
Copy link
Member

@sklam do you have any thoughts on how best to proceed with this?

@sklam
Copy link

sklam commented Apr 11, 2019

I believe the error comes from https://github.com/llvm/llvm-project/blob/release/4.x/libunwind/src/UnwindCursor.hpp#L1121 and it should be harmless. We can check if there's a way to turn off the debug log from libunwind. It's also possible to try newer libunwind version. Currently, we are getting libunwind=4.0.1 from conda's defaults.

In additional, stacktrace is guarding any segmentation fault from libunwind due to misinterpretion of the unwind info. See https://github.com/numba/stacktrace/blob/e472bb124f266a64568326756033f23b5a6205b6/stacktrace/bt.c#L166-L167. So, even if libunwind fails to read the unwind info, stacktrace will not crash. It will just return truncated stack info.

@sklam
Copy link

sklam commented Apr 11, 2019

I cannot replicate the log message. Can you share your conda-environment listing? conda list --export --explicit

@pentschev
Copy link
Member Author

Sure, here's mine:

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: osx-64
@EXPLICIT
https://repo.anaconda.com/pkgs/main/osx-64/blas-1.0-mkl.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/bzip2-1.0.6-h1de35cc_5.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/ca-certificates-2019.1.23-0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/intel-openmp-2019.3-199.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/jpeg-9b-he5867d9_2.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/libcxxabi-4.0.1-hcfea43d_1.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/libgfortran-3.0.1-h93005f0_2.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/xz-5.2.4-h1de35cc_4.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/yaml-0.1.7-hc338f04_2.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/zlib-1.2.11-h1de35cc_3.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/libcxx-4.0.1-hcfea43d_1.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/libpng-1.6.36-ha441bb4_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/mkl-2019.3-199.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/openssl-1.1.1b-h1de35cc_1.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/tk-8.6.8-ha441bb4_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/zstd-1.3.7-h5bba6e5_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/expat-2.2.6-h0a44026_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/freetype-2.9.1-hb4e5f40_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/libffi-3.2.1-h475c297_4.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/libssh2-1.8.0-ha12b0ac_4.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/libtiff-4.0.10-hcb84e12_2.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/libunwind-4.0.1-hcfea43d_1.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/ncurses-6.1-h0a44026_1.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/rhash-1.3.8-ha12b0ac_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/libedit-3.1.20181209-hb402a30_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/readline-7.0-h1de35cc_5.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/krb5-1.16.1-hddcf347_7.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/sqlite-3.27.2-ha441bb4_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/libcurl-7.64.0-h051b688_2.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/python-3.7.3-h359304d_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/atomicwrites-1.3.0-py37_1.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/attrs-19.1.0-py37_1.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/certifi-2019.3.9-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/click-7.0-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/cloudpickle-0.8.0-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/cmake-3.14.0-haff7e42_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/dask-core-1.1.5-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/heapdict-1.0.0-py37_2.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/llvmlite-0.28.0-py37h8c7ce04_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/locket-0.2.0-py37_1.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/markupsafe-1.1.1-py37h1de35cc_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/more-itertools-6.0.0-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/msgpack-python-0.6.1-py37h04f5b5a_1.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/olefile-0.46-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/pluggy-0.9.0-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/psutil-5.6.1-py37h1de35cc_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/py-1.8.0-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/pyparsing-2.3.1-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/pytz-2018.9-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/pyyaml-5.1-py37h1de35cc_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/six-1.12.0-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/sortedcontainers-2.1.0-py37_0.tar.bz2
https://conda.anaconda.org/numba/osx-64/stacktrace-0.1dev0-py37h1de35cc_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/tblib-1.3.2-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/toolz-0.9.0-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/tornado-6.0.2-py37h1de35cc_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/cytoolz-0.9.0.1-py37h1de35cc_1.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/packaging-19.0-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/partd-0.3.10-py37_1.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/pillow-5.4.1-py37hb68e598_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/python-dateutil-2.8.0-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/setuptools-40.8.0-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/zict-0.1.4-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/distributed-1.26.1-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/jinja2-2.10-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/pytest-4.3.1-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/wheel-0.33.1-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/pip-19.0.3-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/bokeh-1.1.0-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/mkl_fft-1.0.10-py37h5e564d8_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/mkl_random-1.0.2-py37h27c97d8_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/numba-0.43.1-py37h6440ff4_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/numpy-1.16.2-py37hacdab7b_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/numpy-base-1.16.2-py37h6575580_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/pandas-0.24.2-py37h0a44026_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/dask-1.1.5-py_0.tar.bz2

@pentschev
Copy link
Member Author

I also noticed that this issue is not deterministic, it's more likely to get it after a few (~5) iterations.

This issue may be harmless as you said @sklam, but it always freezes for me after some iterations of the (x + x.T - x.mean(axis=0)).sum().compute() line from @mrocklin's example.

@pentschev
Copy link
Member Author

I've just compiled libunwind from llvm's master branch and then installed stacktrace linking to that, but I still get the same issues:

libunwind: malformed __unwind_info at 0x1045ABE88 bad second level page
49994648.18507994
libunwind: malformed __unwind_info at 0x7FFF78135240 bad second level page
49994648.18507994
libunwind: malformed __unwind_info at 0x7FFF78135240 bad second level page
49994648.18507994
49994648.18507994
49994648.18507994
49994648.18507994
49994648.18507994
49994648.18507994
49994648.18507994
49994648.18507994
49994648.18507994
libunwind: malformed __unwind_info at 0x1045ABE88 bad second level page
49994648.18507994
49994648.18507994
49994648.18507994
libunwind: malformed __unwind_info at 0x7FFF78135240 bad second level page

The process got stuck at this point, it was supposed to run 100 iterations, but froze after about 15.

I decided to attach lldb to the process, this is what I see immediately:

(lldb) process attach --pid 6843



_bt.cpython-37m-darwin.so was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 6843 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x0000000114b54ab0 _bt.cpython-37m-darwin.so`backtrace_thread [inlined] _wait_and_reset_signal at bt.c:182:35 [opt]
   179 	static
   180 	void _wait_and_reset_signal(struct sigaction *old_sa) {
   181 	  // spin and wait.
-> 182 	  while (TheSignalConfig.spinlock == 0);
   183 	  // reset signal handlers
   184 	  sigaction(SIG_STACKTRACE, old_sa, NULL);
   185 	}
Target 0: (python) stopped.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x0000000114b54ab0 _bt.cpython-37m-darwin.so`backtrace_thread [inlined] _wait_and_reset_signal at bt.c:182:35 [opt]
    frame #1: 0x0000000114b54aab _bt.cpython-37m-darwin.so`backtrace_thread(tid=0x00007000054dd000, cb=(0x0000000103022fc0), maxdepth=100) at bt.c:215 [opt]
    frame #2: 0x000000010301e884 libffi.6.dylib`ffi_call_unix64 + 76
    frame #3: 0x000000010301de8b libffi.6.dylib`ffi_call + 939
    frame #4: 0x0000000103002c48 _ctypes.cpython-37m-darwin.so`_ctypes_callproc + 632
    frame #5: 0x0000000102ffce9c _ctypes.cpython-37m-darwin.so`PyCFuncPtr_call + 268
    frame #6: 0x0000000101cabae3 python`_PyObject_FastCallKeywords + 179
    frame #7: 0x0000000101de8ed5 python`call_function + 453
    frame #8: 0x0000000101de6aec python`_PyEval_EvalFrameDefault + 46092
    frame #9: 0x0000000101dda49e python`_PyEval_EvalCodeWithName + 414
    frame #10: 0x0000000101cabfe3 python`_PyFunction_FastCallKeywords + 195
    frame #11: 0x0000000101de8dc7 python`call_function + 183
    frame #12: 0x0000000101de6be0 python`_PyEval_EvalFrameDefault + 46336
    frame #13: 0x0000000101cab8d5 python`function_code_fastcall + 117
    frame #14: 0x0000000101de8dc7 python`call_function + 183
    frame #15: 0x0000000101de6aec python`_PyEval_EvalFrameDefault + 46092
    frame #16: 0x0000000101dda49e python`_PyEval_EvalCodeWithName + 414
    frame #17: 0x0000000101cabfe3 python`_PyFunction_FastCallKeywords + 195
    frame #18: 0x0000000101de8dc7 python`call_function + 183
    frame #19: 0x0000000101de6b27 python`_PyEval_EvalFrameDefault + 46151
    frame #20: 0x0000000101dda49e python`_PyEval_EvalCodeWithName + 414
    frame #21: 0x0000000101cabfe3 python`_PyFunction_FastCallKeywords + 195
    frame #22: 0x0000000101de8dc7 python`call_function + 183
    frame #23: 0x0000000101de6aec python`_PyEval_EvalFrameDefault + 46092
    frame #24: 0x0000000101cab8d5 python`function_code_fastcall + 117
    frame #25: 0x0000000101caece2 python`method_call + 130
    frame #26: 0x0000000101cac752 python`PyObject_Call + 130
    frame #27: 0x0000000101ea937d python`partial_call + 285
    frame #28: 0x0000000101cabae3 python`_PyObject_FastCallKeywords + 179
    frame #29: 0x0000000101de8ed5 python`call_function + 453
    frame #30: 0x0000000101de6b27 python`_PyEval_EvalFrameDefault + 46151
    frame #31: 0x0000000101cab8d5 python`function_code_fastcall + 117
    frame #32: 0x0000000101caece2 python`method_call + 130
    frame #33: 0x0000000101cabae3 python`_PyObject_FastCallKeywords + 179
    frame #34: 0x0000000101e353aa python`context_run + 106
    frame #35: 0x0000000101cab27f python`_PyMethodDef_RawFastCallDict + 335
    frame #36: 0x0000000101cac8fd python`PyCFunction_Call + 61
    frame #37: 0x0000000101de6cd7 python`_PyEval_EvalFrameDefault + 46583
    frame #38: 0x0000000101cab8d5 python`function_code_fastcall + 117
    frame #39: 0x0000000101de8dc7 python`call_function + 183
    frame #40: 0x0000000101de6a56 python`_PyEval_EvalFrameDefault + 45942
    frame #41: 0x0000000101cab8d5 python`function_code_fastcall + 117
    frame #42: 0x0000000101de8dc7 python`call_function + 183
    frame #43: 0x0000000101de6a56 python`_PyEval_EvalFrameDefault + 45942
    frame #44: 0x0000000101cab8d5 python`function_code_fastcall + 117
    frame #45: 0x0000000101de8dc7 python`call_function + 183
    frame #46: 0x0000000101de6a56 python`_PyEval_EvalFrameDefault + 45942
    frame #47: 0x0000000101cab8d5 python`function_code_fastcall + 117
    frame #48: 0x0000000101de8dc7 python`call_function + 183
    frame #49: 0x0000000101de6a56 python`_PyEval_EvalFrameDefault + 45942
    frame #50: 0x0000000101dda49e python`_PyEval_EvalCodeWithName + 414
    frame #51: 0x0000000101cabfe3 python`_PyFunction_FastCallKeywords + 195
    frame #52: 0x0000000101de8dc7 python`call_function + 183
    frame #53: 0x0000000101de6a56 python`_PyEval_EvalFrameDefault + 45942
    frame #54: 0x0000000101dda49e python`_PyEval_EvalCodeWithName + 414
    frame #55: 0x0000000101caade7 python`_PyFunction_FastCallDict + 231
    frame #56: 0x0000000101caece2 python`method_call + 130
    frame #57: 0x0000000101cac752 python`PyObject_Call + 130
    frame #58: 0x0000000101de6d58 python`_PyEval_EvalFrameDefault + 46712
    frame #59: 0x0000000101cab8d5 python`function_code_fastcall + 117
    frame #60: 0x0000000101caece2 python`method_call + 130
    frame #61: 0x0000000101cac752 python`PyObject_Call + 130
    frame #62: 0x0000000101de6d58 python`_PyEval_EvalFrameDefault + 46712
    frame #63: 0x0000000101cab8d5 python`function_code_fastcall + 117
    frame #64: 0x0000000101de8dc7 python`call_function + 183
    frame #65: 0x0000000101de6a56 python`_PyEval_EvalFrameDefault + 45942
    frame #66: 0x0000000101cab8d5 python`function_code_fastcall + 117
    frame #67: 0x0000000101de8dc7 python`call_function + 183
    frame #68: 0x0000000101de6a56 python`_PyEval_EvalFrameDefault + 45942
    frame #69: 0x0000000101cab8d5 python`function_code_fastcall + 117
    frame #70: 0x0000000101de8dc7 python`call_function + 183
    frame #71: 0x0000000101de6aec python`_PyEval_EvalFrameDefault + 46092
    frame #72: 0x0000000101cab8d5 python`function_code_fastcall + 117
    frame #73: 0x0000000101de8dc7 python`call_function + 183
    frame #74: 0x0000000101de6b27 python`_PyEval_EvalFrameDefault + 46151
    frame #75: 0x0000000101dda49e python`_PyEval_EvalCodeWithName + 414
    frame #76: 0x0000000101caade7 python`_PyFunction_FastCallDict + 231
    frame #77: 0x0000000101de6d58 python`_PyEval_EvalFrameDefault + 46712
    frame #78: 0x0000000101dda49e python`_PyEval_EvalCodeWithName + 414
    frame #79: 0x0000000101e40699 python`PyRun_StringFlags + 185
    frame #80: 0x0000000101e4052f python`PyRun_SimpleStringFlags + 143
    frame #81: 0x0000000101e6a2c8 python`pymain_main + 6984
    frame #82: 0x0000000101c7e66d python`main + 125
    frame #83: 0x00007fff77f4c3d5 libdyld.dylib`start + 1
    frame #84: 0x00007fff77f4c3d5 libdyld.dylib`start + 1

@sklam
Copy link

sklam commented Apr 11, 2019

If it is freezing, it is bad.

Were there other threads running. The mainthread is waiting (spinlocked) for the "profiled" thread to finish.

@pentschev
Copy link
Member Author

Yes, there are other threads. Please see https://gist.github.com/pentschev/0e13fa1fef01565c1c63101db1e22d68. I think the only threads running some stacktrace code at that moment are threads 1, 4 and 6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants