
Debugging "Exception in thread" from worker thread #827

Closed
addisonklinke opened this issue Apr 26, 2019 · 12 comments
Labels
question Further information is requested

Comments

@addisonklinke

Sometimes my pipeline object throws an exception "#### Exception in thread" from line 182 of worker_thread.h, where #### is the thread ID. No other information is provided. Based on the try-catch block in the code, this would be something other than a runtime error from the work() function. However, without additional information I am having difficulty pinning down the source of the error.

Do you have any suggestions for viewing additional details of the stack trace? I have tried gdb, but once the exception is raised it shows that the thread has been killed. The program does not respond to any signals (e.g. SIGINT, SIGTERM) except for SIGKILL.

I am running DALI installed from the wheel file for v0.8.0, so a method that does not involve modifying the C++ code and compiling from scratch would be preferable, if possible.

@JanuszL JanuszL added the question Further information is requested label Apr 26, 2019
@JanuszL
Contributor

JanuszL commented Apr 26, 2019

Hi,
You can try issuing catch throw in GDB; then you should break whenever the exception is thrown, while your worker thread is still alive.
There is no more information because only the std::runtime_error exceptions thrown by DALI have a stack trace embedded. If this is something other than that, we cannot capture the stack trace at the point where the exception is thrown (it could even come from some external library).
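For reference, a minimal GDB session along those lines might look like this (my_pipeline.py is just a placeholder for the script that builds and runs the pipeline):

gdb --args python my_pipeline.py
(gdb) catch throw          # break at the throw site, before any handler runs
(gdb) run
# ... when the catchpoint is hit:
(gdb) bt                   # backtrace of the throwing thread
(gdb) thread apply all bt  # backtraces of all threads, if needed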

@addisonklinke
Author

@JanuszL I was able to break before the worker died and captured the following stack trace

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff7805801 in __GI_abort () at abort.c:79
#2  0x00007ffff07fe3df in __gnu_cxx::__verbose_terminate_handler ()
    at /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007ffff07fcb16 in __cxxabiv1::__terminate (handler=<optimized out>)
    at /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47
#4  0x00007ffff07fcb4c in std::terminate ()
    at /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:57
#5  0x00007ffff07fcd28 in __cxxabiv1::__cxa_throw (obj=0x7fff48018a80, tinfo=0x7ffff0891ab0 <typeinfo for std::bad_alloc>, 
    dest=0x7ffff07fb7c0 <std::bad_alloc::~bad_alloc()>)
    at /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:95
#6  0x00007fff9cf438b5 in std::unique_ptr<char, dali::kernels::memory::Deleter> dali::kernels::memory::alloc_unique<char>(dali::kernels::AllocType, unsigned long) [clone .part.55] () from /home/addison/miniconda3/envs/openalpr/lib/python3.6/site-packages/nvidia/dali/libdali.so
#7  0x00007fff9d23705e in dali::kernels::ScratchpadAllocator::Reserve(dali::kernels::AllocType, unsigned long) ()
   from /home/addison/miniconda3/envs/openalpr/lib/python3.6/site-packages/nvidia/dali/libdali.so
#8  0x00007fff9d236082 in dali::ResizeBase::RunGPU(dali::TensorList<dali::GPUBackend>&, dali::TensorList<dali::GPUBackend> const&, CUstream_st*)
    () from /home/addison/miniconda3/envs/openalpr/lib/python3.6/site-packages/nvidia/dali/libdali.so
#9  0x00007fff9d3002d6 in dali::Resize<dali::GPUBackend>::RunImpl(dali::DeviceWorkspace*, int) ()
   from /home/addison/miniconda3/envs/openalpr/lib/python3.6/site-packages/nvidia/dali/libdali.so
#10 0x00007fff9d004042 in dali::Operator<dali::GPUBackend>::Run(dali::DeviceWorkspace*) ()
   from /home/addison/miniconda3/envs/openalpr/lib/python3.6/site-packages/nvidia/dali/libdali.so
#11 0x00007fff7fccbb7c in dali::BbResize<dali::GPUBackend>::RunImpl (this=0x5555abdd8e70, ws=0x5555abde4a00, idx=0)
    at /storage/projects/alpr/modules/alprdali/operators/bb_resize.cu:94
#12 0x00007fff7fcbc26a in dali::Operator<dali::GPUBackend>::Run (this=0x5555abdd8e70, ws=0x5555abde4a00)
    at /home/addison/miniconda3/envs/openalpr/lib/python3.6/site-packages/nvidia/dali/include/dali/pipeline/operators/operator.h:192
#13 0x00007fff9d0e25fd in dali::Executor<dali::AOT_WS_Policy<dali::UniformQueuePolicy>, dali::UniformQueuePolicy>::RunGPU() ()
   from /home/addison/miniconda3/envs/openalpr/lib/python3.6/site-packages/nvidia/dali/libdali.so
#14 0x00007fff9d0ebd85 in std::_Function_handler<void (), dali::AsyncPipelinedExecutor::RunGPU()::{lambda()#1}>::_M_invoke(std::_Any_data const&)
    () from /home/addison/miniconda3/envs/openalpr/lib/python3.6/site-packages/nvidia/dali/libdali.so
#15 0x00007fff9d1058f3 in dali::WorkerThread::ThreadMain(int, bool) ()
   from /home/addison/miniconda3/envs/openalpr/lib/python3.6/site-packages/nvidia/dali/libdali.so
#16 0x00007ffff0818678 in std::execute_native_thread_routine_compat (__p=<optimized out>)
    at /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/src/c++11/thread.cc:94
#17 0x00007ffff7bbd6db in start_thread (arg=0x7fff5e7a0700) at pthread_create.c:463
#18 0x00007ffff78e688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Is there a way to set CMAKE_BUILD_TYPE=Debug when using the docker build approach described in the README?

@JanuszL
Contributor

JanuszL commented Apr 26, 2019

You need to modify the Dockerfile at https://github.com/NVIDIA/DALI/blob/master/Dockerfile#L55 by adding this flag to the CMake invocation.
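For illustration only (keep whatever flags the Dockerfile already passes; the exact invocation may differ between versions):

cmake .. <existing flags> -DCMAKE_BUILD_TYPE=Debug
make -j"$(nproc)"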
It looks like you simply cannot allocate scratchpad memory for the resize operator. Can you tell us what your GPU memory utilization is? Which commit are you using as a base (we lowered the memory pressure for this scratch buffer some time ago)? How big a batch are you trying to process?
@mzient - any idea?

@addisonklinke
Author

@JanuszL Thank you for the CMake note.

My GPU memory utilization reaches a steady state around 5750/6078 MB for a batch size of 24. When trying larger batches, I used to get a CUDA out-of-memory error, but I have not seen that occur with the batch size at 24. I used the Docker build from commit 9a0eff3.

If it makes a difference, the resize operator is called from a custom operator that instantiates it using the approach suggested by @jantonguirao in this thread:

// Build a nested GPU Resize operator, forwarding the resize_x/resize_y
// argument inputs and this operator's thread/batch configuration.
std::unique_ptr<OperatorBase> resize_ptr = InstantiateOperator(
    OpSpec("Resize")
        .AddArgumentInput("resize_x", "resize_x")
        .AddArgumentInput("resize_y", "resize_y")
        .AddArg("device", "gpu")
        .AddArg("num_threads", spec_.GetArgument<int>("num_threads"))
        .AddArg("batch_size", spec_.GetArgument<int>("batch_size")));
// Run the nested operator on the current device workspace.
resize_ptr->Run(ws);
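One way to correlate the failure with GPU memory pressure is to log the free device memory just before the nested operator runs. A minimal sketch (the logging is not part of the original operator; it assumes cuda_runtime.h and iostream are available in the .cu file):

// Sketch: report free/total device memory right before the nested Resize runs.
size_t free_bytes = 0, total_bytes = 0;
if (cudaMemGetInfo(&free_bytes, &total_bytes) == cudaSuccess) {
  std::cout << "GPU memory before Resize: " << free_bytes / (1024 * 1024)
            << " MB free of " << total_bytes / (1024 * 1024) << " MB" << std::endl;
}
resize_ptr->Run(ws);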

@JanuszL
Contributor

JanuszL commented Apr 29, 2019

Hi,
Just recently we fixed a problem with a context being created on GPU 0 no matter which GPU DALI runs on: 4cbe6a5.
You can also play with the minibatch_size option of the resize operator, as sketched below. @mzient?
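For instance, the earlier OpSpec could be extended like this (a sketch only; it assumes the Resize operator in this DALI version accepts a minibatch_size argument):

// Cap how many images a single resampling kernel launch handles at a time.
std::unique_ptr<OperatorBase> resize_ptr = InstantiateOperator(
    OpSpec("Resize")
        .AddArgumentInput("resize_x", "resize_x")
        .AddArgumentInput("resize_y", "resize_y")
        .AddArg("device", "gpu")
        .AddArg("minibatch_size", 8)  // smaller than the batch size of 24
        .AddArg("num_threads", spec_.GetArgument<int>("num_threads"))
        .AddArg("batch_size", spec_.GetArgument<int>("batch_size")));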

@addisonklinke
Author

addisonklinke commented May 1, 2019

@JanuszL I recompiled from commit b72d83c (two after the one you mentioned, the most recent at the time). The std::bad_alloc still persists; see the backtrace from gdb below.

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Thread 19 "python" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff4ffff700 (LWP 15851)]

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff7805801 in __GI_abort () at abort.c:79
#2  0x00007fffefed93df in __gnu_cxx::__verbose_terminate_handler ()
    at /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007fffefed7b16 in __cxxabiv1::__terminate (handler=<optimized out>)
    at /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47
#4  0x00007fffefed7b4c in std::terminate ()
    at /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:57
#5  0x00007fffefed7d28 in __cxxabiv1::__cxa_throw (obj=0x7fff3c018b10, tinfo=0x7fffeff6cab0 <typeinfo for std::bad_alloc>, 
    dest=0x7fffefed67c0 <std::bad_alloc::~bad_alloc()>)
    at /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:95
#6  0x00007fff850c7bc6 in dali::kernels::memory::alloc_unique<char> (type=dali::kernels::AllocType::GPU, count=270622464)
    at /opt/dali/dali/kernels/alloc.h:57
#7  0x00007fff850c6a48 in dali::kernels::ScratchpadAllocator::Reserve (this=0x5555b51234c0, type=dali::kernels::AllocType::GPU, size=246020352)
    at /opt/dali/dali/kernels/scratch.h:152
#8  0x00007fff850c68a3 in dali::kernels::ScratchpadAllocator::Reserve (this=0x5555b51234c0, sizes=...) at /opt/dali/dali/kernels/scratch.h:131
#9  0x00007fff850c4ad7 in dali::ResizeBase::RunGPU (this=0x7fff3c026b20, output=..., input=..., stream=0x5555ab240200)
    at /opt/dali/dali/pipeline/operators/resize/resize_base.cc:146
#10 0x00007fff8516fcf2 in dali::Resize<dali::GPUBackend>::RunImpl (this=0x7fff3c026830, ws=0x5555ab242db0, idx=0)
    at /opt/dali/dali/pipeline/operators/resize/resize.cu:63
#11 0x00007fff84e5c3e0 in dali::Operator<dali::GPUBackend>::Run (this=0x7fff3c026830, ws=0x5555ab242db0)
    at /opt/dali/dali/pipeline/operators/operator.h:192
#12 0x00007fff775886d4 in dali::BbResize<dali::GPUBackend>::RunImpl (this=0x5555ab233860, ws=0x5555ab242db0, idx=0)
    at /storage/projects/alpr/modules/alprdali/operators/bb_resize.cu:80
#13 0x00007fff77579194 in dali::Operator<dali::GPUBackend>::Run (this=0x5555ab233860, ws=0x5555ab242db0)
    at /home/addison/miniconda3/envs/openalpr/lib/python3.6/site-packages/nvidia/dali/include/dali/pipeline/operators/operator.h:192
#14 0x00007fff84f23c4d in dali::Executor<dali::AOT_WS_Policy<dali::UniformQueuePolicy>, dali::UniformQueuePolicy>::RunGPU (this=0x5555a67bcd10)
    at /opt/dali/dali/pipeline/executor/executor.h:419
#15 0x00007fff84f16d51 in dali::AsyncPipelinedExecutor::<lambda()>::operator()(void) const (__closure=0x7fff3c01f200)
    at /opt/dali/dali/pipeline/executor/async_pipelined_executor.cc:87
#16 0x00007fff84f1729d in std::_Function_handler<void(), dali::AsyncPipelinedExecutor::RunGPU()::<lambda()> >::_M_invoke(const std::_Any_data &)
    (__functor=...) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2039
#17 0x00007fff84f75a44 in dali::WorkerThread::ThreadMain (this=0x5555a67bd710, device_id=0, set_affinity=false)
    at /opt/dali/dali/pipeline/util/worker_thread.h:173
#18 0x00007fff84f7f14e in std::_Mem_fn<void (dali::WorkerThread::*)(int, bool)>::operator()<int, bool, void> (this=0x5555a6703978, 
    __object=0x5555a67bd710) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:569
#19 0x00007fff84f7f069 in std::_Bind_simple<std::_Mem_fn<void (dali::WorkerThread::*)(int, bool)> (dali::WorkerThread*, int, bool)>::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) (this=0x5555a6703968) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:1700
#20 0x00007fff84f7eef3 in std::_Bind_simple<std::_Mem_fn<void (dali::WorkerThread::*)(int, bool)> (dali::WorkerThread*, int, bool)>::operator()()
    (this=0x5555a6703968) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:1688
#21 0x00007fff84f7ee70 in std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (dali::WorkerThread::*)(int, bool)> (dali::WorkerThread*, int, bool)> >::_M_run() (this=0x5555a6703950) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/thread:115
#22 0x00007fffefef3678 in std::execute_native_thread_routine_compat (__p=<optimized out>)
    at /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/src/c++11/thread.cc:94
#23 0x00007ffff7bbd6db in start_thread (arg=0x7fff4ffff700) at pthread_create.c:463
#24 0x00007ffff78e688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Note: Before compiling with Docker, I purposely commented out lines 181-187 of worker_thread.h so that the catch (...) block is not executed and the thrown exception is caught by gdb instead.
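For context, the pattern being bypassed is a worker loop that swallows exceptions; roughly something like the following (an illustration of the idea only, not DALI's actual worker_thread.h):

#include <functional>
#include <iostream>
#include <queue>
#include <stdexcept>

// Illustration: a worker loop that swallows unknown exceptions. With the
// catch (...) block removed, the exception escapes the thread function,
// std::terminate() is called, and gdb stops with the full backtrace from
// the throw site still intact.
void RunWorkLoop(std::queue<std::function<void()>> &work) {
  while (!work.empty()) {
    try {
      work.front()();  // run the next piece of work
    } catch (std::runtime_error &e) {
      std::cout << "Exception in thread: " << e.what() << std::endl;
    } catch (...) {
      std::cout << "Unknown exception in thread" << std::endl;
    }
    work.pop();
  }
}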

@JanuszL
Contributor

JanuszL commented May 6, 2019

@mzient - any idea how to debug further?

@mzient
Contributor

mzient commented May 6, 2019

@addisonklinke You can type catch throw in gdb to break on any exception as soon as it's thrown (before processing any catch handlers). Latest DALI shouldn't throw any exceptions in normal execution flow.
For now, you can reduce the amount of memory required by resampling by setting minibatch_size to something smaller than your batch size - or try #847, which fixes an issue with excessive peak memory consumption in PreallocatedScratchpad::Reserve.

@addisonklinke
Author

@JanuszL @mzient I have been able to avoid the exception by reducing memory usage (a lower batch size). Poking around in gdb showed the allocator trying to get more room than the GPU had available. Is there a reason this couldn't throw a more descriptive CUDA out-of-memory error? That would have made the debugging much more straightforward.

@JanuszL
Contributor

JanuszL commented May 10, 2019

@addisonklinke - you are right. @mzient - maybe we can get rid of throwing the rather generic bad_alloc from alloc.h and move the throwing to alloc.cc, where we can get a more detailed status from the CUDA allocators?
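As an illustration of that idea (a sketch only, not DALI's implementation; AllocDeviceOrThrow is a hypothetical helper name), the allocation path could surface the CUDA status and the requested size instead of a bare std::bad_alloc:

#include <cuda_runtime.h>
#include <sstream>
#include <stdexcept>

// Hypothetical helper: allocate device memory and, on failure, throw an error
// carrying the CUDA status string and the requested size.
inline void *AllocDeviceOrThrow(size_t bytes) {
  void *ptr = nullptr;
  cudaError_t err = cudaMalloc(&ptr, bytes);
  if (err != cudaSuccess) {
    std::ostringstream msg;
    msg << "GPU allocation of " << bytes << " bytes failed: "
        << cudaGetErrorString(err);
    throw std::runtime_error(msg.str());
  }
  return ptr;
}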

@mzient
Contributor

mzient commented May 10, 2019

@addisonklinke You can try the latest master nonetheless - now we free the old buffer before trying to allocate a new one, so you may be able to run with a larger batch.
@JanuszL It's on my agenda (although not immediately) - error handling needs some cleanup anyway.

@JanuszL
Contributor

JanuszL commented May 14, 2019

#867 should address this

@JanuszL JanuszL closed this as completed May 14, 2019