Debugging "Exception in thread" from worker thread #827
Comments
Hi,
@JanuszL I was able to break before the worker died and captured the following stack trace
Is there a way to set
You need to modify the Dockerfile at https://github.com/NVIDIA/DALI/blob/master/Dockerfile#L55 by adding this flag to the CMake invocation.
@JanuszL Thank you for the CMake note. My GPU memory utilization reaches a steady state of around 5750/6078 MB with a batch size of 24. With larger batches I used to get a CUDA out-of-memory error, but I have not seen that happen with a batch size of 24. I used the Docker build from commit 9a0eff3. If it makes a difference, the resize operator is called from a custom operator that instantiates it using the approach suggested by @jantonguirao in this thread.
@JanuszL I recompiled from commit b72d83c (two commits after the one you mentioned, and the most recent at the time). The
Note: Before compiling with Docker, I deliberately commented out lines 181-187 of `worker_thread.h` so that the
@mzient - any idea how to debug further?
@addisonklinke You can type
@JanuszL @mzient I have been able to avoid the exception by reducing memory usage (a lower batch size). Poking around in
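The finding that a lower batch size avoids the exception is consistent with the try-catch structure described in the issue: if the failure is an allocation error such as `std::bad_alloc`, it is not a `std::runtime_error`, so a handler that only reports `std::runtime_error` messages would fall through to a catch-all and print nothing useful. The sketch below illustrates that C++ behavior under stated assumptions: `classify_failure` is a hypothetical helper, not DALI code, and whether DALI's actual failure here is `std::bad_alloc` or some other non-`runtime_error` type is an assumption.

```cpp
#include <cassert>
#include <new>        // std::bad_alloc
#include <stdexcept>  // std::runtime_error
#include <string>

// Hypothetical stand-in for a worker's handler (NOT DALI's source): it
// mirrors a try-catch that reports std::runtime_error and swallows the rest.
template <typename Work>
std::string classify_failure(Work work) {
  try {
    work();
  } catch (const std::runtime_error &e) {
    // Only this branch sees a message.
    return std::string("runtime_error: ") + e.what();
  } catch (...) {
    // std::bad_alloc derives from std::exception, not std::runtime_error,
    // so it lands here with its type and message discarded.
    return "unknown exception";
  }
  return "ok";
}
```

For example, `classify_failure([] { throw std::bad_alloc(); })` yields `"unknown exception"`, while a thrown `std::runtime_error("boom")` yields `"runtime_error: boom"`.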
@addisonklinke - you are right. @mzient - maybe we can get rid of throwing quite generic
@addisonklinke You can try latest
#867 should address this
Sometimes my pipeline object throws the exception `#### Exception in thread` from line 182 of `worker_thread.h`, where `####` is the thread ID. No other information is provided. Based on the try-catch block in the code, this would be something other than a runtime error from the `work()` function. However, without additional information I am having difficulty pinning down the source of the error.

Do you have any suggestions for viewing additional details of the stack trace? I have tried `gdb`, but once the exception is raised, it shows that the thread has been killed. The program does not respond to any signals (e.g. `SIGINT`, `SIGTERM`) except for `SIGKILL`.

I am running DALI installed from the wheel file for v0.8.0, so a method that does not involve modifying the C++ code and compiling from scratch would be preferable, if possible.
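A standard C++ way to keep the details that a bare `catch (...)` discards is to capture the exception as a `std::exception_ptr` on the worker thread and rethrow it on a thread that can inspect it. The following is a minimal sketch of that technique only: `run_in_worker` and `describe` are hypothetical names, this is not how DALI v0.8.0 handles worker errors, and applying it would mean rebuilding from source, which the question hoped to avoid.

```cpp
#include <cassert>
#include <exception>  // std::exception_ptr, std::current_exception, std::rethrow_exception
#include <new>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical worker: runs `work` on a separate thread and, instead of
// printing a generic message, stores whatever was thrown.
template <typename Work>
std::exception_ptr run_in_worker(Work work) {
  std::exception_ptr error;  // stays null if work() succeeds
  std::thread worker([&] {
    try {
      work();
    } catch (...) {
      error = std::current_exception();  // captures any type, not just runtime_error
    }
  });
  worker.join();  // join() synchronizes, so reading `error` afterwards is safe
  return error;
}

// Rethrows a captured exception on the calling thread and reports it;
// std::exception subclasses keep their what() text.
std::string describe(std::exception_ptr error) {
  if (!error) return "no error";
  try {
    std::rethrow_exception(error);
  } catch (const std::exception &e) {
    return e.what();
  } catch (...) {
    return "non-std exception";
  }
}
```

Because the original exception object survives the thread boundary, a debugger breakpoint or log on the calling side sees the real type and message rather than only the thread ID.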