Suggestion: Catch exceptions in CudaGridExecutioner#flushQueue and rethrow with help message #6493
Sorry for the confusing title, here is the story:
It seems like CudaGridExecutioner sometimes queues up operations to be executed at some later point in time. This might (and sometimes does) cause exceptions for "past faults" to be thrown when doing some later operation, making it hard to understand where the fault originated.
For instance, an exception for trying to access an element out of bounds could be thrown when calling Nd4j.create far from where the fault was made (resulting in the Nd4j.create call being blamed for something like "Op.x.length not equals to Op.y.length").
Inspecting the stacktrace might raise the user's suspicion that the error was delayed, but it seems like low-hanging fruit to just put a try-catch block around the execution of the queued op and have the catch rethrow with a clear message to the user that the exception might have been delayed, preferably with some instructions on what to do should that be the case (e.g. try with the native CPU backend).
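To illustrate the suggestion, here is a rough sketch of the kind of wrapping I have in mind. It does not use the real ND4J classes: `DeferredOpQueue`, `QueuedOp`, `enqueue` and the message text are all made up for illustration, and I am only assuming the real `flushQueue` does something similar to dequeue-then-execute.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for an executioner that defers ops; NOT the ND4J API.
class DeferredOpQueue {

    /** A queued op plus a human-readable label describing it. */
    private static final class QueuedOp {
        final String description;
        final Runnable body;

        QueuedOp(String description, Runnable body) {
            this.description = description;
            this.body = body;
        }
    }

    private final Deque<QueuedOp> queue = new ArrayDeque<>();

    /** Stores the op for later execution instead of running it right away. */
    void enqueue(String description, Runnable body) {
        queue.addLast(new QueuedOp(description, body));
    }

    /** Runs all pending ops in order; wraps a failure with a hint that it may be delayed. */
    void flushQueue() {
        while (!queue.isEmpty()) {
            QueuedOp op = queue.pollFirst();
            try {
                op.body.run();
            } catch (RuntimeException e) {
                // Keep the original exception as the cause so no information is lost.
                throw new IllegalStateException(
                        "Queued op '" + op.description + "' failed while flushing the queue. "
                        + "The fault may come from an earlier call than the one that triggered "
                        + "this flush. Try the native CPU backend to locate it.", e);
            }
        }
    }
}
```

The original exception stays attached as the cause, so the existing stacktrace is still there; the wrapper only adds the hint that the failing op may have been queued earlier.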
Here is an example which triggers 100% of the time on my setup with 1.0.0-beta2. It fails at a different place compared to what I saw in the real program, but it seems to be doing roughly the same thing (dequeuing an op and then executing it).
I haven't dug into the internals, but in case the queueing of execution is due to some low-level stuff I guess it could behave differently on another setup. I'm using Windows 10 with a GTX 980 Ti and driver version 398.36.
Btw, is the class javadoc for CudaGridExecutioner really correct? Not that I'm running production code, but is there a choice besides the CPU backend for those who do?