Memory Corruption? when trying to use external errors with ComputationGraph #4539
Comments
What workspaceMode is being used?
I did not explicitly set one. Could that already be the source of the problem?
No, it shouldn't be. What does NAN_PANIC mode say?
I'd say this is very likely workspaces related. The code is obviously from 0.9.1. I ran this on current master.
This has changed on master... now they are cleared, which results in the following error:
I'll take a look at this today.
According to the example on external error usage, I would guess that they shouldn't actually be cleared. Running it with NAN_PANIC, it panics on the backpropGradient call (still on 0.9.1):
So, another pointer to it being related to workspaces.
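For context, NAN_PANIC refers to ND4J's op-executioner profiling mode, which makes execution fail as soon as an operation produces NaNs, so the first corrupted array is reported at its source rather than later during backprop. A minimal sketch of how it is typically enabled, assuming the ND4J profiling API of the 0.9.x era:

```java
import org.nd4j.linalg.api.ops.executioner.OpExecutioner;
import org.nd4j.linalg.factory.Nd4j;

// Fail fast: throw as soon as any ND4J op produces a NaN, instead of
// letting the NaNs propagate silently into the gradients.
Nd4j.getExecutioner().setProfilingMode(OpExecutioner.ProfilingMode.NAN_PANIC);
```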
@AlexDBlack I set both workspaces to NONE and that seems to be a workaround. "Seems", because the error occurs randomly and I am not 100% sure that it's gone; maybe I just lowered the probability. Yet another pointer to the workspaces.
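Setting both workspaces to NONE is done on the network configuration. A minimal sketch of that workaround, assuming the 0.9.x-era builder methods; the graph itself (layer names, sizes, loss) is a placeholder, not the reporter's actual model:

```java
import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.WorkspaceMode;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

// Disable both the training and the inference workspace. This avoids
// workspace-scoped arrays entirely, at the cost of more GC pressure.
ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .trainingWorkspaceMode(WorkspaceMode.NONE)
        .inferenceWorkspaceMode(WorkspaceMode.NONE)
        .graphBuilder()
        .addInputs("in")  // placeholder graph, not the reporter's model
        .addLayer("dense", new DenseLayer.Builder().nIn(10).nOut(10).build(), "in")
        .addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                .activation(Activation.IDENTITY).nIn(10).nOut(1).build(), "dense")
        .setOutputs("out")
        .build();

ComputationGraph graph = new ComputationGraph(conf);
graph.init();
```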
Issue Description
I've been trying to find out why the following sometimes results in NaNs in the gradient:
https://gist.github.com/Broele/22b5f7e9bde28a8ca4b58c41ddd343e3
The example is pretty nonsensical, as in: it doesn't do anything useful other than reproduce the bug. I asked @Broele to reduce his problematic code as far as possible while still keeping the problematic behavior.
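For readers who don't open the gist: the reproduction follows DL4J's external-errors workflow, where the error signal is computed outside the network and pushed back through backpropGradient. A minimal sketch of that pattern (not the actual gist code; the helper name, target, and shapes are illustrative):

```java
import org.deeplearning4j.nn.gradient.Gradient;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.nd4j.linalg.api.ndarray.INDArray;

class ExternalErrorSketch {
    // Run a forward pass, compute an error signal outside DL4J's loss functions,
    // and push it back through backpropGradient (the external-errors workflow).
    static Gradient externalErrorStep(ComputationGraph graph, INDArray features, INDArray target) {
        INDArray output = graph.outputSingle(features);   // forward pass via the public API
        INDArray externalError = output.sub(target);      // externally computed error signal
        return graph.backpropGradient(externalError);     // gradient that intermittently contains NaNs
    }
}
```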
While stepping through the code I found one weird thing: silentOutput ignores the train flag when calling feedForward: https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/ComputationGraph.java#L1631
I found this because I realized that the inputs to each Vertex / Layer aren't reset.
Most of the time when the gradient ended up with NaNs, the DenseLayer input was also already corrupted and contained either very large numbers or NaNs. However, whatever I tried, I couldn't work out at which point the corruption happens. During the run of feedForward, neither the output nor the input is corrupted; as soon as it has finished, however, the input seems to get corrupted. My best guess is that it is somehow related to Workspaces.
I could work around the problem by adding a "public API" feedForward call before running backpropGradient. This results in valid gradients, even over many runs.
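A minimal sketch of that workaround under the same assumptions as the sketch above (feedForward and backpropGradient are the actual ComputationGraph methods being discussed; the output name "out", the train flag value, and the target are placeholders):

```java
import java.util.Map;
import org.deeplearning4j.nn.gradient.Gradient;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.nd4j.linalg.api.ndarray.INDArray;

class ExternalErrorWorkaroundSketch {
    // Same external-errors step, but with an explicit public-API feedForward
    // before backpropGradient, which the reporter found avoids the NaNs.
    static Gradient externalErrorStep(ComputationGraph graph, INDArray features, INDArray target) {
        Map<String, INDArray> activations = graph.feedForward(features, true); // re-populates each layer's input
        INDArray output = activations.get("out");       // "out" is a placeholder output-layer name
        INDArray externalError = output.sub(target);    // externally computed error signal
        return graph.backpropGradient(externalError);   // now yields valid gradients over many runs
    }
}
```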
@raver119, do you have an idea what is going on there?
Version Information
Deeplearning4j 0.9.1 and current master (per the discussion in the comments above).