Hybridization SequenceLast Bug #13968
Comments
Hey, this is the MXNet Label Bot.
@mxnet-label-bot add [Gluon, operator]
@stephenrawls thanks for reporting the bug. If it's hard to reproduce the error, you may consider setting this environment variable:
@szha thanks for the tip. The stack trace shows that the error happens in the backward pass, at a line checking that a certain array is non-empty. This is confirmed by our workaround of detaching the problematic variable from the computation graph. (Since the variable in question shouldn't have any gradient flowing through it anyway, this does not affect our calculations and is an effective workaround.) I'll play around with MXNET_ENGINE_TYPE and see if I can get a more reproducible error.
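For readers following along, here is a minimal sketch of setting MXNET_ENGINE_TYPE from Python. The value NaiveEngine is an assumption: the thread does not show which value was suggested, but NaiveEngine is the synchronous engine commonly used for debugging, since it runs operators one at a time and makes stack traces point at the failing operator.

```python
import os

# Assumption: "NaiveEngine" is the value intended in the tip above; the
# thread itself does not show it. NaiveEngine executes operators
# synchronously, which tends to make failures easier to localize.
os.environ["MXNET_ENGINE_TYPE"] = "NaiveEngine"

# The variable has to be set before MXNet is loaded, so the import
# belongs after the assignment:
# import mxnet as mx
```

Setting the variable in the shell before launching the training script should have the same effect.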
@stephenrawls Can you provide a minimal reproducible example so that we can help nail down this issue?
Hi @stephenrawls, were you able to get a reproducible example for this error?
No, I wasn't, and our code base has moved on, so I'm closing the issue. Thanks.
Description
I am experiencing a crash related to hybridization and the SequenceLast operator. It crashes consistently for me, but others on my team haven't been able to reproduce it.
Environment info (Required)
I'm using Python
Unfortunately I don't have a minimal reproducible example yet. Every time I try, the bug does not occur in my condensed example.
The problem I have is that I am seeing the following error message:
When I print out the symbolic debug string for my network and go to "node_264" it leads me to the following section of my model code:
The things that I think are important about this are:
(1) The tags_t array is an integer sequence that we generate from ground truth data, so this SequenceLast call shouldn't be trying to flow any gradients back through it. (Note that the stack trace is in the backward pass.)
(2) The model we run is a HybridBlock, but this particular code is part of our loss function, which is not a HybridBlock. It takes the output from the model, combines it with the ground truth values from this SequenceLast operation, and calculates our loss.
(3) Everything worked fine when our model was a regular Gluon Block. It also works fine as a HybridBlock as long as we don't hybridize() it. But when I do hybridize() the model, I consistently see this crash (although other team members have yet to run into it).
Our current workaround is to add the following statement directly after the SequenceLast operation:
When this is in place I never encounter the crash I highlighted above.
Additionally, I have another, hackier workaround: not using SequenceLast, but instead using a combination of Take and diag to perform the same function. When I do that, I also never see the crash.

I'm happy to provide further info as needed. Apologies for not being able to condense this down into an actual reproducible crash.
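To illustrate why the take-and-diag combination mentioned above can stand in for SequenceLast, here is a small sketch in plain Python. Nested lists stand in for MXNet arrays, and the helper names (sequence_last, take_rows, diag) and the time-major layout are assumptions made for illustration; this is not the actual model code.

```python
def sequence_last(data, lengths):
    """Reference behaviour: for each batch item n, pick the element at its
    last valid time step, i.e. data[lengths[n] - 1][n] (time-major layout)."""
    return [data[lengths[n] - 1][n] for n in range(len(lengths))]


def take_rows(data, indices):
    """Select whole time steps (rows) by index, like a take along axis 0."""
    return [data[i] for i in indices]


def diag(matrix):
    """Main diagonal of a square matrix."""
    return [matrix[i][i] for i in range(len(matrix))]


# Hypothetical integer tag sequence: 4 time steps, batch of 3.
tags_t = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
]
lengths = [2, 4, 1]  # valid length of each sequence in the batch

# Taking row lengths[n] - 1 for every n gives an N x N matrix whose entry
# [i][j] is data[lengths[i] - 1][j]; its diagonal keeps only the entries
# where the row index and the batch index refer to the same sequence.
picked = take_rows(tags_t, [length - 1 for length in lengths])
via_take_diag = diag(picked)

print(via_take_diag == sequence_last(tags_t, lengths))
```

The intermediate matrix is N x N, so this route does more work than a direct SequenceLast, which is likely why it was described as the hackier option.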