Bug in Hybridize w/ Concat #14062
Comments
Hey, this is the MXNet Label Bot.
@mxnet-label-bot add [bug, gluon]
@stephenrawls after taking a close look at the code, I think the problem mainly comes from the call to slice. Its doc says:
Since in the code snippet, it has However, since the code didn't check for it, NDArray actually returned result as in MXNet the dimension value 0 is used as a placeholder value for unknown size, which in symbolic mode (i.e. Gluon hybridized mode) would trigger further shape inference, which properly triggers the shape inference error. I haven't yet traced what happens in the slice kernel after it gets a size of 0, though that might need a fix too.
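To make the placeholder convention described above concrete, here is a minimal pure-Python sketch of a shape check that reads a dimension of 0 as "unknown". It is not MXNet's actual inference code: the helpers `is_known` and `infer_concat_shape` are hypothetical names invented for this illustration, and real shape inference would first try to fill unknown dimensions in from other operators rather than failing straight away.

```python
def is_known(shape):
    """Return True if no dimension uses the placeholder value 0."""
    return all(dim != 0 for dim in shape)


def infer_concat_shape(shapes, axis):
    """Hypothetical helper: infer the output shape of a concat along `axis`.

    Raises ValueError when an input still carries an unknown (0) dimension,
    or when the inputs disagree on a non-concat axis.
    """
    for shape in shapes:
        if not is_known(shape):
            raise ValueError(f"cannot infer shape: {shape} has an unknown dimension")

    out = list(shapes[0])
    for shape in shapes[1:]:
        if len(shape) != len(out):
            raise ValueError("all inputs must have the same number of dimensions")
        for i, (a, b) in enumerate(zip(out, shape)):
            if i != axis and a != b:
                raise ValueError(f"dimension {i} mismatch: {a} vs {b}")
        out[axis] += shape[axis]
    return tuple(out)
```

Under this reading, a slice that legitimately produces an empty array in eager mode looks identical to a not-yet-inferred shape in symbolic mode, which is consistent with the code working without hybridize() and failing with it.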
I see, thanks. Basically what I wanted to do is this:
In this case I think I can use
In general, it would be nice if dynamic shapes were supported and I didn't have to find "tricks" to get an array of the right shape. But this works for now. Thanks again!
Hi,
I think I found a bug in the Concat operator after calling hybridize() on my network. It is possible that this is a misunderstanding on my part about how hybridization is supposed to work, however, so please let me know if that is the case.
I have the following code (which works when `model.hybridize()` is commented out, and crashes when it is not):
The expected behavior, which occurs when I do not hybridize, is that the code prints out the following:
The behavior that occurs when I do hybridize is a crash with the following error message:
I was very careful inside my hybrid_forward function to restrict myself to dynamic shaping operators like `F.ones_like()` and `slice()`, and to the special values for `reshape()` like 0 and -1, so that my code should work with arbitrary input shapes using the Symbolic API. So it is my understanding that hybridize() should work on this code, since I didn't do anything that depended on either the ndarray API or hard-coded shape information.
Can you please let me know if this is a bug, or if I have some fundamental misunderstanding of how hybridize() is supposed to work?
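For readers unfamiliar with the special `reshape()` values mentioned above, the following is a small pure-Python sketch of how they resolve: 0 copies the dimension at the same position from the input shape, and -1 is inferred from the remaining elements. The function `resolve_reshape` is a hypothetical name used only for illustration. It covers only 0 and -1, not the other special values MXNet supports, and it is not the library's implementation.

```python
from math import prod


def resolve_reshape(in_shape, target):
    """Hypothetical helper: resolve the 0 and -1 special values of a reshape."""
    out = []
    for i, dim in enumerate(target):
        if dim == 0:
            # 0 means "copy this dimension from the input shape"
            if i >= len(in_shape):
                raise ValueError("0 used at a position the input does not have")
            out.append(in_shape[i])
        else:
            out.append(dim)

    if out.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")

    if -1 in out:
        # -1 means "whatever size keeps the total element count unchanged"
        total = prod(in_shape)
        known = prod(d for d in out if d != -1)
        if known == 0 or total % known != 0:
            raise ValueError("cannot infer the -1 dimension")
        out[out.index(-1)] = total // known
    return tuple(out)
```

For example, with an input of shape (2, 3, 4), a target of (4, 0, 2) resolves to (4, 3, 2) and a target of (6, 1, -1) resolves to (6, 1, 4), so neither requires the batch or sequence size to be hard-coded.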
(As to why I would want to use the code above ... I have a use case where it is more efficient to construct a broadcast-able mask once, and then use it many times inside a loop, versus calling SequenceMask() repeatedly in the loop, especially because SequenceMask only works when the timestep dimension is on axis 0 or axis 1, which with my data layout would mean I would need a lot of transposing back and forth inside a loop, which I would rather avoid).
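The "build the mask once, reuse it in the loop" idea can be sketched as follows. This uses NumPy rather than MXNet so that it runs on its own, and the (batch, channels, time) layout, the function name `make_length_mask`, and the sample lengths are all assumptions made for the illustration, not the code from the original post.

```python
import numpy as np


def make_length_mask(lengths, max_len):
    """Return a float mask of shape (batch, 1, max_len).

    Entry [b, 0, t] is 1.0 when t < lengths[b] and 0.0 otherwise. The
    singleton middle axis lets the mask broadcast over a channel axis.
    """
    steps = np.arange(max_len)                             # (max_len,)
    valid = steps[None, :] < np.asarray(lengths)[:, None]  # (batch, max_len)
    return valid[:, None, :].astype(np.float32)            # (batch, 1, max_len)


# Assumed layout: (batch, channels, time), with time on the last axis.
lengths = [3, 1]
mask = make_length_mask(lengths, max_len=4)
x = np.ones((2, 5, 4), dtype=np.float32)

# The mask is built once and reused on every iteration; no transposes needed.
for _ in range(3):
    x = x * mask
```

Because the mask broadcasts against the data directly, the time axis can stay where it is, which is the advantage described above over repeatedly transposing to suit SequenceMask.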