Memory layout in the LSTM operator #15745
Comments
Hey, this is the MXNet Label Bot.
Could you take a look at the CPU implementation as well? @zixuanweeei can help you understand it from the formula.
@eloi-loomai The memory layout of weights is:
So it should be:
Ok, thanks!
I also realized that the order of gates in the planes is: |
Is there any problem with the order? The native LSTM implementation of MXNet shares the same gate order as MKL-DNN, but differs in the number of bias terms. The gate orders of their GRU implementations are different, however, which might be a concern.
Not sure, LSTM seems to work. |
Feel free to mention me directly here if you have any questions 😃. BTW, we are working on integrating the LBR-GRU of MKL-DNN into MXNet. It should be completed in the coming days.
Description
Suspicious bug in the LSTM RNN operator
Environment info (Required)
Package used (Python/R/Scala/Julia):
C++
Build info (Required if built from source)
Compiler (gcc/clang/mingw/visual studio): clang
MXNet commit hash:
24cce9e3c99e499b696b779cbb3b863145f473f1
Build config:
Error Message:
This line looks wrong:
https://github.com/apache/incubator-mxnet/blob/24cce9e3c99e499b696b779cbb3b863145f473f1/src/operator/rnn.cc#L320

```cpp
DType* bias_n = weight_iter_n + L * H * ngates * H;
```
Shouldn't it be:

```cpp
DType* bias_n = weight_iter_n + L * ngates * H;
```
Just trying to understand the memory order of the weights.