Thank you for this. Please add your name into https://github.com/dmlc/mxnet/blob/master/CONTRIBUTORS.md if you haven't. |
|
Plz think twice before submitting PR Using the low-level APIs to construct LSTM is not your original work, though you replaced |
|
@Puriney I believe I mentioned that this is "according to the implementation of rnn in python". I saw issue #1420 and rewrote the Python implementation in R, as @terrytangyuan suggested. I don't think there is any credit problem with my code. Firstly, I had not noticed @FBracun's code before you mentioned it; I implemented this example on my own. Secondly, the code in issue #837 is not complete, while my example works fine in my own testing. Thirdly, since @FBracun and I both implemented LSTM in R based on the Python RNN example, I think it is natural that the structures of our code are similar. I hope this example helps users who want to use LSTM in R. |
|
It is good for you to wrap everything up and debug codes (e.g. you used But please give appropriate credits. Still beautiful work, though. |
|
I have change Please wait. |
|
@ziyeqinghan I am testing this code. But sometimes it works, sometimes not. |
|
@thirdwing I retested and found this problem too, but I haven't found the cause yet. I wonder whether it is caused by the same problem as in issue #1618. |
|
I can't reproduce this error every time. |
|
@tqchen Do you have any idea why we got different results randomly? How can we track this down? |
|
It could be due to too large an initialization. Nothing else comes to mind immediately. One thing we can do is a pair test between the Python (or Julia) code and the R code. I would recommend looking at the Julia code, as Julia's matrix order is the same as R's. |
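For anyone doing such a pair test, here is a minimal base-R sketch of the matrix-order point. It uses no MXNet calls, and the variable names are made up for illustration. R (like Julia) stores matrices column by column, whereas NumPy arrays default to row-major order, so a flattened weight array compared across the two may need a transpose first.

```r
# Base R only; the names below are illustrative, not part of the PR.
m <- matrix(1:6, nrow = 2)    # R fills the matrix column by column

flat.col <- as.vector(m)      # column-major flattening (R / Julia order)
flat.row <- as.vector(t(m))   # row-major flattening (NumPy default order)

print(m)
print(flat.col)
print(flat.row)
```

Comparing `flat.col` from R against a row-major dump from Python without transposing would make the same matrix look different on the two sides.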
|
I will do that now. On Sun, Mar 27, 2016 at 12:51 PM, Tianqi Chen notifications@github.com
Qiang Kou |
|
I still haven't found the real cause, but this line sometimes gives a wrong result. |
|
I think I found the error. It is in these lines https://github.com/ziyeqinghan/mxnet/blob/lstm/R-package/demo/rnn/lstm.R#L244-L245
|
|
Hi, @ziyeqinghan I think I have fixed the problems in my own repo (see thirdwing@022659a). Please help to test it. If it works, you can update your PR according to it, or we can just merge that commit. |
|
@thirdwing Thank you for your correction. I have also modified the code in these lines: https://github.com/thirdwing/mxnet/blob/master/example/rnn/lstm.R#L357-L358. I think there are still some other bugs. I will also try to debug it. Thanks! |
|
I think the problem might be in https://github.com/ziyeqinghan/mxnet/blob/lstm/R-package/demo/rnn/lstm.R#L235 |
|
One reason for the divergence could simply be too large a learning rate. |
|
@tqchen I have tried some smaller learning rates and initialized weights but still find the same problem. |
|
@thirdwing I think that I have fixed the bug. Since the |
toc <- Sys.time()
cat(paste0("Iter [", iteration,
           "] Train: Time: ", toc - tic,
           " sec, NLL=", train.nll / nbatch,
In R, toc - tic produces an object of class difftime, and we can't guarantee that it will be displayed in seconds, so please change it to as.numeric(toc - tic, units="secs").
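A small base-R sketch of the issue, using fixed timestamps (chosen only for illustration) so the behaviour is reproducible. Subtracting two POSIXct times picks the units automatically, so a 150-second interval is reported in minutes unless the units are requested explicitly.

```r
# Illustrative timestamps, 2 minutes 30 seconds apart.
tic <- as.POSIXct("2016-03-27 12:00:00", tz = "UTC")
toc <- as.POSIXct("2016-03-27 12:02:30", tz = "UTC")

elapsed <- toc - tic                                  # difftime, units chosen automatically
auto.units <- units(elapsed)                          # "mins" for this interval
auto.value <- as.numeric(elapsed)                     # 2.5, which is not seconds
elapsed.secs <- as.numeric(elapsed, units = "secs")   # 150, always seconds

cat(paste0("Time: ", elapsed.secs, " sec\n"))
```

With the original `toc - tic` pasted directly into the message, this interval would be printed as 2.5 followed by the label "sec".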
|
@ziyeqinghan I think you are right. There are just a few minor changes you might need to make: (1) adjust your timing code slightly; (2) can we move this demo into the same folder as the Python one (https://github.com/dmlc/mxnet/tree/master/example/rnn)? I think that makes it easier for users to find the demos; (3) please update with the master branch. |
add LSTM in R
fix bug: the gradient of init.c and init.h should be zero
change the folder of lstm in R and modify timing code
|
@thirdwing I have changed my timing code and moved the demo into the same folder as the Python one. I have also rebased onto the master branch and combined the commits. |
|
@ziyeqinghan @thirdwing It would be better to put some reusable functions into the library and add sufficient documentation. The example probably needs some more comments so that a typical R user can follow it more easily. |
|
@ziyeqinghan @thirdwing Sorry for not bringing this up earlier, but it would be great if you could contribute a tutorial on how to use this feature, like the others at http://mxnet.dmlc.ml/en/latest/packages/r/index.html#tutorials. You are also invited to convert the tutorial into a guest blog post, which we can forward to r-blogger. |
|
Good idea. This will be done. On Fri, May 20, 2016 at 12:24 PM, Tianqi Chen notifications@github.com
Qiang Kou |

I wrote an LSTM in R using the low-level symbol interface, following the Python implementation of the RNN example. It mainly contains two files:
lstm.R: functions for building an LSTM network
char_lstm.R: a demo of how to train a character LSTM using lstm.R
Meanwhile, in order to clear the gradient arrays when the argument grad.req="add" of the function mx.simple.bind is used, I added a function mx.exec.update.grad.arrays.
The following is part of the "char LSTM example" result: