This repository was archived by the owner on Nov 17, 2023. It is now read-only.

add LSTM in R #1673

Merged
thirdwing merged 1 commit into apache:master from ziyeqinghan:lstm
Apr 3, 2016

Conversation

@ziyeqinghan
Contributor

I wrote an LSTM in R using the low-level symbol interface, following the implementation of the RNN example in Python. It mainly contains two files:

  • lstm.R: functions for building an LSTM network
  • char_lstm.R: a demo of how to train a character-level LSTM using lstm.R

Meanwhile, in order to clear the gradient arrays when the argument grad.req="add" is passed to the function mx.simple.bind, I added a function mx.exec.update.grad.arrays.
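For readers unfamiliar with the "add" write mode: with grad.req="add", each backward pass accumulates into the gradient buffer instead of overwriting it, so the buffer must be cleared explicitly between parameter updates. A minimal Python sketch of that accumulate-then-clear pattern (the function names here are illustrative, not MXNet's API):

```python
# With grad.req="add" semantics, each backward pass adds into the
# gradient buffer instead of overwriting it, so stale gradients
# accumulate unless the buffer is cleared before the next update.
grad_buffer = [0.0, 0.0, 0.0]
weights = [1.0, 1.0, 1.0]

def backward(grads):
    for i, g in enumerate(grads):
        grad_buffer[i] += g          # "add" write mode: accumulate

def update_and_clear(lr=0.1):
    for i in range(len(weights)):
        weights[i] -= lr * grad_buffer[i]
        grad_buffer[i] = 0.0         # the role of the new helper: clear after update

backward([1.0, 2.0, 3.0])
backward([1.0, 2.0, 3.0])            # buffer is now [2.0, 4.0, 6.0]
update_and_clear()                   # weights become approximately [0.8, 0.6, 0.4]
```

Without the clearing step, the second epoch's update would apply the first epoch's gradients a second time.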

The following is part of the "char LSTM example" result:

Epoch [31] Train: NLL=3.836905475197, Perp=46.3817227414766
Epoch [62] Train: NLL=3.60287136350845, Perp=36.7034722942903
...
Epoch [961] Train: NLL=3.23440136826571, Perp=25.3911672698533
Iter [1] Train: Time: 3.10378745794296 sec, NLL=3.22799701509071, Perp=25.2290728760392
Iter [1] Val: NLL=2.88174404985709, Perp=17.8453692688881
...
Epoch [20553] Train: NLL=1.44098787073839, Perp=4.22486737903258
Iter [21] Train: Time: 3.150696794192 sec, NLL=1.44111800900158, Perp=4.22541723171315
Iter [21] Val: NLL=1.55106124501652, Perp=4.71647286090669
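As a side note for anyone reading these logs, the two reported numbers are redundant: perplexity is the exponential of the per-symbol negative log-likelihood (natural log), which the first training line above confirms:

```python
import math

def perplexity(nll):
    # Perplexity is exp of the average negative log-likelihood (natural log).
    return math.exp(nll)

# Check against the first training line of the log above:
# NLL=3.836905475197 corresponds to Perp=46.3817...
print(round(perplexity(3.836905475197), 4))  # -> 46.3817
```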

@thirdwing
Contributor

Thank you for this. Please add your name to https://github.com/dmlc/mxnet/blob/master/CONTRIBUTORS.md if you haven't.

@Puriney
Contributor

Puriney commented Mar 20, 2016

Please think twice before submitting a PR.

Using the low-level APIs to construct an LSTM is not your original work, even though you replaced _ with .. As far back as Dec 5, 2015, MXNet user @FBracun kindly shared his/her code on an issue (see here). Meanwhile, the code in your PR from line here to line here (or, equivalently, the code in your branch, see here) is highly duplicated without giving credit to @FBracun.

@thirdwing @tqchen

[screenshot: screen shot 2016-03-20 at 12 00 47 pm]

@ziyeqinghan ziyeqinghan reopened this Mar 20, 2016
@ziyeqinghan
Contributor Author

@Puriney I think I mentioned that it was written "according to the implementation of rnn in python". I saw issue #1420 and rewrote the Python implementation in R, as @terrytangyuan suggested.

I don't think there are any credit problems with my code.

Firstly, I did not notice @FBracun's code before you mentioned it. I implemented this example on my own.

Secondly, the code on issue #837 is not complete, while my example works fine in my own testing.

Thirdly, since @FBracun and I both implemented LSTM in R according to the Python RNN example, I think it is natural that the structures of our code are similar.

I hope this example will be of some help to users who want to use LSTM in R.

@thirdwing @tqchen

@Puriney
Contributor

Puriney commented Mar 20, 2016

It is good that you wrapped everything up and debugged the code (e.g. you used mxnet:::mx.varg.symbol.Concat rather than the default mx.symbol.Concat; see issue #1578).

But please give appropriate credits.

Still beautiful work, though.

@thirdwing
Contributor

I have changed the mx.symbol.Concat API in #1682.

Please wait.

@thirdwing
Contributor

@ziyeqinghan I am testing this code.

But sometimes it works and sometimes it doesn't.

Training with train.shape=(32,31360) 
Training with val.shape=(32,3456) 
Epoch [31] Train: NLL=NaN, Perp=NaN

@ziyeqinghan
Contributor Author

@thirdwing I retested and found this problem too, but I haven't found the cause yet. I wonder whether it is caused by the same problem as in issue #1618.

@thirdwing
Contributor

I can't reproduce this error every time.

@thirdwing
Contributor

@tqchen Do you have any idea why we got different results randomly? How can we track this down?

@tqchen
Member

tqchen commented Mar 27, 2016

It could be due to too large an initialization. Nothing jumps to mind immediately. One thing we can do is a pair test between the Python (or Julia) code and the R code.

I would recommend looking at the Julia code, as Julia's matrix order is the same as R's.

@thirdwing
Contributor

I will do that now.


@thirdwing
Contributor

Although I still haven't found the root cause, this line sometimes produces a wrong result.

@thirdwing
Contributor

I think I found the error. It is in these lines https://github.com/ziyeqinghan/mxnet/blob/lstm/R-package/demo/rnn/lstm.R#L244-L245

init.states[[paste0("l", i, ".init.c")]] <- m$rnn.exec$arg.arrays[[paste0("l", i, ".last.c_output")]]
init.states[[paste0("l", i, ".init.h")]] <- m$rnn.exec$arg.arrays[[paste0("l", i, ".last.h_output")]]

l1.last.c_output is in m$rnn.exec$outputs, not in m$rnn.exec$arg.arrays
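To make the fix concrete for readers without the source open: in an MXNet executor, the *_output nodes live in the executor's outputs, while arg.arrays holds only the bound inputs and parameters, so indexing arg.arrays by an output name silently yields the wrong buffer. A toy Python sketch of the intended state carry-over (the ToyExecutor class and names here are illustrative, not MXNet's API):

```python
class ToyExecutor:
    """Mimics the split between arg_arrays (bound inputs/params)
    and outputs (values produced by the forward pass)."""
    def __init__(self):
        self.arg_arrays = {"l1.init.c": 0.0, "l1.init.h": 0.0}
        self.outputs = {}

    def forward(self):
        # Toy recurrence: the "last" states are produced as outputs.
        self.outputs["l1.last.c_output"] = self.arg_arrays["l1.init.c"] + 1.0
        self.outputs["l1.last.h_output"] = self.arg_arrays["l1.init.h"] + 2.0

exe = ToyExecutor()
for _ in range(3):
    exe.forward()
    # The fix: carry the states over from outputs, not from arg_arrays,
    # where a lookup by an output name would hit a missing or stale buffer.
    exe.arg_arrays["l1.init.c"] = exe.outputs["l1.last.c_output"]
    exe.arg_arrays["l1.init.h"] = exe.outputs["l1.last.h_output"]
```

Reading the states from the wrong container would leave init.c and init.h frozen at their initial values, which is consistent with the intermittent NaN results seen above.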

@thirdwing
Contributor

Hi, @ziyeqinghan

I think I have fixed the problems in my own repo (see thirdwing@022659a).

Please help test it. If it works, you can update your PR accordingly, or we can just merge that commit.

@ziyeqinghan
Contributor Author

@thirdwing Thank you for your correction.

I have also modified the code in these lines https://github.com/thirdwing/mxnet/blob/master/example/rnn/lstm.R#L357-L358 and retested it, but I still see the same problem.

I think there are still some other bugs. I will also try to debug it. Thanks!


@tqchen
Member

tqchen commented Apr 1, 2016

One reason for divergence could simply be too large a learning rate.

@ziyeqinghan
Contributor Author

@tqchen I have tried some smaller learning rates and initial weights but still see the same problem.

@ziyeqinghan
Contributor Author

@thirdwing I think I have fixed the bug. Since init.c and init.h should be treated as input data, their gradients need to be zero, so I set their gradients to zero in these lines.
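That reasoning matches the usual truncated-BPTT convention: init.c and init.h are state inputs rather than parameters, so the gradient that flows into them must never be applied as an update. A small Python sketch of that rule, under illustrative names:

```python
# Parameters are updated from their gradients; state inputs are not.
arrays = {"weight": 1.0, "l1.init.c": 0.0, "l1.init.h": 0.0}
grads = {name: 0.5 for name in arrays}   # pretend backward filled every slot

STATE_INPUTS = {"l1.init.c", "l1.init.h"}
lr = 0.1
for name in grads:
    if name in STATE_INPUTS:
        grads[name] = 0.0                 # treat the state as data: no update
    else:
        arrays[name] -= lr * grads[name]  # ordinary SGD step on parameters
```

If the state slots were updated like parameters, the carried-over hidden state would drift away from what the forward pass actually produced, which can destabilize training.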

toc <- Sys.time()
cat(paste0("Iter [", iteration,
"] Train: Time: ", toc - tic,
" sec, NLL=", train.nll / nbatch,
Contributor

In R, toc - tic produces an object of class difftime, but we can't guarantee that it will be displayed in units of seconds, so please change it to as.numeric(toc - tic, units = "secs").

@thirdwing
Contributor

@ziyeqinghan I think you are right. Just some minor changes you might need to make:

(1) change your timing code a little;

(2) Can we move this demo into the same folder as the Python one (https://github.com/dmlc/mxnet/tree/master/example/rnn)? I think that makes it easier for users to find the demos;

(3) please update against the master branch.

add LSTM in R

fix bug: the gradient of init.c and init.h should be zero

change the folder of lstm in R and modify timing code
@ziyeqinghan
Contributor Author

@thirdwing I have changed my timing code and moved the demo into the same folder as the Python one. I have also rebased onto the master branch and squashed the commits.

@thirdwing thirdwing merged commit a12ead8 into apache:master Apr 3, 2016
@terrytangyuan
Member

@ziyeqinghan @thirdwing It would be better to put some reusable functions into the library and add enough documentation. The example probably needs some more comments so that a typical R user can follow it more easily.

@tqchen
Member

tqchen commented May 20, 2016

@ziyeqinghan @thirdwing Sorry for not bringing this up up front, but it would be great if you could contribute a tutorial on how to use this feature, like the others at http://mxnet.dmlc.ml/en/latest/packages/r/index.html#tutorials

You are also invited to convert the tutorial into a guest blog post, which we can forward to R-bloggers.

@thirdwing
Contributor

Good idea. This will be done.

