This repository was archived by the owner on Nov 17, 2023. It is now read-only.

add LSTM in R #1673

Merged
thirdwing merged 1 commit into apache:master from ziyeqinghan:lstm
Apr 3, 2016

Conversation

@ziyeqinghan
Contributor

I wrote an LSTM in R using the low-level symbol interface, following the implementation of the RNN example in Python. It mainly contains two files:

  • lstm.R: functions for building an LSTM network
  • char_lstm.R: a demo of how to train a character-level LSTM using lstm.R

Meanwhile, in order to clear the gradient arrays when the argument grad.req="add" is passed to the function mx.simple.bind, I added a function mx.exec.update.grad.arrays.
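For readers unfamiliar with the "add" write mode: with grad.req="add", each backward pass accumulates into the gradient buffer instead of overwriting it, so the buffer must be cleared explicitly between parameter updates. A minimal Python sketch of that accumulate-then-clear pattern (the function names here are illustrative, not MXNet's API):

```python
# With grad.req="add" semantics, each backward pass adds into the
# gradient buffer instead of overwriting it, so stale gradients
# accumulate unless the buffer is cleared before the next update.
grad_buffer = [0.0, 0.0, 0.0]
weights = [1.0, 1.0, 1.0]

def backward(grads):
    for i, g in enumerate(grads):
        grad_buffer[i] += g          # "add" write mode: accumulate

def update_and_clear(lr=0.1):
    for i in range(len(weights)):
        weights[i] -= lr * grad_buffer[i]
        grad_buffer[i] = 0.0         # the role of the new helper: clear after update

backward([1.0, 2.0, 3.0])
backward([1.0, 2.0, 3.0])            # buffer is now [2.0, 4.0, 6.0]
update_and_clear()                   # weights become approximately [0.8, 0.6, 0.4]
```

Without the clearing step, the second epoch's update would apply the first epoch's gradients a second time.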

The following is part of the "char LSTM example" result:

Epoch [31] Train: NLL=3.836905475197, Perp=46.3817227414766
Epoch [62] Train: NLL=3.60287136350845, Perp=36.7034722942903
...
Epoch [961] Train: NLL=3.23440136826571, Perp=25.3911672698533
Iter [1] Train: Time: 3.10378745794296 sec, NLL=3.22799701509071, Perp=25.2290728760392
Iter [1] Val: NLL=2.88174404985709, Perp=17.8453692688881
...
Epoch [20553] Train: NLL=1.44098787073839, Perp=4.22486737903258
Iter [21] Train: Time: 3.150696794192 sec, NLL=1.44111800900158, Perp=4.22541723171315
Iter [21] Val: NLL=1.55106124501652, Perp=4.71647286090669
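As a side note for anyone reading these logs, the two reported numbers are redundant: perplexity is the exponential of the per-symbol negative log-likelihood (natural log), which the first training line above confirms:

```python
import math

def perplexity(nll):
    # Perplexity is exp of the average negative log-likelihood (natural log).
    return math.exp(nll)

# Check against the first training line of the log above:
# NLL=3.836905475197 corresponds to Perp=46.3817...
print(round(perplexity(3.836905475197), 4))  # -> 46.3817
```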

@thirdwing
Contributor

Thank you for this. Please add your name to https://github.com/dmlc/mxnet/blob/master/CONTRIBUTORS.md if you haven't.

@Puriney
Contributor

Puriney commented Mar 20, 2016

Please think twice before submitting a PR.

Using the low-level APIs to construct an LSTM is not your original work, even though you replaced _ with .. As far back as Dec 5, 2015, MXNet user @FBracun kindly shared his/her code on an issue (see here). Meanwhile, the code in your PR from line here to line here (or, equivalently, the code in your branch, see here) is highly duplicated without giving credit to @FBracun.

@thirdwing @tqchen

[screenshot: screen shot 2016-03-20 at 12 00 47 pm]

@ziyeqinghan ziyeqinghan reopened this Mar 20, 2016
@ziyeqinghan
Contributor Author

@Puriney I think I mentioned that it was written "according to the implementation of rnn in python". I saw issue #1420 and rewrote the Python implementation in R, as @terrytangyuan suggested.

I don't think there are any credit problems with my code.

Firstly, I did not notice @FBracun's code before you mentioned it. I implemented this example on my own.

Secondly, the code on issue #837 is not complete, while my example works fine in my own testing.

Thirdly, since @FBracun and I both implemented LSTM in R according to the Python RNN example, I think it is natural that the structures of our code are similar.

I hope this example will be of some help to users who want to use LSTM in R.

@thirdwing @tqchen

@Puriney
Contributor

Puriney commented Mar 20, 2016

It is good that you wrapped everything up and debugged the code (e.g. you used mxnet:::mx.varg.symbol.Concat rather than the default mx.symbol.Concat; see issue #1578).

But please give appropriate credits.

Still beautiful work, though.

@thirdwing
Contributor

I have changed the mx.symbol.Concat API in #1682.

Please wait.

@thirdwing
Contributor

@ziyeqinghan I am testing this code.

But sometimes it works and sometimes it doesn't.

Training with train.shape=(32,31360) 
Training with val.shape=(32,3456) 
Epoch [31] Train: NLL=NaN, Perp=NaN

@ziyeqinghan
Contributor Author

@thirdwing I retested and found this problem too, but I haven't found the cause yet. I wonder whether it is caused by the same problem as in issue #1618.

@thirdwing
Contributor

I can't reproduce this error every time.

@thirdwing
Contributor

@tqchen Do you have any idea why we got different results randomly? How can we track this down?

@tqchen
Member

tqchen commented Mar 27, 2016

It could be due to too large an initialization. Nothing jumps to mind immediately. One thing we can do is a pair test between the Python (or Julia) code and the R code.

I would recommend looking at the Julia code, as Julia's matrix order is the same as R's.

@thirdwing
Contributor

I will do that now.


@thirdwing
Contributor

Although I still haven't found the root cause, this line sometimes produces a wrong result.

@thirdwing
Contributor

I think I found the error. It is in these lines https://github.com/ziyeqinghan/mxnet/blob/lstm/R-package/demo/rnn/lstm.R#L244-L245

init.states[[paste0("l", i, ".init.c")]] <- m$rnn.exec$arg.arrays[[paste0("l", i, ".last.c_output")]]
init.states[[paste0("l", i, ".init.h")]] <- m$rnn.exec$arg.arrays[[paste0("l", i, ".last.h_output")]]

l1.last.c_output is in m$rnn.exec$outputs, not in m$rnn.exec$arg.arrays
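To make the fix concrete for readers without the source open: in an MXNet executor, the *_output nodes live in the executor's outputs, while arg.arrays holds only the bound inputs and parameters, so indexing arg.arrays by an output name silently yields the wrong buffer. A toy Python sketch of the intended state carry-over (the ToyExecutor class and names here are illustrative, not MXNet's API):

```python
class ToyExecutor:
    """Mimics the split between arg_arrays (bound inputs/params)
    and outputs (values produced by the forward pass)."""
    def __init__(self):
        self.arg_arrays = {"l1.init.c": 0.0, "l1.init.h": 0.0}
        self.outputs = {}

    def forward(self):
        # Toy recurrence: the "last" states are produced as outputs.
        self.outputs["l1.last.c_output"] = self.arg_arrays["l1.init.c"] + 1.0
        self.outputs["l1.last.h_output"] = self.arg_arrays["l1.init.h"] + 2.0

exe = ToyExecutor()
for _ in range(3):
    exe.forward()
    # The fix: carry the states over from outputs, not from arg_arrays,
    # where a lookup by an output name would hit a missing or stale buffer.
    exe.arg_arrays["l1.init.c"] = exe.outputs["l1.last.c_output"]
    exe.arg_arrays["l1.init.h"] = exe.outputs["l1.last.h_output"]
```

Reading the states from the wrong container would leave init.c and init.h frozen at their initial values, which is consistent with the intermittent NaN results seen above.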

@thirdwing
Contributor

Hi, @ziyeqinghan

I think I have fixed the problems in my own repo (see thirdwing@022659a).

Please help test it. If it works, you can update your PR accordingly, or we can just merge that commit.

@ziyeqinghan
Contributor Author

@thirdwing Thank you for your correction.

I have also modified the code in these lines https://github.com/thirdwing/mxnet/blob/master/example/rnn/lstm.R#L357-L358 and retested it, but I still see the same problem.

I think there are still some other bugs. I will also try to debug it. Thanks!


@tqchen
Member

tqchen commented Apr 1, 2016

One reason for divergence could simply be too large a learning rate.

@ziyeqinghan
Contributor Author

@tqchen I have tried some smaller learning rates and initial weights but still see the same problem.

@ziyeqinghan
Contributor Author

@thirdwing I think I have fixed the bug. Since init.c and init.h should be treated as input data, their gradients need to be zero, so I set their gradients to zero in these lines.
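That reasoning matches the usual truncated-BPTT convention: init.c and init.h are state inputs rather than parameters, so the gradient that flows into them must never be applied as an update. A small Python sketch of that rule, under illustrative names:

```python
# Parameters are updated from their gradients; state inputs are not.
arrays = {"weight": 1.0, "l1.init.c": 0.0, "l1.init.h": 0.0}
grads = {name: 0.5 for name in arrays}   # pretend backward filled every slot

STATE_INPUTS = {"l1.init.c", "l1.init.h"}
lr = 0.1
for name in grads:
    if name in STATE_INPUTS:
        grads[name] = 0.0                 # treat the state as data: no update
    else:
        arrays[name] -= lr * grads[name]  # ordinary SGD step on parameters
```

If the state slots were updated like parameters, the carried-over hidden state would drift away from what the forward pass actually produced, which can destabilize training.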

toc <- Sys.time()
cat(paste0("Iter [", iteration,
"] Train: Time: ", toc - tic,
" sec, NLL=", train.nll / nbatch,
Contributor

In R, toc - tic produces an object of class difftime, but we can't guarantee that it will be displayed in units of seconds, so please change it to as.numeric(toc - tic, units = "secs").

@thirdwing
Contributor

@ziyeqinghan I think you are right. Just some minor changes you might need to make:

(1) change your timing code a little;

(2) Can we move this demo into the same folder as the Python one (https://github.com/dmlc/mxnet/tree/master/example/rnn)? I think that makes it easier for users to find the demos;

(3) please update against the master branch.

add LSTM in R

fix bug: the gradient of init.c and init.h should be zero

change the folder of lstm in R and modify timing code
@ziyeqinghan
Contributor Author

@thirdwing I have changed my timing code and moved the demo into the same folder as the Python one. I have also rebased onto the master branch and squashed the commits.

@thirdwing thirdwing merged commit a12ead8 into apache:master Apr 3, 2016
@terrytangyuan
Member

@ziyeqinghan @thirdwing It would be better to put some reusable functions into the library and add enough documentation. The example probably needs some more comments so that a typical R user can follow it more easily.

@tqchen
Member

tqchen commented May 20, 2016

@ziyeqinghan @thirdwing Sorry for not bringing this up up front, but it would be great if you could contribute a tutorial on how to use this feature, like the others at http://mxnet.dmlc.ml/en/latest/packages/r/index.html#tutorials

You are also invited to convert the tutorial into a guest blog post, which we can forward to R-bloggers.

@thirdwing
Contributor

Good idea. This will be done.

