
Forward implementation for LSTM operator. #4929

Merged (16 commits) on Oct 23, 2017

Conversation

@qingqing01 (Contributor) commented Oct 19, 2017:

fix #4629
fix #4675

  • Finish the implementation kernel; the code style will be refined later.
  • Finish the forward implementation and the unit testing.
  • Support reversed LSTM.
  • Support peephole connections by default.
  • TODO in next PR:
    • Backward implementation and its unit testing.
    • Improve the kernel implementation code in math/detail.
    • Support disabling peephole connections.
    • Enable initial hidden state and initial cell state.
    • For now a SequenceToBatch functor reorganizes and sorts the input sequence; I'll try to use TensorArray in the future (see the sketch after the directory listing below).

Directory structure:

├── lstm_op.cc
├── lstm_op.cu
├── lstm_op.h
├── math
│   ├── lstm_compute.cc
│   ├── lstm_compute.cu
│   ├── lstm_compute.h [One-batch computation functor for LSTM.]
│   ├── sequence2batch.cc
│   ├── sequence2batch.cu
│   ├── sequence2batch.h [Reorganize and sort the input sequence; try to use `TensorArray` in the future.]
│   ├── detail [The code in this directory will be refined later.]
│   │   ├── CMakeLists.txt
│   │   ├── hl_activation_functions.h
│   │   ├── hl_avx_functions.cc
│   │   ├── hl_avx_functions.h
│   │   ├── hl_cpu_functions.cc
│   │   ├── hl_functions.h [Activation functions for CPU, GPU and AVX.]
│   │   ├── hl_gpu_functions.h
│   │   ├── lstm_cpu_kernel.h [CPU implementation kernel for one batch and one sequence.]
│   │   ├── lstm_gpu_kernel.h [GPU implementation kernel for one batch.]
│   │   └── lstm_kernel.h

@qingqing01 qingqing01 changed the title [WIP]Lstm Operator for all time steps. [WIP]LSTM operator for all time-steps. Oct 19, 2017
@qingqing01 qingqing01 changed the title [WIP]LSTM operator for all time-steps. Forward implementation in LSTM operator. Oct 19, 2017
@qingqing01 qingqing01 changed the title Forward implementation in LSTM operator. Forward implementation for LSTM operator. Oct 20, 2017
// s0: 0 0 0 0, s1: 1 1 1 1 1, s2: 2 2 2
// seq_info[3] = {(4, 5, 1), (0, 4, 0), (9, 3, 2)}
//
struct SeqInfo {
Collaborator:
Move struct SeqInfo out of this function.

Contributor Author:
Done by @reyoung
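A minimal sketch of what the relocation might look like, with field meanings inferred from the example above, where each tuple reads as (start, length, seq_idx); this is illustrative, not the merged code:

  #include <cstddef>

  // Declared at namespace scope instead of inside the member function:
  struct SeqInfo {
    SeqInfo(size_t start, size_t length, size_t seq_idx)
        : start(start), length(length), seq_idx(seq_idx) {}
    size_t start;    // offset of the sequence in the input LoDTensor
    size_t length;   // number of time steps in the sequence
    size_t seq_idx;  // index of the sequence in the original input
  };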

// input LodTensor. It is also the maximum length of input sequence.

paddle::framework::LoD batch_lods;
batch_lods.push_back(std::vector<size_t>{0});
Collaborator:
I am not sure, but does push_back of a std::vector<size_t> work on the GPU?

Collaborator:
Maybe emplace_back is better.

Contributor Author:
On the GPU, both push_back and emplace_back work fine, and both can be assigned directly to a thrust::host_vector; the unit test shows no problem.
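A standalone sketch of the two calls under discussion, assuming LoD is a std::vector<std::vector<size_t>> as in the framework at the time; the LoD lives in host memory, which is why both calls behave identically whether or not the op runs on the GPU:

  #include <cstddef>
  #include <vector>

  using LoD = std::vector<std::vector<size_t>>;

  int main() {
    LoD batch_lods;
    // push_back moves the temporary vector into place...
    batch_lods.push_back(std::vector<size_t>{0});
    // ...while emplace_back constructs the element in place
    // (here: a vector holding one zero), avoiding the temporary.
    batch_lods.emplace_back(1, 0);
    return 0;
  }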


paddle::framework::LoD batch_lods;
batch_lods.push_back(std::vector<size_t>{0});
batch_lods.push_back(std::vector<size_t>{0});
Collaborator:
Same as above

@luotao1 (Contributor) left a comment:

Kudos for the detailed LSTM comments.

@@ -126,6 +127,7 @@ op_library(softmax_with_cross_entropy_op DEPS cross_entropy softmax)
op_library(sum_op DEPS net_op)
op_library(pool_op DEPS pooling)
op_library(pool_with_index_op DEPS pooling)
op_library(lstm_op DEPS sequence2batch lstm_compute math_function)
Contributor:
No need to add math_function here; the dependency is already added at line 12.

Contributor Author:
Done.

AddInput("Input",
"(LoDTensor) the first input is a LodTensor, which support "
"variable-time length input sequence. The underlying tensor in "
"this LoDTenosr is a matrix with shape (T X 4D), where, T is the "
Contributor:
No comma is needed after "where".

Contributor Author:
Done.

"batch size. `H0` and `C0` can be NULL but only at the same time");
AddInput("Weight",
"(Tensor) the learnable hidden-hidden weights."
" - The shape is (D x 4*D), where D is the hidden size. "
Contributor:
4*D → 4D; line 89 uses 4D, same below. Or use the 4*D format everywhere.

Contributor Author:
Done.

AddInput("Bias",
"(Tensor) the learnable weights, which contains two parts: "
"input-hidden bias weight and peephole connections weight if "
"seting `usePeepholes` True. "
Contributor:
seting → setting

Contributor Author:
Done.

" - Bias = {b_i, b_f, b_c, b_o, W_ic, W_fc, W_oc}.");
AddOutput("BatchGate",
"(LoDTensor) This LoDTensor contains input gate, forget gate "
"and output gate aftern the nonlinear computation. This "
Contributor:
aftern → after (typo)

Contributor Author:
Done.

lstm_value.checkOg = lstm_value.checkFg + frame_size;
lstm_value.prevStateValue = nullptr;

framework::LoDTensor batch_out;
Contributor:
With using LoDTensor = framework::LoDTensor;, writing LoDTensor directly here would be cleaner. Can lines 79, 81, and 83 be written as one line:

LoDTensor batch_out, batch_cell, batch_cell_pre_act;

Contributor Author:
Done.

T rState;
T rPrevState = 0;
T rStateAtv;
T rOut;
Contributor:
Can lines 33–43 be merged somewhat?

T rValueIn, rValueIg, rValueFg, rValueOg;
T rCheckI, rCheckF, rCheckO;
T rState, rPrevState = 0, rStateAtv;
T rOut;

Contributor Author:
Good suggestion. The code style will be revised later; I'll change this along with that. Thanks!

T rCheckO;
T rCheckIGrad;
T rCheckFGrad;
T rCheckOGrad;
Contributor:
Same as above: can these be grouped and merged? Same below.

Contributor Author:
Same as above.

// sort sequence index by the length.
// example: sequences = {s0, s1, s2}
// s0: 0 0 0 0, s1: 1 1 1 1 1, s2: 2 2 2
// seq_info[3] = {(4, 5, 1), (0, 4, 0), (9, 3, 2)}
Contributor:
How does line 51 come about? Could the comment be more detailed?
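For reference, the tuples can be read as (start, length, original index) sorted by descending length, which matches the example data; this reading is inferred from the snippet, not confirmed in the thread:

  // s0 starts at offset 0, length 4  -> (0, 4, 0)
  // s1 starts at offset 4, length 5  -> (4, 5, 1)
  // s2 starts at offset 9, length 3  -> (9, 3, 2)
  // sorted by length, descending:
  //   seq_info[3] = {(4, 5, 1), (0, 4, 0), (9, 3, 2)}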

if (!is_reverse) {
seq2batch_idx[batch_id] = start + n;
} else {
seq2batch_idx[batch_id] = start + seq_len - 1 - n;
Contributor:
seq2batch_idx[batch_id] = is_reverse ? start + seq_len - 1 - n : start + n;

Contributor Author:
Done.

@reyoung (Collaborator) commented Oct 20, 2017:

Please note #4952 has been merged

math::LstmUnitFunctor<Place, T>::compute(ctx.device_context(), lstm_value,
frame_size, cur_batch_size,
gate_act, cell_act, cand_act);
lstm_value.prevStateValue = lstm_value.stateValue;
Collaborator:
Is this intentional? Will lstm_value.prevStateValue always equal lstm_value.stateValue?

Since lstm_value.stateValue is never modified by the compute function, moving this line to just before line 117 would have exactly the same effect.

#ifndef __NVCC__

template <class T, class Op>
void naive_lstm_forward_one_sequence(Op op, LstmMetaValue<T> value,
Collaborator:
This is pass-by-value, so the value seen by the caller of this function is never modified.

Contributor Author:
Well, value stores T* pointers, so the data they point to does change. I'll pass value itself by reference in the next PR.
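A standalone illustration of the point (the struct and names here are hypothetical, not the PR's LstmMetaValue): passing a struct of raw pointers by value copies the pointers, so writes through them reach the caller, while re-pointing a member does not.

  #include <cassert>

  struct Meta {
    float* data;  // stands in for LstmMetaValue<T> holding raw T* members
  };

  void ByValue(Meta m) {
    m.data[0] = 1.0f;  // visible to the caller: same pointee
    m.data = nullptr;  // invisible: only the local copy is re-pointed
  }

  int main() {
    float buf[1] = {0.0f};
    Meta m{buf};
    ByValue(m);
    assert(buf[0] == 1.0f);  // pointee was modified
    assert(m.data == buf);   // pointer member unchanged
    return 0;
  }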

#ifndef __AVX__
static const bool avx = false;
#else
static const bool avx = true;
Collaborator:
It looks like avx should also equal false unless T is float. Has the double type never been tested in the unit tests?

Contributor Author:
The unit test uses the double type, and there is a type check where the AVX functions are called: std::is_same<T, float>::value

https://github.com/PaddlePaddle/Paddle/pull/4929/files/694bc64aafa95b065ca5e2aecef3bf4116208a55#diff-3bfbafd6e96e201350bbbef605a0de03R282

  if (Op::avx && !(frameSize & (8 - 1)) && (std::is_same<T, float>::value)) {
    avx_lstm_backward_one_sequence<T>(op, value, grad, frameSize, ...)
  }

HL_ACTIVATION_RELU = 1,
HL_ACTIVATION_TANH = 2,
HL_ACTIVATION_LINEAR = 3,
HL_ACTIVATION_END
Collaborator:
HL_ACTIVATION_END is not needed.

Besides, even if it were needed, NUM_OF_ACTIVATIONS would be a better name.

Contributor Author:
Thanks! Will fix in a follow-up PR.
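A generic sketch of the suggested naming (enumerator names here are illustrative, not the actual hl_activation_functions.h): a trailing count enumerator sizes arrays and bounds-checks naturally, and NUM_OF_ACTIVATIONS states that intent more clearly than an _END sentinel.

  enum ActivationType {
    ACT_SIGMOID,        // 0
    ACT_RELU,           // 1
    ACT_TANH,           // 2
    ACT_LINEAR,         // 3
    NUM_OF_ACTIVATIONS  // == 4, the number of real activations above
  };

  // The count enumerator keeps lookup tables in sync with the enum:
  static const char* kActivationNames[NUM_OF_ACTIVATIONS] = {
      "sigmoid", "relu", "tanh", "linear"};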

@reyoung (Collaborator) commented Oct 20, 2017:

Please review qingqing01#3 for several enhancements

@reyoung (Collaborator) left a comment:

Basically LGTM. However, the develop branch should be merged.

Do not let me block merging this PR. If I am not online after the develop branch is merged, please have someone else approve it.

@qingqing01 (Contributor Author):
@reyoung Thanks very much for your enhancements!

@luotao1 (Contributor) left a comment:
LGTM

@qingqing01 qingqing01 merged commit 3f1062d into PaddlePaddle:develop Oct 23, 2017
@qingqing01 qingqing01 deleted the lstm branch March 7, 2018 12:03
Successfully merging this pull request may close these issues: SequenceToBatch; Porting LSTM Operator.

3 participants