design/sequence decoder #4905 (Merged)

14 commits merged into PaddlePaddle:develop on Nov 9, 2017

Conversation

Superjomn (Contributor):

An LoD-based Sequence Decoder (Beam Search)

Refactors the beam search in `RecurrentGradientMachine`.

## Beam Search
Beam search is a heuristic search algorithm that explores a graph by expanding the most promising node in a limited set.

It is the core component of Sequence Decoder.
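For reference, here is a minimal plain-Python sketch of the algorithm; the `step_fn` callback, the token handling, and the scoring below are illustrative assumptions, not the operators proposed in this design.

```python
import heapq
import math

def beam_search(step_fn, start_token, end_token, beam_size, max_len):
    """Keep the beam_size highest-scoring prefixes at every step.

    step_fn(prefix) is assumed to return (token, prob) pairs for the next
    position given the current prefix; it stands in for the decoder step."""
    beams = [(0.0, [start_token])]      # (log probability, prefix)
    finished = []
    for _ in range(max_len):
        candidates = []
        for log_prob, prefix in beams:
            if prefix[-1] == end_token:            # finished prefixes leave the beam
                finished.append((log_prob, prefix))
                continue
            for token, prob in step_fn(prefix):    # expand the most promising nodes
                candidates.append((log_prob + math.log(prob), prefix + [token]))
        # prune back to the beam_size best candidates
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
        if not beams:
            break
    finished.extend(beams)
    return max(finished, key=lambda c: c[0])
```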
Collaborator commented:
Sequence Decoder => *sequence decoder*


It is the core component of Sequence Decoder.

In the original implementation of `RecurrentGradientMachine`, the beam search is a method in RNN,
Collaborator commented:
Are there multiple implementations of `RecurrentGradientMachine`? What are the other implementations besides the "original" one?

@@ -0,0 +1,436 @@
# A LoD-based Sequence Decoder
In tasks such as machine translation and image to text,
a **sequence decoder** is necessary to generate sequences.
Collaborator commented:
a [sequence decoder](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md) is necessary

@@ -0,0 +1,436 @@
# A LoD-based Sequence Decoder
Collaborator commented:
Design: Sequence Decoder Generating LoDTensors

For example, the RNN states, candidate IDs and probabilities of beam search can be represented as `LoDTensors`;
the selected candidates' IDs in each time step can be stored in a `TensorArray` and `Packed` into the translated sentences.

## Changing LoD's absolute offset to relative offsets
Collaborator commented:
This is not part of the design. This is an issue. I copy-n-pasted the content into issue #4945

@lcy-seso (Contributor) left a comment:
some simple comments first.

the first level represents that there are two sequences,
their offsets in the second-level LoD is `[0, 3)` and `[3, 5)`.
Contributor commented:
Is this `[3, 5)` or `[3, 6)`? I think it is the latter.
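For the offset question, a tiny plain-Python sanity check (illustrative only, not part of the design) makes the convention explicit: with absolute offsets, the last offset of each LoD level should equal the number of items in the level below.

```python
def check_lod_consistency(lod, tensor_len):
    """Rough sanity check for an absolute-offset LoD: each level's last
    offset must point one past the final item of the level below (the
    next level's intervals, or the tensor's rows at the bottom level)."""
    for level, offsets in enumerate(lod):
        below = len(lod[level + 1]) - 1 if level + 1 < len(lod) else tensor_len
        assert offsets[0] == 0 and offsets[-1] == below, (
            "level %d ends at %d but the level below has %d items"
            % (level, offsets[-1], below))

# e.g. lod = [[0, 3, 6], [0, 2, 3, 5, 7, 8, 10]] over a 10-row tensor passes;
# ending the first level at 5 instead of 6 would trip the assertion.
```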

the first level represents that there are two sequences,
their offsets in the second-level LoD is `[0, 3)` and `[3, 5)`.

The second level is the same with the relative offset example because the lower level is a tensor.
@lcy-seso (Contributor) commented on Oct 19, 2017:
"the same as" or "the same to"?

# for example
# decoder_mem.lod is
# [[0 1 3],
# [0 1 3 6]]
@lcy-seso (Contributor) commented on Oct 19, 2017:
Why is `decoder_mem` a sequence here? Usually, the memory of an RNN is its hidden state from the last time step. Is the memory here a TensorArray, and have we decided to memorize the hidden states of all previous time steps? If so, is this design compatible with the memory in dynamic RNN?

But I guess maybe this memory is something different?
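As a reading aid for the memory question, the example LoD above can be unpacked into per-sentence prefixes with a few lines of plain Python (an illustration, not the proposed API):

```python
def describe_prefixes(lod):
    """Interpret a two-level LoD such as [[0, 1, 3], [0, 1, 3, 6]]:
    level 0 groups translation prefixes by source sentence, and level 1
    gives each prefix's extent in the underlying tensor."""
    sent_offsets, prefix_offsets = lod
    for s in range(len(sent_offsets) - 1):
        lo, hi = sent_offsets[s], sent_offsets[s + 1]
        lengths = [prefix_offsets[p + 1] - prefix_offsets[p] for p in range(lo, hi)]
        print("sentence %d: %d prefix(es) with lengths %s" % (s, hi - lo, lengths))

describe_prefixes([[0, 1, 3], [0, 1, 3, 6]])
# sentence 0: 1 prefix(es) with lengths [1]
# sentence 1: 2 prefix(es) with lengths [2, 3]
```

Under this reading, the memory carries one slot per unfinished prefix rather than a single hidden state per sentence.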

# its tensor content is [a1 a2 a3 a4 a5]
# which means there are 2 sentences to translate
# - the first sentence has 1 translation prefix, the offset is [0, 1)
# - the second sentence has 2 translation prefixes, the offsets are [1, 3) and [3, 6)
Contributor commented:
Does this mean the memory needs to memorize all the unfinished prefixes? (This is required for beam search.)

# the following has 2, 3, 2, 3 candidates
# the encoder_ctx_expanded's content will be
# [a1 a1 a2 a2 a3 a3 a3 a4 a4 a5 a5 a5]
encoder_ctx_expanded = pd.lod_expand(encoder_ctx, target_word)
Contributor commented:
The name `target_word` is confusing. During generation, we do not have target words.
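For reference, a rough plain-Python sketch of the expansion the snippet describes: each encoder context entry is repeated once per candidate in its slot (the `lod_expand` below illustrates the behaviour, it is not the operator's implementation):

```python
def lod_expand(values, counts):
    """Repeat values[i] counts[i] times, which is how the encoder context
    is tiled to line up with the expanded candidate batch."""
    expanded = []
    for value, count in zip(values, counts):
        expanded.extend([value] * count)
    return expanded

# candidate counts of [2, 2, 3, 2, 3] per prefix turn [a1, a2, a3, a4, a5]
# into [a1, a1, a2, a2, a3, a3, a3, a4, a4, a5, a5, a5], matching the
# expanded content shown in the snippet above.
```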

# which means there are 2 sentences to translate
# - the first sentence has 1 translation prefix, the offset is [0, 1)
# - the second sentence has 2 translation prefixes, the offsets are [1, 3) and [3, 6)
# the target_word.lod is
Contributor commented:
The name `target_word` is really confusing for generation. During generation, we only have source words, which are to be translated into target words.

bias=None,
act=pd.activation.Softmax())
# topk_scores, a tensor, [None, k]
topk_scores, topk_ids = pd.top_k(scores)
Contributor commented:
This `top_k` is special. It is not just a simple "select the top k items"; it also "selects the top k from a distribution to form a new batch". How do we handle this?
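To make the question concrete, here is a plain-Python sketch of a per-row top-k (illustrative only; the real operator would also have to emit the new LoD describing the re-batched candidates):

```python
import heapq

def batch_top_k(scores, k):
    """For each row of scores, keep the k best (id, score) pairs. Stacking
    the results yields k candidates per original row, i.e. the larger,
    re-batched set of beam candidates the comment refers to."""
    topk_scores, topk_ids = [], []
    for row in scores:
        best = heapq.nlargest(k, enumerate(row), key=lambda item: item[1])
        topk_ids.append([idx for idx, _ in best])
        topk_scores.append([score for _, score in best])
    return topk_scores, topk_ids

print(batch_top_k([[0.1, 0.6, 0.3], [0.5, 0.2, 0.3]], 2))
# ([[0.6, 0.3], [0.5, 0.3]], [[1, 2], [0, 2]])
```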

# selected_ids is the selected candidates that will be appended to the translation
# selected_scores is the scores of the selected candidates
# generated_scores is the scores of the translations (with candidates appended)
selected_ids, selected_scores, generated_scores = decoder.beam_search(
@lcy-seso (Contributor) commented on Oct 19, 2017:
First, I want to leave some of my questions here for our discussion.

The difficulties of implementing dynamic beam search through preliminary operators lie in (but may not be limited to):

  1. How to loop.
    • Maybe this is currently done by the dynamic RNN, but looping is a generally useful operation; should it be independent of the RNN?
  2. How to stop the loop (the condition operator?).
    • Samples in a batch may hit different conditions; as a result, the batch size changes dynamically.
    • Maybe all branches of a condition will be executed? I am not sure what the design of the current condition operator is, and have we decided to use it in beam search?
  3. How to construct the beam?
    • Dynamic expansion forms a larger batch, but we have to track the entire set of unfinished prefixes.
    • How do we track the unfinished prefixes, and who tracks them? It seems that in the current design the decoder memory tracks this?
  4. How to shrink the beam?
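As a rough illustration of questions 3 and 4 (plain Python, assumptions only, not the design's operators), one decode step could expand every prefix into a larger candidate batch and then shrink it back to the beam width per source sentence:

```python
import heapq

def expand_and_shrink(prefixes, candidates_per_prefix, beam_size):
    """prefixes: list of (sentence_id, tokens, score);
    candidates_per_prefix[i]: list of (token, log_prob) for prefixes[i].
    Expand each prefix by its candidates (beam construction), then keep
    only the beam_size best candidates per source sentence (beam shrink)."""
    expanded = []
    for (sent, tokens, score), cands in zip(prefixes, candidates_per_prefix):
        for token, log_prob in cands:
            expanded.append((sent, tokens + [token], score + log_prob))
    new_beam = []
    for sent in sorted({c[0] for c in expanded}):
        group = [c for c in expanded if c[0] == sent]
        new_beam.extend(heapq.nlargest(beam_size, group, key=lambda c: c[2]))
    return new_beam
```

Tracking which prefix each expanded candidate came from is the bookkeeping that, in this design, the decoder memory's LoD appears to carry.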

@jacquesqiao (Member) left a comment:
LGTM!

@Superjomn merged commit 53cb4df into PaddlePaddle:develop on Nov 9, 2017
@Superjomn deleted the design/sequence_decoder branch on November 9, 2017 05:32