
Added Pointer Networks implementation based on the Vocabulary Approach #505

Closed
wants to merge 10 commits into from

Conversation

Zarana-Parekh
Contributor

Added the pointer networks implementation for summarization models. This is based on the vocabulary approach (for copying probability) as described in:
See, Abigail, Peter J. Liu, and Christopher D. Manning. "Get to the point: Summarization with pointer-generator networks." arXiv preprint arXiv:1704.04368 (2017).
It also includes adapting the coverage mechanism of Sockeye to work with the pointer networks model as given in the paper.
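For readers unfamiliar with the approach: in See et al.'s pointer-generator, the final word distribution is a p_gen-weighted mix of the vocabulary softmax and the attention (copy) distribution, with OOV source words assigned temporary ids just past the end of the vocabulary. A minimal NumPy sketch of that mixing step (names here are illustrative, not Sockeye's):

```python
import numpy as np

def pointer_generator_dist(p_vocab: np.ndarray, attention: np.ndarray,
                           src_ids: np.ndarray, p_gen: float,
                           vocab_size: int, max_oov: int) -> np.ndarray:
    """Mix the vocabulary distribution with the copy (attention) distribution.
    OOV source words occupy ids vocab_size .. vocab_size + max_oov - 1."""
    extended = np.zeros(vocab_size + max_oov)
    # generation part: scale the vocabulary softmax by p_gen
    extended[:vocab_size] = p_gen * p_vocab
    # copy part: np.add.at accumulates over duplicate source ids,
    # so a word attended at several positions sums its attention mass
    np.add.at(extended, src_ids, (1.0 - p_gen) * attention)
    return extended
```

If both inputs are proper distributions, the mixed result sums to 1, including any mass scattered onto extended OOV slots.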

Pull Request Checklist

  • [ ] Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]'
    until you can check this box).
  • Unit tests pass (pytest)
  • Were system tests modified? If so, did you run them at least 5 times to account for the variation across runs?
  • System tests pass (pytest test/system)
  • Passed code style checking (./style-check.sh)
  • You have considered writing a test
  • Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
  • Updated CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made as an Amazon employee.

@Zarana-Parekh changed the title from "Added Pointer Networks implementation based on the Voacabulary Approach" to "Added Pointer Networks implementation based on the Vocabulary Approach" on Aug 10, 2018
@Zarana-Parekh
Contributor Author

ping @mjpost @fhieber @tdomhan

@mjpost
Contributor

mjpost commented Aug 12, 2018

Hi Zarana—thanks, will look at this soon.


@mjpost mjpost left a comment


I made a first-pass code review, focusing mostly on training and not yet on inference.

A few high-level comments:

  • See the notes about passing arguments to the decoder, which should read this information from the model, instead.

  • Can you provide a paragraph or two describing what your code does? You cite See's paper, but it would be nice to have a description from you to go along with the PR. Do all OOVs share the same embedding representation?

  • I don't like the change to tokens2ids(). This is a simple function that is now encumbered with a lot of arguments that are redundant in most cases. Can you find a way to refactor this or make it simpler, perhaps with default arguments, or by using another function for pointer nets?
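One possible shape for that refactor, keeping the plain lookup untouched and moving the pointer-net bookkeeping into its own helper. All names and the `<unk>` key below are hypothetical sketches, not Sockeye's actual API:

```python
from typing import Dict, List, Tuple

def tokens2ids(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Plain lookup; unknown tokens map to the UNK id (assumed key '<unk>')."""
    unk = vocab["<unk>"]
    return [vocab.get(t, unk) for t in tokens]

def tokens2ids_pointer(tokens: List[str], vocab: Dict[str, int],
                       max_oov_words: int) -> Tuple[List[int], List[str]]:
    """Pointer-net variant: give OOV tokens temporary ids past the vocab end."""
    unk = vocab["<unk>"]
    oovs: List[str] = []
    ids: List[int] = []
    for t in tokens:
        if t in vocab:
            ids.append(vocab[t])
        elif t in oovs:
            # reuse the temporary id assigned on first sight of this OOV
            ids.append(len(vocab) + oovs.index(t))
        elif len(oovs) < max_oov_words:
            oovs.append(t)
            ids.append(len(vocab) + len(oovs) - 1)
        else:
            # OOV budget exhausted; fall back to UNK
            ids.append(unk)
    return ids, oovs
```

This keeps the common path free of extra arguments and isolates the pointer-net behavior behind a separate, explicitly named function.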

@@ -115,13 +115,19 @@ def __init__(self,
context)

def decode_and_evaluate(self,
use_pointer_nets: bool,
max_oov_words: int,
pointer_nets_type: str,
checkpoint: Optional[int] = None,

These three arguments will be read by the model from the ModelConfig. You shouldn't need to pass them through at all.
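The suggested shape, sketched schematically (the field names are assumptions mirroring the PR's CLI flags, not Sockeye's actual ModelConfig):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:
    # hypothetical fields mirroring the new CLI arguments
    use_pointer_nets: bool = False
    max_oov_words: int = 50
    pointer_nets_type: str = "summary"

class CheckpointDecoder:
    def __init__(self, config: ModelConfig):
        # the model already carries its configuration
        self.config = config

    def decode_and_evaluate(self, checkpoint: Optional[int] = None) -> str:
        # read the pointer-net settings off the stored config instead of
        # accepting them as extra arguments at every call site
        if self.config.use_pointer_nets:
            return f"pointer ({self.config.pointer_nets_type})"
        return "baseline"
```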

@@ -27,6 +27,7 @@
VOCAB_SYMBOLS = [PAD_SYMBOL, UNK_SYMBOL, BOS_SYMBOL, EOS_SYMBOL]
# reserve extra space for the EOS or BOS symbol that is added to both source and target
SPACE_FOR_XOS = 1
MAX_OOV_WORDS = 50

Why not reuse another variable for this, such as the maximum input length? That way we avoid creating another variable by using a good default. Is there any reason to set it to something other than that?

if pointer_nets_type != C.POINTER_NET_SUMMARY:
labels = target[1:] + [self.eos_id]
if self.aligner is not None:
labels = self.aligner.get_labels(sources[0], target, labels)


This is redundant code that I just removed...

@@ -606,7 +611,7 @@ def decode_sequence(self,
hidden_states = [] # type: List[mx.sym.Symbol]
context_vectors = [] # type: List[mx.sym.Symbol]
attention_probs = [] # type: List[mx.sym.Symbol]
# TODO: possible alternative: feed back the context vector instead of the hidden (see lamtram)

Let's keep this line in.

@@ -582,6 +586,7 @@ def decode_sequence(self,
"""

# target_embed: target_seq_len * (batch_size, num_target_embed)
target_embed_local = target_embed

It isn't necessary to save this and then return it. This reassignment here does not change the value in the caller.
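The underlying Python behavior, for reference: assigning to a parameter name rebinds the local variable only, so the caller's reference is untouched. A toy illustration (names are made up):

```python
def renamed_copy(target_embed):
    # rebinds the local name only; the caller's variable
    # still points at the original object
    target_embed = target_embed + [99]
    return target_embed

caller_value = [1, 2]
result = renamed_copy(caller_value)
# caller_value is unchanged; only the returned value carries the change
```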

else:
# target_embed_prev = mx.sym.Custom(op_type="PrintValue", data=target_embed_prev, print_name="TARG")
outputs = self.output_layer(target_decoded, attention=attention_probs,
context=attention_context, target_embed=target_embed_prev)

You renamed context above but not here...

if self.config.use_pointer_nets and self.config.pointer_net_type == C.POINTER_NET_SUMMARY:
prob_vocab, prob_source = self.output_layer(target_decoded, context=context, attention=attention_probs,
target_embed=target_embed_prev)


This is the exact same invocation as the else block below. Can you refactor this so the pointer net cases are above? It seems like there is no need to check pointer_net_type here.

@@ -65,6 +65,7 @@ def __init__(self,
provide_label: List[mx.io.DataDesc],
default_bucket_key: Tuple[int, int],
bucketing: bool,
batch_size: Optional[int],

Optional just means the value can be None, but for an int, you probably want to set a default...
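To illustrate the distinction: `Optional[int]` is purely a type annotation meaning "int or None"; a default still has to be supplied explicitly. A sketch (the fallback value of 64 is arbitrary):

```python
from typing import Optional

def build_module(batch_size: Optional[int] = None) -> int:
    # the annotation alone provides no default; declare one
    # and fall back when the caller passes nothing
    return batch_size if batch_size is not None else 64
```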

process_manager = DecoderProcessManager(self.model.output_dir, decoder=decoder)
process_manager = DecoderProcessManager(self.model.output_dir, decoder=decoder,
use_pointer_nets=use_pointer_nets, max_oov_words=max_oov_words,
pointer_nets_type=pointer_nets_type)

This shouldn't need to know about pointer nets...

num_samples_per_bucket: List[int]) -> 'ParallelDataSet':
num_samples_per_bucket: List[int],
use_pointer_nets: bool,
pointer_nets_type: str) -> 'ParallelDataSet':


These variables aren't used, right?

target_embed=target_embed)

# to correctly add source word probabilities based on the source word indices in the vocabulary
# generate corresponding indices matrix

I understand the net effect of the code below, but following the details is difficult. Can you add a comment above each line, describing what exactly that line is doing, and why? Basically, a detailed walk-through that will leave the reader understanding how the code sums up probabilities from the target vocabulary and from pointed words in the source sentence.

Note also that the documentation for MXNet's scatter_nd() function has a call-out claiming the behavior is undefined if there are duplicate indices, which can definitely occur. Do you have any idea of how to address this?
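The duplicate-index hazard, illustrated with NumPy (the same semantics the MXNet graph would need; the model wants the accumulating behavior, since the same source token id can appear at several attended positions):

```python
import numpy as np

src_ids = np.array([5, 2, 5])      # token id 5 appears twice in the source
attn = np.array([0.5, 0.2, 0.3])   # attention mass at each source position

# naive scatter: with a duplicate index, the last write silently wins
naive = np.zeros(8)
naive[src_ids] = attn              # naive[5] ends up 0.3, losing the 0.5

# accumulating scatter: duplicates are summed, which is what the model needs
summed = np.zeros(8)
np.add.at(summed, src_ids, attn)   # summed[5] accumulates 0.5 + 0.3
```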

@lambdaofgod

@Zarana-Parekh do you have any examples of running architectures with the copy component? Did you succeed in running the configuration from the pointer-generator paper? What would the required parameters be?

@fhieber
Contributor

fhieber commented Aug 11, 2019

Closing this in favor of a similar approach to pointer networks (#697), which is already merged.

@fhieber fhieber closed this Aug 11, 2019