
Added Pointer Networks implementation based on the Vocabulary Approach #505

Closed
wants to merge 10 commits into from

Conversation

Zarana-Parekh
Contributor

Added the pointer networks implementation for summarization models. This is based on the vocabulary approach (for copying probability) as described in:
See, Abigail, Peter J. Liu, and Christopher D. Manning. "Get to the point: Summarization with pointer-generator networks." arXiv preprint arXiv:1704.04368 (2017).
It also includes adapting the coverage mechanism of Sockeye to work with the pointer networks model as given in the paper.
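For readers unfamiliar with the approach: in See et al.'s pointer-generator, the final word distribution is a p_gen-weighted mix of the vocabulary softmax and the attention (copy) distribution, with OOV source words assigned temporary ids just past the end of the vocabulary. A minimal NumPy sketch of that mixing step (names here are illustrative, not Sockeye's):

```python
import numpy as np

def pointer_generator_dist(p_vocab: np.ndarray, attention: np.ndarray,
                           src_ids: np.ndarray, p_gen: float,
                           vocab_size: int, max_oov: int) -> np.ndarray:
    """Mix the vocabulary distribution with the copy (attention) distribution.
    OOV source words occupy ids vocab_size .. vocab_size + max_oov - 1."""
    extended = np.zeros(vocab_size + max_oov)
    # generation part: scale the vocabulary softmax by p_gen
    extended[:vocab_size] = p_gen * p_vocab
    # copy part: np.add.at accumulates over duplicate source ids,
    # so a word attended at several positions sums its attention mass
    np.add.at(extended, src_ids, (1.0 - p_gen) * attention)
    return extended
```

If both inputs are proper distributions, the mixed result sums to 1, including any mass scattered onto extended OOV slots.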

Pull Request Checklist

  • [ ] Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]'
    until you can check this box).
  • Unit tests pass (pytest)
  • Were system tests modified? If so, did you run them at least 5 times to account for the variation across runs?
  • System tests pass (pytest test/system)
  • Passed code style checking (./style-check.sh)
  • You have considered writing a test
  • Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
  • Updated CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made as an Amazon employee.

@Zarana-Parekh changed the title from "Added Pointer Networks implementation based on the Voacabulary Approach" to "Added Pointer Networks implementation based on the Vocabulary Approach" on Aug 10, 2018
@Zarana-Parekh
Contributor Author

ping @mjpost @fhieber @tdomhan

@mjpost
Contributor

mjpost commented Aug 12, 2018

Hi Zarana—thanks, will look at this soon.


@mjpost mjpost left a comment


I made a first-pass code review, focusing mostly on training and not yet on inference.

A few high-level comments:

  • See the notes about passing arguments to the decoder, which should read this information from the model, instead.

  • Can you provide a paragraph or two describing what your code does? You cite See's paper, but it would be nice to have a description from you to go along with the PR. Do all OOVs share the same embedding representation?

  • I don't like the change to tokens2ids(). This is a simple function that is now encumbered with a lot of arguments that are redundant in most cases. Can you find a way to refactor this or make it simpler, perhaps with default arguments, or by using another function for pointer nets?
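One possible shape for that refactor, keeping the plain lookup untouched and moving the pointer-net bookkeeping into its own helper. All names and the `<unk>` key below are hypothetical sketches, not Sockeye's actual API:

```python
from typing import Dict, List, Tuple

def tokens2ids(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Plain lookup; unknown tokens map to the UNK id (assumed key '<unk>')."""
    unk = vocab["<unk>"]
    return [vocab.get(t, unk) for t in tokens]

def tokens2ids_pointer(tokens: List[str], vocab: Dict[str, int],
                       max_oov_words: int) -> Tuple[List[int], List[str]]:
    """Pointer-net variant: give OOV tokens temporary ids past the vocab end."""
    unk = vocab["<unk>"]
    oovs: List[str] = []
    ids: List[int] = []
    for t in tokens:
        if t in vocab:
            ids.append(vocab[t])
        elif t in oovs:
            # reuse the temporary id assigned on first sight of this OOV
            ids.append(len(vocab) + oovs.index(t))
        elif len(oovs) < max_oov_words:
            oovs.append(t)
            ids.append(len(vocab) + len(oovs) - 1)
        else:
            # OOV budget exhausted; fall back to UNK
            ids.append(unk)
    return ids, oovs
```

This keeps the common path free of extra arguments and isolates the pointer-net behavior behind a separate, explicitly named function.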

@@ -115,13 +115,19 @@ def __init__(self,
context)

def decode_and_evaluate(self,
use_pointer_nets: bool,
max_oov_words: int,
pointer_nets_type: str,
checkpoint: Optional[int] = None,

These three arguments will be read by the model from the ModelConfig. You shouldn't need to pass them through at all.
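The suggested shape, sketched schematically (the field names are assumptions mirroring the PR's CLI flags, not Sockeye's actual ModelConfig):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:
    # hypothetical fields mirroring the new CLI arguments
    use_pointer_nets: bool = False
    max_oov_words: int = 50
    pointer_nets_type: str = "summary"

class CheckpointDecoder:
    def __init__(self, config: ModelConfig):
        # the model already carries its configuration
        self.config = config

    def decode_and_evaluate(self, checkpoint: Optional[int] = None) -> str:
        # read the pointer-net settings off the stored config instead of
        # accepting them as extra arguments at every call site
        if self.config.use_pointer_nets:
            return f"pointer ({self.config.pointer_nets_type})"
        return "baseline"
```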

@@ -27,6 +27,7 @@
VOCAB_SYMBOLS = [PAD_SYMBOL, UNK_SYMBOL, BOS_SYMBOL, EOS_SYMBOL]
# reserve extra space for the EOS or BOS symbol that is added to both source and target
SPACE_FOR_XOS = 1
MAX_OOV_WORDS = 50

Why not reuse another variable for this, such as the maximum input length? That way we avoid creating another variable by using a good default. Is there any reason to set it to something other than that?

if pointer_nets_type != C.POINTER_NET_SUMMARY:
labels = target[1:] + [self.eos_id]
if self.aligner is not None:
labels = self.aligner.get_labels(sources[0], target, labels)


This is redundant code that I just removed...

@@ -606,7 +611,7 @@ def decode_sequence(self,
hidden_states = [] # type: List[mx.sym.Symbol]
context_vectors = [] # type: List[mx.sym.Symbol]
attention_probs = [] # type: List[mx.sym.Symbol]
# TODO: possible alternative: feed back the context vector instead of the hidden (see lamtram)

Let's keep this line in.

@@ -582,6 +586,7 @@ def decode_sequence(self,
"""

# target_embed: target_seq_len * (batch_size, num_target_embed)
target_embed_local = target_embed

It isn't necessary to save this and then return it. This reassignment here does not change the value in the caller.
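The underlying Python behavior, for reference: assigning to a parameter name rebinds the local variable only, so the caller's reference is untouched. A toy illustration (names are made up):

```python
def renamed_copy(target_embed):
    # rebinds the local name only; the caller's variable
    # still points at the original object
    target_embed = target_embed + [99]
    return target_embed

caller_value = [1, 2]
result = renamed_copy(caller_value)
# caller_value is unchanged; only the returned value carries the change
```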

else:
# target_embed_prev = mx.sym.Custom(op_type="PrintValue", data=target_embed_prev, print_name="TARG")
outputs = self.output_layer(target_decoded, attention=attention_probs,
context=attention_context, target_embed=target_embed_prev)

You renamed context above but not here...

if self.config.use_pointer_nets and self.config.pointer_net_type == C.POINTER_NET_SUMMARY:
prob_vocab, prob_source = self.output_layer(target_decoded, context=context, attention=attention_probs,
target_embed=target_embed_prev)


This is the exact same invocation as the else block below. Can you refactor this so the pointer net cases are above? It seems like there is no need to check pointer_net_type here.

@@ -65,6 +65,7 @@ def __init__(self,
provide_label: List[mx.io.DataDesc],
default_bucket_key: Tuple[int, int],
bucketing: bool,
batch_size: Optional[int],

Optional just means the value can be None, but for an int, you probably want to set a default...
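To illustrate the distinction: `Optional[int]` is purely a type annotation meaning "int or None"; a default still has to be supplied explicitly. A sketch (the fallback value of 64 is arbitrary):

```python
from typing import Optional

def build_module(batch_size: Optional[int] = None) -> int:
    # the annotation alone provides no default; declare one
    # and fall back when the caller passes nothing
    return batch_size if batch_size is not None else 64
```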

process_manager = DecoderProcessManager(self.model.output_dir, decoder=decoder)
process_manager = DecoderProcessManager(self.model.output_dir, decoder=decoder,
use_pointer_nets=use_pointer_nets, max_oov_words=max_oov_words,
pointer_nets_type=pointer_nets_type)

This shouldn't need to know about pointer nets...

num_samples_per_bucket: List[int]) -> 'ParallelDataSet':
num_samples_per_bucket: List[int],
use_pointer_nets: bool,
pointer_nets_type: str) -> 'ParallelDataSet':


These variables aren't used, right?

target_embed=target_embed)

# to correctly add source word probabilities based on the source word indices in the vocabulary
# generate corresponding indices matrix

I understand the net effect of the code below, but following the details is difficult. Can you add a comment above each line, describing what exactly that line is doing, and why? Basically, a detailed walk-through that will leave the reader understanding how the code sums up probabilities from the target vocabulary and from pointed words in the source sentence.

Note also that the documentation for MXNet's scatter_nd() function has a call-out claiming the behavior is undefined if there are duplicate indices, which can definitely occur. Do you have any idea of how to address this?
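The duplicate-index hazard, illustrated with NumPy (the same semantics the MXNet graph would need; the model wants the accumulating behavior, since the same source token id can appear at several attended positions):

```python
import numpy as np

src_ids = np.array([5, 2, 5])      # token id 5 appears twice in the source
attn = np.array([0.5, 0.2, 0.3])   # attention mass at each source position

# naive scatter: with a duplicate index, the last write silently wins
naive = np.zeros(8)
naive[src_ids] = attn              # naive[5] ends up 0.3, losing the 0.5

# accumulating scatter: duplicates are summed, which is what the model needs
summed = np.zeros(8)
np.add.at(summed, src_ids, attn)   # summed[5] accumulates 0.5 + 0.3
```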

@lambdaofgod

@Zarana-Parekh do you have any examples of running architectures with the copy component? Did you succeed in running the configuration from the pointer-generator paper? What would the required parameters be?

@fhieber
Contributor

fhieber commented Aug 11, 2019

Closing this in favor of a similar approach to pointer networks (#697), which is already merged.

@fhieber fhieber closed this Aug 11, 2019