Added Pointer Networks implementation based on the Vocabulary Approach #505
Conversation
Hi Zarana, thanks, will look at this soon.
I made a first-pass code review, focusing mostly on training and not yet on inference.
A few high-level comments:
- See the notes about passing arguments to the decoder, which should read this information from the model, instead.
- Can you provide a paragraph or two describing what your code does? You cite See's paper, but it would be nice to have a description from you to go along with the PR. Do all OOVs share the same embedding representation?
- I don't like the change to tokens2ids(). This is a simple function that is now encumbered with a lot of arguments that are redundant in most cases. Can you find a way to refactor this or make it simpler, perhaps with default arguments, or by using another function for pointer nets?
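One possible shape for such a refactor (a sketch with a hypothetical signature, not Sockeye's actual function): give the pointer-net parameter a default so existing callers are untouched.

```python
from typing import Dict, List, Optional

UNK_SYMBOL = "<unk>"  # assumed to match the project's UNK constant

def tokens2ids(tokens: List[str],
               vocab: Dict[str, int],
               source_oovs: Optional[Dict[str, int]] = None) -> List[int]:
    """Map tokens to ids. Only pointer-net callers pass source_oovs,
    which assigns extended-vocabulary ids to per-sentence OOV words."""
    ids = []
    for token in tokens:
        if token in vocab:
            ids.append(vocab[token])
        elif source_oovs is not None and token in source_oovs:
            # pointer-net case: OOV gets an id past the regular vocabulary
            ids.append(len(vocab) + source_oovs[token])
        else:
            ids.append(vocab[UNK_SYMBOL])
    return ids
```

Non-pointer callers keep calling tokens2ids(tokens, vocab) exactly as before.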
@@ -115,13 +115,19 @@ def __init__(self,
                 context)

    def decode_and_evaluate(self,
                            use_pointer_nets: bool,
                            max_oov_words: int,
                            pointer_nets_type: str,
                            checkpoint: Optional[int] = None,
These three arguments will be read by the model from the ModelConfig. You shouldn't need to pass them through at all.
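To illustrate the suggestion (class and attribute names here are assumptions, not Sockeye's actual API): the model stores its config once, and methods read the pointer-net settings from it rather than taking them as parameters.

```python
# Hypothetical sketch of reading settings from a stored config object
class ModelConfig:
    def __init__(self,
                 use_pointer_nets: bool = False,
                 max_oov_words: int = 50,
                 pointer_nets_type: str = "summary"):
        self.use_pointer_nets = use_pointer_nets
        self.max_oov_words = max_oov_words
        self.pointer_nets_type = pointer_nets_type


class Model:
    def __init__(self, config: ModelConfig):
        self.config = config

    def decode_and_evaluate(self, checkpoint=None):
        # settings come from the stored config, not from the call site
        return self.config.use_pointer_nets, self.config.max_oov_words
```

Call sites then shrink to decode_and_evaluate(checkpoint), with no pointer-net plumbing.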
@@ -27,6 +27,7 @@
VOCAB_SYMBOLS = [PAD_SYMBOL, UNK_SYMBOL, BOS_SYMBOL, EOS_SYMBOL]
# reserve extra space for the EOS or BOS symbol that is added to both source and target
SPACE_FOR_XOS = 1
MAX_OOV_WORDS = 50
Why not reuse another variable for this, such as the maximum input length? That way we avoid creating another variable by using a good default. Is there any reason to set it to something other than that?
if pointer_nets_type != C.POINTER_NET_SUMMARY:
    labels = target[1:] + [self.eos_id]
if self.aligner is not None:
    labels = self.aligner.get_labels(sources[0], target, labels)
This is redundant code that I just removed...
@@ -606,7 +611,7 @@ def decode_sequence(self,
hidden_states = []  # type: List[mx.sym.Symbol]
context_vectors = []  # type: List[mx.sym.Symbol]
attention_probs = []  # type: List[mx.sym.Symbol]
# TODO: possible alternative: feed back the context vector instead of the hidden (see lamtram)
Let's keep this line in.
@@ -582,6 +586,7 @@ def decode_sequence(self,
"""

# target_embed: target_seq_len * (batch_size, num_target_embed)
target_embed_local = target_embed
It isn't necessary to save this and then return it. This reassignment here does not change the value in the caller.
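A small demonstration of this point: rebinding a parameter name inside a function only changes the local name, so the extra alias is redundant and the caller's variable is never affected.

```python
# Rebinding a parameter name has no effect outside the function
def decode(target_embed):
    target_embed_local = target_embed  # alias of the same object
    target_embed = "rebound locally"   # rebinds the local name only
    return target_embed_local

original = "embeddings"
returned = decode(original)
# 'original' still refers to the same unchanged string
```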
else:
    # target_embed_prev = mx.sym.Custom(op_type="PrintValue", data=target_embed_prev, print_name="TARG")
    outputs = self.output_layer(target_decoded, attention=attention_probs,
                                context=attention_context, target_embed=target_embed_prev)
You renamed context above but not here...
if self.config.use_pointer_nets and self.config.pointer_net_type == C.POINTER_NET_SUMMARY:
    prob_vocab, prob_source = self.output_layer(target_decoded, context=context, attention=attention_probs,
                                                target_embed=target_embed_prev)
This is the exact same invocation as the else block below. Can you refactor this so the pointer net cases are above? It seems like there is no need to check pointer_net_type here.
@@ -65,6 +65,7 @@ def __init__(self,
provide_label: List[mx.io.DataDesc],
default_bucket_key: Tuple[int, int],
bucketing: bool,
batch_size: Optional[int],
Optional just means the value can be None, but for an int, you probably want to set a default...
- process_manager = DecoderProcessManager(self.model.output_dir, decoder=decoder)
+ process_manager = DecoderProcessManager(self.model.output_dir, decoder=decoder,
+                                         use_pointer_nets=use_pointer_nets, max_oov_words=max_oov_words,
+                                         pointer_nets_type=pointer_nets_type)
This shouldn't need to know about pointer nets...
- num_samples_per_bucket: List[int]) -> 'ParallelDataSet':
+ num_samples_per_bucket: List[int],
+ use_pointer_nets: bool,
+ pointer_nets_type: str) -> 'ParallelDataSet':
These variables aren't used, right?
target_embed=target_embed)

# to correctly add source word probabilities based on the source word indices in the vocabulary
# generate corresponding indices matrix
I understand the net effect of the code below, but following the details is difficult. Can you add a comment above each line, describing what exactly that line is doing, and why? Basically, a detailed walk-through that will leave the reader understanding how the code sums up probabilities from the target vocabulary and from pointed words in the source sentence.
Note also that the documentation for MXNet's scatter_nd() function has a call-out claiming the behavior is undefined if there are duplicate indices, which can definitely occur. Do you have any idea of how to address this?
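A NumPy analogue of the duplicate-index problem, for concreteness: with repeated indices, plain fancy-index assignment keeps only one of the colliding writes, while np.add.at accumulates them, which is the summing behavior a pointer net needs when the same source word appears at several positions.

```python
import numpy as np

indices = np.array([2, 2, 4])
updates = np.array([0.3, 0.2, 0.5])

lossy = np.zeros(5)
lossy[indices] = updates            # one of the writes to index 2 is lost

summed = np.zeros(5)
np.add.at(summed, indices, updates) # writes to index 2 are summed: 0.5
```

An accumulating scatter (rather than an overwriting one) would sidestep the undefined-behavior warning entirely.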
@Zarana-Parekh do you have any examples of running architectures with the copy component? Did you succeed in running the configuration from the pointer-generator paper? What would the required parameters be?
Closing this in favor of a similar approach to pointer networks (#697), which is already merged.
Added the pointer networks implementation for summarization models. This is based on the vocabulary approach (for copying probability) as described in:
See, Abigail, Peter J. Liu, and Christopher D. Manning. "Get to the point: Summarization with pointer-generator networks." arXiv preprint arXiv:1704.04368 (2017).
It also includes adapting the coverage mechanism of Sockeye to work with the pointer networks model as given in the paper.
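For reviewers unfamiliar with the approach, the core mixture from See et al. (2017) can be sketched as a NumPy toy (not the actual Sockeye implementation; names and shapes are illustrative):

```python
import numpy as np

def final_distribution(p_gen: float,
                       vocab_dist: np.ndarray,
                       attention: np.ndarray,
                       source_ids: np.ndarray,
                       ext_vocab_size: int) -> np.ndarray:
    """Mix generator and copy distributions:
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention over
    the source positions where w occurs (See et al., 2017)."""
    # pad the vocabulary distribution with zero slots for per-sentence OOVs
    final = np.zeros(ext_vocab_size)
    final[:len(vocab_dist)] = p_gen * vocab_dist
    # scatter-add copy probability mass onto the source token ids;
    # np.add.at sums duplicates (a word attended to at two positions)
    np.add.at(final, source_ids, (1.0 - p_gen) * attention)
    return final
```

With p_gen = 0.5, vocab_dist = [0.5, 0.5], attention [0.6, 0.4] over source ids [1, 2], and extended vocabulary size 3, the result is still a valid distribution summing to 1.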
Pull Request Checklist
- Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]' until you can check this box)
- Unit tests pass (pytest)
- System tests pass (pytest test/system)
- Passed the code style check (./style-check.sh)
- Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.

By submitting this pull request, I confirm that my contribution is made as an Amazon employee.