This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Add the attention visualization to the textual entailment demo #1219

Merged
merged 5 commits into from
May 16, 2018

Conversation

murphp15
Contributor

[Screenshot: screen shot 2018-05-15 at 14 50 50]

@joelgrus
Contributor

Could I ask that you give your PRs (especially) and commits (less crucially) more descriptive names? Most of us don't have the issue numbers memorized, so the titles aren't very helpful.

we typically link to the relevant issue in the description of the PR (which has the benefit of automatically associating the PR with the issue)

@schmmd
Member

schmmd commented May 15, 2018

Great point, Joel! For example, here I would name the PR "Add the attention visualization to the textual entailment demo" and in the description I would put "Fixes #1033". GitHub treats that phrase specially; see https://help.github.com/articles/closing-issues-using-keywords/.

@murphp15 murphp15 changed the title Fix for https://github.com/allenai/allennlp/issues/1033 Add the attention visualization to the textual entailment demo May 15, 2018
@murphp15
Contributor Author

Fixes #1033

@matt-gardner
Contributor

GitHub is weird here: you have to edit the original PR description to include the "Fixes #1033" text, instead of adding a new comment with that text.

@murphp15 murphp15 changed the title Add the attention visualization to the textual entailment demo Add the attention visualization to the textual entailment demo Fixes #1033 May 15, 2018
@murphp15 murphp15 changed the title Add the attention visualization to the textual entailment demo Fixes #1033 Add the attention visualization to the textual entailment demo May 15, 2018
@murphp15 murphp15 force-pushed the feature/entailment-heatmap branch from 3ab0f14 to 9b9bddb Compare May 15, 2018 15:37
@murphp15 murphp15 force-pushed the feature/entailment-heatmap branch from 9b9bddb to 9cbd543 Compare May 15, 2018 15:38
@murphp15
Contributor Author

@matt-gardner
I have:

  1. updated the title.
  2. squashed all commits into a single one and renamed it to "Fixes Attention matrix in textual entailment demo #1033".

Does that look ok to you now?

@matt-gardner
Contributor

Yeah, this looks much better, thanks. Future commits on this PR (if there are any) don't need the same "Fixes" message; they should have something more descriptive of the contents of that commit. We typically put the "Fixes" message in the original PR description rather than in a commit message, but either is fine.

@murphp15
Contributor Author

Cool I'll do this from now on.


@matt-gardner matt-gardner left a comment


Other than switching from a MetadataField to using the Predictor to pass the tokens to the demo, looks great, thanks for the PR!

@@ -71,6 +71,10 @@ def text_to_instance(self, # type: ignore
        hypothesis_tokens = self._tokenizer.tokenize(hypothesis)
        fields['premise'] = TextField(premise_tokens, self._token_indexers)
        fields['hypothesis'] = TextField(hypothesis_tokens, self._token_indexers)
        fields['metadata'] = MetadataField({

Instead of adding a MetadataField that the model doesn't really need, I think this would be cleaner if you just modified the Predictor. The Predictor can tokenize the text and pass the tokens on as output to the demo. That would look something like this:

@overrides
def _json_to_instance(self, json_dict: JsonDict) -> Tuple[Instance, JsonDict]:
    """
    Expects JSON that looks like ``{"sentence": "..."}``.
    Runs the underlying model, and adds the ``"words"`` to the output.
    """
    sentence = json_dict["sentence"]
    tokens = self._tokenizer.split_words(sentence)
    instance = self._dataset_reader.text_to_instance(tokens)
    return_dict: JsonDict = {"words": [token.text for token in tokens]}
    return instance, return_dict

You'd have to grab the _tokenizer from self._dataset_reader in the Predictor (with # pylint: disable=protected-access) and use that to tokenize the text, instead of instantiating a tokenizer in __init__. You'd also need to modify this DatasetReader's text_to_instance method to optionally take pre-tokenized text, which would look something like how passage_tokens is treated here:

@overrides
def text_to_instance(self,  # type: ignore
                     question_text: str,
                     passage_text: str,
                     char_spans: List[Tuple[int, int]] = None,
                     answer_texts: List[str] = None,
                     passage_tokens: List[Token] = None) -> Instance:
    # pylint: disable=arguments-differ
    if not passage_tokens:
        passage_tokens = self._tokenizer.tokenize(passage_text)
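The other half of the suggestion, pulling the tokenizer off the DatasetReader inside the Predictor, might look like this pure-Python sketch. The DatasetReader and Predictor classes below are illustrative stand-ins, not the actual AllenNLP classes:

```python
# Pure-Python sketch of reusing the DatasetReader's tokenizer in the Predictor.
# These stub classes are stand-ins for illustration, not the AllenNLP ones.

class DatasetReader:
    def __init__(self, tokenizer):
        self._tokenizer = tokenizer


class Predictor:
    def __init__(self, model, dataset_reader):
        self._model = model
        self._dataset_reader = dataset_reader
        # Reuse the reader's tokenizer instead of instantiating a new one here.
        self._tokenizer = dataset_reader._tokenizer  # pylint: disable=protected-access


reader = DatasetReader(tokenizer=str.split)
predictor = Predictor(model=None, dataset_reader=reader)
```

The point of the design is that the Predictor and the DatasetReader are guaranteed to tokenize identically, because they share one tokenizer object.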

@@ -164,13 +165,29 @@ def forward(self, # type: ignore

        output_dict = {"label_logits": label_logits, "label_probs": label_probs}

        self.add_insight_to_output_dict(metadata, output_dict, p2h_attention, h2p_attention)

After switching the MetadataField to something on the Predictor, it seems a bit of overkill to add a method just to add the two attention fields to output_dict. I'd just put those two lines inline here.
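Inlining those two lines would amount to something like this sketch (the values here are dummy stand-ins; the key names follow the surrounding diff):

```python
# Dummy stand-ins for the model's outputs (names assumed from the diff).
label_logits = [0.2, 0.8]
label_probs = [0.45, 0.55]
p2h_attention = [[0.3, 0.7]]
h2p_attention = [[0.9, 0.1]]

# Build output_dict and attach the attention matrices inline,
# instead of calling a separate add_insight_to_output_dict method.
output_dict = {"label_logits": label_logits, "label_probs": label_probs}
output_dict["h2p_attention"] = h2p_attention
output_dict["p2h_attention"] = p2h_attention
```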

hypothesis_tokens = []
if metadata is not None:
    for datum in metadata:
        premise_tokens.append(datum['premise_tokens'][:len(datum['premise_tokens']) - 1])

Just FYI (because this should be removed, anyway), the MetadataField does not get padded, so you don't need to do this [:len...] stuff. It will just be the exact list you created in the DatasetReader.
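A toy illustration of the point, in plain Python with no AllenNLP: the metadata arrives exactly as the lists created in the DatasetReader, so there is no trailing padding element to slice off.

```python
# Metadata passes through unpadded, exactly as built in the DatasetReader,
# so the [:len(...) - 1] trimming in the diff above is unnecessary.
metadata = [{'premise_tokens': ['a', 'man', 'sleeps']}]
premise_tokens = []
for datum in metadata:
    premise_tokens.append(datum['premise_tokens'])  # the exact list, untrimmed
```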

@murphp15
Copy link
Contributor Author

PR updated.

@murphp15 murphp15 force-pushed the feature/entailment-heatmap branch 2 times, most recently from 91ae8f2 to 77b60ec Compare May 16, 2018 13:14
1. The predictor is now responsible for tokenizing hypothesis and premise.
2. The model no longer takes the metadata parameter.
@murphp15 murphp15 force-pushed the feature/entailment-heatmap branch from 77b60ec to 8461056 Compare May 16, 2018 13:26

@matt-gardner matt-gardner left a comment


Looks great, thanks!

@matt-gardner matt-gardner merged commit 10ea3b3 into allenai:master May 16, 2018
gabrielStanovsky pushed a commit to gabrielStanovsky/allennlp that referenced this pull request Sep 7, 2018
…ai#1219)

* Fixes allenai#1033

* changes following PR review.
1. The predictor is now responsible for tokenizing hypothesis and premise.
2. The model no longer takes the metadata parameter.

* Removed some extra blank lines

* Fix spacing issues

4 participants