Add the attention visualization to the textual entailment demo #1219
Conversation
murphp15 commented on May 15, 2018
Could I ask that you give your PRs (especially) and commits (less crucially) more descriptive names? Most of us don't have the issue numbers memorized, so the titles aren't very helpful. We typically link to the relevant issue in the description of the PR (which has the benefit of automatically associating the PR with the issue).
Great point, Joel! For example, here I would name the PR "Add the attention visualization to the textual entailment demo" and in the description I would have "Fixes #1033". GitHub treats that phrase specially; see https://help.github.com/articles/closing-issues-using-keywords/.
Fixes #1033
GitHub is weird: you have to edit the original PR description with the "Fixes" message for it to link up.
@matt-gardner Does that look ok to you now?
Yeah, this looks much better, thanks. Future commits on this PR (if there are any) don't need to have the same "Fixes" message, they should have something more descriptive of the contents of that commit. We also have typically just put the "Fixes" message in the original PR description instead of a commit message, but either is fine.
Cool, I'll do this from now on.
Other than switching from a MetadataField to using the Predictor to pass the tokens to the demo, looks great, thanks for the PR!
@@ -71,6 +71,10 @@ def text_to_instance(self, # type: ignore
        hypothesis_tokens = self._tokenizer.tokenize(hypothesis)
        fields['premise'] = TextField(premise_tokens, self._token_indexers)
        fields['hypothesis'] = TextField(hypothesis_tokens, self._token_indexers)
        fields['metadata'] = MetadataField({
Instead of adding a MetadataField that the model doesn't really need, I think this would be cleaner if you just modified the Predictor. The Predictor can tokenize the text and pass the tokens on as output to the demo. That would look something like this:
allennlp/allennlp/service/predictors/sentence_tagger.py
Lines 27 to 39 in f246da7
@overrides
def _json_to_instance(self, json_dict: JsonDict) -> Tuple[Instance, JsonDict]:
    """
    Expects JSON that looks like ``{"sentence": "..."}``.
    Runs the underlying model, and adds the ``"words"`` to the output.
    """
    sentence = json_dict["sentence"]
    tokens = self._tokenizer.split_words(sentence)
    instance = self._dataset_reader.text_to_instance(tokens)
    return_dict: JsonDict = {"words": [token.text for token in tokens]}
    return instance, return_dict
You'd have to grab the _tokenizer from self._dataset_reader in the Predictor (with # pylint: disable=protected-access) and use that to tokenize the text, instead of instantiating a tokenizer in __init__. You'd also need to modify this DatasetReader's text_to_instance method to optionally take pre-tokenized text, which would look something like how passage_tokens is treated here:
allennlp/allennlp/data/dataset_readers/reading_comprehension/squad.py
Lines 74 to 83 in f246da7
@overrides
def text_to_instance(self,  # type: ignore
                     question_text: str,
                     passage_text: str,
                     char_spans: List[Tuple[int, int]] = None,
                     answer_texts: List[str] = None,
                     passage_tokens: List[Token] = None) -> Instance:
    # pylint: disable=arguments-differ
    if not passage_tokens:
        passage_tokens = self._tokenizer.tokenize(passage_text)
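Applied to this PR, a minimal sketch of what the suggested Predictor change might look like. The class and method names here are illustrative (not the actual code from the PR), and tiny stub classes stand in for the real AllenNLP Tokenizer and DatasetReader so the pattern is self-contained:

```python
from typing import Any, Dict, List, Tuple

JsonDict = Dict[str, Any]


class StubTokenizer:
    """Stand-in for the reader's tokenizer (the real one lives on the DatasetReader)."""
    def tokenize(self, text: str) -> List[str]:
        return text.split()


class StubEntailmentReader:
    """Stand-in for the SNLI DatasetReader; text_to_instance accepts pre-tokenized text."""
    def __init__(self) -> None:
        self._tokenizer = StubTokenizer()

    def text_to_instance(self,
                         premise_tokens: List[str],
                         hypothesis_tokens: List[str]) -> Dict[str, List[str]]:
        # The real reader would build TextFields; a dict is enough for the sketch.
        return {"premise": premise_tokens, "hypothesis": hypothesis_tokens}


class EntailmentPredictorSketch:
    """Sketch of a Predictor that tokenizes and passes tokens through to the demo."""
    def __init__(self, dataset_reader: StubEntailmentReader) -> None:
        self._dataset_reader = dataset_reader

    def _json_to_instance(self, json_dict: JsonDict) -> Tuple[Any, JsonDict]:
        # Reuse the reader's tokenizer instead of instantiating one in __init__.
        tokenizer = self._dataset_reader._tokenizer  # pylint: disable=protected-access
        premise_tokens = tokenizer.tokenize(json_dict["premise"])
        hypothesis_tokens = tokenizer.tokenize(json_dict["hypothesis"])
        instance = self._dataset_reader.text_to_instance(premise_tokens, hypothesis_tokens)
        # Pass the tokens on as output so the demo can render attention over them.
        return_dict: JsonDict = {"premise_tokens": premise_tokens,
                                 "hypothesis_tokens": hypothesis_tokens}
        return instance, return_dict
```

The key design point is the same as in sentence_tagger.py: tokenization happens once, in the Predictor, and the same tokens flow both into the Instance and out to the demo, so the attention weights line up with the displayed tokens.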
@@ -164,13 +165,29 @@ def forward(self, # type: ignore
        output_dict = {"label_logits": label_logits, "label_probs": label_probs}
        self.add_insight_to_output_dict(metadata, output_dict, p2h_attention, h2p_attention)
After switching the MetadataField to something on the Predictor, it seems a bit of overkill to add a method just to add the two attention fields to output_dict. I'd just put those two lines inline here.
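Inlined, the two lines could look something like the following sketch, with placeholder lists standing in for the real tensors computed earlier in forward():

```python
# Placeholder values standing in for the tensors computed earlier in forward().
label_logits = [0.2, 0.8]
label_probs = [0.4, 0.6]
p2h_attention = [[0.5, 0.5]]
h2p_attention = [[1.0]]

output_dict = {"label_logits": label_logits, "label_probs": label_probs}
# The two attention fields added inline, with no helper method needed:
output_dict["p2h_attention"] = p2h_attention
output_dict["h2p_attention"] = h2p_attention
```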
hypothesis_tokens = []
if metadata is not None:
    for datum in metadata:
        premise_tokens.append(datum['premise_tokens'][:len(datum['premise_tokens'])-1])
Just FYI (because this should be removed, anyway), the MetadataField does not get padded, so you don't need to do this [:len...] stuff. It will just be the exact list you created in the DatasetReader.
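A small sketch of that point, using an illustrative metadata batch (the dict keys follow the snippet above): since MetadataField contents come back unpadded, each entry is exactly the list the DatasetReader stored, and no slicing is needed:

```python
# Illustrative batch of metadata dicts; MetadataField returns them unpadded.
metadata = [{"premise_tokens": ["A", "cat", "sits"]},
            {"premise_tokens": ["Dogs", "bark"]}]

# No [:len(...)-1] slicing needed: each list is exactly what the reader created.
premise_tokens = [datum["premise_tokens"] for datum in metadata]
```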
PR updated.
1. The predictor is now responsible for tokenizing hypothesis and premise. 2. The model no longer takes the metadata parameter.
Looks great, thanks!
…ai#1219)
* Fixes allenai#1033
* Changes following PR review: 1. The predictor is now responsible for tokenizing hypothesis and premise. 2. The model no longer takes the metadata parameter.
* Removed some extra blank lines
* Fix spacing issues