
Fix division by zero when there are zero-length spans in MismatchedEmbedder. #4615

Merged
merged 6 commits into allenai:master from dwadden's master branch on Sep 1, 2020

Conversation

@dwadden commented Aug 30, 2020

Fixes #4612 and adds a unit test to confirm that missing or exotic tokens do not lead to NaN gradients in the BERT-style embedder.

Fix `clamp_min` on embeddings.

Implement MattG's fix for NaN gradients in MismatchedEmbedder.
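
For context, the fix clamps the span-length denominator before the mean-pooling division, so zero-length spans produce zero vectors instead of NaNs. A minimal sketch of the idea (tensor names here are illustrative, not necessarily those in the AllenNLP source):

import torch

# span_embeddings: (batch, num_orig_tokens, max_span_len, embedding_dim)
# span_mask: (batch, num_orig_tokens, max_span_len, 1); all zeros for a zero-length span
span_embeddings_sum = (span_embeddings * span_mask).sum(2)
span_embeddings_len = span_mask.sum(2)
# clamp_min(1) keeps the denominator nonzero; without it, 0 / 0 yields NaN,
# and the NaN propagates into the gradients during the backward pass
orig_embeddings = span_embeddings_sum / torch.clamp_min(span_embeddings_len, 1)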
@matt-gardner changed the title from "Implement MattG's fix for NaN gradients in MismatchedEmbedder." to "Fix division by zero when there are zero-length spans in MismatchedEmbedder." on Aug 31, 2020
@matt-gardner (Contributor) left a comment:


Thanks for doing this; it looks great! There are a couple of minor things to clean up in the test, and can you add a simple note to the changelog saying something like "Fixed division by zero error when there are zero-length spans in the input to a mismatched embedder."

Comment on lines 156 to 166
params = Params(
    {
        "token_embedders": {
            "bert": {
                "type": "pretrained_transformer_mismatched",
                "model_name": "bert-base-uncased",
            }
        }
    }
)
token_embedder = BasicTextFieldEmbedder.from_params(vocab=vocab, params=params)
@matt-gardner (Contributor):


Can you just make this:

Suggested change
-params = Params(
-    {
-        "token_embedders": {
-            "bert": {
-                "type": "pretrained_transformer_mismatched",
-                "model_name": "bert-base-uncased",
-            }
-        }
-    }
-)
-token_embedder = BasicTextFieldEmbedder.from_params(vocab=vocab, params=params)
+token_embedder = BasicTextFieldEmbedder({"bert": PretrainedTransformerMismatchedEmbedder("bert-base-uncased")})

No need to use Params here. (Be sure to run black on that line since it might be too long, and this might require adding some imports above.)
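
For what it's worth, the imports that suggestion needs would look roughly like this (module paths as in AllenNLP 1.x; treat this as a sketch, not the exact test code):

from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder
from allennlp.modules.token_embedders import PretrainedTransformerMismatchedEmbedder

# equivalent to the from_params construction, without the Params indirection
token_embedder = BasicTextFieldEmbedder(
    {"bert": PretrainedTransformerMismatchedEmbedder("bert-base-uncased")}
)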

@dwadden (Author):


Changed.

Fixed division by zero error when there are zero-length spans in the input to a mismatched embedder.
@dwadden
@dwadden (Author) commented Aug 31, 2020:

Changes have been made. There are two transformer parameters that have None gradients. Not sure if this is a bug or not. Let me know.
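
For reference, the gradient check in question can be sketched like this (the model setup and names are illustrative, not the exact test code; parameters with None grads, such as an unused pooler head, are skipped rather than treated as failures):

import torch

# text: a batch of indexed tokens that includes empty / exotic inputs
embeddings = token_embedder(text)
embeddings.sum().backward()

for name, param in token_embedder.named_parameters():
    if param.grad is None:
        # parameters outside the computation graph legitimately
        # receive no gradient at all
        continue
    assert not torch.isnan(param.grad).any(), f"NaN gradient in {name}"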

@matt-gardner (Contributor):

Thanks @dwadden! I tried pushing a couple of small final fixes to get this to pass CI, but for some reason it didn't let me (maybe because this is from your master branch). I was able to do one of them from the web UI, but I can't add the changelog statement that I mentioned in my comment above. Can you do that? Then I'll merge this.

@dwadden (Author) commented Sep 1, 2020:

OK, I added a message to the changelog. I also got confused and ran `git push --force` on my fork, which clobbered your formatting changes. I ran black to get them back, so the linter and type checker now pass.

@matt-gardner matt-gardner merged commit 711afaa into allenai:master Sep 1, 2020
@matt-gardner (Contributor):

Thanks again!

Successfully merging this pull request may close these issues.

PretrainedTransformerMismatchedIndexer fails silently when given empty strings as input.