This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Add coref_resolved method to CorefPredictor #2296

Merged
merged 7 commits into allenai:master on Apr 20, 2019

Conversation

shellshock1911
Contributor

  • Resolves coreferences by producing a document that has had
    its coreferences substituted with their main mentions

  • Ex:
    Personal

    "Charlie wants to buy a game, so he can play it with friends."
    -->
    "Charlie wants to buy a game, so Charlie can play a game
    with friends."

    Possessive

    "Stocks also got a boost after China took steps to encourage
    bank lending and stimulate the country's flagging economy."
    -->
    "Stocks also got a boost after China took steps to encourage
    bank lending and stimulate China's flagging economy."

@shellshock1911
Contributor Author

Addresses #2181 @schmmd

else:
    resolved[coref[0]] = mention_span.text + final_token.whitespace_
    for i in range(coref[0] + 1, coref[1] + 1):
        resolved[i] = ""
Member

What is this doing exactly?

Contributor Author

Since resolved is a list of the tokens in the document with their whitespace, line 96 replaces the first token of the coreference with the mention it refers to, while lines 97-98 go through and mask out all subsequent tokens with "", so that they are ultimately eliminated in the "".join that follows. This procedure is necessary for replacing multi-word coreferences with a single mention, e.g. "the country" with "China" in the example above. This logic is borrowed from lines 262-271 here.
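The replace-and-mask step described above can be sketched in isolation. This is a standalone illustration using plain strings with trailing whitespace in place of spaCy tokens; the function name `resolve_span` and the span format are hypothetical for this sketch, not the PR's actual code.

```python
# Sketch of the replace-and-mask step, using plain strings with trailing
# whitespace in place of spaCy tokens. `coref` is an inclusive
# (start, end) token span; `mention_text` stands in for mention_span.text.
def resolve_span(resolved, coref, mention_text):
    start, end = coref
    # Preserve the trailing whitespace of the span's final token so the
    # joined document stays correctly spaced.
    whitespace = " " if resolved[end].endswith(" ") else ""
    resolved[start] = mention_text + whitespace
    # Mask out the remaining tokens of the multi-word coreference so the
    # final "".join drops them.
    for i in range(start + 1, end + 1):
        resolved[i] = ""
    return resolved

tokens = ["stimulate ", "the ", "country", "'s ", "economy", "."]
print("".join(resolve_span(tokens, (1, 2), "China")))
# stimulate China's economy.
```

Masking with "" keeps the list length stable, so a single linear pass plus one "".join suffices, rather than the quadratic cost of repeatedly removing items from the list.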

Member

Ah--that makes sense. I recommend turning part of your explanation into a comment.

Contributor Author

shellshock1911 commented Jan 7, 2019

Sounds good, yeah, I'll add that comment. resolved.remove(i) would work as well, but it's less efficient (quadratic vs. linear).

# Correctly formats possessive mentions with 's endings
# These include my, his, her, as well as person's, computer's, etc.
if final_token.tag_ in ["PRP$", "POS"]:
    resolved[coref[0]] = mention_span.text + "'s" + final_token.whitespace_
Member

Perhaps spaCy avoids reconstructing a sentence because the possessive suffix depends on the resolved word. E.g. "Michael's" (singular) but "zebras'" (plural). Your logic seems like a reasonable compromise, but I would expect a clear disclaimer.

Contributor Author

shellshock1911 commented Jan 7, 2019

Yes, that is one area where mistakes in grammar might occur, in producing "zebras's" instead of "zebras'". In the spaCy neuralcoref extension, you're right that it doesn't add this reconstruction, and instead replaces possessive pronouns with their main mention directly, e.g. "the country's" becomes "China". I can see how certain applications would benefit from maintaining the possessive form, while others would do fine without it. More if..else style code could be added to guarantee grammatical accuracy in these cases, but it's probably stretching the logic too far and opening up further sources of error.
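As a purely illustrative aside (not code from this PR), the extra branching being discussed could start with a heuristic like the following, which is itself imperfect, e.g. for singular names ending in "s":

```python
# Illustrative heuristic only, not part of the PR: append a bare
# apostrophe when the mention already ends in "s", otherwise "'s".
# Still imperfect: singular nouns ending in "s" (e.g. "Chris") are
# conventionally written "Chris's", which this would get wrong.
def possessive(mention_text):
    if mention_text.endswith("s"):
        return mention_text + "'"
    return mention_text + "'s"

print(possessive("Michael"))  # Michael's
print(possessive("zebras"))   # zebras'
```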

Member

Yep--it would be stretching the logic too far. I recommend just adding a disclaimer about the issue in the comments of this method.

@schmmd
Member

schmmd commented Jan 7, 2019

@shellshock1911 thanks for the examples in the description. I think we should have them as test cases if this is to be merged.

@shellshock1911
Contributor Author

These are the examples I was using for testing; they could serve as test cases here as well:

  • "Charlie wants to buy chocolate. He loves it so much. Julie wants to buy fruit. That is what she loves."
  • "The woman reading a newspaper sat on the bench with her dog."
  • "At the American Economic Association conference in Atlanta Friday morning, Federal Reserve Chairman Jerome Powell reiterated the strengths of the economy but also signaled the Fed would remain flexible in its management of interest rates. The Fed has worked to gradually increase interest rates over the past couple years and signaled that it would hike rates two or three times next year."
  • "Stocks also got a boost after China took steps to encourage bank lending and stimulate the country's flagging economy."
  • "The People's Bank of China announced it would slash the amount of money that banks are required to hold in reserve, the latest in a series of policy changes the government has taken to support growth."

Testing these shows that the method can accurately handle both multi-word main mentions and coreferences, as well as both personal and possessive pronouns. Mistakes in resolution, such as in the third example, result from errors in the underlying model's prediction, and so are out of the scope of this contribution.

@schmmd schmmd self-assigned this Jan 15, 2019
@schmmd
Member

schmmd commented Jan 15, 2019

@shellshock1911 I'd like to merge this but we need two things:

  1. The build breaks must be fixed.
  2. You should add a unit test for this functionality.

Then we're good to go!

@shellshock1911
Contributor Author

shellshock1911 commented Jan 22, 2019

@schmmd - Awesome, I've made changes to fix the linting and type issues, but my unit tests kept failing in the build system, although they pass locally. I looked into it more and realized that the serialized models provided in allennlp/tests/fixtures are much smaller (52K compressed) than the ones in the online docs (57M), so their behavior differs on the same inputs. Are these models meant to demonstrate the expected interface of a particular model, without any accuracy guarantees? If that's the case, should the unit tests check the type and structure of a model's output, but not check for correctness? Thanks.

@schmmd
Member

schmmd commented Jan 22, 2019

@shellshock1911 it looks like there's still a linting issue.

For your test case, I probably wouldn't call a predictor in the test case. I would probably have only one or two examples and provide hard-coded predictions as inputs. The test would then check if resolving the example gives the text you intended.

That way your test will run faster and will be independent of the actual coreference models.

@schmmd
Member

schmmd commented Jan 22, 2019

@shellshock1911 to more directly answer your question--yes the test fixture models are used to test interfaces and not correctness. But for your unit test, I would avoid using any models and instead hard-code predictions.

@shellshock1911
Contributor Author

Thanks for approving this PR!

@schmmd
Member

schmmd commented Jan 25, 2019

Thanks for the hard work! I'm not going to merge it right now as I'm just about to head out for a week vacation--but someone else may or I will as soon as I'm back!

@shellshock1911
Contributor Author

Any thoughts on this? Changes in the upstream introduced a linting issue, but I resolved that and now the build is good 😄.

@matt-gardner
Contributor

@schmmd, looks like you forgot about this - feel free to be brave and click merge if the tests pass =).

@shellshock1911
Contributor Author

Thanks guys! I do still think this would be a useful addition to the coreference predictor. In recent weeks, I've been running a news-article processing pipeline with this fork of AllenNLP, where this in-place resolution method has been applied to millions of articles without apparent error, i.e. no crashes. It'd be awesome to have it in the upstream, though, and I'm sure others could benefit as well, since it automates the coreference-resolution step of pre-processing in a larger NLP pipeline.

@matt-gardner matt-gardner merged commit deae1db into allenai:master Apr 20, 2019
@matt-gardner
Contributor

Thanks @shellshock1911! Sorry this fell through the cracks and took so long to get merged.

reiyw pushed a commit to reiyw/allennlp that referenced this pull request Nov 12, 2019
* Add coref_resolved method to CorefPredictor

* Remove redundant Doc import
TalSchuster pushed a commit to TalSchuster/allennlp-MultiLang that referenced this pull request Feb 20, 2020
* Add coref_resolved method to CorefPredictor

* Remove redundant Doc import