This is more of a general question or feature request.
textacy.extract.subject_verb_object_triples() is really interesting and useful, but I notice that for a lot of texts it returns triples with pronouns in the subject or object. For most NLP tasks, these anaphora need to be resolved to one of the discrete nouns seen earlier. Is there anything in textacy to accomplish this?
A naive approach would be to iterate over the results, track the last non-anaphora entity, and replace all subsequent anaphora with that entity. This will miss cases where the anaphora refers to the object or verb, but it's better than nothing.
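To make the naive approach concrete, here is a minimal sketch. It assumes the triples have already been converted to (subject, verb, object) tuples of plain strings, which is not what textacy returns directly, and it uses a small hand-written pronoun list. The names PRONOUNS and resolve_naively are hypothetical and not part of textacy.

```python
# Hypothetical helper, NOT part of textacy. Assumes each triple is a
# (subject, verb, object) tuple of plain strings.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}


def resolve_naively(triples):
    """Replace pronoun subjects/objects with the last non-pronoun entity seen."""
    last_entity = None
    resolved = []
    for subj, verb, obj in triples:
        parts = []
        for part in (subj, obj):
            if part.lower() in PRONOUNS:
                # Substitute the most recent entity; keep the pronoun if none yet.
                parts.append(last_entity if last_entity is not None else part)
            else:
                last_entity = part
                parts.append(part)
        resolved.append((parts[0], verb, parts[1]))
    return resolved


triples = [("Alice", "bought", "a car"), ("She", "drove", "it")]
print(resolve_naively(triples))
# [('Alice', 'bought', 'a car'), ('a car', 'drove', 'a car')]
```

The second triple shows how easily this misfires: "She" is replaced with "a car" because that was the most recent entity, so the heuristic is only a rough starting point.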
Hey @chrisspen, thanks for the feature request. I feel your pain... I've actually tried the "naive approach" you mentioned, but found its results too poor to include in textacy. And doing anaphora resolution well is sufficiently hard that I never got around to tackling it.
So, I'll add this back into my backlog. It would be a very useful thing to have! If you have any ideas / resources, don't hesitate to post here.