refactor(medcat): CU-869b44wz8 Better internal components #219
…nents (e.g NER and Linker)
tomolopolis left a comment
Looks good. Are there no edits required in ner/*.py to use the new predict_ents? It looks like they all use maybe_annotate_name, which didn't change?
Does this change remove the side effect on doc.ner_ents and doc.linked_ents? If it did, it might be worth adding a test for that here.
There's a slight difference in how
You're right, a few additional tests here would be beneficial. The existing test suite runs, so I'm reasonably confident everything is working as expected. But explicitly testing that these methods don't have side effects would certainly help. I'll get to that tomorrow.
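A minimal sketch of what such a side-effect test could look like. It uses stand-in classes rather than the real MedCAT types: FakeEntity, FakeDocument, ToyNER and check_no_side_effects are hypothetical names made up for illustration, and only the predict_entities method name and the ner_ents / linked_ents attributes are taken from this PR.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEntity:
    # Hypothetical stand-in for MutableEntity
    text: str
    start: int
    end: int


@dataclass
class FakeDocument:
    # Hypothetical stand-in for MutableDocument
    text: str
    ner_ents: list = field(default_factory=list)
    linked_ents: list = field(default_factory=list)


class ToyNER:
    """Toy component that tags every whitespace token found in a known set."""

    def __init__(self, names):
        self.names = {name.lower() for name in names}

    def predict_entities(self, doc, ents=None):
        found = []
        pos = 0
        for token in doc.text.split():
            # Locate the token at or after the previous match
            start = doc.text.index(token, pos)
            end = start + len(token)
            pos = end
            if token.lower() in self.names:
                found.append(FakeEntity(token, start, end))
        # Results are returned; nothing is written to the document
        return found


def check_no_side_effects(component, doc):
    """Run predict_entities and assert the document's lists are unchanged."""
    before_ner = list(doc.ner_ents)
    before_linked = list(doc.linked_ents)
    predicted = component.predict_entities(doc, None)
    assert doc.ner_ents == before_ner
    assert doc.linked_ents == before_linked
    return predicted
```

In a real test the stand-ins would be replaced by an actual component and document, with the same before/after comparison around the call.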
…D (i.e old API) is used to preserve previous functionality
This PR is intended to improve internal components.
The issue is that the current setup requires a new NER or linking component to re-implement quite a tightly coupled setup to actually set ner_ents or linked_ents on the MutableDocument. This tight coupling wasn't well documented, nor did it make for a good experience when writing a new component.
Thus, this PR attempts to improve the situation by:
predict_entities(MutableDocument, list[MutableEntity] | None) -> list[MutableEntity]
MutableDocument.ner_ents or MutableDocument.linked_ents
linked_ents and will continue to do so
linked_ents in medcat.utils.postprocessing.create_main_ann
ner_ents in medcat.components.ner.vocab_based_annotator.maybe_annotate_name
MutableDocument.ner_ents and MutableDocument.linked_ents
ner_ents and linked_ents are used
ner_ents list would be changed alongside linked_ents during linking
PS:
This PR (currently) changes the signatures of medcat.utils.postprocessing.create_main_ann and medcat.components.ner.vocab_based_annotator.maybe_annotate_name. The former now requires a list of the entities as input (instead of reading it from the document). The latter now requires a current ID, since it no longer reads the length of the existing list.
The later versions address the above by:
create_main_ann
filter_linked_annotations
maybe_annotate_name
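As a rough illustration of the predict_entities contract from the description above, the sketch below has each component return a list of entities and leaves storing them on the document to the caller. Everything except the predict_entities name and the ner_ents / linked_ents attribute names is an assumption: Entity, Document, KeywordNER, DictLinker and run_pipeline are hypothetical stand-ins, not MedCAT classes, and the "C1" identifier is a placeholder value.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    # Hypothetical stand-in for MutableEntity
    text: str
    cui: Optional[str] = None


@dataclass
class Document:
    # Hypothetical stand-in for MutableDocument
    text: str
    ner_ents: list = field(default_factory=list)
    linked_ents: list = field(default_factory=list)


class KeywordNER:
    """Toy NER step: returns one entity per known keyword in the text."""

    def __init__(self, keywords):
        self.keywords = {k.lower() for k in keywords}

    def predict_entities(self, doc, ents=None):
        return [Entity(w) for w in doc.text.split() if w.lower() in self.keywords]


class DictLinker:
    """Toy linking step: looks up each incoming entity in a name-to-ID map."""

    def __init__(self, name2cui):
        self.name2cui = name2cui

    def predict_entities(self, doc, ents=None):
        linked = []
        for ent in ents or []:
            cui = self.name2cui.get(ent.text.lower())
            if cui is not None:
                # Build a new entity so the incoming list is left untouched
                linked.append(Entity(ent.text, cui))
        return linked


def run_pipeline(doc, ner, linker):
    # The caller, not the component, decides what is stored on the document
    doc.ner_ents = ner.predict_entities(doc, None)
    doc.linked_ents = linker.predict_entities(doc, doc.ner_ents)
    return doc
```

With this shape, a new component only has to implement predict_entities, and the NER results stay unchanged when linking runs.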