-
Notifications
You must be signed in to change notification settings - Fork 4
feat(medcat): CU-869apb8ju Better ontology mapping #160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(medcat): CU-869apb8ju Better ontology mapping #160
Conversation
alhendrickson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
medcat-v2/medcat/config/config.py
Outdated
| its outputs. It will use the mappings in `cdb.addl_info["cui2<ont>"]` | ||
| are present. | ||
| If set to "auto" (or missign), the value will be inferred from available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo missign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the typo is not missing!
medcat-v2/medcat/cat.py
Outdated
| # other ontologies | ||
| if self.config.general.map_to_other_ontologies: | ||
| for ont in self.config.general.map_to_other_ontologies: | ||
| other_onts = self.config.general.map_to_other_ontologies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor, I'm kind of feeling like as you already have the if statement in the function, you can just always call it to keep it simple here.
other_onts =self._set_and_get_mapped_ontologies()
for ont in other_onts
...
# Then in the function
other_onts = self.config.general.map_to_other_ontologies
if other_onts == "auto": # It always does this check anywayAs the function you wrote basically does the same if statement, and after the first run it isn't auto anymore.
As the alternative, remove the if statement from the function and trust that it's only called at the right time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good call.
medcat-v2/medcat/cat.py
Outdated
|
|
||
| def _set_and_get_mapped_ontologies( | ||
| self, | ||
| ignore_list: list[str] = ["ontologies", "original_names", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor - probably can use set[str] for the ignore list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's a list, it's a list :)
But an ignore set may be better
| def setUpClass(cls): | ||
| super().setUpClass() | ||
| # add "mapping" | ||
| cls.model.cdb.addl_info[f"cui2{cls.MY_ONT_NAME}"] = cls.MY_ONT_MAPPING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor - can add an explicit test test for the ignore_list and ignore_empty parameters for completeness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
alhendrickson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks!
This PR improves ontology mapping.
Currently, the default was set to a reasonable value for most production models.
However, if a different model was loaded (i.e a fake model for testing), the default would cause warnings to pop up due to an incorrect configuration.
The general assumption is that this value is set correctly during model creation time. However, for models created before this config option was introduced, there needs to be a reasonable default.
So this PR introduces a
"auto"option for this value.If set to
"auto", the ontologies to map to will be automatically inferred from the data available incdb.addl_info. This will allow models that didn't define this config option to have a decent starting point.A few notes on the process of automatically inferring the data from
cdb.addl_info:addl_info(e.g at conversion or init time)cui2<>keys inaddl_infodon't refer to ontology mappings