@mart-r mart-r commented Oct 1, 2025

This PR improves ontology mapping.

Currently, the default is set to a value that is reasonable for most production models.
However, if a different model is loaded (e.g. a fake model for testing), the default causes warnings to pop up because the configuration is incorrect for that model.

The general assumption is that this value is set correctly during model creation time. However, for models created before this config option was introduced, there needs to be a reasonable default.

So this PR introduces an "auto" option for this value.
If set to "auto", the ontologies to map to will be automatically inferred from the data available in cdb.addl_info. This gives models that didn't define this config option a decent starting point.

A few notes on the process of automatically inferring the data from cdb.addl_info:

  • Some values are set by default in addl_info (e.g. at conversion or init time)
    • However, they may actually be empty mappings
    • So the default behaviour is to ignore empty mappings
  • Some cui2<> keys in addl_info don't refer to ontology mappings
    • They refer to other information
    • Such keys are skipped by default
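
The inference rules above can be sketched roughly as follows. This is a minimal illustration, not the merged implementation: the function name `infer_mapped_ontologies`, its parameters, and the sample data are hypothetical, and the ignore set holds only the two names visible in the diff under review, on the assumption that they are matched against the part of the key after `cui2`.

```python
from typing import Any, Mapping

PREFIX = "cui2"


def infer_mapped_ontologies(
    addl_info: Mapping[str, Any],
    ignore: frozenset = frozenset({"ontologies", "original_names"}),
    ignore_empty: bool = True,
) -> list:
    """Guess ontology names from the ``cui2<ont>`` keys in ``addl_info``.

    Hypothetical helper: the name and signature are assumptions.
    """
    found = []
    for key, mapping in addl_info.items():
        if not key.startswith(PREFIX):
            continue  # not a cui2<...> entry at all
        name = key[len(PREFIX):]
        if name in ignore:
            continue  # cui2<...> key that holds other information
        if ignore_empty and not mapping:
            continue  # placeholder created at init or conversion time
        found.append(name)
    return sorted(found)


# Made-up sample data, for illustration only.
addl_info = {
    "cui2snomed": {"C0001": ["73211009"]},
    "cui2icd10": {},  # empty mapping, skipped by default
    "cui2original_names": {"C0001": {"diabetes"}},  # not an ontology
}
print(infer_mapped_ontologies(addl_info))  # ['snomed']
```

Passing `ignore_empty=False` would also return `icd10` in this example, which is the kind of empty default the notes above describe.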

@alhendrickson alhendrickson left a comment

lgtm

its outputs. It will use the mappings in `cdb.addl_info["cui2<ont>"]`
are present.
If set to "auto" (or missign), the value will be inferred from available
Collaborator

Typo missign

Collaborator Author

No, the typo is not missing!

# other ontologies
if self.config.general.map_to_other_ontologies:
for ont in self.config.general.map_to_other_ontologies:
other_onts = self.config.general.map_to_other_ontologies
Collaborator

Minor: since you already have the if statement in the function, I feel you could just always call it to keep things simple here.

other_onts = self._set_and_get_mapped_ontologies()
for ont in other_onts:
    ...

# Then in the function

other_onts = self.config.general.map_to_other_ontologies
if other_onts == "auto":  # It always does this check anyway
    ...

The function you wrote basically does the same if check anyway, and after the first run the value isn't "auto" anymore.

Alternatively, remove the if statement from the function and trust that it's only called at the right time.
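
A rough sketch of the first option, where the caller always goes through the function and "auto" is resolved only once. The `GeneralConfig` and `Mapper` classes here are hypothetical stand-ins rather than the project's real classes, and the ignore list is left out for brevity.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class GeneralConfig:
    # Hypothetical stand-in for config.general
    map_to_other_ontologies: Union[str, list] = "auto"


class Mapper:
    """Hypothetical owner of the config and the addl_info mapping."""

    def __init__(self, general: GeneralConfig, addl_info: dict):
        self.general = general
        self.addl_info = addl_info

    def _set_and_get_mapped_ontologies(self) -> list:
        other_onts = self.general.map_to_other_ontologies
        if other_onts == "auto":
            # Resolve once from non-empty cui2<ont> keys, then store
            # the result so later calls skip this branch.
            other_onts = sorted(
                key[len("cui2"):]
                for key, mapping in self.addl_info.items()
                if key.startswith("cui2") and mapping
            )
            self.general.map_to_other_ontologies = other_onts
        return other_onts

    def mapped_ontologies(self) -> list:
        # The caller needs no "auto" check of its own.
        return [ont for ont in self._set_and_get_mapped_ontologies()]


mapper = Mapper(GeneralConfig(), {"cui2snomed": {"C0001": ["1"]}, "cui2icd10": {}})
print(mapper.mapped_ontologies())  # ['snomed']
```

After the first call the stored config value is a plain list, so an explicitly configured list and a resolved "auto" follow the same path.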

Collaborator Author

Yep, good call.


def _set_and_get_mapped_ontologies(
self,
ignore_list: list[str] = ["ontologies", "original_names",
Collaborator

Minor - probably can use set[str] for the ignore list?

Collaborator Author

If it's a list, it's a list :)
But an ignore set may be better

def setUpClass(cls):
super().setUpClass()
# add "mapping"
cls.model.cdb.addl_info[f"cui2{cls.MY_ONT_NAME}"] = cls.MY_ONT_MAPPING
Collaborator

Minor - can add an explicit test for the ignore_list and ignore_empty parameters for completeness

Collaborator Author

Will do
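
Such tests might look roughly like the sketch below. It exercises a hypothetical stand-in function (`infer`) instead of the real method, so the parameter names `ignore_list` and `ignore_empty` come from the review comment while everything else is an assumption.

```python
import unittest


def infer(addl_info, ignore_list=("ontologies", "original_names"),
          ignore_empty=True):
    """Hypothetical stand-in for the inference logic under test."""
    names = []
    for key, mapping in addl_info.items():
        if not key.startswith("cui2"):
            continue
        name = key[len("cui2"):]
        if name in ignore_list or (ignore_empty and not mapping):
            continue
        names.append(name)
    return sorted(names)


class InferenceOptionTests(unittest.TestCase):
    # Made-up data: one real mapping, one empty, one non-ontology key.
    ADDL_INFO = {
        "cui2my_ont": {"C0001": ["A1"]},
        "cui2empty_ont": {},
        "cui2original_names": {"C0001": {"name"}},
    }

    def test_defaults_skip_empty_and_ignored(self):
        self.assertEqual(infer(self.ADDL_INFO), ["my_ont"])

    def test_ignore_empty_false_keeps_empty_mapping(self):
        self.assertEqual(
            infer(self.ADDL_INFO, ignore_empty=False),
            ["empty_ont", "my_ont"],
        )

    def test_custom_ignore_list(self):
        self.assertEqual(
            infer(self.ADDL_INFO, ignore_list=("my_ont",)),
            ["original_names"],
        )
```

Saved as a module, this can be run with `python -m unittest`.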

@alhendrickson alhendrickson left a comment

Great, thanks!

@mart-r mart-r merged commit f5a7d6c into main Oct 2, 2025
20 checks passed
@mart-r mart-r deleted the feat/medcat/CU-869apb8ju-better-ontology-mapping branch October 2, 2025 12:46
@mart-r mart-r mentioned this pull request Oct 18, 2025