Translating existing strings into HPO terms #2

Closed
Zethson opened this issue Aug 9, 2021 · 2 comments

@Zethson

Zethson commented Aug 9, 2021

Hi,

I have sets of strings which are not yet in HPO form, but should be translated into HPO terms. In many cases they are (when comparing strings) already very close.

What would be the best way to map these strings to their corresponding HPO terms, ideally with an uncertainty estimate such as the number of character mismatches?

If I search via

for term in Ontology.search(MYOWNTERM):
    print(term.name)

will I get the best matches or are they sorted alphabetically?
Any pointers in general?

@anergictcell
Owner

Hi,
Ontology.search performs a substring match on all term names and synonyms. It only works if your query is a substring of one of them; it does not work for strings with character mismatches.
The results come back in no meaningful order; they are not sorted. This was done for performance reasons, i.e. you get your first result before the search has finished checking all HPO terms in the Ontology.
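
To illustrate the behaviour described above, here is a toy sketch of a lazy substring search. It is not the library's actual code: `toy_search`, the sample names, and the case-insensitive comparison are all assumptions made for this example.

```python
from typing import Iterable, Iterator


def toy_search(query: str, names: Iterable[str]) -> Iterator[str]:
    """Lazily yield every name that contains `query` (case-insensitive)."""
    needle = query.lower()
    for name in names:
        if needle in name.lower():
            # Yielded immediately, before the remaining names are checked
            yield name


# Made-up sample data standing in for term names and synonyms
names = ["Seizure", "Focal seizure", "Abnormal heart morphology"]

# Hits arrive in iteration order, with no ranking by match quality
substring_hits = list(toy_search("seizure", names))

# Two transposed letters are enough to get no hits at all
typo_hits = list(toy_search("siezure", names))
```

Because matches are yielded as soon as they are found, the first hit is simply the first one encountered, not the best one.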

For your use case, you'd have to implement a matching/search function yourself, roughly along these lines:

  1. Get all names and synonyms for every HPO term
  2. Search this list of strings, allowing for mismatches

For (1), you could do something like

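from typing import Dict, List

# HPOTerm and Ontology are assumed to be imported from the library already

# Maps every name and synonym to the term(s) that use it
strings: Dict[str, List[HPOTerm]] = {}

def name_to_term(name: str, term: HPOTerm):
    # Several terms can share a string, so append instead of keeping only the first
    if name not in strings:
        strings[name] = [term]
    else:
        strings[name].append(term)


for term in Ontology:
    name_to_term(term.name, term)
    for syn in term.synonym:
        name_to_term(syn, term)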

Then compare your search query with the keys of strings and return the associated HPOTerms.
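
For step (2), one possible approach is Python's standard-library `difflib`. The sketch below assumes a `strings`-style dictionary as built above, but uses placeholder string values instead of real HPOTerm objects so that it runs on its own. `fuzzy_lookup`, its parameters, and the sample data are hypothetical. The similarity ratio it reports is a score between 0 and 1, not a literal count of character mismatches; for that you would need an edit-distance function instead.

```python
import difflib
from typing import Dict, List, Tuple


def fuzzy_lookup(
    query: str,
    strings: Dict[str, List[str]],
    n: int = 3,
    cutoff: float = 0.8,
) -> List[Tuple[str, float, List[str]]]:
    """Return up to `n` (name, similarity, terms) tuples, best match first."""
    # Compare case-insensitively, but remember the original keys
    lowered: Dict[str, str] = {}
    for key in strings:
        lowered.setdefault(key.lower(), key)

    needle = query.lower()
    # get_close_matches returns candidates sorted by similarity, best first
    matches = difflib.get_close_matches(needle, list(lowered), n=n, cutoff=cutoff)

    results = []
    for match in matches:
        score = difflib.SequenceMatcher(None, match, needle).ratio()
        key = lowered[match]
        results.append((key, round(score, 3), strings[key]))
    return results


# Placeholder data: in practice the values would be lists of HPOTerm objects
sample_strings = {
    "Seizure": ["term-1"],
    "Focal seizure": ["term-2"],
    "Abnormal heart morphology": ["term-3"],
}

close = fuzzy_lookup("Seizur", sample_strings)
nothing = fuzzy_lookup("zzzz", sample_strings)
```

Raising `cutoff` makes the lookup stricter, while lowering it returns more distant candidates that you would then review by hand.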

@Zethson
Author

Zethson commented Aug 11, 2021

Thank you very much for your pointers!
I will reopen this issue if I run into any trouble.
