Replace leftover UMLS custom output generation with write_compendia() #126

Open
gaurav opened this issue Apr 23, 2023 · 0 comments
gaurav commented Apr 23, 2023

The leftover UMLS file currently replicates the compendium and synonym generation code in babel_utils.write_compendia():

# Write this UMLS term to UMLS.txt as a single-identifier term.
cluster = {
    'type': biolink_type,
    'identifiers': [{
        'i': umls_id,
        'l': label,
    }]
}
compendiumf.write(json.dumps(cluster) + "\n")
umls_ids_in_this_compendium.add(umls_id)
logging.debug(f"Writing {cluster} to {compendiumf}")

logging.info(f"Wrote out {len(umls_ids_in_this_compendium)} UMLS IDs into the leftover UMLS compendium.")
reportf.write(f"Wrote out {len(umls_ids_in_this_compendium)} UMLS IDs into the leftover UMLS compendium.\n")
logging.info(f"Found {count_no_umls_type} UMLS IDs without UMLS types and {count_multiple_umls_type} UMLS IDs with multiple UMLS types.")
reportf.write(f"Found {count_no_umls_type} UMLS IDs without UMLS types and {count_multiple_umls_type} UMLS IDs with multiple UMLS types.\n")

# Write out synonyms for all IDs in this compendium.
synonym_ids = set()
count_synonyms = 0
with open(synonyms, 'r') as synonymsf, open(umls_synonyms, 'w') as umls_synonymsf:
    for line in synonymsf:
        id, relation, synonym = line.rstrip().split('\t')
        if id in umls_ids_in_this_compendium:
            synonym_ids.add(id)
            count_synonyms += 1
            umls_synonymsf.write(f"{id}\t{relation}\t{synonym}\n")

logging.info(f"Wrote {count_synonyms} synonyms for {len(synonym_ids)} UMLS IDs into the leftover UMLS synonyms file.")
reportf.write(f"Wrote {count_synonyms} synonyms for {len(synonym_ids)} UMLS IDs into the leftover UMLS synonyms file.\n")

This means that whenever the output formats change, we need to update them in two places: write_compendia() and the leftover UMLS code.

We should replace this by having leftover UMLS generate an ID file (e.g. in intermediate/leftover_umls/ids/UMLS) and then using babel_utils.write_compendia() to generate the compendia and synonyms files.
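As a rough sketch of the first half of that proposal, the leftover UMLS step could be reduced to writing an ID file and nothing else. Everything here is an assumption made for illustration: the helper name `write_leftover_umls_ids`, the tab-separated "ID, Biolink type" line layout, and the example identifiers are all hypothetical, and the later call to write_compendia() is left out because its signature is not given in this issue.

```python
import os
import tempfile


def write_leftover_umls_ids(leftover_types, ids_path):
    """Write one tab-separated "UMLS ID, Biolink type" line per leftover ID.

    `leftover_types` maps each leftover UMLS ID to its Biolink type.
    The line layout is an assumption for this sketch, not a confirmed
    Babel ID file format. Returns the number of IDs written.
    """
    # Create the target directory (e.g. intermediate/leftover_umls/ids) if needed.
    os.makedirs(os.path.dirname(ids_path) or ".", exist_ok=True)
    with open(ids_path, "w") as idsf:
        # Sort so that the output is stable between runs.
        for umls_id in sorted(leftover_types):
            idsf.write(f"{umls_id}\t{leftover_types[umls_id]}\n")
    return len(leftover_types)


if __name__ == "__main__":
    # Made-up identifiers and types, used only to exercise the sketch.
    example = {
        "UMLS:C0000002": "biolink:Disease",
        "UMLS:C0000001": "biolink:Gene",
    }
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "intermediate", "leftover_umls", "ids", "UMLS")
        count = write_leftover_umls_ids(example, path)
        print(f"Wrote {count} leftover UMLS IDs to {path}")
```

With something like this in place, the compendium and synonym writing shown above could be deleted from leftover UMLS, so the output formats would be defined only in write_compendia().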

gaurav added a commit that referenced this issue Apr 23, 2023
This is all pretty hacky, but I'll clean it up in #126
gaurav added a commit that referenced this issue May 16, 2023
This is all pretty hacky, but I'll clean it up in #126
@gaurav gaurav added this to the Babel December Release milestone Jun 8, 2023