Large number of duplicated terms #1645

Closed
d0choa opened this issue Jun 26, 2022 · 6 comments
@d0choa

d0choa commented Jun 26, 2022

At least in 3.42 and 3.43, there are a large number of duplicated terms in EFO, mostly affecting rare diseases.

Just by lower-casing the names and looking for exact matches, there are 3036 duplicated terms (v3.42). Some of them are explained by the disease vs phenotype conundrum, but the vast majority correspond to a MONDO vs Orphanet duplication.

Some examples:

Hemophilia Orphanet:448 - hemophilia MONDO:0018660
Fragile X syndrome Orphanet:908 - fragile X syndrome MONDO:0010383
Apert syndrome Orphanet:87 - apert syndrome MONDO:0007041

...
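The check described above (lower-case each name, then look for exact matches) could be sketched roughly as follows. This is only an illustration, not the script used to get the 3036 figure: the `find_duplicate_labels` helper and the `(id, label)` tuple layout are assumptions, and the sample data is limited to the three examples listed in this comment.

```python
from collections import defaultdict


def find_duplicate_labels(terms):
    """Group term IDs by lower-cased label; keep labels shared by 2+ IDs.

    `terms` is an iterable of (term_id, label) pairs (a hypothetical layout,
    not EFO's actual file format).
    """
    by_label = defaultdict(set)
    for term_id, label in terms:
        by_label[label.lower()].add(term_id)
    return {label: sorted(ids) for label, ids in by_label.items() if len(ids) > 1}


# Sample taken from the examples quoted in this issue.
terms = [
    ("Orphanet:448", "Hemophilia"),
    ("MONDO:0018660", "hemophilia"),
    ("Orphanet:908", "Fragile X syndrome"),
    ("MONDO:0010383", "fragile X syndrome"),
    ("Orphanet:87", "Apert syndrome"),
    ("MONDO:0007041", "apert syndrome"),
]

duplicates = find_duplicate_labels(terms)
for label, ids in sorted(duplicates.items()):
    print(label, ids)
```

On a full release, the pairs would come from whatever export of term IDs and labels is at hand; the grouping step stays the same.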

@zoependlington
Collaborator

Hi @d0choa, I believe this is due to the gradual replacement of Orphanet terms with Mondo terms. Each of these duplicates will eventually become an obsoleted Orphanet term with a "replaced by" link to the Mondo term. I will try to prioritise the removal of some of these in time for the July (18th) release.
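As a rough sketch of how a consumer could handle that end state, the snippet below follows "replaced by" links from obsolete terms to the live term, so that only the Mondo ID is counted. The dictionary layout (`obsolete`, `replaced_by` keys) and the `resolve` helper are hypothetical and only for illustration; they are not how EFO stores this information.

```python
def resolve(term_id, terms):
    """Follow replaced_by links from obsolete terms until a live term is reached."""
    seen = set()
    while terms[term_id]["obsolete"] and terms[term_id]["replaced_by"]:
        if term_id in seen:
            # Guard against a replaced_by cycle in malformed data.
            raise ValueError(f"replaced_by cycle at {term_id}")
        seen.add(term_id)
        term_id = terms[term_id]["replaced_by"]
    return term_id


# Hypothetical records for one pair from this issue, in the state
# described above (Orphanet term obsoleted, pointing at the Mondo term).
terms = {
    "Orphanet:448": {
        "label": "Hemophilia",
        "obsolete": True,
        "replaced_by": "MONDO:0018660",
    },
    "MONDO:0018660": {
        "label": "hemophilia",
        "obsolete": False,
        "replaced_by": None,
    },
}

live_ids = sorted({resolve(term_id, terms) for term_id in terms})
print(live_ids)
```

Running a duplicate-name check over `live_ids` only would then skip the obsoleted Orphanet entries.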

@zoependlington zoependlington self-assigned this Jul 1, 2022
zoependlington added a commit that referenced this issue Jul 15, 2022
Further replacement of Orphanet terms with Mondo terms #1645
@zoependlington
Collaborator

The Orphanet terms have now been obsoleted and replaced with Mondo terms, which should fix this duplication as of the July release. Please let me know if it persists.

@ireneisdoomed

I have checked the latest release (3.44.0) and we no longer have Orphanet/MONDO duplication. Thanks @zoependlington!

However, there are still 49 terms that share an identical name after converting them to lowercase. Some of them, like arterial occlusion, might be coming from the disease vs. phenotype conundrum.
[image]

@d0choa
Author

d0choa commented Aug 5, 2022

Three of these have already been fixed in #1698.

Many others remain genuine duplications (e.g. polycystic kidney disease).

@zoependlington zoependlington reopened this Aug 5, 2022
@zoependlington
Collaborator

I will add mappings for the following:

http://purl.obolibrary.org/obo/MONDO_0021184	http://www.ebi.ac.uk/efo/EFO_1001303	deltaretrovirus infections	deltaretrovirus infections
http://purl.obolibrary.org/obo/MONDO_0011014	http://www.ebi.ac.uk/efo/EFO_0009052	Pleuropulmonary blastoma	Pleuropulmonary blastoma
http://purl.obolibrary.org/obo/MONDO_0700092	http://www.ebi.ac.uk/efo/EFO_0010642	neurodevelopmental disorder	neurodevelopmental disorder
http://purl.obolibrary.org/obo/MONDO_0012368	http://www.ebi.ac.uk/efo/EFO_1001981	aminoacylase 1 deficiency	aminoacylase 1 deficiency
http://purl.obolibrary.org/obo/MONDO_0020642	http://www.ebi.ac.uk/efo/EFO_0008620	Polycystic Kidney Disease	Polycystic Kidney Disease
http://purl.obolibrary.org/obo/MONDO_0019165	http://www.ebi.ac.uk/efo/EFO_0009029	Central precocious puberty	Central precocious puberty
http://purl.obolibrary.org/obo/MONDO_0014776	http://www.ebi.ac.uk/efo/EFO_0009059	Spinocerebellar ataxia type 42	Spinocerebellar ataxia type 42

The rest have either been taken care of (the measurement terms pointed out above) or are phenotype vs disease.

zoependlington added a commit that referenced this issue Aug 23, 2022
Added mappings for duplicated EFO Mondo terms for #1645
@zoependlington
Collaborator

These mappings have now been added, so the only remaining duplicates should be between disease and phenotype terms. Please let me know if that isn't the case.
