There are a number of CHEBI terms that are getting duplicate labels #64

Closed
ddooley opened this issue Jun 1, 2021 · 34 comments

@ddooley

ddooley commented Jun 1, 2021

This was reported in a MAXO issue: monarch-initiative/MAxO#230 (comment)

image

Can you see if they are occurring as leftovers of a CHEBI import or artifacts of a chebi include file?

@kaiiam
Collaborator

kaiiam commented Jun 2, 2021

As reported by @LCCarmody in monarch-initiative/MAxO#230, CDNO is adding labels to CHEBI terms, e.g., calcium and calcium atom above. We actually did this on purpose, to relabel the classes for nutritionists who expect the other labels, following those used in an existing nutrition framework. Perhaps this isn't best practice; I'm not certain. @LCCarmody's issue in MAxO comes in via FoodOn's imports of CHEBI terms that carry additional labels from CDNO.

I see 2 possible options on our side:

  1. Not inject labels in CDNO, but I'm not sure whether that works for CDNO's intended use cases. @LilyAndres @CropStoreDb?

  2. Have FoodOn import only base CDNO, i.e. only CDNO-namespace terms, not CHEBI terms reimported from CDNO. I'm not exactly sure how to go about that, but I think it should be possible using either cdno-base.owl or some other processing with ROBOT imports.

Hope that helps.

@LilyAndres
Collaborator

Thanks for highlighting this @LCCarmody and @ddooley.

As @kaiiam mentioned, in CDNO we wanted to describe nutritional components with hierarchical classifications relevant to the nutrition domain. This would also help Food Composition Tables (FCTs) such as USDA, INFOODS and EuroFIR to associate data. However, some terms in CHEBI have a label that is not used in nutrition or in the FCTs. For this reason, we decided to change the labels to suit a nutritional audience, though that may be incorrect.

See examples below:
CHEBI:131693 - 7,10,13,16-docosatetraenoic acid = CDNO: docosatetraenoic acid
CHEBI:25107 - magnesium atom = CDNO: magnesium
CHEBI:25555 - nitrogen atom = CDNO: nitrogen
CHEBI:62466 - 7,7',9,9'-tetra-cis-lycopene = CDNO: cis-lycopene
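
The relabelling in these examples is what produces two labels on one term once the CHEBI and CDNO assertions are merged downstream. A minimal sketch of how such cases could be listed, assuming the labels have already been extracted into plain tuples (the tuple layout, the function name and the `EXAMPLE:0000001` row are hypothetical and not part of any CDNO tooling):

```python
from collections import defaultdict

# (term ID, rdfs:label, asserting ontology). The CHEBI rows come from the
# examples in this thread; the EXAMPLE row is a made-up single-label control.
LABEL_ASSERTIONS = [
    ("CHEBI:25107", "magnesium atom", "CHEBI"),
    ("CHEBI:25107", "magnesium", "CDNO"),
    ("CHEBI:25555", "nitrogen atom", "CHEBI"),
    ("CHEBI:25555", "nitrogen", "CDNO"),
    ("CHEBI:62466", "7,7',9,9'-tetra-cis-lycopene", "CHEBI"),
    ("CHEBI:62466", "cis-lycopene", "CDNO"),
    ("EXAMPLE:0000001", "hypothetical single-label term", "CDNO"),
]


def find_duplicate_labels(assertions):
    """Return {term_id: sorted labels} for terms with more than one distinct label."""
    labels_by_term = defaultdict(set)
    for term_id, label, _source in assertions:
        labels_by_term[term_id].add(label)
    return {
        term_id: sorted(labels)
        for term_id, labels in labels_by_term.items()
        if len(labels) > 1
    }


if __name__ == "__main__":
    for term_id, labels in find_duplicate_labels(LABEL_ASSERTIONS).items():
        print(term_id, "->", labels)
```

In practice the tuples would come from the merged ontology file rather than a hard-coded list.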

From the 2 possible options that @kaiiam mentioned:

> Not inject labels in CDNO

Might be the best option, as it's more important for FoodOn and CDNO collaborators to maintain the hierarchical classification we have in CDNO; otherwise, FoodOn might end up importing the exhaustive hierarchical classifications from CHEBI (see example Fig2.). Also, if people have the same problem again, CDNO might not be super helpful for other ontologies, especially for FoodOn.

@CropStoreDb do you think option 2 is ok?

@LCCarmody

I need the labels not to be changed for what I am doing. I already import from CHEBI directly, but I was importing these from FOODON for additional context/hierarchy. I am not sure how many are changed, but this also affects the Vitamin branch (Vitamin B, water soluble vitamin, etc.). Maybe the solution is to make a CDNO (or FOODON) term with a logical definition referencing the CHEBI term? Maybe CHEBI term 'and has some role food'? Or some other logical definition? Not sure what would work for your purpose.

@ddooley
Author

ddooley commented Jun 2, 2021

A possible solution is for CDNO to add an IAO 'alternative term' (a kind of synonym) with the nutritionist-friendly label. I'm suggesting that rather than oboInOwl:hasExactSynonym because the latter can get lost in a pile of synonyms. A nice feature of Protege is that you can give 'alternative term' primacy over rdfs:label so the interface shows an entity by that name too.

In GenEpiO I played with a "user interface label" annotation to do the same thing. I was trying to be explicit that the given text was being used in a software interface.

@kaiiam
Collaborator

kaiiam commented Jun 3, 2021

@ddooley that's a reasonable suggestion, but I think doing that depends on CDNO's intended use cases. @LilyAndres @CropStoreDb, can you comment?

If CDNO needs to have those as primary labels then could we modify FOODON's CDNO import not to take injected axioms or APs from CDNO?

@LilyAndres
Collaborator

> I'm suggesting that rather than oboInOwl:hasExactSynonym because the latter can get lost in a pile of synonyms. A nice feature of Protege is that you can give 'alternative term' primacy over rdfs:label so the interface shows an entity by that name too.

Thanks @ddooley, you are correct about the synonyms. I think your solution might work; I'm just trying to find an example so I can ask the team whether this is something we could have.

@kaiiam
Collaborator

kaiiam commented Jun 3, 2021

@matentzn I presume it was bad practice for us to have added additional labels to imported CHEBI terms in CDNO? Would you support the use of alternative term APs instead? Or do you have another suggestion?

@LCCarmody

Maybe it is my ignorance, but I thought if you wanted to change the term, including adding exact synonyms, you should go back to the original ontology (in this case, CHEBI) to make the change. Otherwise, the terms imported from CHEBI and the term imported from CDNO (or other ontologies) have the potential to be different terms.

@kaiiam
Collaborator

kaiiam commented Jun 3, 2021

I think the intention is only to add domain-specific synonyms to CHEBI terms and, in some cases, to simplify the CHEBI hierarchy. I don't think injecting synonyms risks creating different terms; injecting axioms, though, could cause problems for downstream imports.

@LCCarmody

Sorry if I wasn't clear: I suggested adding a new term in CDNO with an axiom referring to the CHEBI term. And yes, that would alter downstream imports of the CDNO term, but it would have no effect on the CHEBI term. If you have domain-specific terminology to add to a term, then it shouldn't be added to a CHEBI term; it should be added to a CDNO term, perhaps a child term of the CHEBI term. Changing the CHEBI term can alter its meaning. Engaging with CHEBI to add your synonym may be an option.

@ddooley
Author

ddooley commented Jun 3, 2021

CDNO was aiming to use an exact synonym basically for its user community - so an annotation - rather than offering its community an entity that has some semantic difference. So I don't think CDNO subclassing to CHEBI is appropriate. The question is how best to offer a label for the term that suits CDNO user community. That's why I was proposing "alternative term" in CDNO. But I am interested if OBOFoundry practice would be to request that of CHEBI directly? Or annotating CDNO alternative term to make clear it originated in CDNO?

@matentzn
Contributor

matentzn commented Jun 3, 2021

We should never change labels of upstream terms, or any annotations for that matter. It is OK to add annotations, but it is good practice to add them as far upstream as possible so that the maximum number of people can benefit from them. If the annotation only applies to a specific user group, it is best to indicate this somehow, maybe using inSubset, or by adding a comment to that effect. Otherwise it becomes cumbersome when we merge ontologies.

As for preferred labels by group, I have seen at least 2 patterns:

  1. Using exact synonym and the inSubset some UserX annotation to mean: this is an exact synonym, and userX (Monarch, AGR, Nico) prefers this over the main label
  2. Using the skos:prefLabel with inSubset the same way as above.

I tend to 1, because prefLabel is widely ignored and idiosyncratic for OBO - exact synonyms are widely used and well understood.
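
Pattern 1 can be pictured from the consumer's side as follows. This is only a sketch under stated assumptions: the `Term` and `Synonym` classes, the `display_label` helper and the subset name `cdno_preferred` are all hypothetical stand-ins, not part of any OBO library.

```python
from dataclasses import dataclass, field


@dataclass
class Synonym:
    text: str
    scope: str = "EXACT"
    subsets: frozenset = frozenset()  # subset tags attached to this synonym


@dataclass
class Term:
    term_id: str
    label: str  # the upstream label, left untouched
    synonyms: list = field(default_factory=list)


def display_label(term, subset):
    """Prefer an EXACT synonym tagged with `subset`; otherwise use the upstream label."""
    for syn in term.synonyms:
        if syn.scope == "EXACT" and subset in syn.subsets:
            return syn.text
    return term.label


# "cdno_preferred" is a made-up subset name used only for illustration.
magnesium = Term(
    "CHEBI:25107",
    "magnesium atom",
    [Synonym("magnesium", "EXACT", frozenset({"cdno_preferred"}))],
)

if __name__ == "__main__":
    print(display_label(magnesium, "cdno_preferred"))
    print(display_label(magnesium, "some_other_group"))
```

The upstream label never changes; each user group simply resolves its own preferred synonym, and any other group falls back to the CHEBI label.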

@ddooley
Author

ddooley commented Jun 4, 2021

The exact synonym + inSubset looks viable informatically, though that won't be usable for Protege display, but I think that can be sacrificed.

I see MAXO, in its axiomatic use of CHEBI Vitamin B etc., was expecting material entities. So we really need material entity vitamers to swap in if possible. See their "vitamin supplementation" branch: https://www.ebi.ac.uk/ols/ontologies/maxo/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FMAXO_0001129&viewMode=All&siblings=false which is suffering from FoodOn's vitamin term deprecation at the moment.

@kaiiam
Collaborator

kaiiam commented Jun 4, 2021

> The exact synonym + inSubset looks viable informatically, though that won't be usable for Protege display, but I think that can be sacrificed.

I think this will depend on whether it works for @CropStoreDb and @LilyAndres. I'm happy with either of @matentzn's suggestions; however, I don't think (and correct me if I'm wrong) that there is a precedent for subsets in CHEBI. As far as I can tell, they only use the 1, 2, 3 star subset properties. In a way, CDNO could be thought of as a nutritionists' subset of CHEBI (plus other relevant terms). Is adding subsets to CHEBI something we should pursue? Or would switching to exact synonym or prefLabel work for now?

To @ddooley's second point about MAXO, we're (@LilyAndres is) working to sort that out; see #57.

@LilyAndres
Collaborator

LilyAndres commented Jun 7, 2021

> Is adding subsets to CHEBI something we should pursue? Or would switching to exact synonym or prefLabel work for now?

Thanks for the suggestion @matentzn, @ddooley, @kaiiam. I think switching to exact synonym or prefLabel would work. I'm having a meeting with the CHEBI team this week, and I could ask Adnan directly whether changing some of the labels would be possible...

@LCCarmody

"I see MAXO in its axiomatic use of CHEBI Vitamin B etc, was expecting material entities. So we really need material entity Vitamers to swap in place if possible: See their "vitamin supplementation" branch: https://www.ebi.ac.uk/ols/ontologies/maxo/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FMAXO_0001129&viewMode=All&siblings=false which is suffering from FoodOn's vitamin term deprecation at moment."

@ddooley should I start a new ticket for this? In Foodon? or is it noted elsewhere? Thanks.

@ddooley
Author

ddooley commented Jun 8, 2021

@LCCarmody it's going to be solved with #57, via an upcoming strategy call with CHEBI, at which point FoodOn can adopt the material references and have the right "term replaced by" reference. FoodOn intended to reference vitamins as groups of vitamers, or particular vitamers, as material entities. We will mirror CHEBI vitamin roles too if it's clear enough they aren't material entities.

@LCCarmody

Thanks. I must have missed that. :)

@LilyAndres
Collaborator

Just to follow this up, I have checked the terms that we have in the 'dietary nutritional component' class to see if we can switch to the original label from CHEBI, which I think is the first strategy we can follow. Then we can see for which terms we can add the exact synonym or prefLabel.

In total, I counted 141 of 455 terms where the original label from CHEBI has been changed in CDNO.

@kaiiam I think we had issues integrating the labels 'ω−3 fatty acid', 'α-linolenic acid', 'γ-linolenic acid', 'δ-tocopherol', etc., and for this reason we changed them to 'omega-3 fatty acid', 'alpha-linolenic acid', 'gamma-linolenic acid', 'delta-tocopherol'. Do you think this will still be a problem?
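
If ASCII-only variants of the Greek-letter labels are ever needed again (for example as extra synonyms), the spelling-out can be done mechanically. A rough sketch, assuming only the handful of letters below matter; the mapping table and function name are illustrative, not an existing CDNO script:

```python
# Greek letters seen in the labels discussed here, plus beta for completeness.
GREEK_TO_LATIN = {
    "α": "alpha",
    "β": "beta",
    "γ": "gamma",
    "δ": "delta",
    "ω": "omega",
}


def transliterate_label(label):
    """Spell out Greek letters and replace the Unicode minus sign with a hyphen."""
    label = label.replace("\u2212", "-")  # U+2212 MINUS SIGN -> ASCII hyphen-minus
    return "".join(GREEK_TO_LATIN.get(ch, ch) for ch in label)


if __name__ == "__main__":
    for original in ["α-linolenic acid", "γ-linolenic acid", "δ-tocopherol"]:
        print(original, "->", transliterate_label(original))
```

Labels without Greek letters pass through unchanged, so the function can be applied to a whole column of labels.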

@ddooley
Author

ddooley commented Jun 17, 2021

@matentzn do you think the OBO community would take to using skos:prefLabel for label additions aimed at providing a label for a local community of use? Or we use exactSynonym and annotate it with "inSubset CDNO" etc?

@matentzn
Contributor

If you ask my opinion, I would say the latter (exactSynonym) has a higher chance of success. The problem with prefLabel is that it really should be singular (only one such annotation), and you may want to define preferred labels for multiple sources.

@kaiiam
Collaborator

kaiiam commented Jun 21, 2021

> @kaiiam I think we had issues integrating the labels 'ω−3 fatty acid', 'α-linolenic acid', 'γ-linolenic acid', 'δ-tocopherol', etc., and for this reason we changed them to 'omega-3 fatty acid', 'alpha-linolenic acid', 'gamma-linolenic acid', 'delta-tocopherol'. Do you think this will still be a problem?

This can be fixed if needed, but we'll be taking the original labels from CHEBI regardless.

> Or we use exactSynonym and annotate it with "inSubset CDNO" etc?

@matentzn and @ddooley do you mean exact synonym and in subset CDNO being added in CHEBI? I don't think there is a precedent for adding subsets to CHEBI.

@matentzn
Contributor

Yes, since Chebi is not truly "open" in the strict sense, or at least let's say, not community managed, you may not be able to get these labels in there. But IMO you won't harm anyone for now, if you add the exact synonym of the CHEBI term to CDNO directly, and add the subset tag on the synonym. Here is an example of that in Mondo: https://www.ebi.ac.uk/ols/ontologies/mondo/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FMONDO_0010929

@kaiiam
Collaborator

kaiiam commented Jun 21, 2021

Thanks @matentzn for setting up the vote in information-artifact-ontology/ontology-metadata#69.

I'm unclear about the MONDO example. I'm not seeing a subset tag on the synonym show up on OLS?

image

Perhaps it's just in the owl file or I'm missing something?

@matentzn
Contributor

It's just odd rendering:

image

I should have shown you the source instead:

[Term]
id: MONDO:0010929
name: craniosynostosis 4
def: "Any craniosynostosis in which the cause of the disease is a mutation in the ERF gene." [MONDO:patterns/disease_series_by_gene]
synonym: "craniosynostosis 4" EXACT [MONDO:Lexical, OMIM:600775]
synonym: "craniosynostosis caused by mutation in ERF" EXACT [MONDO:design_pattern]
synonym: "craniosynostosis type 4" EXACT [MONDORULE:1, OMIM:600775]
synonym: "CRS4" RELATED ABBREVIATION [MONDO:Lexical, OMIM:600775]
synonym: "ERF craniosynostosis" EXACT [MONDO:design_pattern, MONDO:patterns/disease_series_by_gene]
synonym: "ERF-related craniosynostosis" EXACT CLINGEN_PREFERRED [https://clinicalgenome.org/affiliation/40059/, PMID:23354439]
...

@matentzn
Contributor

In particular:

synonym: "ERF-related craniosynostosis" EXACT CLINGEN_PREFERRED [https://clinicalgenome.org/affiliation/40059/, PMID:23354439]
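
To make the parts of that line explicit: in OBO format a synonym tag carries the quoted text, a scope (EXACT, BROAD, NARROW or RELATED), an optional synonym type, and a bracketed xref list, so `CLINGEN_PREFERRED` sits in the type position rather than being a separate kind of scope. Below is a simplified parsing sketch; the regular expression and function name are illustrative, and it does not handle escaped commas inside xrefs or trailing `{...}` qualifiers.

```python
import re

# Simplified pattern for: synonym: "text" SCOPE [TYPE] [xref, xref, ...]
SYNONYM_RE = re.compile(
    r'^synonym:\s+"(?P<text>(?:[^"\\]|\\.)*)"\s+'
    r'(?P<scope>EXACT|BROAD|NARROW|RELATED)'
    r'(?:\s+(?P<type>[^\s\[]+))?'
    r'\s+\[(?P<xrefs>.*)\]\s*$'
)


def parse_synonym(line):
    """Split one OBO synonym line into text, scope, optional type and xrefs."""
    match = SYNONYM_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an OBO synonym line: {line!r}")
    xrefs = [x.strip() for x in match.group("xrefs").split(",") if x.strip()]
    return {
        "text": match.group("text"),
        "scope": match.group("scope"),
        "type": match.group("type"),  # None when no synonym type is given
        "xrefs": xrefs,
    }


if __name__ == "__main__":
    line = (
        'synonym: "ERF-related craniosynostosis" EXACT CLINGEN_PREFERRED '
        "[https://clinicalgenome.org/affiliation/40059/, PMID:23354439]"
    )
    print(parse_synonym(line))
```

A term can therefore carry many EXACT synonyms, with the type marking which one a particular group prefers.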

@LilyAndres
Collaborator

@matentzn thanks a lot for the explanation and @kaiiam for following this up.

Just one question, is "EXACT CLINGEN_PREFERRED" a sub-type of ExactSynonym? Sorry, I can't find much information about it.

So 'craniosynostosis 4' then can have multiple ExactSynonyms but "EXACT CLINGEN_PREFERRED" is like the preferred one?

@Graham-J-King

Graham-J-King commented Jun 22, 2021 via email

@kaiiam
Collaborator

kaiiam commented Jun 22, 2021

I'm guessing the CLINGEN_PREFERRED is an annotation property on the synonym to show that that synonym is the preferred one from that source.

I think in our case (unless @CropStoreDb you have an objection to this) we should keep existing CHEBI labels as they are, so as not to change things or create confusion, and then have our preferred labels as exact synonyms so that they'll show up in an obvious way on OLS when people search for them. For example, in CDNO, for the CHEBI term CHEBI_27552 (original label: 4'-methoxy-5,7-dihydroxyflavanone), we added the label isosakuranetin, so the label is currently duplicated; OLS renders only our added one (not the original one).

image

You can also see that it renders exact synonyms in the synonyms box below. I think we should stop re-asserting labels (i.e. keep the original one) and then add those current labels (e.g., isosakuranetin) as extra exact synonyms within CDNO. That way we don't change CHEBI but add our preferred term labels in a still noticeable way.

Thoughts?
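
As a concrete picture of that proposal, here is a small sketch of the per-term change: keep the CHEBI label and demote the CDNO label to an exact synonym. The function name and the dictionary shape are hypothetical, chosen only for illustration.

```python
def restore_upstream_label(chebi_label, cdno_label, synonyms=()):
    """Keep the CHEBI label; move a differing CDNO label into the exact synonyms."""
    exact_synonyms = list(synonyms)
    # Only add the CDNO label if it actually differs and is not already listed.
    if cdno_label != chebi_label and cdno_label not in exact_synonyms:
        exact_synonyms.append(cdno_label)
    return {"label": chebi_label, "exact_synonyms": exact_synonyms}


if __name__ == "__main__":
    print(
        restore_upstream_label(
            "4'-methoxy-5,7-dihydroxyflavanone",  # original CHEBI label
            "isosakuranetin",                     # label previously asserted by CDNO
        )
    )
```

Terms whose CDNO label already matches CHEBI come back unchanged, so the function is safe to run over every imported term.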

@Graham-J-King

I agree, let's use exact synonyms as the way forward

@matentzn
Contributor

If you want to follow this method, do this:

  1. Create a property under http://www.geneontology.org/formats/oboInOwl#SubsetProperty, named, say, http://purl.obolibrary.org/obo/cdno#crop_db
  2. Don't worry about anything else; in particular, do not give it a label. You may give it an rdfs:comment.
  3. Create your exact synonym using http://www.geneontology.org/formats/oboInOwl#hasExactSynonym the way you usually would.
  4. Then click on the @ symbol next to the annotation, e.g.
    image
  5. Click on the + to add another annotation, select the oio:inSubset property, then click on Entity IRI on the left and select the subset (in my case I refer to phenotype_rcn).
    image
  6. Lastly, click OK. You should see this:
    image

This should be it!

I would recommend using ROBOT templates for this; it makes the process infinitely easier :D

@kaiiam
Collaborator

kaiiam commented Jun 23, 2021

Thanks @matentzn, our workflow already makes use of ROBOT templates, so we can just add a column there with the new AP attached to the preferred exact synonym AP.
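
A sketch of what that extra column could look like, written as a small Python generator for the template TSV. Several things here are assumptions to verify against the ROBOT template documentation before use: that `A oboInOwl:hasExactSynonym` creates the synonym annotation, that a `>AI` column annotates the axiom produced by the column to its left with an IRI value, and that the `oboInOwl` prefix is available by default. The subset IRI is the example one suggested earlier in this thread, and the column headers are made up.

```python
import csv
import io

# Example subset property IRI suggested in this thread; replace with the real one.
SUBSET_IRI = "http://purl.obolibrary.org/obo/cdno#crop_db"

# Row 1: human-readable headers (arbitrary). Row 2: ROBOT template strings.
HEADER = ["ID", "Preferred synonym", "Synonym subset"]
TEMPLATE_ROW = ["ID", "A oboInOwl:hasExactSynonym", ">AI oboInOwl:inSubset"]


def build_template(rows):
    """Return template TSV text for (term ID, preferred synonym) pairs."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerow(TEMPLATE_ROW)
    for term_id, synonym in rows:
        # The third column tags the synonym axiom with the subset.
        writer.writerow([term_id, synonym, SUBSET_IRI])
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_template([("CHEBI:25107", "magnesium"), ("CHEBI:25555", "nitrogen")]))
```

The resulting file would then be passed to the existing template step in the build, alongside the other template columns.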

@LilyAndres
Collaborator

Awesome, thanks @matentzn we will follow the recommendations.

@LilyAndres
Collaborator

This issue has been solved in the new release: I used hasRelatedSynonym and hasExactSynonym and integrated the original labels from CHEBI. I will close this issue now.
