Ensure consistent use of CHEBI for design patterns #977

Closed
kaiiam opened this issue Jun 19, 2020 · 25 comments
Labels
Modify term pattern Issue requires use of pattern-driven routines

Comments

@kaiiam
Contributor

kaiiam commented Jun 19, 2020

@ramonawalls suggested I be really clear about the use of the CHEBI molecular entity and atom hierarchies, i.e. which would be the correct one to use when referring to measurements in our concentration terms. @cmungall mentioned it would be better to be consistent and just use one.

I asked the CHEBI team, and they responded that we should use terms from the molecular entity hierarchy (e.g. elemental cadmium) instead of atom terms (e.g. cadmium atom). As such, I think we should be consistent when importing CHEBI terms for use in our DOSDPs.

I noticed this issue in some of the concentration terms @stevenchong and I had made to address #721, e.g. ENVO:3200027 which is set to include lanthanum atom.

I also noticed this in an ongoing PR @pbuttigieg is working on, which adds terms like ENVO:3100043,,CHEBI:27594,carbon atom,ENVO:00002149,sea water

Finally I also noticed the use of atom terms in the entity_attribute_location pattern, e.g. solubility of nitrogen atom in water.

Let me know if you guys think we should commit to just using terms from the CHEBI molecular entity hierarchy.
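One way to keep imports consistent is to lint the pattern fill-in files for atom-branch fillers. The sketch below is a minimal, hypothetical example: it assumes a comma-separated file whose columns follow the row quoted above, and both the column names (defined_class, entity, entity_label, etc.) and the heuristic "labels ending in ' atom' come from the atom branch" are assumptions for illustration, not ENVO's actual DOSDP headers or a CHEBI guarantee.

```python
import csv
import io

# Hypothetical column names; the real ENVO DOSDP data-file headers may differ.
COLUMNS = ["defined_class", "defined_class_label",
           "entity", "entity_label", "medium", "medium_label"]


def flag_atom_fillers(text: str) -> list[dict]:
    """Return rows whose CHEBI filler label looks like an atom-branch term.

    Heuristic (assumption): atom-hierarchy labels end in " atom", e.g. "carbon atom".
    Isotope classes such as "carbon-14 atom" are flagged too, so a curator can
    decide whether the atom branch is really intended.
    """
    reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS)
    flagged = []
    for row in reader:
        if row["entity"].startswith("CHEBI:") and row["entity_label"].endswith(" atom"):
            flagged.append(row)
    return flagged


# The row quoted above from the in-progress PR.
example = "ENVO:3100043,,CHEBI:27594,carbon atom,ENVO:00002149,sea water\n"
for row in flag_atom_fillers(example):
    print(row["defined_class"], "uses atom-branch term:", row["entity_label"])
```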

@kaiiam kaiiam added Modify term pattern Issue requires use of pattern-driven routines labels Jun 19, 2020
@kaiiam
Contributor Author

kaiiam commented Jun 19, 2020

I guess there might be some cases where we might need to refer to atoms like in the case of stable isotope measurements e.g., carbon-14 atom

@kaiiam
Contributor Author

kaiiam commented Jun 19, 2020

Porting this over from @cmungall

@kaiiam - can you provide more details? (IMHO it's good to do this via tickets on the respective ontology tracker)

It would be good to get a definitive answer with scientific justification for the choice. We need to clearly document this across multiple OBO ontologies that need to represent things at the level of elements (@diatomsRcool @matentzn @bpeters42)

As it happens, I think the molecular entity choice is a good one, because it groups ions, and sometimes these forms are more physiologically relevant.

However, it would be good if the choice were made scientifically rather than just 'this seems to give us the inferences we need'.

This part of CHEBI has always confused me: we have a has-part between the molecular entity and the atom. This implies to me that the molecular entity is a molecule with multiple atoms, but this is not what we want here. See:
[Screenshot: CHEBI has-part link between a molecular entity and its atom]

@kaiiam
Contributor Author

kaiiam commented Jun 19, 2020

@cmungall Here is my exchange with Adnan Malik (posted with permission)

ME:

Hello CHEBI curation team,

My name is Kai Blumberg; I’m a developer for the OBO ontology ENVO. We are making use of CHEBI within our ontology to create terms representing concentrations of chemical entities within environmental materials, e.g., concentration of ammonium in water.

I have a question regarding the intended use of the CHEBI molecular entity and atom hierarchies, specifically the distinction between an atom, e.g., cadmium atom, and the elemental form of that same atom, e.g., elemental cadmium. Which term would be the correct one to use when referring to a measurement of Cd? I would suspect elemental cadmium would be more appropriate, but I’m not sure. In this and similar cases, the elemental terms do not always have annotation properties such as molecular weight, whereas the atom terms do. Do terms like elemental cadmium represent a portion of Cd atoms that doesn't have a fixed formula, net charge, average mass, etc., and therefore lacks these annotation properties?

Much appreciated if you could help us sort this out.

Cheers,
Kai

Adnan

Hi Kai,

Thank you for your recent e-mail. I'm not quite sure what ENVO are measuring. I guess that the total amount of cadmium (in an organism, soil or water sample) is being measured. This may be present as elemental cadmium, or as one or more cadmium salts. But presumably, whatever it is, the results are converted to elemental cadmium equivalent (otherwise reporting 0.1 g/litre of cadmium iodide in one study and 0.1 g/litre of cadmium chloride in another study will be misleading, since iodide makes up a much greater proportion of the total mass than chloride, and hence cadmium a much smaller one).

So on that basis, I would suggest that ENVO use elemental cadmium (CHEBI:37249). I have added a structure, definition and some more information to the entry. However, you are right that a lot of the elements in ChEBI do not have properties (such as mass, monoisotopic mass, etc.) associated with them. Let's take elemental carbon (CHEBI:33415) as an example: there are several different forms of elemental carbon, such as diamond (CHEBI:33417) or graphene (CHEBI:36973). The masses and monoisotopic masses of these different forms of carbon will vary, hence it would be misleading to assign a mass or monoisotopic mass to this entry.

Best Regards,
Adnan

ME:

Thank you very much, Adnan, for the clarification. As I understand it, the measurements we're trying to represent are the results of processes converting an element like Cd to its elemental form and not measuring the associated salts. My question was more in regard to the use of the atom vs. the element term from CHEBI, but it sounds like elements from the molecular entity hierarchy are what we should be using.

Thank you also for responding about the CHEBI properties regarding elements, which, as I suspected, can take on different forms, hence no masses are assigned.

Subsequent to this conversation, they updated elemental cadmium to include additional annotation properties such as average mass.
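Adnan's point about converting results to elemental cadmium equivalents can be made concrete with a short calculation. This is a minimal sketch, assuming rounded standard atomic weights (Cd ≈ 112.414, Cl ≈ 35.453, I ≈ 126.904 g/mol) and the hypothetical 0.1 g/litre salt concentrations from his example:

```python
# Rounded standard atomic weights (g/mol); assumed values for illustration.
ATOMIC_WEIGHT = {"Cd": 112.414, "Cl": 35.453, "I": 126.904}


def molar_mass(formula: dict[str, int]) -> float:
    """Molar mass of a formula given as {element symbol: count}."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in formula.items())


def element_equivalent(conc_g_per_l: float, formula: dict[str, int], element: str) -> float:
    """Mass concentration of `element` contributed by a salt present at conc_g_per_l."""
    mass_fraction = ATOMIC_WEIGHT[element] * formula[element] / molar_mass(formula)
    return conc_g_per_l * mass_fraction


CDCL2 = {"Cd": 1, "Cl": 2}  # cadmium chloride
CDI2 = {"Cd": 1, "I": 2}    # cadmium iodide

print(round(element_equivalent(0.1, CDCL2, "Cd"), 4))  # 0.0613 g/L as Cd
print(round(element_equivalent(0.1, CDI2, "Cd"), 4))   # 0.0307 g/L as Cd
```

The same salt concentration carries roughly twice as much cadmium for the chloride as for the iodide, which is why reporting in cadmium equivalents keeps studies comparable.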

@pbuttigieg
Member

Let me know if you guys think we should commit to just using terms from the CHEBI molecular entity hierarchy.

I think it just depends on what we're talking about. Both are valid.

I guess there might be some cases where we might need to refer to atoms like in the case of stable isotope measurements e.g., carbon-14 atom

Yes, and I'm sure there will be more cases like this.

CHEBI's treatment of "molecular" is bizarre to me - molecules have two or more atoms, and an ion can be monoatomic.

@pbuttigieg
Member

As I understand it, the measurements we're trying to represent are the results of processes converting an element like Cd to its elemental form and not measuring the associated salts.

That's not the case - many (even most) measurement processes do not include a step to convert stuff into elemental forms, but you can calculate the mass (and thus concentration) from the concentrations of the molecules bearing the atom of interest.
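The calculation described here, summing an element's mass over the species that carry it, can be sketched as follows. The species, rounded molar masses and concentrations are assumptions for illustration; nitrogen is used only because nitrate, nitrite and ammonium come up elsewhere in this thread.

```python
NITROGEN = 14.007  # approximate standard atomic weight of N, g/mol

# species -> (approximate molar mass in g/mol, N atoms per formula unit); illustrative values
SPECIES: dict[str, tuple[float, int]] = {
    "nitrate": (62.004, 1),   # NO3-
    "nitrite": (46.005, 1),   # NO2-
    "ammonium": (18.039, 1),  # NH4+
}


def total_nitrogen_mg_per_l(measured: dict[str, float]) -> float:
    """Sum N-equivalent mass (mg N/L) from per-species concentrations in mg/L."""
    total = 0.0
    for species, conc in measured.items():
        molar_mass, n_atoms = SPECIES[species]
        # mg/L divided by mg/mmol gives mmol/L; times atoms and mg N per mmol gives mg N/L
        total += conc / molar_mass * n_atoms * NITROGEN
    return total


print(round(total_nitrogen_mg_per_l({"nitrate": 62.004, "ammonium": 18.039}), 3))  # 28.014
```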

@diatomsRcool
Contributor

Just my 2 cents.....from the ECTO perspective, the use cases we've encountered at the moment call for elemental, rather than atomic. I can see wanting to represent atoms when talking about molecular reactions, but we haven't encountered that need yet. HOWEVER when building the environmental qualities classes you may need to use the atom IF the analysis is measuring the concentration of the atom in seawater, for example.

@stevenchong
Copy link
Collaborator

The Arctic Data Center's use cases were primarily elemental analyses, so good catch. @mpsaloha might want to weigh in on this issue.

@cmungall
Member

cadmium and friends:

[Screenshot: CHEBI classes around cadmium, including its ions]

note the ions are not connected to the molecular entity / atom branch. But the ion form may be more physiologically relevant?

@kaiiam you mentioned ammonia; here is ammonia in the context of nitrogen, so following the proposed pattern, nitrogen-in-soil superClassOf ammonia-in-soil:

[Screenshot: ammonia in the context of nitrogen in CHEBI]

@kaiiam
Copy link
Contributor Author

kaiiam commented Jun 19, 2020

@cmungall isn't multiple inheritance under the ion and molecular entity hierarchies desirable? Ionic forms are physiologically relevant and often measured, e.g. concentration of ammonium in soil, where ammonium is subsumed under both the ion and molecular entity hierarchies, same with phosphate(3-).
[Screenshot: ammonium under both the ion and molecular entity hierarchies]

so following the proposed pattern nitrogen-in-soil superClassOf ammonia-in-soil:

Wouldn't it be better to have nitrogen molecular entity in soil? That would actually be a superclass of ammonia-in-soil, whereas nitrogen atom in soil wouldn't, again coming back to the has part relation between the atom and the molecular entity.

[Screenshot: nitrogen atom and nitrogen molecular entity in CHEBI]
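The subsumption argument can be checked in a toy model. The sketch below assumes, purely for illustration, that "<element> molecular entity" behaves like a defined class (anything with an atom of that element as a part) and that atom-branch classes only subsume themselves; the has-part table is hand-written, not taken from CHEBI.

```python
# Hypothetical has-part table: chemical -> atoms it has as parts (not CHEBI axioms).
HAS_PART_ATOM: dict[str, set[str]] = {
    "ammonia": {"nitrogen atom", "hydrogen atom"},
    "ammonium": {"nitrogen atom", "hydrogen atom"},
    "nitrate": {"nitrogen atom", "oxygen atom"},
    "water": {"hydrogen atom", "oxygen atom"},
}


def subsumed_by(chemical: str, filler: str) -> bool:
    """Is `chemical` a subclass of `filler` in this toy model?"""
    if chemical == filler:
        return True
    if filler.endswith(" molecular entity"):
        element = filler.removesuffix(" molecular entity")
        # Defined-class reading: has part some <element> atom.
        return f"{element} atom" in HAS_PART_ATOM.get(chemical, set())
    # Atom-branch fillers only subsume themselves (and, in CHEBI, their isotopes).
    return False


# A pattern-built "concentration of X in soil" can only sit under
# "concentration of Y in soil" if X is a subclass of Y.
for filler in ("nitrogen molecular entity", "nitrogen atom"):
    verdict = "subsumes" if subsumed_by("ammonia", filler) else "does not subsume"
    print(f"concentration of {filler} in soil {verdict} concentration of ammonia in soil")
```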

@cmungall
Copy link
Member

isn't multiple inheritance under the ion and molecular entity hierarchies desirable?

Yes, MI is usually a good thing. Some ontologists have muddied the waters here.

Ionic forms are physiologically relevant and often measured, e.g. concentration of ammonium in soil, where ammonium is subsumed under both the ion and molecular entity hierarchies, same with phosphate(3-).

Yes, I didn't make my point clearly; this is desired.

Wouldn't it be better to have nitrogen molecular entity in soil

Yes, I was proposing to use the NME class. I don't know what should be used in the label since NME is not very intuitive

@pbuttigieg

CHEBI's treatment of "molecular" is bizarre to me - molecules have two or more atoms, and an ion can be monoatomic.

I agree

@bpeters42

bpeters42 commented Jun 24, 2020 via email

@kaiiam
Contributor Author

kaiiam commented Jul 3, 2020

Porting over @pbuttigieg's comment from here

Following our discussion of using atom vs molecular entity terms, I emailed the CHEBI team and they suggested we use the molecular entity terms. Hence I think we should be consistent and not use these atom terms here.

@kaiiam I'm not sold on this yet

@kaiiam
Contributor Author

kaiiam commented Jul 3, 2020

@pbuttigieg Due to the parallel atom and molecular entity CHEBI hierarchies joined via a has part relation, I'm worried that if we allow ourselves to create concentration terms from both, it could hinder rather than help interoperability. E.g. curators annotating their data would have to pick between terms like concentration of cadmium atom and concentration of cadmium molecular entity. If we allow ourselves to create both, then we won't be helping to make disparate datasets interoperable; instead we would be separating data by the curation choice between confusingly similar concentration terms. Hence my advocacy for only using one CHEBI hierarchy when possible.

CHEBI's treatment of "molecular" is bizarre to me - molecules have two or more atoms, and an ion can be monoatomic.

Perhaps atom terms are preferable to molecular entity terms when describing measurements of elemental forms, as in @diatomsRcool's, @stevenchong's, and the UA-SRC use cases. I'm not sure what's "better"; I just want us to be consistent.

The argument the other way is that, since ions are subsumed under the molecular entity hierarchy, a recursive subclass query for all subclasses of a molecular entity term, e.g. aluminum molecular entity, would give us the ions as well:

[Screenshot: subclasses of aluminum molecular entity, including its ions]

However, due to inconsistencies in CHEBI, this doesn't always seem to hold, e.g. with cadmium where the linkages that would enable us to query and discover cadmium cations are missing.
[Screenshot: cadmium classes in CHEBI, lacking links to the cadmium cations]
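A recursive subclass query is a transitive walk down the is_a edges, so a single missing edge hides a whole subtree. This minimal sketch uses a hypothetical, hand-written hierarchy (labels only, no CHEBI IDs) in which the cadmium cation branch is deliberately left unlinked from cadmium molecular entity to mirror the gap described above.

```python
from collections import deque

# Hypothetical is_a edges (child -> parents); not CHEBI's actual hierarchy.
IS_A: dict[str, list[str]] = {
    "aluminum(3+)": ["aluminum cation"],
    "aluminum cation": ["aluminum molecular entity", "cation"],
    "elemental aluminum": ["aluminum molecular entity"],
    "cadmium(2+)": ["cadmium cation"],
    "cadmium cation": ["cation"],  # missing link to "cadmium molecular entity"
    "elemental cadmium": ["cadmium molecular entity"],
}


def descendants(root: str) -> set[str]:
    """All classes below `root`, following is_a edges downward (breadth-first)."""
    children: dict[str, list[str]] = {}
    for child, parents in IS_A.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    found: set[str] = set()
    queue = deque([root])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


print(sorted(descendants("aluminum molecular entity")))  # ions are found
print(sorted(descendants("cadmium molecular entity")))   # cations are missed
```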

@kaiiam
Contributor Author

kaiiam commented Jul 10, 2020

In conversation with @pbuttigieg and @wdduncan, we're thinking of favoring the molecular entity branch over the atom branch for the majority of cases. Although both are correct, using terms from the molecular entity hierarchy seems more pragmatic, as it contains the various valence states and ions that people require; see the sulfur molecular entity, for example:

[Screenshot: the sulfur molecular entity hierarchy]

In contrast, the atom branch contains the various isotopes the element can have.

[Screenshot: isotopes under the atom branch]

Thus a potential solution would seem to be:

  1) Use terms from the molecular entity hierarchy for the majority of cases, and be as specific as possible, e.g. aluminum(3+) instead of aluminum cation.
  2) Use the atom hierarchy when specifically describing measurements of isotopes, e.g. carbon-14 atom.

@cmungall
Member

I would be cautious about using overly specific ion forms. Again it comes down to do you get the inferences you expect? You might want to write up some competency questions

For GO we use the pH 7.3 form ion subtype, which represents "normal" physiology in a kind of metazoa-biased way. No idea if that translates to e.g. soil or seawater. But if you use the ME class it should give a lot of what you need.

@pbuttigieg
Member

pbuttigieg commented Jul 14, 2020

@cmungall

I would be cautious about using overly specific ion forms. Again it comes down to do you get the inferences you expect? You might want to write up some competency questions

If a method reports on the concentration of a specific ionic form, I think we should use that class regardless of what inferences come out. The ontology should be driven by reality. However, as @kaiiam and @ramonawalls will be using this branch for their work, they may want to explore this via competency questions more closely.

For GO we use the pH 7.3 form ion subtype, which represents "normal" physiology in a kind of metazoa-biased way. No idea if that translates to e.g. soil or seawater.

It could translate to environmental materials other than tissues/cells, but we wouldn't really know. What we would know is that a specific method is reporting on, e.g., the concentration of nitrate (or nitrite, or sulphate, etc). That's enough to build a corresponding class, I think.

But if you use the ME class it should give a lot of what you need

Yes, we're not going to be able to resolve CHEBI's ambiguity on ME:

Any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer etc., identifiable as a separately distinguishable entity.

Atom should be an ME, if that definition is right, and the "etc." isn't really helpful.

So, as @kaiiam notes above:

Converging on...

  • We'll typically default to the ME classes.
    • For example, rather than "sulfur atom" we'll use "sulfur molecular entity" unless we can be more specific (i.e. choose something deeper in the ME hierarchy)

Still need to settle on ...

  • For isotopes, we could either
    • use the relevant class under atom, such as "sulfur-35" or
    • use an axiom like (ME has_part only sulfur-35)

I think the latter is more "correct" but I'm not sure it's worth the complexity given the other issues with these hierarchies.
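The convergence above amounts to a small decision rule for choosing the CHEBI filler of a concentration term. This sketch is hypothetical: it returns labels rather than CHEBI IDs, the Measurement fields are invented for illustration, and the "axiom" option is simply a Manchester-syntax rendering of the proposed (ME has_part only sulfur-35) expression, not a settled modeling decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    """Hypothetical description of what a method reports."""
    element: str                          # e.g. "sulfur"
    specific_form: Optional[str] = None   # deeper ME class the method reports, e.g. "sulfate"
    isotope: Optional[str] = None         # e.g. "sulfur-35"


def chebi_filler(m: Measurement, isotope_style: str = "atom") -> str:
    """Return a label-based class expression for a concentration term's CHEBI filler."""
    if isotope_style not in ("atom", "axiom"):
        raise ValueError(f"unknown isotope_style: {isotope_style}")
    if m.isotope:
        if isotope_style == "atom":
            # Option 1: the isotope class under the atom branch.
            return f"'{m.isotope}'"
        # Option 2: a molecular entity restricted to parts of that isotope.
        return f"'molecular entity' and (has_part only '{m.isotope}')"
    if m.specific_form:
        # Prefer something deeper in the ME hierarchy when the method reports it.
        return f"'{m.specific_form}'"
    # Default: the element's molecular entity class rather than its atom class.
    return f"'{m.element} molecular entity'"


print(chebi_filler(Measurement("sulfur")))
print(chebi_filler(Measurement("sulfur", specific_form="sulfate")))
print(chebi_filler(Measurement("sulfur", isotope="sulfur-35"), isotope_style="axiom"))
```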

Notes

@stevenchong

The Arctic Data Center's use cases were primarily elemental analyses, so good catch. @mpsaloha might want to weigh in on this issue.

Please be sure about that - make sure the methods are actually reporting on a quantity of matter where all the atoms have the same atomic number.

@diatomsRcool

Just my 2 cents.....from the ECTO perspective, the use cases we've encountered at the moment call for elemental, rather than atomic.

This should work out then - the atomic forms are linked via parthood to the MEs.

I can see wanting to represent atoms when talking about molecular reactions, but we haven't encountered that need yet.

That should likely be in a separate ontology. I know there were some that dealt with reactions, but not sure if they're maintained.

HOWEVER when building the environmental qualities classes you may need to use the atom IF the analysis is measuring the concentration of the atom in seawater, for example.

This is the confusing bit - I think that saying "oxygen molecular entity" would cover most forms of atoms themselves (because of CHEBI's very inclusive ME def and the presence of the "elemental" classes under ME) and also allow a bit of fuzziness in case the method is measuring different forms of the chemical entity.

@pbuttigieg pbuttigieg added this to In Progress in betterTech Jul 14, 2020
@wdduncan
Member

Do you know how ChEBI formally defines 'part of'? Sometimes 'part of' is defined as being reflexive, so every oxygen atom is part of itself. If ChEBI defines part of in this way, I could see how atoms are subsumed under molecular entity.
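The effect of reflexivity can be seen in a toy model. Assuming, for illustration only, that "oxygen molecular entity" means "has part some oxygen atom", and using a hand-written has-part table rather than CHEBI's axioms, the oxygen atom itself only qualifies when parthood is reflexive:

```python
# Hypothetical has_part facts (not CHEBI's axioms).
HAS_PART: dict[str, set[str]] = {
    "dioxygen": {"oxygen atom"},
    "water": {"oxygen atom", "hydrogen atom"},
}


def parts(entity: str, reflexive: bool) -> set[str]:
    """Transitive has_part closure of entity; includes entity itself if parthood is reflexive."""
    result = {entity} if reflexive else set()
    stack = list(HAS_PART.get(entity, ()))
    while stack:
        part = stack.pop()
        if part not in result:
            result.add(part)
            stack.extend(HAS_PART.get(part, ()))
    return result


def is_oxygen_molecular_entity(entity: str, reflexive: bool) -> bool:
    # Assumption: 'oxygen molecular entity' is equivalent to 'has_part some oxygen atom'.
    return "oxygen atom" in parts(entity, reflexive)


for reflexive in (False, True):
    verdict = is_oxygen_molecular_entity("oxygen atom", reflexive)
    print(f"reflexive={reflexive}: oxygen atom classified as oxygen molecular entity -> {verdict}")
```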

@pbuttigieg
Member

@wdduncan

Looking at oxygen molecular entity it looks like they use the BFO:'has part'.

Looking at oxygen atom, they don't use a 'part of' relation, as the 'has part' does the work.

So they're quite reasonably (pun) linked

@wdduncan
Member

@pbuttigieg
I should have been more careful in my language. 'has part' is just the inverse of 'part of'. See here:
http://www.ontobee.org/ontology/RO?iri=http://purl.obolibrary.org/obo/BFO_0000051

On the ontobee page, it only specifies that has part is transitive. The OWL may be different.

@pbuttigieg
Member

@wdduncan I'd post on their tracker with this question, cross-linking to this one.

@wdduncan
Member

@pbuttigieg
Done. See ebi-chebi/ChEBI#3813

@cmungall
Member

Any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer etc., identifiable as a separately distinguishable entity.
Atom should be an ME, if that definition is right, and the "etc." isn't really helpful.

FWIW, although CHEBI do not provide provenance for their definitions, this comes from IUPAC:

https://goldbook.iupac.org/terms/view/M03986

Although IUPAC is authoritative, this does not mean it is a good source of ontology definitions. I think IUPAC should be used to define CHEBI metaclasses, not classes. This class/metaclass confusion has persisted throughout chebi leading to problems such as the ones pointed out in this tracker. I have been pointing this out in the chebi tracker since 2007 to no avail.

@ramonawalls

ramonawalls commented Jul 15, 2020

This thread is so long already, I almost hate to add to it, but...

As I understand it, the measurements we're trying to represent are the results of processes converting an element like Cd to its elemental form and not measuring the associated salts.

That's not the case - many (even most) measurement processes do not include a step to convert stuff into elemental forms, but you can calculate the mass (and thus concentration) from the concentrations of the molecules bearing the atom of interest.

As far as I understand, that is exactly what the environmental scientists do when reporting metal concentrations. I think for metals as contaminants, it is fairly standard.

Overall, I am very happy with where this thread is converging. I have also asked a colleague from Dartmouth who is processing their environmental data to comment.

@pbuttigieg
Member

pbuttigieg commented Jul 15, 2020

@ramonawalls

As far as I understand, that is exactly what the environmental scientists do when reporting metal concentrations. I think for metals as contaminants, it is fairly standard.

I didn't realise / missed that your use case is restricted to metals. I was referring to compounds in general.

However, I still think we should work with molecular entity, as the actual quality we're talking about may not inhere in (only) the elemental form of the metal in the soil/water/etc, even if the method of measurement converts things into elemental forms.

Overall, I am very happy with where this thread is converging. I have also asked a colleague from Dartmouth who is processing their environmental data to comment.

Cool, many thanks

@kaiiam
Contributor Author

kaiiam commented Jul 18, 2020

Cross-reference to this commit in Chris's new chemistry-ontology, which he intended to tag to this issue.

Projects
betterTech
In Progress

8 participants