missing ChEBI records? #83

andrewsu · 2020-09-24T18:06:13Z

I can't find macrolide in mychem, and I wonder if it indicates that our ChEBI import is somehow incomplete? Details below (using imatinib as a positive control):

ChEBI records

imatinib: https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:45783
macrolide: https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:25106

MyChem queries:

http://mychem.info/v1/query?q=chebi.id:CHEBI\:45783 (1 HIT, good)
http://mychem.info/v1/query?q=chebi.id:CHEBI\:25106 (0 HITS, bad?)

kevinxin90 · 2020-10-28T16:56:49Z

@andrewsu
For the ChEBI dumper in MyChem, we're using the SDF file provided by ChEBI. As stated on the ChEBI website, it contains all the chemical structures and associated information, but it excludes any ontological information.
Macrolide is an ontology term within ChEBI that represents a class of chemicals, which is why it's not included in the SDF file.
I'm fine with including all ontology info from ChEBI in MyChem; we would just need to ingest a different file from ChEBI.

andrewsu · 2020-10-29T04:42:02Z

got it, thanks for looking into this, Kevin. I leave it to you to prioritize, but I do think having the ontological nodes in mychem would be valuable...

andrewsu · 2021-06-17T19:45:45Z

Adding a note that mychem also does not include an entry for CHEBI:31859 (which is the entry for the racemic mixture of modafinil). We do have entries for the separate right- and left-handed versions of this molecule, armodafinil (CHEBI:77590) and (S)-modafinil (CHEBI:77591), and also for the nonchiral version, 2-[(diphenylmethyl)sulfinyl]acetamide (CHEBI:77585).

MyChem links:
http://mychem.info/v1/query?q=chebi.id:CHEBI\:31859 (no record)
http://mychem.info/v1/query?q=chebi.id:CHEBI\:77585
http://mychem.info/v1/query?q=chebi.id:CHEBI\:77590
http://mychem.info/v1/query?q=chebi.id:CHEBI\:77591

I would expect MyChem to have records for all four of these entries (we have three out of four), and to capture the ontological relationships in ChEBI (specifically the ones listed in the screenshots below).

more info from Chris Bizon at https://ncatstranslator.slack.com/archives/C01LQKY499A/p1623775157021300

erikyao · 2021-06-29T22:08:10Z

Aim

Add another set of dumper/parser/uploader to read ChEBI ontology data.

Source Files

As indicated on the ChEBI website, there are 3 versions of the ontology files:

- LITE - Only ids, name, subsets and relationships are available. Small size if you are interested in our ontology only
- CORE - As above plus chemical data (mass, charge, formula) and structures (inchis, inchikeys, smiles)
- FULL - As above plus name synonyms and manually added cross-reference.

each in two formats:

- obo
- owl

The LITE version should have everything we need.

Implementation

Refer to the code of the mondo plugin in mydisease.info, which reads obo files into relationship networks (networkx graph objects) using the obonet library.

Another library, pronto, can also read obo or owl files, but it's not as convenient as obonet for reading relationships.

erikyao · 2021-07-07T19:12:40Z

How `obonet` works

Our Mondo parser uses the obonet library, which, on receiving an obo ontology file,

- parses each entity into a node, and
- connects the entity nodes into a graph by their relationships

However, the node representation keeps the is_a relationship as a separate field alongside the relationship field. E.g.

'MONDO:0016575': {
    'is_a': ['MONDO:0002254', 'MONDO:0005087', 'MONDO:0005308'],
    'relationship': ['excluded_subClassOf MONDO:0018395', 'has_modifier MONDO:0021136'],
    ...  # other fields omitted
}

'CHEBI:77590': {
    'is_a': ['CHEBI:77585'],
    'relationship': ['is_enantiomer_of CHEBI:77591', 'has_role CHEBI:35337', 'has_role CHEBI:77567'],
    ...  # other fields omitted
}

It looks like the is_a relationship is used more often and was therefore given its own field. Dr. Chris Mungall has a related comment here.

However we don't have to worry about combining the is_a and relationship fields manually. The graphs returned by obonet will treat them both as edges. E.g.

import obonet

# Edges point from a term to the terms it refers to (child -> parent)
graph = obonet.read_obo("chebi_lite.obo")
print(list(graph.successors("CHEBI:77590")))
# Output: ['CHEBI:77585', 'CHEBI:77591', 'CHEBI:35337', 'CHEBI:77567']

We can see that the successors of CHEBI:77590 are the union of its is_a and relationship entities.
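To make that union concrete, here is a minimal pure-Python sketch (not obonet's actual code) that flattens one node's `is_a` and `relationship` fields into typed edges. The helper name `node_to_edges` is hypothetical, and it assumes every `relationship` entry is a `"<type> <target id>"` string, as in the CHEBI:77590 node shown earlier.

```python
def node_to_edges(node_id, node):
    """Flatten a node's is_a and relationship fields into (source, target, type) edges."""
    # is_a targets come first, each tagged with the edge type "is_a"
    edges = [(node_id, parent, "is_a") for parent in node.get("is_a", [])]
    # relationship entries look like "<type> <target id>"
    for rel in node.get("relationship", []):
        rel_type, target = rel.split(" ", 1)
        edges.append((node_id, target, rel_type))
    return edges


# Node fields copied from the CHEBI:77590 example in this thread
node = {
    "is_a": ["CHEBI:77585"],
    "relationship": [
        "is_enantiomer_of CHEBI:77591",
        "has_role CHEBI:35337",
        "has_role CHEBI:77567",
    ],
}

edges = node_to_edges("CHEBI:77590", node)
successors = [target for _, target, _ in edges]
print(successors)
# ['CHEBI:77585', 'CHEBI:77591', 'CHEBI:35337', 'CHEBI:77567']
```

Keeping the edge type next to each target is what later makes it possible to tell is_a edges apart from the others.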

A possible problem in our Mondo parser

As shown in https://github.com/biothings/mydisease.info/blob/master/src/plugins/mondo/parser.py, a Mondo document has:

- a parents field, which includes ONLY its is_a nodes
- a children field, an ancestors field, and a descendants field, each of which includes the nodes reachable from the UNION of its is_a and relationship nodes

From @andrewsu's comments above, I think all relationships (i.e. the union of is_a and relationship) should be returned in the documents. Nonetheless, if we intend to consider only is_a relationships, we should build an is_a subgraph before calculating each entity node's successors/predecessors/descendants/ancestors.
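A rough sketch of the is_a-only option, written in plain Python over obonet-style node dicts rather than against the real parser. The function name `is_a_ancestors` and the `EX:` ids are made up for illustration; the point is only that the traversal ignores the `relationship` field entirely.

```python
def is_a_ancestors(node_id, nodes):
    """Collect every node reachable from node_id by following is_a edges only."""
    seen = set()
    stack = list(nodes.get(node_id, {}).get("is_a", []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # Only is_a targets are pushed; 'relationship' entries are never followed
        stack.extend(nodes.get(current, {}).get("is_a", []))
    return seen


# Toy ontology with made-up ids (not real ChEBI or Mondo terms)
nodes = {
    "EX:child": {"is_a": ["EX:parent"], "relationship": ["has_role EX:role"]},
    "EX:parent": {"is_a": ["EX:root"]},
    "EX:root": {},
    "EX:role": {"is_a": ["EX:role_root"]},
    "EX:role_root": {},
}

ancestors = sorted(is_a_ancestors("EX:child", nodes))
print(ancestors)
# ['EX:parent', 'EX:root']
```

`EX:role` and `EX:role_root` are left out because they are only reachable through a `has_role` relationship, not through subclassing.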

andrewsu · 2021-07-08T04:17:19Z

I think parents, children, ancestors, and descendants should be strictly based on is_a nodes. is_a indicates subclass relationships, and those four relationship types (parents, children, ancestors, and descendants) should only be based on subclassing.

Having said that, the other types of relationships would also be very useful to capture. I leave it to you to decide how to model the object, but my guess is that it should look something like this:

{
   "id": "CHEBI:77590",
   "parents": [ "CHEBI:77585" ],
   "children": [ ... ],
   "ancestors": [ "CHEBI:77585", ... ],
   "descendants": [ ...],
   "relationships": {
      "is_enantiomer_of": [ "CHEBI:77591" ],
      "has_role": [ "CHEBI:35337", "CHEBI:77567" ]
   }
}
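A minimal sketch of how the `relationships` object above could be derived from an obonet-style node dict, assuming each entry in the node's `relationship` list is a `"<type> <target id>"` string as in the CHEBI:77590 example earlier in this thread. The helper name `group_relationships` is hypothetical, and children, ancestors, and descendants are omitted because they need the whole graph.

```python
from collections import defaultdict


def group_relationships(node):
    """Group a node's non-is_a relationships by relationship type."""
    grouped = defaultdict(list)
    for rel in node.get("relationship", []):
        rel_type, target = rel.split(" ", 1)
        grouped[rel_type].append(target)
    return dict(grouped)


# Node fields copied from the CHEBI:77590 example in this thread
node = {
    "is_a": ["CHEBI:77585"],
    "relationship": [
        "is_enantiomer_of CHEBI:77591",
        "has_role CHEBI:35337",
        "has_role CHEBI:77567",
    ],
}

doc = {
    "id": "CHEBI:77590",
    "parents": list(node["is_a"]),  # direct is_a targets only
    "relationships": group_relationships(node),
}
print(doc["relationships"])
# {'is_enantiomer_of': ['CHEBI:77591'], 'has_role': ['CHEBI:35337', 'CHEBI:77567']}
```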

erikyao · 2021-07-08T22:35:30Z

> I can't find macrolide in mychem, and I wonder if it indicates that our ChEBI import is somehow incomplete? Details below (using imatinib as a positive control):
>
> ChEBI records
>
> imatinib: https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:45783
>
> macrolide: https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:25106
>
> MyChem queries:
>
> http://mychem.info/v1/query?q=chebi.id:CHEBI\:45783 (1 HIT, good)
>
> http://mychem.info/v1/query?q=chebi.id:CHEBI\:25106 (0 HITS, bad?)

@andrewsu CHEBI:25106 (macrolide) is not included as an individual record in the ChEBI SDF file (although it does appear once as the "Secondary ChEBI ID" of CHEBI:3112 (biperiden)).

CHEBI:25106 has a record in the ChEBI ontology file, as below:

{'name': 'macrolide',
 'subset': ['3_STAR'],
 'def': '"A macrocyclic lactone with a ring of twelve or more members derived from a polyketide." []',
 'is_a': ['CHEBI:26188', 'CHEBI:63944']}

> got it, thanks for looking into this, Kevin. I leave it to you to prioritize, but I do think having the ontological nodes in mychem would be valuable...

@andrewsu @newgene We need to discuss in detail whether to include ontological nodes in mychem. Technically it's feasible.

erikyao · 2021-07-17T08:39:09Z

ChEBI ids in `rel201` SDF and obo files

The compound file, rel201/SDF/ChEBI_complete.sdf.gz, has 133,779 ChEBI ids (call this set S1), while the ontology file, rel201/ontology/chebi_lite.obo.gz, has 146,183 (S2). However, S1 is not a subset of S2. Their set difference is

S1 \ S2 = {"CHEBI:156068"}

which means "CHEBI:156068" will be the only document that has no ontology fields in our collection.

The set difference S2 \ S1 contains 12,404 ChEBI ids. They will have no chemical/compound structure fields.
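For reference, a comparison like this can be sketched roughly as below. It assumes that each SDF record carries its id on the line after a `> <ChEBI ID>` tag and that obo terms list it as an `id:` line inside a `[Term]` stanza. The function names and the tiny inline excerpts are illustrative only, not the real rel201 files.

```python
def sdf_chebi_ids(lines):
    """Collect the value line that follows each '> <ChEBI ID>' tag (assumed SDF layout)."""
    ids, expect_id = set(), False
    for line in lines:
        line = line.strip()
        if expect_id:
            ids.add(line)
            expect_id = False
        elif line == "> <ChEBI ID>":
            expect_id = True
    return ids


def obo_term_ids(lines):
    """Collect 'id:' values that appear inside [Term] stanzas."""
    ids, in_term = set(), False
    for line in lines:
        line = line.strip()
        if line.startswith("["):
            in_term = line == "[Term]"
        elif in_term and line.startswith("id: "):
            ids.add(line[len("id: "):])
    return ids


# Toy excerpts standing in for the real files
sdf_text = "> <ChEBI ID>\nCHEBI:45783\n\n$$$$\n> <ChEBI ID>\nCHEBI:156068\n\n$$$$\n"
obo_text = "[Term]\nid: CHEBI:45783\n\n[Term]\nid: CHEBI:25106\n\n[Typedef]\nid: has_role\n"

s1 = sdf_chebi_ids(sdf_text.splitlines())
s2 = obo_term_ids(obo_text.splitlines())
print(s1 - s2)  # {'CHEBI:156068'}
print(s2 - s1)  # {'CHEBI:25106'}
```

For the real `.gz` files, the same functions could be fed lines from `gzip.open(path, "rt")`.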

andrewsu · 2021-08-04T04:41:17Z

@erikyao since http://mychem.info/v1/query?q=chebi.id:CHEBI\:25106 still returns zero hits, I'm guessing this hasn't yet been deployed? What is involved in doing that, and who will handle that? (As a general rule, I think we should merge and deploy before closing an issue...)

erikyao · 2021-08-05T16:04:45Z

The PR to fix this issue has been merged into the code base, but it is not deployed yet.

erikyao · 2021-08-05T17:37:06Z

Hi @ravila4, please let me know when you have fixed the PubChem data source. We can deploy the fixes together. Thank you!

ravila4 · 2021-09-07T23:59:00Z

@erikyao Can we close this issue now? It looks like the queries are working.

andrewsu added the bug label Sep 24, 2020

andrewsu added enhancement and removed bug labels Oct 29, 2020

kevinxin90 mentioned this issue Feb 26, 2021

How do I retrieve ChEBI Name ? biothings/biothings_explorer_archived#175

Closed

erikyao self-assigned this Mar 15, 2021

erikyao assigned ravila4 Jun 29, 2021

erikyao mentioned this issue Jul 8, 2021

BUG: Limit parents, children, ancestors, and descendants in Mondo document to is_a ontology nodes biothings/mydisease.info#44

Closed

erikyao added a commit to erikyao/mychem.info that referenced this issue Jul 16, 2021

fixed issue biothings#83

574c3c1

erikyao added a commit to erikyao/mychem.info that referenced this issue Jul 17, 2021

added mapping for new field introduced when fixing issue biothings#83

048d824

This was referenced Jul 19, 2021

Fix to Issue#83 #106

Merged

obonet is incapable to parse the definition terms correctly in obo files #107

Open

erikyao closed this as completed in #106 Aug 3, 2021

erikyao reopened this Aug 5, 2021

ravila4 closed this as completed Sep 15, 2021
