add max research phase to treatsChembl edges from mychem.info #813

Closed
andrewsu opened this issue Apr 22, 2024 · 8 comments
@andrewsu
Member

Per a request from @mbrush, I'm creating this issue to add a max research phase attribute to edges based on treatsChembl and treatsChembl-rev in the mychem.info openAPI annotation file. This info will be used in this CQS query template. Note that the application of the attribute constraint will occur within the CQS, but we need to make sure that BTE returns the attribute in the response.

test query:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01",
          "predicates": [
            "biolink:in_clinical_trials_for"
          ]
        }
      },
      "nodes": {
        "n00": {
          "categories": [
            "biolink:ChemicalEntity"
          ]
        },
        "n01": {
          "categories": [
            "biolink:Disease"
          ],
          "ids": [
            "MONDO:0004979"
          ]
        }
      }
    }
  }
}
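A minimal sketch of how the test query above could be built and packaged as an HTTP request, assuming the CI endpoint quoted just below. The helper names (`build_trapi_query`, `build_request`) are hypothetical and not part of BTE; the request is only constructed here, not sent.

```python
import json
import urllib.request

# CI endpoint quoted in this issue; swap in another BTE instance as needed.
BTE_QUERY_URL = "https://bte.ci.transltr.io/v1/query"


def build_trapi_query(predicate, subject_category, object_category, object_ids):
    """Build a one-hop TRAPI query graph shaped like the test query above."""
    return {
        "message": {
            "query_graph": {
                "edges": {
                    "e00": {
                        "subject": "n00",
                        "object": "n01",
                        "predicates": [predicate],
                    }
                },
                "nodes": {
                    "n00": {"categories": [subject_category]},
                    "n01": {
                        "categories": [object_category],
                        "ids": list(object_ids),
                    },
                },
            }
        }
    }


def build_request(query, url=BTE_QUERY_URL):
    """Wrap the query in a JSON POST request (hypothetical helper)."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


query = build_trapi_query(
    "biolink:in_clinical_trials_for",
    "biolink:ChemicalEntity",
    "biolink:Disease",
    ["MONDO:0004979"],
)
request = build_request(query)

# To actually send it (requires network access):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```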

example edge returned from https://bte.ci.transltr.io/v1/query:

                "d29ff5f4006b1523fea5a64ef3b36292": {
                    "predicate": "biolink:in_clinical_trials_for",
                    "subject": "PUBCHEM.COMPOUND:1993",
                    "object": "MONDO:0004979",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:publications",
                            "value": [
                                "https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=e8983096-65e6-41f2-8b6f-bbf0a7227307",
                                "https://clinicaltrials.gov/search?id=%22NCT02584257%22",
                                "https://clinicaltrials.gov/search?id=%22NCT05292976%22",
                                "https://clinicaltrials.gov/search?id=%22NCT02097537%22",
                                "https://clinicaltrials.gov/search?id=%22NCT01907334%22",
                                "clinicaltrials:NCT02584257",
                                "clinicaltrials:NCT05292976",
                                "clinicaltrials:NCT02097537",
                                "clinicaltrials:NCT01907334",
                                "https://clinicaltrials.gov/ct2/results?id=%22NCT03505489%22",
                                "clinicaltrials:NCT03505489"
                            ],
                            "value_type_id": "linkml:Uriorcurie"
                        }
                    ],
                    "sources": [
                       ...
                    ]
                },

The fix to this issue would add a new entry under attributes for max research phase. As noted in https://github.com/NCATS-Tangerine/translator-api-registry/blob/biolink-4-update/mychem.info/openapi_full.yml#L1069, this info is available from mychem.info under chembl.drug_indications.max_phase_for_ind.
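To check whether a returned edge carries the attribute, something like the following sketch could be used. The helper `get_edge_attribute` is a hypothetical name, and the edge is a trimmed copy of the example above (it has only the publications attribute, so the lookup for max research phase comes back empty until the fix is deployed).

```python
from typing import Any, Dict, Optional


def get_edge_attribute(edge: Dict[str, Any], attribute_type_id: str) -> Optional[Any]:
    """Return the value of the first attribute with the given type id, or None."""
    for attribute in edge.get("attributes") or []:
        if attribute.get("attribute_type_id") == attribute_type_id:
            return attribute.get("value")
    return None


# Trimmed copy of the example edge above (publications list shortened).
edge = {
    "predicate": "biolink:in_clinical_trials_for",
    "subject": "PUBCHEM.COMPOUND:1993",
    "object": "MONDO:0004979",
    "attributes": [
        {
            "attribute_type_id": "biolink:publications",
            "value": ["clinicaltrials:NCT02584257", "clinicaltrials:NCT05292976"],
            "value_type_id": "linkml:Uriorcurie",
        }
    ],
}

# Before the fix there is no such attribute, so this is None.
max_phase = get_edge_attribute(edge, "biolink:max_research_phase")
publications = get_edge_attribute(edge, "biolink:publications")
```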

colleenXu added a commit to NCATS-Tangerine/translator-api-registry that referenced this issue Apr 22, 2024
@colleenXu
Collaborator

colleenXu commented Apr 22, 2024

The changes should be live on Dev/CI within 10 min of the linked commit. BTE/Service Provider now create a "biolink:max_research_phase" edge-attribute from the MyChem chembl.drug_indications.max_phase_for_ind info.

However, the values returned right now don't match the biolink-model spec for "max research phase":

  • We're returning arrays of strings, and I've seen these values: "-1.0", "0.0", "0.5", "1.0", "2.0", "3.0", "4.0"
  • The biolink-model max research phase, by contrast, is an enum with specific, more descriptive string values

If we need to return values that conform to the biolink-model spec, I imagine we'd have to figure out all current possible values and what they mean, map them to the biolink-model's values, and add a BTE JQ/post-processing step that applies those mappings.

@colleenXu added the "On CI" and "needs discussion" labels Apr 22, 2024
@andrewsu
Member Author

andrewsu commented May 1, 2024

https://mychem.info/v1/query?q=_exists_:chembl.drug_indications&fields=chembl.drug_indications&facets=chembl.drug_indications.max_phase_for_ind

{
    "took": 12,
    "total": 8462,
    "max_score": 1,
    "facets": {
        "chembl.drug_indications.max_phase_for_ind": {
            "_type": "terms",
            "terms": [
                {
                    "count": 4554,
                    "term": 2
                },
                {
                    "count": 3996,
                    "term": 1
                },
                {
                    "count": 3142,
                    "term": 3
                },
                {
                    "count": 2707,
                    "term": 4
                },
                {
                    "count": 692,
                    "term": 0
                },
                {
                    "count": 492,
                    "term": -1
                }
            ],
            "other": 0,
            "missing": 0,
            "total": 15583
        }
    },

@colleenXu
Collaborator

colleenXu commented May 1, 2024

Mapping (ChEMBL value -> biolink-model enum value):

  • "-1.0" -> "not_provided" (in ChEMBL this actually means "clinical trial phase unknown")
  • "0.5" -> "pre_clinical_research_phase" (in ChEMBL this actually means Phase 0 trials)
  • "1.0" -> "clinical_trial_phase_1"
  • "2.0" -> "clinical_trial_phase_2"
  • "3.0" -> "clinical_trial_phase_3"
  • "4.0" -> "clinical_trial_phase_4" (in ChEMBL this actually means "approved"/"marketed")
  • not one of these values -> "not_provided", and raise an error that we'd catch in Sentry
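The mapping above could be sketched roughly as follows. This is an illustration only, not the code from the BTE PR: the names `CHEMBL_PHASE_TO_BIOLINK` and `map_max_phase` are hypothetical, and a plain `logging` error call stands in for the Sentry reporting mentioned in the list.

```python
import logging

logger = logging.getLogger(__name__)

# ChEMBL max_phase_for_ind (as returned, a string) -> biolink-model enum value.
CHEMBL_PHASE_TO_BIOLINK = {
    "-1.0": "not_provided",
    "0.5": "pre_clinical_research_phase",
    "1.0": "clinical_trial_phase_1",
    "2.0": "clinical_trial_phase_2",
    "3.0": "clinical_trial_phase_3",
    "4.0": "clinical_trial_phase_4",
}


def map_max_phase(raw_value: str) -> str:
    """Map one ChEMBL phase string; fall back to 'not_provided' on unknowns."""
    try:
        return CHEMBL_PHASE_TO_BIOLINK[raw_value]
    except KeyError:
        # Stand-in for the error reporting described above (assumption).
        logger.error("Unexpected max_phase_for_ind value: %r", raw_value)
        return "not_provided"
```

For example, `map_max_phase("4.0")` gives "clinical_trial_phase_4", while an unlisted value is logged and mapped to "not_provided".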


@colleenXu
Collaborator

colleenXu commented May 2, 2024

@tokebe

I don't think "0.0" actually exists in the data, so let's remove that handling/mapping. I've edited my post above.

There have been discussions in Slack w/ Chunlei and Dylan. They discovered an issue with the API that they're addressing, but it won't affect the actual values that we're working with (lab Slack links).

@colleenXu
Collaborator

@mbrush

In the above post, I wrote mappings between ChEMBL's "max phase for indication" values (specific to each drug-indication pair) and the biolink-model's MaxResearchPhaseEnum values.

However, I'm not sure about "-1.0", "0.5", and "4.0", because the actual definitions in ChEMBL don't quite match the available enum options.

I'm wondering if you have advice/opinions on this.

@colleenXu
Collaborator

I've tested the linked PR biothings/bte_trapi_query_graph_handler#192 and it works as intended!

Example query

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["MEDDRA:10012374"],
                    "categories":["biolink:Disease"]
                },
                "n1": {
                    "categories":["biolink:SmallMolecule"]
                }
            },
            "edges": {
                "e1": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:tested_by_clinical_trials_of"]
                }
            }
        }
    }
}

Here's the before-after for some edges from chembl-treats operations:

max phase for ind = -1.0

Before:

                "efeef17260b03e60a9266d6d860f2ad1": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "CHEBI:135061",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "-1.0"
                            ]
                        },

After:

                "efeef17260b03e60a9266d6d860f2ad1": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "CHEBI:135061",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "not_provided"
                            ]
                        },

max phase for ind = 0.5

Before:

                "5e9942a90c62660599c90a4e6700fe73": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "CHEBI:28939",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "0.5"
                            ]
                        },

After:

                "5e9942a90c62660599c90a4e6700fe73": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "CHEBI:28939",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "pre_clinical_research_phase"
                            ]
                        },

max phase for ind = 1.0

Before:

                "6082fe0e1b9e1681ce88b2cf94abf3e0": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "PUBCHEM.COMPOUND:24802842",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "1.0"
                            ]
                        },

After:

                "6082fe0e1b9e1681ce88b2cf94abf3e0": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "PUBCHEM.COMPOUND:24802842",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "clinical_trial_phase_1"
                            ]
                        },

max phase for ind = 2.0

Before:

                "9b2dcc4dad93c04c227d80147a766d76": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "PUBCHEM.COMPOUND:4118151",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "2.0"
                            ]
                        },

After:

                "9b2dcc4dad93c04c227d80147a766d76": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "PUBCHEM.COMPOUND:4118151",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "clinical_trial_phase_2"
                            ]
                        },

max phase for ind = 3.0 and 4.0 (different disease MedDRA IDs are treated as the same entity, so both values appear on one edge)

Before:

                "87f29deccfa0f51de58d9ee3f1f98963": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "CHEBI:36791",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "4.0",
                                "3.0"
                            ]
                        },

After:

                "87f29deccfa0f51de58d9ee3f1f98963": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "CHEBI:36791",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "clinical_trial_phase_4",
                                "clinical_trial_phase_3"
                            ]
                        },
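The before/after pairs above amount to rewriting the values of the max research phase attribute on each edge. A minimal sketch of that step, assuming the mapping posted earlier in this thread (the function name `normalize_max_research_phase` is hypothetical, and this is not the code from the linked PR):

```python
import copy

# Mapping from the thread: ChEMBL phase string -> biolink-model enum value.
PHASE_MAP = {
    "-1.0": "not_provided",
    "0.5": "pre_clinical_research_phase",
    "1.0": "clinical_trial_phase_1",
    "2.0": "clinical_trial_phase_2",
    "3.0": "clinical_trial_phase_3",
    "4.0": "clinical_trial_phase_4",
}


def normalize_max_research_phase(edge):
    """Return a copy of the edge with max_research_phase values mapped."""
    updated = copy.deepcopy(edge)
    for attribute in updated.get("attributes", []):
        if attribute.get("attribute_type_id") != "biolink:max_research_phase":
            continue
        # Unknown values fall back to "not_provided", as in the mapping.
        attribute["value"] = [
            PHASE_MAP.get(value, "not_provided") for value in attribute["value"]
        ]
    return updated


# "Before" edge from the last example above (other attributes omitted).
before = {
    "predicate": "biolink:tested_by_clinical_trials_of",
    "subject": "MONDO:0002050",
    "object": "CHEBI:36791",
    "attributes": [
        {
            "attribute_type_id": "biolink:max_research_phase",
            "value": ["4.0", "3.0"],
        }
    ],
}

after = normalize_max_research_phase(before)
print(after["attributes"][0]["value"])
# prints ['clinical_trial_phase_4', 'clinical_trial_phase_3']
```

Working on a deep copy keeps the original edge untouched, which makes the before/after comparison straightforward.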

@colleenXu added the "On Dev" label and removed the "On CI" label May 3, 2024
@tokebe added the "On CI" label and removed the "On Dev" label Jun 20, 2024
@colleenXu added the "On Test" label and removed the "needs discussion" and "On CI" labels Jun 24, 2024
@colleenXu added the "On Test -> Prod" label and removed the "On Test" label Jul 19, 2024
@tokebe
Member

tokebe commented Jul 26, 2024

Related PRs deployed to Prod. @colleenXu good to close?

@colleenXu
Collaborator

Yeah, let's close this. We'll open a new issue if Matt Brush/others tell us to adjust the mapping.
