API Therapeutic Target Database (TTD) Deployment #123

Closed
lucyzhang95 opened this issue Jun 1, 2023 · 27 comments
Assignees
Labels
api deployment done data source Data source pending to create a new API On Test Match https://github.com/biothings/biothings_explorer/labels x-bte

Comments

@lucyzhang95

lucyzhang95 commented Jun 1, 2023

{
  "_id": "D07OAC_associated_with_T71390",
  "association": {
    "predicate": "biolink:associated_with"
  },
  "object": {
    "id": "T71390",
    "type": "biolink:Protein",
    "target_id": "T71390",
    "uniprot": "S5A2_HUMAN",
    "target_type": "successful target",
    "bioclass": "CH-CH donor oxidoreductase"
  },
  "subject": {
    "id": "D07OAC",
    "trial_status": "investigative",
    "type": "biolink:Drug",
    "moa": "inhibitor"
  }
}
{
  "_id": "D04SKM_treats_2A85",
  "association": {
    "predicate": "biolink:treats",
    "clinical_trial": [
      {
        "status": "phase 1",
        "disease": "Acute lymphoblastic leukaemia"
      }
    ]
  },
  "object": {
    "id": "2A85",
    "icd11": "2A85",
    "name": "Acute lymphoblastic leukaemia",
    "type": "biolink:Disease"
  },
  "subject": {
    "id": "D04SKM",
    "name": "CART-10 cells",
    "type": "biolink:Drug"
  }
}
{
  "_id": "T67162_target_for_BD10-BD1Z",
  "association": {
    "predicate": "biolink:target_for",
    "clinical_trial": [
      {
        "status": "Approved",
        "disease": "Heart failure"
      }
    ]
  },
  "object": {
    "id": "BD10-BD1Z",
    "icd11": "BD10-BD1Z",
    "name": "Heart failure",
    "type": "biolink:Disease"
  },
  "subject": {
    "id": "T67162",
    "name": "Dopamine D2 receptor (D2R)",
    "type": "biolink:Protein",
    "target_id": "T67162",
    "uniprot": "DRD2_HUMAN",
    "target_type": "successful target",
    "bioclass": "GPCR rhodopsin"
  }
}
@erikyao
Contributor

erikyao commented Jun 1, 2023

See also #57

@colleenXu

Note that once this is deployed, we'll need some issue tracking the writing / deployment / hooking-up of the SmartAPI yaml w/ x-bte annotation

@colleenXu

Also not sure how to handle the github issue assignment stuff here. For now, moving the pending repo issue to Yao's section of the project manager for Translator...

@erikyao
Contributor

erikyao commented Jun 2, 2023

Will deploy tomorrow @colleenXu

@erikyao erikyao self-assigned this Jun 2, 2023
@erikyao
Contributor

erikyao commented Jun 2, 2023

1. Problem with object.id and object.icd11

The following document (the only one) has free-text values in object.id and object.icd11 instead of keyword-like IDs.

{
  "_id": "D08WUK_treats_DF-1 vaccine",
  "association": {
    "predicate": "biolink:treats",
    "clinical_trial": [
      {
        "status": "phase 1b",
        "disease": "Middle East Respiratory Syndrome (MERS)"
      }
    ]
  },
  "object": {
    "id": "DF-1 vaccine",
    "icd11": "DF-1 vaccine",
    "name": "Middle East Respiratory Syndrome (MERS)",
    "type": "biolink:Disease"
  },
  "subject": {
    "id": "D08WUK",
    "name": "MVA-MERS-S",
    "type": "biolink:Drug"
  }
}

Solution: It could be an error in the source file. If not fixable, ok to ignore for now.

P.S. These two fields should still be indexed as keyword in ES.
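
As a rough illustration of the P.S. above, a keyword mapping for these two fields could look like the dict below. This is only a sketch: the variable name `keyword_mapping` is hypothetical, and how the data plugin actually registers its mapping is not shown in this thread.

```python
# Sketch of an Elasticsearch mapping fragment (assumed shape, not the plugin's real mapping).
# "keyword" fields are indexed as exact values rather than analyzed as full text.
keyword_mapping = {
    "properties": {
        "object": {
            "properties": {
                "id": {"type": "keyword"},
                "icd11": {"type": "keyword"},
            }
        }
    }
}

object_fields = keyword_mapping["properties"]["object"]["properties"]
print({name: spec["type"] for name, spec in object_fields.items()})
```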

2. Problem with object.uniprot

868 documents have multiple IDs in object.uniprot, joined into a single string. E.g.

{
  "_id": "D04ZCZ_associated_with_T55285",
  "association": {
    "predicate": "biolink:associated_with"
  },
  "object": {
    "id": "T55285",
    "type": "biolink:Protein",
    "target_id": "T55285",
    "uniprot": "GLRA1_HUMAN; GLRA2_HUMAN; GLRA3_HUMAN; GLRA4_HUMAN; GLRB_HUMAN",
    "target_type": "successful target",
    "bioclass": "Neurotransmitter receptor"
  },
  "subject": {
    "id": "D04ZCZ",
    "trial_status": "terminated",
    "type": "biolink:Drug",
    "moa": "inhibitor"
  }
}

Solution: convert such a string into a list of strings.
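
A minimal sketch of that conversion, assuming the IDs are always separated by semicolons as in the example above. The helper name `split_ids` is hypothetical and not taken from the parser.

```python
def split_ids(value, sep=";"):
    """Split a 'A; B; C' style string into a list of stripped, non-empty IDs."""
    return [part.strip() for part in value.split(sep) if part.strip()]


raw = "GLRA1_HUMAN; GLRA2_HUMAN; GLRA3_HUMAN; GLRA4_HUMAN; GLRB_HUMAN"
ids = split_ids(raw)
print(ids)
```

This version always returns a list, even for a single ID. Whether single values should stay plain strings is a design choice for the parser.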

3. Problem with _id, object.name and object.symbol

~12 documents have a strange combination of _id, object.name and object.symbol. E.g.

{
  "_id": "BM000024_TYMS-ER), thymidylate synthase 1494 (TYMS- 1494), dihydropyrimidine dehydrogenase (DPYD), methylenetetrahydrofolate reducta se (MTHFR), mutL homolog 1 (MLH1), UDP glucuronyltransferase (UGT1A1), ATP-binding cassette group B gene 1 (ABCB1), x-ray cross-complementing group 1 (XRCC1), g lutathione-S-transferase P1 (GSTP1), excision repair cross-complementing gene 2 (ERCC2_biomarker_for_Colorectal_cancer",
  "association": {
    "predicate": "biolink:biomarker_for"
  },
  "object": {
    "id": "BM000024",
    "type": "biolink:Biomarker",
    "name": "thymidylate synthase-enhancer region ",
    "symbol": "TYMS-ER), thymidylate synthase 1494 (TYMS- 1494), dihydropyrimidine dehydrogenase (DPYD), methylenetetrahydrofolate reducta se (MTHFR), mutL homolog 1 (MLH1), UDP glucuronyltransferase (UGT1A1), ATP-binding cassette group B gene 1 (ABCB1), x-ray cross-complementing group 1 (XRCC1), g lutathione-S-transferase P1 (GSTP1), excision repair cross-complementing gene 2 (ERCC2"
  },
  "subject": {
    "id": "2B91.Z",
    "name": "Colorectal cancer",
    "type": "biolink:Disease",
    "icd11": "2B91.Z"
  }
}

Solution: It looks like some records have a complicated string in a single cell, which the parser misreads. Such strings could be double-quoted in the source file. The parser should be updated to handle them.

P.S. It's also possible that those records have missing or NA values, making the parser mis-read the wrong columns.
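
One possible way to handle such a cell, sketched under the assumption that the raw value has the shape "name (SYMBOL), name (SYMBOL), ...". The sample cell below is reconstructed from the record above and the helper name `split_biomarker_cell` is hypothetical. Names that themselves contain commas or parentheses would need extra handling.

```python
import re

# One "name (SYMBOL)" pair; the name may not contain commas or parentheses.
PAIR = re.compile(r"\s*([^(),]+?)\s*\(([^()]+)\)")


def split_biomarker_cell(cell):
    """Return (names, symbols) parsed from a 'name (SYMBOL), ...' string."""
    names, symbols = [], []
    for name, symbol in PAIR.findall(cell):
        names.append(name.strip())
        symbols.append(symbol.strip())
    return names, symbols


cell = (
    "thymidylate synthase-enhancer region (TYMS-ER), "
    "thymidylate synthase 1494 (TYMS- 1494), "
    "dihydropyrimidine dehydrogenase (DPYD)"
)
names, symbols = split_biomarker_cell(cell)
print(names)
print(symbols)
```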

4. Problem with subject.uniprot

Same problem as with object.uniprot. 372 documents are affected. E.g.

{
  "_id": "T01447_target_for_2B6B",
  "association": {
    "predicate": "biolink:target_for",
    "clinical_trial": [
      {
        "status": "Phase 3",
        "disease": "Nasopharyngeal cancer"
      }
    ]
  },
  "object": {
    "id": "2B6B",
    "icd11": "2B6B",
    "name": "Nasopharyngeal cancer",
    "type": "biolink:Disease"
  },
  "subject": {
    "id": "T01447",
    "name": "NEDD8-activating enzyme (NAE)",
    "type": "biolink:Protein",
    "target_id": "T01447",
    "uniprot": "ULA1_HUMAN; UBA3_HUMAN",
    "target_type": "clinical trial target"
  }
}

Solution: convert such a string into a list of strings.

@lucyzhang95
Author

Fixed problems 2, 3, and 4.

Updated Git Branch/Commit: master bd9d85c

The current parser outputs:

Problem 2. object.uniprot

{
        "_id": "D04ZCZ_associated_with_T55285",
        "association": {
            "predicate": "biolink:associated_with"
        },
        "object": {
            "id": "T55285",
            "type": "biolink:Protein",
            "target_id": "T55285",
            "uniprot": [
                "GLRA1_HUMAN",
                "GLRA2_HUMAN",
                "GLRA3_HUMAN",
                "GLRA4_HUMAN",
                "GLRB_HUMAN"
            ],
            "target_type": "successful target",
            "bioclass": "Neurotransmitter receptor"
        },
        "subject": {
            "id": "D04ZCZ",
            "trial_status": "terminated",
            "type": "biolink:Drug",
            "moa": "inhibitor"
        }
    }

Problem 3. _id, object.name and object.symbol

{
        "_id": "BM000024_biomarker_for_Colorectal_cancer",
        "association": {
            "predicate": "biolink:biomarker_for"
        },
        "object": {
            "name": [
                "thymidylate synthase-enhancer region",
                "thymidylate synthase 1494",
                "dihydropyrimidine dehydrogenase",
                "methylenetetrahydrofolate reducta se",
                "mutL homolog 1",
                "UDP glucuronyltransferase",
                "ATP-binding cassette group B gene 1",
                "x-ray cross-complementing group 1",
                "g lutathione-S-transferase P1",
                "excision repair cross-complementing gene 2"
            ],
            "symbol": [
                "TYMS-ER",
                "TYMS- 1494",
                "DPYD",
                "MTHFR",
                "MLH1",
                "UGT1A1",
                "ABCB1",
                "XRCC1",
                "GSTP1",
                "ERCC2"
            ]
        },
        "subject": {
            "id": "2B91.Z",
            "name": "Colorectal cancer",
            "type": "biolink:Disease",
            "icd11": "2B91.Z"
        }
    }

Problem 4. subject.uniprot

{
        "_id": "T01447_target_for_2B6B",
        "association": {
            "predicate": "biolink:target_for",
            "clinical_trial": [
                {
                    "status": "Phase 3",
                    "disease": "Nasopharyngeal cancer"
                }
            ]
        },
        "object": {
            "id": "2B6B",
            "icd11": "2B6B",
            "name": "Nasopharyngeal cancer",
            "type": "biolink:Disease"
        },
        "subject": {
            "id": "T01447",
            "name": "NEDD8-activating enzyme (NAE)",
            "type": "biolink:Protein",
            "target_id": "T01447",
            "uniprot": [
                "ULA1_HUMAN",
                "UBA3_HUMAN"
            ],
            "target_type": "clinical trial target"
        }
    }

@colleenXu

colleenXu commented Jun 5, 2023

Err...throwing some ideas out here (also CC @andrewsu):

  • IDs:
    • uniprot here seems to be the Uniprot label rather than an actual ID
    • It would be nice if subject.id was a curie ("prefix:ID") and there was another field named with the prefix (like chembl_target) where the value is just the ID (a1234).
    • looks like sometimes there are no IDs to the object or subject? see problem 3 section of the above post...
    • can we map IDs or names / labels to other ID namespaces and include those other IDs (like uniprot IDs, NCBIGene, UMLS, ENSEMBL?)
  • for the problem 2 section of the above post:
    • some info in the subject seems like it'd go better in the association section of the record? Like subject.trial_status, subject.moa (not sure about object.target_type)?
  • I'm not sure that subject.type, object.type, association.predicate need to be biolink-model categories/predicates. It may be better to leave them as whatever they're called in the original data source. Then x-bte annotation can include the assignments of biolink-model terms...

@lucyzhang95
Copy link
Author

Thank you so much for your comments, Colleen!
I have several questions if you don't mind!

  • IDs:

    • The uniprot label is from the original source. They didn't provide an actual ID. What should we do with it?
    • Could you elaborate on the curie part? Do we have any examples I can refer to when adjusting the parser?
    • Already fixed problem 3. Thank you for pointing it out!
    • I guess we potentially can? The original source doesn't have the actual uniprot ID, NCBIGene, UMLS, or ENSEMBL. It does provide the uniprot label and target (protein sequences). If we can access the uniprot database, we could match the uniprot label and the protein sequence to the actual uniprot ID. However, it might be hard to match to NCBIGene or ENSEMBL, since reverse-translating a protein sequence to a DNA sequence is ambiguous because several codons can encode the same amino acid. People still do that all the time though. What would you suggest?
  • I can absolutely put the trial_status and moa in the association section of the record!

  • The original source doesn't provide any relationship between the two entities. Chunlei and I worked on defining the relationship together using biolink model. Here is a screenshot of the original resource for drug and target. What would you suggest to do in this case?
    [Image: Screenshot 2023-06-05 at 11 39 06 AM]

Thanks again for your suggestions! 😊

@erikyao
Contributor

erikyao commented Jun 5, 2023

Hi @lucyzhang95,

uniprot here seems to be the Uniprot label rather than an actual ID

I think Colleen meant that the uniprot field contains only labels instead of IDs. Typically we expect IDs. E.g. ULA1_HUMAN has ID Q13564 (link).

We can call some other API to get uniprot IDs from labels, if needed.

It would be nice if subject.id was a curie ("prefix:ID") and there was another field named with the prefix (like chembl_target) where the value is just the ID (a1234).

It's common practice for us to have something like:

{
   "id": "uniprot:Q13564",
   "uniprot": "Q13564"
}

where "uniprot:Q13564" is a CURIE (or Compact URI). It's just a format of IDs.
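
To make the pattern concrete, here is a small sketch that builds and splits such IDs. The helper names (`to_curie`, `split_curie`, `make_node`) are hypothetical and only illustrate the "prefix:ID plus a field named after the prefix" layout described above.

```python
def to_curie(prefix, local_id):
    """Join a prefix and a local ID into a CURIE such as 'uniprot:Q13564'."""
    return f"{prefix}:{local_id}"


def split_curie(curie):
    """Split a CURIE on its first colon; raise if there is no prefix part."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id


def make_node(prefix, local_id):
    """Build a node with a CURIE 'id' and a bare ID under the prefix-named field."""
    return {"id": to_curie(prefix, local_id), prefix: local_id}


node = make_node("uniprot", "Q13564")
print(node)
print(split_curie(node["id"]))
```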

However, I have no idea if we have a CURIE standard for TTD IDs like T55285, or icd11 IDs like 2B6B... (@colleenXu can you double-check? Thanks!)

@erikyao
Contributor

erikyao commented Jun 5, 2023

@colleenXu @lucyzhang95 I found that we do have CURIE for TTD, see https://bioregistry.io/registry/ttd.target

@lucyzhang95
Author

@erikyao
Hey Yao, the parser is ready for deployment on Monday (June 26th)! I mapped all uniprot labels to the actual uniprot IDs and they are now included in the _id. For those that do not have a uniprot entry, I kept the original internal TTD IDs.

I also mapped their internal drug id to either pubchem_cid or chembi_id, which are also included in the _id.

Besides, I double-checked the _id for weird formatting to see if there are whitespaces, slashes, and backslashes. The current _ids are free of all of those. Please let me know if you still find other weird-formatted _ids or fields!
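
A check along these lines can be scripted. The sketch below is not taken from the parser; it assumes "weird formatting" means any whitespace, slash, or backslash, and the names `BAD_CHARS` and `has_bad_chars` are hypothetical.

```python
import re

# Any whitespace character, forward slash, or backslash counts as a problem.
BAD_CHARS = re.compile(r"[\s/\\]")


def has_bad_chars(doc_id):
    """Return True if the _id contains whitespace, '/' or a backslash."""
    return bool(BAD_CHARS.search(doc_id))


candidates = ["D08WUK_treats_DF-1 vaccine", "D08WUK_treats_1D64", "W8TNQ9_target_for_1B5Y"]
flagged = [doc_id for doc_id in candidates if has_bad_chars(doc_id)]
print(flagged)
```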

Thanks again for helping me out! I really appreciate it!

Updated info:
Github URL: https://github.com/lucyzhang95/BioThings_TTD_Dataplugin/blob/master/TTD_parser.py
Git Branch/Commit: master cc87ccb
No. Documents: 853055
Structure of documents: 1 record

{
    "_id":"D08WUK_treats_1D64",
    "association":{
        "predicate":"biolink:treats",
        "clinical_trial":[
            {
                "status":"phase 1b",
                "disease":"Middle East Respiratory Syndrome (MERS)"
            }
        ]
    },
    "object":{
        "id":"1D64",
        "icd11":"1D64",
        "name":"Middle East Respiratory Syndrome (MERS)",
        "type":"biolink:Disease"
    },
    "subject":{
        "id":"ttd_drug_id:D08WUK",
        "type":"biolink:Drug"
    }
}
{
    "_id":"W8TNQ9_target_for_1B5Y",
    "association":{
        "predicate":"biolink:target_for",
        "clinical_trial":[
            {
                "status":"Phase 2",
                "disease":"Staphylococcal/streptococcal disease"
            }
        ]
    },
    "object":{
        "id":"1B5Y",
        "icd11":"1B5Y",
        "name":"Staphylococcal/streptococcal disease",
        "type":"biolink:Disease"
    },
    "subject":{
        "id":"uniprot:W8TNQ9",
        "ttd_target_id":"T61547",
        "uniprot":[
            "W8TNQ9"
        ],
        "target_type":"clinical trial target",
        "name":"Staphylococcus Manganese transporter C (Stap-coc MntC)",
        "type":"biolink:Protein"
    }
}

@colleenXu

colleenXu commented Jun 26, 2023

Sorry for not responding >.<.

Part 1

Biolink-model doesn't seem to include any ttd ID-namespaces (there's a target one and a drug one?) or ICD11. So Translator Node Norm likely doesn't either. BTE uses Node Norm to find equivalent IDs and human-readable labels, aka what IDs are actually the same "node"/entity.

EDIT: I know an effort was made for ttd.target -> uniprot IDs and for ttd.drug -> pubchem_cid and chembl_id. How many records have unmapped entities (only ID for subject/object is ttd.target or ttd.drug)?

Have mapping efforts been tried for icd11 IDs? (to a Disease ID-namespace in biolink-model, like MONDO or DOID?)

Part 2

Could you make a table / list of the MetaTriples in this KP: unique combos of subject ID-prefix / subject-type / predicate / object ID-prefix / object-type? This is needed for the x-bte annotation
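
One way to produce such a list is to count the unique combinations directly from the parsed records. This is only a sketch: it assumes each record has the subject / association / object layout shown earlier in this thread, and it treats an ID without a colon as having no prefix. The function names are hypothetical.

```python
from collections import Counter


def id_prefix(entity_id):
    """Return the part before the first colon, or None if the ID has no prefix."""
    prefix, sep, _ = entity_id.partition(":")
    return prefix if sep else None


def metatriples(docs):
    """Count (subject prefix, subject type, predicate, object prefix, object type) combos."""
    counts = Counter()
    for doc in docs:
        subj, obj = doc["subject"], doc["object"]
        key = (
            id_prefix(subj["id"]),
            subj["type"],
            doc["association"]["predicate"],
            id_prefix(obj["id"]),
            obj["type"],
        )
        counts[key] += 1
    return counts


docs = [
    {
        "association": {"predicate": "biolink:treats"},
        "subject": {"id": "ttd_drug_id:D08WUK", "type": "biolink:Drug"},
        "object": {"id": "1D64", "type": "biolink:Disease"},
    },
    {
        "association": {"predicate": "biolink:target_for"},
        "subject": {"id": "uniprot:W8TNQ9", "type": "biolink:Protein"},
        "object": {"id": "1B5Y", "type": "biolink:Disease"},
    },
]
for combo, n in metatriples(docs).items():
    print(combo, n)
```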

@colleenXu

Note: updated my comment after noticing Lucy has done ttd.drug mapping...

@lucyzhang95
Author

@colleenXu
Sorry for the late reply! I was having a bad headache yesterday!

Part 1:

  1. I checked for unmapped IDs, meaning those that still use only the internal ttd.target or ttd.drug IDs. Of the 853,055 records, 27,955 have an _id that is unmapped in this way. Counting entities (subject id or object id), 248,819 are unmapped and still use internal ttd.target or ttd.drug IDs (with overlap).

  2. I did a quick lookup for mapping icd11 IDs with UMLS or Mondo. Both of them only map icd10 disease ontology, not icd11. Most of the data in TTD only have icd11 info, and 1 biomarker source has additional icd10 and icd9 info, but not every single disease has its corresponding icd10. It is going to be a little complicated to match the icd11 to UMLS or Mondo, but it is doable. We can first map the icd11 to icd10 and then map the icd10 to Mondo/UMLS.
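
The two-step mapping idea can be sketched as composing two lookup tables. The codes below are placeholders only, not real ICD or MONDO identifiers, and `chain_maps` is a hypothetical helper. Real mapping tables may also be one-to-many, which this sketch ignores.

```python
def chain_maps(first, second):
    """Compose two lookups: key -> first[key] -> second[first[key]], skipping gaps."""
    return {key: second[mid] for key, mid in first.items() if mid in second}


# Placeholder codes for illustration; these are NOT real identifiers.
icd11_to_icd10 = {"ICD11_A": "ICD10_A", "ICD11_B": "ICD10_B"}
icd10_to_mondo = {"ICD10_A": "MONDO_A"}

icd11_to_mondo = chain_maps(icd11_to_icd10, icd10_to_mondo)
unmapped = sorted(set(icd11_to_icd10) - set(icd11_to_mondo))
print(icd11_to_mondo)
print(unmapped)
```

Keeping the list of unmapped codes makes it easy to report how many diseases fall through each step.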

What would you suggest I do in this case?

Part 2:

Table for unique entities:

| Entity | Number of entities |
| --- | --- |
| subject ID-prefix | 850,543 |
| subject ID with no prefix | 2,512 |
| subject-type | 853,055 |
| predicate | 853,055 |
| object ID-prefix | 820,063 |
| object ID with no prefix | 32,992 |
| object-type | 853,055 |

  • The subject ID with no prefix consists of disease icd11
  • The object ID with no prefix consists of icd11 and biomarkers for disease

I would love to learn how to do x-bte annotations and bte registry from you when you have time! Is it a bad time to have a quick chat with you this week?

@colleenXu

colleenXu commented Jun 30, 2023

Overview

@lucyzhang95 and I have discussed the records / associations in this API and laid out what we need to know to write x-bte annotation (types of things, ID-prefixes and categories, relationships and predicates).

The next steps are:

  • Lucy will look into mapping / parsing issues noted in the "Types of Things" section
  • I'll write an example set of x-bte annotation, so Lucy has a starting point

Some notes:

Types of Things (entities) in this resource

There is more work that can be done to map IDs or add fields describing the relationships...


Drug (chemicals) -> SmallMolecule

  • PUBCHEM.COMPOUND (some also have CHEBI)
  • TTD.DRUG

Target -> Protein (Gene)

  • UniProtKB
  • TTD.TARGET

Target - compound activity: Could use the paper's definitions of "what is the relationship, how strong is the relationship" to "map" the IC50/Ki/EC50 values to relationships? Because the paper defines these, maybe we'll have more success than we did with BindingDB here...

Disease

ICD11: we could do a mapping effort to get to ICD9 (partial support in Translator) or MONDO (has full support in Translator) -> EDIT: BASICALLY DONE, SEE BELOW

Biomarker

not going to write x-bte annotation for biomarker - disease relationships

  • some of these are Genes/Proteins. Could search by name -> get ID. Good IDs would be NCBIGene, ENSEMBL, HGNC, UniProtKB (stuff in biolink-model under Gene or Protein). Then they can go into the Protein / bucket.
  • there are many kinds of things here...so it's not easy to use as-is

what relationships (combos of subject-predicate-object) are in this resource

every row will become 2 x-bte operations (1 set, querying data from subject -> object and data from object -> subject)


drug - disease relationships

| Subject-id | Subject-category | predicate | Object-id | Object-category |
| --- | --- | --- | --- | --- |
| PUBCHEM.COMPOUND | SmallMolecule | treats | ICD11 | Disease |
| PUBCHEM.COMPOUND | SmallMolecule | treats | MONDO | Disease |
| TTD.DRUG | SmallMolecule | treats | ICD11 | Disease |
| TTD.DRUG | SmallMolecule | treats | MONDO | Disease |

target - disease relationships

| Subject-id | Subject-category | predicate | Object-id | Object-category |
| --- | --- | --- | --- | --- |
| UniProtKB | Protein | target_for | ICD11 | Disease |
| UniProtKB | Protein | target_for | MONDO | Disease |
| TTD.TARGET | Protein | target_for | ICD11 | Disease |
| TTD.TARGET | Protein | target_for | MONDO | Disease |

drug - target relationships

| Subject-id | Subject-category | predicate | Object-id | Object-category |
| --- | --- | --- | --- | --- |
| PUBCHEM.COMPOUND | SmallMolecule | interacts_with | UniProtKB | Protein |
| CHEBI | SmallMolecule | interacts_with | UniProtKB | Protein |
| TTD.DRUG | SmallMolecule | interacts_with | UniProtKB | Protein |
| PUBCHEM.COMPOUND | SmallMolecule | interacts_with | TTD.TARGET | Protein |
| CHEBI | SmallMolecule | interacts_with | TTD.TARGET | Protein |
| TTD.DRUG | SmallMolecule | interacts_with | TTD.TARGET | Protein |
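
Since every row becomes a forward and a reverse operation, the list of operation references can be generated rather than typed by hand. The sketch below assumes a naming scheme of "subject_predicate_object" with a "-rev" suffix for the reverse direction. The short keys such as "chebi" and "icd11" and the helper `operation_refs` are hypothetical.

```python
BASE = "#/components/x-bte-kgs-operations/"


def operation_refs(subject_key, predicate, object_key):
    """Return the forward and reverse $ref strings for one table row."""
    name = f"{subject_key}_{predicate}_{object_key}"
    return [BASE + name, BASE + name + "-rev"]


rows = [
    ("chebi", "treats", "icd11"),
    ("pubchem", "treats", "icd11"),
]
refs = [ref for row in rows for ref in operation_refs(*row)]
for ref in refs:
    print(f"- $ref: '{ref}'")
```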

@colleenXu

colleenXu commented Jul 18, 2023

@lucyzhang95

Here's the example operations.

example operations

In the /query POST section:

      x-bte-kgs-operations:
      - $ref: '#/components/x-bte-kgs-operations/chebi_treats_icd11'
      - $ref: '#/components/x-bte-kgs-operations/chebi_treats_icd11-rev'
      - $ref: '#/components/x-bte-kgs-operations/pubchem_treats_icd11'
      ## need to change subject.pubchem.compound field name for this operation to work
      # - $ref: '#/components/x-bte-kgs-operations/pubchem_treats_icd11-rev'

in the components section, the operations and response-mapping

  x-bte-kgs-operations:
  ## fields not included due to data-processing / biolink-modeling issues:
  ## - association.clinical_trial.status: possible values (there may be more that I don't know)
  ##     'approved', 'phase 4', 'phase 3', 'phase 2', 'phase 1', 'terminated', 'withdrawn from market'...
    chebi_treats_icd11:
    ## 1788 records: https://pending.biothings.io/ttd/query?q=_exists_:subject.chebi%20AND%20association.predicate:%22biolink:treats%22%20AND%20_exists_:object.icd11
      - supportBatch: true
        useTemplating: true ## flag to say templating is being used below
        inputs:
          - id: CHEBI
            semantic: SmallMolecule
        requestBodyType: object
        requestBody:  ## no prefix
          body: >-
            {
              "q": [ {{ queryInputs | wrap( '["' , '","biolink:treats"]') }} ],
              "scopes": ["subject.chebi", "association.predicate"]
            }
        outputs:
          - id: ICD11
            semantic: Disease
        parameters:
        ## not including these fields due to data-processing / biolink-modeling issues
        ## - association.clinical_trial.status
          fields: >-
            object.icd11,
            object.name,
            subject.name
          size: 1000
        predicate: treats
        source: "infores:ttd"
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/icd11-obj"
        # testExamples:
        #   - qInput: "CHEBI:28001"            ## Vancomycin
        #     oneOutput: "ICD11:1A00-1A09"     ## Methicillin-resistant staphylococci infection
    chebi_treats_icd11-rev:
      - supportBatch: true
        useTemplating: true
        inputs:
          - id: ICD11
            semantic: Disease
        requestBodyType: object
        requestBody:  ## no prefix
          body: >-
            {
              "q": [ {{ queryInputs | wrap( '["' , '","biolink:treats"]') }} ],
              "scopes": ["object.icd11", "association.predicate"]
            }
        parameters:
          fields: >-
            subject.chebi,
            subject.name,
            object.name
          size: 1000
        outputs:
          - id: CHEBI
            semantic: SmallMolecule
        predicate: treated_by
        source: "infores:ttd"
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/chebi-subject"
        # testExamples:
        #   - qInput: "ICD11:1A00-CA43.1"     ## Inflammation 
        #     oneOutput: "CHEBI:2500"         ## Aescin / escin Ib
    pubchem_treats_icd11:
    ## 9615 records: https://pending.biothings.io/ttd/query?q=_exists_:subject.pubchem%20AND%20association.predicate:%22biolink:treats%22%20AND%20_exists_:object.icd11
      - supportBatch: true
        useTemplating: true ## flag to say templating is being used below
        inputs:
          - id: "PUBCHEM.COMPOUND"
            semantic: SmallMolecule
        requestBodyType: object
        requestBody:  ## no prefix
          body: >-
            {
              "q": [ {{ queryInputs | wrap( '["' , '","biolink:treats"]') }} ],
              "scopes": ["subject.pubchem.compound", "association.predicate"]
            }
        outputs:
          - id: ICD11
            semantic: Disease
        parameters:
        ## not including these fields due to data-processing / biolink-modeling issues
        ## - association.clinical_trial.status
          fields: >-
            object.icd11,
            object.name,
            subject.name
          size: 1000
        predicate: treats
        source: "infores:ttd"
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/icd11-obj"
        # testExamples:
        #   - qInput: "PUBCHEM.COMPOUND:135428923"      ## EC20 / Folcepri
        #     oneOutput: "ICD11:2C73"                   ## Ovarian cancer
    # ## need to change subject.pubchem.compound field name for this operation to work
    # pubchem_treats_icd11-rev:
    #   - supportBatch: true
    #     useTemplating: true
    #     inputs:
    #       - id: ICD11
    #         semantic: Disease
    #     requestBodyType: object
    #     requestBody:  ## no prefix
    #       body: >-
    #         {
    #           "q": [ {{ queryInputs | wrap( '["' , '","biolink:treats"]') }} ],
    #           "scopes": ["object.icd11", "association.predicate"]
    #         }
    #     parameters:
    #       fields: >-
    #         subject.pubchem.compound,
    #         subject.name,
    #         object.name
    #       size: 1000
    #     outputs:
    #       - id: "PUBCHEM.COMPOUND"
    #         semantic: SmallMolecule
    #     predicate: treated_by
    #     source: "infores:ttd"
    #     response_mapping:
    #       "$ref": "#/components/x-bte-response-mapping/pubchem-subject"
    #     # testExamples:
    #     #   - qInput: "ICD11:2B52"                           ## Ewing sarcoma
    #     #     oneOutput: "PUBCHEM.COMPOUND:24958200"         ## MK-4827
  x-bte-response-mapping:
    icd11-obj:
      ICD11: object.icd11               ## no prefix
      input_name: subject.name          ## BTE will use this for node name if node normalizer didn't provide
      output_name: object.name          ## BTE will use this for node name if node normalizer didn't provide
    chebi-subject:
      CHEBI: subject.chebi              ## no prefix
      input_name: object.name
      output_name: subject.name
    # ## need to change subject.pubchem.compound field name for this operation to work
    # pubchem-subject:
    #   "PUBCHEM.COMPOUND": subject.pubchem.compound        ## no prefix
    #   input_name: object.name
    #   output_name: subject.name
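
To see what the templated requestBody is meant to produce, here is a rough Python emulation. It assumes the `wrap` filter adds the given prefix and suffix to each input ID and that the results are joined with commas. That behavior is my reading of the template and is not confirmed in this thread. The function names are hypothetical, and the input IDs are passed without a prefix, as the comments in the operations say.

```python
import json


def wrap(inputs, start, end):
    """Emulate the assumed template filter: wrap each input, then join with commas."""
    return ",".join(start + item + end for item in inputs)


def render_body(query_inputs, scopes, predicate="biolink:treats"):
    """Build the batch-query JSON body: one [id, predicate] pair per input."""
    pairs = wrap(query_inputs, '["', '","' + predicate + '"]')
    return '{ "q": [ ' + pairs + ' ], "scopes": ' + json.dumps(scopes) + " }"


body = render_body(["28001", "2500"], ["subject.chebi", "association.predicate"])
parsed = json.loads(body)
print(json.dumps(parsed, indent=2))
```

Each pair in "q" lines up with the two entries in "scopes", so one batch request matches on both the ID field and the predicate.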

And after writing this example, I have a bunch of advice / commentary >.<. The first two collapsed sections are the important ones.

My advice on writing operations
  • For now, operations with PUBCHEM.COMPOUND as the output will not work. You'll need to change the parser to take out the period in those field keys (example: subject.pubchem.compound and object.pubchem.compound -> subject.pubchem_compound and object.pubchem_compound). After deploying those changes and adjusting operations to use those new field spellings, things should work.
    • I included an example that doesn't work right now.
  • I couldn't find records with "biolink:interacts_with" as the predicate. But I could find records with "biolink:associated_with"....perhaps these are the records for the drug - target relationships?
    • I'll leave it to you to pick the best predicate for the drug - target relationships...
  • use quotation marks when ID-namespaces/prefixes have periods in them
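
The field-key change in the first point could be done with a small rename step in the parser. This is a sketch only: the sample record is hypothetical, and `rename_dotted_keys` / `fix_record` are not names from the actual parser.

```python
def rename_dotted_keys(entity):
    """Return a copy of the dict with periods in its keys replaced by underscores."""
    return {key.replace(".", "_"): value for key, value in entity.items()}


def fix_record(doc):
    """Apply the key rename to the subject and object sections of one record."""
    fixed = dict(doc)
    for side in ("subject", "object"):
        if side in fixed:
            fixed[side] = rename_dotted_keys(fixed[side])
    return fixed


# Hypothetical record fragment for illustration.
doc = {
    "subject": {"pubchem.compound": "135428923", "type": "biolink:Drug"},
    "object": {"icd11": "2C73", "type": "biolink:Disease"},
}
fixed = fix_record(doc)
print(fixed["subject"])
```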
Explaining the comments on missing fields

In my examples, you'll see comments saying that some record fields aren't included in the parameters.fields and response-mapping. Because of TRAPI / biolink-model validation issues, we are only keeping some fields in the response-mapping (like keywords BTE fully transforms into TRAPI like output ID-namespace, input_name, output_name, pubmed...). It then makes sense to query only for those fields (with parameters.fields).

However, I think it's still useful to know what useful record fields could be retrieved in each set of operations. So I suggest that you write similar comments. Here are the fields I identified (using the /metadata/fields endpoint response) that seem useful:

  • subject.target_type / object.target_type
  • subject.bioclass / object.bioclass
  • association.trial_status
  • association.moa
  • association.ki
  • association.ic50
  • association.ec50
  • association.clinical_trial.status

I think it'd also be useful to list the missing fields in a comment block at the top of the operations text, with possible values (for fields with limited number of possible values) or example values (for fields that are basically free text). I included this in my example operations as well.

Observations (only for later reference)
  • Sometimes a subject or object will have multiple kinds of IDs...so a record could be returned by multiple operations. It should be handled alright...but we could test the retrieval of such a record to see that only 1 edge is returned, not multiple.
  • An observation: it looks like the records with subject.icd11 sometimes have mappings to ICD10 and ICD9 (subject.icd10 and subject.icd9 fields). However, this may be for the biomarker data only...

@lucyzhang95
Author

lucyzhang95 commented Jul 26, 2023

@colleenXu

Thank you so much again for the examples, detailed explanations, and comments! They are super helpful! I tried to write the rest of the x-bte-kgs-annotations, operations, and mappings using your example as a reference. I have some questions regarding the parameters.fields and x-bte-response-mapping.

PS:

  1. I have mapped the icd11 to mondo with the available biomarker data. I mapped the icd11 to icd9 and to mondo. The newest version of the API has not been deployed yet but will be included in the following x-bte-kgs-annotations and operation components.
  2. I changed pubchem.compound to pubchem_compound in the newer API.
  3. I changed biolink:associated_with to biolink:interacts_with

The full smartapi.yaml file can be found here: https://github.com/lucyzhang95/BioThings_TTD_Dataplugin/blob/master/smart_api/smartapi.yaml

In the /query POST section:

- query
      x-bte-kgs-operations:
      - $ref: '#/components/x-bte-kgs-operations/chebi_treats_mondo'
      - $ref: '#/components/x-bte-kgs-operations/chebi_treats_mondo-rev'
      - $ref: '#/components/x-bte-kgs-operations/chebi_treats_icd11'
      - $ref: '#/components/x-bte-kgs-operations/chebi_treats_icd11-rev'
      - $ref: '#/components/x-bte-kgs-operations/pubchem_treats_mondo'
      - $ref: '#/components/x-bte-kgs-operations/pubchem_treats_mondo-rev'
      - $ref: '#/components/x-bte-kgs-operations/pubchem_treats_icd11'
      - $ref: '#/components/x-bte-kgs-operations/pubchem_treats_icd11-rev'
      - $ref: '#/components/x-bte-kgs-operations/uniprotkb_target_for_mondo'
      - $ref: '#/components/x-bte-kgs-operations/uniprotkb_target_for_mondo-rev'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_target_id_target_for_mondo'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_target_id_target_for_mondo-rev'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_target_id_target_for_icd11'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_target_id_target_for_icd11-rev'
      - $ref: '#/components/x-bte-kgs-operations/chebi_interacts_with_uniprotkb'
      - $ref: '#/components/x-bte-kgs-operations/chebi_interacts_with_uniprotkb-rev'
      # - $ref: '#/components/x-bte-kgs-operations/chebi_interacts_with_ttd_target_id'
      # - $ref: '#/components/x-bte-kgs-operations/chebi_interacts_with_ttd_target_id-rev'
      - $ref: '#/components/x-bte-kgs-operations/pubchem_interacts_with_uniprotkb'
      - $ref: '#/components/x-bte-kgs-operations/pubchem_interacts_with_uniprotkb-rev'
      # - $ref: '#/components/x-bte-kgs-operations/pubchem_compound_interacts_with_ttd_target_id'
      # - $ref: '#/components/x-bte-kgs-operations/pubchem_compound_interacts_with_ttd_target_id-rev'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_biomarker_id_biomarker_for_mondo'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_biomarker_id_biomarker_for_mondo-rev'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_biomarker_id_biomarker_for_icd11'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_biomarker_id_biomarker_for_icd11-rev'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_biomarker_id_biomarker_for_icd10'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_biomarker_id_biomarker_for_icd10-rev'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_biomarker_id_biomarker_for_icd9'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_biomarker_id_biomarker_for_icd9-rev'

Comments: I commented out all the $ref with internal ttd ids, such as ttd_target_id, ttd_drug_id, and ttd_biomarker_id.
Questions: Does this - $ref: '#/components/x-bte-kgs-operations/pubchem_treats_mondo' look right?

Component section: operations

chebi_treats_mondo:

  x-bte-kgs-operations:
  ## in the API's records:
  ## - subjects and objects can be Protein/Gene (UniprotKB), SmallMolecule (PUBCHEM.COMPOUND, CHEBI),
  ##   Disease (MONDO, ICD11, ICD10, ICD9)
  ## - predicates can be treats, target_for, interacts_with
  ##   SmallMolecule_treats_Disease, Gene_target_for_Disease, SmallMolecule_interacts_with_Gene
  ## - BTE automatically puts prefix on MONDO IDs, but prefix has to be added to other ID inputs
  ## - currently, BTE will also accept response with Translator-prefix (api-response-transform module).
  ## - joinSafe is only needed if the delimiter isn't a comma
  ## - fields not included due to data-processing / biolink-modeling issues:
  ##   association.clinical_trial.status: possible values include:
  ##     'investigative', 'patented', 'phase 2', 'approved', 'phase 1', 'terminated', 'phase 3',
  ##     'discontinued in phase 2', 'phase 1/2', 'discontinued in phase 1', 'preclinical', 'discontinued in phase 3',
  ##     'phase 2/3', 'clinical trial', 'withdrawn from market', 'phase 4', 'phase 3 trial', 'discontinued in phase1/2',
  ##     'phase 2 trial', 'discontinued in preregistration', 'phase 2/3 trial', 'preregistration', 'phase 2a',
  ##     'phase 1 trial', 'phase 0', 'registered', 'approved (orphan drug)', 'phase 1b', 'phase 2b',
  ##     'discontinued in phase2/3', 'NDA filed', 'phase 1/2 trial', 'phase 1/2a', 'discontinued in phase 1 trial',
  ##     'discontinued in phase 4', 'phase 1b/2a', 'application submitted', 'approval submitted', 'BLA submitted',
  ##     'discontinued in phase 2a', 'discontinued in phase 2b'
    chebi_treats_mondo:
    ## https://pending.biothings.io/ttd/query?q=_exists_:subject.chebi%20AND%20association.predicate:%22biolink:treats%22%20AND%20_exists_:object.mondo
    ## 543 records
      - supportBatch: true
        useTemplating: true ## flag to say templating is being used below
        inputs:
          - id: CHEBI
            semantic: SmallMolecule
        requestBodyType: object
        requestBody: ## no prefix
          body: >-
            {
              "q": [ {{ queryInputs | wrap( '["' , '","biolink:treats"]') }} ], 
              "scopes": ["subject.chebi", "association.predicate"] 
            }
        outputs:
          - id: "MONDO"
            semantic: Disease
        parameters:
          ## not including these fields due to data-processing / biolink-modeling issues
          ## - association.clinical_trial.status
          fields: >-
            object.mondo,
            object.icd11,
            object.name,
            subject.name
          size: 1000
        predicate: treats
        source: "infores:ttd"
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/mondo-object"
        # testExamples:
        #   - qInput: "CHEBI:6960"                    ## Moexipril
        #     oneOutput: "MONDO:0001134"              ## Hypertension

Comments and Questions:

  1. I added all the possible clinical_trial values in the comments, but not sure if it is necessary.
  2. Under the parameters.fields: I added object.icd11 but I am not sure if it is necessary to add it, since I have additional operations for chebi_treats_icd11. I have icd11 for every single record, but not mondo. It might provide additional info for the users when they query mondo?
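
To make the requestBody template above easier to reason about, here is a minimal Python sketch of what the rendered POST body might look like. It assumes the `wrap` filter surrounds each input ID with the two given strings and that the results are comma-joined; the `wrap` and `build_request_body` functions below are illustrative stand-ins, not BTE's templating code, and the second input ID is a made-up placeholder.

```python
import json


def wrap(values, start, end):
    """Illustrative stand-in for the `wrap` filter: surround each input ID."""
    return ",".join(f"{start}{value}{end}" for value in values)


def build_request_body(query_inputs, predicate, scopes):
    """Render the template text, then parse it to confirm it is valid JSON."""
    pairs = wrap(query_inputs, '["', f'","{predicate}"]')
    rendered = f'{{"q": [ {pairs} ], "scopes": {json.dumps(scopes)}}}'
    return json.loads(rendered)


# "6960" comes from the testExamples above; "12345" is a placeholder ID.
body = build_request_body(
    ["6960", "12345"],
    "biolink:treats",
    ["subject.chebi", "association.predicate"],
)
print(json.dumps(body, indent=2))
```

Each entry of `q` pairs one input ID with the predicate, lining up position by position with the two entries in `scopes`.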

pubchem_treats_mondo:

    pubchem_treats_mondo:
    ##  https://pending.biothings.io/ttd/query?q=_exists_:subject.pubchem_compound%20AND%20association.predicate:%22biolink:treats%22%20AND%20_exists_:object.mondo
    ##  1604 records
      - supportBatch: true
        useTemplating: true ## flag to say templating is being used below
        inputs:
          - id: "PUBCHEM.COMPOUND"
            semantic: SmallMolecule
        requestBodyType: object
        requestBody: ## no prefix
          body: >-
            {
              "q": [ {{ queryInputs | wrap( '["' , '","biolink:treats"]') }} ],
              "scopes": ["subject.pubchem_compound", "association.predicate"]
            }
        outputs:
          - id: MONDO
            semantic: Disease
        parameters:
        ## not including these fields due to data-processing / biolink-modeling issues
        ## - association.clinical_trial.status
          fields: >-
            object.mondo,
            object.name,
            subject.name
          size: 1000
        predicate: treats
        source: "infores:ttd"
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/mondo-object"
        # testExamples:
        #   - qInput: "PUBCHEM.COMPOUND:118063735"       ## LMB763 (Nidufexor)
        #     oneOutput: "MONDO:0004790"                 ## Non-alcoholic steatohepatitis
    pubchem_treats_mondo-rev:
      - supportBatch: true
        useTemplating: true
        inputs:
          - id: MONDO
            semantic: Disease
        requestBodyType: object
        requestBody: ## no prefix
          body: >-
            {
              "q": [ {{ queryInputs | wrap( '["' , '","biolink:treats"]') }} ],
              "scopes": ["object.mondo", "association.predicate"]
            }
        outputs:
          - id: "PUBCHEM.COMPOUND"
            semantic: SmallMolecule
        parameters:
          fields: >-
            subject.pubchem_compound,
            subject.name,
            object.name
          size: 1000
        predicate: treated_by
        source: "infores:ttd"
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/pubchem-subject"
        # testExamples:
        #   - qInput: "MONDO:0021094"                      ## Human immunodeficiency virus infection
        #     oneOutput: "PUBCHEM.COMPOUND:103596027"      ## GS-1156

Comments and Questions:

  1. Is this pubchem_treats_mondo correct? For making the title, is it better to use pubchem_treats_mondo instead of pubchem_compound_treats_mondo ?

uniprotkb_target_for_mondo:

    uniprotkb_target_for_mondo:
    ## https://biothings.ci.transltr.io/ttd/query?q=_exists_:subject.uniprotkb%20AND%20association.predicate:%22biolink:target_for%22%20AND%20_exists_:object.mondo
    ## 601 records
      - supportBatch: true
        useTemplating: true ## flag to say templating is being used below
        inputs:
          - id: UniProtKB
            semantic: Protein
        requestBodyType: object
        requestBody: ## no prefix
          body: >-
            {
              "q": [ {{ queryInputs | wrap( '["' , '","biolink:target_for"]') }} ],
              "scopes": ["subject.uniprotkb", "association.predicate"]
            }
        outputs:
          - id: MONDO
            semantic: Disease
        parameters:
        ## not including these fields due to data-processing / biolink-modeling issues
        ## - association.clinical_trial.status
          fields: >-
            object.mondo,
            object.name,
            object.icd11,
            subject.name,
            subject.bioclass,
            subject.target_type
          size: 1000
        predicate: target_for
        source: "infores:ttd"
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/mondo-object"
        # testExamples:
        #   - qInput: "UniProtKB:Q08722"         ## Leukocyte surface antigen CD47
        #     oneOutput: "MONDO:0001351"         ## Ovarian cancer

Comments and Questions:

  1. Is it okay that I added subject.bioclass and subject.target_type in the parameters.fields?
  2. What will be a proper predicate for uniprotkb_target_for_mondo-rev?

chebi_interacts_with_uniprotkb:

    chebi_interacts_with_uniprotkb:
    ## https://biothings.ci.transltr.io/ttd/query?q=_exists_:subject.chebi%20AND%20association.predicate:%22biolink:interacts_with%22%20AND%20_exists_:object.uniprotkb
    ## 2899 records
      - supportBatch: true
        useTemplating: true ## flag to say templating is being used below
        inputs:
          - id: CHEBI
            semantic: SmallMolecule
        requestBodyType: object
        requestBody: ## no prefix
          body: >-
            {
              "q": [ {{ queryInputs | wrap( '["' , '","biolink:interacts_with"]') }} ],
              "scopes": ["subject.chebi", "association.predicate"]
            }
        outputs:
          - id: UniProtKB
            semantic: Protein
        parameters:
          fields: >-
            object.uniprotkb,
            object.bioclass,
            object.target_type,
            association.moa
          size: 1000
        predicate: interacts_with
        source: "infores:ttd"
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/uniprotkb-object"
        # testExamples:
        #   - qInput: "CHEBI:2696"        ## target_type:literature-reported target, bioclass:Acyltransferase
        #     oneOutput: "UniProtKB:Q8WYB5"

Comments and Questions:

  1. Is it allowed to add association.moa in the parameters.fields?
  2. For object.bioclass in the parameters.fields, only some records have bioclass, do I still include it in the parameters.fields?
  3. Also, I'm not sure about the format of testExamples. I only provided the uniprotkb as oneOutput:

x-bte-response-mapping:

    chebi-subject:
      CHEBI: subject.chebi                           # no prefix
      input_name: object.name
      output_name: subject.name
    mondo-object:
      MONDO: object.mondo                            # no prefix
      input_name: subject.name
      output_name: object.name
    pubchem-subject:
      "PUBCHEM.COMPOUND": subject.pubchem_compound   # no prefix
      input_name: object.name
      output_name: subject.name
    icd11-object:
      ICD11: object.icd11                            # no prefix
      input_name: subject.name                       # BTE will use this for node name if node normalizer didn't provide
      output_name: object.name                       # BTE will use this for node name if node normalizer didn't provide
    uniprotkb-subject:
      UniProtKB: subject.uniprotkb                   # no prefix
      input_name: object.name
      output_name: subject.name
    uniprotkb-object:
      UniProtKB: object.uniprotkb
      input_name:
      output_name:

Comments and Questions:

  1. I am not sure what input_name and output_name correspond to. If the output is not subject.name or object.name but object.target_type or subject.target_type instead, how should I write the response-mapping?
  2. How many pairs of response-mappings should I write? I think I have chebi and pubchem_compound only as subject IDs, and mondo/icd11 as object IDs; however, uniprotkb can be either a subject ID or an object ID.
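
For illustration, one way to read a response-mapping entry is as a set of dotted paths into each returned record. The sketch below resolves the icd11-object mapping against a record shaped like the D04SKM example at the top of this issue. The helper names `get_path` and `apply_mapping` are hypothetical, and this is only an approximation of the idea, not how BTE's api-response-transform module is implemented.

```python
def get_path(record, dotted_path):
    """Walk a nested dict along a dotted path; return None if any key is missing."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def apply_mapping(record, mapping):
    """Resolve every dotted path in a response-mapping against one record."""
    return {label: get_path(record, path) for label, path in mapping.items()}


# Record trimmed from the D04SKM_treats_2A85 example in this issue.
hit = {
    "subject": {"id": "D04SKM", "name": "CART-10 cells"},
    "object": {"id": "2A85", "icd11": "2A85", "name": "Acute lymphoblastic leukaemia"},
}

icd11_object = {
    "ICD11": "object.icd11",
    "input_name": "subject.name",   # the input node here is the subject
    "output_name": "object.name",   # the output node here is the object
}

print(apply_mapping(hit, icd11_object))
```

Read this way, input_name points at the name of whichever side is the operation's input, and output_name at the name of the side being returned, which is why the "-subject" mappings swap the two paths.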

Sorry about the extremely long post! We can definitely have another Slack huddle meeting if you have time!
Also, if you don't mind, could you explain a little about how to test if the annotations and response-mapping perform as expected locally if I want to do some queries just like other users?

@colleenXu

Info from the convo @lucyzhang95 and I had Friday (7/28) afternoon (sorry for the belated posting >.<):

Reminders:

  • remember to add source: "infores:ttd" to all operations!
  • retrieving duplicate records (under both ttd.drug and pubchem.compound for instance) is an issue, which is why we'll write "no-scopes" requestBody templates for most operations (see notes for each section below)
  • Using Gene rather than Protein for UniProtKB IDs is more useful.
  • Lucy will try getting names for ttd.drug and ttd.target for this set of data because it's missing right now. This will allow BTE to have human-readable names for these IDs.
  • for now, we'll leave comments on the fields that aren't the output ID and node names. Since they'll take time to map to biolink-model / TRAPI validation standards...

Updated: relationships + NOT _exists_ filters

Every row will become 2 x-bte operations (1 set: querying data from subject -> object and data from object -> subject).

drug - disease relationships
Subject-id        Subject-category  predicate  Object-id  Object-category
PUBCHEM.COMPOUND  SmallMolecule     treats     ICD11      Disease
PUBCHEM.COMPOUND  SmallMolecule     treats     MONDO      Disease
TTD.DRUG          SmallMolecule     treats     ICD11      Disease
TTD.DRUG          SmallMolecule     treats     MONDO      Disease

Reverse predicate is "treated_by"

We're not using CHEBI IDs because it looks like every subject with a CHEBI field also has a pubchem field (so this query (which only works right now before updates) gets no hits)

Pubchem and mondo are the preferred IDs. ttd.drug and icd11 are the backup / default, to use only when pubchem and mondo are unavailable:

  • pubchem + mondo: keep original format with filled-out scopes
  • pubchem + icd11: use empty-scopes format with NOT _exists_:object.mondo
  • ttd.drug + mondo: use empty-scopes format with NOT _exists_:subject.pubchem_compound (future field name)
  • ttd.drug + icd11: use empty-scopes format with NOT _exists_:subject.pubchem_compound AND NOT _exists_:object.mondo (future field name)

target (gene) - disease relationships
Subject-id  Subject-category  predicate   Object-id  Object-category
UniProtKB   Gene              target_for  ICD11      Disease
UniProtKB   Gene              target_for  MONDO      Disease
TTD.TARGET  Gene              target_for  ICD11      Disease
TTD.TARGET  Gene              target_for  MONDO      Disease

Reverse predicate is "has_target"

UniProtKB and MONDO are the preferred IDs. ttd.target and icd11 are the backup / default, to use only when UniProtKB and mondo are unavailable:

  • UniProtKB + mondo: keep format with filled-out scopes
  • UniProtKB + icd11: use empty-scopes format with NOT _exists_:object.mondo
  • ttd.target + mondo: use empty-scopes format with NOT _exists_:subject.uniprotkb
  • ttd.target + icd11: use empty-scopes format with NOT _exists_:subject.uniprotkb AND NOT _exists_:object.mondo

drug - target (gene) relationships

We're not using CHEBI IDs because it looks like every subject with a CHEBI field also has a pubchem field (so this query (which only works right now before updates) gets no hits)

Subject-id        Subject-category  predicate       Object-id   Object-category
PUBCHEM.COMPOUND  SmallMolecule     interacts_with  UniProtKB   Gene
TTD.DRUG          SmallMolecule     interacts_with  UniProtKB   Gene
PUBCHEM.COMPOUND  SmallMolecule     interacts_with  TTD.TARGET  Gene
TTD.DRUG          SmallMolecule     interacts_with  TTD.TARGET  Gene

pubchem and UniProtKB are the preferred IDs. ttd.drug and ttd.target are the backup / default, to use only when pubchem and UniProtKB are unavailable:

  • pubchem + UniProtKB: keep format with filled-out scopes
  • pubchem + ttd.target: use empty-scopes format with NOT _exists_:object.uniprotkb
  • ttd.drug + UniProtKB: use empty-scopes format with NOT _exists_:subject.pubchem_compound
  • ttd.drug + ttd.target: use empty-scopes format with NOT _exists_:subject.pubchem_compound AND NOT _exists_:object.uniprotkb
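
The preferred/backup pattern in the lists above can be sketched as a small query-string builder. This is only an illustration of how the NOT _exists_ clauses keep the backup operations from returning records that the preferred-ID operations already cover; `build_query` is a hypothetical helper, and only field names that appear elsewhere in this thread are used.

```python
def build_query(predicate, required_fields, excluded_fields=()):
    """Compose a query string: predicate, fields that must exist, fields that must not."""
    clauses = [f'association.predicate:"{predicate}"']
    clauses += [f"_exists_:{field}" for field in required_fields]
    clauses += [f"NOT _exists_:{field}" for field in excluded_fields]
    return " AND ".join(clauses)


# pubchem + mondo: both preferred IDs are present, nothing to exclude.
print(build_query("biolink:treats", ["subject.pubchem_compound", "object.mondo"]))

# pubchem + icd11: backup object ID, so skip records that already have a MONDO ID.
print(build_query(
    "biolink:treats",
    ["subject.pubchem_compound", "object.icd11"],
    ["object.mondo"],
))
```

Because the second query excludes every record matched by the first, a record is only ever returned under one of the two operations.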

@colleenXu

colleenXu commented Aug 1, 2023

And including this, from my review of earlier discussions...

These can also be left for later (not needed to get this SmartAPI / x-bte annotation written, registered, and used by BTE)...

Notes from earlier post, edited (not addressed during Friday meeting)

  • Target - compound activity (aka drug - target relationships?): Could we use the paper's definitions of "what is the relationship, how strong is the relationship" to "map" the IC50/Ki/EC50 values to relationships? Because the paper defines these, maybe we'll have more success than we did with BindingDB (see Data source: BindingDB #70 (comment))...
  • Some biomarkers are genes/proteins. Maybe if we could map their names to IDs, we could write x-bte annotation using those IDs... (Good ID-namespaces would be stuff in biolink-model for Gene or Protein, like NCBIGene, ENSEMBL, HGNC, UniProtKB)

@colleenXu colleenXu assigned lucyzhang95 and unassigned erikyao Aug 3, 2023
@lucyzhang95
Author

@colleenXu

Thank you so much for being super helpful! I really appreciate you spending the time to have multiple meetings with me! I have finished updating the smartapi.yaml for ttd. The newest version of ttd has also been deployed on Translator APIs!

While testing the post query locally, I found one issue with the drug-target relationships. I used Postman for the testing since you taught me how to use it last time! The testing result showed "description": "Query processed successfully, retrieved 0 results." I am a bit confused since I can find the result here if I do it manually on the translator.io.

results from bte:biothings-explorer-trapi The result is extremely long! I'm sorry about that!
 bte:biothings-explorer-trapi:main SmartAPI Specs read from path: /Users/lucyzhang1116/Documents/biothings_explorer/data/smartapi_specs.json +0ms
  bte:smartapi-kg:SyncLoader Using single spec sync loader now. +3m
  bte:smartapi-kg:AllSpecsSyncLoader Fetching from file path: /Users/lucyzhang1116/Documents/biothings_explorer/data/smartapi_specs.json +3m
  bte:smartapi-kg:AllSpecsSyncLoader Hits in inputs: true +1ms
  bte:biothings-explorer-trapi:main MetaKG successfully loaded! +2ms
  bte:biothings-explorer-trapi:query_graph (1) Creating edges for manager... +3m
  bte:biothings-explorer-trapi:query_graph Query node missing categories...Looking for match... +381ms
  bte:biothings-explorer-trapi:QNode (1) Node "n0" has (1) entities at start. +3m
  bte:biothings-explorer-trapi:QNode (1) Node "n0" expanded initial curie. {"PUBCHEM_COMPOUND:12795894":["PUBCHEM_COMPOUND:12795894"]} +0ms
  bte:biothings-explorer-trapi:query_graph Query node missing categories...Looking for match... +280ms
  bte:biothings-explorer-trapi:QNode (1) Node "n1" has (1) entities at start. +280ms
  bte:biothings-explorer-trapi:QNode (1) Node "n1" expanded initial curie. {"UniProtKB:Q14534":["UniProtKB:Q14534"]} +0ms
  bte:biothings-explorer-trapi:QNode "n0" connected to "e01" +0ms
  bte:biothings-explorer-trapi:QNode "n1" connected to "e01" +0ms
  bte:biothings-explorer-trapi:QEdge (2) Created Edge "e01" Reverse = false +3m
  bte:biothings-explorer-trapi:main (3) All edges created [{"id":"e01","subject":{"id":"n0","category":["biolink:SmallMolecule"],"expandedCategories":["SmallMolecule"],"equivalentIDsUpdated":false,"curie":["PUBCHEM_COMPOUND:12795894"],"expanded_curie":{"PUBCHEM_COMPOUND:12795894":["PUBCHEM_COMPOUND:12795894"]},"entity_count":1,"held_curie":[],"held_expanded":{},"connected_to":{}},"object":{"id":"n1","category":["biolink:Gene"],"expandedCategories":["Gene"],"equivalentIDsUpdated":false,"curie":["UniProtKB:Q14534"],"expanded_curie":{"UniProtKB:Q14534":["UniProtKB:Q14534"]},"entity_count":1,"held_curie":[],"held_expanded":{},"connected_to":{}},"qualifier_constraints":[],"reverse":false,"executed":false,"logs":[],"records":[]}] +661ms
  bte:biothings-explorer-trapi:edge-manager (3) Edge manager is managing 1 qEdges. +3m
  bte:biothings-explorer-trapi:edge-manager (4) Edges not yet executed = 1 +0ms
  bte:biothings-explorer-trapi:edge-manager (5) Sending next edge 'e01' WITH entity count...(1) +0ms
  bte:biothings-explorer-trapi:edge-manager Checking entity max : (1)--(1) +0ms
  bte:biothings-explorer-trapi:QEdge (8) Choosing lower entity count in edge... +0ms
  bte:biothings-explorer-trapi:QNode (8) Node "n1" holding ["UniProtKB:Q14534"] aside. +0ms
  bte:biothings-explorer-trapi:QEdge (8) Sub - Obj were same but chose subject (1) +0ms
  bte:biothings-explorer-trapi:edge-manager 'e01' : (1) --> (1) +0ms
  bte:biothings-explorer-trapi:qedge2btedge Input node is n0 +3m
  bte:biothings-explorer-trapi:qedge2btedge Output node is n1 +0ms
  bte:biothings-explorer-trapi:qedge2btedge KG Filters: {
  bte:biothings-explorer-trapi:qedge2btedge   "input_type": [
  bte:biothings-explorer-trapi:qedge2btedge     "SmallMolecule"
  bte:biothings-explorer-trapi:qedge2btedge   ],
  bte:biothings-explorer-trapi:qedge2btedge   "output_type": [
  bte:biothings-explorer-trapi:qedge2btedge     "Gene"
  bte:biothings-explorer-trapi:qedge2btedge   ]
  bte:biothings-explorer-trapi:qedge2btedge } +0ms
  bte:biothings-explorer-trapi:edge-manager (3) Edge manager is managing 1 qEdges. +1ms
  bte:biothings-explorer-trapi:edge-manager (4) Edges not yet executed = 1 +0ms
  bte:biothings-explorer-trapi:edge-manager (5) Sending next edge 'e01' WITH entity count...(1) +0ms
  bte:biothings-explorer-trapi:edge-manager Checking entity max : (1)--(1) +0ms
  bte:biothings-explorer-trapi:QEdge (8) Choosing lower entity count in edge... +1ms
  bte:biothings-explorer-trapi:QNode (8) Node "n1" holding ["UniProtKB:Q14534"] aside. +1ms
  bte:biothings-explorer-trapi:QEdge (8) Sub - Obj were same but chose subject (1) +0ms
  bte:biothings-explorer-trapi:edge-manager 'e01' : (1) --> (1) +0ms
  bte:biothings-explorer-trapi:edge-manager (5) Executing current edge >> "e01" +0ms
  bte:biothings-explorer-trapi:batch_edge_query Node Update Start +3m
  bte:biothings-explorer-trapi:nodeUpdateHandler Getting equivalent IDs... +3m
  bte:biothings-explorer-trapi:nodeUpdateHandler curies: {"SmallMolecule":["PUBCHEM_COMPOUND:12795894"]} +0ms
  bte:biothings-explorer-trapi:nodeUpdateHandler Got Edge Equivalent IDs successfully. +247ms
  bte:biothings-explorer-trapi:batch_edge_query Node Update Success +247ms
  bte:biothings-explorer-trapi:batch_edge_query Start to convert qEdges into APIEdges.... +1ms
  bte:biothings-explorer-trapi:qedge2btedge Input node is n0 +249ms
  bte:biothings-explorer-trapi:qedge2btedge Output node is n1 +0ms
  bte:biothings-explorer-trapi:qedge2btedge KG Filters: {
  bte:biothings-explorer-trapi:qedge2btedge   "input_type": [
  bte:biothings-explorer-trapi:qedge2btedge     "SmallMolecule"
  bte:biothings-explorer-trapi:qedge2btedge   ],
  bte:biothings-explorer-trapi:qedge2btedge   "output_type": [
  bte:biothings-explorer-trapi:qedge2btedge     "Gene"
  bte:biothings-explorer-trapi:qedge2btedge   ]
  bte:biothings-explorer-trapi:qedge2btedge } +0ms
  bte:biothings-explorer-trapi:qedge2btedge 1 APIs being used: ["Biothings Therapeutic Target Database API"] +0ms
  bte:biothings-explorer-trapi:qedge2btedge 4 SmartAPI edges are retrieved.... +0ms
  bte:biothings-explorer-trapi:qedge2btedge Input prefix: PUBCHEM.COMPOUND +0ms
  bte:biothings-explorer-trapi:qedge2btedge Input prefix: PUBCHEM.COMPOUND +0ms
  bte:biothings-explorer-trapi:qedge2btedge Input prefix: TTD.DRUG +1ms
  bte:biothings-explorer-trapi:qedge2btedge Input prefix: TTD.DRUG +0ms
  bte:biothings-explorer-trapi:qedge2btedge 0 metaKG are created.... +0ms
  bte:biothings-explorer-trapi:qedge2btedge 0 metaKG are created.... +0ms
  bte:biothings-explorer-trapi:qedge2btedge 0 metaKG are created.... +0ms
  bte:biothings-explorer-trapi:qedge2btedge 0 metaKG are created.... +0ms
  bte:biothings-explorer-trapi:qedge2btedge No metaKG found for this query batch. +0ms
  bte:biothings-explorer-trapi:batch_edge_query qEdges are successfully converted into 0 APIEdges.... +1ms
  bte:biothings-explorer-trapi:edge-manager (X) Terminating..."e01" got 0 records. +249ms
  bte:biothings-explorer-trapi:worker1 Worker thread 1 completed task. +2s

I suspect this might be because smartapi.yaml was not written properly. In the x-bte-kgs-operations, I wrote:

body:

{"q": {{ queryInputs | replPrefix('association.predicate:"biolink:interacts_with"
             AND (_exists_:subject.name) AND (_exists_:object.name) AND subject.pubchem_compound') | dump}},
             "scopes": []
            }
Full code:
    pubchem_interacts_with_uniprotkb:
    ## url: https://biothings.transltr.io/ttd/query?q=_exists_:subject.pubchem_compound%20AND%20association.predicate:%22biolink:interacts_with%22%20AND%20_exists_:object.uniprotkb%20AND%20_exists_:subject.name%20AND%20_exists_:object.name
    ## 38,914 records
    ## exists: subject.pubchem_compound, association.predicate.biolink:interacts_with, object.uniprotkb, subject.name, object.name
      - supportBatch: true
        useTemplating: true ## flag to say templating is being used below
        inputs:
          - id: "PUBCHEM.COMPOUND"
            semantic: SmallMolecule
        requestBodyType: object
        requestBody: ## no prefix
          body: >-
            {"q": {{ queryInputs | replPrefix('association.predicate:"biolink:interacts_with"
             AND (_exists_:subject.name) AND (_exists_:object.name) AND subject.pubchem_compound') | dump}},
             "scopes": []
            }
        outputs:
          - id: UniProtKB
            semantic: Gene
        parameters:
        ## not including these fields due to data-processing / biolink-modeling issues
        ## - association.ic50, ec50, ki
        ## - association.trial_status
        ## - object.bioclass
        ## - object.target_type
          fields: >-
            object.uniprotkb,
            object.name,
            subject.name
          size: 1000
        predicate: interacts_with
        source: "infores:ttd"
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/uniprotkb-object"
        # testExamples:
        #   - qInput: "PUBCHEM_COMPOUND:12795894"     ## Procyanidin B-2 3,3'-di-O-gallate
        #     oneOutput: "UniProtKB:Q14534"           ## Squalene monooxygenase (SQLE)

Specifically, do you mind checking the body for me? I added mappings so that both the drugs and targets have names, and I want to retrieve results only when both the drug and the target have names. I am not sure if AND (_exists_:subject.name) AND (_exists_:object.name) is correctly written.
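
As a rough way to reason about the body above, the sketch below mimics what the template is expected to produce for one input. It assumes `replPrefix` swaps each CURIE's prefix for the given string and that `dump` serialises the resulting list as JSON; `repl_prefix` and `build_body` are illustrative stand-ins written for this comment, not BTE's filters.

```python
import json

# The filter text from the requestBody template, written on one line.
FILTER = (
    'association.predicate:"biolink:interacts_with" '
    "AND (_exists_:subject.name) AND (_exists_:object.name) "
    "AND subject.pubchem_compound"
)


def repl_prefix(curies, replacement):
    """Stand-in for `replPrefix`: replace everything before the first ':'."""
    return [f"{replacement}:{curie.split(':', 1)[1]}" for curie in curies]


def build_body(curies):
    """Approximate the rendered POST body; the template's `dump` step is
    handled by json.dumps at print time."""
    return {"q": repl_prefix(curies, FILTER), "scopes": []}


print(json.dumps(build_body(["PUBCHEM.COMPOUND:12795894"]), indent=2))
```

With empty scopes, each entry in `q` is a full query string, so the existence checks and the ID lookup travel together in one term.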

Then for the x-bte-response-mapping: I wrote:

    pubchem-subject:
      "PUBCHEM.COMPOUND": subject.pubchem_compound   # no prefix
      input_name: object.name
      output_name: subject.name

    uniprotkb-object:
      UniProtKB: object.uniprotkb
      input_name: subject.name
      output_name: object.name

I can slack you tomorrow as well! I feel bad for bothering you during off-work hours, so I am posting the issue here for now! Thank you!

@lucyzhang95
Author

@colleenXu

I know you are pretty busy resolving the translator issues with the code freeze! So, no rush to get to this issue!
I did some troubleshooting and found interesting log messages regarding the comment I posted above!

The body part seems to have no issue as I can retrieve the result directly from TTD translator API with https://biothings.transltr.io/ttd/query?q=association.predicate:%22biolink:interacts_with%22%20AND%20(_exists_:subject.name)%20AND%20(_exists_:object.name)%20AND%20subject.pubchem_compound:12795894
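
For anyone repeating this check, the URL above can be rebuilt from the plain query string with Python's standard library. This is just a convenience sketch; `build_query_url` is a hypothetical helper name, and nothing is sent over the network.

```python
from urllib.parse import quote

BASE_URL = "https://biothings.transltr.io/ttd/query"


def build_query_url(query):
    """Percent-encode spaces and quotes; keep ':' and parentheses readable."""
    return f"{BASE_URL}?q={quote(query, safe=':()')}"


query = (
    'association.predicate:"biolink:interacts_with" '
    "AND (_exists_:subject.name) AND (_exists_:object.name) "
    "AND subject.pubchem_compound:12795894"
)
print(build_query_url(query))
```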

The last several log messages are as follows:

{
            "timestamp": "2023-08-30T23:31:26.234Z",
            "level": "DEBUG",
            "message": "BTE found 4 metaKG edges corresponding to e01. These metaKG edges comes from 1 unique APIs. They are Biothings Therapeutic Target Database API",
            "code": null
        },
        {
            "timestamp": "2023-08-30T23:31:26.235Z",
            "level": "WARNING",
            "message": "BTE didn't find any metaKG for this batch. Your query terminates.",
            "code": null
        },
        {
            "timestamp": "2023-08-30T23:31:26.235Z",
            "level": "INFO",
            "message": "e01 execution: 0 queries (0 success/0 fail) and (0) cached qEdges return (0) records",
            "code": null
        },
        {
            "timestamp": "2023-08-30T23:31:26.235Z",
            "level": "WARNING",
            "message": "qEdge (e01) got 0 records. Your query terminates.",
            "code": null
        }
    ]
}

The query stopped after "BTE didn't find any metaKG for this batch. Your query terminates." and I am not sure what's the cause! Could you do a test locally when you have time? Thank you!

@colleenXu

colleenXu commented Sep 8, 2023

@lucyzhang95

No problem at all; in fact, I'm sorry for being so late in my reply >.<.

I think you've done great work, and we're super close to the finish line!

On the issue you identified

I think the issue is your queries, specifically the incorrect prefix for the pubchem compound IDs. It looks like you're using PUBCHEM_COMPOUND:12795894. But the correct format (for biolink-model) is with a period, not a _: PUBCHEM.COMPOUND:12795894.

I tested the operations that you may have been reviewing (pubchem_interacts_with_uniprotkb and pubchem_interacts_with_ttd_target_id?), and they behaved fine. I suspect they're the operations you had issues testing because the testExamples have the "PUBCHEM_COMPOUND" prefix.
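
A small pre-flight check can catch this kind of prefix mismatch before a query is sent. The sketch below is a hypothetical helper, not part of BTE; the set of supported prefixes is taken from the relationship tables earlier in this thread.

```python
def check_curie_prefix(curie, supported_prefixes):
    """Return (ok, message) after comparing a CURIE's prefix to the supported ones."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        return False, f"{curie!r} is not in prefix:id form"
    if prefix in supported_prefixes:
        return True, "ok"
    # Suggest a supported prefix that differs only by '.' versus '_'.
    for candidate in supported_prefixes:
        if candidate.replace(".", "_") == prefix.replace(".", "_"):
            return False, f"unknown prefix {prefix!r}; did you mean {candidate}:{local_id}?"
    return False, f"unknown prefix {prefix!r}"


SUPPORTED = {"PUBCHEM.COMPOUND", "TTD.DRUG", "UniProtKB", "TTD.TARGET"}

print(check_curie_prefix("PUBCHEM_COMPOUND:12795894", SUPPORTED))
print(check_curie_prefix("PUBCHEM.COMPOUND:12795894", SUPPORTED))
```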


I have two suggested "fixes"


  • Both ttd_drug_id_treats_icd11 and its rev operation refer to subject.pubchem but that field doesn't exist. Those NOT _exists_ statements don't work right now. I imagine you meant the field name you fixed (subject.pubchem_compound)?
  • For _exists_:subject.name and _exists_:object.name in the requestBody…

Plus minor things I noticed in your comments:

  • Lines 1660-1661: labels for testExamples seem switched (Anakinra is the drug?)
  • Line 738: the end of the url is odd (probably remove %20##%20%20TODO:%201604%20records?)
  • Line 1476: the url has an error (the association.predicate part isn't set correctly to "biolink:interacts_with")
  • I think you have some comments that are "TODO". Did you want to complete these tasks and then remove the comments?

@colleenXu

colleenXu commented Sep 11, 2023

And a very minor thing I noticed: in the description, we may want to change TherapeuticTargetDatabase -> Therapeutic Target Database (TTD)

@colleenXu

colleenXu commented Oct 19, 2023

After discussion with Lucy, I've taken responsibility for this issue.

The TTD SmartAPI yaml was put into the translator-api-registry repo and adjusted by following my two earlier posts above.

Everything was tested locally and worked. Then I registered the API in SmartAPI Registry and made the PR to add this API to BTE...and I tested the PR locally and it worked as well.

To test for yourself

Send a POST request to the API-specific endpoint (BioThings TTD only), like http://localhost:3000/v1/smartapi/e481efd21f8e8c1deac05662439c2294/query

Put this in the request body. It queries with the gene Human immunodeficiency virus Envelope messenger RNA (HIV env mRNA):

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["TTD.TARGET:T65414"],
                    "categories": ["biolink:Gene"]
                },
                "n1": {
                    "categories": ["biolink:SmallMolecule"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1"
                }
            }
        }
    }
}

You should get a response with this edge to TTD.DRUG:D0L3MP (VRX496):

                "3b4f17f07fc92cd5f0c3056aecda8aa0": {
                    "predicate": "biolink:interacts_with",
                    "subject": "TTD.TARGET:T65414",
                    "object": "TTD.DRUG:D0L3MP",
                    "attributes": [],
                    "sources": [
                        {
                            "resource_id": "infores:ttd",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:biothings-ttd",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:ttd"
                            ]
                        },
                        {
                            "resource_id": "infores:service-provider-trapi",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:biothings-ttd"
                            ]
                        }
                    ]
                }
            }
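
The manual test above can also be scripted. This is a minimal sketch using only Python's standard library, not the way the test was actually run. The function names (`build_query`, `post_query`, `find_edges`) are hypothetical, the endpoint is the local URL from the post, and the lookup path `message.knowledge_graph.edges` is assumed from the usual TRAPI response layout.

```python
import json
import urllib.request

# Local BTE endpoint from the post above; adjust host and port for your setup.
BTE_URL = "http://localhost:3000/v1/smartapi/e481efd21f8e8c1deac05662439c2294/query"


def build_query(target_id="TTD.TARGET:T65414"):
    """Build the one-hop query shown above (Gene -> SmallMolecule)."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": [target_id], "categories": ["biolink:Gene"]},
                    "n1": {"categories": ["biolink:SmallMolecule"]},
                },
                "edges": {"e01": {"subject": "n0", "object": "n1"}},
            }
        }
    }


def post_query(url=BTE_URL, query=None, timeout=60):
    """POST the query as JSON and return the parsed response.

    Requires a running BTE instance at `url`.
    """
    body = json.dumps(query or build_query()).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def find_edges(trapi_response, subject, obj):
    """Return knowledge-graph edges with the given subject and object IDs.

    Assumes edges live under message.knowledge_graph.edges, keyed by edge ID.
    """
    edges = (
        trapi_response.get("message", {})
        .get("knowledge_graph", {})
        .get("edges", {})
    )
    return [
        edge
        for edge in edges.values()
        if edge.get("subject") == subject and edge.get("object") == obj
    ]
```

With a local BTE instance running, `find_edges(post_query(), "TTD.TARGET:T65414", "TTD.DRUG:D0L3MP")` should return the edge shown above if the API is wired up correctly.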

@colleenXu

Now being addressed by a different commit biothings/bte-server@58177d3. This is now deployed on dev/CI instances.

See Jackson's post here

colleenXu added the On CI, x-bte, and data source labels Nov 1, 2023
@colleenXu

Note:

colleenXu added the On Test label and removed the On CI label Dec 22, 2023

colleenXu commented Feb 21, 2024

Closing this issue since the changes have been deployed to Prod with the Feb 2024 release.

I've confirmed that I can query BioThings TTD through BTE prod https://bte.transltr.io/v1/team/Service Provider/query with the example in #123 (comment) and get the expected response.
