API Therapeutic Target Database (TTD) Deployment #123

lucyzhang95 · 2023-06-01T20:11:26Z

Github URL: https://github.com/lucyzhang95/BioThings_TTD_Dataplugin
Git Branch/Commit: master fab9df6
No. Documents: 889851
Structure of documents: 3 records

{
  "_id": "D07OAC_associated_with_T71390",
  "association": {
    "predicate": "biolink:associated_with"
  },
  "object": {
    "id": "T71390",
    "type": "biolink:Protein",
    "target_id": "T71390",
    "uniprot": "S5A2_HUMAN",
    "target_type": "successful target",
    "bioclass": "CH-CH donor oxidoreductase"
  },
  "subject": {
    "id": "D07OAC",
    "trial_status": "investigative",
    "type": "biolink:Drug",
    "moa": "inhibitor"
  }
}

{
  "_id": "D04SKM_treats_2A85",
  "association": {
    "predicate": "biolink:treats",
    "clinical_trial": [
      {
        "status": "phase 1",
        "disease": "Acute lymphoblastic leukaemia"
      }
    ]
  },
  "object": {
    "id": "2A85",
    "icd11": "2A85",
    "name": "Acute lymphoblastic leukaemia",
    "type": "biolink:Disease"
  },
  "subject": {
    "id": "D04SKM",
    "name": "CART-10 cells",
    "type": "biolink:Drug"
  }
}

{
  "_id": "T67162_target_for_BD10-BD1Z",
  "association": {
    "predicate": "biolink:target_for",
    "clinical_trial": [
      {
        "status": "Approved",
        "disease": "Heart failure"
      }
    ]
  },
  "object": {
    "id": "BD10-BD1Z",
    "icd11": "BD10-BD1Z",
    "name": "Heart failure",
    "type": "biolink:Disease"
  },
  "subject": {
    "id": "T67162",
    "name": "Dopamine D2 receptor (D2R)",
    "type": "biolink:Protein",
    "target_id": "T67162",
    "uniprot": "DRD2_HUMAN",
    "target_type": "successful target",
    "bioclass": "GPCR rhodopsin"
  }
}

erikyao · 2023-06-01T21:24:24Z

1. Problem with `object.id` and `object.icd11`

The following 1 document has object.id and object.icd11 as text, not a keyword.

{
  "_id": "D08WUK_treats_DF-1 vaccine",
  "association": {
    "predicate": "biolink:treats",
    "clinical_trial": [
      {
        "status": "phase 1b",
        "disease": "Middle East Respiratory Syndrome (MERS)"
      }
    ]
  },
  "object": {
    "id": "DF-1 vaccine",
    "icd11": "DF-1 vaccine",
    "name": "Middle East Respiratory Syndrome (MERS)",
    "type": "biolink:Disease"
  },
  "subject": {
    "id": "D08WUK",
    "name": "MVA-MERS-S",
    "type": "biolink:Drug"
  }
}

Solution: It could be an error in the source file. If not fixable, ok to ignore for now.

P.S. These two fields should still be indexed as keyword in ES.

2. Problem with `object.uniprot`

868 documents have multiple IDs in object.uniprot, as a whole string. E.g.

{
  "_id": "D04ZCZ_associated_with_T55285",
  "association": {
    "predicate": "biolink:associated_with"
  },
  "object": {
    "id": "T55285",
    "type": "biolink:Protein",
    "target_id": "T55285",
    "uniprot": "GLRA1_HUMAN; GLRA2_HUMAN; GLRA3_HUMAN; GLRA4_HUMAN; GLRB_HUMAN",
    "target_type": "successful target",
    "bioclass": "Neurotransmitter receptor"
  },
  "subject": {
    "id": "D04ZCZ",
    "trial_status": "terminated",
    "type": "biolink:Drug",
    "moa": "inhibitor"
  }
}

Solution: convert such a string into a list of strings.
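
A minimal sketch of that conversion, assuming the IDs are always separated by a semicolon as in the example above. `split_uniprot` is a hypothetical helper name, not a function taken from the plugin.

```python
def split_uniprot(value):
    """Split a ';'-delimited UniProt string into a list of labels.

    Assumption: ';' is the only delimiter used in the source file.
    """
    return [part.strip() for part in value.split(";") if part.strip()]


labels = split_uniprot(
    "GLRA1_HUMAN; GLRA2_HUMAN; GLRA3_HUMAN; GLRA4_HUMAN; GLRB_HUMAN"
)
print(labels)
```

A single label comes back as a one-element list, so the field has the same type in every document.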

3. Problem with `_id`, `object.name` and `object.symbol`

~12 documents have strange _id, object.name and object.symbol combination. E.g.

{
  "_id": "BM000024_TYMS-ER), thymidylate synthase 1494 (TYMS- 1494), dihydropyrimidine dehydrogenase (DPYD), methylenetetrahydrofolate reducta se (MTHFR), mutL homolog 1 (MLH1), UDP glucuronyltransferase (UGT1A1), ATP-binding cassette group B gene 1 (ABCB1), x-ray cross-complementing group 1 (XRCC1), g lutathione-S-transferase P1 (GSTP1), excision repair cross-complementing gene 2 (ERCC2_biomarker_for_Colorectal_cancer",
  "association": {
    "predicate": "biolink:biomarker_for"
  },
  "object": {
    "id": "BM000024",
    "type": "biolink:Biomarker",
    "name": "thymidylate synthase-enhancer region ",
    "symbol": "TYMS-ER), thymidylate synthase 1494 (TYMS- 1494), dihydropyrimidine dehydrogenase (DPYD), methylenetetrahydrofolate reducta se (MTHFR), mutL homolog 1 (MLH1), UDP glucuronyltransferase (UGT1A1), ATP-binding cassette group B gene 1 (ABCB1), x-ray cross-complementing group 1 (XRCC1), g lutathione-S-transferase P1 (GSTP1), excision repair cross-complementing gene 2 (ERCC2"
  },
  "subject": {
    "id": "2B91.Z",
    "name": "Colorectal cancer",
    "type": "biolink:Disease",
    "icd11": "2B91.Z"
  }
}

Solution: It looks like some records have a complicated string in a single cell, which the parser misread. Such strings may be double-quoted in the source file. The parser should be updated.

P.S. It's also possible that those records have missing or NA values, causing the parser to read the wrong columns.

4. Problem with `subject.uniprot`

Same problem as with object.uniprot; 372 documents are affected. E.g.

{
  "_id": "T01447_target_for_2B6B",
  "association": {
    "predicate": "biolink:target_for",
    "clinical_trial": [
      {
        "status": "Phase 3",
        "disease": "Nasopharyngeal cancer"
      }
    ]
  },
  "object": {
    "id": "2B6B",
    "icd11": "2B6B",
    "name": "Nasopharyngeal cancer",
    "type": "biolink:Disease"
  },
  "subject": {
    "id": "T01447",
    "name": "NEDD8-activating enzyme (NAE)",
    "type": "biolink:Protein",
    "target_id": "T01447",
    "uniprot": "ULA1_HUMAN; UBA3_HUMAN",
    "target_type": "clinical trial target"
  }
}

Solution: convert such a string into a list of strings.

lucyzhang95 · 2023-06-03T00:47:36Z

Fixed problems 2, 3, and 4.

Updated Git Branch/Commit: master bd9d85c

The current parser outputs:

Problem 2. `object.uniprot`

{
        "_id": "D04ZCZ_associated_with_T55285",
        "association": {
            "predicate": "biolink:associated_with"
        },
        "object": {
            "id": "T55285",
            "type": "biolink:Protein",
            "target_id": "T55285",
            "uniprot": [
                "GLRA1_HUMAN",
                "GLRA2_HUMAN",
                "GLRA3_HUMAN",
                "GLRA4_HUMAN",
                "GLRB_HUMAN"
            ],
            "target_type": "successful target",
            "bioclass": "Neurotransmitter receptor"
        },
        "subject": {
            "id": "D04ZCZ",
            "trial_status": "terminated",
            "type": "biolink:Drug",
            "moa": "inhibitor"
        }
    }

Problem 3. `_id, object.name and object.symbol`

{
        "_id": "BM000024_biomarker_for_Colorectal_cancer",
        "association": {
            "predicate": "biolink:biomarker_for"
        },
        "object": {
            "name": [
                "thymidylate synthase-enhancer region",
                "thymidylate synthase 1494",
                "dihydropyrimidine dehydrogenase",
                "methylenetetrahydrofolate reducta se",
                "mutL homolog 1",
                "UDP glucuronyltransferase",
                "ATP-binding cassette group B gene 1",
                "x-ray cross-complementing group 1",
                "g lutathione-S-transferase P1",
                "excision repair cross-complementing gene 2"
            ],
            "symbol": [
                "TYMS-ER",
                "TYMS- 1494",
                "DPYD",
                "MTHFR",
                "MLH1",
                "UGT1A1",
                "ABCB1",
                "XRCC1",
                "GSTP1",
                "ERCC2"
            ]
        },
        "subject": {
            "id": "2B91.Z",
            "name": "Colorectal cancer",
            "type": "biolink:Disease",
            "icd11": "2B91.Z"
        }
    }
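
One way to get from a combined cell to the parallel name and symbol lists shown above is a regular expression over `name (symbol)` pairs. This is only a sketch: it assumes the source cell has the form `name (SYMBOL), name (SYMBOL), ...` and that names contain no commas or parentheses. `split_biomarker_cell` is a hypothetical helper, not the plugin's actual code.

```python
import re

# One "name (symbol)" pair; names are assumed to contain no commas or parentheses.
PAIR = re.compile(r"([^,()]+?)\s*\(([^()]+)\)")


def split_biomarker_cell(cell):
    """Return (names, symbols) parsed from 'name (SYM), name (SYM), ...'."""
    pairs = PAIR.findall(cell)
    names = [name.strip() for name, _ in pairs]
    symbols = [symbol.strip() for _, symbol in pairs]
    return names, symbols


cell = (
    "thymidylate synthase-enhancer region (TYMS-ER), "
    "thymidylate synthase 1494 (TYMS- 1494), "
    "dihydropyrimidine dehydrogenase (DPYD)"
)
names, symbols = split_biomarker_cell(cell)
```

Stray spaces inside the source text (such as `TYMS- 1494`) are kept as they are, since correcting them would need information the source file does not provide.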

Problem 4. `subject.uniprot`

{
        "_id": "T01447_target_for_2B6B",
        "association": {
            "predicate": "biolink:target_for",
            "clinical_trial": [
                {
                    "status": "Phase 3",
                    "disease": "Nasopharyngeal cancer"
                }
            ]
        },
        "object": {
            "id": "2B6B",
            "icd11": "2B6B",
            "name": "Nasopharyngeal cancer",
            "type": "biolink:Disease"
        },
        "subject": {
            "id": "T01447",
            "name": "NEDD8-activating enzyme (NAE)",
            "type": "biolink:Protein",
            "target_id": "T01447",
            "uniprot": [
                "ULA1_HUMAN",
                "UBA3_HUMAN"
            ],
            "target_type": "clinical trial target"
        }
    }

colleenXu · 2023-06-05T16:15:34Z

Err...throwing some ideas out here (also CC @andrewsu):

IDs:
- uniprot here seems to be the Uniprot label rather than an actual ID
- It would be nice if subject.id was a curie ("prefix:ID") and there was another field named with the prefix (like chembl_target) where the value is just the ID (a1234).
- looks like sometimes there are no IDs to the object or subject? see problem 3 section of the above post...
- can we map IDs or names / labels to other ID namespaces and include those other IDs (like uniprot IDs, NCBIGene, UMLS, ENSEMBL?)
for the problem 2 section of the above post:
- some info in the subject seems like it'd go better in the association section of the record? Like subject.trial_status, subject.moa (not sure about object.target_type)?
I'm not sure that subject.type, object.type, association.predicate need to be biolink-model categories/predicates. It may be better to leave them as whatever they're called in the original data source. Then x-bte annotation can include the assignments of biolink-model terms...
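
The suggestion above about moving subject.trial_status and subject.moa into the association section could be sketched as follows. `move_to_association` is a hypothetical helper, and the field list is only the two fields named in this comment.

```python
def move_to_association(doc, fields=("trial_status", "moa")):
    """Return a copy of doc with the given subject fields moved to association."""
    subject = dict(doc.get("subject", {}))
    association = dict(doc.get("association", {}))
    for field in fields:
        if field in subject:
            association[field] = subject.pop(field)
    return {**doc, "subject": subject, "association": association}


doc = {
    "_id": "D04ZCZ_associated_with_T55285",
    "association": {"predicate": "biolink:associated_with"},
    "subject": {
        "id": "D04ZCZ",
        "trial_status": "terminated",
        "type": "biolink:Drug",
        "moa": "inhibitor",
    },
}
moved = move_to_association(doc)
```

The input document is left unmodified, which makes it easier to compare records before and after the change.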

lucyzhang95 · 2023-06-05T18:44:21Z

Thank you so much for your comments, Colleen!
I have several questions if you don't mind!

IDs:
- The uniprot label is from the original source. They didn't provide an actual ID. What should we do with it?
- Could you elaborate on the curie part? Do we have any examples of it so that I can refer to and adjust the parser?
- Already fixed problem 3. Thank you for pointing it out!
- I guess potentially we can? The original source doesn't have the actual uniprot ID, NCBIGene, UMLS, or ENSEMBL. It does provide the uniprot label and the target protein sequences. If we can access the uniprot database, we could match the uniprot label and the protein sequence to the actual uniprot ID. However, it might be hard to match to NCBIGene or ENSEMBL, since reverse-translating a protein sequence to a DNA sequence is ambiguous because of codon degeneracy. People still do that all the time though. What would you suggest?
I can absolutely put the trial_status and moa in the association section of the record!
The original source doesn't provide any relationship between the two entities. Chunlei and I worked on defining the relationship together using biolink model. Here is a screenshot of the original resource for drug and target. What would you suggest to do in this case?

Thanks again for your suggestions! 😊

erikyao · 2023-06-05T18:59:36Z

Hi @lucyzhang95,

uniprot here seems to be the Uniprot label rather than an actual ID

I think Colleen meant that that uniprot field contains only labels instead of IDs. Typically we expect IDs. E.g. ULA1_HUMAN has ID Q13564 (link).

We can call some other API to get uniprot IDs from labels, if needed.

It would be nice if subject.id was a curie ("prefix:ID") and there was another field named with the prefix (like chembl_target) where the value is just the ID (a1234).

It's common practice for us to have something like:

{
   "id": "uniprot:Q13564",
   "uniprot": "Q13564"
}

where "uniprot:Q13564" is a CURIE (or Compact URI). It's just a format of IDs.

However, I have no idea if we have a CURIE standard for TTD IDs like T55285, or icd11 IDs like 2B6B... (@colleenXu can you double-check? Thanks!)
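
A small sketch of the id/prefix-field convention described above. `with_curie` and `split_curie` are hypothetical helper names; the only assumption is that a CURIE is the prefix and the local ID joined by a colon.

```python
def with_curie(prefix, local_id):
    """Build the CURIE `id` plus a prefix-named field holding the bare ID."""
    return {"id": f"{prefix}:{local_id}", prefix: local_id}


def split_curie(curie):
    """Split 'prefix:ID' into (prefix, ID) at the first colon."""
    prefix, _, local_id = curie.partition(":")
    return prefix, local_id


entity = with_curie("uniprot", "Q13564")
print(entity)
```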

erikyao · 2023-06-05T19:11:03Z

@colleenXu @lucyzhang95 I found that we do have CURIE for TTD, see https://bioregistry.io/registry/ttd.target

lucyzhang95 · 2023-06-25T03:09:43Z

@erikyao
Hey Yao, the parser is ready for deployment on Monday (June 26th)! I mapped all uniprot labels to the actual uniprot IDs and they are now included in the _id. For those that do not have a uniprot entry, I kept the original internal TTD ids.

I also mapped their internal drug id to either pubchem_cid or chebi_id, which are also included in the _id.

Besides, I double-checked the _id for weird formatting to see if there are whitespaces, slashes, and backslashes. The current _ids are free of all of those. Please let me know if you still find other weird-formatted _ids or fields!
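
A check along these lines could be sketched as below. It only covers the three character classes named above (whitespace, slashes, backslashes); `find_bad_ids` is a hypothetical helper, not part of the plugin.

```python
import re

# Whitespace, forward slash, or backslash anywhere in the _id.
BAD_ID_CHARS = re.compile(r"[\s/\\]")


def find_bad_ids(ids):
    """Return the _ids that contain whitespace, '/' or a backslash."""
    return [doc_id for doc_id in ids if BAD_ID_CHARS.search(doc_id)]


ids = [
    "D08WUK_treats_1D64",
    "D08WUK_treats_DF-1 vaccine",
    "W8TNQ9_target_for_1B5Y",
]
print(find_bad_ids(ids))
```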

Thanks again for helping me out! I really appreciate it!

Updated info:
Github URL: https://github.com/lucyzhang95/BioThings_TTD_Dataplugin/blob/master/TTD_parser.py
Git Branch/Commit: master cc87ccb
No. Documents: 853055
Structure of documents: 1 record

{
    "_id":"D08WUK_treats_1D64",
    "association":{
        "predicate":"biolink:treats",
        "clinical_trial":[
            {
                "status":"phase 1b",
                "disease":"Middle East Respiratory Syndrome (MERS)"
            }
        ]
    },
    "object":{
        "id":"1D64",
        "icd11":"1D64",
        "name":"Middle East Respiratory Syndrome (MERS)",
        "type":"biolink:Disease"
    },
    "subject":{
        "id":"ttd_drug_id:D08WUK",
        "type":"biolink:Drug"
    }
}

{
    "_id":"W8TNQ9_target_for_1B5Y",
    "association":{
        "predicate":"biolink:target_for",
        "clinical_trial":[
            {
                "status":"Phase 2",
                "disease":"Staphylococcal/streptococcal disease"
            }
        ]
    },
    "object":{
        "id":"1B5Y",
        "icd11":"1B5Y",
        "name":"Staphylococcal/streptococcal disease",
        "type":"biolink:Disease"
    },
    "subject":{
        "id":"uniprot:W8TNQ9",
        "ttd_target_id":"T61547",
        "uniprot":[
            "W8TNQ9"
        ],
        "target_type":"clinical trial target",
        "name":"Staphylococcus Manganese transporter C (Stap-coc MntC)",
        "type":"biolink:Protein"
    }
}
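
The records above suggest a simple rule: use the UniProt accession as the subject ID when a mapping exists, otherwise keep the internal TTD ID. A hedged sketch of that rule follows. `build_target` and the `label_to_accession` lookup are hypothetical, the `ttd_target_id:` prefix for unmapped targets is assumed by analogy with `ttd_drug_id:`, and taking the first accession for the `id` is an arbitrary choice made for illustration.

```python
def build_target(ttd_id, labels, label_to_accession):
    """Build a target entity, preferring UniProt accessions over the TTD ID."""
    accessions = [
        label_to_accession[label]
        for label in labels
        if label in label_to_accession
    ]
    target = {"ttd_target_id": ttd_id, "type": "biolink:Protein"}
    if accessions:
        target["id"] = f"uniprot:{accessions[0]}"
        target["uniprot"] = accessions
    else:
        # No mapping found: fall back to the internal TTD ID (assumed prefix).
        target["id"] = f"ttd_target_id:{ttd_id}"
    return target


lookup = {"ULA1_HUMAN": "Q13564"}
mapped = build_target("T01447", ["ULA1_HUMAN", "UBA3_HUMAN"], lookup)
unmapped = build_target("T55285", ["GLRA1_HUMAN"], lookup)
```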

colleenXu · 2023-06-26T19:03:52Z

Sorry for not responding >.<.

Part 1

Biolink-model doesn't seem to include any ttd ID-namespaces (there's a target one and a drug one?) or ICD11. So Translator Node Norm likely doesn't either. BTE uses Node Norm to find equivalent IDs and human-readable labels, aka what IDs are actually the same "node"/entity.

EDIT: I know an effort was made for ttd.target -> uniprot IDs and for ttd.drug -> pubchem_cid and chembl_id. How many records have unmapped entities (only ID for subject/object is ttd.target or ttd.drug)?

Have mapping efforts been tried for icd11 IDs? (a Disease ID-namespace in biolink-model, like MONDO or DOID?)

Part 2

Could you make a table / list of the MetaTriples in this KP: unique combos of subject ID-prefix / subject-type / predicate / object ID-prefix / object-type? This is needed for the x-bte annotation
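
One way to collect those combinations is to count a five-part key per record. This is a sketch that assumes every record has `subject.id`, `subject.type`, `association.predicate`, `object.id` and `object.type`, and that a prefix is whatever precedes the first colon in an ID. `id_prefix` and `meta_triples` are hypothetical names.

```python
from collections import Counter


def id_prefix(entity_id):
    """Return the CURIE prefix, or None when the ID has no prefix."""
    prefix, sep, _ = entity_id.partition(":")
    return prefix if sep else None


def meta_triples(docs):
    """Count unique (subject prefix, subject type, predicate, object prefix, object type)."""
    counts = Counter()
    for doc in docs:
        subj, obj = doc["subject"], doc["object"]
        key = (
            id_prefix(subj["id"]),
            subj["type"],
            doc["association"]["predicate"],
            id_prefix(obj["id"]),
            obj["type"],
        )
        counts[key] += 1
    return counts


docs = [
    {
        "association": {"predicate": "biolink:treats"},
        "subject": {"id": "ttd_drug_id:D08WUK", "type": "biolink:Drug"},
        "object": {"id": "1D64", "type": "biolink:Disease"},
    },
    {
        "association": {"predicate": "biolink:target_for"},
        "subject": {"id": "uniprot:W8TNQ9", "type": "biolink:Protein"},
        "object": {"id": "1B5Y", "type": "biolink:Disease"},
    },
]
triples = meta_triples(docs)
```

IDs without a prefix show up with `None` in the key, which makes them easy to spot in the resulting table.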

colleenXu · 2023-06-27T01:33:38Z

Note: updated my comment after noticing Lucy has done ttd.drug mapping...

lucyzhang95 · 2023-06-27T20:49:23Z

@colleenXu
Sorry for the late reply! I was having a bad headache yesterday!

Part 1:

I checked the unmapped IDs that use only the internal ttd.target and ttd.drug IDs. Of the 853,055 records, 27,955 have an _id that is unmapped for either ttd.target or ttd.drug. There are 248,819 entities (subject id or object id) that are unmapped and carry only internal ttd.target or ttd.drug IDs (with overlap).
I did a quick lookup for mapping icd11 IDs to UMLS or Mondo. Both only map the icd10 disease ontology, not icd11. Most of the data in TTD only has icd11 info; 1 biomarker source has additional icd10 and icd9 info, but not every disease has a corresponding icd10. It is going to be a little complicated to match icd11 to UMLS or Mondo, but it is doable. We can first map icd11 to icd10 and then map icd10 to Mondo/UMLS.
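
The two-step mapping could be sketched as a simple composition of two lookup tables. Everything here is illustrative: `chain_mappings` is a hypothetical helper and the codes in the example are made-up placeholders, not real ICD or Mondo identifiers.

```python
def chain_mappings(icd11_to_icd10, icd10_to_mondo):
    """Compose icd11 -> icd10 and icd10 -> mondo into icd11 -> mondo.

    Returns (mapped, unmapped), where unmapped lists the icd11 codes whose
    icd10 code has no Mondo entry.
    """
    mapped, unmapped = {}, []
    for icd11, icd10 in icd11_to_icd10.items():
        mondo = icd10_to_mondo.get(icd10)
        if mondo is None:
            unmapped.append(icd11)
        else:
            mapped[icd11] = mondo
    return mapped, unmapped


# Placeholder codes only, for illustration.
icd11_to_icd10 = {"ICD11_A": "ICD10_A", "ICD11_B": "ICD10_B"}
icd10_to_mondo = {"ICD10_A": "MONDO:PLACEHOLDER_A"}
mapped, unmapped = chain_mappings(icd11_to_icd10, icd10_to_mondo)
```

Keeping the unmapped codes makes it possible to report how much of the data the indirect route actually covers.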

What would you suggest I do in this case?

Part 2:

Table for unique entities:

Entity	Number of entities
subject ID-prefix	850,543
subject ID with no prefix	2,512
subject-type	853,055
predicate	853,055
object ID-prefix	820,063
object ID with no prefix	32,992
object-type	853,055

The subject ID with no prefix consists of disease icd11
The object ID with no prefix consists of icd11 and biomarkers for disease

I would love to learn how to do x-bte annotations and bte registry from you when you have time! Is it a bad time to have a quick chat with you this week?

colleenXu · 2023-06-30T23:09:42Z

Overview

@lucyzhang95 and I have discussed the records / associations in this API and laid out what we need to know to write x-bte annotation (types of things, ID-prefixes and categories, relationships and predicates).

The next steps are:

Lucy will look into mapping / parsing issues noted in the "Types of Things" section
I'll write an example set of x-bte annotation, so Lucy has a starting point

Some notes:

Referencing https://github.com/lucyzhang95/BioThings_TTD_Dataplugin
Total number of records: 882,636

Types of Things (entities) in this resource

There is more work that can be done to map IDs or add fields describing the relationships...

Drug (chemicals) -> SmallMolecule

PUBCHEM.COMPOUND (some also have CHEBI)
TTD.DRUG

Target -> Protein (Gene)

UniProtKB
TTD.TARGET

Target - compound activity: Could use the paper's definitions of "what is the relationship, how strong is the relationship" to "map" the IC50/Ki/EC50 values to relationships? Because the paper defines these, maybe we'll have more success than we did with BindingDB here...

Disease

ICD11: we could do a mapping effort to get to ICD9 (partial support in Translator) or MONDO (has full support in Translator) -> EDIT: BASICALLY DONE, SEE BELOW

Biomarker

not going to write x-bte annotation for biomarker - disease relationships

some of these are Genes/Proteins. Could search by name -> get ID. Good IDs would be NCBIGene, ENSEMBL, HGNC, UniProtKB (stuff in biolink-model under Gene or Protein). Then they can go into the Protein / bucket.
there are many kinds of things here...so it's not easy to use as-is

what relationships (combos of subject-predicate-object) are in this resource

every row will become 2 x-bte operations (1 set, querying data from subject -> object and data from object -> subject)

drug - disease relationships

Subject-id	Subject-category	predicate	Object-id	Object-category
PUBCHEM.COMPOUND	SmallMolecule	treats	ICD11	Disease
PUBCHEM.COMPOUND	SmallMolecule	treats	MONDO	Disease
TTD.DRUG	SmallMolecule	treats	ICD11	Disease
TTD.DRUG	SmallMolecule	treats	MONDO	Disease

target - disease relationships

Subject-id	Subject-category	predicate	Object-id	Object-category
UniProtKB	Protein	target_for	ICD11	Disease
UniProtKB	Protein	target_for	MONDO	Disease
TTD.TARGET	Protein	target_for	ICD11	Disease
TTD.TARGET	Protein	target_for	MONDO	Disease

drug - target relationships

Subject-id	Subject-category	predicate	Object-id	Object-category
PUBCHEM.COMPOUND	SmallMolecule	interacts_with	UniProtKB	Protein
CHEBI	SmallMolecule	interacts_with	UniProtKB	Protein
TTD.DRUG	SmallMolecule	interacts_with	UniProtKB	Protein
PUBCHEM.COMPOUND	SmallMolecule	interacts_with	TTD.TARGET	Protein
CHEBI	SmallMolecule	interacts_with	TTD.TARGET	Protein
TTD.DRUG	SmallMolecule	interacts_with	TTD.TARGET	Protein

colleenXu · 2023-07-18T06:09:50Z

@lucyzhang95

Here's the example operations.

example operations

In the /query POST section:

      x-bte-kgs-operations:
      - $ref: '#/components/x-bte-kgs-operations/chebi_treats_icd11'
      - $ref: '#/components/x-bte-kgs-operations/chebi_treats_icd11-rev'
      - $ref: '#/components/x-bte-kgs-operations/pubchem_treats_icd11'
      ## need to change subject.pubchem.compound field name for this operation to work
      # - $ref: '#/components/x-bte-kgs-operations/pubchem_treats_icd11-rev'

in the components section, the operations and response-mapping

  x-bte-kgs-operations:
  ## fields not included due to data-processing / biolink-modeling issues:
  ## - association.clinical_trial.status: possible values (there may be more that I don't know)
  ##     'approved', 'phase 4', 'phase 3', 'phase 2', 'phase 1', 'terminated', 'withdrawn from market'...
    chebi_treats_icd11:
    ## 1788 records: https://pending.biothings.io/ttd/query?q=_exists_:subject.chebi%20AND%20association.predicate:%22biolink:treats%22%20AND%20_exists_:object.icd11
      - supportBatch: true
        useTemplating: true ## flag to say templating is being used below
        inputs:
          - id: CHEBI
            semantic: SmallMolecule
        requestBodyType: object
        requestBody:  ## no prefix
          body: >-
            {
              "q": [ {{ queryInputs | wrap( '["' , '","biolink:treats"]') }} ],
              "scopes": ["subject.chebi", "association.predicate"]
            }
        outputs:
          - id: ICD11
            semantic: Disease
        parameters:
        ## not including these fields due to data-processing / biolink-modeling issues
        ## - association.clinical_trial.status
          fields: >-
            object.icd11,
            object.name,
            subject.name
          size: 1000
        predicate: treats
        source: "infores:ttd"
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/icd11-obj"
        # testExamples:
        #   - qInput: "CHEBI:28001"            ## Vancomycin
        #     oneOutput: "ICD11:1A00-1A09"     ## Methicillin-resistant staphylococci infection
    chebi_treats_icd11-rev:
      - supportBatch: true
        useTemplating: true
        inputs:
          - id: ICD11
            semantic: Disease
        requestBodyType: object
        requestBody:  ## no prefix
          body: >-
            {
              "q": [ {{ queryInputs | wrap( '["' , '","biolink:treats"]') }} ],
              "scopes": ["object.icd11", "association.predicate"]
            }
        parameters:
          fields: >-
            subject.chebi,
            subject.name,
            object.name
          size: 1000
        outputs:
          - id: CHEBI
            semantic: SmallMolecule
        predicate: treated_by
        source: "infores:ttd"
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/chebi-subject"
        # testExamples:
        #   - qInput: "ICD11:1A00-CA43.1"     ## Inflammation 
        #     oneOutput: "CHEBI:2500"         ## Aescin / escin Ib
    pubchem_treats_icd11:
    ## 9615 records: https://pending.biothings.io/ttd/query?q=_exists_:subject.pubchem%20AND%20association.predicate:%22biolink:treats%22%20AND%20_exists_:object.icd11
      - supportBatch: true
        useTemplating: true ## flag to say templating is being used below
        inputs:
          - id: "PUBCHEM.COMPOUND"
            semantic: SmallMolecule
        requestBodyType: object
        requestBody:  ## no prefix
          body: >-
            {
              "q": [ {{ queryInputs | wrap( '["' , '","biolink:treats"]') }} ],
              "scopes": ["subject.pubchem.compound", "association.predicate"]
            }
        outputs:
          - id: ICD11
            semantic: Disease
        parameters:
        ## not including these fields due to data-processing / biolink-modeling issues
        ## - association.clinical_trial.status
          fields: >-
            object.icd11,
            object.name,
            subject.name
          size: 1000
        predicate: treats
        source: "infores:ttd"
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/icd11-obj"
        # testExamples:
        #   - qInput: "PUBCHEM.COMPOUND:135428923"      ## EC20 / Folcepri
        #     oneOutput: "ICD11:2C73"                   ## Ovarian cancer
    # ## need to change subject.pubchem.compound field name for this operation to work
    # pubchem_treats_icd11-rev:
    #   - supportBatch: true
    #     useTemplating: true
    #     inputs:
    #       - id: ICD11
    #         semantic: Disease
    #     requestBodyType: object
    #     requestBody:  ## no prefix
    #       body: >-
    #         {
    #           "q": [ {{ queryInputs | wrap( '["' , '","biolink:treats"]') }} ],
    #           "scopes": ["object.icd11", "association.predicate"]
    #         }
    #     parameters:
    #       fields: >-
    #         subject.pubchem.compound,
    #         subject.name,
    #         object.name
    #       size: 1000
    #     outputs:
    #       - id: "PUBCHEM.COMPOUND"
    #         semantic: SmallMolecule
    #     predicate: treated_by
    #     source: "infores:ttd"
    #     response_mapping:
    #       "$ref": "#/components/x-bte-response-mapping/pubchem-subject"
    #     # testExamples:
    #     #   - qInput: "ICD11:2B52"                           ## Ewing sarcoma
    #     #     oneOutput: "PUBCHEM.COMPOUND:24958200"         ## MK-4827
  x-bte-response-mapping:
    icd11-obj:
      ICD11: object.icd11               ## no prefix
      input_name: subject.name          ## BTE will use this for node name if node normalizer didn't provide
      output_name: object.name          ## BTE will use this for node name if node normalizer didn't provide
    chebi-subject:
      CHEBI: subject.chebi              ## no prefix
      input_name: object.name
      output_name: subject.name
    # ## need to change subject.pubchem.compound field name for this operation to work
    # pubchem-subject:
    #   "PUBCHEM.COMPOUND": subject.pubchem.compound        ## no prefix
    #   input_name: object.name
    #   output_name: subject.name
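
To make the templated request body easier to read, here is a sketch of what it is expected to expand to for a batch of inputs. This is my reading of the template, not BTE's actual implementation: it assumes the `wrap` filter puts the two given strings around each input and that the wrapped items are joined with commas. `wrap` and `render_body` below are hypothetical Python stand-ins.

```python
import json


def wrap(inputs, start, end):
    """Emulate the assumed behaviour of the `wrap` filter on a list of inputs."""
    return ",".join(f"{start}{item}{end}" for item in inputs)


def render_body(inputs, id_scope, predicate="biolink:treats"):
    """Render the request body template and parse it as JSON."""
    wrapped = wrap(inputs, '["', f'","{predicate}"]')
    body = '{"q": [ %s ], "scopes": ["%s", "association.predicate"]}' % (
        wrapped,
        id_scope,
    )
    return json.loads(body)


# CHEBI IDs without prefix, taken from the commented-out testExamples above.
body = render_body(["28001", "2500"], "subject.chebi")
print(json.dumps(body, indent=2))
```

Each inner pair lines up with the two entries in `scopes`, so every input ID is matched together with the predicate.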

And after writing this example, I have a bunch of advice / commentary >.<. The first two collapsed sections are the important ones.

My advice on writing operations

For now, operations with PUBCHEM.COMPOUND as the output will not work. You'll need to change the parser to take out the period in those field keys (example: subject.pubchem.compound and object.pubchem.compound -> subject.pubchem_compound and object.pubchem_compound). After deploying those changes and adjusting operations to use those new field spellings, things should work.
- I included an example that doesn't work right now.
I couldn't find records with "biolink:interacts_with" as the predicate. But I could find records with "biolink:associated_with"....perhaps these are the records for the drug - target relationships?
- I'll leave it to you to pick the best predicate for the drug - target relationships...
use quotation marks when ID-namespaces/prefixes have periods in them
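
The field-key change described above could be sketched as a recursive rename. This assumes the period is part of a single key (for example a `pubchem.compound` key inside `subject`), which may not match how the parser actually builds the document. `rename_dotted_keys` is a hypothetical helper.

```python
def rename_dotted_keys(value):
    """Recursively replace '.' with '_' in all dict keys."""
    if isinstance(value, dict):
        return {
            key.replace(".", "_"): rename_dotted_keys(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [rename_dotted_keys(item) for item in value]
    return value


doc = {
    "subject": {"pubchem.compound": "135428923", "type": "biolink:Drug"},
    "object": {"icd11": "2C73", "type": "biolink:Disease"},
}
renamed = rename_dotted_keys(doc)
```

Only keys are changed; values such as `2B91.Z` keep their periods.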

Explaining the comments on missing fields

In my examples, you'll see comments saying that some record fields aren't included in the parameters.fields and response-mapping. Because of TRAPI / biolink-model validation issues, we are only keeping some fields in the response-mapping (like keywords BTE fully transforms into TRAPI like output ID-namespace, input_name, output_name, pubmed...). It then makes sense to query only for those fields (with parameters.fields).

However, I think it's still useful to know what useful record fields could be retrieved in each set of operations. So I suggest that you write similar comments. Here are the fields I identified (using the /metadata/fields endpoint response) that seem useful:

subject.target_type / object.target_type
subject.bioclass / object.bioclass
association.trial_status
association.moa
association.ki
association.ic50
association.ec50
association.clinical_trial.status

I think it'd also be useful to list the missing fields in a comment block at the top of the operations text, with possible values (for fields with limited number of possible values) or example values (for fields that are basically free text). I included this in my example operations as well.

Observations (only for later reference)

Sometimes a subject or object will have multiple kinds of IDs...so a record could be returned by multiple operations. It should be handled alright...but we could test the retrieval of such a record to see that only 1 edge is returned, not multiple.
An observation: it looks like the records with subject.icd11 sometimes have mappings to ICD10 and ICD9 (subject.icd10 and subject.icd9 fields). However, this may be for the biomarker data only...

lucyzhang95 · 2023-07-26T03:36:06Z

@colleenXu

Thank you so much again for the examples, detailed explanations, and comments! They are super helpful! I tried to write the rest of the x-bte-kgs-annotations, operations, and mappings using your example as a reference. I have some questions regarding the parameters.fields and x-bte-response-mapping.

PS:

I have mapped the icd11 to mondo with the available biomarker data. I mapped the icd11 to icd9 and to mondo. The newest version of the API has not been deployed yet but will be included in the following x-bte-kgs-annotations and operation components.
I changed pubchem.compound to pubchem_compound in the newer API.
I changed biolink:associated_with to biolink:interacts_with

The full smartapi.yaml file can be found here: https://github.com/lucyzhang95/BioThings_TTD_Dataplugin/blob/master/smart_api/smartapi.yaml

In the /query POST section:

- query
      x-bte-kgs-operations:
      - $ref: '#/components/x-bte-kgs-operations/chebi_treats_mondo'
      - $ref: '#/components/x-bte-kgs-operations/chebi_treats_mondo-rev'
      - $ref: '#/components/x-bte-kgs-operations/chebi_treats_icd11'
      - $ref: '#/components/x-bte-kgs-operations/chebi_treats_icd11-rev'
      - $ref: '#/components/x-bte-kgs-operations/pubchem_treats_mondo'
      - $ref: '#/components/x-bte-kgs-operations/pubchem_treats_mondo-rev'
      - $ref: '#/components/x-bte-kgs-operations/pubchem_treats_icd11'
      - $ref: '#/components/x-bte-kgs-operations/pubchem_treats_icd11-rev'
      - $ref: '#/components/x-bte-kgs-operations/uniprotkb_target_for_mondo'
      - $ref: '#/components/x-bte-kgs-operations/uniprotkb_target_for_mondo-rev'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_target_id_target_for_mondo'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_target_id_target_for_mondo-rev'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_target_id_target_for_icd11'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_target_id_target_for_icd11-rev'
      - $ref: '#/components/x-bte-kgs-operations/chebi_interacts_with_uniprotkb'
      - $ref: '#/components/x-bte-kgs-operations/chebi_interacts_with_uniprotkb-rev'
      # - $ref: '#/components/x-bte-kgs-operations/chebi_interacts_with_ttd_target_id'
      # - $ref: '#/components/x-bte-kgs-operations/chebi_interacts_with_ttd_target_id-rev'
      - $ref: '#/components/x-bte-kgs-operations/pubchem_interacts_with_uniprotkb'
      - $ref: '#/components/x-bte-kgs-operations/pubchem_interacts_with_uniprotkb-rev'
      # - $ref: '#/components/x-bte-kgs-operations/pubchem_compound_interacts_with_ttd_target_id'
      # - $ref: '#/components/x-bte-kgs-operations/pubchem_compound_interacts_with_ttd_target_id-rev'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_biomarker_id_biomarker_for_mondo'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_biomarker_id_biomarker_for_mondo-rev'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_biomarker_id_biomarker_for_icd11'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_biomarker_id_biomarker_for_icd11-rev'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_biomarker_id_biomarker_for_icd10'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_biomarker_id_biomarker_for_icd10-rev'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_biomarker_id_biomarker_for_icd9'
      # - $ref: '#/components/x-bte-kgs-operations/ttd_biomarker_id_biomarker_for_icd9-rev'

Comments: I commented out all the $ref with internal ttd ids, such as ttd_target_id, ttd_drug_id, and ttd_biomarker_id.
Questions: Does this - $ref: '#/components/x-bte-kgs-operations/pubchem_treats_mondo' look right?

Component section: operations

chebi_treats_mondo:

  x-bte-kgs-operations:
  ## in the API's records:
  ## - subjects and objects can be Protein/Gene (UniprotKB), SmallMolecule (PUBCHEM.COMPOUND, CHEBI),
  ##   Disease (MONDO, ICD11, ICD10, ICD9)
  ## - predicates can be treats, target_for, interacts_with
  ##   SmallMolecule_treats_Disease, Gene_target_for_Disease, SmallMolecule_interacts_with_Gene
  ## - BTE automatically puts prefix on MONDO IDs, but prefix has to be added to other ID inputs
  ## - currently, BTE will also accept response with Translator-prefix (api-response-transform module).
  ## - joinSafe is only needed if the delimiter isn't a comma
  ## - fields not included due to data-processing / biolink-modeling issues:
  ##   association.clinical_trial.status: possible values include:
  ##     'investigative', 'patented', 'phase 2', 'approved', 'phase 1', 'terminated', 'phase 3',
  ##     'discontinued in phase 2', 'phase 1/2', 'discontinued in phase 1', 'preclinical', 'discontinued in phase 3',
  ##     'phase 2/3', 'clinical trial', 'withdrawn from market', 'phase 4', 'phase 3 trial', 'discontinued in phase1/2',
  ##     'phase 2 trial', 'discontinued in preregistration', 'phase 2/3 trial', 'preregistration', 'phase 2a',
  ##     'phase 1 trial', 'phase 0', 'registered', 'approved (orphan drug)', 'phase 1b', 'phase 2b',
  ##     'discontinued in phase2/3', 'NDA filed', 'phase 1/2 trial', 'phase 1/2a', 'discontinued in phase 1 trial',
  ##     'discontinued in phase 4', 'phase 1b/2a', 'application submitted', 'approval submitted', 'BLA submitted',
  ##     'discontinued in phase 2a', 'discontinued in phase 2b'
    chebi_treats_mondo:
    ## https://pending.biothings.io/ttd/query?q=_exists_:subject.chebi%20AND%20association.predicate:%22biolink:treats%22%20AND%20_exists_:object.mondo
    ## 543 records
      - supportBatch: true
        useTemplating: true ## flag to say templating is being used below
        inputs:
          - id: CHEBI
            semantic: SmallMolecule
        requestBodyType: object
        requestBody: ## no prefix
          body: >-
            {
              "q": [ {{ queryInputs | wrap( '["' , '","biolink:treats"]') }} ], 
              "scopes": ["subject.chebi", "association.predicate"] 
            }
        outputs:
          - id: "MONDO"
            semantic: Disease
        parameters:
          ## not including these fields due to data-processing / biolink-modeling issues
          ## - association.clinical_trial.status
          fields: >-
            object.mondo,
            object.icd11,
            object.name,
            subject.name
          size: 1000
        predicate: treats
        source: "infores:ttd"
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/mondo-object"
        # testExamples:
        #   - qInput: "CHEBI:6960"                    ## Moexipril
        #     oneOutput: "MONDO:0001134"              ## Hypertension

Comments and Questions:

I added all the possible clinical_trial values in the comments, but I'm not sure if that is necessary.
Under parameters.fields, I added object.icd11, but I am not sure it is necessary, since I have additional operations for chebi_treats_icd11. Every record has icd11, but not every record has mondo, so it might provide additional info for users when they query mondo?
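
To make the requestBody template above concrete, here is a minimal Python sketch of the POST body it appears to render for a batch of inputs. It assumes the wrap filter is applied to each input ID and the results are joined with commas (consistent with the note that joinSafe is only needed when the delimiter isn't a comma). The wrap and build_body helpers are hypothetical stand-ins written for this illustration, not BTE's templating code, and the second CHEBI ID is a placeholder.

```python
import json


def wrap(inputs, start, end):
    """Hypothetical stand-in for the template's wrap filter (assumed behavior)."""
    return ",".join(f"{start}{value}{end}" for value in inputs)


def build_body(inputs, predicate, scope_field):
    """Render the template text, then parse it to confirm it is valid JSON."""
    pairs = wrap(inputs, '["', f'","{predicate}"]')
    rendered = (
        '{ "q": [ ' + pairs + ' ], '
        '"scopes": ["' + scope_field + '", "association.predicate"] }'
    )
    return json.loads(rendered)


# "6960" is the Moexipril example from testExamples; "0000" is a placeholder ID.
body = build_body(["6960", "0000"], "biolink:treats", "subject.chebi")
print(json.dumps(body, indent=2))
```

Under that assumption, each inner pair lines up with the two entries in scopes, so a hit presumably has to match both the ID and the predicate.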

pubchem_treats_mondo:

    pubchem_treats_mondo:
    ##  https://pending.biothings.io/ttd/query?q=_exists_:subject.pubchem_compound%20AND%20association.predicate:%22biolink:treats%22%20AND%20_exists_:object.mondo
    ##  1604 records
      - supportBatch: true
        useTemplating: true ## flag to say templating is being used below
        inputs:
          - id: "PUBCHEM.COMPOUND"
            semantic: SmallMolecule
        requestBodyType: object
        requestBody: ## no prefix
          body: >-
            {
              "q": [ {{ queryInputs | wrap( '["' , '","biolink:treats"]') }} ],
              "scopes": ["subject.pubchem_compound", "association.predicate"]
            }
        outputs:
          - id: MONDO
            semantic: Disease
        parameters:
        ## not including these fields due to data-processing / biolink-modeling issues
        ## - association.clinical_trial.status
          fields: >-
            object.mondo,
            object.name,
            subject.name
          size: 1000
        predicate: treats
        source: "infores:ttd"
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/mondo-object"
        # testExamples:
        #   - qInput: "PUBCHEM.COMPOUND:118063735"       ## LMB763 (Nidufexor)
        #     oneOutput: "MONDO:0004790"                 ## Non-alcoholic steatohepatitis
    pubchem_treats_mondo-rev:
      - supportBatch: true
        useTemplating: true
        inputs:
          - id: MONDO
            semantic: Disease
        requestBodyType: object
        requestBody: ## no prefix
          body: >-
            {
              "q": [ {{ queryInputs | wrap( '["' , '","biolink:treats"]') }} ],
              "scopes": ["object.mondo", "association.predicate"]
            }
        outputs:
          - id: "PUBCHEM.COMPOUND"
            semantic: SmallMolecule
        parameters:
          fields: >-
            subject.pubchem_compound,
            subject.name,
            object.name
          size: 1000
        predicate: treated_by
        source: "infores:ttd"
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/pubchem-subject"
        # testExamples:
        #   - qInput: "MONDO:0021094"                      ## Human immunodeficiency virus infection
        #     oneOutput: "PUBCHEM.COMPOUND:103596027"      ## GS-1156

Comments and Questions:

Is this pubchem_treats_mondo correct? For the title, is it better to use pubchem_treats_mondo instead of pubchem_compound_treats_mondo?

uniprotkb_target_for_mondo:

    uniprotkb_target_for_mondo:
    ## https://biothings.ci.transltr.io/ttd/query?q=_exists_:subject.uniprotkb%20AND%20association.predicate:%22biolink:target_for%22%20AND%20_exists_:object.mondo
    ## 601 records
      - supportBatch: true
        useTemplating: true ## flag to say templating is being used below
        inputs:
          - id: UniProtKB
            semantic: Protein
        requestBodyType: object
        requestBody: ## no prefix
          body: >-
            {
              "q": [ {{ queryInputs | wrap( '["' , '","biolink:target_for"]') }} ],
              "scopes": ["subject.uniprotkb", "association.predicate"]
            }
        outputs:
          - id: MONDO
            semantic: Disease
        parameters:
        ## not including these fields due to data-processing / biolink-modeling issues
        ## - association.clinical_trial.status
          fields: >-
            object.mondo,
            object.name,
            object.icd11,
            subject.name,
            subject.bioclass,
            subject.target_type
          size: 1000
        predicate: target_for
        source: "infores:ttd"
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/mondo-object"
        # testExamples:
        #   - qInput: "UniProtKB:Q08722"         ## Leukocyte surface antigen CD47
        #     oneOutput: "MONDO:0001351"         ## Ovarian cancer

Comments and Questions:

Is it okay that I added subject.bioclass and subject.target_type in the parameters.fields?
What would be a proper predicate for uniprotkb_target_for_mondo-rev?

chebi_interacts_with_uniprotkb:

    chebi_interacts_with_uniprotkb:
    ## https://biothings.ci.transltr.io/ttd/query?q=_exists_:subject.chebi%20AND%20association.predicate:%22biolink:interacts_with%22%20AND%20_exists_:object.uniprotkb
    ## 2899 records
      - supportBatch: true
        useTemplating: true ## flag to say templating is being used below
        inputs:
          - id: CHEBI
            semantic: SmallMolecule
        requestBodyType: object
        requestBody: ## no prefix
          body: >-
            {
              "q": [ {{ queryInputs | wrap( '["' , '","biolink:interacts_with"]') }} ],
              "scopes": ["subject.chebi", "association.predicate"]
            }
        outputs:
          - id: UniProtKB
            semantic: Protein
        parameters:
          fields: >-
            object.uniprotkb,
            object.bioclass,
            object.target_type,
            association.moa
          size: 1000
        predicate: interacts_with
        source: "infores:ttd"
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/uniprotkb-object"
        # testExamples:
        #   - qInput: "CHEBI:2696"        ## target_type:literature-reported target, bioclass:Acyltransferase
        #     oneOutput: "UniProtKB:Q8WYB5"

Comments and Questions:

Is it allowed to add association.moa in the parameters.fields?
For object.bioclass in parameters.fields: only some records have bioclass, so should I still include it?
Also, I'm not sure about the format of testExamples. I only provided the UniProtKB ID as oneOutput.

x-bte-response-mapping:

    chebi-subject:
      CHEBI: subject.chebi                           # no prefix
      input_name: object.name
      output_name: subject.name
    mondo-object:
      MONDO: object.mondo                            # no prefix
      input_name: subject.name
      output_name: object.name
    pubchem-subject:
      "PUBCHEM.COMPOUND": subject.pubchem_compound   # no prefix
      input_name: object.name
      output_name: subject.name
    icd11-object:
      ICD11: object.icd11                            # no prefix
      input_name: subject.name                       # BTE will use this for node name if node normalizer didn't provide
      output_name: object.name                       # BTE will use this for node name if node normalizer didn't provide
    uniprotkb-subject:
      UniProtKB: subject.uniprotkb                   # no prefix
      input_name: object.name
      output_name: subject.name
    uniprotkb-object:
      UniProtKB: object.uniprotkb
      input_name:
      output_name:

Comments and Questions:

I am not sure what input_name and output_name correspond to. If the output is not subject.name or object.name but instead object.target_type or subject.target_type, how should I write the response-mapping?
How many pairs of response-mappings should I write? I think chebi and pubchem_compound only appear as subject IDs, and mondo/icd11 only as object IDs; however, uniprotkb can be either a subject ID or an object ID.
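
One way to read a response-mapping entry is as a set of dotted paths into each returned record: the ID key gives the output ID, and input_name / output_name give fallback node names (per the comments on icd11-object above). The sketch below applies the mondo-object mapping to a made-up record. get_path and apply_mapping are hypothetical helpers, and this illustrates the idea rather than BTE's actual transform code.

```python
def get_path(record, dotted_path):
    """Follow a dotted path such as 'object.mondo' through nested dicts."""
    value = record
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def apply_mapping(record, mapping):
    """Resolve every mapping entry against one record."""
    return {name: get_path(record, path) for name, path in mapping.items()}


# Mapping copied from mondo-object above.
mondo_object = {
    "MONDO": "object.mondo",
    "input_name": "subject.name",
    "output_name": "object.name",
}

# Made-up record shaped like the API's documents; the values are illustrative only.
record = {
    "subject": {"name": "example drug", "chebi": "0000"},
    "object": {"name": "example disease", "mondo": "0000000"},
}

resolved = apply_mapping(record, mondo_object)
print(resolved)
```

In a forward operation the input is the subject and the output is the object, so input_name points at subject.name; the -subject mappings used by the -rev operations swap the two.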

Sorry about the extremely long post! We can definitely have another Slack huddle meeting if you have time!
Also, if you don't mind, could you explain a little about how to test locally whether the annotations and response-mapping perform as expected, if I want to run some queries the way other users would?

colleenXu · 2023-08-01T04:18:56Z

Info from the convo @lucyzhang95 and I had Friday (7/28) afternoon (sorry for the belated posting >.<):

Reminders:

- Remember to add source: "infores:ttd" to all operations!
- Retrieving duplicate records (under both ttd.drug and pubchem.compound, for instance) is an issue, which is why we'll write "no-scopes" requestBody templates for most operations (see the notes for each section below).
- Using Gene rather than Protein for UniProtKB IDs is more useful.
- Lucy will try getting names for ttd.drug and ttd.target for this set of data, because they're missing right now. This will allow BTE to have human-readable names for these IDs.
- For now, we'll leave comments on the fields that aren't the output ID and node names, since they'll take time to map to biolink-model / TRAPI validation standards...

Updated: relationships + `NOT _exists_` filters

Every row will become 2 x-bte operations (1 set: querying data from subject -> object and data from object -> subject).

drug - disease relationships

Subject-id	Subject-category	predicate	Object-id	Object-category
PUBCHEM.COMPOUND	SmallMolecule	treats	ICD11	Disease
PUBCHEM.COMPOUND	SmallMolecule	treats	MONDO	Disease
TTD.DRUG	SmallMolecule	treats	ICD11	Disease
TTD.DRUG	SmallMolecule	treats	MONDO	Disease

Reverse predicate is "treated_by"

We're not using CHEBI IDs because it looks like every subject with a CHEBI field also has a pubchem field (so this query (which only works right now before updates) gets no hits)

Pubchem and mondo are the preferred IDs. ttd.drug and icd11 are the backup / default, to use only when pubchem and mondo are unavailable:

pubchem + mondo: keep original format with filled-out scopes
pubchem + icd11: use empty-scopes format with NOT _exists_:object.mondo
ttd.drug + mondo: use empty-scopes format with NOT _exists_:subject.pubchem_compound (future field name)
ttd.drug + icd11: use empty-scopes format with NOT _exists_:subject.pubchem_compound AND NOT _exists_:object.mondo (future field name)

target (gene) - disease relationships

Subject-id	Subject-category	predicate	Object-id	Object-category
UniProtKB	Gene	target_for	ICD11	Disease
UniProtKB	Gene	target_for	MONDO	Disease
TTD.TARGET	Gene	target_for	ICD11	Disease
TTD.TARGET	Gene	target_for	MONDO	Disease

Reverse predicate is "has_target"

UniProtKB and MONDO are the preferred IDs. ttd.target and icd11 are the backup / default, to use only when UniProtKB and mondo are unavailable:

UniProtKB + mondo: keep format with filled-out scopes
UniProtKB + icd11: use empty-scopes format with NOT _exists_:object.mondo
ttd.target + mondo: use empty-scopes format with NOT _exists_:subject.uniprotkb
ttd.target + icd11: use empty-scopes format with NOT _exists_:subject.uniprotkb AND NOT _exists_:object.mondo

drug - target (gene) relationships

We're not using CHEBI IDs because it looks like every subject with a CHEBI field also has a pubchem field (so this query (which only works right now before updates) gets no hits)

Subject-id	Subject-category	predicate	Object-id	Object-category
PUBCHEM.COMPOUND	SmallMolecule	interacts_with	UniProtKB	Gene
TTD.DRUG	SmallMolecule	interacts_with	UniProtKB	Gene
PUBCHEM.COMPOUND	SmallMolecule	interacts_with	TTD.TARGET	Gene
TTD.DRUG	SmallMolecule	interacts_with	TTD.TARGET	Gene

pubchem and UniProtKB are the preferred IDs. ttd.drug and ttd.target are the backup / default, to use only when pubchem and UniProtKB are unavailable:

pubchem + UniProtKB: keep format with filled-out scopes
pubchem + ttd.target: use empty-scopes format with NOT _exists_:object.uniprotkb
ttd.drug + UniProtKB: use empty-scopes format with NOT _exists_:subject.pubchem_compound
ttd.drug + ttd.target: use empty-scopes format with NOT _exists_:subject.pubchem_compound AND NOT _exists_:object.uniprotkb
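
The preferred/backup rules above amount to adding one NOT _exists_ clause for each side that falls back to its backup ID. A minimal sketch of that rule for the drug - disease case, assuming the field names shown; subject.ttd_drug_id is a hypothetical field name used only for illustration, and fallback_filter is not part of BTE.

```python
def fallback_filter(subject_field, object_field, preferred_subject, preferred_object):
    """Build the NOT _exists_ clauses for one operation's ID combination."""
    clauses = []
    if subject_field != preferred_subject:
        clauses.append(f"NOT _exists_:{preferred_subject}")
    if object_field != preferred_object:
        clauses.append(f"NOT _exists_:{preferred_object}")
    return " AND ".join(clauses)


# Drug - disease operations: pubchem and mondo preferred, ttd.drug and icd11 as backup.
# "subject.ttd_drug_id" is a hypothetical field name.
preferred = ("subject.pubchem_compound", "object.mondo")
for subject_field in ("subject.pubchem_compound", "subject.ttd_drug_id"):
    for object_field in ("object.mondo", "object.icd11"):
        clause = fallback_filter(subject_field, object_field, *preferred)
        print(f"{subject_field} + {object_field}: {clause or '(filled-out scopes, no filter)'}")
```

Because each backup operation excludes records that have the preferred ID, a record should be returned by only one operation in each set, which is the point of the duplicate-records reminder.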

colleenXu · 2023-08-01T04:21:12Z

And including this, from my review of earlier discussions...

These can also be left for later (not needed to get this SmartAPI / x-bte annotation written, registered, and used by BTE)...

Notes from earlier post, edited (not addressed during Friday meeting)

Target - compound activity (aka drug - target relationships?): We could use the paper's definitions of "what is the relationship, how strong is the relationship" to "map" the IC50/Ki/EC50 values to relationships. Because the paper defines these, maybe we'll have more success than we did with BindingDB (see Data source: BindingDB #70 (comment))...

Some biomarkers are genes/proteins. Maybe if we could map their names to IDs, we could write x-bte annotations using those IDs... (Good ID namespaces would be those in biolink-model for Gene or Protein, like NCBIGene, ENSEMBL, HGNC, UniProtKB.)

lucyzhang95 · 2023-08-29T00:38:02Z

@colleenXu

Thank you so much for being super helpful! I really appreciate you spending the time to have multiple meetings with me! I have finished updating the smartapi.yaml for ttd. The newest version of ttd has also been deployed on Translator APIs!

While testing the post query locally, I found one issue with the drug-target relationships. I used Postman for the testing since you taught me how to use it last time! The testing result showed "description": "Query processed successfully, retrieved 0 results." I am a bit confused since I can find the result here if I do it manually on the translator.io.

results from bte:biothings-explorer-trapi

The result is extremely long! I'm sorry about that!

 bte:biothings-explorer-trapi:main SmartAPI Specs read from path: /Users/lucyzhang1116/Documents/biothings_explorer/data/smartapi_specs.json +0ms
  bte:smartapi-kg:SyncLoader Using single spec sync loader now. +3m
  bte:smartapi-kg:AllSpecsSyncLoader Fetching from file path: /Users/lucyzhang1116/Documents/biothings_explorer/data/smartapi_specs.json +3m
  bte:smartapi-kg:AllSpecsSyncLoader Hits in inputs: true +1ms
  bte:biothings-explorer-trapi:main MetaKG successfully loaded! +2ms
  bte:biothings-explorer-trapi:query_graph (1) Creating edges for manager... +3m
  bte:biothings-explorer-trapi:query_graph Query node missing categories...Looking for match... +381ms
  bte:biothings-explorer-trapi:QNode (1) Node "n0" has (1) entities at start. +3m
  bte:biothings-explorer-trapi:QNode (1) Node "n0" expanded initial curie. {"PUBCHEM_COMPOUND:12795894":["PUBCHEM_COMPOUND:12795894"]} +0ms
  bte:biothings-explorer-trapi:query_graph Query node missing categories...Looking for match... +280ms
  bte:biothings-explorer-trapi:QNode (1) Node "n1" has (1) entities at start. +280ms
  bte:biothings-explorer-trapi:QNode (1) Node "n1" expanded initial curie. {"UniProtKB:Q14534":["UniProtKB:Q14534"]} +0ms
  bte:biothings-explorer-trapi:QNode "n0" connected to "e01" +0ms
  bte:biothings-explorer-trapi:QNode "n1" connected to "e01" +0ms
  bte:biothings-explorer-trapi:QEdge (2) Created Edge "e01" Reverse = false +3m
  bte:biothings-explorer-trapi:main (3) All edges created [{"id":"e01","subject":{"id":"n0","category":["biolink:SmallMolecule"],"expandedCategories":["SmallMolecule"],"equivalentIDsUpdated":false,"curie":["PUBCHEM_COMPOUND:12795894"],"expanded_curie":{"PUBCHEM_COMPOUND:12795894":["PUBCHEM_COMPOUND:12795894"]},"entity_count":1,"held_curie":[],"held_expanded":{},"connected_to":{}},"object":{"id":"n1","category":["biolink:Gene"],"expandedCategories":["Gene"],"equivalentIDsUpdated":false,"curie":["UniProtKB:Q14534"],"expanded_curie":{"UniProtKB:Q14534":["UniProtKB:Q14534"]},"entity_count":1,"held_curie":[],"held_expanded":{},"connected_to":{}},"qualifier_constraints":[],"reverse":false,"executed":false,"logs":[],"records":[]}] +661ms
  bte:biothings-explorer-trapi:edge-manager (3) Edge manager is managing 1 qEdges. +3m
  bte:biothings-explorer-trapi:edge-manager (4) Edges not yet executed = 1 +0ms
  bte:biothings-explorer-trapi:edge-manager (5) Sending next edge 'e01' WITH entity count...(1) +0ms
  bte:biothings-explorer-trapi:edge-manager Checking entity max : (1)--(1) +0ms
  bte:biothings-explorer-trapi:QEdge (8) Choosing lower entity count in edge... +0ms
  bte:biothings-explorer-trapi:QNode (8) Node "n1" holding ["UniProtKB:Q14534"] aside. +0ms
  bte:biothings-explorer-trapi:QEdge (8) Sub - Obj were same but chose subject (1) +0ms
  bte:biothings-explorer-trapi:edge-manager 'e01' : (1) --> (1) +0ms
  bte:biothings-explorer-trapi:qedge2btedge Input node is n0 +3m
  bte:biothings-explorer-trapi:qedge2btedge Output node is n1 +0ms
  bte:biothings-explorer-trapi:qedge2btedge KG Filters: {
  bte:biothings-explorer-trapi:qedge2btedge   "input_type": [
  bte:biothings-explorer-trapi:qedge2btedge     "SmallMolecule"
  bte:biothings-explorer-trapi:qedge2btedge   ],
  bte:biothings-explorer-trapi:qedge2btedge   "output_type": [
  bte:biothings-explorer-trapi:qedge2btedge     "Gene"
  bte:biothings-explorer-trapi:qedge2btedge   ]
  bte:biothings-explorer-trapi:qedge2btedge } +0ms
  bte:biothings-explorer-trapi:edge-manager (3) Edge manager is managing 1 qEdges. +1ms
  bte:biothings-explorer-trapi:edge-manager (4) Edges not yet executed = 1 +0ms
  bte:biothings-explorer-trapi:edge-manager (5) Sending next edge 'e01' WITH entity count...(1) +0ms
  bte:biothings-explorer-trapi:edge-manager Checking entity max : (1)--(1) +0ms
  bte:biothings-explorer-trapi:QEdge (8) Choosing lower entity count in edge... +1ms
  bte:biothings-explorer-trapi:QNode (8) Node "n1" holding ["UniProtKB:Q14534"] aside. +1ms
  bte:biothings-explorer-trapi:QEdge (8) Sub - Obj were same but chose subject (1) +0ms
  bte:biothings-explorer-trapi:edge-manager 'e01' : (1) --> (1) +0ms
  bte:biothings-explorer-trapi:edge-manager (5) Executing current edge >> "e01" +0ms
  bte:biothings-explorer-trapi:batch_edge_query Node Update Start +3m
  bte:biothings-explorer-trapi:nodeUpdateHandler Getting equivalent IDs... +3m
  bte:biothings-explorer-trapi:nodeUpdateHandler curies: {"SmallMolecule":["PUBCHEM_COMPOUND:12795894"]} +0ms
  bte:biothings-explorer-trapi:nodeUpdateHandler Got Edge Equivalent IDs successfully. +247ms
  bte:biothings-explorer-trapi:batch_edge_query Node Update Success +247ms
  bte:biothings-explorer-trapi:batch_edge_query Start to convert qEdges into APIEdges.... +1ms
  bte:biothings-explorer-trapi:qedge2btedge Input node is n0 +249ms
  bte:biothings-explorer-trapi:qedge2btedge Output node is n1 +0ms
  bte:biothings-explorer-trapi:qedge2btedge KG Filters: {
  bte:biothings-explorer-trapi:qedge2btedge   "input_type": [
  bte:biothings-explorer-trapi:qedge2btedge     "SmallMolecule"
  bte:biothings-explorer-trapi:qedge2btedge   ],
  bte:biothings-explorer-trapi:qedge2btedge   "output_type": [
  bte:biothings-explorer-trapi:qedge2btedge     "Gene"
  bte:biothings-explorer-trapi:qedge2btedge   ]
  bte:biothings-explorer-trapi:qedge2btedge } +0ms
  bte:biothings-explorer-trapi:qedge2btedge 1 APIs being used: ["Biothings Therapeutic Target Database API"] +0ms
  bte:biothings-explorer-trapi:qedge2btedge 4 SmartAPI edges are retrieved.... +0ms
  bte:biothings-explorer-trapi:qedge2btedge Input prefix: PUBCHEM.COMPOUND +0ms
  bte:biothings-explorer-trapi:qedge2btedge Input prefix: PUBCHEM.COMPOUND +0ms
  bte:biothings-explorer-trapi:qedge2btedge Input prefix: TTD.DRUG +1ms
  bte:biothings-explorer-trapi:qedge2btedge Input prefix: TTD.DRUG +0ms
  bte:biothings-explorer-trapi:qedge2btedge 0 metaKG are created.... +0ms
  bte:biothings-explorer-trapi:qedge2btedge 0 metaKG are created.... +0ms
  bte:biothings-explorer-trapi:qedge2btedge 0 metaKG are created.... +0ms
  bte:biothings-explorer-trapi:qedge2btedge 0 metaKG are created.... +0ms
  bte:biothings-explorer-trapi:qedge2btedge No metaKG found for this query batch. +0ms
  bte:biothings-explorer-trapi:batch_edge_query qEdges are successfully converted into 0 APIEdges.... +1ms
  bte:biothings-explorer-trapi:edge-manager (X) Terminating..."e01" got 0 records. +249ms
  bte:biothings-explorer-trapi:worker1 Worker thread 1 completed task. +2s

I suspect this might be because the smartapi.yaml was not written properly. In the x-bte-kgs-operations, I wrote:

body:

{"q": {{ queryInputs | replPrefix('association.predicate:"biolink:interacts_with"
             AND (_exists_:subject.name) AND (_exists_:object.name) AND subject.pubchem_compound') | dump}},
             "scopes": []
            }

Full code

    pubchem_interacts_with_uniprotkb:
    ## url: https://biothings.transltr.io/ttd/query?q=_exists_:subject.pubchem_compound%20AND%20association.predicate:%22biolink:interacts_with%22%20AND%20_exists_:object.uniprotkb%20AND%20_exists_:subject.name%20AND%20_exists_:object.name
    ## 38,914 records
    ## exists: subject.pubchem_compound, association.predicate.biolink:interacts_with, object.uniprotkb, subject.name, object.name
      - supportBatch: true
        useTemplating: true ## flag to say templating is being used below
        inputs:
          - id: "PUBCHEM.COMPOUND"
            semantic: SmallMolecule
        requestBodyType: object
        requestBody: ## no prefix
          body: >-
            {"q": {{ queryInputs | replPrefix('association.predicate:"biolink:interacts_with"
             AND (_exists_:subject.name) AND (_exists_:object.name) AND subject.pubchem_compound') | dump}},
             "scopes": []
            }
        outputs:
          - id: UniProtKB
            semantic: Gene
        parameters:
        ## not including these fields due to data-processing / biolink-modeling issues
        ## - association.ic50, ec50, ki
        ## - association.trial_status
        ## - object.bioclass
        ## - object.target_type
          fields: >-
            object.uniprotkb,
            object.name,
            subject.name
          size: 1000
        predicate: interacts_with
        source: "infores:ttd"
        response_mapping:
          "$ref": "#/components/x-bte-response-mapping/uniprotkb-object"
        # testExamples:
        #   - qInput: "PUBCHEM_COMPOUND:12795894"     ## Procyanidin B-2 3,3'-di-O-gallate
        #     oneOutput: "UniProtKB:Q14534"           ## Squalene monooxygenase (SQLE)

Specifically, do you mind checking the body for me? I did the mapping so that both the drugs and the targets have names, and so that results are retrieved only when both have names. I am not sure if AND (_exists_:subject.name) AND (_exists_:object.name) is correctly written.
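
For reference, here is a rough Python sketch of what that body is expected to render to for a single input. It assumes replPrefix puts the given text plus a colon in front of each un-prefixed ID, and that dump serializes the result as JSON; repl_prefix and build_body are hypothetical stand-ins, not BTE's filters.

```python
import json

QUERY_PREFIX = (
    'association.predicate:"biolink:interacts_with"'
    " AND (_exists_:subject.name) AND (_exists_:object.name)"
    " AND subject.pubchem_compound"
)


def repl_prefix(inputs, prefix):
    """Hypothetical stand-in for replPrefix: drop any existing prefix, add the new one."""
    return [f"{prefix}:{value.split(':', 1)[-1]}" for value in inputs]


def build_body(inputs):
    # json.dumps plays the role assumed for the dump filter: it emits a JSON
    # array and escapes the double quotes inside each query string.
    rendered = '{"q": ' + json.dumps(repl_prefix(inputs, QUERY_PREFIX)) + ', "scopes": []}'
    return json.loads(rendered)


body = build_body(["12795894"])
print(body["q"][0])
```

The printed string uses the same query-string syntax as the url comment at the top of the operation, so both _exists_ clauses are ordinary query-string terms joined with AND.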

Then for the x-bte-response-mapping: I wrote:

    pubchem-subject:
      "PUBCHEM.COMPOUND": subject.pubchem_compound   # no prefix
      input_name: object.name
      output_name: subject.name

    uniprotkb-object:
      UniProtKB: object.uniprotkb
      input_name: subject.name
      output_name: object.name

I can slack you tomorrow as well! I feel bad for bothering you during off-work hours, so I am posting the issue here for now! Thank you!

lucyzhang95 · 2023-08-31T01:09:02Z

@colleenXu

I know you are pretty busy resolving the Translator issues with the code freeze! So, no rush to get to this issue!
I did some troubleshooting and found interesting log messages regarding the comment I posted above!

The body part seems to have no issue as I can retrieve the result directly from TTD translator API with https://biothings.transltr.io/ttd/query?q=association.predicate:%22biolink:interacts_with%22%20AND%20(_exists_:subject.name)%20AND%20(_exists_:object.name)%20AND%20subject.pubchem_compound:12795894
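
The same check can be scripted. A small sketch that percent-encodes the query string into the URL above using only the standard library; the base URL is taken from this post, and the choice of which characters to leave unescaped is made simply to reproduce that URL.

```python
from urllib.parse import quote

BASE_URL = "https://biothings.transltr.io/ttd/query"


def build_query_url(query):
    """Percent-encode a query string, keeping ':' and parentheses readable."""
    return f"{BASE_URL}?q={quote(query, safe=':()')}"


query = (
    'association.predicate:"biolink:interacts_with"'
    " AND (_exists_:subject.name) AND (_exists_:object.name)"
    " AND subject.pubchem_compound:12795894"
)
url = build_query_url(query)
print(url)
```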

The last several log messages are as follows:

{
            "timestamp": "2023-08-30T23:31:26.234Z",
            "level": "DEBUG",
            "message": "BTE found 4 metaKG edges corresponding to e01. These metaKG edges comes from 1 unique APIs. They are Biothings Therapeutic Target Database API",
            "code": null
        },
        {
            "timestamp": "2023-08-30T23:31:26.235Z",
            "level": "WARNING",
            "message": "BTE didn't find any metaKG for this batch. Your query terminates.",
            "code": null
        },
        {
            "timestamp": "2023-08-30T23:31:26.235Z",
            "level": "INFO",
            "message": "e01 execution: 0 queries (0 success/0 fail) and (0) cached qEdges return (0) records",
            "code": null
        },
        {
            "timestamp": "2023-08-30T23:31:26.235Z",
            "level": "WARNING",
            "message": "qEdge (e01) got 0 records. Your query terminates.",
            "code": null
        }
    ]
}

The query stopped after "BTE didn't find any metaKG for this batch. Your query terminates." and I am not sure what the cause is! Could you do a test locally when you have time? Thank you!

colleenXu · 2023-09-08T08:04:31Z

@lucyzhang95

No problem at all; in fact, I'm sorry for being so late in my reply >.<.

I think you've done great work, and we're super close to the finish line!

On the issue you identified

I think the issue is your queries, specifically the incorrect prefix for the pubchem compound IDs. It looks like you're using PUBCHEM_COMPOUND:12795894. But the correct format (for biolink-model) is with a period, not a _: PUBCHEM.COMPOUND:12795894.

I tested the operations that you may have been reviewing (pubchem_interacts_with_uniprotkb and pubchem_interacts_with_ttd_target_id?), and they behaved fine. I suspect they're the operations you had issues testing because the testExamples have the "PUBCHEM_COMPOUND" prefix.
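
A quick pre-flight check on test CURIEs would catch this kind of mismatch before querying BTE. A minimal sketch, assuming the set of prefixes below (taken from the operations discussed in this thread); check_curie is a hypothetical helper, not part of BTE.

```python
KNOWN_PREFIXES = {"PUBCHEM.COMPOUND", "TTD.DRUG", "TTD.TARGET", "UniProtKB", "MONDO", "ICD11"}


def check_curie(curie, known_prefixes=KNOWN_PREFIXES):
    """Return (is_valid, suggestion) for a CURIE's prefix."""
    prefix, _, local_id = curie.partition(":")
    if prefix in known_prefixes:
        return True, curie
    # Treat '.' and '_' alike and ignore case to find a near-miss prefix.
    normalized = prefix.replace("_", ".").lower()
    for known in sorted(known_prefixes):
        if known.replace("_", ".").lower() == normalized:
            return False, f"{known}:{local_id}"
    return False, None


for curie in ("PUBCHEM_COMPOUND:12795894", "PUBCHEM.COMPOUND:12795894"):
    print(curie, check_curie(curie))
```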

I have two suggested "fixes"

1. Both ttd_drug_id_treats_icd11 and its rev operation refer to subject.pubchem, but that field doesn't exist, so those NOT _exists_ statements don't work right now. I imagine you meant the field name you fixed (subject.pubchem_compound)?
2. For _exists_:subject.name and _exists_:object.name in the requestBody…
- I suggest only having these when the operations use namespaces that Translator's SRI Node Norm might not have data for: TTD.DRUG, TTD.TARGET, ICD11.
- And it turns out, right now, only a few operations were ones that "needed" this:
  - pubchem_interacts_with_ttd_target_id and its rev operation: Helpful to have _exists_:object.name because there are some records that don't have ttd_target_id names that we want to avoid using
  - ttd_drug_id_interacts_with_ttd_target_id and its rev operation: helpful to have both _exists_. I see records that don't have names for one or both fields…

Plus minor things I noticed in your comments

Lines 1660-1661: labels for testExamples seem switched (Anakinra is the drug?)
Line 738: the end of the url is odd (probably remove %20##%20%20TODO:%201604%20records?)
Line 1476: the url has an error (the association:predicate part isn't set correctly to "biolink:interacts_with")
I think you have some comments that are "TODO". Did you want to complete these tasks and then remove the comments?

colleenXu · 2023-09-11T19:50:54Z

And a very minor thing I noticed: in the description, we may want to change TherapeuticTargetDatabase -> Therapeutic Target Database (TTD)

@lucyzhang95

credit to @lucyzhang95. see https://github.com/lucyzhang95/BioThings_TTD_Dataplugin/blob/master/smart_api/smartapi.yaml for original file

noted in https://github.com/biothings/pending.api/issues/123#issuecomment-1711251329

colleenXu · 2023-10-19T06:47:03Z

After discussion with Lucy, I've taken responsibility for this issue.

The TTD SmartAPI yaml was put into the translator-api-registry repo and adjusted by following my two earlier posts above.

Everything was tested locally and worked. Then I registered the API in SmartAPI Registry and made the PR to add this API to BTE...and I tested the PR locally and it worked as well.

To test for yourself

Send a POST request to the API-specific endpoint, which queries BioThings TTD only, like http://localhost:3000/v1/smartapi/e481efd21f8e8c1deac05662439c2294/query

Put this in the request body. It queries with the gene Human immunodeficiency virus Envelope messenger RNA (HIV env mRNA):

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["TTD.TARGET:T65414"],
                    "categories": ["biolink:Gene"]
                },
                "n1": {
                    "categories": ["biolink:SmallMolecule"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1"
                }
            }
        }
    }
}

You should get a response with this edge to TTD.DRUG:D0L3MP (VRX496):

                "3b4f17f07fc92cd5f0c3056aecda8aa0": {
                    "predicate": "biolink:interacts_with",
                    "subject": "TTD.TARGET:T65414",
                    "object": "TTD.DRUG:D0L3MP",
                    "attributes": [],
                    "sources": [
                        {
                            "resource_id": "infores:ttd",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:biothings-ttd",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:ttd"
                            ]
                        },
                        {
                            "resource_id": "infores:service-provider-trapi",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:biothings-ttd"
                            ]
                        }
                    ]
                }
            }
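
For anyone who would rather script this check than paste JSON by hand, here is a minimal Python sketch of the same test. It assumes a BTE instance is running locally on port 3000 (the URL is the one quoted above) and that the response follows the usual TRAPI layout, with edges under `message.knowledge_graph.edges`. The helper names `build_query`, `find_edges`, and `post_query` are made up for illustration and are not part of BTE.

```python
import json
import urllib.request

# URL copied from the post above; assumes a BTE instance running locally on port 3000.
BTE_TTD_URL = "http://localhost:3000/v1/smartapi/e481efd21f8e8c1deac05662439c2294/query"


def build_query(target_id="TTD.TARGET:T65414"):
    """Build the one-hop query from the post: Gene (n0) -> SmallMolecule (n1)."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": [target_id], "categories": ["biolink:Gene"]},
                    "n1": {"categories": ["biolink:SmallMolecule"]},
                },
                "edges": {"e01": {"subject": "n0", "object": "n1"}},
            }
        }
    }


def find_edges(response, subject, obj):
    """Return knowledge-graph edges that link `subject` to `obj`.

    Hypothetical helper; assumes edges live under message.knowledge_graph.edges.
    """
    edges = response.get("message", {}).get("knowledge_graph", {}).get("edges", {})
    return [
        edge
        for edge in edges.values()
        if edge.get("subject") == subject and edge.get("object") == obj
    ]


def post_query(url=BTE_TTD_URL, query=None, timeout=300):
    """POST the query as JSON and return the parsed response body."""
    body = json.dumps(query or build_query()).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Against a running instance, `find_edges(post_query(), "TTD.TARGET:T65414", "TTD.DRUG:D0L3MP")` should return the `biolink:interacts_with` edge shown above if the API is wired up as described. The edge key (the hash) is ignored on purpose, since only the subject and object matter for this check.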

colleenXu · 2023-10-24T20:53:03Z

Now being addressed by a different commit biothings/bte-server@58177d3. This is now deployed on dev/CI instances.

See Jackson's post here

colleenXu · 2023-12-06T06:40:47Z

Note:

There was an update to the x-bte annotations for "Update BioThings x-bte annotation to use filter" (biothings_explorer#726 (comment)), but some operations weren't tested because of a bug.
That bug has been fixed. I retested, and those new operations work as expected.

colleenXu · 2024-02-21T09:27:12Z

Closing this issue since the changes have been deployed to Prod with the Feb 2024 release.

I've confirmed that I can query BioThings TTD through BTE prod https://bte.transltr.io/v1/team/Service Provider/query with the example in #123 (comment) and get the expected response.

erikyao self-assigned this Jun 2, 2023

colleenXu assigned lucyzhang95 and unassigned erikyao Aug 3, 2023

colleenXu referenced this issue in NCATS-Tangerine/translator-api-registry Oct 11, 2023

biothings ttd: add yaml

3df5d66

credit to @lucyzhang95. see https://github.com/lucyzhang95/BioThings_TTD_Dataplugin/blob/master/smart_api/smartapi.yaml for original file

colleenXu referenced this issue in NCATS-Tangerine/translator-api-registry Oct 19, 2023

biothings ttd: fix field name subject.pubchem_compound

bedf9eb

colleenXu referenced this issue in NCATS-Tangerine/translator-api-registry Oct 19, 2023

biothings ttd: make adjustments and minor fixes

478c079

noted in https://github.com/biothings/pending.api/issues/123#issuecomment-1711251329

colleenXu mentioned this issue Oct 19, 2023

feat: add biothings ttd to api config biothings/biothings_explorer#746

Closed

colleenXu added the On CI, x-bte, and data source labels Nov 1, 2023

colleenXu added the On Test label and removed the On CI label Dec 22, 2023

colleenXu added the api deployment done label Feb 1, 2024

colleenXu closed this as completed Feb 21, 2024
