API Therapeutic Target Database (TTD) Deployment #123
Comments
See also #57 |
Note that once this is deployed, we'll need some issue tracking the writing / deployment / hooking-up of the SmartAPI yaml w/ x-bte annotation |
Also not sure how to handle the github issue assignment stuff here. For now, moving the pending repo issue to Yao's section of the project manager for Translator... |
Will deploy tomorrow @colleenXu |
1. Problem with
|
Fixed problems 2, 3, and 4.
Updated Git Branch/Commit: master bd9d85c
The current parser outputs:
Problem 2.
|
Err...throwing some ideas out here (also CC @andrewsu):
|
Hi @lucyzhang95,
I think Colleen meant that the uniprot field contains only labels instead of IDs. Typically we expect IDs. E.g. We can call some other API to get uniprot IDs from labels, if needed.
It's common practice for us to have something like: {
"id": "uniprot:Q13564",
"uniprot": "Q13564"
} where However I have no idea if we have a CURIE standard for TTD IDs like |
@colleenXu @lucyzhang95 I found that we do have CURIE for TTD, see https://bioregistry.io/registry/ttd.target |
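A minimal sketch of the ID convention discussed above: a CURIE is just a namespace prefix joined to a local ID with a colon. The helper names `to_curie` and `make_target_doc` are hypothetical and not part of the TTD parser; the `uniprot` prefix and the accession `Q13564` come from the example in the thread.

```python
def to_curie(prefix: str, local_id: str) -> str:
    """Join an ID-namespace prefix and a local identifier into a CURIE."""
    local_id = local_id.strip()
    # Reject empty IDs and IDs with embedded whitespace, which make bad _ids.
    if not local_id or any(ch.isspace() for ch in local_id):
        raise ValueError(f"not a usable local ID: {local_id!r}")
    return f"{prefix}:{local_id}"


def make_target_doc(uniprot_acc: str) -> dict:
    """Build the {"id": ..., "uniprot": ...} shape shown in the thread."""
    return {"id": to_curie("uniprot", uniprot_acc), "uniprot": uniprot_acc}


print(make_target_doc("Q13564"))
```

The same helper could be called with the `ttd.target` prefix from the Bioregistry entry once the local TTD target IDs are known.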
@erikyao I also mapped their internal drug id to either pubchem_cid or chembi_id, which are also included in the _id. Besides, I double-checked the _id for weird formatting to see if there are whitespaces, slashes, and backslashes. The current _ids are free of all of those. Please let me know if you still find other weird-formatted _ids or fields! Thanks again for helping me out! I really appreciate it! Updated info:
|
Sorry for not responding >.<.
Part 1: Biolink-model doesn't seem to include any ttd ID-namespaces (there's a target one and a drug one?) or ICD11. So Translator Node Norm likely doesn't either. BTE uses Node Norm to find equivalent IDs and human-readable labels, aka what IDs are actually the same "node"/entity. EDIT: I know an effort was made for ttd.target -> uniprot IDs and for ttd.drug -> pubchem_cid and chembl_id. How many records have unmapped entities (only ID for subject/object is ttd.target or ttd.drug)? Has mapping been tried from icd11 IDs to a Disease ID-namespace in biolink-model, like MONDO or DOID?
Part 2: Could you make a table / list of the MetaTriples in this KP: unique combos of subject ID-prefix / subject-type / predicate / object ID-prefix / object-type? This is needed for the x-bte annotation |
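One way to produce the MetaTriple list requested above is to reduce each record to its five-part signature and collect the unique ones. This is only a sketch: the record layout (`subject`/`object` dicts with `id` and `type`, plus a top-level `predicate`) is an assumption rather than the actual TTD record structure, and the sample IDs are invented.

```python
from typing import Iterable, NamedTuple, Set


class MetaTriple(NamedTuple):
    subject_prefix: str
    subject_type: str
    predicate: str
    object_prefix: str
    object_type: str


def prefix_of(curie: str) -> str:
    """Return the ID-namespace part of a CURIE (text before the first colon)."""
    return curie.split(":", 1)[0]


def meta_triples(records: Iterable[dict]) -> Set[MetaTriple]:
    """Collect the unique prefix/type/predicate combinations across records."""
    return {
        MetaTriple(
            prefix_of(r["subject"]["id"]),
            r["subject"]["type"],
            r["predicate"],
            prefix_of(r["object"]["id"]),
            r["object"]["type"],
        )
        for r in records
    }


# Invented sample records, for illustration only.
sample = [
    {"subject": {"id": "PUBCHEM.COMPOUND:1", "type": "SmallMolecule"},
     "predicate": "treats",
     "object": {"id": "MONDO:1", "type": "Disease"}},
    {"subject": {"id": "PUBCHEM.COMPOUND:2", "type": "SmallMolecule"},
     "predicate": "treats",
     "object": {"id": "MONDO:2", "type": "Disease"}},
]
for triple in sorted(meta_triples(sample)):
    print(" | ".join(triple))
```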
Note: updated my comment after noticing Lucy has done ttd.drug mapping... |
@colleenXu Part 1:
What would you suggest I do in this case? Part 2: Table for unique entities:
I would love to learn how to do x-bte annotations and bte registry from you when you have time! Is it a bad time to have a quick chat with you this week? |
Overview
@lucyzhang95 and I have discussed the records / associations in this API and laid out what we need to know to write x-bte annotation (types of things, ID-prefixes and categories, relationships and predicates). The next steps are:
Some notes:
Types of Things (entities) in this resource
There is more work that can be done to map IDs or add fields describing the relationships...
Drug (chemicals) -> SmallMolecule
Target -> Protein (Gene)
Target - compound activity: Could use the paper's definitions of "what is the relationship, how strong is the relationship" to "map" the IC50/Ki/EC50 values to relationships? Because the paper defines these, maybe we'll have more success than we did with BindingDB here...
Disease
ICD11: we could do a mapping effort to get to ICD9 (partial support in Translator) or MONDO (has full support in Translator) -> EDIT: BASICALLY DONE, SEE BELOW
Biomarker
not going to write x-bte annotation for biomarker - disease relationships
What relationships (combos of subject-predicate-object) are in this resource
Every row will become 2 x-bte operations (1 set: querying data from subject -> object and data from object -> subject)
drug - disease relationships
target - disease relationships
drug - target relationships
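As a rough illustration of "every row becomes 2 operations", the sketch below expands one relationship row into a forward and a reverse entry. The dict keys and the function name `both_directions` are hypothetical; the reverse predicate is passed in by the caller because it depends on the relationship (for example, `treated_by` is the reverse of `treats` in this thread).

```python
from typing import Dict, List


def both_directions(row: Dict[str, str], reverse_predicate: str) -> List[Dict[str, str]]:
    """Return the forward row plus its subject/object-swapped counterpart."""
    forward = dict(row)
    reverse = {
        "subject_prefix": row["object_prefix"],
        "subject_type": row["object_type"],
        "predicate": reverse_predicate,
        "object_prefix": row["subject_prefix"],
        "object_type": row["subject_type"],
    }
    return [forward, reverse]


row = {
    "subject_prefix": "PUBCHEM.COMPOUND",
    "subject_type": "SmallMolecule",
    "predicate": "treats",
    "object_prefix": "MONDO",
    "object_type": "Disease",
}
for op in both_directions(row, "treated_by"):
    print(op)
```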
|
Here are the example operations.
In the /query POST section:
in the components section, the operations and response-mapping
And after writing this example, I have a bunch of advice / commentary >.<. The first two collapsed sections are the important ones.
My advice on writing operations
Explaining the comments on missing fields
In my examples, you'll see comments saying that some record fields aren't included in the parameters.fields and response-mapping. Because of TRAPI / biolink-model validation issues, we are only keeping some fields in the response-mapping (like keywords BTE fully transforms into TRAPI like output ID-namespace, However, I think it's still useful to know what useful record fields could be retrieved in each set of operations. So I suggest that you write similar comments. Here are the fields I identified (using the /metadata/fields endpoint response) that seem useful:
I think it'd also be useful to list the missing fields in a comment block at the top of the operations text, with possible values (for fields with limited number of possible values) or example values (for fields that are basically free text). I included this in my example operations as well.
Observations (only for later reference)
|
Thank you so much again for the examples, detailed explanations, and comments! They are super helpful! I tried to write the rest of the x-bte-kgs-annotations, operations, and mappings using your example as a reference. I have some questions regarding the PS:
The full smartapi.yaml file can be found here: https://github.com/lucyzhang95/BioThings_TTD_Dataplugin/blob/master/smart_api/smartapi.yaml
In the /query POST section:
Comments: I commented out all the $ref with internal ttd ids, such as ttd_target_id, ttd_drug_id, and ttd_biomarker_id.
Component section: operations
chebi_treats_mondo:
Comments and Questions:
pubchem_treats_mondo:
Comments and Questions:
uniprotkb_target_for_mondo:
Comments and Questions:
chebi_interacts_with_uniprotkb:
Comments and Questions:
x-bte-response-mapping:
Comments and Questions:
Sorry about the extremely long post! We can definitely have another Slack huddle meeting if you have time! |
Info from the convo @lucyzhang95 and I had Friday (7/28) afternoon (sorry for the belated posting >.<): Reminders:
Updated: relationships +
|
Subject-id | Subject-category | predicate | Object-id | Object-category |
---|---|---|---|---|
PUBCHEM.COMPOUND | SmallMolecule | treats | ICD11 | Disease |
PUBCHEM.COMPOUND | SmallMolecule | treats | MONDO | Disease |
TTD.DRUG | SmallMolecule | treats | ICD11 | Disease |
TTD.DRUG | SmallMolecule | treats | MONDO | Disease |
Reverse predicate is "treated_by"
We're not using CHEBI IDs because it looks like every subject with a CHEBI field also has a pubchem field (so this query (which only works right now before updates) gets no hits)
Pubchem and mondo are the preferred IDs. ttd.drug and icd11 are the backup / default, to use only when pubchem and mondo are unavailable:
- pubchem + mondo: keep original format with filled-out scopes
- pubchem + icd11: use empty-scopes format with NOT exists:object.mondo
- ttd.drug + mondo: use empty-scopes format with NOT exists:subject.pubchem_compound (future field name)
- ttd.drug + icd11: use empty-scopes format with NOT exists:subject.pubchem_compound (future field name) AND NOT exists:object.mondo
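The preferred-then-backup rule above can be sketched as a small lookup that returns the first ID field a record actually has. This is an illustration, not the plugin's code: it assumes fields hold bare local IDs, and the field names `ttd_drug_id`, `mondo`, and `icd11` are guesses based on names mentioned in this thread (`pubchem_compound` is the future field name noted above). The sample values other than `D0L3MP` are invented.

```python
from typing import List, Optional, Tuple

# (record field, CURIE prefix) pairs, ordered from preferred to backup.
DRUG_IDS: List[Tuple[str, str]] = [
    ("pubchem_compound", "PUBCHEM.COMPOUND"),
    ("ttd_drug_id", "TTD.DRUG"),
]
DISEASE_IDS: List[Tuple[str, str]] = [
    ("mondo", "MONDO"),
    ("icd11", "ICD11"),
]


def pick_id(node: dict, candidates: List[Tuple[str, str]]) -> Optional[str]:
    """Return a CURIE for the first candidate field that has a value."""
    for field, prefix in candidates:
        value = node.get(field)
        if value:
            return f"{prefix}:{value}"
    return None


# Only the backup drug ID is present, so TTD.DRUG is used.
print(pick_id({"ttd_drug_id": "D0L3MP"}, DRUG_IDS))
# Both disease IDs are present, so the preferred MONDO ID wins.
print(pick_id({"mondo": "0000001", "icd11": "XX00"}, DISEASE_IDS))
```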
target (gene) - disease relationships
Subject-id | Subject-category | predicate | Object-id | Object-category |
---|---|---|---|---|
UniProtKB | Gene | target_for | ICD11 | Disease |
UniProtKB | Gene | target_for | MONDO | Disease |
TTD.TARGET | Gene | target_for | ICD11 | Disease |
TTD.TARGET | Gene | target_for | MONDO | Disease |
Reverse predicate is "has_target"
UniProtKB and MONDO are the preferred IDs. ttd.target and icd11 are the backup / default, to use only when UniProtKB and mondo are unavailable:
- UniProtKB + mondo: keep format with filled-out scopes
- UniProtKB + icd11: use empty-scopes format with NOT exists:object.mondo
- ttd.target + mondo: use empty-scopes format with NOT exists:subject.uniprotkb
- ttd.target + icd11: use empty-scopes format with NOT exists:subject.uniprotkb AND NOT exists:object.mondo
drug - target (gene) relationships
We're not using CHEBI IDs because it looks like every subject with a CHEBI field also has a pubchem field (so this query (which only works right now before updates) gets no hits)
Subject-id | Subject-category | predicate | Object-id | Object-category |
---|---|---|---|---|
PUBCHEM.COMPOUND | SmallMolecule | interacts_with | UniProtKB | Gene |
TTD.DRUG | SmallMolecule | interacts_with | UniProtKB | Gene |
PUBCHEM.COMPOUND | SmallMolecule | interacts_with | TTD.TARGET | Gene |
TTD.DRUG | SmallMolecule | interacts_with | TTD.TARGET | Gene |
pubchem and UniProtKB are the preferred IDs. ttd.drug and ttd.target are the backup / default, to use only when pubchem and UniProtKB are unavailable:
- pubchem + UniProtKB: keep format with filled-out scopes
- pubchem + ttd.target: use empty-scopes format with NOT exists:object.uniprotkb
- ttd.drug + UniProtKB: use empty-scopes format with NOT exists:subject.pubchem_compound
- ttd.drug + ttd.target: use empty-scopes format with NOT exists:subject.pubchem_compound AND NOT exists:object.uniprotkb
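The four-way pattern repeated in each list above (preferred or backup ID on each side, with a "NOT exists" filter whenever a backup is used) can be enumerated mechanically. The sketch below only mirrors the shorthand used in this thread; the dict keys and `operation_variants` are hypothetical, and the real filter syntax in the SmartAPI yaml may differ.

```python
from itertools import product
from typing import Dict, List


def operation_variants(subject: Dict[str, str], obj: Dict[str, str]) -> List[Dict[str, str]]:
    """Enumerate preferred/backup ID combinations and the filter each needs."""
    variants = []
    for s_kind, o_kind in product(("preferred", "backup"), repeat=2):
        filters = []
        # A backup ID is used only when the preferred field is absent.
        if s_kind == "backup":
            filters.append(f"NOT exists:{subject['preferred_field']}")
        if o_kind == "backup":
            filters.append(f"NOT exists:{obj['preferred_field']}")
        variants.append({
            "subject_prefix": subject[s_kind],
            "object_prefix": obj[o_kind],
            "format": "empty-scopes" if filters else "filled-out scopes",
            "filter": " AND ".join(filters),
        })
    return variants


drug = {"preferred": "PUBCHEM.COMPOUND", "backup": "TTD.DRUG",
        "preferred_field": "subject.pubchem_compound"}
target = {"preferred": "UniProtKB", "backup": "TTD.TARGET",
          "preferred_field": "object.uniprotkb"}

for v in operation_variants(drug, target):
    print(v["subject_prefix"], "+", v["object_prefix"], "->", v["format"], v["filter"])
```

Passing a drug/disease or target/disease pair of dicts instead would reproduce the other two lists.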
And including this, from my review of earlier discussions... These can also be left for later (not needed to get this SmartAPI / x-bte annotation written, registered, and used by BTE)... Notes from earlier post, edited (not addressed during Friday meeting)
|
Thank you so much for being super helpful! I really appreciate you spending the time to have multiple meetings with me! I have finished updating the smartapi.yaml for ttd. The newest version of ttd has also been deployed on Translator APIs! While testing the post query locally, I found one issue with the drug-target relationships. I used Postman for the testing since you taught me how to use it last time! The testing result showed results from bte:biothings-explorer-trapi
The result is extremely long! I'm sorry about that!
I suspect this might be because smartapi.yaml was not written properly. In the
full codes
Specifically, do you mind checking the Then for the
I can slack you tomorrow as well! I feel bad for bothering you during off-work hours, so I am posting the issue here for now! Thank you! |
I know you are pretty busy resolving the translator issues with the code freeze! So, no rush to get to this issue! The body part seems to have no issue as I can retrieve the result directly from TTD translator API with The last several log messages are as follows:
The query stopped after |
No problem at all; in fact, I'm sorry for being so late in my reply >.<. I think you've done great work, and we're super close to the finish line!
On the issue you identified
I think the issue is your queries, specifically the incorrect prefix for the pubchem compound IDs. It looks like you're using I tested the operations that you may have been reviewing ( I have two suggested "fixes"
Plus minor things I noticed in your comments
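Since the problem above came down to an ID prefix that the operations did not expect, a quick pre-flight check on query IDs can catch it early. This is a hedged sketch: the expected prefixes are the ones listed in the relationship tables in this thread, `unexpected_prefixes` is a hypothetical helper, and the lowercase `pubchem.compound:123` is an invented example of a mismatch, not necessarily the prefix used in the failing query.

```python
from typing import Iterable, List

# Prefixes taken from the relationship tables in this thread (case-sensitive).
EXPECTED_PREFIXES = {
    "PUBCHEM.COMPOUND", "UniProtKB", "MONDO", "ICD11", "TTD.DRUG", "TTD.TARGET",
}


def unexpected_prefixes(curies: Iterable[str]) -> List[str]:
    """Return the CURIEs whose prefix is missing or not in the expected set."""
    bad = []
    for curie in curies:
        prefix, sep, local_id = curie.partition(":")
        if not sep or not local_id or prefix not in EXPECTED_PREFIXES:
            bad.append(curie)
    return bad


print(unexpected_prefixes(["TTD.DRUG:D0L3MP", "pubchem.compound:123"]))
```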
|
And a very minor thing I noticed: in the description, we may want to change |
Credit to @lucyzhang95. See https://github.com/lucyzhang95/BioThings_TTD_Dataplugin/blob/master/smart_api/smartapi.yaml for the original file.
After discussion with Lucy, I've taken responsibility for this issue. The TTD SmartAPI yaml was put into the translator-api-registry repo and adjusted by following my two earlier posts above. Everything was tested locally and worked. Then I registered the API in SmartAPI Registry and made the PR to add this API to BTE...and I tested the PR locally and it worked as well.
To test for yourself
send a POST request to the api-specific endpoint, BioThings TTD only. Like Put this in the request body: It's querying with the gene
You should get a response with this edge to TTD.DRUG:D0L3MP (VRX496):
|
Now being addressed by a different commit biothings/bte-server@58177d3. This is now deployed on dev/CI instances. See Jackson's post here |
Note:
|
Closing this issue since the changes have been deployed to Prod with the Feb 2024 release. I've confirmed that I can query BioThings TTD through BTE prod |