InChIKey14s can contain duplicate MOA/Target Info #17

Closed
gwaybio opened this issue Apr 22, 2020 · 5 comments

Comments

@gwaybio
Member

gwaybio commented Apr 22, 2020

In #12 we used InChIKey14 to map broad_ids and in #11 we discussed why this is important.

While processing some data, I noticed that InChIKey14s do not map uniquely to MOAs and targets. I guess this is not surprising given that drugs are often used for different indications in various clinical phases, but it is worth documenting here! It is dangerous to use InChIKey14s to map directly to MOA/targets.

For example, InChIKey14 KTEIFNKAUNYNJU maps to two MOA/Targets. However, it looks like the full InChIKey does map uniquely. I didn't comprehensively explore this.

image
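A minimal sketch of how such a check could be run, using only the Python standard library. The field names (`inchikey`, `inchikey14`, `moa`, `target`) and the helper `find_ambiguous_keys` are hypothetical, and the InChIKey suffixes in the toy rows are placeholders rather than real identifiers; only the InChIKey14 and the MOA/target values come from this issue.

```python
from collections import defaultdict


def find_ambiguous_keys(rows, key_field, annotation_fields=("moa", "target")):
    """Map each key that has more than one distinct annotation tuple to those tuples."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key_field]].add(tuple(row[field] for field in annotation_fields))
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}


# Toy rows. The blocks after the first hyphen are placeholders, not real InChIKey blocks.
rows = [
    {"inchikey": "KTEIFNKAUNYNJU-AAAAAAAAAA-N",
     "moa": "ALK tyrosine kinase receptor inhibitor", "target": "ALK,MET"},
    {"inchikey": "KTEIFNKAUNYNJU-BBBBBBBBBB-N",
     "moa": "MTH1 inhibitor", "target": "NUDT1"},
]
for row in rows:
    # Assumption: InChIKey14 is the first 14 characters of the full key.
    row["inchikey14"] = row["inchikey"][:14]

ambiguous_14 = find_ambiguous_keys(rows, "inchikey14")
ambiguous_full = find_ambiguous_keys(rows, "inchikey")
print(ambiguous_14)    # keys whose InChIKey14 carries more than one MOA/target pair
print(ambiguous_full)  # same check against the full InChIKey
```

Running the same helper against both key fields gives a quick way to see whether the ambiguity disappears when the full key is used, which is the observation made above for this one compound.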

@niranjchandrasekaran - maybe I missed this, but was there a reason to use InChIKey14 instead of the full InChIKey?

@niranjchandrasekaran
Member

@gwaygenomics I used InChIKey14 instead of the full InChIKey because the latter has the same problem as broad_id: both account for a compound's stereochemistry. If a compound's stereochemistry differs between repurposing hub versions, we wouldn't be able to map it across versions. In your example above (#17 (comment)), the first four rows represent one isomer while the last three represent another. The broad_id and InChIKey are shared within the first four rows and within the last three, while InChIKey14 is the same across all seven.

As we briefly discussed in #11 (comment), ignoring stereochemistry may not be ideal. If stereoisomers have significantly different MOA annotations, perhaps using InChIKey14 as the common field for mapping across repurposing hub versions is inadequate.
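To make the trade-off concrete, here is a small sketch of deriving the stereo-agnostic key. It relies on the standard InChIKey layout, in which the first hyphen-separated block is 14 uppercase letters hashing the connectivity and the later blocks cover the remaining layers, including stereochemistry. The function name `inchikey14` and the placeholder suffixes are assumptions for illustration, not code from this repository.

```python
def inchikey14(inchikey: str) -> str:
    """Return the first (connectivity) block of an InChIKey."""
    first_block = inchikey.strip().split("-")[0]
    # The first block of a well-formed InChIKey is exactly 14 uppercase letters.
    if len(first_block) != 14 or not (first_block.isalpha() and first_block.isupper()):
        raise ValueError(f"not a recognizable InChIKey: {inchikey!r}")
    return first_block


# Two hypothetical stereoisomer keys; the second blocks are placeholders.
isomer_a = "KTEIFNKAUNYNJU-AAAAAAAAAA-N"
isomer_b = "KTEIFNKAUNYNJU-BBBBBBBBBB-N"

# The full keys differ, but both collapse to the same InChIKey14.
same_connectivity = inchikey14(isomer_a) == inchikey14(isomer_b)
print(isomer_a != isomer_b, same_connectivity)
```

This is why the shortened key survives stereochemistry changes between hub versions, and also why it cannot tell apart isomers that carry different annotations.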

@gwaybio
Member Author

gwaybio commented Apr 23, 2020

We discussed this issue in the profiling check-in; the full summary is in #11 (comment).

The pertinent info for this issue is:

To solve the stereoisomer issue, we will create an alternate_moa and an alternate_target column for the cases where the same InChIKey14 maps to two different MOA/targets because of different stereochemistry.

@gwaybio
Member Author

gwaybio commented Apr 23, 2020

Concretely, the profiles for the compound above would look like this:

| Metadata_broad_id | Metadata_moa | Metadata_target | Metadata_alternative_moa | Metadata_alternative_target |
| --- | --- | --- | --- | --- |
| BRD-K78431006 (or whichever 2016 Broad ID matches InChIKey14 KTEIFNKAUNYNJU) | ALK tyrosine kinase receptor inhibitor | ALK,MET | MTH1 inhibitor | NUDT1 |

We will also have to make some manual ordering decisions (i.e. which one is primary and alternative moa).
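One way the collapse into primary and alternative columns could be sketched, using only the standard library. The helper `collapse_annotations`, the input field names, and the `primary_moa_by_key` argument are all hypothetical; the argument stands in for the manual ordering decision, and when no choice is supplied the sketch simply keeps the first annotation it sees as primary.

```python
def collapse_annotations(rows, primary_moa_by_key=None):
    """Collapse rows sharing an InChIKey14 into one row with alternative columns."""
    primary_moa_by_key = primary_moa_by_key or {}
    grouped = {}
    for row in rows:
        pair = (row["moa"], row["target"])
        pairs = grouped.setdefault(row["inchikey14"], [])
        if pair not in pairs:  # keep distinct pairs in first-seen order
            pairs.append(pair)

    collapsed = []
    for key, pairs in grouped.items():
        if len(pairs) > 2:
            # Only one alternative column pair is planned; flag anything beyond that.
            raise ValueError(f"{key} has {len(pairs)} distinct annotations")
        chosen = primary_moa_by_key.get(key)
        # Stable sort: the manually chosen MOA (if any) moves to the front.
        ordered = sorted(pairs, key=lambda pair: pair[0] != chosen)
        alt_moa, alt_target = ordered[1] if len(ordered) > 1 else ("", "")
        collapsed.append({
            "Metadata_inchikey14": key,
            "Metadata_moa": ordered[0][0],
            "Metadata_target": ordered[0][1],
            "Metadata_alternative_moa": alt_moa,
            "Metadata_alternative_target": alt_target,
        })
    return collapsed


rows = [
    {"inchikey14": "KTEIFNKAUNYNJU",
     "moa": "ALK tyrosine kinase receptor inhibitor", "target": "ALK,MET"},
    {"inchikey14": "KTEIFNKAUNYNJU", "moa": "MTH1 inhibitor", "target": "NUDT1"},
]
collapsed = collapse_annotations(
    rows,
    primary_moa_by_key={"KTEIFNKAUNYNJU": "ALK tyrosine kinase receptor inhibitor"},
)
print(collapsed[0])
```

Raising on more than two distinct annotations is a design choice of this sketch: it surfaces cases the two-column scheme cannot represent instead of silently dropping one.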

@niranjchandrasekaran
Member

@gwaygenomics I believe the markdown renderer mistook the pipe between ALK and MET to indicate column separation in the markdown table. Just wanted to bring that to your attention.

@gwaybio
Member Author

gwaybio commented Apr 23, 2020

thanks - updated

@gwaybio gwaybio closed this as completed Apr 23, 2020