PDBBind Dataset #715

Closed
hkmztrk opened this issue Aug 2, 2017 · 10 comments

@hkmztrk

hkmztrk commented Aug 2, 2017

I couldn't find the exact dataset used in this study (https://arxiv.org/pdf/1703.10603.pdf).
Could you please help me with that?

Thanks!

@lilleswing
Member

@hkmztrk are you looking for the featurized datasets with the atomicconv featurizer? These scripts will get you the featurized core and refined sets.
https://github.com/deepchem/deepchem/blob/master/contrib/atomicconv/acnn/core/get_acnn_core.sh
https://github.com/deepchem/deepchem/blob/master/contrib/atomicconv/acnn/refined/get_acnn_refined.sh

If you are looking for the raw dataset (protein/ligand co-crystallized pairs) you can use this script.
https://github.com/deepchem/deepchem/blob/master/examples/pdbbind/get_pdbbind.sh

@hkmztrk
Author

hkmztrk commented Aug 2, 2017

@lilleswing I was looking for the raw dataset, thank you! It contains the binding affinities of the pairs, right?

@hkmztrk hkmztrk closed this as completed Aug 2, 2017
@hkmztrk
Author

hkmztrk commented Aug 20, 2017

Hello again,

Sorry if this is a trivial question!
I downloaded the raw dataset and noticed that (in the general set, for instance) there are three types of binding affinity measurement (Ki, Kd, IC50). I remember reading that Ki is what gets predicted, so how do you deal with the other types? Do you convert them to Ki?
In the PL general set there are 11,987 complexes in total, but only 3,650 of them have a Ki value.

Thanks in advance.
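
For readers who want to reproduce a per-type count like the one above, here is a minimal sketch. It assumes each data row of the index file is whitespace-separated with the affinity field (for example `Kd=0.63uM`) in the fifth column, as in the rows quoted later in this thread, and that header lines start with `#`; the helper name `count_measurement_types` is hypothetical, and the three sample rows are copied from this thread rather than read from the real file.

```python
import re
from collections import Counter

# Matches the measurement type at the start of the affinity field.
MEASUREMENT_RE = re.compile(r"^(Kd|Ki|IC50)")

def count_measurement_types(lines):
    """Tally Kd / Ki / IC50 entries (hypothetical helper, assumed row layout)."""
    counts = Counter()
    for line in lines:
        # Assumption: blank lines and '#' header lines carry no data.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()
        if len(cols) < 5:
            continue
        match = MEASUREMENT_RE.match(cols[4])
        if match:
            counts[match.group(1)] += 1
    return counts

# Sample rows quoted elsewhere in this thread.
sample_rows = [
    "3p9t 2.02 2011 6.20 Kd=0.63uM 3p9t.pdf (TCL)",
    "4ad2 2.10 2012 6.20 Kd=625nM 4acz.pdf (GLC-IFM)",
    "4qsw 1.80 2014 1.68 Kd=21mM 4qsw.pdf (38T)",
]

print(count_measurement_types(sample_rows))
```

To run it on the downloaded index file, pass the open file handle in place of `sample_rows`.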

@hkmztrk hkmztrk reopened this Aug 20, 2017
@evanfeinberg
Collaborator

evanfeinberg commented Aug 20, 2017 via email

@hkmztrk
Author

hkmztrk commented Aug 21, 2017

Thanks for the clarification, @evanfeinberg. About the units: they also seem to differ (nM, uM, mM, etc.). Which unit did you use? Does it matter which unit we convert them all to (e.g. all of them to nM)?

@rbharath
Member

All training is done in log-units (log-Kd/log-Ki/log-IC50). These are directly provided in the labels files for PDBBind.

@hkmztrk
Author

hkmztrk commented Aug 22, 2017

@rbharath what I mean is the molar concentration unit. For instance, all three of these are Kd samples, but they are reported in different units (uM, nM, mM). Do you perform a conversion among those?

3p9t 2.02 2011 6.20 Kd=0.63uM 3p9t.pdf (TCL)
4ad2 2.10 2012 6.20 Kd=625nM 4acz.pdf (GLC-IFM)
4qsw 1.80 2014 1.68 Kd=21mM 4qsw.pdf (38T)
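
A quick check on the three rows above suggests the fourth column is already the negative base-10 log of the affinity expressed in mol/L, so the differing units appear to be normalized in that column. The sketch below is only an illustration under that assumption: the helper `neg_log_molar` and the unit table are hypothetical, and the regex handles only the `Type=valueUnit` form seen in these rows, not other forms a full index file may contain.

```python
import math
import re

# Multipliers that convert each unit suffix to mol/L.
UNIT_TO_MOLAR = {
    "M": 1.0, "mM": 1e-3, "uM": 1e-6,
    "nM": 1e-9, "pM": 1e-12, "fM": 1e-15,
}

# Only the "Kd=0.63uM" style is handled (assumption).
AFFINITY_RE = re.compile(r"^(Kd|Ki|IC50)=([0-9.]+)([munpf]?M)$")

def neg_log_molar(field):
    """Return (measurement type, -log10 of the value in mol/L)."""
    match = AFFINITY_RE.match(field)
    if match is None:
        raise ValueError(f"unrecognized affinity field: {field!r}")
    kind, value, unit = match.groups()
    molar = float(value) * UNIT_TO_MOLAR[unit]
    return kind, -math.log10(molar)

rows = [
    "3p9t 2.02 2011 6.20 Kd=0.63uM 3p9t.pdf (TCL)",
    "4ad2 2.10 2012 6.20 Kd=625nM 4acz.pdf (GLC-IFM)",
    "4qsw 1.80 2014 1.68 Kd=21mM 4qsw.pdf (38T)",
]

for row in rows:
    cols = row.split()
    pdb_id, reported = cols[0], float(cols[3])
    kind, computed = neg_log_molar(cols[4])
    # Print the reported column next to the recomputed value.
    print(pdb_id, kind, reported, round(computed, 2))
```

For these three rows the recomputed values agree with the fourth column to two decimal places, which is consistent with the earlier remark that the log-unit labels are provided directly with the dataset.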

@evanfeinberg
Collaborator

evanfeinberg commented Aug 22, 2017 via email

@hkmztrk
Author

hkmztrk commented Aug 22, 2017

Thank you, @evanfeinberg!

@hkmztrk hkmztrk closed this as completed Aug 22, 2017
@evanfeinberg
Collaborator

evanfeinberg commented Aug 22, 2017 via email
