Protein/Ligand Preparation for Docking #368

Closed
rbharath opened this issue Jan 21, 2017 · 8 comments

Comments

@rbharath
Member

@bowenliu16 raised a number of interesting questions about protein preparation on the gitter which don't have great solutions within deepchem yet. I'm starting this thread to copy over the questions and discuss potential solutions:

  1. What kind of protein preparation is done? Do we need to manually remove crystallized water and other ions? What about missing protein residues/loops?
    • dc.feat.hydrogenate_and_compute_partial_charges is the current extent of protein cleanup. I don't believe we handle water/ion removal, and we don't handle missing protein residues/loops either. https://github.com/pandegroup/pdbfixer by @peastman handles many of these issues, so the right answer might be to use PDBFixer here.
  2. How many ligands can be in the .sdf file to be docked at the same time?
    • I think we only support one ligand per .sdf file right now. We should probably generalize to support multiple ligands.
  3. And what kind of ligand processing do we need to do? E.g., is any 3D conformation fine? We also have to manually specify the protonation states and any tautomers, right?
    • dc.feat.hydrogenate_and_compute_partial_charges is the only ligand processing right now. It partially handles protonation, but I'd wager it doesn't do an excellent job. We don't handle tautomers yet.
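On question 2, here is a minimal sketch of what reading several ligands from one .sdf file involves. It relies only on the SDF convention that each record ends with a line containing `$$$$`. It uses the standard library so it runs without RDKit; the function names and the toy file contents are made up for illustration and are not DeepChem code.

```python
from typing import Iterable, Iterator, List


def iter_sdf_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield each molecule record of an SDF file as a single string.

    An SDF file stores one record per molecule, and every record ends
    with a line that contains only "$$$$".
    """
    record: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip() == "$$$$":
            # Record terminator: emit what we collected, skip empty records.
            if any(part.strip() for part in record):
                yield "\n".join(record)
            record = []
        else:
            record.append(line)
    # Tolerate a last record that is missing its "$$$$" terminator.
    if any(part.strip() for part in record):
        yield "\n".join(record)


def record_title(record: str) -> str:
    """Return the molecule name (the first line of the record)."""
    return record.split("\n", 1)[0].strip()


# Toy file contents with two made-up ligands and empty atom/bond blocks.
EXAMPLE_SDF = """ligand_A
  toy

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
$$$$
ligand_B
  toy

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
$$$$
"""

records = list(iter_sdf_records(EXAMPLE_SDF.splitlines(keepends=True)))
print([record_title(r) for r in records])
```

In practice an RDKit supplier already iterates over every molecule in the file, so the real change is probably just to stop assuming the iterator yields exactly one ligand.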

I'd love to hear suggestions on good ways to handle pdb-cleanup, protonation, and tautomers with existing open source tools :-)

CC @peastman, @joegomes, @evanfeinberg

@joegomes
Member

These are good questions to consider and would make dc.dock more readily usable.

  1. No water/ion removal. I've used pdbfixer before and it works well for these tasks; it would be nice to incorporate it into file loading as an option. Some PDBs in pdbbind contain waters near the binding pocket that aren't removed before model training (perhaps another point of discussion), so perhaps we don't want to remove all waters. It will also be important to distinguish between solution ions and (possible) cations in the binding pocket.

  2. Currently .sdf files are read by the rdkit.Chem.SDMolSupplier iterator, but we assume one ligand per file. It would be easy to generalize the loading to multiple molecules.

  3. I think for now any conformation is fine (Vina does conformation optimization), and we leave it to the user to enumerate protonation states and tautomers. From some quick googling, it looks like MolVS, a Python package built on RDKit, can enumerate tautomers given an RDKit mol object.
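To illustrate point 1, here is a rough sketch of selective water removal: keep waters within a cutoff of the ligand and drop the rest. It parses the fixed-width PDB coordinate columns directly with the standard library and is not PDBFixer's API. The 5 Å cutoff, the `LIG` residue name, and the `hetatm` helper are assumptions for the toy example.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def atom_coords(line: str) -> Coord:
    """Read x, y, z from the fixed-width columns of an ATOM/HETATM record."""
    return float(line[30:38]), float(line[38:46]), float(line[46:54])


def drop_distant_residues(pdb_lines: Iterable[str], ligand_resname: str,
                          drop_resnames: Sequence[str] = ("HOH",),
                          cutoff: float = 5.0) -> List[str]:
    """Remove records of `drop_resnames` farther than `cutoff` (angstroms)
    from every ligand atom; all other lines pass through unchanged."""
    lines = [line.rstrip("\r\n") for line in pdb_lines]
    ligand = [atom_coords(l) for l in lines
              if l.startswith("HETATM") and l[17:20].strip() == ligand_resname]
    kept = []
    for l in lines:
        is_atom = l.startswith(("ATOM", "HETATM"))
        if is_atom and l[17:20].strip() in drop_resnames:
            near = any(math.dist(atom_coords(l), a) <= cutoff for a in ligand)
            if not near:
                continue  # far from the ligand: treat as bulk solvent
        kept.append(l)
    return kept


def hetatm(serial: int, name: str, resname: str, resseq: int,
           x: float, y: float, z: float) -> str:
    """Hypothetical helper: format a toy HETATM record in PDB columns."""
    return (f"HETATM{serial:5d} {name:<4s} {resname:>3s} A{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


toy = [
    hetatm(1, "C1", "LIG", 1, 0.0, 0.0, 0.0),   # ligand atom
    hetatm(2, "O", "HOH", 2, 3.0, 0.0, 0.0),    # water 3 A from the ligand
    hetatm(3, "O", "HOH", 3, 20.0, 0.0, 0.0),   # water 20 A from the ligand
]
kept = drop_distant_residues(toy, ligand_resname="LIG")
print(len(kept))
```

Passing ion residue names in `drop_resnames` would apply the same distance test to ions, though whether a cation in the pocket should stay is a scientific call that a distance cutoff alone can't make. If the file contains no ligand atoms, every listed residue is dropped.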

@proteneer
Contributor

Expanding on 3)

3.1) Protonation states will require a pKa predictor of some form. Worse yet, you probably care about not just the solvent-phase protonation state but also the active-site protonation state (which requires a protein pKa predictor).

3.2) Tautomerization is really hard. Internally we have LigPrep (but obv it's a proprietary rule-based solution, which is less than ideal for an open-source package like DeepChem).

3.3) Another item to be aware of is the different resonance structures of conjugated systems that are not simple aromatic rings (which many tools probably don't handle well). @ptosco submitted a great PR to RDKit that can enumerate all resonance structures of a given mol; it can also rank them by stability using, again, a rule-based heuristic: http://www.rdkit.org/Python_Docs/rdkit.Chem.rdchem.ResonanceMolSupplier-class.html
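To make 3.1 concrete: once a pKa is known, the Henderson-Hasselbalch relation gives the protonated fraction of a single titratable site at a given pH. The sketch below assumes the pKa is already available, which is the hard part a predictor would have to supply, and the two pKa values are invented purely to show how a shift between solvent and active site can flip the dominant state.

```python
def fraction_protonated(pka: float, ph: float = 7.4) -> float:
    """Fraction of a titratable site in its protonated form at a given pH.

    Henderson-Hasselbalch for a single site: [HA] / ([HA] + [A-]).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Hypothetical pKa values for one site, in bulk solvent and in a binding pocket.
for label, pka in [("solvent", 6.5), ("pocket", 8.5)]:
    frac = fraction_protonated(pka)
    print(f"{label}: pKa={pka:.1f}, protonated fraction={frac:.2f}")
```

With these made-up numbers the site is mostly deprotonated in solvent and mostly protonated in the pocket at pH 7.4, which is why the solvent-phase state alone can pick the wrong ligand form for docking.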

@evanfeinberg
Collaborator

evanfeinberg commented Jan 26, 2017 via email

@rbharath
Member Author

We might have to implement our own open source variants for DeepChem. Luckily, we can leverage the existing code in PDBFixer, MolVS, and RDKit to get a nice head start here.

@rbharath
Member Author

rbharath commented Mar 7, 2018

PRs that improve the state of our protein-ligand docking would be welcome! I've marked this issue a "Good Intermediate Contribution." That is, you should already be comfortable contributing to DeepChem, since this work might require some in-depth code changes. I've also marked this issue "Scientific Knowledge Required," since you should have an understanding of (or be willing to learn about) the science of preparing a ligand molecule for docking with a protein.

@vsomnath
Contributor

vsomnath commented Jan 17, 2019

I would like to work on this, but I'm unsure where to start or what references to use. Any suggestions?

@rbharath
Member Author

I'd recommend taking a look at some docking best practices guides. For example, perhaps this one:
http://ablab.ucsd.edu/pdf/Rueda_Abagyan_Wiley_Chapter_123112.pdf

Making progress here basically requires us to implement the practices from these guides in DeepChem.

@rbharath
Member Author

Closing this old issue. Let's continue the discussion in deepchem/deepdock#3
