Protein/Ligand Preparation for Docking #368

rbharath · 2017-01-21T21:54:14Z

@bowenliu16 raised a number of interesting questions about protein preparation on the Gitter that don't have great solutions within DeepChem yet. I'm starting this thread to copy over the questions and discuss potential solutions:

What kind of protein preparation is done? Do we need to manually remove crystallized water and other ions? What about missing protein residues/loops?
- dc.feat.hydrogenate_and_compute_partial_charges is the current extent of protein cleanup. I don't believe we handle water/ion removal, and there's no handling of missing protein residues/loops either. https://github.com/pandegroup/pdbfixer by @peastman handles many of these issues, so the right answer might be to use PDBFixer here.
How many ligands can be in the .sdf file to be docked at the same time?
- I think we only support one ligand per .sdf file right now. We should probably generalize to support multiple ligands.
And what kind of ligand processing do we need to do? E.g., is any 3D conformation fine? We also have to manually specify the protonation states and any tautomers, right?
- dc.feat.hydrogenate_and_compute_partial_charges is the only ligand processing right now. It partially handles protonation, but I'd wager it doesn't do an excellent job. We don't handle tautomers yet.
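
To make the multi-ligand point concrete, here is a minimal sketch of how one .sdf file could be split into per-ligand records. It relies only on the fact that SDF records end with a `$$$$` line. The function names `split_sdf_records` and `ligand_names` are hypothetical and are not part of DeepChem; a real loader would more likely hand the file to RDKit and iterate over the molecules it returns.

```python
from typing import Iterator, List


def split_sdf_records(sdf_text: str) -> Iterator[str]:
    """Yield one SDF record (mol block plus data fields) per ligand.

    Hypothetical helper for illustration; not a DeepChem function.
    """
    current: List[str] = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            # "$$$$" terminates a record in the SDF format.
            if any(l.strip() for l in current):
                yield "\n".join(current) + "\n$$$$\n"
            current = []
        else:
            current.append(line)
    # Tolerate a final record that lacks the trailing delimiter.
    if any(l.strip() for l in current):
        yield "\n".join(current) + "\n$$$$\n"


def ligand_names(sdf_text: str) -> List[str]:
    """Return the title line (first line of each record) for every ligand."""
    return [rec.splitlines()[0].strip() for rec in split_sdf_records(sdf_text)]
```

Each yielded record is itself a valid single-ligand SDF string, so the existing one-ligand code path could in principle be reused per record.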

I'd love to hear suggestions on good ways to handle pdb-cleanup, protonation, and tautomers with existing open source tools :-)

CC @peastman, @joegomes, @evanfeinberg

joegomes · 2017-01-24T05:22:24Z

These are good questions to consider, and addressing them would make dc.dock more readily usable.

1. No water/ion removal. I've used PDBFixer before and it works well for these tasks; it would be nice to incorporate it into file loading as an option. Some PDBs in PDBbind contain waters near the binding pocket that aren't removed before model training (perhaps another point of discussion...), so perhaps we don't want to remove all waters. It will also be important to distinguish between solution ions and (possible) cations in the binding pocket.
2. Currently .sdf files are read with the rdkit.Chem.SDMolSupplier iterator, but we assume one ligand per file. It would be easy to generalize the loading to multiple molecules.
3. I think for now any conformation is fine (Vina does conformation optimization) and we leave it to the user to enumerate protonation states and tautomers. From some quick googling, it looks like MolVS, a Python package built on RDKit, can enumerate tautomers given an RDKit mol object.

proteneer · 2017-01-25T20:36:14Z

Expanding on 3)

3.1) Protonation states will require a pKa predictor of some form. Worse yet, you probably care about not just the solvent-phase protonation state but also the active-site protonation state (which requires a protein pKa predictor).

3.2) Tautomerization is really hard. Internally we have LigPrep (but obv it's a proprietary rule-based solution, which is less than ideal for an open-source package like DeepChem).

3.3) Another item to be aware of is the different resonance structures of conjugated systems that are not simple aromatic rings (which many tools probably don't handle well). @ptosco submitted a great PR to RDKit that can enumerate all resonance structures of a given mol; it can also rank them by stability, again using a rule-based heuristic: http://www.rdkit.org/Python_Docs/rdkit.Chem.rdchem.ResonanceMolSupplier-class.html
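
On 3.1, it may help to see why a pKa estimate is the missing ingredient. Given a pKa for a titratable site, the Henderson-Hasselbalch relation gives the fraction of that site that is protonated at a chosen pH. The sketch below assumes the pKa is already known; predicting it is the hard part and is not shown. The function names and the default pH of 7.4 are illustrative assumptions, and the active-site pKa may be shifted away from the solvent-phase value.

```python
def fraction_protonated(pka: float, ph: float = 7.4) -> float:
    """Fraction of a titratable site in its protonated form at the given pH.

    Follows from Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def dominant_state(pka: float, ph: float = 7.4) -> str:
    """Name the majority form of the site at the given pH."""
    return "protonated" if fraction_protonated(pka, ph) >= 0.5 else "deprotonated"
```

A site whose pKa is close to the working pH has a substantial population in both forms, which is the case where docking more than one protonation state is worth considering.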

evanfeinberg · 2017-01-26T05:32:42Z

Are there open source solutions to 3.1? More generally, are there components of `LigPrep` and `Protein Prep Wiz`, like implementations of propka, that can be open sourced?

rbharath · 2017-01-26T06:10:21Z

We might have to implement our own open source variants for DeepChem. Luckily, we can leverage the existing code in PDBFixer, MolVS, and RDKit to get a nice head start here.

rbharath · 2018-03-07T23:00:54Z

PRs that improve the state of our protein-ligand docking would be welcome! I've marked this issue a "Good Intermediate Contribution." That is, you should already be comfortable contributing to DeepChem, since this work might require some in-depth code changes. I've also marked this issue "Scientific Knowledge Required," since you should have an understanding of (or be willing to learn about) the science of preparing a ligand molecule for docking with a protein.

vsomnath · 2019-01-17T04:05:51Z

I would like to work on this, but I'm unsure where to start or what references to use. Any suggestions?

rbharath · 2019-01-17T20:12:07Z

I'd recommend taking a look at some docking best practices guides. For example, perhaps this one:
http://ablab.ucsd.edu/pdf/Rueda_Abagyan_Wiley_Chapter_123112.pdf

Making progress here basically requires us to implement these best practices guides in DeepChem.

rbharath · 2020-03-22T23:34:00Z

Closing this old issue. Let's continue the discussion in deepchem/deepdock#3

rbharath mentioned this issue Jan 25, 2017

Replace Open Babel usage with RDKit #371

Closed

rbharath mentioned this issue Jan 28, 2017

Planning for an alpha release #376

Closed

rbharath added the Contribution Welcome, Good Intermediate Contribution, and Scientific Knowledge Required labels Mar 7, 2018

rbharath mentioned this issue Mar 22, 2020

Protein/Ligand Preparation for Docking deepchem/deepdock#3

Open

rbharath closed this as completed Mar 22, 2020
