Protein preparation with HTMD
===============

*Toni Giorgino*


The system preparation phase is based on the PDB2PQR software. It 
includes the following steps (from the
[PDB2PQR algorithm
description](http://www.poissonboltzmann.org/docs/pdb2pqr-algorithm-description/)):

 * Assigning titration states at the user-chosen pH;
 * Flipping the side chains of HIS (including user-defined HIS states), ASN, and GLN residues;
 * Rotating the sidechain hydrogen on SER, THR, TYR, and CYS (if available);
 * Determining the best placement for the sidechain hydrogen on neutral HIS, protonated GLU, and protonated ASP;
 * Optimizing all water hydrogens.
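
As a rough illustration of the first step, the relation between a residue's predicted pKa and the chosen pH can be sketched with the Henderson-Hasselbalch equation for a single independent site. This is a minimal sketch, not code from PDB2PQR or PROPKA: the function names and the example pKa values are made up for illustration, and the rule "protonated when the pKa exceeds the pH" is an assumption about how a discrete state is picked.

```python
def protonated_fraction(pka: float, ph: float) -> float:
    """Fraction of a titratable site that is protonated at the given pH
    (Henderson-Hasselbalch, single independent site)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def is_protonated(pka: float, ph: float) -> bool:
    """Assumed discrete rule: the site is protonated when its pKa exceeds the pH."""
    return pka > ph


# Hypothetical pKa values, for illustration only
example_pkas = {"ASP": 3.8, "HIS": 6.5, "LYS": 10.5}
ph = 7.4
for name, pka in example_pkas.items():
    frac = protonated_fraction(pka, ph)
    state = "protonated" if is_protonated(pka, ph) else "deprotonated"
    print(f"{name}: pKa={pka:.1f} protonated fraction={frac:.3f} -> {state}")
```

Residues whose pKa lies close to the chosen pH are the ones where the discrete assignment is least certain.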

The hydrogen bonding network calculations are performed by the
[PDB2PQR](http://www.poissonboltzmann.org/) software package. The pKa
calculations are performed by the [PROPKA
3.1](https://github.com/jensengroup/propka-3.1) software package.
Please see the copyright, license, and citation terms distributed with each.

Note that this version was modified to use an
externally supplied PROPKA **3.1** (installed automatically as a dependency), whereas
the original had PROPKA 3.0 *embedded*.

The results of the function should be roughly equivalent to those of the
preprocessing and optimization steps of the system preparation wizard
in Schrödinger's Maestro software.

Usage
----------

The `systemPrepare` function requires a molecule object, the protein to be prepared, as an argument, and returns the prepared system, also as a molecule object. Logging messages will provide information and warnings about the process.

The full documentation is in the docstring, accessible via the usual Python help mechanism.

In [3]:
from htmd.ui import *
tryp = Molecule('3PTB')
tryp_op = systemPrepare(tryp)

2021-11-16 09:49:21,972 - numexpr.utils - INFO - NumExpr defaulting to 8 threads.
2021-11-16 09:49:25,647 - binstar - INFO - Using Anaconda API: https://api.anaconda.org



Please cite HTMD: Doerr et al.(2016)JCTC,12,1845. https://dx.doi.org/10.1021/acs.jctc.6b00049

HTMD Documentation at: https://www.htmd.org/docs/latest/



2021-11-16 09:49:26,485 - moleculekit.readers - INFO - Using local copy for 3PTB: /home/sdoerr/Work/moleculekit/moleculekit/test-data/pdb/3ptb.pdb


You are on the latest HTMD version (unpackaged : /home/sdoerr/Work/htmd/htmd).






---- Molecule chain report ----
Chain A:
    First residue: ILE:16:
    Final residue: HOH:809:
---- End of chain report ----



2021-11-16 09:49:28,779 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:22 to CYX
2021-11-16 09:49:28,780 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:40 to HIE
2021-11-16 09:49:28,780 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:42 to CYX
2021-11-16 09:49:28,781 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:57 to HIP
2021-11-16 09:49:28,781 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:58 to CYX
2021-11-16 09:49:28,781 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:91 to HID
2021-11-16 09:49:28,782 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:128 to CYX
2021-11-16 09:49:28,782 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:136 to CYX
2021-11-16 09:49:28,782 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:157 to CYX
2021-11-16 09:49:28,782 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:168 to CYX
2021-11-16 09:49

The optimized molecule can be written and further manipulated as usual.

In [4]:
tryp_op.write('systempreparation-test-main-ph-7.pdb')

## Information about the prepared system

A table of useful information, a `pandas.DataFrame`, is returned alongside the prepared molecule when the `return_details` argument is set to `True`:

In [5]:
tryp_op, df = systemPrepare(tryp, return_details=True)




---- Molecule chain report ----
Chain A:
    First residue: ILE:16:
    Final residue: HOH:809:
---- End of chain report ----



2021-11-16 09:49:58,458 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:22 to CYX
2021-11-16 09:49:58,459 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:40 to HIE
2021-11-16 09:49:58,459 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:42 to CYX
2021-11-16 09:49:58,459 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:57 to HIP
2021-11-16 09:49:58,460 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:58 to CYX
2021-11-16 09:49:58,460 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:91 to HID
2021-11-16 09:49:58,460 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:128 to CYX
2021-11-16 09:49:58,462 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:136 to CYX
2021-11-16 09:49:58,462 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:157 to CYX
2021-11-16 09:49:58,463 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:168 to CYX
2021-11-16 09:49

The `pandas.DataFrame` object carries detailed information on the preparation results. It can easily be written as a spreadsheet in Excel or CSV format.

In [6]:
df.to_excel("tryp-report.xlsx")
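
The same table can be written as CSV or filtered with ordinary pandas calls. The sketch below uses a small stand-in DataFrame so that it runs on its own. The column names (`resname`, `resid`, `chain`, `pKa`, `protonation`) and all values are assumptions made for illustration, so check `df.columns` on the real table before adapting it.

```python
import io

import pandas as pd

# Stand-in for the table returned with return_details=True.
# Column names and values are hypothetical.
report = pd.DataFrame({
    "resname": ["HIS", "HIS", "ASP"],
    "resid": [10, 20, 30],
    "chain": ["A", "A", "A"],
    "pKa": [5.1, 7.5, 3.2],
    "protonation": ["HIE", "HIP", "ASP"],
})

# Write as CSV (here to an in-memory buffer; pass a file name to write to disk)
buffer = io.StringIO()
report.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Select residues whose pKa lies within one unit of the target pH
ph = 7.4
borderline = report[(report["pKa"] - ph).abs() < 1.0]
print(borderline)
```

Filtering on the distance between pKa and pH is one way to pick out residues that may be worth inspecting by hand.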

## Membrane proteins

Membrane-embedded proteins are in contact with a hydrophobic region, which may alter the pKa values of membrane-exposed residues ([Teixeira et al.](http://dx.doi.org/10.1021/acs.jctc.5b01114)). Although this effect is not currently taken into account quantitatively, if a `hydrophobic_thickness` argument is provided, warnings will be generated for residues exposed to the lipid region.
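
To make the idea of "exposed to the lipid region" concrete, here is a minimal sketch of how such a check could look. It is not the code used by `systemPrepare`: it assumes a pre-oriented structure with the membrane centred at z = 0 and its normal along the z axis, and the function name, residue labels and coordinates are hypothetical.

```python
import numpy as np


def in_hydrophobic_slab(z_coords, thickness):
    """Return a boolean mask of positions inside the slab |z| < thickness / 2.

    Assumes the membrane is centred at z = 0 with its normal along z.
    """
    z = np.asarray(z_coords, dtype=float)
    return np.abs(z) < thickness / 2.0


# Hypothetical residue labels and z coordinates (in Angstrom)
labels = ["ASP:A:1", "HIS:A:2", "GLU:A:3"]
z = [2.5, -20.0, 14.9]
thickness = 30.0

mask = in_hydrophobic_slab(z, thickness)
for label, inside in zip(labels, mask):
    if inside:
        print(f"Warning: {label} lies within the hydrophobic region")
```

A real check would also need to decide which atom of each residue represents its position and whether the residue actually faces the lipids, which this sketch ignores.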

The following example shows the preparation of the mu opioid receptor (PDB ID 4DKL). The pre-oriented structure is retrieved from the OPM database.

In [11]:
from moleculekit.util import opm

mor, thickness = opm("4dkl")

mor_opt, df = systemPrepare(mor, return_details=True,
                                hydrophobic_thickness=thickness)

df.to_excel("mor_report.xlsx")

2021-11-16 09:51:10,564 - moleculekit.molecule - INFO - Removed 2546 atoms. 4836 atoms remaining in the molecule.



---- Molecule chain report ----
Chain A:
    First residue: MET:65:
    Final residue: HOH:735:
Chain B:
    First residue: MET:65:
    Final residue: HOH:735:
---- End of chain report ----



2021-11-16 09:51:17,440 - moleculekit.tools.preparation - INFO - Modified residue ASP:A:114 to ASH
2021-11-16 09:51:17,440 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:140 to CYX
2021-11-16 09:51:17,441 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:171 to HID
2021-11-16 09:51:17,441 - moleculekit.tools.preparation - INFO - Modified residue CYS:A:217 to CYX
2021-11-16 09:51:17,441 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:223 to HID
2021-11-16 09:51:17,441 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:297 to HID
2021-11-16 09:51:17,442 - moleculekit.tools.preparation - INFO - Modified residue HIS:A:319 to HIE
2021-11-16 09:51:17,443 - moleculekit.tools.preparation - INFO - Modified residue ASP:B:114 to ASH
2021-11-16 09:51:17,444 - moleculekit.tools.preparation - INFO - Modified residue CYS:B:140 to CYX
2021-11-16 09:51:17,444 - moleculekit.tools.preparation - INFO - Modified residue HIS:B:171 to HID
2021-11-16

Modified residue names
----------------------

The molecule produced by the preparation step has residue names modified
according to their protonation state.
Later system-building functions assume these residue names.
Note that support for alternative charge states varies between force fields.

Charge +1    |  Neutral   | Charge -1
-------------|------------|----------
 -           |  ASH       | ASP
 -           |  CYS       | CYM
 -           |  GLH       | GLU
HIP          |  HID/HIE   |  -
LYS          |  LYN       |  -
 -           |  TYR       | TYM
ARG          |  AR0       |  -
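
The table translates directly into a lookup from residue name to formal side-chain charge, which can be handy for a quick check of the net charge implied by the assigned protonation states. The sketch below is only an illustration: the dictionary is transcribed from the table above, while the function name and the example residue list are hypothetical.

```python
# Formal side-chain charge for each residue name in the table above
SIDECHAIN_CHARGE = {
    "ASH": 0, "ASP": -1,
    "CYS": 0, "CYM": -1,
    "GLH": 0, "GLU": -1,
    "HIP": +1, "HID": 0, "HIE": 0,
    "LYS": +1, "LYN": 0,
    "TYR": 0, "TYM": -1,
    "ARG": +1, "AR0": 0,
}


def net_sidechain_charge(resnames):
    """Sum the formal side-chain charges; names not in the table count as 0."""
    return sum(SIDECHAIN_CHARGE.get(name, 0) for name in resnames)


# Hypothetical list of residue names, one entry per residue
example = ["HIP", "ASP", "GLU", "LYS", "ARG", "HID", "ALA"]
print(net_sidechain_charge(example))
```

Terminal groups, ions and ligands also contribute to the total charge of a system and are not covered by this lookup.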



# Full help

In [12]:
help(systemPrepare)

Help on function systemPrepare in module moleculekit.tools.preparation:

systemPrepare(mol_in, titration=True, pH=7.4, force_protonation=None, no_opt=None, no_prot=None, no_titr=None, hold_nonpeptidic_bonds=True, verbose=True, return_details=False, hydrophobic_thickness=None, plot_pka=None, _logger_level='ERROR', _molkit_ff=True)
    Prepare molecular systems through protonation and h-bond optimization.
    
    The preparation routine protonates and optimizes protein and nucleic residues.
    It will also take into account any non-protein, non-nucleic molecules for the pKa calculation
    but will not attempt to protonate or optimize those.
    
    Returns a Molecule object, where residues have been renamed to follow
    internal conventions on protonation (below). Coordinates are changed to
    optimize the H-bonding network.
    
    The following residue names are used in the returned molecule:
    
    ASH Neutral ASP
    CYX SS-bonded CYS
    CYM Negative CYS
    GLH Neutral GLU

Acknowledgements and citations
=========

Please acknowledge your use of PDB2PQR by citing:

 *   Dolinsky TJ, Czodrowski P, Li H, Nielsen JE, Jensen JH, Klebe G, Baker NA. PDB2PQR: Expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Res, 35, W522-5, 2007. 
 *   Dolinsky TJ, Nielsen JE, McCammon JA, Baker NA. PDB2PQR: an automated pipeline for the setup, execution, and analysis of Poisson-Boltzmann electrostatics calculations. Nucleic Acids Res, 32, W665-W667, 2004.
 
 
Please acknowledge your use of PROPKA by citing:

 *   Sondergaard, Chresten R., Mats HM Olsson, Michal Rostkowski, and Jan H. Jensen. "Improved Treatment of Ligands and Coupling Effects in Empirical Calculation and Rationalization of pKa Values." Journal of Chemical Theory and Computation 7, no. 7 (2011): 2284-2295.
 *   Olsson, Mats HM, Chresten R. Sondergaard, Michal Rostkowski, and Jan H. Jensen. "PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions." Journal of Chemical Theory and Computation 7, no. 2 (2011): 525-537.




