Protein preparation with HTMD
===============

*Toni Giorgino*


The system preparation phase is based on the PDB2PQR software. It 
includes the following steps (from the
[PDB2PQR algorithm
description](http://www.poissonboltzmann.org/docs/pdb2pqr-algorithm-description/)):

 * Assigning titration states at the user-chosen pH;
 * Flipping the side chains of HIS (including user-defined HIS states), ASN, and GLN residues;
 * Rotating the side-chain hydrogen on SER, THR, TYR, and CYS (if available);
 * Determining the best placement for the side-chain hydrogen on neutral HIS, protonated GLU, and protonated ASP;
 * Optimizing all water hydrogens.
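
To picture the side-chain flip step, the sketch below treats a flip of an ASN terminal amide as an exchange of the OD1 and ND2 positions, which is approximately what a 180° rotation of the amide group does. This is an illustration only, not the PDB2PQR implementation: the `flip_asn` helper and the coordinates are hypothetical.

```python
# Simplified picture of an ASN amide flip: swap the positions of OD1 and ND2.
# Assumption: atoms are given as a dict of atom name -> (x, y, z); the
# coordinates below are made up for illustration.

def flip_asn(atoms):
    """Return a copy of the atom dict with OD1 and ND2 positions exchanged."""
    flipped = dict(atoms)
    flipped["OD1"], flipped["ND2"] = atoms["ND2"], atoms["OD1"]
    return flipped


asn = {
    "CB": (0.0, 0.0, 0.0),
    "CG": (1.5, 0.0, 0.0),
    "OD1": (2.1, 1.1, 0.0),
    "ND2": (2.2, -1.1, 0.0),
}

asn_flipped = flip_asn(asn)
print(asn_flipped["OD1"], asn_flipped["ND2"])
```

Because nitrogen and oxygen are hard to tell apart in electron density, both orientations fit the data; an optimizer can keep whichever gives the better hydrogen-bonding network.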

The hydrogen bonding network calculations are performed by the
[PDB2PQR](http://www.poissonboltzmann.org/) software package. The pKa
calculations are performed by the [PROPKA
3.1](https://github.com/jensengroup/propka-3.1) software package.
Please see the copyright, license, and citation terms distributed with each.

Note that this version was modified to use an
externally supplied PROPKA **3.1** (installed automatically as a dependency), whereas
the original had PROPKA 3.0 *embedded*.

The results of the function should be roughly equivalent to those of the
preprocessing and optimization steps of the system preparation wizard
in Schrödinger's Maestro software.

Usage
----------

The `proteinPrepare` function requires a molecule object, the protein to be prepared, as an argument, and returns the prepared system, also as a molecule object. Logging messages will provide information and warnings about the process.

The full documentation is in the docstring, accessible via the usual Python help mechanism.

In [1]:
from htmd import *
tryp = Molecule('3PTB')
tryp_op = proteinPrepare(tryp)

2016-06-14 16:57:31,532 - htmdmol.molecule - INFO - Using local copy for 3PTB: /home/toni/work/htmd/htmd/htmd/data/pdb/3ptb.pdb
2016-06-14 16:57:31,714 - propka - INFO - No pdbfile provided


Please cite. HTMD: High-Throughput Molecular Dynamics for Molecular Discovery, J. Chem. Theory Comput., 2016, 12 (4), pp 1845-1852. 
http://pubs.acs.org/doi/abs/10.1021/acs.jctc.6b00049


You are on the latest HTMD version (unpackaged).




The optimized molecule can be written and further manipulated as usual.

In [2]:
tryp_op.write('systempreparation-test-main-ph-7.pdb')

## Information about the prepared system

A table of useful information, an object of type `ResidueData`, is returned as a second value if the `returnDetails` argument is set to `True`:

In [3]:
tryp_op, prepData = proteinPrepare(tryp, returnDetails=True)



The `ResidueData` object carries a wealth of information on the preparation results. Most of it is accessible in the `data` property, which is a Pandas object. As such, it can be easily written as a spreadsheet in Excel or CSV format.

In [4]:
prepData.data.to_excel("tryp-report.xlsx")

## Membrane proteins

Membrane-embedded proteins are in contact with a hydrophobic region, which may alter pKa values for membrane-exposed residues ([Teixeira et al.](http://dx.doi.org/10.1021/acs.jctc.5b01114)). Although the effect is not currently taken into account quantitatively, if a `hydrophobicThickness` argument is provided, warnings will be generated for residues exposed to the lipid region.

The following example shows the preparation of the μ-opioid receptor, 4DKL. The pre-oriented structure is retrieved from the OPM database.

In [5]:
mor, thickness = htmd.util.opm("4dkl")

mor_opt, mor_data = proteinPrepare(mor, returnDetails=True,
                                   hydrophobicThickness=thickness)

exposedRes = mor_data.data.membraneExposed
mor_data.data[exposedRes].to_excel("mor_exposed_residues.xlsx")

2016-06-14 16:57:45,286 - htmdmol.molecule - INFO - Removed 2546 atoms. 4836 atoms remaining in the molecule.
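
The docstring of `proteinPrepare` describes exposed residues as those having -h<z<h that are buried in the protein by less than 75%. A minimal sketch of that test follows. It assumes the membrane is centered at z = 0 with its normal along z (as in OPM-oriented structures) and that `h` is a half-thickness; how `h` relates to `hydrophobicThickness` is an assumption here. The function name and the residue values are hypothetical.

```python
# Sketch of the membrane-exposure criterion: inside the slab -h < z < h
# and buried by less than 75%. Not HTMD's actual code.

def is_membrane_exposed(z, buried_fraction, half_thickness, max_buried=0.75):
    """True if a residue lies within the slab and is not mostly buried."""
    in_slab = -half_thickness < z < half_thickness
    return in_slab and buried_fraction < max_buried


# Hypothetical residues: (label, z coordinate in angstrom, buried fraction)
residues = [
    ("ASP 114", 3.2, 0.40),    # in slab, mostly exposed
    ("GLU 229", 5.0, 0.90),    # in slab, but buried
    ("LYS 303", 25.0, 0.10),   # outside the slab
]

h = 16.0  # hypothetical half-thickness in angstrom
exposed = [name for name, z, buried in residues
           if is_membrane_exposed(z, buried, h)]
print(exposed)
```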


Modified residue names
----------------------

The molecule produced by the preparation step has residue names modified
according to their protonation state.
Later system-building functions assume these residue names.
Note that support for alternative charge states varies between force fields.

Charge +1    |  Neutral   | Charge -1
-------------|------------|----------
 -           |  ASH       | ASP
 -           |  CYS       | CYM
 -           |  GLH       | GLU
HIP          |  HID/HIE   |  -
LYS          |  LYN       |  -
 -           |  TYR       | TYM
ARG          |  AR0       |  -
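
As a rough illustration of how a predicted pKa maps onto the names in the table, the sketch below applies a simplified rule: a site is treated as protonated when its pKa exceeds the pH. This is not the logic used by PDB2PQR or PROPKA, which also consider the hydrogen-bonding network. The `protonation_name` helper and the pKa values in the usage lines are hypothetical, HIE is picked arbitrarily as the neutral HIS tautomer, and disulfide-bonded CYS (CYX) is not handled.

```python
# Residue names taken from the table above, as
# standard name -> (protonated form, deprotonated form).
PROTONATION_NAMES = {
    "ASP": ("ASH", "ASP"),
    "GLU": ("GLH", "GLU"),
    "CYS": ("CYS", "CYM"),
    "TYR": ("TYR", "TYM"),
    "HIS": ("HIP", "HIE"),  # HIE chosen arbitrarily over HID
    "LYS": ("LYS", "LYN"),
    "ARG": ("ARG", "AR0"),
}


def protonation_name(resname, pka, ph=7.0):
    """Return the residue name of the dominant state (simplified rule)."""
    protonated, deprotonated = PROTONATION_NAMES[resname]
    return protonated if pka > ph else deprotonated


# Hypothetical pKa values, for illustration only.
print(protonation_name("ASP", 3.9))          # typical acidic side chain
print(protonation_name("ASP", 8.2))          # strongly shifted pKa
print(protonation_name("HIS", 6.5, ph=5.0))  # acidic conditions
```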



# Full help

In [6]:
help(proteinPrepare)

Help on function proteinPrepare in module htmd.builder.preparation:

proteinPrepare(mol_in, pH=7.0, verbose=0, returnDetails=False, hydrophobicThickness=None, holdSelection=None)
    A system preparation wizard for HTMD.
    
    Returns a Molecule object, where residues have been renamed to follow
    internal conventions on protonation (below). Coordinates are changed to
    optimize the H-bonding network. This should be roughly equivalent to mdweb and Maestro's
    preparation wizard.
    
    The following residue names are used in the returned molecule:
    
        ASH     Neutral ASP
        CYX     SS-bonded CYS
        CYM     Negative CYS
        GLH     Neutral GLU
        HIP     Positive HIS
        HID     Neutral HIS, proton HD1 present
        HIE     Neutral HIS, proton HE2 present
        LYN     Neutral LYS
        TYM     Negative TYR
        AR0     Neutral ARG
    
    [...] having -h<z<h and are buried in the protein by less than 75%. The list of such residues can be a [...]

Acknowledgements and citations
=========

Please acknowledge your use of PDB2PQR by citing:

 *   Dolinsky TJ, Czodrowski P, Li H, Nielsen JE, Jensen JH, Klebe G, Baker NA. PDB2PQR: Expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Res, 35, W522-5, 2007. 
 *   Dolinsky TJ, Nielsen JE, McCammon JA, Baker NA. PDB2PQR: an automated pipeline for the setup, execution, and analysis of Poisson-Boltzmann electrostatics calculations. Nucleic Acids Res, 32, W665-W667, 2004.
 
 
Please acknowledge your use of PROPKA by citing:

 *   Søndergaard CR, Olsson MHM, Rostkowski M, Jensen JH. Improved treatment of ligands and coupling effects in empirical calculation and rationalization of pKa values. J Chem Theory Comput, 7(7), 2284-2295, 2011.
 *   Olsson MHM, Søndergaard CR, Rostkowski M, Jensen JH. PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions. J Chem Theory Comput, 7(2), 525-537, 2011.




