# System building intro
by Stefan Doerr

You can watch the presentation here:

[![](http://pub.htmd.org/73hboiwia98hdj209jq0/opioid_youtube.png)](https://youtu.be/DF9cHKBX19A)

## Contents

1. Building
1. Tools
1. Workflow
1. Considerations

## Building
What is system building:

* Starting from molecular structures
* Modify them, position them and combine them to
* Prepare a biological system for simulation
* For a given forcefield

## Tools
![](http://pub.htmd.org/73hboiwia98hdj209jq0/molbuilding.png)

## Workflow

1. Obtain structures
1. Clean structures
1. Define segments
1. Combine structures
1. Solvate
1. Build and ionize
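
Step 3 of the workflow, defining segments, usually means giving each contiguous stretch of residues its own segment name. The following is a minimal sketch, assuming that a jump in residue numbering marks a chain break. The function name `assign_segments` and the `P0`, `P1`, ... naming scheme are hypothetical and are not taken from any particular tool.

```python
def assign_segments(resids, prefix="P"):
    """Give each run of consecutive residue numbers its own segment name.

    resids: residue numbers in file order (repeats are fine, one per atom).
    A jump of more than 1 between neighbours is treated as a chain break.
    """
    segments = []
    index = 0
    previous = None
    for resid in resids:
        if previous is not None and resid - previous > 1:
            index += 1  # gap found: start a new segment
        segments.append(f"{prefix}{index}")
        previous = resid
    return segments


# Residues 1-3, then a gap, 7-8, then another gap, 20
example_segments = assign_segments([1, 2, 3, 7, 8, 20])
```

Numbering gaps are only a proxy for real chain breaks, so it is worth checking the result against the structure itself.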


## Before building your system (preliminary considerations)

The PDB format is very old. To work around its legacy shortcomings, several versions have been released over the years; they are not all interchangeable, and not all software handles every version correctly. The most important things to watch out for are:

* Columns: the PDB format has rigid rules about which values go in which columns. It is not a space-, tab- or comma-delimited format; each field occupies a fixed range of columns.
* Size limits: the PDB format as originally designed cannot handle more than 9,999 resids or 99,999 atoms, because the corresponding fields are only four and five columns wide. Several workarounds have been devised, such as using hexadecimal numbers or other compact number formats. VMD has no trouble saving more atoms/residues.
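
To illustrate why the column rules matter, here is a minimal sketch of reading an `ATOM` record by column position rather than by splitting on whitespace. The column ranges follow the standard fixed-width `ATOM`/`HETATM` layout; the names `ATOM_FIELDS`, `ATOM_LINE` and `parse_atom_line` are illustrative only, and just a subset of the fields is handled.

```python
# Column slices (0-based, end-exclusive) for ATOM/HETATM records.
ATOM_FIELDS = {
    "record":  (0, 6),
    "serial":  (6, 11),   # 5 columns wide, hence the 99,999 atom limit
    "name":    (12, 16),
    "altloc":  (16, 17),
    "resname": (17, 20),
    "chain":   (21, 22),
    "resid":   (22, 26),  # 4 columns wide, hence the 9,999 resid limit
    "x":       (30, 38),
    "y":       (38, 46),
    "z":       (46, 54),
}


def parse_atom_line(line):
    """Split one ATOM/HETATM line by column position, not by whitespace."""
    atom = {key: line[start:end].strip() for key, (start, end) in ATOM_FIELDS.items()}
    for key in ("serial", "resid"):
        atom[key] = int(atom[key])
    for key in ("x", "y", "z"):
        atom[key] = float(atom[key])
    return atom


# Build sample lines with explicit field widths so the columns line up.
ATOM_LINE = (
    "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}"  # record .. resid (columns 1-26)
    + " " * 4                              # insertion code + unused columns (27-30)
    + "{:8.3f}{:8.3f}{:8.3f}"              # x, y, z (columns 31-54)
)
sample = ATOM_LINE.format("ATOM", 1, " N", "", "MET", "A", 1, 27.340, 24.430, 2.614)
atom = parse_atom_line(sample)

# With large values the fields touch each other, so whitespace splitting
# would merge them, while column slicing still works.
crowded = ATOM_LINE.format("ATOM", 12345, " CA", "", "LYS", "A", 1234,
                           -100.123, -200.456, -300.789)
crowded_atom = parse_atom_line(crowded)
```

In the second line the chain and resid fields run together, as do the three coordinates, which is exactly the situation where a whitespace-based parser silently misreads the file.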

In addition, you need to know your system well:
* Always review your PDB file: inspect the REMARK sections of the PDB file. You can often find key information specific to the structure (e.g. disulfide bridges, missing atoms, etc.).

* Protonation/pH: the protonation state of the system is critical. Since molecular dynamics simulations typically don't allow for bond breaking, the initial protonation of the system must be accurate. Knowing which pH you are trying to reproduce is therefore important for obtaining correct results. If you suspect that a change in protonation is important to your system and you still want to use classical mechanics, consider simulating both states (protonated and not protonated). Histidine residues can adopt three different protonation states even at pH 7, so protonating this residue correctly is particularly critical. It can be protonated at the delta nitrogen (most common), at the epsilon nitrogen (also very common) or at both nitrogens (special situations and low pH).

![](http://docs.htmd.org/img/histidines.png)

The best way to determine how a histidine should be protonated is to look at the structure. Typically, a histidine nitrogen is protonated if it is close enough to an electron donor (e.g. a glutamic acid) to form a hydrogen bond. Certain automated tools predict the protonation state of histidines based on their surrounding environment (e.g. AutoDock Tools). Since histidines are frequently present at protein active sites, a correct protonation state is particularly important in ligand binding simulations.
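
The structural reasoning above can be turned into a rough first guess. The sketch below assumes you already have the coordinates of the two ring nitrogens (ND1 and NE2) and of nearby acceptor atoms such as carboxylate oxygens. The function name `suggest_histidine_state` and the 3.5 Å cutoff are assumptions chosen for illustration. The return values use the CHARMM residue names HSD (delta), HSE (epsilon) and HSP (both).

```python
import math


def suggest_histidine_state(nd1, ne2, acceptors, cutoff=3.5):
    """Guess a histidine protonation state from nearby hydrogen-bond partners.

    nd1, ne2:  (x, y, z) of the delta and epsilon ring nitrogens, in Angstrom.
    acceptors: list of (x, y, z) of candidate partner atoms (e.g. Glu OE1/OE2).
    cutoff:    distance below which a hydrogen bond is assumed (an assumption).
    Returns "HSD", "HSE", "HSP", or None when the geometry gives no hint.
    """
    near_delta = any(math.dist(nd1, a) <= cutoff for a in acceptors)
    near_epsilon = any(math.dist(ne2, a) <= cutoff for a in acceptors)
    if near_delta and near_epsilon:
        return "HSP"  # both nitrogens can donate a hydrogen bond
    if near_delta:
        return "HSD"
    if near_epsilon:
        return "HSE"
    return None  # undecided: inspect the structure by hand


# Toy geometry: the two nitrogens are about 2.2 Angstrom apart.
nd1 = (0.0, 0.0, 0.0)
ne2 = (2.2, 0.0, 0.0)
guess = suggest_histidine_state(nd1, ne2, acceptors=[(-2.8, 0.0, 0.0)])
```

A distance check like this ignores angles, solvent exposure and pKa shifts, so it should only flag residues for closer inspection, not replace it.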

* Disulfide bonds present in the system must be identified. As shown below, this is done automatically by HTMD.
* Metalloproteins: if the metal ion is not an active part of an interaction, it may be acceptable to let it act as a simple cation, restraining it with harmonic restraints if necessary.
* Duplicate atoms in the PDB file: typically, simply delete one of the duplicated groups. However, if both conformations are potentially important (e.g. loops involved in molecular recognition), it might be necessary to simulate both conformations separately.
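
Two of the checks above are easy to prototype by hand. The sketch below is not how HTMD does it. It assumes disulfides can be spotted from the distance between cysteine SG atoms, with a 2.5 Å cutoff chosen for illustration, and that duplicate atoms are marked with alternate location labels. The function names and the input data layout are hypothetical.

```python
import math
from itertools import combinations


def find_disulfides(sg_atoms, cutoff=2.5):
    """Return pairs of residue numbers whose SG atoms are within `cutoff`.

    sg_atoms: list of (resid, (x, y, z)) for cysteine SG atoms, in Angstrom.
    """
    pairs = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sg_atoms, 2):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            pairs.append((res_a, res_b))
    return pairs


def keep_one_altloc(atoms, keep="A"):
    """Keep atoms without an alternate location label, or with the chosen one.

    atoms: list of dicts that each have an "altloc" key ("" when unset).
    """
    return [atom for atom in atoms if atom["altloc"] in ("", keep)]


# Toy data: residues 22 and 157 are bonded, residue 58 is a free cysteine.
sg_atoms = [
    (22, (0.0, 0.0, 0.0)),
    (58, (10.0, 0.0, 0.0)),
    (157, (2.05, 0.0, 0.0)),
]
bridges = find_disulfides(sg_atoms)

atoms = [
    {"name": "CA", "altloc": ""},
    {"name": "CB", "altloc": "A"},
    {"name": "CB", "altloc": "B"},
]
cleaned = keep_one_altloc(atoms)
```

To simulate the second conformation as well, the same filter can be run again with `keep="B"` and the two systems built separately.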

## List of common patches

C-terminal patches:

<table class="summarytable">
    <thead>
        <tr>
            <th>Name</th>
            <th>Charge</th>
            <th>Description</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>CTER</td>
            <td>-1.00</td>
            <td>standard C-terminus</td>
        </tr>
        <tr>
            <td>CT1</td>
            <td>0.00</td>
            <td>methylated C-terminus from methyl acetate</td>
        </tr>
        <tr>
            <td>CT2</td>
            <td>0.00</td>
            <td>amidated C-terminus</td>
        </tr>
        <tr>
            <td>CT3</td>
            <td>0.00</td>
            <td>N-Methylamide C-terminus</td>
        </tr>
    </tbody>
</table>

N-terminal patches:
<table class="summarytable">
    <thead>
        <tr>
            <th>Name</th>
            <th>Charge</th>
            <th>Description</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>NTER</td>
            <td>1.00</td>
            <td>standard N-terminus</td>
        </tr>
        <tr>
            <td>ACE</td>
            <td>0.00</td>
            <td>acetylated N-terminus (to create dipeptide)</td>
        </tr>
        <tr>
            <td>ACP</td>
            <td>0.00</td>
            <td>acetylated N-terminus (for proline dipeptide)</td>
        </tr>
        <tr>
            <td>PROP</td>
            <td>1.00</td>
            <td>proline N-terminus</td>
        </tr>
        <tr>
            <td>GLYP</td>
            <td>1.00</td>
            <td>glycine N-terminus</td>
        </tr>
    </tbody>
</table>
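
The terminal patch tables suggest a simple default rule: proline and glycine have their own N-terminal patches, and everything else uses the standard ones. Here is a minimal sketch of that lookup, using only patch names from the tables above. The function names are hypothetical, and capped termini (e.g. ACE or CT3) would still have to be chosen by hand.

```python
def default_nter_patch(first_resname):
    """Pick the charged N-terminal patch for the first residue of a chain."""
    if first_resname == "PRO":
        return "PROP"
    if first_resname == "GLY":
        return "GLYP"
    return "NTER"


def default_cter_patch():
    """Standard charged C-terminus; the table lists no residue-specific variants."""
    return "CTER"


# Example: a chain that starts with proline
patches = (default_nter_patch("PRO"), default_cter_patch())
```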

Side chain patches:

<table class="summarytable">
    <thead>
        <tr>
            <th>Name</th>
            <th>Charge</th>
            <th>Description</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>ASPP</td>
            <td>0.00</td>
            <td>patch for protonated aspartic acid, proton on od2</td>
        </tr>
        <tr>
            <td>GLUP</td>
            <td>0.00</td>
            <td>patch for protonated glutamic acid, proton on oe2</td>
        </tr>
        <tr>
            <td>CYSD</td>
            <td>-1.0</td>
            <td>patch for deprotonated CYS</td>
        </tr>
        <tr>
            <td>DISU</td>
            <td>-0.36</td>
            <td>patch for disulfides. Patch must be 1-CYS and 2-CYS</td>
        </tr>
        <tr>
            <td>HS2</td>
            <td>0.00</td>
            <td>Patch for neutral His, move proton from ND1 to NE2</td>
        </tr>
        <tr>
            <td>TP1</td>
            <td>-1.00</td>
            <td>convert tyrosine to monoanionic phosphotyrosine</td>
        </tr>
        <tr>
            <td>TP1A</td>
            <td>-1.00</td>
            <td>patch to convert tyrosine to monoanionic phenol-phosphate model
            compound; when generating tyr, use first none last none for terminal
            patches</td>
        </tr>
        <tr>
            <td>TP2</td>
            <td>-2.00</td>
            <td>patch to convert tyrosine to dianionic phosphotyrosine</td>
        </tr>
        <tr>
            <td>TP2A</td>
            <td>-2.00</td>
            <td>patch to convert tyrosine to dianionic phosphotyrosine; when
            generating tyr, use first none last none for terminal patches. This
            converts a single tyrosine to a phenol phosphate</td>
        </tr>
        <tr>
            <td>TMP1</td>
            <td>-1.00</td>
            <td>patch to convert tyrosine to monoanionic phosphonate ester O -&gt;
            methylene (see RESI BMPH)</td>
        </tr>
        <tr>
            <td>TMP2</td>
            <td>-2.00</td>
            <td>patch to convert tyrosine to dianionic phosphonate ester O -&gt;
            methylene (see RESI BMPD)</td>
        </tr>
        <tr>
            <td>TDF1</td>
            <td>-1.00</td>
            <td>patch to convert tyrosine to monoanionic difluoro phosphonate ester
            O -&gt;  methylene (see RESI BDFH)</td>
        </tr>
    </tbody>
</table>

Circular protein chain patches:

<table class="summarytable">
    <thead>
        <tr>
            <th>Name</th>
            <th>Charge</th>
            <th>Description</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>LIG1</td>
            <td>0.00000</td>
            <td>linkage for cyclic peptide, 1 refers to the C terminus which is a
            glycine, 2 refers to the N terminus</td>
        </tr>
        <tr>
            <td>LIG2</td>
            <td>0.00000</td>
            <td>linkage for cyclic peptide, 1 refers to the C terminus, 2 refers to
            the N terminus which is a glycine</td>
        </tr>
        <tr>
            <td>LIG3</td>
            <td>0.00000</td>
            <td>linkage for cyclic peptide, 1 refers to the C terminus which is a
            glycine, 2 refers to the N terminus which is a glycine</td>
        </tr>
    </tbody>
</table>
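
The choice between the three cyclic patches depends only on which terminus is a glycine. The sketch below encodes that rule as read from the table above. The function name `cyclic_patch` is hypothetical, and the case where neither terminus is a glycine returns `None` because the table does not list a patch for it.

```python
def cyclic_patch(c_term_resname, n_term_resname):
    """Pick the cyclic-peptide linkage patch from the two terminal residue names."""
    c_is_gly = c_term_resname == "GLY"
    n_is_gly = n_term_resname == "GLY"
    if c_is_gly and n_is_gly:
        return "LIG3"
    if c_is_gly:
        return "LIG1"
    if n_is_gly:
        return "LIG2"
    return None  # not covered by the table above


# Example: the C terminus is a glycine, the N terminus is an alanine
linkage = cyclic_patch("GLY", "ALA")
```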