# Coarse-graining with the CG_Compound class
Coarse-graining is a technique that speeds up molecular simulation by abstracting away some atomistic detail and thus reducing the number of degrees of freedom. One example of a coarse-grain model is the united-atom model, in which hydrogen atoms are treated implicitly and lumped in with their heavier neighbors (e.g., a carbon bonded to three hydrogens, $CH_3$, would be treated as one bead; the bending and stretching degrees of freedom of the $C-H$ bonds would be lost). More examples of coarse-grain models and their benefits can be found in our recent [perspective paper](https://doi.org/10.1016/j.commatsci.2019.109129).
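
To make the reduction in degrees of freedom concrete, the short sketch below counts particles for dodecane ($C_{12}H_{26}$) at three levels of resolution. The helper name `count_sites` and the fixed three-carbons-per-bead mapping are assumptions chosen for illustration; they are not part of `utils`.

```python
def count_sites(n_carbons, carbons_per_bead=3):
    """Count particles in a linear alkane at three levels of resolution."""
    n_hydrogens = 2 * n_carbons + 2          # general formula C_n H_(2n+2)
    all_atom = n_carbons + n_hydrogens       # every atom is explicit
    united_atom = n_carbons                  # hydrogens folded into their carbons
    # ceiling division, so leftover carbons would still be given a bead
    coarse = -(-n_carbons // carbons_per_bead)
    return {"all_atom": all_atom, "united_atom": united_atom, "coarse": coarse}


dodecane = count_sites(12)
for model, n in dodecane.items():
    # each particle carries three translational degrees of freedom
    print(f"{model:12s} {n:3d} particles, {3 * n:3d} degrees of freedom")
```

Fewer particles means fewer pair interactions to evaluate per step, which is where most of the speed-up comes from.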

This tutorial will demonstrate how to use SMILES strings and the CG_Compound class to create a coarse-grain structure.

---
**Learning Objectives**
1. Understand the motivation behind using a coarse-grain model.
1. Use SMILES strings to select the coarse-grain beads and initialize systems.
1. Prepare a coarse-grain structure for use in molecular simulation.
1. Run and analyze a molecular dynamics simulation.
---

In [None]:
import mbuild as mb
import utils
from utils import CG_Compound

For convenience we've provided a dictionary of some SMILES strings for you to try:

In [None]:
features = utils.features_dict
for feature, smiles in features.items():
    print(feature, ":", smiles)

A CG_Compound uses [class inheritance](https://docs.python.org/3/tutorial/classes.html#inheritance) to build on [mbuild.Compound](https://mosdef.org/mbuild/data_structures.html#compound). This allows us to reuse tools that have already been developed! Any mbuild compound can be converted to a CG_Compound either by calling `mbuild.Compound.to_pybel()` and passing the result to the `coarse` function, or by calling `CG_Compound.from_mbuild()`. In the following cell, we create a dodecane mbuild compound using only a [SMILES string](https://www.daylight.com/dayhtml/doc/theory/theory.smiles.html):

In [None]:
smiles = "CCCCCCCCCCCC"
mb_dodecane = mb.load(smiles, smiles=True)
mb_dodecane.visualize().show();

CG_Compound uses [pybel's SMARTS matching](http://openbabel.org/docs/current/UseTheLibrary/Python_Pybel.html#smarts-matching) to identify the beads based on the provided bead strings. Because neither the mbuild.Compound nor the CG_Compound keeps track of bond order, the bond order update has to be forced after conversion to a pybel mol.

Then, using the `coarse` function, the user can select a SMILES/SMARTS string used to detect the bead and a name for the bead. In this example, the three-alkyl-carbon SMILES string from the features dictionary generates a coarse-grain structure for dodecane, which reduces the 38-atom structure to just 4 beads!
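
The effect of such a mapping can be sketched without pybel: walk along the carbon backbone, group every three consecutive carbons into a bead, and place the bead at the mean position of its members. This is only a toy illustration with made-up coordinates. The names `group_chain` and `bead_center` are hypothetical, and `utils.coarse` itself relies on SMARTS matching rather than on index arithmetic.

```python
def group_chain(n_atoms, bead_size):
    """Split atom indices 0..n_atoms-1 into consecutive, non-overlapping beads."""
    return [list(range(i, min(i + bead_size, n_atoms)))
            for i in range(0, n_atoms, bead_size)]


def bead_center(positions, members):
    """Mean position of the member atoms (equal masses assumed)."""
    n = len(members)
    return tuple(sum(positions[i][k] for i in members) / n for k in range(3))


# Made-up coordinates: 12 carbons spaced 0.15 nm apart along x
carbons = [(0.15 * i, 0.0, 0.0) for i in range(12)]

beads = group_chain(len(carbons), bead_size=3)
centers = [bead_center(carbons, members) for members in beads]

for members, center in zip(beads, centers):
    print(members, "->", tuple(round(c, 3) for c in center))
```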

In [None]:
# convert to pybel mol
mol = mb_dodecane.to_pybel()

# to_pybel imports all bonds as order=1, this will type the bond correctly
# if the structure is good
mol.OBMol.PerceiveBondOrders()
cg_dodecane = utils.coarse(mol, [("_C3", features["alkyl_3"])])

# Visualize the compound imposed over the atomistic structure 
# with the show_atomistic flag
cg_dodecane.visualize(color_scheme={"_C3": "blue"}, 
                      show_atomistic=True).show();

By building on the functionality of mbuild, it is straightforward to export your coarse-grain structure to any chemical format already supported by mbuild. More about the mbuild.Compound `save` function [here](https://mosdef.org/mbuild/data_structures.html#mbuild.compound.Compound.save).

In [None]:
# Visualize just the coarse-grained compound
cg_dodecane.visualize(color_scheme={"_C3": "blue"}).show();

# Save to a file. Other possible extensions include: 
# 'hoomdxml', 'gro', 'top', 'lammps', 'lmp', 'json'
cg_dodecane.save("dodecane.gsd", overwrite=True);




Try out different coarse-grain mappings! What happens if you use 2, 4 or 6 alkyl carbons?
Replace `YOUR_BEAD_NAME` and `YOUR_SMILES_STRING`:

In [None]:
cg_name = "YOUR_BEAD_NAME"
smiles = "YOUR_SMILES_STRING"

cg_dodecane = utils.coarse(mol, [(cg_name, 
                                  smiles)])

cg_dodecane.visualize(color_scheme={cg_name: "blue"}, 
                      show_atomistic=True).show();

Next, let's make a structure that's a little more complicated! We're going to use our function, which wraps a Python module called [deepsmiles](https://github.com/nextmovesoftware/deepsmiles), to build a P3HT polymer.

In [None]:
from polysmiles import poly_smiles

# poly_smiles generates a polymer; the input is a deepsmiles string 
# with asterisks surrounding the polymer site
p3ht10_str = poly_smiles('cs*c*cc5CCCCCC', length=10)
p3ht10 = mb.load(p3ht10_str, smiles=True)

p3ht10.visualize().show()

We'll follow the same procedure as before, but this time we'll use a coarse-grain mapping which is loosely based on an [existing model](https://doi.org/10.1016/j.fluid.2010.07.025). In the following cell, the coarse-grain bead strings given to the `coarse` function are a thiophene ring and three alkyl carbons. Beads representing thiophene rings will be named "_B" (B for backbone; the underscore is a convention used to denote a coarse-grain particle) and colored <span style="color:blue">blue</span>, while beads representing a group of three alkyl carbons will be named "_S" (S for sidechain) and colored <span style="color:orange">orange</span>. **The order in which the beads are specified matters!** Beads containing aromatic rings are allowed to share atoms, while non-aromatic atoms can only belong to one bead. Because of this, we generally recommend providing the bead strings for aromatic groups before those for non-aromatic groups. The built-in visualization and the `show_atomistic` flag should help you determine whether the beads are being created as you expect.

In [None]:
mol10 = p3ht10.to_pybel()
mol10.OBMol.PerceiveBondOrders()

cg_p3ht10 = utils.coarse(mol10, 
                         [("_B", features["thiophene"]), 
                         ("_S",features["alkyl_3"])])
cg_p3ht10.visualize(color_scheme={"_B": "blue", "_S": "orange"}, 
                    show_atomistic=True).show()
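
Why the order of the bead strings can change the result is easiest to see in a toy model. The sketch below is not the implementation of `utils.coarse`; it assumes a simple greedy rule in which shareable beads (standing in for aromatic rings) may overlap atoms that are already taken, while every other bead is accepted only if all of its atoms are still free. The function `assign_beads`, the atom indices, and the deliberately loose side-chain pattern are all made up for illustration.

```python
def assign_beads(patterns):
    """Greedy, order-dependent bead assignment.

    patterns: list of (bead_name, shareable, candidate_matches), where each
    candidate match is a tuple of atom indices. Shareable beads may overlap
    atoms that are already taken; all other beads are accepted only if every
    one of their atoms is still free.
    """
    taken = set()
    beads = []
    for name, shareable, candidates in patterns:
        for match in candidates:
            if shareable or taken.isdisjoint(match):
                beads.append((name, match))
                taken.update(match)
    return beads


# Toy molecule: atoms 0-4 form a ring, atoms 5-10 form a six-carbon side chain.
ring = ("_B", True, [(0, 1, 2, 3, 4)])
# Contrived three-atom windows, including one that reaches into the ring (atom 4)
chain = ("_S", False, [(i, i + 1, i + 2) for i in range(4, 9)])

ring_first = assign_beads([ring, chain])
chain_first = assign_beads([chain, ring])

print("ring first: ", ring_first)
print("chain first:", chain_first)
```

With the ring first, every atom ends up in exactly one bead. With the chain first, the first side-chain bead takes ring atom 4 and atom 10 is left without a bead, which is the kind of mismatch the `show_atomistic` overlay makes visible.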

In order to run a simulation, we'll need to pack our structure into a box, which can be done using mbuild's packing function:

In [None]:
box = mb.box.Box([7, 7, 7])
cg_box = mb.packing.fill_box(cg_p3ht10, n_compounds=10, box=box) # fill the box with 10 compounds
cg_box.visualize(color_scheme={"_B": "blue", "_S": "orange"}).show()
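
As a quick sanity check on the packing, the bead number density of this box can be estimated by hand. The sketch assumes that every P3HT monomer maps to one thiophene bead plus two three-carbon beads (the hexyl side chain has six carbons) and that the box lengths are in nanometers, as is usual for mbuild; the variable names are illustrative only.

```python
n_chains = 10            # n_compounds passed to fill_box
monomers_per_chain = 10  # length passed to poly_smiles
# Assumed mapping: one thiophene bead plus two three-carbon beads per monomer
beads_per_monomer = 1 + 6 // 3

box_length = 7.0         # nm
n_beads = n_chains * monomers_per_chain * beads_per_monomer
volume = box_length ** 3
number_density = n_beads / volume

print(f"{n_beads} beads in {volume:.0f} nm^3 -> {number_density:.3f} beads/nm^3")
```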

Then we can apply our forcefield using [foyer](https://github.com/mosdef-hub/foyer):

In [None]:
from foyer import Forcefield
cg_box_pmd = cg_box.to_parmed(box=box)
ff = Forcefield(forcefield_files="forcefields/p3ht-cg.xml")
struc = ff.apply(cg_box_pmd, 
                 assert_bond_params=True, 
                 assert_angle_params=True, 
                 assert_dihedral_params=False
                )

We can easily initialize a [HOOMD](https://hoomd-blue.readthedocs.io/en/stable/) simulation using mbuild's create_hoomd_simulation function:

In [None]:
from mbuild.formats.hoomd_simulation import create_hoomd_simulation
import hoomd
import hoomd.md
import hoomd.group

create_hoomd_simulation(struc, 
                        r_cut=1.2,
                        auto_scale=True)

_all = hoomd.group.all()
hoomd.md.integrate.mode_standard(dt=0.0001)
integrator = hoomd.md.integrate.nvt(group=_all, kT=1.0, tau=1)
hoomd.dump.gsd("start.gsd", period=None, group=_all, overwrite=True)
hoomd.dump.gsd("traj.gsd", period=1e5, group=_all, phase=0, overwrite=True)

hoomd.run(1e6)
hoomd.dump.gsd("out.gsd", period=None, group=_all, overwrite=True);

We can also analyze the simulation trajectories. Here is an example of calculating the radial distribution function (rdf) between the side chain beads using a wrapper for [freud](https://freud.readthedocs.io/en/stable/). By setting `start=1`, the rdf is averaged from the first to the last frame.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
rdf = utils.gsd_rdf("traj.gsd", "_S", "_S", start=1)
plt.plot(rdf.bin_centers, rdf.rdf)
plt.show()
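
For readers who have not met the radial distribution function before, the following is a minimal pure-Python sketch of what such a calculation does for a single frame in a cubic periodic box. It is not how `utils.gsd_rdf` or freud compute it (they are far more efficient and handle general boxes); the function name `toy_rdf` and the lattice test data are assumptions for illustration, and `r_max` must not exceed half the box length.

```python
import math


def toy_rdf(points, box_length, r_max, n_bins):
    """Radial distribution function g(r) for points in a cubic periodic box."""
    n = len(points)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            # minimum image convention: use the nearest periodic copy of j
            d2 = 0.0
            for a, b in zip(points[i], points[j]):
                d = a - b
                d -= box_length * round(d / box_length)
                d2 += d * d
            r = math.sqrt(d2)
            if r < r_max:
                counts[int(r / dr)] += 2   # the pair counts for both i and j
    density = n / box_length ** 3
    centers, g = [], []
    for k, c in enumerate(counts):
        shell = 4.0 / 3.0 * math.pi * ((k + 1) ** 3 - k ** 3) * dr ** 3
        centers.append((k + 0.5) * dr)
        g.append(c / (n * density * shell))   # normalize by the ideal-gas count
    return centers, g


# 4 x 4 x 4 simple cubic lattice with unit spacing in a periodic box of length 4
lattice = [(float(x), float(y), float(z))
           for x in range(4) for y in range(4) for z in range(4)]
centers, g = toy_rdf(lattice, box_length=4.0, r_max=2.0, n_bins=5)
for r, value in zip(centers, g):
    print(f"r = {r:.1f}  g(r) = {value:.3f}")
```

The two innermost bins are empty because no lattice point has a neighbor closer than one lattice spacing; the later bins pick up the first, second and third neighbor shells.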

## Supplementary Material:

The next example shows how a coarse-grain structure can be mapped to frames of a molecular dynamics trajectory and how to handle some of the issues with coarse-graining arbitrary structures with SMILES strings. This coarse-grain mapping could be the starting point for iterative Boltzmann inversion (IBI): an example IBI code written by McCabe's group at Vanderbilt which implements a multi-state flavor can be found [here](https://github.com/mosdef-hub/msibi). 

The trajectory file (gsd) was created using [PlanckTon](https://bitbucket.org/cmelab/planckton/src/master/), which simplifies the setup and submission of large parameter sweeps by tying together multiple simulation tools.

In [None]:
# Download the gsd file
!wget https://bitbucket.org/cmelab/msibi_tests/downloads/P3HT_4-density_0.75-n_compounds_20-traj.gsd

In [None]:
gsdfile = "P3HT_4-density_0.75-n_compounds_20-traj.gsd"

# Coordinates are scaled from planckton sigma units
scale_factor = 0.356
comp0 = CG_Compound.from_gsd(gsdfile, frame=0, scale=scale_factor)

# The gsd from PlanckTon was run using the General AMBER Force Field (GAFF)
# and pybel will not correctly parse particles with AMBER typing
comp0.amber_to_element()

# Molecular dynamics simulations in HOOMD use periodic boundary conditions,
# so, in order to identify the beads, the frame must be unwrapped.
# The unwrap feature won't move particles if the compound doesn't have bonds
# that span the periodic boundary -- note the "no changes made" message
comp0.unwrap()

mol0 = comp0.to_pybel(box=comp0.box)

# to_pybel imports all bonds as order=1, this will type the bond correctly
# if the structure is good
mol0.OBMol.PerceiveBondOrders()

In [None]:
# Notice that the initial frame is typed correctly
# the structure is good so pybel can type it
cg_comp0 = utils.coarse(mol0, 
                        [("_B", features["thiophene"]), ("_S",features["alkyl_3"])]
                       )

view = cg_comp0.visualize(
    color_scheme={"_B": "blue", "_S": "orange"}, show_atomistic=True
)
# display the view (assigning it to a variable does not show it)
view.show()

**The starting structure of the atomistic compound matters!** If your starting structure is unphysical (e.g., non-planar aromatic rings), then pybel's SMARTS matching will not recognise the beads correctly. In the following cells we show a workaround for the last (distorted) frame of the trajectory, which works as long as the first frame has a chemically sound structure:

In [None]:
# same process as above but with last frame of trajectory
comp1 = CG_Compound.from_gsd(gsdfile, frame=-1, scale=scale_factor)

comp1.amber_to_element()

# PlanckTon initializes with a large volume then shrinks, so this last frame
# has a lot of bonds that span the periodic boundary
comp1.visualize().show();

In [None]:
# the bonds that span the boundary can be fixed using unwrap
# the function will iterate until all bonds are fixed
comp1.unwrap()

comp1.visualize().show();
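
The idea behind unwrapping can be sketched in a few lines for the simplest case, a linear chain in a cubic box. This is a simplified stand-in, not the `CG_Compound.unwrap` implementation, which follows the bond graph of the whole compound and iterates until all bonds are fixed. The function names and the convention of a box spanning `[0, box_length)` are assumptions made for this example.

```python
def minimum_image(delta, box_length):
    """Shift a coordinate difference to the nearest periodic image."""
    return delta - box_length * round(delta / box_length)


def unwrap_chain(positions, box_length):
    """Unwrap a linear chain whose consecutive particles are bonded.

    Each particle is placed next to the previous (already unwrapped) one,
    so no bond is longer than half the box afterwards.
    """
    unwrapped = [tuple(positions[0])]
    for pos in positions[1:]:
        prev = unwrapped[-1]
        unwrapped.append(tuple(
            p + minimum_image(x - p, box_length) for x, p in zip(pos, prev)
        ))
    return unwrapped


def wrap(positions, box_length):
    """Fold positions back into the box [0, box_length)."""
    return [tuple(x % box_length for x in pos) for pos in positions]


# A three-particle chain in a box of length 10; its second bond crosses the boundary
box_length = 10.0
wrapped = [(8.5, 5.0, 5.0), (9.5, 5.0, 5.0), (0.5, 5.0, 5.0)]

unwrapped = unwrap_chain(wrapped, box_length)
print(unwrapped)
print(wrap(unwrapped, box_length))
```

After unwrapping, the third particle sits just outside the box next to its bonded neighbor, and wrapping folds it back to where it started.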

In [None]:
mol1 = comp1.to_pybel(box=comp1.box)
mol1.OBMol.PerceiveBondOrders()

# Even with fixing pbc issues, the last frame is distorted enough that
# pybel can't recognise the features (bendy aromatic rings are NO)
# note the "WARNING" message
cg_comp1 = utils.coarse(mol1, 
                        [("_B", features["thiophene"]), ("_S",features["alkyl_3"])]
                       )

cg_comp1.visualize(
    color_scheme={"_B": "blue", "_S": "orange"}, show_atomistic=True
).show()

In [None]:
# But since these are from the same trajectory, they have 
# the same number of particles in the same order, so we can
# "fix" the bad morphology using the good one!
mol1_fixed = utils.map_good_on_bad(mol0, mol1)

# And it's fixed =D
cg_comp1_fixed = utils.coarse(mol1_fixed,
                              [("_B", features["thiophene"]), 
                               ("_S", features["alkyl_3"])]
                             )

cg_comp1_fixed.visualize(color_scheme={"_B": "blue", "_S": "orange"}).show();

# and we can rewrap it into the box
cg_comp1_fixed.wrap()

cg_comp1_fixed.visualize(color_scheme={"_B": "blue", "_S": "orange"}).show();
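
The internals of `utils.map_good_on_bad` are not shown in this tutorial, so the following is only one plausible reading of the idea described in the comments above: because both frames list the same particles in the same order, the bonds (with their orders) from the well-formed frame can be combined with the coordinates of the distorted frame. The dictionary layout, the function name `map_topology`, and the three-atom fragment are hypothetical.

```python
def map_topology(good, bad):
    """Combine the bonds of `good` with the coordinates of `bad`.

    Both frames are dicts with "elements", "positions" and "bonds"; bonds are
    (i, j, order) tuples. Particle order must be identical in both frames.
    """
    if good["elements"] != bad["elements"]:
        raise ValueError("frames do not list the same particles in the same order")
    return {
        "elements": list(bad["elements"]),
        "positions": list(bad["positions"]),   # distorted coordinates are kept
        "bonds": list(good["bonds"]),          # bond orders come from the good frame
    }


# Made-up three-atom fragment: the distorted frame lost its double bond
good_frame = {
    "elements": ["C", "C", "S"],
    "positions": [(0.0, 0.0, 0.0), (0.134, 0.0, 0.0), (0.2, 0.15, 0.0)],
    "bonds": [(0, 1, 2), (1, 2, 1)],
}
bad_frame = {
    "elements": ["C", "C", "S"],
    "positions": [(0.0, 0.0, 0.05), (0.16, 0.02, -0.04), (0.21, 0.17, 0.08)],
    "bonds": [(0, 1, 1), (1, 2, 1)],
}

fixed = map_topology(good_frame, bad_frame)
print(fixed["bonds"])
print(fixed["positions"])
```

The result keeps the geometry of the distorted frame, so bead positions still reflect the last frame, while the pattern matching sees the bond orders of the first.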