# Coarse graining with the CG_Compound class
Coarse-graining is a technique that speeds up molecular simulation by abstracting away some atomistic detail and thus reducing the number of degrees of freedom. One example of a coarse-grain model is the united atom model, in which hydrogen atoms are treated implicitly and lumped in with their heavier neighbors. For example, a carbon bonded to three hydrogens, $CH_3$, would be treated as one bead, and the bending and stretching degrees of freedom between $C-H$ would be lost. More examples of coarse-grain models and their benefits can be found in our recent [perspective paper](https://doi.org/10.1016/j.commatsci.2019.109129).

This tutorial will demonstrate how to use SMILES strings and the CG_Compound class to create a coarse-grain structure.

---
**Learning Objectives**
1. Understand the motivation behind using a coarse-grain model.
1. Use SMILES strings to select the coarse-grain beads.
1. Prepare a coarse-grain structure for use in molecular simulation.
---

In [None]:
import mbuild as mb

# the file utils.py contains the CG_Compound used in this tutorial
import utils

# dictionary of SMILES features used for coarse graining
features = utils.features_dict

for feature, smiles in features.items():
    print(feature,":", smiles)

A CG_Compound uses [class inheritance](https://docs.python.org/3/tutorial/classes.html#inheritance) to build on the [mbuild.Compound](https://mosdef.org/mbuild/data_structures.html#compound). This allows us to reuse already-developed tools! Anything that mbuild can import can be converted to a CG_Compound. In the following cell, we create a nonane structure using only a [SMILES string](https://www.daylight.com/dayhtml/doc/theory/theory.smiles.html):
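
The inheritance pattern can be sketched without mbuild at all. The classes below are hypothetical stand-ins written for illustration only (they are not `mbuild.Compound` or `utils.CG_Compound`); they just show how a subclass picks up every method of its parent and adds a copy helper on top.

```python
class Compound:
    """Stand-in particle container (NOT the real mbuild.Compound)."""

    def __init__(self, name="Compound"):
        self.name = name
        self.particles = []  # list of (name, (x, y, z))
        self.bonds = []      # list of (i, j) index pairs

    def add_particle(self, name, pos):
        self.particles.append((name, tuple(pos)))
        return len(self.particles) - 1

    def add_bond(self, i, j):
        self.bonds.append((i, j))


class CGCompound(Compound):
    """Hypothetical subclass: inherits everything, adds a copy helper."""

    def from_compound(self, other):
        # copy particles and bonds from an existing Compound
        self.particles = list(other.particles)
        self.bonds = list(other.bonds)
        return self


ethane = Compound("ethane")
c1 = ethane.add_particle("C", (0.0, 0.0, 0.0))
c2 = ethane.add_particle("C", (0.154, 0.0, 0.0))
ethane.add_bond(c1, c2)

cg = CGCompound("cg_ethane").from_compound(ethane)
cg.add_particle("_A", (0.077, 0.0, 0.0))  # inherited method still works
```

Because `CGCompound` is a `Compound`, any tool written for the parent class accepts it unchanged, which is the benefit the tutorial relies on.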

In [None]:
smiles = "CCCCCCCCC"
mb_nonane = mb.load(smiles, smiles=True)

nonane = utils.CG_Compound()

# The from_mbuild function copies the particles and bonds from 
# an mbuild.Compound to a CG_Compound.
nonane.from_mbuild(mb_nonane)

nonane.visualize().show();

CG_Compound uses [pybel's SMARTS matching](http://openbabel.org/docs/current/UseTheLibrary/Python_Pybel.html#smarts-matching) to identify the beads based on the provided bead strings. Because neither the mbuild.Compound nor the CG_Compound keeps track of the bond order, the bond order update has to be forced after conversion to a pybel mol.

Then, using the `coarse` function, the user can select a SMILES/SMARTS string for the bead. In this example, the three-alkyl-carbon SMILES string from the features dictionary generates a coarse-grain structure for nonane, which reduces the 29-atom structure to just 3 beads!
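
The 29-to-3 reduction can be checked with a few lines of arithmetic. This is a minimal sketch with hypothetical helper names, assuming a linear alkane $C_nH_{2n+2}$ split evenly into beads of equal carbon count:

```python
def alkane_atom_count(n_carbons: int) -> int:
    """Number of atoms in a linear alkane C_n H_(2n+2)."""
    return n_carbons + (2 * n_carbons + 2)


def bead_count(n_carbons: int, carbons_per_bead: int) -> int:
    """Number of beads when every bead holds the same number of carbons."""
    if n_carbons % carbons_per_bead != 0:
        raise ValueError("carbons do not divide evenly into beads")
    return n_carbons // carbons_per_bead


n_atoms = alkane_atom_count(9)  # nonane
n_beads = bead_count(9, 3)      # three-carbon beads

print(n_atoms, n_beads)          # 29 3
print(3 * n_atoms, 3 * n_beads)  # positional degrees of freedom: 87 9
```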

In [None]:
# convert to pybel mol
mol = nonane.to_pybel()

# to_pybel imports all bonds as order=1; PerceiveBondOrders will type
# the bonds correctly if the structure is good
mol.OBMol.PerceiveBondOrders()

cg_nonane = utils.coarse(mol, [features["alkyl_3"]], atomistic=True)

cg_nonane.visualize(color_scheme={"_A": "pink"}).show();

By building on the functionality of mbuild, it is straightforward to export your coarse-grain structure to any chemical format already supported by mbuild. More about the mbuild.Compound `save` function [here](https://mosdef.org/mbuild/data_structures.html#mbuild.compound.Compound.save).

In [None]:
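# Call coarse again without atomistic=True so that the saved
# compound does not carry the atomistic structure along with the beads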
cg_nonane = utils.coarse(mol, [features["alkyl_3"]])

# can also save as 'hoomdxml', 'gro', 'top', 'lammps', 'lmp', or 'json'
cg_nonane.save("nonane.gsd")

The next example shows how a coarse-grain structure can be mapped to frames of a molecular dynamics trajectory. This coarse-grain mapping could be the starting point for iterative Boltzmann inversion (IBI); an example IBI code, written by McCabe's group at Vanderbilt, which implements a multi-state flavor can be found [here](https://github.com/mosdef-hub/msibi). Alternatively, as in the following example, the coarse-grain mapping can be used to automate structure generation for an [existing model](https://doi.org/10.1016/j.fluid.2010.07.025).

The trajectory file (gsd) was created using [PlanckTon](https://bitbucket.org/cmelab/planckton/src/master/), which simplifies the setup and submission of large parameter sweeps by tying together multiple simulation tools.

In [None]:
!wget https://bitbucket.org/cmelab/msibi_tests/downloads/P3HT_4-density_0.75-n_compounds_20-traj.gsd

In [None]:
gsdfile = "P3HT_4-density_0.75-n_compounds_20-traj.gsd"

# Coordinates are scaled from planckton sigma units
scale_factor = 0.356
comp0 = utils.CG_Compound.from_gsd(gsdfile, frame=0, scale=scale_factor)

# The gsd from PlanckTon was run using the General AMBER Force Field (GAFF)
# and pybel will not correctly parse particles with AMBER typing
comp0.amber_to_element()

# Molecular dynamics simulations in HOOMD use periodic boundary conditions,
# so, in order to identify the beads, the frame must be unwrapped.
# The unwrap feature won't move particles if the compound doesn't have bonds
# that span the periodic boundary -- note the warning msg
comp0.unwrap()

mol0 = comp0.to_pybel()

# to_pybel imports all bonds as order=1; PerceiveBondOrders will type
# the bonds correctly if the structure is good
mol0.OBMol.PerceiveBondOrders()
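
To make the unwrapping step concrete, here is a minimal sketch of one common way to make bonded molecules whole again. It is an illustration, not the `unwrap` implementation in `utils.py`: the function names are hypothetical, and it assumes an orthorhombic box and true bond lengths shorter than half the box length. Each atom is placed at the minimum-image position relative to an already placed bonded neighbor.

```python
from collections import deque


def minimum_image(delta, box):
    """Shift each displacement component by whole box lengths toward zero."""
    return [d - L * round(d / L) for d, L in zip(delta, box)]


def unwrap(positions, bonds, box):
    """Return new positions in which no bond spans the periodic boundary."""
    neighbors = {i: [] for i in range(len(positions))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    new_pos = [list(p) for p in positions]
    seen = set()
    for start in range(len(positions)):
        if start in seen:
            continue
        # breadth-first walk over one bonded molecule
        seen.add(start)
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in neighbors[i]:
                if j in seen:
                    continue
                delta = [b - a for a, b in zip(new_pos[i], positions[j])]
                shift = minimum_image(delta, box)
                new_pos[j] = [a + s for a, s in zip(new_pos[i], shift)]
                seen.add(j)
                queue.append(j)
    return new_pos


box = (10.0, 10.0, 10.0)
positions = [(9.5, 5.0, 5.0), (0.5, 5.0, 5.0)]  # bonded across the x boundary
bonds = [(0, 1)]
unwrapped = unwrap(positions, bonds, box)
```

In this toy frame the second particle ends up just outside the box, next to its bonded partner, so the bond length becomes 1.0 instead of 9.0.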

In the following cell, the coarse-grain bead strings given to the `coarse` function are a thiophene ring and three alkyl carbons. **The order in which the beads are specified matters!** Beads containing aromatic rings are allowed to share atoms, while non-aromatic atoms can only belong to one bead. Because of this, we generally recommend providing the strings for aromatic groups before those for non-aromatic groups. The built-in visualization and the `atomistic` flag in the `coarse` function should help you to determine whether the beads are being created as you expect. (Setting `atomistic=True` in the `coarse` function will leave the atomistic structure as part of the compound, so it will appear as an overlay in the visualization.)
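
To see why order can matter, consider a toy first-come-first-served assignment. This is only a sketch under stated assumptions: `assign_beads` is a hypothetical function, the matches are hand-written index tuples rather than real SMARTS output, and the greedy rule is assumed for illustration, not taken from `utils.coarse`.

```python
def assign_beads(matches_by_pattern, shareable):
    """Greedily turn pattern matches into beads, in the order given.

    matches_by_pattern: list of (bead_name, list of atom-index tuples)
    shareable: set of atom indices allowed to appear in several beads
    """
    claimed = set()
    beads = []
    for bead_name, matches in matches_by_pattern:
        for atoms in matches:
            # skip a match if it reuses an atom that may not be shared
            if any(a in claimed and a not in shareable for a in atoms):
                continue
            beads.append((bead_name, atoms))
            claimed.update(atoms)
    return beads


big = ("_A", [(0, 1, 2, 3)])
small = ("_B", [(3, 4, 5), (4, 5, 6)])

# no atoms may be shared: the result depends on which pattern goes first
big_first = assign_beads([big, small], shareable=set())
small_first = assign_beads([small, big], shareable=set())

# two overlapping matches survive only if their atoms may be shared
rings = ("_A", [(0, 1, 2, 3, 4), (3, 4, 5, 6, 7)])
fused_shared = assign_beads([rings], shareable=set(range(8)))
fused_exclusive = assign_beads([rings], shareable=set())
```

With the larger pattern first, all seven atoms end up in a bead; with the smaller pattern first, its first match claims atom 3 and blocks the larger pattern entirely.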

In [None]:
# Notice that the initial frame is typed correctly
# the structure is good so pybel can type it
cg_comp0 = utils.coarse(
    mol0, [features["thiophene"], features["alkyl_3"]], atomistic=True
)


cg_comp0.visualize(color_scheme={"Car": "black", "_A": "pink", "_B": "green"}).show();

**The starting structure of the atomistic compound matters!** If your starting structure is unphysical (e.g., non-planar aromatic rings), then pybel's SMARTS matching will not recognise the beads correctly. In the following cells we show a workaround for the last (distorted) frame of the trajectory, which works as long as the first frame has a chemically sound structure:

In [None]:
# same process as above but with last frame of trajectory
comp1 = utils.CG_Compound.from_gsd(gsdfile, frame=-1, scale=scale_factor)

comp1.amber_to_element()

# PlanckTon initializes with a large volume then shrinks, so this last frame
# has a lot of bonds that span the periodic boundary
comp1.visualize(color_scheme={"Car": "black", "_A": "pink", "_B": "green"}).show();

In [None]:
# the bonds that span the boundary can be fixed using unwrap
# the function will iterate until all bonds are fixed
comp1.unwrap()

comp1.visualize(color_scheme={"Car": "black", "_A": "pink", "_B": "green"}).show();

In [None]:
mol1 = comp1.to_pybel(box=mb.Box(comp1.box))
mol1.OBMol.PerceiveBondOrders()

# Even with fixing pbc issues, the last frame is distorted enough that
# pybel can't recognise the features (bendy aromatic rings are NO)
cg_comp1 = utils.coarse(
    mol1, [features["thiophene"], features["alkyl_3"]], atomistic=True
)

cg_comp1.visualize(color_scheme={"Car": "black", "_A": "pink", "_B": "green"}).show()

In [None]:
# But since these are from the same trajectory, they have
# the same number of particles in the same order, so we can
# "fix" the bad morphology using the good one!
mol1_fixed = utils.map_good_on_bad(mol0, mol1)

# Hey look it's fixed =D
cg_comp1_fixed = utils.coarse(
    mol1_fixed, [features["thiophene"], features["alkyl_3"]], atomistic=True
)

cg_comp1_fixed.visualize(
    color_scheme={"Car": "black", "_A": "pink", "_B": "green"}
).show()

# and we can rewrap it into the box
cg_comp1_fixed.wrap()


cg_comp1_fixed.visualize(
    color_scheme={"Car": "black", "_A": "pink", "_B": "green"}
).show();
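
The idea behind this workaround can be sketched in plain Python. The `Frame` class and `graft_topology` function below are hypothetical and are not how `utils.map_good_on_bad` is necessarily implemented; the sketch only assumes, as stated above, that both frames list the same particles in the same order, so the bond typing of the good frame can be combined with the coordinates of the distorted one.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    elements: list
    positions: list
    bonds: dict = field(default_factory=dict)  # {(i, j): bond order}


def graft_topology(good, bad):
    """Combine the bond orders of `good` with the coordinates of `bad`."""
    if good.elements != bad.elements:
        raise ValueError("frames must list the same particles in the same order")
    return Frame(
        elements=list(bad.elements),
        positions=list(bad.positions),
        bonds=dict(good.bonds),
    )


# toy two-carbon example: the distorted frame lost its double bond
good = Frame(["C", "C"], [(0.0, 0.0, 0.0), (0.134, 0.0, 0.0)], {(0, 1): 2})
bad = Frame(["C", "C"], [(0.0, 0.0, 0.0), (0.160, 0.02, 0.0)], {(0, 1): 1})

fixed = graft_topology(good, bad)
print(fixed.bonds)      # bond orders from the good frame
print(fixed.positions)  # coordinates from the distorted frame
```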