<a href="https://colab.research.google.com/github/mosdef-hub/CECAM-MoSDeF-Workshop/blob/main/polymer_workflow/hoomd-organics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 0. Orientation, Installation, & Setup
## Learning objectives
This notebook provides interactive examples that will assist learners in using MoSDeF tools to:
* Initialize complex macromolecules for molecular simulation
* Demonstrate how to run HOOMD-Blue simulations with these molecules
* Use and inspect forcefields
* Generate coarse-grained representations

## Software stack setup
After you run the cell below, the kernel will restart. This is necessary for the conda dependencies, so wait for the restart to finish before running the second cell.

## Interface notes
There are two types of output in these Colab notebooks that can be a little tricky:

1. If the output is very long, for example from the mamba command in the second cell, scrolling past the output can feel onerous. In this case, scrolling up and down in the narrow grey area between the sidebar menu and the cells can help you navigate.

2. If the output is a visualization of a molecule or simulation configuration, scrolling up or down will zoom in or out if the cursor is over the visualization. In these cases, take some care to scroll outside of the visualization.

In [1]:
!pip install -q condacolab
!git clone --single-branch --branch cecam https://github.com/cmelab/hoomd-organics
import condacolab
condacolab.install()

Cloning into 'hoomd-organics'...
remote: Enumerating objects: 2805, done.
remote: Counting objects: 100% (1232/1232), done.
remote: Compressing objects: 100% (396/396), done.
remote: Total 2805 (delta 895), reused 1041 (delta 825), pack-reused 1573
Receiving objects: 100% (2805/2805), 1017.44 KiB | 10.49 MiB/s, done.
Resolving deltas: 100% (1895/1895), done.
⏬ Downloading https://github.com/conda-forge/miniforge/releases/download/23.1.0-1/Mambaforge-23.1.0-1-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:13
🔁 Restarting kernel...


It will take about 2-3 minutes to install the Python dependencies after the kernel restarts. Once the kernel has restarted, you can run this cell right away. This cell and the previous one only need to be run once each; running either one a second time can cause problems.

In [1]:
import os
os.chdir("hoomd-organics")
!mamba env update -n base -f environment-cpu.yml
!python -m pip install -e .
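
Once the installation finishes, a quick check that the main packages can be found may save some confusion later. The sketch below uses only the standard library. The list of package names is an assumption based on the tools mentioned in this notebook, so adjust it to match your environment.

```python
import importlib.util

# Packages this notebook relies on (an assumed list; edit as needed).
packages = ["hoomd_organics", "hoomd", "mbuild", "foyer", "gmso"]

# find_spec returns None when a top-level package cannot be located.
status = {name: importlib.util.find_spec(name) is not None for name in packages}

for name, found in status.items():
    print(f"{name:15s} {'found' if found else 'MISSING'}")
```

If anything is reported as missing, rerunning the install cell after a fresh runtime restart is usually the simplest fix.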



# 1. HOOMD simulations from start to finish with MoSDeF Tools
## Overview
We'll see how to run simulations of polyphenylene sulfide (PPS) molecules using `hoomd-organics`, a package of MoSDeF tools for initializing and performing common MD simulations of organic molecules. We will use the [`HOOMD-blue`](https://hoomd-blue.readthedocs.io/en/v4.1.0/) simulation engine.




In summary, the `hoomd-organics` package has three main classes:

- `Molecule`: This class defines the structure of a molecule (for example, a polymer built from a monomer).

- `System`: This class defines a system of molecules (for example, a system of polymers) in a box and creates the initial `gsd` snapshot of the system. It also applies the forcefield to the system and prepares the forces required for the simulation.

- `Simulation`: This class runs the simulation using the `HOOMD-blue` simulation engine. Initializing a simulation requires a `gsd` snapshot of the system and a list of HOOMD forces.

## 1.1 Initialization, parameterization, and simulation

First, let's see everything in one block:

With just a couple of imports and four lines of code, we are able to initialize 30 8-mers of PPS, randomly pack them into a volume, and perform a simulation in the NVT ensemble.

In [6]:
from hoomd_organics.library import PPS, OPLS_AA_PPS
from hoomd_organics import Pack, Simulation

molecules = PPS(num_mols=30, lengths=8)
system = Pack(molecules=molecules, force_field=OPLS_AA_PPS(), density=0.5, r_cut=2.5, auto_scale=True, scale_charges=True, packing_expand_factor=5)
sim = Simulation(initial_state=system.hoomd_snapshot, forcefield=system.hoomd_forcefield, gsd_write_freq=100, log_write_freq=100)
sim.run_NVT(n_steps=1000, kT=1.0, tau_kt=0.01)

Initializing simulation state from a snapshot.
Step 100 of 1000; TPS: 42.31; ETA: 0.4 minutes
Step 200 of 1000; TPS: 41.35; ETA: 0.3 minutes
Step 300 of 1000; TPS: 43.73; ETA: 0.3 minutes
Step 400 of 1000; TPS: 48.79; ETA: 0.2 minutes
Step 500 of 1000; TPS: 54.56; ETA: 0.2 minutes
Step 600 of 1000; TPS: 59.12; ETA: 0.1 minutes
Step 700 of 1000; TPS: 62.63; ETA: 0.1 minutes
Step 800 of 1000; TPS: 65.75; ETA: 0.1 minutes
Step 900 of 1000; TPS: 68.46; ETA: 0.0 minutes


In the above example, much of the functionality comes from two key imports: `PPS` and `OPLS_AA_PPS`. `PPS()` uses `mBuild` tools to initialize PPS chemistries specifically, and `OPLS_AA_PPS` is a `foyer.Forcefield` that provides the subset of OPLS-AA parameters needed for PPS.

### Step 1: Defining the Molecule
In this example, we use the pre-defined PPS molecule from the `hoomd_organics` library. The `PPS` class is a subclass of the `Molecule` class and includes all the necessary information about the PPS molecule, including the monomer structure and how the monomers bond to form a chain. All we need to specify is the polymer length and how many polymer chains to create. In this example, we will create a system of 3 PPS chains, each 10 monomers long.
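
The step described above can be written as the short sketch below, which reuses the `PPS(num_mols=..., lengths=...)` call from the one-block example. The guarded import is an addition for convenience, so the snippet does nothing if `hoomd-organics` is not installed.

```python
# Guarded import: the snippet is a no-op if hoomd-organics is unavailable.
try:
    from hoomd_organics.library import PPS
except ImportError:
    PPS = None

if PPS is not None:
    # 3 chains, each 10 monomers long
    molecules = PPS(num_mols=3, lengths=10)
    # `molecules.molecules` holds one compound per chain
    print(len(molecules.molecules))
```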


### Step 2: Defining the System
In this step, we will use the `Pack` class, a subclass of the `System` class, to pack a box of PPS molecules at a given density. The `System` class creates the box, fills it with molecules, applies the force-field (if provided), and creates the initial state of the system in the form of a `gsd` snapshot. If a force-field is provided, this class also builds the list of forces that define the bonded and non-bonded interactions between the particles.

In this example, we pass the molecule object created in step 1 to pack a box with density=0.8. For the force-field, we use the pre-defined `OPLS` force-field class which includes all the parameters found in the OPLS xml force-field file.
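
For intuition about what a target density implies for the box size, here is a rough back-of-the-envelope estimate. It is only a sketch under stated assumptions that do not come from the package: the density is taken to be in g/cm^3, the box is cubic, and each chain is approximated as (C6H4S)n capped by two hydrogens. The function name `box_edge_nm` is hypothetical.

```python
# Rough estimate of the cubic box edge implied by a target mass density.
# Assumptions (not taken from the package): density is in g/cm^3, the box is
# cubic, and each chain is (C6H4S)_n capped by two hydrogens.
AVOGADRO = 6.02214076e23                        # 1/mol
MONOMER_MASS = 6 * 12.011 + 4 * 1.008 + 32.06   # g/mol for C6H4S
H_MASS = 1.008                                  # g/mol

def box_edge_nm(num_chains, chain_length, density_g_cm3):
    """Return the edge (nm) of a cubic box holding the chains at the given density."""
    chain_mass = chain_length * MONOMER_MASS + 2 * H_MASS   # g/mol per chain
    total_mass_g = num_chains * chain_mass / AVOGADRO       # grams in the box
    volume_cm3 = total_mass_g / density_g_cm3
    return volume_cm3 ** (1 / 3) * 1e7                      # 1 cm = 1e7 nm

edge = box_edge_nm(num_chains=3, chain_length=10, density_g_cm3=0.8)
print(f"Estimated box edge: {edge:.2f} nm")
```

Because the edge scales with the cube root of the volume, halving the density only increases the edge by a factor of about 1.26.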


In [None]:
system.system.visualize()

<py3Dmol.view at 0x783fce2bbdf0>

We can obtain the `gsd` snapshot of the system from the `system.hoomd_snapshot` attribute.


In [None]:
hoomd_snapshot = system.hoomd_snapshot  # building the snapshot may take a moment

We can also obtain the list of forces applied to the system from the `system.hoomd_forcefield` attribute.

In [None]:
hoomd_forces = system.hoomd_forcefield  # may also take a moment; both objects can be passed to Simulation below


In [None]:
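# In this run, index 3 of the force list held the Lennard-Jones pair force
lj_force = hoomd_forces[3]

# Show the epsilon and sigma values for each pair of particle types
dict(lj_force.params)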

{('ca',
  'ca'): _HOOMDDict{'epsilon': 0.16470588235294115, 'sigma': 0.9861111111111112},
 ('ca',
  's'): _HOOMDDict{'epsilon': 0.37091488871610456, 'sigma': 0.9930312739844155},
 ('ca',
  'ha'): _HOOMDDict{'epsilon': 0.10782531046954916, 'sigma': 0.8141779918845362},
 ('ca',
  'sh'): _HOOMDDict{'epsilon': 0.40583972495671383, 'sigma': 0.9930312739844155},
 ('ca', 'hs'): _HOOMDDict{'epsilon': 0.0, 'sigma': 0.0},
 ('s', 's'): _HOOMDDict{'epsilon': 0.8352941176470587, 'sigma': 1.0},
 ('ha',
  's'): _HOOMDDict{'epsilon': 0.2428207934182387, 'sigma': 0.8198915917499229},
 ('s', 'sh'): _HOOMDDict{'epsilon': 0.9139442639718566, 'sigma': 1.0},
 ('hs', 's'): _HOOMDDict{'epsilon': 0.0, 'sigma': 0.0},
 ('ha',
  'ha'): _HOOMDDict{'epsilon': 0.07058823529411763, 'sigma': 0.6722222222222223},
 ('ha',
  'sh'): _HOOMDDict{'epsilon': 0.2656844656620285, 'sigma': 0.8198915917499229},
 ('ha', 'hs'): _HOOMDDict{'epsilon': 0.0, 'sigma': 0.0},
 ('sh', 'sh'): _HOOMDDict{'epsilon': 0.9999999999999999, 'sigma
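
The cross terms printed above appear consistent with a geometric-mean combining rule applied to both epsilon and sigma. This is an observation drawn from the numbers in the output, not a statement about how the package computes them. The sketch below copies three like-pair entries from the output and recomputes the mixed pairs; the helper name `mix` is hypothetical.

```python
import math

# Like-pair parameters copied from the output above (reduced units).
eps = {"ca": 0.16470588235294115, "s": 0.8352941176470587, "ha": 0.07058823529411763}
sig = {"ca": 0.9861111111111112, "s": 1.0, "ha": 0.6722222222222223}

def mix(a, b):
    """Geometric-mean combining rule for both epsilon and sigma."""
    return math.sqrt(eps[a] * eps[b]), math.sqrt(sig[a] * sig[b])

eps_ca_s, sig_ca_s = mix("ca", "s")
print(eps_ca_s, sig_ca_s)  # compare with the ('ca', 's') entry above
```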

### Step 3: Running the Simulation

Using the snapshot and forces provided by the system class, we can initialize the simulation. While running, the `Simulation` class writes snapshots to a `gsd` trajectory file; `gsd_write_freq` sets how often (in time steps) a snapshot is saved. The class also logs other simulation data, such as time step, potential energy, kinetic temperature, pressure, and volume, to a text file. The logging frequency is set by the `log_write_freq` parameter.

In [None]:
sim = Simulation(initial_state=system.hoomd_snapshot, forcefield=system.hoomd_forcefield, gsd_write_freq=100, log_write_freq=100)

Initializing simulation state from a snapshot.


We can now run the simulation for 1000 time steps in the NVT ensemble at a reduced (scaled) temperature of 1.0.

In [5]:
sim.run_NVT(n_steps=1000, kT=1.0, tau_kt=0.01)
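
The `kT=1.0` passed to `run_NVT` is a reduced temperature, expressed in units of the system's reference energy. A minimal sketch of converting between reduced temperature and Kelvin follows. The reference energy used here is a hypothetical placeholder, not a value read from this system, and the function names are illustrative only.

```python
# Convert between reduced temperature (kT in units of the reference energy)
# and Kelvin. REF_ENERGY_KJ_MOL is a hypothetical placeholder; substitute
# the reference energy of your own system.
BOLTZMANN_KJ_MOL_K = 0.00831446261815324   # kJ/(mol K)
REF_ENERGY_KJ_MOL = 1.7782                 # hypothetical value

def reduced_to_kelvin(kT, ref_energy=REF_ENERGY_KJ_MOL):
    """Temperature in Kelvin for a given reduced temperature."""
    return kT * ref_energy / BOLTZMANN_KJ_MOL_K

def kelvin_to_reduced(temperature, ref_energy=REF_ENERGY_KJ_MOL):
    """Reduced temperature for a given temperature in Kelvin."""
    return temperature * BOLTZMANN_KJ_MOL_K / ref_energy

print(f"kT = 1.0 corresponds to about {reduced_to_kelvin(1.0):.0f} K")
```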


The `Simulation` class also lets users run simulations under other conditions, such as the NPT ensemble, the NVE ensemble, or Langevin dynamics. Check out `hoomd_organics/base/simulation.py` for more functionality.

In the rest of this tutorial, we will go through some of the features that are available in the `hoomd_organics` package that can be tailored to specific needs.

## Defining your own Molecule

You can define your own molecule in a couple of different ways:
- Using the SMILES string of the molecule
- Using the molecule file (accepted formats are `.mol2` and `.sdf`)
- Using a [`mbuild`](https://mbuild.mosdef.org/en/stable/) compound or a [`gmso`](https://gmso.mosdef.org/en/stable/) topology
- Defining a subclass of the `Molecule` class

### Option 1: Using the SMILES string of the molecule

In [None]:
# example of loading a molecule using the SMILES string
from hoomd_organics import Molecule
benzoic_acid_mol = Molecule(num_mols=20, smiles="c1cc(C(O)=O)ccc1")

We will use the `mbuild` visualization function to view one of the 20 benzoic acid molecules.

In [None]:
benzoic_acid_mol.molecules[0].visualize()

<py3Dmol.view at 0x784029a42290>

### Option 2: Using the molecule file

In [None]:
# example of loading a molecule using the molecule file
#!wget https://raw.githubusercontent.com/cmelab/hoomd-organics/main/hoomd_organics/assets/molecule_files/IPH.mol2
#phenol_mol = Molecule(num_mols=20, file="IPH.mol2")
phenol_mol = Molecule(num_mols=20, file="hoomd_organics/assets/molecule_files/IPH.mol2")

In [None]:
phenol_mol.molecules[0].visualize()

<py3Dmol.view at 0x783fcd83ef50>

### Option 3: Using a [`mbuild`](https://mbuild.mosdef.org/en/stable/) compound or a [`gmso`](https://gmso.mosdef.org/en/stable/) topology

In [None]:
# example of loading a molecule from an mbuild compound or a gmso topology
import mbuild as mb

mb_compound = mb.load("c1ccccc1", smiles=True)  # benzene

gmso_top = mb_compound.to_gmso()

# either object can be passed through the `compound` argument
benzene_mol = Molecule(num_mols=20, compound=mb_compound)
benzene_mol = Molecule(num_mols=20, compound=gmso_top)


### Option 4: Define a subclass of the `Molecule` class

Check out some examples of polymer classes defined in `hoomd_organics/library/polymers.py`.

## Defining your own Forcefield
The `hoomd-organics` package has a list of pre-defined force-fields that can be used to initialize the system. If you have the `xml` file of a forcefield, you can use the `FF_from_file` class from `hoomd_organics.library` to create a force-field object.
You can also define your own forcefield by creating a subclass of the `foyer.Forcefield` class.


In [None]:
# example of defining a force-field using the xml file
from hoomd_organics.library import FF_from_file

#benzene_ff = FF_from_file(xml_file="benzene_opls.xml")
benzene_ff = FF_from_file(
    xml_file="hoomd_organics/assets/forcefields/benzene_opls.xml")

Check out `hoomd_organics/library/forcefields.py` for more examples of defining a forcefield for specific molecules using a subclass of `foyer.Forcefield`.

## Defining your own System


The `hoomd_organics` package has two built-in methods of filling the box: `Pack` and `Lattice`. Note that the base `System` class is an abstract class and cannot be instantiated directly.

In [None]:
# example of defining a system using the Lattice method

from hoomd_organics import Lattice
from hoomd_organics.library import OPLS_AA

benzene_mol = Molecule(num_mols=32, smiles="c1ccccc1")

lattice = Lattice(
    molecules=[benzene_mol],
    force_field=OPLS_AA(),
    density=1.0,
    r_cut=2.5,
    x=1,
    y=1,
    n=4,
    auto_scale=True,
)
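
The choice of `num_mols=32` with `n=4` above is consistent with a lattice that holds two molecules per unit cell, repeated `n` times along x and y. That layout is an assumption here rather than something stated in this notebook, so check the `Lattice` docstring before relying on it. Under that assumption, the bookkeeping looks like this (the helper `lattice_capacity` is hypothetical):

```python
def lattice_capacity(n, molecules_per_cell=2):
    """Number of molecules in an n x n grid of unit cells (assumed layout)."""
    return n * n * molecules_per_cell

num_mols = 32
n = 4

# The molecule count should match the number of lattice sites.
print(lattice_capacity(n) == num_mols)
```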

In [None]:
lattice.system.visualize()

<py3Dmol.view at 0x783fc9c13190>

You can also define your own method of filling the box by creating a subclass of the `System` class. For example, one way to fill a box with two types of molecules is to create alternating layers of each molecule type.

## Example of a system with multiple molecule types

The system class can take a list of different molecule types along with different forcefields. If all molecule types use the same forcefield, then you only need to pass the forcefield once.

In [None]:
#!wget https://github.com/cmelab/hoomd-organics/raw/main/hoomd_organics/assets/forcefields/dimethylether_opls.xml
from hoomd_organics.library import OPLS_AA_DIMETHYLETHER
dimethylether_mol = Molecule(num_mols=20, smiles="COC")
pps_mol = PPS(num_mols=10, lengths=4)
multi_type_system = Pack(
    molecules=[dimethylether_mol, pps_mol],
    density=0.8,
    r_cut=2.5,
    force_field=[OPLS_AA_DIMETHYLETHER(), OPLS_AA_PPS()],
    auto_scale=True,
)

In [None]:
multi_type_system.system.visualize()

<py3Dmol.view at 0x783fc7c1e7a0>

In [None]:
# Coarse-grained (CG) examples below
from hoomd_organics.base import Pack, Simulation
from hoomd_organics.library import PPS, BeadSpring

In [None]:


pps_mol = PPS(num_mols=300, lengths=6)

pps_mol.molecules[0].visualize()

<py3Dmol.view at 0x783fbda975b0>

In [None]:
# Map each fragment matching the SMILES string to a single bead of type "A"
pps_mol.coarse_grain(beads={"A": "c1ccc(S)cc1"})


In [None]:


pps_mol.molecules[0].visualize()



<py3Dmol.view at 0x783fc9b03430>

In [None]:
ff = BeadSpring(
    r_cut=2.5,
    beads={
        "A": dict(epsilon=1.0, sigma=1.0),
    },
    bonds={
        "A-A": dict(r0=1.1, k=300),
    },
    angles={"A-A-A": dict(t0=2.0, k=200)},
    dihedrals={"A-A-A-A": dict(phi0=0.0, k=100, d=-1, n=1)},
)
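
To get a feel for the bonded parameters above, the sketch below evaluates the standard HOOMD-blue functional forms for a harmonic bond, a harmonic angle, and a periodic dihedral using the same numbers. That `BeadSpring` maps its `bonds`, `angles`, and `dihedrals` arguments onto exactly these forms is an assumption; the function names are illustrative only.

```python
import math

def harmonic_bond(r, r0=1.1, k=300):
    """U(r) = 1/2 k (r - r0)^2"""
    return 0.5 * k * (r - r0) ** 2

def harmonic_angle(theta, t0=2.0, k=200):
    """U(theta) = 1/2 k (theta - t0)^2, angles in radians"""
    return 0.5 * k * (theta - t0) ** 2

def periodic_dihedral(phi, phi0=0.0, k=100, d=-1, n=1):
    """U(phi) = 1/2 k (1 + d cos(n phi - phi0)), angles in radians"""
    return 0.5 * k * (1 + d * math.cos(n * phi - phi0))

# Energies for a slightly stretched bond, a slightly opened angle,
# and a dihedral rotated by pi.
print(harmonic_bond(1.2), harmonic_angle(2.1), periodic_dihedral(math.pi))
```

With `d=-1` and `n=1`, the dihedral energy is lowest at `phi = 0` and highest at `phi = pi`.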

In [None]:
cg_system = Pack(molecules=pps_mol, density=0.1, r_cut=2.5, auto_scale=False)

In [None]:


cg_system.system.visualize()



<py3Dmol.view at 0x783fb1626950>

In [None]:
cg_sim = Simulation(initial_state=cg_system.hoomd_snapshot, forcefield=ff.hoomd_forcefield)

Initializing simulation state from a snapshot.


In [None]:
cg_system.hoomd_snapshot.particles.types

['A']

In [None]:
cg_sim.run_NVT(n_steps=1000, kT=0.7, tau_kt=1.0)

