# PolymerVisualizer3D Tutorial
### Assigning Chemical Information to a Homopolymer and Heteropolymer 
PolymerVisualizer3D is a Jupyter tool that helps process polymer PDB files in conjunction with the general polymer testing branch of the OpenFF Toolkit. With large polymer PDBs, it is often necessary to assign chemical information (limited here to bond orders and formal charges) to the structure. Since PDBs only provide basic graph connectivity, substructures that carry chemical information must be isomorphically matched to the chemical graph, which is accomplished with `openff.toolkit.topology.Molecule.from_pdb_and_monomer_info()`. PolymerVisualizer3D assists in creating these substructures as a graphical front end to `substructure_generator.SubstructureGenerator`.

This tutorial will cover the monomer generation of natural rubber, a PEG/PLGA heteropolymer, and a dyed protein. 
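
The matching idea described above can be illustrated with plain RDKit on a tiny molecule. This is only a sketch of the general technique, not the toolkit's implementation: the variable names are made up for illustration, and the "PDB-like" molecule is built from SMILES rather than read from a file. A template that carries a double bond is matched against a connectivity-only graph, and its bond orders are copied across.

```python
from rdkit import Chem

# Template substructure that carries chemical information (one C=C bond).
template = Chem.MolFromSmiles("CC(C)=CC")

# Stand-in for what a PDB provides: same connectivity, every bond single.
pdb_like = Chem.RWMol(Chem.MolFromSmiles("CC(C)CC"))

# Strip bond orders from a copy of the template so it can match the
# connectivity-only graph.
query = Chem.RWMol(template)
for bond in query.GetBonds():
    bond.SetBondType(Chem.BondType.SINGLE)

# match[i] is the index in pdb_like of template atom i.
match = pdb_like.GetSubstructMatch(query)

# Copy each template bond order onto the corresponding matched bond.
for bond in template.GetBonds():
    begin = match[bond.GetBeginAtomIdx()]
    end = match[bond.GetEndAtomIdx()]
    pdb_like.GetBondBetweenAtoms(begin, end).SetBondType(bond.GetBondType())

# Recompute valences and implicit hydrogens for the edited molecule.
Chem.SanitizeMol(pdb_like)
print(Chem.MolToSmiles(pdb_like))
```

In a polymer the same step is repeated for every monomer instance, which is why each distinct monomer needs its own substructure.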

## Imports

In [2]:
from substructure_vizualizer import PolymerVisualizer3D, ChemistryEngine
from rdkit import Chem

## Natural Rubber (homopolymer)
Our first example is a simple homopolymer. To define monomers, two things must be known: 1) which contiguous atoms constitute a monomer, and 2) what chemical information that monomer carries (bond orders and formal charges). Useful information can be found [here](https://www.chemtube3d.com/_rubberf/). 

#### Steps:
1. Load the following cell. Note the lack of any chemical information (natural rubber has double bonds!) loaded from the PDB
2. In the `Test Load` tab, try clicking `Run` to attempt assigning monomers to the polymer. Since no monomers have been created, all atoms will be "unmatched", designated by a red sphere. 
3. Create a Monomer for Natural Rubber
    1. Click on the `Edit Monomers` tab
    2. Select the 13 atoms (including Hs!) that make up a typical natural rubber monomer. Click `Next`.
        1. If unsure, hover over and click on the following atoms: 40, 41, 42, 43, 49, 50, 51, 52, 44, 45, 46, 47, 48
    3. In the `Assign Chemical Info` tab, note the connection atoms (highlighted in yellow) that indicate neighboring monomers or terminal groups. 
    4. In the middle of the monomer (see the link above for the location), create a double bond.
        1. Click once on `Make Double Bond`
        2. Select the two adjacent atoms of the double bond
        3. Once the two atoms are highlighted in yellow, select `Commit Double Bonds`
    5. Once bond orders are satisfactory, click `Next`
    6. In the `Inspect Caps` tab, a list of detected terminal groups is shown. These are instances where the selected monomer appears at an end of the polymer in the input PDB, which may or may not have special chemistry. In natural rubber, terminal groups differ from mid-chain groups only by a single capping hydrogen, so their chemical information is the same. Try clicking on the different terminal groups (TERM1, TERM2, etc.).
    7. In the `Monomer Name` box, enter a fitting name (required) and click `Finish Monomer`
4. Click on the `Test Load` tab and click `Print Selection`. The printed dictionary is a representation of the information stored in step 3, which includes a mid-chain monomer (with two wildcard (`*`) atoms representing neighbors) and a few terminal group monomers (with only one wildcard atom). 
5. Finally, click `Run`. The polymer should now have no red spheres and should show double bonds in the correct places. For a visual representation of where the monomers were mapped, click the `See Color Code` button. Atoms may also be clicked to see a brief description under the `See Color Code` button. 

In [3]:
file = "polymer_examples/rdkit_simple_polymers/naturalrubber.pdb"
engine = ChemistryEngine(file)
viz = PolymerVisualizer3D(engine)
viz


## Output
JSON files can be saved and stored for later use with the following code. `viz.chemistry_engine.substructure_generator` is an instance of the `substructure_generator.SubstructureGenerator` class, so monomers may also be entered manually through this variable, and they will then be used by the `Run` button of the viz tool. 

In [5]:
viz.chemistry_engine.substructure_generator.output_monomer_info_json("natural_rubber_monomers.json")
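
The distinction between mid-chain and terminal monomers from step 4 above (two wildcard atoms versus one) can also be checked outside the widget. The sketch below uses plain RDKit and the standard `json` module. The SMARTS strings, the entry names, and the flat name-to-SMARTS layout are hand-written assumptions for illustration and are not necessarily what `output_monomer_info_json` writes.

```python
import json
from rdkit import Chem

# Hypothetical monomer entries for natural rubber (illustrative names/layout).
monomers = {
    # mid-chain repeat unit: 13 atoms plus two wildcard neighbors
    "rubber_mid": "[*]-[#6](-[#1])(-[#1])-[#6](-[#6](-[#1])(-[#1])-[#1])"
                  "=[#6](-[#1])-[#6](-[#1])(-[#1])-[*]",
    # terminal unit: one neighbor replaced by a capping hydrogen
    "rubber_term": "[#1]-[#6](-[#1])(-[#1])-[#6](-[#6](-[#1])(-[#1])-[#1])"
                   "=[#6](-[#1])-[#6](-[#1])(-[#1])-[*]",
}

def count_wildcards(smarts):
    """Count wildcard atoms, which RDKit reports with atomic number 0."""
    query = Chem.MolFromSmarts(smarts)
    if query is None:
        raise ValueError(f"could not parse SMARTS: {smarts}")
    return sum(1 for atom in query.GetAtoms() if atom.GetAtomicNum() == 0)

wildcards = {name: count_wildcards(s) for name, s in monomers.items()}
print(wildcards)

# A plain dict of strings survives a JSON round trip unchanged.
restored = json.loads(json.dumps(monomers, indent=2))
```

Counting wildcards this way is a quick sanity check that a terminal group was not accidentally saved as a mid-chain monomer.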

## PEG/PLGA Heteropolymer 
Heteropolymers are simply collections of more than one kind of monomer/substructure and can be handled in the same way. PEG and PLGA monomers have the following structures, each usually contributing two carbons and an oxygen to the backbone. Note that PEG has no special chemical information (unlike the two PLGA monomers, which contain double bonds), but it must still be included as a monomer so that it can be matched in the polymer alongside PLGA. 

![title](img/PEG_PLGA_monomers.png)

#### Steps:
1. Load the following cell and click the `Edit Monomers` tab. 
2. Select the atoms for a PEG monomer and click `Next`.
    1. Any PEG monomer may be chosen, but if you are unsure, pick the following atoms: 56, 57, 58, 59, 60, 61, 62
3. Since there is no chemical info to be inputted for PEG, click `Next`.
4. In the `Inspect Caps` window, look over the found terminal groups. Note that there will usually be more terminal groups than what is needed (for a linear polymer such as this, there are only ever 2 terminal groups). If you know which terminal groups are redundant, delete them. Otherwise, there is no major downside to including all found terminal groups. 
5. Enter "PEG" in the `Monomer Name` box and click `Finish Monomer`
6. We can now see how much of the polymer can be matched: click the `Test Load` tab and click `Run`. Note that only a small fraction of the molecule is matched. The isomorphism fitting algorithm tries to find the largest contiguous set of monomers, which at this point is only 2 PEG monomers. 
7. Add the first PLGA monomer, which has a methyl group extending from the backbone. Follow the same steps as with natural rubber to add a double bond to the appropriate oxygen atom.
    1. If unsure about which monomer to pick, the following is an example: 40, 41, 42, 43, 44, 45, 46, 47, 48
    2. There will be no found terminal groups, since PLGA does not appear at the ends of the polymer in this PDB
    3. Make sure to pick a unique name, such as "PLGA1"
8. Add the second PLGA monomer, which has no methyl group extending from the backbone. Follow the same steps as with natural rubber to add a double bond to the appropriate oxygen atom.
    1. If unsure about which monomer to pick, the following is an example: 71, 72, 73, 74, 75, 76
    2. There will be no found terminal groups, since PLGA does not appear at the ends of the polymer in this PDB
    3. Make sure to pick a unique name, such as "PLGA2"
9. In the `Test Load` tab, click `Run`. The entire polymer should be matched with no red spheres. Use the `See Color Code` button to inspect which monomers are assigned where. Note the double bonds now present in all PLGA monomers. 

In [7]:
file = "polymer_examples/rdkit_simple_polymers/PEG_PLGA_heteropolymer.pdb"
json_monomer_file = "polymer_examples/rdkit_simple_polymers/PEG_PLGA_monomers.json"
engine = ChemistryEngine(file)
viz = PolymerVisualizer3D(engine)
viz


In [12]:
viz.chemistry_engine.substructure_generator.output_monomer_info_json("PEG_PLGA_monomers.json")
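
For reference, the three monomers selected in the steps above can be written out as SMARTS and checked with RDKit. This is a hand-written sketch, not output from the tool: it assumes each monomer is cut so that the backbone oxygen comes first, which may differ from the atoms chosen in the widget. The atom counts (7, 9, and 6, excluding wildcard neighbors) agree with the example atom lists in the steps.

```python
from rdkit import Chem

# Hand-written mid-chain monomers; [*] marks a neighboring monomer.
monomer_smarts = {
    "PEG":   "[*]-[#8]-[#6](-[#1])(-[#1])-[#6](-[#1])(-[#1])-[*]",
    "PLGA1": "[*]-[#8]-[#6](-[#1])(-[#6](-[#1])(-[#1])-[#1])-[#6](=[#8])-[*]",
    "PLGA2": "[*]-[#8]-[#6](-[#1])(-[#1])-[#6](=[#8])-[*]",
}

summary = {}
for name, smarts in monomer_smarts.items():
    query = Chem.MolFromSmarts(smarts)
    # Real atoms only: wildcard atoms have atomic number 0.
    n_atoms = sum(1 for a in query.GetAtoms() if a.GetAtomicNum() != 0)
    n_double = sum(
        1 for b in query.GetBonds() if b.GetBondType() == Chem.BondType.DOUBLE
    )
    summary[name] = (n_atoms, n_double)

for name, (n_atoms, n_double) in summary.items():
    print(f"{name}: {n_atoms} atoms, {n_double} double bond(s)")
```

PEG reports no double bonds, which is why step 3 skips the `Assign Chemical Info` tab for it, while both PLGA monomers carry one carbonyl.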

## Dyed Protein
Proteins are just heteropolymers, albeit very complicated ones. Instead of manually generating the amino acids for every protein, a library is available that can be loaded before the visualizer is created. In this case, a dyed protein utilizes some of these pre-built amino acid monomers, but a new version of an amino acid must be generated for the residue bonded to the dye.
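
Before loading a library like the one in the next cell, it can help to look at how a single entry is laid out. The sketch below parses the `GLY` entry (copied from the library) with RDKit and separates atoms that carry an atom-map number from those that do not. Reading the mapped atoms as the residue itself and the unmapped atoms as capping atoms is an interpretation of the SMARTS, not something stated by the toolkit.

```python
from rdkit import Chem

# GLY entry copied from the amino acid library used in this section.
gly = "[H]-[#7:1](-[#1:2])-[#6:3](-[#1:4])(-[#1:5])-[#6:6](=[#8:7])-[O]-[H]"

query = Chem.MolFromSmarts(gly)
if query is None:
    raise ValueError("could not parse the GLY SMARTS")

# Atoms written as [#n:k] carry map number k; plain [H] and [O] carry 0.
mapped = [a.GetIdx() for a in query.GetAtoms() if a.GetAtomMapNum() > 0]
unmapped = [a.GetIdx() for a in query.GetAtoms() if a.GetAtomMapNum() == 0]

print(f"mapped atoms: {len(mapped)}, unmapped atoms: {len(unmapped)}")
```

The same loop can be run over every entry of a library to catch SMARTS strings that fail to parse before they are handed to the engine.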

In [6]:
file = "polymer_examples/rdkit_simple_polymers/dyed_protein.pdb"
engine = ChemistryEngine(file)
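# reset every bond in the loaded molecule to a single bond
for bond in engine.full_molecule.GetBonds():
    bond.SetBondType(Chem.BondType.SINGLE)
# load the amino acid library manually (25 entries, including protonation variants)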
monomer_info = {
        "ALA": ("[H]-[#7:1](-[#1:2])-[#6@@:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#1:8])-[#6:9](=[#8:10])-[O]-[H]", []),
        "ARG": ("[H]-[#7:1](-[#1:2])-[#6@@:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#6:8](-[#1:9])(-[#1:10])-[#6:11](-[#1:12])(-[#1:13])-[#7:14](-[#1:15])-[#6:16](-[#7:17](-[#1:18])-[#1:19])=[#7&+:20](-[#1:21])-[#1:22])-[#6:23](=[#8:24])-[O]-[H]", []),
        "ASH": ("[H]-[#7:1](-[#1:2])-[#6:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#6:8](-[#8:9])-[#8:10]-[#1:11])-[#6:12](-[#8:13])-[O]-[H]", []),
        "ASN": ("[H]-[#7:1](-[#1:2])-[#6@@:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#6:8](=[#8:9])-[#7:10](-[#1:11])-[#1:12])-[#6:13](=[#8:14])-[O]-[H]", []),
        "ASP": ("[H]-[#7:1](-[#1:2])-[#6@@:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#6:8](=[#8:9])-[#8:10])-[#6:11](=[#8:12])-[O]-[H]", []),
        "CYS": ("[H]-[#7:1](-[#1:2])-[#6@@:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#16:8]-[#1:9])-[#6:10](=[#8:11])-[O]-[H]", []),
        "GLH": ("[H]-[#7:1](-[#1:2])-[#6:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#6:8](-[#1:9])(-[#1:10])-[#6:11](-[#8:12])-[#8:13]-[#1:14])-[#6:15](-[#8:16])-[O]-[H]", []),
        "GLN": ("[H]-[#7:1](-[#1:2])-[#6@@:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#6:8](-[#1:9])(-[#1:10])-[#6:11](=[#8:12])-[#7:13](-[#1:14])-[#1:15])-[#6:16](=[#8:17])-[O]-[H]", []),
        "GLU": ("[H]-[#7:1](-[#1:2])-[#6@@:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#6:8](-[#1:9])(-[#1:10])-[#6:11](=[#8:12])-[#8:13])-[#6:14](=[#8:15])-[O]-[H]", []),
        "GLY": ("[H]-[#7:1](-[#1:2])-[#6:3](-[#1:4])(-[#1:5])-[#6:6](=[#8:7])-[O]-[H]", []),
        "HID": ("[H]-[#7:1](-[#1:2])-[#6:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#6:8]1-[#7:9](-[#1:10])-[#6:11](-[#1:12])-[#7:13]-[#6:14]-1-[#1:15])-[#6:16](-[#8:17])-[O]-[H]", []),
        "HIE": ("[H]-[#7:1](-[#1:2])-[#6:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#6:8]1-[#7:9]-[#6:10](-[#1:11])-[#7:12](-[#1:13])-[#6:14]-1-[#1:15])-[#6:16](-[#8:17])-[O]-[H]", []),
        "HIP": ("[H]-[#7:1](-[#1:2])-[#6:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#6:8]1-[#7:9](-[#1:10])-[#6:11](-[#1:12])-[#7:13](-[#1:14])-[#6:15]-1-[#1:16])-[#6:17](-[#8:18])-[O]-[H]", []),
        "ILE": ("[H]-[#7:1](-[#1:2])-[#6@@:3](-[#1:4])(-[#6@@:5](-[#1:6])(-[#6:7](-[#1:8])(-[#1:9])-[#1:10])-[#6:11](-[#1:12])(-[#1:13])-[#6:14](-[#1:15])(-[#1:16])-[#1:17])-[#6:18](=[#8:19])-[O]-[H]", []),
        "LEU": ("[H]-[#7:1](-[#1:2])-[#6@@:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#6:8](-[#1:9])(-[#6:10](-[#1:11])(-[#1:12])-[#1:13])-[#6:14](-[#1:15])(-[#1:16])-[#1:17])-[#6:18](=[#8:19])-[O]-[H]", []),
        "LYN": ("[H]-[#7:1](-[#1:2])-[#6:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#6:8](-[#1:9])(-[#1:10])-[#6:11](-[#1:12])(-[#1:13])-[#6:14](-[#1:15])(-[#1:16])-[#7:17](-[#1:18])-[#1:19])-[#6:20](-[#8:21])-[O]-[H]", []),
        "LYS": ("[H]-[#7:1](-[#1:2])-[#6@@:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#6:8](-[#1:9])(-[#1:10])-[#6:11](-[#1:12])(-[#1:13])-[#6:14](-[#1:15])(-[#1:16])-[#7&+:17](-[#1:18])(-[#1:19])-[#1:20])-[#6:21](=[#8:22])-[O]-[H]", []),
        "MET": ("[H]-[#7:1](-[#1:2])-[#6@@:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#6:8](-[#1:9])(-[#1:10])-[#16:11]-[#6:12](-[#1:13])(-[#1:14])-[#1:15])-[#6:16](=[#8:17])-[O]-[H]", []),
        "PHE": ("[H]-[#7:1](-[#1:2])-[#6@@:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#6:8]1:[#6:9](-[#1:10]):[#6:11](-[#1:12]):[#6:13](-[#1:14]):[#6:15](-[#1:16]):[#6:17]:1-[#1:18])-[#6:19](=[#8:20])-[O]-[H]", []),
        "PRO": ("[H]-[#7:1]1-[#6:2](-[#1:3])(-[#1:4])-[#6:5](-[#1:6])(-[#1:7])-[#6:8](-[#1:9])(-[#1:10])-[#6@@:11]-1(-[#1:12])-[#6:13](=[#8:14])-[O]-[H]", []),
        "SER": ("[H]-[#7:1](-[#1:2])-[#6@@:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#8:8]-[#1:9])-[#6:10](=[#8:11])-[O]-[H]", []),
        "THR": ("[H]-[#7:1](-[#1:2])-[#6@@:3](-[#1:4])(-[#6@@:5](-[#1:6])(-[#6:7](-[#1:8])(-[#1:9])-[#1:10])-[#8:11]-[#1:12])-[#6:13](=[#8:14])-[O]-[H]", []),
        "TRP": ("[H]-[#7:1](-[#1:2])-[#6@@:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#6:8]1=[#6:9](-[#1:10])-[#7:11](-[#1:12])-[#6:13]2:[#6:14](-[#1:15]):[#6:16](-[#1:17]):[#6:18](-[#1:19]):[#6:20](-[#1:21]):[#6:22]:2-1)-[#6:23](=[#8:24])-[O]-[H]", []),
        "TYR": ("[H]-[#7:1](-[#1:2])-[#6@@:3](-[#1:4])(-[#6:5](-[#1:6])(-[#1:7])-[#6:8]1:[#6:9](-[#1:10]):[#6:11](-[#1:12]):[#6:13](-[#8:14]-[#1:15]):[#6:16](-[#1:17]):[#6:18]:1-[#1:19])-[#6:20](=[#8:21])-[O]-[H]", []),
        "VAL": ("[H]-[#7:1](-[#1:2])-[#6@@:3](-[#1:4])(-[#6:5](-[#1:6])(-[#6:7](-[#1:8])(-[#1:9])-[#1:10])-[#6:11](-[#1:12])(-[#1:13])-[#1:14])-[#6:15](=[#8:16])-[O]-[H]", []),
    }
for name, substructure_and_caps in monomer_info.items():
    smarts, caps = substructure_and_caps
    engine.substructure_generator.add_monomer_as_smarts_fragment(smarts, name, caps, add_caps_from_discarded_ids=False)
viz = PolymerVisualizer3D(engine)
viz


## Limitations 
...

In [3]:
viz.highlights

set()