# Polymer Building: Step-by-Step Visualization

This tutorial demonstrates the fundamental process of building a polymer chain step-by-step, with visualization at each stage.

## Overview

We'll build a simple polymer by:
1. Creating a monomer from BigSMILES notation
2. Adding a second monomer with proper positioning
3. Connecting monomers through a chemical reaction (dehydration)

Each step exports a LAMMPS data file for visualization, allowing you to see exactly how the polymer grows.

## Design Rationale

**Why step-by-step?** This approach helps you understand:
- How monomers are positioned relative to each other
- How chemical reactions modify the structure
- How topology (bonds, angles, dihedrals) is managed
- How force field typing works at each stage

**Why dehydration reaction?** This is a common polymerization mechanism where:
- Two -OH groups react
- A water molecule (H₂O) is removed
- An ether bond (C-O-C) is formed
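
The mass balance can be checked with plain counting. This is a minimal sketch that assumes the fully hydrogenated monomer used later in this tutorial is triethylene glycol, C₆H₁₄O₄. It uses only `collections.Counter`, not any molpy API.

```python
from collections import Counter

# Element counts for triethylene glycol, HO-(CH2CH2O)3-H (assumed monomer)
teg = Counter({"C": 6, "H": 14, "O": 4})
water = Counter({"H": 2, "O": 1})

# R-OH + HO-R' -> R-O-R' + H2O: one water leaves per new ether bond
dimer = teg + teg
dimer.subtract(water)

print(dict(dimer))           # {'C': 12, 'H': 26, 'O': 7}
print(sum(teg.values()))     # 24 atoms per monomer
print(sum(dimer.values()))   # 45 atoms in the dimer (48 - 3)
```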


## Step 1: Import Required Libraries

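We need:
- `molpy` core modules (`Atom`, `Atomistic`) for molecular structures
- `parse_bigsmiles` and `bigsmilesir_to_monomer` for reading BigSMILES
- `RDKitAdapter` and `Generate3D` for 3D coordinate generation
- `Reacter` and its selector helpers for chemical reactions
- `OplsAtomisticTypifier` for force field assignment
- `LammpsDataWriter` for exporting LAMMPS data files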


```python
from pathlib import Path

import numpy as np

import molpy as mp
from molpy.core.atomistic import Atom, Atomistic
from molpy.external import Generate3D, RDKitAdapter
from molpy.io.data.lammps import LammpsDataWriter
from molpy.parser.smiles import bigsmilesir_to_monomer, parse_bigsmiles
from molpy.reacter import (
    Reacter,
    find_port_atom,
    form_single_bond,
    select_c_neighbor,
    select_hydroxyl_group,
    select_hydroxyl_h_only,
)
from molpy.typifier.atomistic import OplsAtomisticTypifier
```

## Step 2: Define Helper Functions

### Monomer Builder

The `build_monomer()` function:
1. Parses the BigSMILES string `{[][<]OCCOCCOCCO[>][]}`
   - `[<]` and `[>]` are the reactive ports (connection points); the outer `[]` are empty terminal bonding descriptors
   - `OCCOCCOCCO` is the monomer backbone: triethylene glycol, HO-(CH₂CH₂O)₃-H, i.e. three ethylene oxide units with -OH ends
2. Adds hydrogens and embeds 3D coordinates using RDKit
3. Optimizes the geometry

**Design choice:** We use BigSMILES because it explicitly marks reactive sites, making it clear where polymerization occurs.
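
To make the notation concrete, the hypothetical helper `describe_bigsmiles` below pulls the bonding descriptors out of a single stochastic object with a regular expression and returns what is left as the backbone. It is an illustration under simplifying assumptions (one stochastic object, no nesting, no end groups), not a substitute for `parse_bigsmiles`.

```python
import re

# Bonding descriptors: [], [<], [>], [$], optionally followed by an integer id
DESCRIPTOR = re.compile(r"\[[<>$]?\d*\]")


def describe_bigsmiles(s: str) -> dict:
    """Split one stochastic object {...} into bonding descriptors and backbone.

    Hypothetical helper for illustration only.
    """
    if not (s.startswith("{") and s.endswith("}")):
        raise ValueError("expected a single stochastic object {...}")
    body = s[1:-1]
    return {
        "descriptors": DESCRIPTOR.findall(body),
        "backbone": DESCRIPTOR.sub("", body),
    }


info = describe_bigsmiles("{[][<]OCCOCCOCCO[>][]}")
print(info["descriptors"])            # ['[]', '[<]', '[>]', '[]']
print(info["backbone"])               # OCCOCCOCCO
print(info["backbone"].count("CCO"))  # 3 ethylene oxide units
```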


```python
def build_monomer() -> Atomistic:
    """Build monomer with -OH end groups from BigSMILES."""
    bigsmiles = "{[][<]OCCOCCOCCO[>][]}"
    ir = parse_bigsmiles(bigsmiles)
    monomer = bigsmilesir_to_monomer(ir)

    adapter = RDKitAdapter(internal=monomer)
    generate_3d = Generate3D(
        add_hydrogens=True,
        embed=True,
        optimize=True,
        update_internal=True,
    )
    adapter = generate_3d(adapter)
    monomer = adapter.get_internal()
    return monomer
```

### LAMMPS Export Function

The `export_frame_to_lammps()` function:
1. Converts Atomistic to Frame (molpy's data structure)
2. Adds simulation box
3. Ensures required fields (id, mol, q) are present
4. Writes LAMMPS data file

**Why export at each step?** This allows visualization in tools like OVITO or VMD to see the structural changes.
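
`LammpsDataWriter` handles the file format for you. For readers who have not seen a LAMMPS data file, the sketch below writes just the header and the `Atoms` section in `atom_style full` column order (atom-ID, molecule-ID, atom-type, q, x, y, z), which is where the `id`, `mol` and `q` fields above end up. `format_full_atoms` is a hypothetical helper, not molpy's writer; it assumes integer atom types numbered from 1 and a cubic box from 0 to `box_length`, and it omits the Masses, Bonds, Angles and Dihedrals sections a complete file for this tutorial would carry.

```python
def format_full_atoms(ids, mol_ids, types, charges, coords, box_length):
    """Return the header and Atoms section of a LAMMPS data file (atom_style full).

    Sketch only: Masses/Bonds/Angles/Dihedrals sections are omitted.
    """
    lines = [
        "LAMMPS data file (atoms-only sketch)",
        "",
        f"{len(ids)} atoms",
        f"{max(types)} atom types",  # assumes types are numbered 1..N
        "",
        f"0.0 {box_length:.1f} xlo xhi",
        f"0.0 {box_length:.1f} ylo yhi",
        f"0.0 {box_length:.1f} zlo zhi",
        "",
        "Atoms # full",
        "",
    ]
    # atom_style full column order: atom-ID molecule-ID atom-type q x y z
    for atom_id, mol_id, atom_type, q, (x, y, z) in zip(ids, mol_ids, types, charges, coords):
        lines.append(f"{atom_id} {mol_id} {atom_type} {q:.4f} {x:.4f} {y:.4f} {z:.4f}")
    return "\n".join(lines) + "\n"


# A water-like toy molecule: three atoms, two atom types, all in molecule 1
text = format_full_atoms(
    ids=[1, 2, 3],
    mol_ids=[1, 1, 1],
    types=[1, 2, 2],
    charges=[-0.834, 0.417, 0.417],
    coords=[(0.0, 0.0, 0.0), (0.9572, 0.0, 0.0), (-0.24, 0.927, 0.0)],
    box_length=20.0,
)
print(text)
```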


```python
def export_frame_to_lammps(
    atomistic: Atomistic,
    output_path: Path,
    box_size: float = 20.0,
) -> None:
    """Export a typified Atomistic structure to a LAMMPS data file (atom_style full)."""
    frame = atomistic.to_frame()
    frame.metadata["box"] = mp.Box.cubic(length=box_size)

    n_atoms = frame["atoms"].nrows

    # Ensure an ID field (atoms without an ID fall back to their 1-based position)
    if "id" not in frame["atoms"]:
        ids = [atom.get("id", i) for i, atom in enumerate(atomistic.atoms, start=1)]
        frame["atoms"]["id"] = np.array(ids, dtype=int)

    # Assign every atom to molecule 1 (a single chain)
    frame["atoms"]["mol"] = np.ones(n_atoms, dtype=int)

    # atom_style full expects a charge column named 'q'
    if "charge" in frame["atoms"]:
        frame["atoms"]["q"] = frame["atoms"]["charge"]
    elif "q" not in frame["atoms"]:
        # No charges assigned: write neutral atoms rather than omit the column
        frame["atoms"]["q"] = np.zeros(n_atoms, dtype=float)

    writer = LammpsDataWriter(output_path, atom_style="full")
    writer.write(frame)
```

## Step 3: Load Force Field

We use OPLS-AA (Optimized Potentials for Liquid Simulations - All Atom):
- Widely used for organic molecules
- Provides parameters for bonds, angles, dihedrals, and non-bonded interactions
- The typifier automatically assigns atom types based on chemical environment

**Why strict typing?** With `strict_typing=True`, the typifier treats any atom it cannot assign an OPLS-AA type as an error, so missing parameters are caught here rather than surfacing later in the simulation.
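
A toy illustration of what strict typing buys you: the hypothetical `assign_types` function below types atoms from their element plus bonded neighbors and, in strict mode, raises on any atom with no matching rule instead of returning it untyped. The rule table and its labels are placeholders, not OPLS-AA definitions, and this is not how `OplsAtomisticTypifier` is implemented.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical rule table: (element, sorted neighbor elements) -> type label.
# The labels are placeholders for illustration, not real OPLS-AA type names.
RULES: Dict[Tuple[str, Tuple[str, ...]], str] = {
    ("O", ("C", "H")): "hydroxyl_O",
    ("O", ("C", "C")): "ether_O",
    ("H", ("O",)): "hydroxyl_H",
}


def assign_types(
    elements: Sequence[str],
    neighbors: Sequence[Sequence[int]],
    strict: bool = True,
) -> List[Optional[str]]:
    """Assign a type label to each atom from its element and bonded neighbors."""
    types: List[Optional[str]] = []
    missing: List[int] = []
    for idx, elem in enumerate(elements):
        key = (elem, tuple(sorted(elements[j] for j in neighbors[idx])))
        label = RULES.get(key)
        if label is None:
            missing.append(idx)
        types.append(label)
    if strict and missing:
        # Strict mode: fail immediately instead of returning untyped atoms
        raise ValueError(f"no type rule for atoms {missing}")
    return types


# C-O-H fragment: the carbon has no rule in this toy table
elements = ["C", "O", "H"]
neighbors = [[1], [0, 2], [1]]
print(assign_types(elements, neighbors, strict=False))  # [None, 'hydroxyl_O', 'hydroxyl_H']
try:
    assign_types(elements, neighbors, strict=True)
except ValueError as err:
    print(err)  # no type rule for atoms [0]
```

Because the rule key includes the neighbors, the same oxygen changes type when its environment changes: a hydroxyl O bonded to (C, H) becomes an ether O bonded to (C, C) after dehydration. This is why the notebook re-runs `get_topo()` and `typify()` after every structural change.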


```python
# Create output directory
output_dir = Path("case0_output")
output_dir.mkdir(parents=True, exist_ok=True)

# Load force field
ff = mp.io.read_xml_forcefield("oplsaa.xml")
typifier = OplsAtomisticTypifier(ff, strict_typing=True)
print("✅ Force field loaded successfully")
```

## Step 4: Build and Export First Monomer

Process:
1. Build monomer from BigSMILES
2. Generate topology (bonds, angles, dihedrals)
3. Assign force field types
4. Merge into polymer container
5. Export to LAMMPS

**Why generate topology?** The typifier needs to know the bonding pattern to assign correct atom types.

**Why merge into polymer?** This creates a container that will hold the growing polymer chain.
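
Angles and dihedrals follow mechanically from the bond list. The sketch below shows one way to enumerate them; `angles_and_dihedrals` is a hypothetical helper for illustration, and molpy's `get_topo()` may differ in ordering and in how it handles rings.

```python
from collections import defaultdict
from itertools import combinations


def angles_and_dihedrals(bonds):
    """Enumerate angles (i-j-k) and proper dihedrals (i-j-k-l) from a bond list."""
    adj = defaultdict(set)
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)

    # Angle: any two distinct neighbors i, k of a central atom j
    angles = []
    for j in sorted(adj):
        for i, k in combinations(sorted(adj[j]), 2):
            angles.append((i, j, k))

    # Dihedral: for each bond j-k, pair a neighbor i of j with a neighbor l of k
    dihedrals = []
    for j, k in bonds:
        for i in sorted(adj[j] - {k}):
            for l in sorted(adj[k] - {j}):
                if i != l:  # skip three-membered rings
                    dihedrals.append((i, j, k, l))
    return angles, dihedrals


# Four-atom chain 0-1-2-3 (e.g. the heavy atoms of C-O-C-C)
angles, dihedrals = angles_and_dihedrals([(0, 1), (1, 2), (2, 3)])
print(angles)     # [(0, 1, 2), (1, 2, 3)]
print(dihedrals)  # [(0, 1, 2, 3)]
```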


```python
monomer = build_monomer()
monomer.get_topo(gen_angle=True, gen_dihe=True)
typifier.typify(monomer)

polymer = Atomistic()
polymer.merge(monomer)

print("✅ Step 1 completed:")
print(f"   Monomer: {len(monomer.atoms)} atoms")
print(f"   Polymer: {len(polymer.atoms)} atoms")

export_frame_to_lammps(polymer, output_dir / "step1_monomer.data", box_size=20.0)
print(f"   Exported: {output_dir / 'step1_monomer.data'}")

left_monomer = monomer
```

## Step 5: Add Second Monomer with Positioning

This is the most complex step. We need to:
1. Build a second monomer
2. Orient it so its reactive port faces the first monomer
3. Position it next to the first monomer's port

### Positioning Strategy

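**Rotation (applied first):** Turn monomer2 so that its port direction (from its center to port2) points back toward monomer1
- Calculate the outward direction from the polymer center to port1
- Take the rotation axis as the cross product of monomer2's port direction and the reversed outward direction
- Rotate about monomer2's center by the angle between those two directions, arccos(u · v) for unit vectors u and v

**Translation (applied second):** Move monomer2 so its port sits just beyond monomer1's port
- Place port2 on the outward direction, 2 × 1.43 Å (twice the C-O bond length) = 2.86 Å from port1, close enough to react without overlapping
- Compute the translation only after rotating, because the rotation moves port2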

**Why this approach?** Proper positioning ensures:
- Monomers are close enough to react
- Ports are aligned for bond formation
- No atomic overlaps that would cause simulation issues
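
The positioning math can be checked in isolation. This sketch uses hypothetical helpers, `rotation_matrix` (Rodrigues' formula, right-handed) and `place_fragment`, and works on a plain NumPy coordinate array rather than a molpy `Atomistic`. It rotates by the angle between the current and desired port directions, then translates so the port lands `gap` Å past the anchor along the outward direction.

```python
import numpy as np


def rotation_matrix(axis, angle):
    """Right-handed rotation by `angle` radians about `axis` (Rodrigues' formula)."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([
        [0.0, -k[2], k[1]],
        [k[2], 0.0, -k[0]],
        [-k[1], k[0], 0.0],
    ])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def place_fragment(coords, port_idx, anchor_pos, outward, gap):
    """Rotate a fragment so its port points along -outward, then translate it so
    the port sits `gap` Å beyond `anchor_pos` along `outward`."""
    coords = np.asarray(coords, dtype=float)
    outward = np.asarray(outward, dtype=float)
    outward = outward / np.linalg.norm(outward)
    center = coords.mean(axis=0)

    # Unit vector from fragment center to its port, and the direction it should face
    u = coords[port_idx] - center
    u = u / np.linalg.norm(u)
    v = -outward

    axis = np.cross(u, v)
    if np.linalg.norm(axis) < 1e-8:
        # (Anti)parallel: any axis perpendicular to u will do
        helper = np.array([1.0, 0.0, 0.0]) if abs(u[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
        axis = np.cross(u, helper)
    angle = np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

    # Rotate about the fragment center (row vectors, hence R.T)
    R = rotation_matrix(axis, angle)
    rotated = (coords - center) @ R.T + center

    # Translate last: the rotation has already moved the port
    target = np.asarray(anchor_pos, dtype=float) + gap * outward
    return rotated + (target - rotated[port_idx])


# Usage: a bent 3-atom fragment whose port (index 2) must face an anchor at the origin
fragment = [[0.0, 0.0, 0.0], [1.0, 0.5, 0.0], [1.5, 1.5, 0.3]]
placed = place_fragment(fragment, port_idx=2, anchor_pos=[0.0, 0.0, 0.0],
                        outward=[1.0, 0.0, 0.0], gap=2 * 1.43)
print(np.allclose(placed[2], [2.86, 0.0, 0.0]))  # True
```

Note that a fixed 180° turn about an axis perpendicular to u only reverses u, so it matches the target direction only in the special case where u already equals the outward direction; rotating by arccos(u · v) handles the general case.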


```python
monomer2 = build_monomer()

# Assign new IDs that continue after the largest ID in the polymer (must be unique)
max_id = max((atom.get("id", 0) for atom in polymer.atoms), default=0)
for atom in monomer2.atoms:
    max_id += 1
    atom["id"] = max_id

# Get port atoms: ">" on polymer, "<" on monomer2
port1_atom: Atom | None = None
port2_atom: Atom | None = None

for atom in polymer.atoms:
    if atom.get("port") == ">":
        port1_atom = atom
        break

for atom in monomer2.atoms:
    if atom.get("port") == "<":
        port2_atom = atom
        break

if port1_atom is None or port2_atom is None:
    raise RuntimeError("Could not find ports '>' or '<' on monomers")


def atom_xyz(atom: Atom) -> np.ndarray:
    """Return an atom's Cartesian coordinates as a float array."""
    return np.array([atom["x"], atom["y"], atom["z"]], dtype=float)


# Outward direction: from the polymer center toward port 1
port1_pos = atom_xyz(port1_atom)
polymer_center = np.mean([atom_xyz(atom) for atom in polymer.atoms], axis=0)
direction = port1_pos - polymer_center
direction = direction / np.linalg.norm(direction)

# Rotation: turn monomer2 so its port direction (center -> port 2)
# points back toward the polymer, i.e. along -direction
monomer2_center = np.mean([atom_xyz(atom) for atom in monomer2.atoms], axis=0)
port2_direction = atom_xyz(port2_atom) - monomer2_center
port2_direction = port2_direction / np.linalg.norm(port2_direction)
target_direction = -direction

# Rotation angle = angle between the two unit vectors (radians)
angle = float(np.arccos(np.clip(np.dot(port2_direction, target_direction), -1.0, 1.0)))
axis = np.cross(port2_direction, target_direction)
if np.linalg.norm(axis) < 1e-6:
    # (Anti)parallel case: use any axis perpendicular to port2_direction
    helper = np.array([1.0, 0.0, 0.0])
    if abs(port2_direction[0]) >= 0.9:
        helper = np.array([0.0, 1.0, 0.0])
    axis = np.cross(port2_direction, helper)
axis = axis / np.linalg.norm(axis)

if angle > 1e-6:
    # Assumes rotate() works in place, right-handed, with the angle in radians
    monomer2.rotate(axis=axis.tolist(), angle=angle, about=monomer2_center.tolist())

# Translation: computed after rotating, because the rotation moves port 2
bond_length = 1.43  # typical C-O single bond length (Å)
target_pos = port1_pos + direction * (bond_length * 2)
translation = target_pos - atom_xyz(port2_atom)
monomer2.move(delta=translation.tolist())

# Typify the positioned monomer, then merge and re-typify the combined system
monomer2.get_topo(gen_angle=True, gen_dihe=True)
typifier.typify(monomer2)

polymer.merge(monomer2)
polymer.get_topo(gen_angle=True, gen_dihe=True)
typifier.typify(polymer)

print("✅ Step 2 completed:")
print(f"   Added monomer: {len(monomer2.atoms)} atoms")
print(f"   Polymer total: {len(polymer.atoms)} atoms")

export_frame_to_lammps(polymer, output_dir / "step2_two_monomers.data", box_size=25.0)
print(f"   Exported: {output_dir / 'step2_two_monomers.data'}")

# Keep copies for the reaction step
left_monomer_copy = left_monomer.copy()
right_monomer_copy = monomer2.copy()
```

## Step 6: Connect Monomers via Dehydration Reaction

### Reaction Mechanism

Dehydration reaction: R-OH + HO-R' → R-O-R' + H₂O

**Reacter Configuration:**
- `anchor_selector_left`: Selects C neighbor of -OH (where new bond forms)
- `anchor_selector_right`: Selects O atom itself
- `leaving_selector_left`: Selects entire -OH group
- `leaving_selector_right`: Selects only H from -OH
- `bond_former`: Creates C-O single bond

**Why this design?**
- Left side: C-OH → C (remove OH, keep C)
- Right side: HO-R → O-R (remove H, keep O)
- Result: C-O bond formation, H₂O removed

**Port atoms:** We use `find_port_atom()` to locate the reactive sites marked in BigSMILES.
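
The bookkeeping that `Reacter` performs can be mimicked on a bare molecular graph. The hypothetical `condense` function below removes the selected leaving atoms and their bonds, then bonds the two anchors. It is a sketch of the idea on two methanol molecules, not the `Reacter` implementation, and it ignores coordinates and topology regeneration.

```python
from typing import Dict, FrozenSet, Set, Tuple

Bond = FrozenSet[str]


def condense(
    atoms: Dict[str, str],
    bonds: Set[Bond],
    anchor_left: str,
    anchor_right: str,
    leaving: Set[str],
) -> Tuple[Dict[str, str], Set[Bond], Bond]:
    """Remove leaving atoms (and every bond touching them), then bond the two anchors."""
    kept_atoms = {a: e for a, e in atoms.items() if a not in leaving}
    kept_bonds = {b for b in bonds if not (b & leaving)}
    new_bond = frozenset((anchor_left, anchor_right))
    kept_bonds.add(new_bond)
    return kept_atoms, kept_bonds, new_bond


# Two methanol molecules (methyl hydrogens omitted to keep the example short)
atoms = {"C1": "C", "O1": "O", "H1": "H", "C2": "C", "O2": "O", "H2": "H"}
bonds = {frozenset(p) for p in [("C1", "O1"), ("O1", "H1"), ("C2", "O2"), ("O2", "H2")]}

# Left: anchor = C next to the port O; leaving = whole -OH (O1, H1)
# Right: anchor = port O itself; leaving = only its H (H2)
leaving = {"O1", "H1"} | {"H2"}
product_atoms, product_bonds, new_bond = condense(atoms, bonds, "C1", "O2", leaving)

print(sorted(product_atoms))                     # ['C1', 'C2', 'O2']
print(sorted(atoms[a] for a in leaving))         # ['H', 'H', 'O']  (one water)
print(sorted(sorted(b) for b in product_bonds))  # [['C1', 'O2'], ['C2', 'O2']]
```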


```python
# Define dehydration reaction
dehydration = Reacter(
    name="dehydration_ether",
    anchor_selector_left=select_c_neighbor,  # C next to -OH
    anchor_selector_right=lambda struct, port_atom: port_atom,  # O atom itself
    leaving_selector_left=select_hydroxyl_group,  # Remove -OH
    leaving_selector_right=select_hydroxyl_h_only,  # Remove H only
    bond_former=form_single_bond,  # Create C-O bond
)

# Run reaction
result = dehydration.run(
    left=left_monomer_copy,
    right=right_monomer_copy,
    port_atom_L=find_port_atom(left_monomer_copy, ">"),
    port_atom_R=find_port_atom(right_monomer_copy, "<"),
    compute_topology=True,
)

product = result.product_info.product

# Typify product
product.get_topo(gen_angle=True, gen_dihe=True)
typifier.typify(product)

print("✅ Step 3 completed:")
print(f"   Reacted polymer: {len(product.atoms)} atoms")
print(f"   Removed atoms: {len(result.topology_changes.removed_atoms)} atoms (water: O+H+H)")
print(f"   New bonds: {len(result.topology_changes.new_bonds)} bonds")

export_frame_to_lammps(product, output_dir / "step3_connected.data", box_size=25.0)
print(f"   Exported: {output_dir / 'step3_connected.data'}")
```

## Summary

We've successfully demonstrated the step-by-step polymer building process:

1. ✅ Built monomer from BigSMILES notation
2. ✅ Positioned second monomer with proper alignment
3. ✅ Connected monomers through dehydration reaction

**Key Takeaways:**
- BigSMILES provides clear reactive site marking
- Proper positioning is crucial for successful reactions
- Reacter handles complex bond formation and atom removal
- Force field typing must be updated after structural changes

**Next Steps:**
- Scale up to longer chains using automated builders
- Add polydispersity for realistic molecular weight distributions
- Create cross-linked networks for thermoset polymers
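
When scaling up, a quick atom-count check helps verify a builder. Assuming each linkage is one dehydration that removes a single water (3 atoms) and each all-atom monomer has 24 atoms (C₆H₁₄O₄), a linear chain of n monomers has 24n − 3(n − 1) = 21n + 3 atoms. The helper below is a hypothetical sketch of that arithmetic, not part of molpy.

```python
def chain_atom_count(n_monomers: int, atoms_per_monomer: int = 24, atoms_lost_per_link: int = 3) -> int:
    """Atoms in a linear chain built by repeated dehydration (one H2O lost per new bond)."""
    if n_monomers < 1:
        raise ValueError("need at least one monomer")
    return n_monomers * atoms_per_monomer - atoms_lost_per_link * (n_monomers - 1)


for n in (1, 2, 10):
    print(n, chain_atom_count(n))
# 1 24
# 2 45
# 10 213
```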

**Visualization:**
Load the exported `.data` files in OVITO or VMD to see the structural evolution!
