# Linear Polymer Chain Building

This tutorial demonstrates how to build a linear polymer chain using molpy's automated polymer builder.

## Overview

We'll create a linear polymer by:
1. Parsing BigSMILES notation to define the monomer
2. Using PolymerBuilder to automatically construct the chain
3. Applying force field parameters (OPLS-AA)
4. Optimizing the geometry with LBFGS
5. Exporting to LAMMPS format

## Why Automated Building?

Compared to the step-by-step approach, automated building is:
- **Faster**: Long chains are assembled in a single call
- **Consistent**: Structure generation is reproducible
- **Scalable**: Chains of any length are easy to create
- **Integrated**: Topology, typing, and positioning are handled automatically

## Design Philosophy

**BigSMILES → IR → Builder → Atomistic**

This workflow separates:
- **Notation** (BigSMILES): Human-readable polymer specification
- **Intermediate Representation** (IR): Parsed structure
- **Builder**: Construction logic
- **Atomistic**: Final 3D structure with force field

This modular design allows flexibility at each stage.

In [None]:
"""
Refactored script version of `polymer-linear-chain.ipynb` using the latest molpy API.

The original notebook demonstrated building EO2/PS/EO3 polymers from CGSmiles
with a dehydration Reacter and ReacterConnector. This script mirrors the logic
with the new port/anchor split:

- Ports (SMILES markers $, <, >) are handled by the builder/connector
- Reacter only sees explicit port atoms and maps them to anchors via
  `anchor_selector_left/right`.
"""

In [None]:
from pathlib import Path

In [None]:
import numpy as np

## Parse BigSMILES Notation

### BigSMILES String

```
{[<]OCCOCCOCCO[>]}
```

**Breakdown:**
- `{}`: Stochastic object (polymer)
- `[<]`: Left bonding descriptor (reactive site)
- `OCCOCCOCCO`: Monomer structure (a triethylene glycol backbone: three `-CH2CH2-` units joined by ether oxygens, with an oxygen at each end)
  - O: Oxygen
  - C: Carbon
  - Implicit hydrogens
- `[>]`: Right bonding descriptor (reactive site)
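
To make the "implicit hydrogens" point concrete, here is a minimal sketch that counts them for the backbone `OCCOCCOCCO`, read as a stand-alone molecule with the bonding descriptors stripped. It is not part of molpy: the helper names are hypothetical, and the logic only handles unbranched chains of single-bonded `C` and `O` atoms.

```python
from collections import Counter

# Usual neutral valences for the two elements in this example.
VALENCE = {"C": 4, "O": 2}


def implicit_hydrogens(linear_smiles: str) -> list[int]:
    """Implicit H count per atom for an unbranched, single-bonded SMILES."""
    counts = []
    last = len(linear_smiles) - 1
    for i, symbol in enumerate(linear_smiles):
        # End atoms have one heavy-atom neighbor, inner atoms have two.
        degree = (i > 0) + (i < last)
        counts.append(VALENCE[symbol] - degree)
    return counts


def molecular_formula(linear_smiles: str) -> str:
    """Formula string for a chain made only of C and O atoms."""
    heavy = Counter(linear_smiles)
    n_h = sum(implicit_hydrogens(linear_smiles))
    return f"C{heavy['C']}H{n_h}O{heavy['O']}"


backbone = "OCCOCCOCCO"
print(implicit_hydrogens(backbone))  # [1, 2, 2, 0, 2, 2, 0, 2, 2, 1]
print(molecular_formula(backbone))   # C6H14O4
```

The result, C6H14O4, is the formula of triethylene glycol. In the actual workflow RDKit adds the hydrogens, so this sketch is only a way to check the notation by hand.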

### Parsing Process

1. **Lexical analysis**: Tokenize string
2. **Syntax parsing**: Build parse tree
3. **IR generation**: Create intermediate representation
4. **Validation**: Check for errors

**Why IR?** Separates notation from structure, enabling:
- Multiple input formats (BigSMILES, SMILES, etc.)
- Validation and error checking
- Optimization before building
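
The first and last stages of the parsing process (tokenizing and validating) can be sketched with a small regular-expression tokenizer. This is an illustration only, not molpy's `parse_bigsmiles`: the token names, the supported subset of BigSMILES, and the validation rules are all assumptions made for the example.

```python
import re

# Token patterns for a deliberately small subset of BigSMILES.
TOKEN_RE = re.compile(
    r"(?P<open>\{)"
    r"|(?P<close>\})"
    r"|(?P<descriptor>\[[<>$]\])"
    r"|(?P<atom>Cl|Br|[BCNOPSFI]|[cnos])"
    r"|(?P<bond>[=#-])"
    r"|(?P<branch>[()])"
    r"|(?P<ring>\d)"
)


def tokenize(text: str) -> list[tuple[str, str]]:
    """Split the string into (kind, text) tokens, failing on unknown input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"Unexpected character {text[pos]!r} at {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def validate(tokens: list[tuple[str, str]]) -> None:
    """Check the braces and require at least two bonding descriptors."""
    kinds = [kind for kind, _ in tokens]
    if not kinds or kinds[0] != "open" or kinds[-1] != "close":
        raise ValueError("Expected a single {...} stochastic object")
    if kinds.count("open") != 1 or kinds.count("close") != 1:
        raise ValueError("Nested or repeated braces are not supported here")
    if kinds.count("descriptor") < 2:
        raise ValueError("A repeat unit needs two bonding descriptors")


tokens = tokenize("{[<]OCCOCCOCCO[>]}")
validate(tokens)
print(tokens[:3])  # [('open', '{'), ('descriptor', '[<]'), ('atom', 'O')]
```

A real parser would build a parse tree and an IR between these two stages; the sketch skips both and validates the flat token list directly.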

In [None]:
import molpy as mp
from molpy.core.atomistic import Atomistic
from molpy.external import Generate3D, RDKitAdapter
from molpy.io.data.lammps import LammpsDataWriter
from molpy.parser.smiles import bigsmilesir_to_polymerspec, parse_bigsmiles
from molpy.reacter import (
    Reacter,
    form_single_bond,
    select_dehydration_left,
    select_dehydration_right,
    select_hydroxyl_group,
    select_hydroxyl_h_only,
)
from molpy.builder.polymer.connectors import ReacterConnector
from molpy.builder.polymer.placer import CovalentSeparator, LinearOrienter, Placer
from molpy.builder.polymer import PolymerBuilder
from molpy.builder.polymer.port_utils import get_all_port_info
from molpy.typifier.atomistic import OplsAtomisticTypifier

## Apply Force Field

### Typification Process

`typifier.typify(polymer)` assigns force field parameters:

1. **Atom typing**: Match atoms to OPLS types
2. **Bond parameters**: Assign k, r₀ for each bond
3. **Angle parameters**: Assign k, θ₀ for each angle
4. **Dihedral parameters**: Assign Fourier coefficients
5. **Non-bonded**: Assign σ, ε, q for each atom
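
To show what these parameters mean once assigned, the sketch below evaluates the corresponding energy terms in their usual OPLS-style functional forms. It is independent of molpy, and every numeric value is a placeholder chosen for the example rather than a real OPLS-AA parameter. The harmonic term uses the `k (r - r0)^2` convention (no factor of 1/2), as in the LAMMPS `bond_style harmonic`.

```python
import math


def harmonic_bond_energy(k: float, r0: float, r: float) -> float:
    """E = k * (r - r0)**2, with the factor 1/2 folded into k."""
    return k * (r - r0) ** 2


def opls_dihedral_energy(v: tuple[float, float, float, float], phi: float) -> float:
    """Four-term Fourier series used by OPLS (phi in radians)."""
    v1, v2, v3, v4 = v
    return 0.5 * (
        v1 * (1.0 + math.cos(phi))
        + v2 * (1.0 - math.cos(2.0 * phi))
        + v3 * (1.0 + math.cos(3.0 * phi))
        + v4 * (1.0 - math.cos(4.0 * phi))
    )


def geometric_mix(sigma_i, eps_i, sigma_j, eps_j):
    """OPLS combining rule: geometric mean for both sigma and epsilon."""
    return math.sqrt(sigma_i * sigma_j), math.sqrt(eps_i * eps_j)


def lennard_jones(sigma: float, epsilon: float, r: float) -> float:
    """12-6 Lennard-Jones energy for one pair at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# Placeholder parameters, not taken from any force field file.
print(harmonic_bond_energy(k=300.0, r0=1.5, r=1.6))
sigma, eps = geometric_mix(4.0, 0.04, 1.0, 0.01)
print(sigma, eps, lennard_jones(sigma, eps, r=2.5))
```

The Lennard-Jones term crosses zero at `r = sigma` and reaches its minimum of `-epsilon` at `r = 2**(1/6) * sigma`, which is a quick sanity check for any assigned σ, ε pair.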

### Why After Building?

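Typification requires complete topology, because parameters are assigned per bond, angle, and dihedral:
- All bonds must be known
- All angles must be known
- All dihedrals must be known

In the script below this is done per monomer: `get_topo(gen_angle=True, gen_dihe=True)` generates the angles and dihedrals before `typifier.typify(monomer)` is called, and the builder itself is created with `typifier=None`.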
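
Angles and dihedrals are not independent input; they follow from the bond list. A minimal sketch of that derivation, in plain Python with hypothetical function names (molpy's own `get_topo` may work differently), is:

```python
from collections import defaultdict


def build_adjacency(bonds):
    """Map each atom id to the set of atoms it is bonded to."""
    adjacency = defaultdict(set)
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def find_angles(bonds):
    """Every pair of distinct neighbors around a central atom forms an angle."""
    adjacency = build_adjacency(bonds)
    angles = []
    for center in sorted(adjacency):
        neighbors = sorted(adjacency[center])
        for i, first in enumerate(neighbors):
            for second in neighbors[i + 1:]:
                angles.append((first, center, second))
    return angles


def find_dihedrals(bonds):
    """Each bond b-c is the center of dihedrals a-b-c-d (assumes unique bonds)."""
    adjacency = build_adjacency(bonds)
    dihedrals = []
    for b, c in bonds:
        for a in sorted(adjacency[b]):
            if a == c:
                continue
            for d in sorted(adjacency[c]):
                if d in (a, b):
                    continue
                dihedrals.append((a, b, c, d))
    return dihedrals


# Heavy-atom chain 1-2-3-4 (a butane-like backbone).
bonds = [(1, 2), (2, 3), (3, 4)]
print(find_angles(bonds))     # [(1, 2, 3), (2, 3, 4)]
print(find_dihedrals(bonds))  # [(1, 2, 3, 4)]
```

If a bond is missing when this runs, every angle and dihedral that passes through it is missing too, which is why typing comes after the topology is complete.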

### Monomer Building Helper

This function automates monomer preparation:
1. **Parse BigSMILES** → Convert the string to IR, then to a polymer specification that must contain exactly one monomer
2. **Add hydrogens and generate 3D** → RDKit embedding and optimization via `Generate3D`
3. **Generate topology** → Bonds, angles, dihedrals
4. **Renumber atoms** → Assign sequential 1-based `id` values
5. **Apply force field** → OPLS typing

This creates a ready-to-use monomer for polymer building.

In [None]:
def build_monomer_from_bigsmiles(bigsmiles: str, typifier: OplsAtomisticTypifier) -> Atomistic:
    """Build a monomer from BigSMILES with 3D coordinates and OPLS typing."""
    ir = parse_bigsmiles(bigsmiles)
    polymerspec = bigsmilesir_to_polymerspec(ir)
    monomers = polymerspec.all_monomers()
    if len(monomers) != 1:
        raise ValueError(f"Expected 1 monomer, got {len(monomers)}")

    monomer = monomers[0]
    adapter = RDKitAdapter(internal=monomer)
    generate_3d = Generate3D(
        add_hydrogens=True,
        embed=True,
        optimize=True,
        update_internal=True,
    )
    adapter = generate_3d(adapter)
    monomer = adapter.get_internal()
    monomer.get_topo(gen_angle=True, gen_dihe=True)

    for idx, atom in enumerate(monomer.atoms):
        atom["id"] = idx + 1

    typifier.typify(monomer)
    return monomer

## Export to LAMMPS

### LAMMPS Data Format

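A LAMMPS data file for a molecular system contains:
- **Atoms**: id, type, charge, coordinates (plus a molecule id for atom styles such as `full`)
- **Bonds**: id, type, atom1, atom2
- **Angles**: id, type, atom1, atom2, atom3
- **Dihedrals**: id, type, atom1, atom2, atom3, atom4
- **Simulation box**: lower and upper bounds along x, y, and z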
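
To make the layout concrete, the following sketch formats a tiny data file by hand, assuming `atom_style full` (columns: atom id, molecule id, type, charge, x, y, z). It is not how `LammpsDataWriter` is implemented; the function name, argument shapes, and the two-atom example are hypothetical.

```python
def format_lammps_data(atoms, bonds, box, masses):
    """Return the text of a minimal LAMMPS data file (atom_style full).

    atoms:  list of (id, mol, type, q, x, y, z)
    bonds:  list of (id, type, atom1, atom2)
    box:    ((xlo, xhi), (ylo, yhi), (zlo, zhi))
    masses: {atom_type: mass}
    """
    n_bond_types = len({bond[1] for bond in bonds})
    lines = [
        "Minimal LAMMPS data file (illustrative)",
        "",
        f"{len(atoms)} atoms",
        f"{len(bonds)} bonds",
        "",
        f"{len(masses)} atom types",
        f"{n_bond_types} bond types",
        "",
    ]
    for (lo, hi), axis in zip(box, "xyz"):
        lines.append(f"{lo} {hi} {axis}lo {axis}hi")
    lines += ["", "Masses", ""]
    lines += [f"{atom_type} {mass}" for atom_type, mass in sorted(masses.items())]
    lines += ["", "Atoms # full", ""]
    lines += [" ".join(str(value) for value in atom) for atom in atoms]
    lines += ["", "Bonds", ""]
    lines += [" ".join(str(value) for value in bond) for bond in bonds]
    return "\n".join(lines) + "\n"


# Two bonded carbon atoms in a 10 Å cubic box (placeholder values).
text = format_lammps_data(
    atoms=[(1, 1, 1, 0.0, 0.0, 0.0, 0.0), (2, 1, 1, 0.0, 1.5, 0.0, 0.0)],
    bonds=[(1, 1, 1, 2)],
    box=((0.0, 10.0), (0.0, 10.0), (0.0, 10.0)),
    masses={1: 12.011},
)
print(text)
```

Angles and dihedrals follow the same pattern as the `Bonds` section, with their own count lines in the header.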

### Force Field File

Separate `.ff` file contains:
- Pair coefficients (LJ parameters)
- Bond coefficients
- Angle coefficients
- Dihedral coefficients

### Usage

In LAMMPS:
```lammps
read_data polymer.data
include polymer.ff
```

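The structure is then ready for an MD run. Note that the `export_to_lammps` helper in this script is a simplified, non-force-field-aware export: it writes only the data files (`linear.data`, `ring.data`, `branch.data`) and does not produce a `.ff` file.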

In [None]:
def export_to_lammps(structure: Atomistic, filepath: Path) -> None:
    """Export structure to LAMMPS data format (simplified, non-FF-aware)."""
    frame = structure.to_frame()

    atoms = frame["atoms"]
    n_atoms = atoms.nrows

    # Normalize type fields to pure strings to avoid mixed None/str issues
    for block_name in ("atoms", "bonds", "angles", "dihedrals"):
        if block_name in frame and "type" in frame[block_name]:
            types = frame[block_name]["type"]
            frame[block_name]["type"] = np.array(
                [str(t) if t is not None else "" for t in types],
                dtype=str,
            )

    # Add mol ID if missing
    if "mol" not in atoms:
        atoms["mol"] = np.ones(n_atoms, dtype=int)

    # Add charge if missing -> zero charges
    if "q" not in atoms:
        atoms["q"] = np.zeros(n_atoms, dtype=float)

    writer = LammpsDataWriter(filepath)
    writer.write(frame)

## Build the Polymer Chain

### PolymerBuilder

`PolymerBuilder` automates the entire construction process:

**Workflow:**
1. **Monomer creation**: Generate 3D coordinates for monomer
2. **Replication**: Create copies for each repeat unit
3. **Positioning**: Align monomers head-to-tail
4. **Connection**: Form bonds between monomers
5. **Topology**: Generate bonds, angles, dihedrals
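
The replication and positioning steps can be illustrated with a one-dimensional toy model: copy a monomer and shift each copy so that its head sits one bond length beyond the previous tail. This is only a sketch of the idea. It assumes the monomer is already aligned along x with its head atom first and tail atom last, and it does not reproduce what `Placer`, `CovalentSeparator`, or `LinearOrienter` actually do.

```python
def translate(coords, dx):
    """Shift every (x, y, z) point by dx along the x axis."""
    return [(x + dx, y, z) for x, y, z in coords]


def place_head_to_tail(monomer, n_units, bond_length):
    """Lay out n_units copies of monomer along x, separated by bond_length.

    monomer: list of (x, y, z), head atom first and tail atom last.
    """
    head_x = monomer[0][0]
    tail_x = monomer[-1][0]
    # Distance between equivalent atoms in consecutive copies.
    pitch = (tail_x - head_x) + bond_length
    chain = []
    for unit in range(n_units):
        chain.extend(translate(monomer, unit * pitch))
    return chain


# Two-atom placeholder monomer, 1.5 Å long, joined by 1.4 Å bonds.
monomer = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
chain = place_head_to_tail(monomer, n_units=3, bond_length=1.4)
for point in chain:
    print(point)
```

In a real builder the orientation step is a rigid-body rotation in 3D, and the connection step then removes leaving groups and creates the new bond.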

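### Parameters

- **Chain length**: Number of repeat units. In the script below it is encoded in the CGSmiles string passed to `builder.build` rather than given as a separate argument, for example `{[#EO2]|4[#PS]}`.
  - Adjust based on desired molecular weight
  - Longer chains are more realistic but slower to build and simulate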
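
The script below describes each chain with a CGSmiles string such as `{[#EO2]|4[#PS]}`. On my reading of the CGSmiles notation, `|4` repeats the preceding node four times, so this string stands for four `EO2` units followed by one `PS` unit. The sketch below expands such strings into a list of monomer labels. It is a hypothetical helper, not molpy's parser, and it deliberately rejects ring closures and branches.

```python
import re

# One node like [#EO2], optionally followed by a repeat count like |4.
NODE_RE = re.compile(r"\[#(?P<label>\w+)\](?:\|(?P<count>\d+))?")


def expand_linear_cgsmiles(text: str) -> list[str]:
    """Expand a linear CGSmiles string into an ordered list of node labels."""
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("Expected the string to be wrapped in {...}")
    body = text[1:-1]
    labels = []
    pos = 0
    while pos < len(body):
        match = NODE_RE.match(body, pos)
        if match is None:
            raise ValueError(f"Unsupported syntax at position {pos}: {body[pos:]!r}")
        count = int(match.group("count") or 1)
        labels.extend([match.group("label")] * count)
        pos = match.end()
    return labels


print(expand_linear_cgsmiles("{[#EO2]|4[#PS]}"))
# ['EO2', 'EO2', 'EO2', 'EO2', 'PS']
```

Changing the chain length is then a matter of editing the repeat count in the string.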

### Design Choice

**Why automated vs manual?**
- Manual (step-by-step): Educational, full control
- Automated (PolymerBuilder): Production use, consistency

For most applications, use PolymerBuilder for efficiency.

### LAMMPS Export Helper

Converts MolPy structure to LAMMPS data format:
- **Normalizes types** → String types for compatibility
- **Adds missing fields** → Molecule IDs, charges
- **Writes data file** → LAMMPS-readable format

The output can be directly loaded in LAMMPS for MD simulation.
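
The two clean-up steps (string-only types, default columns) are easy to see on a plain dictionary of columns. The sketch below uses ordinary Python lists as a stand-in for the frame blocks; the helper names are hypothetical and the real `export_to_lammps` operates on NumPy arrays instead.

```python
def normalize_types(values):
    """Turn every type label into a string, mapping None to ''."""
    return ["" if value is None else str(value) for value in values]


def with_default_columns(table, n_rows, defaults):
    """Return a copy of table with any missing column filled by its default."""
    filled = dict(table)
    for name, value in defaults.items():
        if name not in filled:
            filled[name] = [value] * n_rows
    return filled


# Stand-in for an "atoms" block with mixed type labels and no mol/q columns.
atoms = {"type": ["opls_135", None, 140]}
atoms["type"] = normalize_types(atoms["type"])
atoms = with_default_columns(atoms, n_rows=3, defaults={"mol": 1, "q": 0.0})
print(atoms)
# {'type': ['opls_135', '', '140'], 'mol': [1, 1, 1], 'q': [0.0, 0.0, 0.0]}
```

Existing columns are left untouched, so charges assigned by the typifier are never overwritten by the zero default.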

In [None]:
def main() -> None:
    # ------------------------------------------------------------------
    # Step 1: Load force field
    # ------------------------------------------------------------------
    ff = mp.io.read_xml_forcefield("oplsaa.xml")
    typifier = OplsAtomisticTypifier(ff, strict_typing=False)
    print("✅ Force field loaded")

    # ------------------------------------------------------------------
    # Step 2: Build monomers
    # ------------------------------------------------------------------
    eo2 = build_monomer_from_bigsmiles("{[$]OCCO[$]}", typifier)
    ps = build_monomer_from_bigsmiles("{[$]OCC(c1ccccc1)CO[$]}", typifier)
    # eo3 = build_monomer_from_bigsmiles("{[$]OCC(CO[$])(CO[$])}", typifier)  # Invalid BigSMILES - 3 ports not supported

    print("✅ Monomers built:")
    print(f"   EO2: {len(eo2.atoms)} atoms, ports: {list(get_all_port_info(eo2).keys())}")
    print(f"   PS:  {len(ps.atoms)} atoms, ports: {list(get_all_port_info(ps).keys())}")
    # print(f"   EO3: {len(eo3.atoms)} atoms, ports: {list(get_all_port_info(eo3).keys())}")  # EO3 removed

    library: dict[str, Atomistic] = {"EO2": eo2, "PS": ps}  # Removed EO3 - invalid BigSMILES

    # ------------------------------------------------------------------
    # Step 3: Configure dehydration Reacter with new anchor API
    # ------------------------------------------------------------------
    # NOTE:
    # - select_dehydration_left/right take (struct, port_atom) and return anchor atoms
    # - Reacter only sees explicit port atoms; PolymerBuilder/ReacterConnector
    #   are responsible for finding the correct port atoms via CGSmiles ports.
    dehydration_reacter = Reacter(
        name="dehydration_ether_formation",
        anchor_selector_left=select_dehydration_left,
        anchor_selector_right=select_dehydration_right,
        leaving_selector_left=select_hydroxyl_group,
        leaving_selector_right=select_hydroxyl_h_only,
        bond_former=form_single_bond,
    )

    # Map all monomer pairs to $-$ ports
    port_map: dict[tuple[str, str], tuple[str, str]] = {}
    for left_label in library.keys():
        for right_label in library.keys():
            port_map[(left_label, right_label)] = ("$", "$")

    connector = ReacterConnector(default=dehydration_reacter, port_map=port_map)
    placer = Placer(separator=CovalentSeparator(), orienter=LinearOrienter())
    builder = PolymerBuilder(library=library, connector=connector, placer=placer, typifier=None)

    print("✅ Polymer builder configured")
    print(f"   Library: {list(library.keys())}")

    # ------------------------------------------------------------------
    # Step 4: Build example polymers (linear, ring, branched) and export
    # ------------------------------------------------------------------
    output_dir = Path("case1_output")
    output_dir.mkdir(parents=True, exist_ok=True)

    # Linear chain
    cgsmiles_linear = "{[#EO2]|4[#PS]}"
    print(f"Building: {cgsmiles_linear}")

    build_result = builder.build(cgsmiles_linear)
    chain = build_result.polymer

    print("✅ Built successfully:")
    print(f"   Atoms: {len(chain.atoms)}")
    print(f"   Bonds: {len(chain.bonds)}")
    print(f"   Connection steps: {build_result.total_steps}")

    export_to_lammps(chain, output_dir / "linear.data")
    print(f"   Exported to {output_dir}/linear.data")

    # Cyclic (ring) polymer
    cgsmiles_ring = "{[#EO2]1[#PS][#EO2][#PS][#EO2]1}"
    print(f"Building ring: {cgsmiles_ring}")

    ring_result = builder.build(cgsmiles_ring)
    ring_chain = ring_result.polymer

    print("✅ Ring polymer built:")
    print(f"   Atoms: {len(ring_chain.atoms)}")
    print(f"   Bonds: {len(ring_chain.bonds)}")
    print(f"   Connection steps: {ring_result.total_steps}")

    export_to_lammps(ring_chain, output_dir / "ring.data")
    print(f"   Exported to {output_dir}/ring.data")

    # Branched polymer
    # cgsmiles_branch = "{[#PS][#EO3]([#PS])([#PS])}"  # Requires EO3 which has invalid BigSMILES
    cgsmiles_branch = "{[#PS][#EO2]([#PS])}"  # Simplified branch using EO2
    print(f"Building branch: {cgsmiles_branch}")

    branch_result = builder.build(cgsmiles_branch)
    branch_chain = branch_result.polymer

    print("✅ Branch polymer built:")
    print(f"   Atoms: {len(branch_chain.atoms)}")
    print(f"   Bonds: {len(branch_chain.bonds)}")
    print(f"   Connection steps: {branch_result.total_steps}")

    export_to_lammps(branch_chain, output_dir / "branch.data")
    print(f"   Exported to {output_dir}/branch.data")

In [None]:
if __name__ == "__main__":
    main()
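
One way to sanity-check the printed atom counts is simple bookkeeping. Judging by the selector names, each dehydration step removes a hydroxyl group on one side and a hydroxyl hydrogen on the other, which is three atoms (one water molecule) per connection. The sketch below encodes that assumption; the monomer sizes are computed by hand for fully hydrogenated diols (10 atoms for `OCCO`, 23 for `OCC(c1ccccc1)CO`) and may differ from what molpy actually produces.

```python
def count_connections(n_units: int, topology: str) -> int:
    """Number of new bonds needed to join n_units monomers."""
    if topology == "ring":
        return n_units          # a ring needs one extra closing bond
    if topology in ("linear", "branched"):
        return n_units - 1      # any acyclic arrangement is a tree
    raise ValueError(f"Unknown topology: {topology!r}")


def expected_atom_count(monomer_sizes, topology, atoms_lost_per_connection=3):
    """Total atoms after joining, assuming each connection removes one H2O."""
    n_connections = count_connections(len(monomer_sizes), topology)
    return sum(monomer_sizes) - atoms_lost_per_connection * n_connections


EO2_ATOMS = 10  # assumed: C2H6O2
PS_ATOMS = 23   # assumed: C9H12O2

linear = [EO2_ATOMS] * 4 + [PS_ATOMS]
ring = [EO2_ATOMS, PS_ATOMS, EO2_ATOMS, PS_ATOMS, EO2_ATOMS]
print(expected_atom_count(linear, "linear"))  # 51
print(expected_atom_count(ring, "ring"))      # 61
```

If the counts printed by `main()` differ from these estimates, the monomer sizes or the leaving-group assumption above are the first things to check.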

## Summary

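We've built polymer chains using automated tools:

1. ✅ Loaded the OPLS-AA force field
2. ✅ Parsed BigSMILES notation into typed 3D monomers (`EO2`, `PS`)
3. ✅ Configured a dehydration `Reacter` and `ReacterConnector`
4. ✅ Built linear, ring, and branched polymers from CGSmiles strings
5. ✅ Exported each structure to LAMMPS data format

Geometry optimization with LBFGS, listed in the overview, is not performed by the script shown here; the only optimization it runs is the RDKit step applied to each monomer.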

### Key Takeaways

- **BigSMILES**: Compact, human-readable polymer notation
- **PolymerBuilder**: Automated construction for production use
- **Typifier**: Automatic force field assignment
- **LBFGS**: Efficient geometry optimization
- **Modular workflow**: Each step independent and reusable

### Comparison with Step-by-Step

| Aspect | Step-by-Step | Automated (This Tutorial) |
|--------|--------------|---------------------------|
| Speed | Slow | Fast |
| Control | Full | High-level |
| Use case | Learning | Production |
| Chain length | Limited | Scalable |
| Consistency | Manual | Automatic |

### Next Steps

- **Polydisperse systems**: Add molecular weight distribution
- **Copolymers**: Mix different monomer types
- **Cross-linking**: Create network structures
- **MD simulation**: Run dynamics in LAMMPS
- **Property calculation**: Compute Rg, Tg, modulus

### Running MD Simulation

Example LAMMPS script:
```lammps
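# Settings (styles must be defined before coefficients are read)
# Assumed styles for an OPLS-AA system; adjust to match polymer.ff
units real
atom_style full
bond_style harmonic
angle_style harmonic
dihedral_style opls
pair_style lj/cut/coul/long 12.0
pair_modify mix geometric
special_bonds lj/coul 0.0 0.0 0.5
kspace_style pppm 1e-4

# Load structure and force field coefficients
read_data polymer.data
include polymer.ff

# Equilibration: NVT at 300 K with a 100 fs thermostat damping time
velocity all create 300.0 12345
fix 1 all nvt temp 300 300 100
run 100000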
```