# Molecular Building

Whether you're assembling polymers or packing solvent boxes, MolPy's building tools give you flexible, composable workflows for molecular construction.

**Design Principles:**

- **Composable** – work with `Atomistic`, `Frame`, and `Box` data structures
- **Engine-agnostic** – export later to LAMMPS, OpenMM, or other simulation engines
- **Script-friendly** – small, chainable functions you can call from Python scripts or the CLI

In this guide, you'll learn:

- How to build linear polymers from monomer templates
- How to pack molecules into simulation boxes
- How these pieces fit into a typical simulation workflow

---

## 1. Building Linear Polymers

Polymer building in MolPy combines **monomer templates** with **chemical reaction rules** to assemble polymer chains. This approach gives you full control over polymerization chemistry while automating the tedious work of connecting monomers.

**Key components:**

- `Monomer` – an `Atomistic` structure with reactive **ports** (connection sites)
- `ReacterConnector` – handles port selection and bond formation chemistry
- `linear()` – assembles a polymer from a sequence of monomer labels
- `OplsAtomisticTypifier` (optional) – assigns force field atom types after assembly

### Complete Example: Building a Simple Polymer

Let's build a simple alternating copolymer from two monomer types (A and B). We'll start by creating the monomer structures:


```python
from molpy import Atom, Bond
from molpy.core.wrappers.monomer import Monomer
from molpy.builder.polymer.connectors import ReacterConnector
from molpy.builder.polymer.linear import linear
from molpy.typifier.atomistic import OplsAtomisticTypifier
```

```python
# Create monomer A: a simple CH2 unit
def create_monomer_A():
    """Create a simple -CH2- monomer with two reactive ports."""
    mono_A = Monomer(name="A")

    # Add atoms
    c = Atom(symbol="C", name="C")
    h1 = Atom(symbol="H", name="H1")
    h2 = Atom(symbol="H", name="H2")

    # Bond both hydrogens to the carbon
    mono_A.add_entity(c, h1, h2)
    mono_A.add_link(Bond(c, h1), Bond(c, h2))

    # Define ports: the carbon is the reactive site at both ends.
    # ReacterConnector uses the "left" and "right" ports to join monomers.
    mono_A.define_port("left", c, role="left")
    mono_A.define_port("right", c, role="right")

    return mono_A


# Create monomer B: identical to A here for simplicity;
# in a real copolymer, B would be a different repeat unit
def create_monomer_B():
    """Create another -CH2- monomer (identical to A for simplicity)."""
    mono_B = Monomer(name="B")

    # Same atoms, bonds, and ports as monomer A
    c = Atom(symbol="C", name="C")
    h1 = Atom(symbol="H", name="H1")
    h2 = Atom(symbol="H", name="H2")

    mono_B.add_entity(c, h1, h2)
    mono_B.add_link(Bond(c, h1), Bond(c, h2))

    mono_B.define_port("left", c, role="left")
    mono_B.define_port("right", c, role="right")

    return mono_B


# Build the monomer library: sequence labels -> monomer templates
mono_A = create_monomer_A()
mono_B = create_monomer_B()
library = {"A": mono_A, "B": mono_B}

print(f"Monomer A: {len(list(mono_A.atoms))} atoms, {len(mono_A.ports)} ports")
print(f"Monomer B: {len(list(mono_B.atoms))} atoms, {len(mono_B.ports)} ports")
```

Now that we have our monomer library, we can assemble a polymer using the `linear()` function:


```python
# Set up the connector and optional typifier
connector = ReacterConnector()  # Handles port selection and bond formation
typifier = OplsAtomisticTypifier()  # Optional: assign OPLS atom types

# Build an alternating ABABAB polymer
poly = linear(
    sequence="ABABAB",  # Monomer sequence
    library=library,
    connector=connector,
    typifier=typifier,
)

print(f"Polymer: {poly}")
print(f"Total atoms: {len(list(poly.atoms))}")
print(f"Total bonds: {len(list(poly.bonds))}")
```
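
Before trusting the printed totals, it helps to know what to expect. The sketch below is plain Python with no MolPy calls, and `expected_linear_counts` is a hypothetical helper. It assumes each junction between neighbouring monomers adds exactly one bond and deletes no atoms, which is plausible here because the ports sit directly on the carbon and no leaving atoms are defined. If your reaction removes leaving groups, the real totals drop by those atoms and their bonds.

```python
def expected_linear_counts(
    n_monomers: int, atoms_per_monomer: int, bonds_per_monomer: int
) -> tuple[int, int]:
    """Hypothetical helper: atom and bond totals for a linear chain.

    Assumes every junction adds exactly one bond and removes no atoms.
    """
    junctions = max(n_monomers - 1, 0)  # n monomers in a line -> n - 1 junctions
    atoms = n_monomers * atoms_per_monomer
    bonds = n_monomers * bonds_per_monomer + junctions
    return atoms, bonds


# Each CH2 monomer above has 3 atoms and 2 C-H bonds
print(expected_linear_counts(len("ABABAB"), 3, 2))  # (18, 17)
```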


### How It Works

The `linear()` function automates the polymerization workflow:

1. **Validation** – Checks that all sequence labels exist in your library
2. **Port selection** – Uses the connector to pick compatible ports between monomers
3. **Bond formation** – Creates chemical bonds and removes leaving groups (via the `reacter` module)
4. **Retypification** – Optionally reassigns atom types after all bonds are formed
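
To make steps 1 to 3 concrete, here is a toy model in plain Python. It is not MolPy's implementation: `ToyMonomer`, `ToyChain`, and `toy_linear` are hypothetical names. Each monomer is reduced to a list of elements, a list of intra-monomer bonds (as local atom indices), and a mapping from port name to atom index, and the "reaction" is simply a new bond from the previous monomer's `right` port to the next monomer's `left` port. Leaving-group removal and retypification (step 4) are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMonomer:
    """Hypothetical stand-in for a monomer template."""
    elements: list[str]
    bonds: list[tuple[int, int]]  # local atom indices
    ports: dict[str, int]         # port name -> local atom index


@dataclass
class ToyChain:
    elements: list[str] = field(default_factory=list)
    bonds: list[tuple[int, int]] = field(default_factory=list)  # global indices


def toy_linear(sequence, library: dict[str, ToyMonomer]) -> ToyChain:
    # 1. Validation: every label in the sequence must exist in the library
    missing = sorted(set(sequence) - set(library))
    if missing:
        raise KeyError(f"labels not in library: {missing}")

    chain = ToyChain()
    prev_right = None  # global index of the previous monomer's "right" port atom
    for label in sequence:
        mono = library[label]
        offset = len(chain.elements)  # shift local indices to global ones
        chain.elements.extend(mono.elements)
        chain.bonds.extend((i + offset, j + offset) for i, j in mono.bonds)
        # 2-3. Port selection and bond formation: previous "right" -> current "left"
        if prev_right is not None:
            chain.bonds.append((prev_right, mono.ports["left"] + offset))
        prev_right = mono.ports["right"] + offset
    return chain


ch2 = ToyMonomer(elements=["C", "H", "H"], bonds=[(0, 1), (0, 2)], ports={"left": 0, "right": 0})
chain = toy_linear("ABABAB", {"A": ch2, "B": ch2})
print(len(chain.elements), len(chain.bonds))  # 18 17
```

The index offset is the essential bookkeeping: each template uses local indices, so they must be shifted before bonds are merged into the growing chain.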

**When to use this approach:**
- Building linear polymers from repeating units
- Cases that need explicit control over connection chemistry
- Systems where force field typing depends on the local bonding environment

For more complex polymer architectures (branched, star, comb), see the full tutorial: `tutorials/polymer-building.ipynb`.

---

## 2. Packing Molecules into a Box

Once you have a molecule (from building, reactions, or file input), you often need to create a simulation-ready system with many copies distributed in a box. The `molpy.pack` module handles this using external optimization engines like **Packmol**.

**Why packing matters:**
- Solvation: surround your molecule with water or organic solvent
- Bulk systems: create liquids, polymer melts, or mixed-phase systems
- Starting configurations: a good initial configuration avoids artifacts such as overlapping atoms (which cause very large initial forces) and unrealistic densities

### Complete Example: Packing a Simple Molecule

Let's pack 100 copies of a simple 3-atom molecule into a cubic box:


```python
from pathlib import Path
import molpy as mp
from molpy.pack.constraints import BoxConstraint
from molpy.pack.molpack import Molpack

# 1. Create a single-molecule template as a Frame
#    This represents one copy of the molecule to be packed
frame = mp.Frame()
atoms = mp.Block()
atoms["x"] = [0.0, 1.0, 2.0]
atoms["y"] = [0.0, 0.0, 0.0]
atoms["z"] = [0.0, 0.0, 0.0]
atoms["element"] = ["C", "C", "C"]
frame["atoms"] = atoms

# Define the simulation box (20 Å cube)
frame.metadata["box"] = mp.Box.cubic(20.0)

# 2. Set up the packer
#    workdir: where intermediate files are written
#    packer: which backend to use ("packmol" requires Packmol installation)
workdir = Path("packing")
packer = Molpack(workdir=workdir, packer="packmol")

# 3. Define packing target
#    number: how many copies to pack
#    constraint: where to place them (inside the box)
constraint = BoxConstraint(box=frame.metadata["box"])
packer.add_target(frame, number=100, constraint=constraint)

# 4. Run the packing optimization
#    This calls Packmol to find a non-overlapping configuration
result = packer.optimize(max_steps=1000, seed=42)

print(f"Packed {result['atoms'].nrows} atoms")
print(f"Expected: {100 * 3} atoms (100 molecules × 3 atoms)")
print(f"Box: {result.metadata.get('box')}")
```


### Understanding the Packing Workflow

Here's what happens under the hood:

1. **Template definition** – You provide one or more molecular `Frame` objects
2. **Target specification** – You define how many copies and where to place them (via constraints)
3. **File generation** – `Molpack` writes intermediate structure files for the packer backend
4. **Optimization** – The backend (Packmol) finds a non-overlapping configuration
5. **Result parsing** – You get back a packed `Frame` ready for simulation or file I/O
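
The core idea behind step 4, placing copies so that no two atoms from different molecules come closer than a distance tolerance, can be illustrated with random sequential insertion. This is a deliberately simpler technique than Packmol's, which instead minimizes a penalty function for overlaps and constraint violations. The sketch below (`toy_pack` is a hypothetical name) only translates copies and never rotates them, ignores periodic images, and uses an assumed 2.0 Å tolerance for illustration.

```python
import math
import random


def toy_pack(template, n_copies, box_length, tolerance=2.0, max_attempts=10_000, seed=42):
    """Place rigid, translated copies of `template` (list of (x, y, z) in Å) in a cube.

    A candidate copy is rejected if any of its atoms lies closer than `tolerance`
    to an atom that is already placed. No rotations, no periodic boundaries.
    """
    rng = random.Random(seed)
    # Template extent, so every shifted copy stays fully inside [0, box_length]
    mins = [min(p[k] for p in template) for k in range(3)]
    maxs = [max(p[k] for p in template) for k in range(3)]
    placed = []
    for _ in range(n_copies):
        for _attempt in range(max_attempts):
            shift = [rng.uniform(-mins[k], box_length - maxs[k]) for k in range(3)]
            candidate = [tuple(p[k] + shift[k] for k in range(3)) for p in template]
            if all(math.dist(a, b) >= tolerance for a in candidate for b in placed):
                placed.extend(candidate)
                break
        else:  # no break: every attempt overlapped
            raise RuntimeError("could not place all copies; enlarge the box or reduce n_copies")
    return placed


coords = toy_pack([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)], n_copies=30, box_length=20.0)
print(len(coords))  # 90
```

Random insertion gets slow and eventually fails as the box fills up, which is one reason dedicated packers such as Packmol treat the problem as an optimization instead.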

**Parameter choices explained:**
- `number=100` – sets the system size. More molecules give better bulk statistics but cost more to pack and simulate; too few may not capture bulk behavior.
- `mp.Box.cubic(20.0)` – a 20 Å cube. The box must hold all molecules without forcing an unrealistically high density, so compare the resulting density with the target density of your system.
- `max_steps=1000` – more optimization steps can improve packing quality at the cost of run time. Start with 1000 and increase it if the result still contains overlaps.
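
To check whether a box size gives a sensible density, convert the molecule count and molar mass to grams and the box volume from Å³ to cm³ (1 cm³ = 10²⁴ Å³). The helpers below are plain Python with hypothetical names. Applied to the example above, where the template is three bare carbon atoms (about 36.03 g/mol), 100 copies in a 20 Å cube come to roughly 0.75 g/cm³; the inverse function gives the cube edge for a chosen target density.

```python
AVOGADRO = 6.02214076e23  # particles per mol (exact SI value)
A3_PER_CM3 = 1e24         # 1 cm^3 = (1e8 Å)^3 = 1e24 Å^3


def density_g_cm3(n_molecules: int, molar_mass_g_mol: float, box_volume_A3: float) -> float:
    """Mass density (g/cm^3) of n molecules in a box whose volume is given in Å^3."""
    mass_g = n_molecules * molar_mass_g_mol / AVOGADRO
    return mass_g / (box_volume_A3 / A3_PER_CM3)


def cubic_box_length_A(n_molecules: int, molar_mass_g_mol: float, target_g_cm3: float) -> float:
    """Edge length (Å) of the cube that holds n molecules at the target density."""
    volume_cm3 = n_molecules * molar_mass_g_mol / AVOGADRO / target_g_cm3
    return (volume_cm3 * A3_PER_CM3) ** (1.0 / 3.0)


c3_molar_mass = 3 * 12.011  # the packing template is three bare C atoms
print(round(density_g_cm3(100, c3_molar_mass, 20.0**3), 3))   # 0.748
print(round(cubic_box_length_A(100, c3_molar_mass, 1.0), 1))  # 18.2
```

A common practice is to pack somewhat below the target density and let a constant-pressure equilibration run bring the system to its final density.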

**When to use packing:**
- Creating solvent boxes around a solute
- Building bulk liquid or amorphous systems
- Initializing mixed-component systems (e.g., polymer + plasticizer)

For finer control, you can use the lower-level `Packer` classes directly or write custom packer implementations.

---

## 3. From Building to Simulation

Now that you know how to build and pack, let's see how these pieces fit into a typical molecular simulation workflow:

**Typical Pipeline:**

1. **Build** – Use `builder` (and sometimes `reacter`) to create molecular structures as `Atomistic` objects
2. **Convert** – Transform to `Frame` + `Box` representation (needed for I/O and simulation engines)
3. **Pack** – Use `molpy.pack` to create a simulation-ready multi-molecule system
4. **Assign force field** – Type atoms and assign parameters (see `user-guide/typifier.ipynb`)
5. **Export** – Write LAMMPS/OpenMM/GROMACS input files (see `user-guide/io.ipynb`)

**Data structures at each stage:**

- **Building & chemistry**: `Atomistic` + wrappers (`Monomer`, `Polymer`)  
  *Why*: Graph-based representation makes bond manipulation and reaction tracking easy

- **Spatial packing**: `Frame` + `Box`  
  *Why*: Array-based structure is efficient for geometric operations and file I/O

- **Simulation**: `Frame` + force field + engine-specific writers  
  *Why*: Simulation engines expect structured text/binary formats with coordinates and force field parameters
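
The difference between the graph view and the array view can be shown in a few lines of plain Python. This is a toy conversion, not MolPy's own `Atomistic`-to-`Frame` conversion: `ToyAtom` and `graph_to_columns` are hypothetical. Atoms are objects and bonds reference them directly; the output stores one list per property (`element`, `x`, `y`, `z`, the same column names the packing example uses) and bonds as pairs of row indices.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps identity hashing, so identical-looking atoms stay distinct
class ToyAtom:
    element: str
    xyz: tuple[float, float, float]


def graph_to_columns(atoms: list[ToyAtom], bonds: list[tuple[ToyAtom, ToyAtom]]) -> dict:
    """Flatten an object graph into column lists plus index-based bonds."""
    row = {atom: i for i, atom in enumerate(atoms)}  # atom object -> row index
    atom_cols = {
        "element": [a.element for a in atoms],
        "x": [a.xyz[0] for a in atoms],
        "y": [a.xyz[1] for a in atoms],
        "z": [a.xyz[2] for a in atoms],
    }
    bond_cols = {
        "i": [row[a] for a, _ in bonds],
        "j": [row[b] for _, b in bonds],
    }
    return {"atoms": atom_cols, "bonds": bond_cols}


c = ToyAtom("C", (0.0, 0.0, 0.0))
h = ToyAtom("H", (1.09, 0.0, 0.0))
print(graph_to_columns([c, h], [(c, h)]))
```

In the object form, deleting an atom needs no index renumbering, which suits reactions; in the columnar form, each property maps directly onto an array or a column in an output file.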

This separation of concerns keeps your code modular and makes it easy to swap components (different builders, packers, or export formats) without rewriting everything.
