# Core Concepts in molpy

This section covers the core abstractions in molpy: `Entity`, `Link`, `Struct`, the mixins, `Atomistic`, `Frame` and `Block`, `Topology`, `Wrapper`, and `Compute`. Understanding them will help you use and extend molpy for molecular modeling.

## 1. Entity: The Fundamental Building Block

**`Entity`** is the base class for all structural elements in molpy. It's essentially a dictionary with identity-based hashing.

### Key Characteristics:

- Inherits from `UserDict` - behaves like a dictionary
- Identity-based hashing - each entity is unique by identity, not content
- Minimal design - no IDs, no persistence, no global context
- Stores arbitrary key-value pairs in `.data`
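
The two central properties above, dict-like storage and identity-based hashing, can be sketched in a few lines of plain Python. `SketchEntity` is a hypothetical stand-in written for illustration; it is not molpy's actual `Entity` implementation, which may differ in detail.

```python
from collections import UserDict


class SketchEntity(UserDict):
    """Hypothetical stand-in for an identity-hashed, dict-like entity."""

    def __hash__(self) -> int:
        # Hash on object identity, not on the stored key-value pairs.
        return id(self)

    def __eq__(self, other: object) -> bool:
        # Two entities compare equal only if they are the same object.
        return self is other


a = SketchEntity(symbol="C", charge=-0.2)
b = SketchEntity(symbol="C", charge=-0.2)

print(a.data == b.data)  # True: same content
print(a == b)            # False: different objects

lookup = {a: "first", b: "second"}
a["charge"] = 0.0        # mutating the content does not change the hash
print(lookup[a])         # first
```

Because the hash never depends on content, an entity can be mutated freely while it is used as a dictionary key or set member.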

### When to Subclass:

Create custom entities when you need domain-specific structural elements:
- Atoms, residues, molecules, crystals
- Coarse-grained beads
- Virtual sites, dummy atoms

In [1]:
import molpy as mp
import numpy as np

# Generic entity
generic = mp.Entity()
generic['property'] = 'value'
print(f"Generic entity: {generic}")

# Atom entity (subclass with special repr)
atom = mp.Atom(symbol='C', type='CA', xyz=[0.0, 0.0, 0.0], charge=-0.2)
print(f"Atom: {atom}")
print(f"Atom symbol: {atom['symbol']}")
print(f"Atom position: {atom['xyz']}")

# Identity-based hashing
atom2 = mp.Atom(symbol='C', type='CA', xyz=[0.0, 0.0, 0.0], charge=-0.2)
print(f"\nSame content, same object: {atom is atom2}")
print(f"Distinct hashes (usable as separate dict keys): {hash(atom) != hash(atom2)}")

Generic entity: {'property': 'value'}
Atom: <Atom: C>
Atom symbol: C
Atom position: [0.0, 0.0, 0.0]

Same content, same object: False
Distinct hashes (usable as separate dict keys): True


### Creating Custom Entities

Example: A coarse-grained bead entity

In [2]:
# Custom entity example
class Bead(mp.Entity):
    """Coarse-grained bead for polymer simulations"""
    def __repr__(self) -> str:
        bead_type = self.get('type', 'unknown')
        return f"<Bead: {bead_type}>"

# Use it
bead = Bead(type='A', xyz=[1.0, 2.0, 3.0], mass=72.0)
print(bead)
print(f"Bead type: {bead['type']}")
print(f"Bead mass: {bead['mass']}")

<Bead: A>
Bead type: A
Bead mass: 72.0


## 2. Link: Connecting Entities

**`Link`** represents connectivity between entities. It holds direct references to endpoint entities.

### Key Characteristics:

- Generic container for N endpoints (2 for bonds, 3 for angles, etc.)
- Also inherits from `UserDict` for arbitrary attributes
- Identity-based hashing
- Endpoints stored in `.endpoints` tuple
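
A link with these characteristics can be sketched as a dict-like object that also keeps an ordered tuple of endpoint references. `SketchLink` and `Node` below are hypothetical names used only for this illustration; molpy's `Link` may be implemented differently.

```python
from collections import UserDict


class Node:
    """Placeholder endpoint object for this sketch."""

    def __init__(self, name: str):
        self.name = name


class SketchLink(UserDict):
    """Hypothetical N-endpoint link; not molpy's actual Link class."""

    def __init__(self, endpoints, /, **attrs):
        super().__init__(**attrs)
        # Keep direct references to the endpoint objects, in order.
        self.endpoints = tuple(endpoints)

    def __hash__(self) -> int:
        return id(self)

    def __eq__(self, other: object) -> bool:
        return self is other


n1, n2, n3 = Node("a"), Node("b"), Node("c")
bond = SketchLink([n1, n2], order=1)   # 2 endpoints
angle = SketchLink([n1, n2, n3])       # 3 endpoints

print(bond.endpoints[0] is n1)  # True: a reference, not a copy
print(bond["order"])            # 1
print(len(angle.endpoints))     # 3
```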

### When to Subclass:

Create custom links for specific connectivity types:
- Bonds, angles, dihedrals
- Hydrogen bonds
- Constraints, restraints

In [3]:
# Create atoms
a1 = mp.Atom(symbol='C', xyz=[0.0, 0.0, 0.0])
a2 = mp.Atom(symbol='C', xyz=[1.5, 0.0, 0.0])
a3 = mp.Atom(symbol='O', xyz=[2.2, 1.2, 0.0])
a4 = mp.Atom(symbol='H', xyz=[3.0, 1.5, 0.5])

# Bond: 2 endpoints
bond = mp.Bond(a1, a2, type='C-C', order=1)
print(f"Bond: {bond}")
print(f"  Endpoints: {bond.endpoints}")
print(f"  i-atom: {bond.itom}, j-atom: {bond.jtom}")
print(f"  Bond type: {bond['type']}")

# Angle: 3 endpoints
angle = mp.Angle(a1, a2, a3, type='C-C-O')
print(f"\nAngle: {angle}")
print(f"  Endpoints: {angle.endpoints}")

# Dihedral: 4 endpoints
dihedral = mp.Dihedral(a1, a2, a3, a4, type='C-C-O-H')
print(f"\nDihedral: {dihedral}")
print(f"  Endpoints: {dihedral.endpoints}")

Bond: <Bond: <Atom: C> - <Atom: C>>
  Endpoints: (<Atom: C>, <Atom: C>)
  i-atom: <Atom: C>, j-atom: <Atom: C>
  Bond type: C-C

Angle: <Angle: <Atom: C> - <Atom: C> - <Atom: O>>
  Endpoints: (<Atom: C>, <Atom: C>, <Atom: O>)

Dihedral: <Dihedral: <Atom: C> - <Atom: C> - <Atom: O> - <Atom: H>>
  Endpoints: (<Atom: C>, <Atom: C>, <Atom: O>, <Atom: H>)


### Creating Custom Links

In [4]:
# Custom link: Hydrogen bond
class HydrogenBond(mp.Link):
    """Hydrogen bond between donor and acceptor"""
    def __init__(self, donor: mp.Atom, acceptor: mp.Atom, /, **attrs):
        super().__init__([donor, acceptor], **attrs)
    
    @property
    def donor(self) -> mp.Atom:
        return self.endpoints[0]
    
    @property
    def acceptor(self) -> mp.Atom:
        return self.endpoints[1]
    
    def __repr__(self) -> str:
        return f"<HBond: {self.donor} ... {self.acceptor}>"

# Use it
donor = mp.Atom(symbol='O', type='O_w', xyz=[0.0, 0.0, 0.0])
acceptor = mp.Atom(symbol='O', type='O_w', xyz=[2.8, 0.0, 0.0])
hbond = HydrogenBond(donor, acceptor, energy=-5.2)
print(hbond)
print(f"Energy: {hbond['energy']} kcal/mol")

<HBond: <Atom: O> ... <Atom: O>>
Energy: -5.2 kcal/mol


## 3. Struct: The Container

**`Struct`** is the root container class that organizes entities and links into type-specific buckets.

### Key Characteristics:

- `entities`: TypeBucket for organizing entities by type
- `links`: TypeBucket for organizing links by type
- `_props`: Dictionary for struct-level properties
- Supports dict-like access to props via `[]`

### TypeBucket:

A specialized container that:
- Groups objects by their concrete type
- Supports queries by type (exact or including subclasses)
- Returns `Entities[T]` for column-style access
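
To make the grouping and the two query modes concrete, here is a minimal sketch of a type-keyed container. `SketchTypeBucket`, its `get` method, and the `include_subclasses` flag are assumptions made for this example; they are not taken from molpy's `TypeBucket` API.

```python
from collections import defaultdict


class SketchTypeBucket:
    """Hypothetical type-keyed container; molpy's TypeBucket may differ."""

    def __init__(self):
        # Maps each concrete type to the list of items of exactly that type.
        self._buckets = defaultdict(list)

    def add(self, item):
        self._buckets[type(item)].append(item)

    def get(self, cls, include_subclasses=True):
        if not include_subclasses:
            # Exact-type query: only items whose concrete type is `cls`.
            return list(self._buckets.get(cls, []))
        found = []
        for stored_type, items in self._buckets.items():
            if issubclass(stored_type, cls):
                found.extend(items)
        return found

    def __len__(self):
        return sum(len(items) for items in self._buckets.values())


class Particle:
    pass


class Atom(Particle):
    pass


class Bead(Particle):
    pass


bucket = SketchTypeBucket()
for item in (Atom(), Atom(), Bead()):
    bucket.add(item)

print(len(bucket.get(Atom)))                                # 2
print(len(bucket.get(Particle)))                            # 3
print(len(bucket.get(Particle, include_subclasses=False)))  # 0
```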

### When to Subclass:

Create custom Struct subclasses for:
- Different molecular representations (atomistic, coarse-grained)
- Domain-specific structures (proteins, polymers, crystals)
- Systems with special entity/link types

In [5]:
# Create a basic struct
struct = mp.Struct(name='my_structure', temperature=300)

# Add entities
atom1 = mp.Atom(symbol='C', xyz=[0.0, 0.0, 0.0])
atom2 = mp.Atom(symbol='N', xyz=[1.5, 0.0, 0.0])
bead1 = Bead(type='A', xyz=[3.0, 0.0, 0.0])

struct.entities.add(atom1)
struct.entities.add(atom2)
struct.entities.add(bead1)

# Add link
bond = mp.Bond(atom1, atom2, type='C-N')
struct.links.add(bond)

# Query by type
print(f"All atoms: {struct.entities[mp.Atom]}")
print(f"All beads: {struct.entities[Bead]}")
print(f"All entities: {struct.entities.all()}")
print(f"All bonds: {struct.links[mp.Bond]}")

# Access properties
print(f"\nStruct properties:")
print(f"  Name: {struct['name']}")
print(f"  Temperature: {struct['temperature']}")

# Total counts
print(f"\nTotal entities: {len(struct.entities)}")
print(f"Total links: {len(struct.links)}")

All atoms: [<Atom: C>, <Atom: N>]
All beads: [<Bead: A>]
All entities: [<Atom: C>, <Atom: N>, <Bead: A>]
All bonds: [<Bond: <Atom: C> - <Atom: N>>]

Struct properties:
  Name: my_structure
  Temperature: 300

Total entities: 3
Total links: 1


### Column-Style Access with Entities

The `Entities` container (returned by TypeBucket) supports column-style access: indexing with a key returns that attribute for every entity at once.

In [6]:
# Column-style access
atoms = struct.entities[mp.Atom]
print(f"Atoms: {atoms}")

# Extract column data
symbols = atoms['symbol']
positions = atoms['xyz']

print(f"\nSymbols: {symbols}")
print(f"Positions shape: {positions.shape if hasattr(positions, 'shape') else len(positions)}")
print(f"Positions:\n{positions}")

Atoms: [<Atom: C>, <Atom: N>]

Symbols: ['C' 'N']
Positions shape: (2, 3)
Positions:
[[0.  0.  0. ]
 [1.5 0.  0. ]]


### Struct Operations: Copy and Merge

In [7]:
# Copy: Deep copy with entity/link remapping
struct_copy = struct.copy()
print(f"Original entities: {len(struct.entities)}")
print(f"Copied entities: {len(struct_copy.entities)}")
print(f"Different identity: {struct is not struct_copy}")

# Merge: Transfer without copying
struct2 = mp.Struct()
atom3 = mp.Atom(symbol='O', xyz=[4.5, 0.0, 0.0])
struct2.entities.add(atom3)

print(f"\nBefore merge: {len(struct.entities)} entities")
struct.merge(struct2)  # Transfers struct2 into struct
print(f"After merge: {len(struct.entities)} entities")
# Note: struct2 should not be used after merge!

Original entities: 3
Copied entities: 3
Different identity: True

Before merge: 3 entities
After merge: 4 entities
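
The "entity/link remapping" mentioned for `copy()` matters because links hold direct references to their endpoints: a copied link has to point at the copied entities, not at the originals. A minimal sketch of that idea follows, using plain dicts and tuples as assumed stand-ins rather than molpy's classes; `copy_with_remap` is a hypothetical helper.

```python
import copy


def copy_with_remap(entities, links):
    """Copy entities, then rebuild links so they point at the copies.

    entities: list of dicts; links: list of (endpoints_tuple, attrs_dict).
    Both shapes are assumptions made for this sketch.
    """
    # Map each original entity (by identity) to its deep copy.
    remap = {id(e): copy.deepcopy(e) for e in entities}
    new_entities = [remap[id(e)] for e in entities]
    new_links = [
        (tuple(remap[id(ep)] for ep in endpoints), copy.deepcopy(attrs))
        for endpoints, attrs in links
    ]
    return new_entities, new_links


c = {"symbol": "C"}
n = {"symbol": "N"}
bond = ((c, n), {"type": "C-N"})

new_entities, new_links = copy_with_remap([c, n], [bond])
new_c, new_n = new_entities
new_endpoints, new_attrs = new_links[0]

print(new_c == c, new_c is c)     # True False: equal content, new object
print(new_endpoints[0] is new_c)  # True: link points at the copy
```

Without the remapping step, the copied bond would still reference the original atoms, and editing the copy would silently affect the source structure.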


## 4. Mixins: Composable Functionality

molpy provides mixins that add specific capabilities to Struct subclasses:

### Available Mixins:

1. **`SpatialMixin`** - Geometric operations (move, rotate, scale, align)
2. **`MembershipMixin`** - CRUD operations for entities and links
3. **`ConnectivityMixin`** - Connectivity queries (get neighbors)

These mixins are composable: each one adds a single capability, so you can mix and match the ones you need.
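
The mix-and-match idea can be illustrated with ordinary Python multiple inheritance. All class names below (`SketchStruct`, `MoveMixin`, `CountMixin`) are hypothetical and exist only for this sketch; they do not mirror molpy's mixin implementations.

```python
class SketchStruct:
    """Hypothetical minimal container holding points as [x, y, z] lists."""

    def __init__(self):
        self.points = []


class MoveMixin:
    """Adds translation; relies on the host class providing `.points`."""

    def move(self, delta):
        for point in self.points:
            for axis in range(3):
                point[axis] += delta[axis]
        return self


class CountMixin:
    """Adds a simple size query; also relies on `.points`."""

    def count(self):
        return len(self.points)


class Movable(MoveMixin, CountMixin, SketchStruct):
    """Container with both capabilities."""


class CountOnly(CountMixin, SketchStruct):
    """Container that opts out of spatial operations."""


m = Movable()
m.points.append([0.0, 0.0, 0.0])
m.move([0.0, 5.0, 0.0])

print(m.points)                     # [[0.0, 5.0, 0.0]]
print(m.count())                    # 1
print(hasattr(CountOnly(), "move")) # False
```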

### SpatialMixin: Geometric Operations

In [8]:
# Example with Atomistic (which includes SpatialMixin)
system = mp.Atomistic()
a1 = system.def_atom(symbol='C', x=0.0, y=0.0, z=0.0)
a2 = system.def_atom(symbol='C', x=1.0, y=0.0, z=0.0)
a3 = system.def_atom(symbol='C', x=2.0, y=0.0, z=0.0)

print("Original positions:")
print(system.xyz)

# Move all atoms
system.move([0.0, 5.0, 0.0], entity_type=mp.Atom)
print("\nAfter move [0, 5, 0]:")
print(system.xyz)

# Rotate around z-axis by 90 degrees
import math
system.rotate(axis=[0, 0, 1], angle=math.pi/2, entity_type=mp.Atom)
print("\nAfter 90° rotation around z:")
print(system.xyz)

Original positions:
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

After move [0, 5, 0]:
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

After 90° rotation around z:
[[0. 0. 0.]
 [0. 0. 0.]
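 [0. 0. 0.]]

Note: the recorded output is all zeros at every step, which does not match the coordinates passed to `def_atom`. This suggests that, in the version used to run this cell, `system.xyz` does not read the separate `x`, `y`, `z` keys; passing `xyz=[...]` as in the earlier cells may be needed to see the move and rotation take effect.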
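
As a reference for what a translation followed by a rotation is expected to produce, the arithmetic can be worked through in plain Python. This sketch assumes the rotation axis passes through the origin and uses Rodrigues' rotation formula; `translate` and `rotate` are hypothetical helpers, and molpy's `move` and `rotate` may use a different rotation center or convention.

```python
import math


def translate(points, delta):
    """Return points shifted by `delta` (hypothetical helper)."""
    return [[p[i] + delta[i] for i in range(3)] for p in points]


def rotate(points, axis, angle):
    """Rotate points about an axis through the origin (Rodrigues' formula)."""
    norm = math.sqrt(sum(a * a for a in axis))
    kx, ky, kz = (a / norm for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    rotated = []
    for x, y, z in points:
        dot = kx * x + ky * y + kz * z
        # Cross product k x v
        cx, cy, cz = ky * z - kz * y, kz * x - kx * z, kx * y - ky * x
        rotated.append([
            x * c + cx * s + kx * dot * (1 - c),
            y * c + cy * s + ky * dot * (1 - c),
            z * c + cz * s + kz * dot * (1 - c),
        ])
    return rotated


points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
moved = translate(points, [0.0, 5.0, 0.0])
rotated = rotate(moved, [0, 0, 1], math.pi / 2)

print(moved)  # [[0.0, 5.0, 0.0], [1.0, 5.0, 0.0], [2.0, 5.0, 0.0]]
# Up to floating-point rounding: (-5, 0, 0), (-5, 1, 0), (-5, 2, 0)
print([[round(v, 6) for v in row] for row in rotated])
```

A 90° rotation about z maps (x, y) to (-y, x), which is why every point ends up at x = -5 under these assumptions.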


### MembershipMixin: Entity/Link Management

In [9]:
system = mp.Atomistic()

# Create atoms
atoms = [system.def_atom(symbol='C', xyz=[i*1.5, 0, 0]) for i in range(4)]

# Create bonds using def_bond (uses MembershipMixin internally)
bond1 = system.def_bond(atoms[0], atoms[1], type='single')
bond2 = system.def_bond(atoms[1], atoms[2], type='double')
bond3 = system.def_bond(atoms[2], atoms[3], type='single')

print(f"Initial: {len(system.atoms)} atoms, {len(system.bonds)} bonds")

# Remove an atom and its incident bonds
system.remove_entity(atoms[1], drop_incident_links=True)
print(f"After removing middle atom: {len(system.atoms)} atoms, {len(system.bonds)} bonds")

Initial: 4 atoms, 3 bonds
After removing middle atom: 3 atoms, 1 bonds
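
The counts above follow from the rule that removing an entity also drops every link that touches it. A minimal sketch of that rule, with plain objects and tuples as assumed stand-ins for atoms and bonds (`remove_with_incident` is a hypothetical helper, not a molpy function):

```python
def remove_with_incident(atoms, bonds, target):
    """Return new lists without `target` and without bonds that touch it."""
    kept_atoms = [a for a in atoms if a is not target]
    kept_bonds = [
        bond for bond in bonds
        if all(endpoint is not target for endpoint in bond)
    ]
    return kept_atoms, kept_bonds


# Linear chain of four atoms: 0-1-2-3
atoms = [object() for _ in range(4)]
bonds = [(atoms[0], atoms[1]), (atoms[1], atoms[2]), (atoms[2], atoms[3])]

kept_atoms, kept_bonds = remove_with_incident(atoms, bonds, atoms[1])
print(len(kept_atoms), len(kept_bonds))  # 3 1
```

Atom 1 participates in two of the three bonds, so only the 2-3 bond survives.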


### ConnectivityMixin: Neighbor Queries

In [10]:
system = mp.Atomistic()

# Create a simple chain: C-C-C-O
c1 = system.def_atom(symbol='C', x=0, y=0, z=0)
c2 = system.def_atom(symbol='C', x=1.5, y=0, z=0)
c3 = system.def_atom(symbol='C', x=3.0, y=0, z=0)
o1 = system.def_atom(symbol='O', x=4.5, y=0, z=0)

system.def_bond(c1, c2)
system.def_bond(c2, c3)
system.def_bond(c3, o1)

# Find neighbors
neighbors_c2 = system.get_neighbors(c2, link_type=mp.Bond)
print(f"Neighbors of C2: {neighbors_c2}")
print(f"Number of neighbors: {len(neighbors_c2)}")

neighbors_o1 = system.get_neighbors(o1, link_type=mp.Bond)
print(f"\nNeighbors of O1: {neighbors_o1}")
print(f"Number of neighbors: {len(neighbors_o1)}")

Neighbors of C2: [<Atom: C>, <Atom: C>]
Number of neighbors: 2

Neighbors of O1: [<Atom: C>]
Number of neighbors: 1


## 5. Atomistic: Specialized Struct

**`Atomistic`** is a specialized Struct for atomistic molecular systems.

### Key Features:

- Inherits from `Struct` and all three mixins (`SpatialMixin`, `MembershipMixin`, `ConnectivityMixin`)
- Pre-registered entity types: `Atom`
- Pre-registered link types: `Bond`, `Angle`, `Dihedral`
- Convenience properties: `.atoms`, `.bonds`, `.angles`, `.dihedrals`
- Convenience methods: `.def_atom()`, `.def_bond()`, `.def_angle()`, `.def_dihedral()`
- Properties: `.symbols`, `.xyz`, `.positions`

This is the most commonly used class for atomistic simulations.

In [11]:
# Create an atomistic system
mol = mp.Atomistic(name='ethanol')

# Add atoms using def_atom
c1 = mol.def_atom(symbol='C', type='CT', x=0.0, y=0.0, z=0.0, charge=-0.18)
c2 = mol.def_atom(symbol='C', type='CT', x=1.5, y=0.0, z=0.0, charge=0.15)
o1 = mol.def_atom(symbol='O', type='OH', x=2.2, y=1.2, z=0.0, charge=-0.68)
h1 = mol.def_atom(symbol='H', type='HC', x=-0.5, y=-0.9, z=0.0, charge=0.06)
h2 = mol.def_atom(symbol='H', type='HC', x=-0.5, y=0.9, z=0.0, charge=0.06)
h3 = mol.def_atom(symbol='H', type='HC', x=2.0, y=-0.9, z=0.0, charge=0.06)
h4 = mol.def_atom(symbol='H', type='HO', x=3.2, y=1.2, z=0.0, charge=0.42)

# Add bonds
mol.def_bond(c1, c2, type='C-C')
mol.def_bond(c2, o1, type='C-O')
mol.def_bond(c1, h1, type='C-H')
mol.def_bond(c1, h2, type='C-H')
mol.def_bond(c2, h3, type='C-H')
mol.def_bond(o1, h4, type='O-H')

# Add an angle
mol.def_angle(c1, c2, o1, type='C-C-O')

# Add a dihedral
mol.def_dihedral(h1, c1, c2, o1, type='H-C-C-O')

print(mol)
print(f"\nAtoms: {len(mol.atoms)}")
print(f"Bonds: {len(mol.bonds)}")
print(f"Angles: {len(mol.angles)}")
print(f"Dihedrals: {len(mol.dihedrals)}")
print(f"\nSymbols: {mol.symbols}")
print(f"\nPositions:\n{mol.xyz}")

<Atomistic, 7 atoms (C:2 H:4 O:1), 6 bonds>

Atoms: 7
Bonds: 6
Angles: 1
Dihedrals: 1

Symbols: ['C', 'C', 'O', 'H', 'H', 'H', 'H']

Positions:
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
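 [0. 0. 0.]]

Note: the positions print as zeros even though nonzero `x`, `y`, `z` values were passed to `def_atom`, so `mol.xyz` evidently does not read those keys in the version used to run this cell.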


### Creating Custom Atomistic Subclasses

In [12]:
# Example: Protein-specific atomistic system
class Residue(mp.Entity):
    """Protein residue entity"""
    def __repr__(self) -> str:
        name = self.get('resname', 'UNK')
        resid = self.get('resid', '?')
        return f"<Residue: {name}{resid}>"

class Protein(mp.Atomistic):
    """Atomistic system specialized for proteins"""
    def __init__(self, **props):
        super().__init__(**props)
        # Register residue type
        self.entities.register_type(Residue)
    
    @property
    def residues(self):
        return self.entities[Residue]
    
    def def_residue(self, **attrs) -> Residue:
        """Create and add a residue"""
        res = Residue(**attrs)
        self.entities.add(res)
        return res

# Use it
protein = Protein(name='myprotein')
res1 = protein.def_residue(resname='ALA', resid=1, chain='A')
res2 = protein.def_residue(resname='GLY', resid=2, chain='A')

ca1 = protein.def_atom(symbol='C', type='CA', x=0, y=0, z=0)
ca2 = protein.def_atom(symbol='C', type='CA', x=3.8, y=0, z=0)

print(f"Protein: {len(protein.residues)} residues, {len(protein.atoms)} atoms")
print(f"Residues: {protein.residues}")

Protein: 2 residues, 2 atoms
Residues: [<Residue: ALA1>, <Residue: GLY2>]


## 6. Frame: Blocks with metadata

**`Frame`** represents a snapshot of a molecular system. Its data lives in **`Block`** objects, which hold column-oriented data (positions, velocities, forces, etc.), alongside frame-level metadata.

### Key Characteristics:

- **Block**: like a DataFrame, maps column names to NumPy arrays of equal length
- Optimized for trajectory analysis and I/O
- Supports advanced indexing (by key, by index, by mask, by selector)
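
The column-oriented layout can be sketched with a small class that maps names to equal-length columns and supports two of the indexing modes listed above (by key and by boolean mask). `SketchBlock` is hypothetical and uses plain lists to stay dependency-free; molpy's `Block` stores NumPy arrays and supports more indexing modes.

```python
class SketchBlock:
    """Hypothetical column store: column name -> list of equal length."""

    def __init__(self, columns):
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = {name: list(values) for name, values in columns.items()}

    @property
    def nrows(self):
        return len(next(iter(self.columns.values()), []))

    def __getitem__(self, key):
        if isinstance(key, str):
            # Column access by name.
            return self.columns[key]
        # Otherwise treat `key` as a boolean mask over rows.
        return SketchBlock({
            name: [value for value, keep in zip(column, key) if keep]
            for name, column in self.columns.items()
        })


block = SketchBlock({
    "x": [0.0, 1.5, 3.0, 4.5, 6.0],
    "mol": [1, 1, 1, 2, 2],
})

mask = [mol_id == 1 for mol_id in block["mol"]]
mol1 = block[mask]

print(mol1.nrows)  # 3
print(mol1["x"])   # [0.0, 1.5, 3.0]
```

Storing data by column keeps per-variable operations, such as filtering on `mol`, to a single pass over one sequence.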

### When to Use:

- Reading/writing trajectory files
- Analyzing simulation data
- Working with large systems where performance matters

In [13]:
from molpy.core.frame import Frame, Block

# Create a Block
block = Block({
    'id': np.arange(5),
    'type': np.array([1, 1, 2, 2, 3]),
    'x': np.array([0.0, 1.5, 3.0, 4.5, 6.0]),
    'y': np.array([0.0, 0.0, 0.0, 0.0, 0.0]),
    'z': np.array([0.0, 0.0, 0.0, 0.0, 0.0]),
    'mol': np.array([1, 1, 1, 2, 2])
})

print(f"Block with {block.nrows} rows, {len(block)} columns")
print(f"Columns: {list(block.keys())}")

# Column access
print(f"\nX coordinates: {block['x']}")

# Row access
print(f"\nFirst row: {block[0]}")

# Slice access
print(f"\nFirst 3 rows: {block[0:3]}")

# Boolean mask
mask = block['mol'] == 1
mol1_block = block[mask]
print(f"\nMolecule 1: {mol1_block.nrows} atoms")
print(f"  X: {mol1_block['x']}")

Block with 5 rows, 6 columns
Columns: ['id', 'type', 'x', 'y', 'z', 'mol']

X coordinates: [0.  1.5 3.  4.5 6. ]

First row: Block(id: shape=(), type: shape=(), x: shape=(), y: shape=(), z: shape=(), mol: shape=())

First 3 rows: Block(id: shape=(3,), type: shape=(3,), x: shape=(3,), y: shape=(3,), z: shape=(3,), mol: shape=(3,))

Molecule 1: 3 atoms
  X: [0.  1.5 3. ]


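## 7. Topology: Graph-Based Connectivity

**`Topology`** stores connectivity as a graph over atom indices. Bonds are added explicitly; angles and dihedrals are derived from them.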

In [14]:
from molpy.core.topology import Topology

# Create topology
top = Topology(5)  # 5 atoms

# Add bonds
top.add_bond(0, 1)
top.add_bond(1, 2)
top.add_bond(2, 3)
top.add_bond(3, 4)

print(f"Topology: {top.n_atoms} atoms, {top.n_bonds} bonds")
print(f"Bonds: {top.bonds}")

# Angles are auto-detected
print(f"\nAngles: {top.n_angles}")
print(f"Angle list:\n{top.angles}")

# Dihedrals too
print(f"\nDihedrals: {top.n_dihedrals}")
print(f"Dihedral list:\n{top.dihedrals}")

Topology: 5 atoms, 4 bonds
Bonds: [[0 1]
 [1 2]
 [2 3]
 [3 4]]

Angles: 3
Angle list:
[[0 1 2]
 [1 2 3]
 [2 3 4]]

Dihedrals: 2
Dihedral list:
[[0 1 2 3]
 [1 2 3 4]]
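
The auto-detected angles and dihedrals follow directly from the bond graph: an angle is a pair of bonds sharing a center atom, and a dihedral is a chain of three consecutive bonds. A minimal sketch of that enumeration, where `derive_angles_and_dihedrals` is a hypothetical helper and not necessarily how molpy's `Topology` computes them:

```python
from collections import defaultdict


def derive_angles_and_dihedrals(bonds):
    """Enumerate angles (i, j, k) and dihedrals (i, j, k, l) from bonds."""
    adjacency = defaultdict(set)
    for i, j in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)

    # Angle: two distinct neighbors a < b around a shared center.
    angles = set()
    for center, neighbors in adjacency.items():
        for a in neighbors:
            for b in neighbors:
                if a < b:
                    angles.add((a, center, b))

    # Dihedral: extend each central bond j-k by one atom on either side.
    dihedrals = set()
    for j, k in bonds:
        for i in adjacency[j] - {k}:
            for l in adjacency[k] - {j}:
                if i != l:
                    # Canonical orientation avoids reversed duplicates.
                    dihedrals.add(min((i, j, k, l), (l, k, j, i)))

    return sorted(angles), sorted(dihedrals)


bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
angles, dihedrals = derive_angles_and_dihedrals(bonds)

print(angles)     # [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
print(dihedrals)  # [(0, 1, 2, 3), (1, 2, 3, 4)]
```

For the five-atom chain this reproduces the 3 angles and 2 dihedrals shown in the output above.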


## 8. Wrapper: Composition Pattern

**`Wrapper`** is an advanced pattern for adding behavior to Struct objects through composition rather than inheritance.

### Key Characteristics:

- Composition-based: holds an `inner` Struct
- Semi-transparent: explicitly forwards selected APIs
- Type-safe: uses generics to preserve inner type information
- Chainable: supports `.unwrap()` for deeply nested wrappers
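
These characteristics can be sketched with a small generic class. `SketchWrapper` and `Labeled` are hypothetical names for this illustration; which APIs molpy's `Wrapper` forwards, and how its `unwrap()` behaves with nesting, should be checked against the library itself.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class SketchWrapper(Generic[T]):
    """Hypothetical wrapper; molpy's Wrapper may forward more or less."""

    def __init__(self, inner: T):
        self.inner = inner

    def unwrap(self):
        # Peel off nested wrappers until a non-wrapper object is reached.
        obj = self.inner
        while isinstance(obj, SketchWrapper):
            obj = obj.inner
        return obj

    def __len__(self) -> int:
        # Explicitly forward one selected API to the inner object.
        return len(self.inner)


class Labeled(SketchWrapper[list]):
    """Adds a label without touching the wrapped object."""

    def __init__(self, inner, label: str):
        super().__init__(inner)
        self.label = label


core = ["C", "C", "O"]
nested = Labeled(Labeled(core, "inner"), "outer")

print(nested.label)             # outer
print(len(nested))              # 3
print(nested.unwrap() is core)  # True
```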

### When to Use:

- Add temporary behavior without modifying base classes
- Implement design patterns (decorator, adapter, etc.)
- Create composable transformations

**Note**: For most use cases, subclassing (like Monomer/Polymer) is simpler than Wrapper.

In [15]:
# Example: Custom wrapper that adds labeling
class LabeledStructure(mp.Wrapper[mp.Atomistic]):
    """Wrapper that adds labeling functionality"""
    
    def __init__(self, inner: mp.Atomistic, label: str = '', **props):
        super().__init__(inner, **props)
        self.label = label
        self._atom_labels: dict[mp.Atom, str] = {}
    
    def label_atom(self, atom: mp.Atom, label: str) -> None:
        self._atom_labels[atom] = label
    
    def get_atom_label(self, atom: mp.Atom) -> str | None:
        return self._atom_labels.get(atom)
    
    def labeled_atoms(self) -> list[tuple[mp.Atom, str]]:
        return list(self._atom_labels.items())

# Use it
mol = mp.Atomistic()
ca = mol.def_atom(symbol='C', type='CA', x=0, y=0, z=0)
cb = mol.def_atom(symbol='C', type='CB', x=1.5, y=0, z=0)

labeled = LabeledStructure(mol, label='important')
labeled.label_atom(ca, 'alpha-carbon')
labeled.label_atom(cb, 'beta-carbon')

print(f"Structure label: {labeled.label}")
print(f"Labeled atoms:")
for atom, label in labeled.labeled_atoms():
    print(f"  {atom}: {label}")

# Access inner object
print(f"\nInner type: {type(labeled.inner)}")
print(f"Unwrapped: {labeled.unwrap()}")

Structure label: important
Labeled atoms:
  <Atom: C>: alpha-carbon
  <Atom: C>: beta-carbon

Inner type: <class 'molpy.core.atomistic.Atomistic'>
Unwrapped: <Atomistic, 2 atoms (C:2), 0 bonds>


## 9. Other Base Classes

molpy provides several other base classes for different purposes:

### Compute: Computation Operations

Base class for defining computational operations with hooks and context.
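
To give a rough idea of what "operations with hooks" can look like, here is a sketch of a callable base class that runs user-supplied callbacks before and after the computation. `SketchCompute`, `before_hooks`, and `after_hooks` are assumptions made for this example and are not molpy's `Compute` API; the context aspect is not modeled.

```python
from typing import Generic, TypeVar

InT = TypeVar("InT")
OutT = TypeVar("OutT")


class SketchCompute(Generic[InT, OutT]):
    """Hypothetical base: subclasses implement compute(); calling runs hooks."""

    def __init__(self):
        self.before_hooks = []  # callables taking (data)
        self.after_hooks = []   # callables taking (data, result)

    def compute(self, data: InT) -> OutT:
        raise NotImplementedError

    def __call__(self, data: InT) -> OutT:
        for hook in self.before_hooks:
            hook(data)
        result = self.compute(data)
        for hook in self.after_hooks:
            hook(data, result)
        return result


class Length(SketchCompute[list, int]):
    """Toy operation: count the items in a list."""

    def compute(self, data):
        return len(data)


log = []
op = Length()
op.before_hooks.append(lambda data: log.append("before"))
op.after_hooks.append(lambda data, result: log.append(f"after:{result}"))

value = op([1, 2, 3])
print(value)  # 3
print(log)    # ['before', 'after:3']
```

Separating `compute` from `__call__` lets subclasses focus on the calculation while the base class handles the surrounding bookkeeping.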

In [16]:
from dataclasses import dataclass
from molpy.compute import Compute, Result

@dataclass
class CountResult(Result):
    n_atoms: int
    n_bonds: int

# Define a compute operation
class CountCompute(Compute[mp.Atomistic, CountResult]):
    """Count atoms and bonds in a structure"""
    
    def compute(self, input: mp.Atomistic) -> CountResult:
        return CountResult(
            n_atoms=len(input.atoms),
            n_bonds=len(input.bonds)
        )

# Use it
mol = mp.Atomistic()
mol.def_atom(symbol='C', x=0, y=0, z=0)
mol.def_atom(symbol='C', x=1.5, y=0, z=0)
mol.def_bond(mol.atoms[0], mol.atoms[1])

compute = CountCompute()
result = compute(mol)
print(f"Result: {result.n_atoms} atoms, {result.n_bonds} bonds")

Result: 2 atoms, 1 bonds


## Next Steps

Now that you understand the core abstractions:

1. **Explore tutorials** - See these concepts in action
2. **Read API docs** - Detailed documentation for each class
3. **Try user guides** - Learn module-specific features
4. **Extend molpy** - Create your own custom classes

Key resources:
- [Tutorials](../tutorials/index.md)
- [API Reference](../api/index.md)
- [User Guide](../user-guide/index.md)
- [Developer Guide](../developer/index.md)