# File I/O

Getting molecular data in and out of MolPy is straightforward. Structure readers return a `Frame` (with an optional `Box` in its metadata), and writers take a `Frame` and produce a standard file format. System readers such as `read_amber` additionally return a force field object.

**Why standardize on Frame:**
- Consistent interface across all formats
- Easy conversion between formats (just read + write)
- Interoperability with external tools (LAMMPS, GROMACS, VMD, etc.)

This guide provides **small, runnable examples** for the most common file formats you'll encounter in molecular simulations.

---

## PDB: Quick Coordinate Exchange

PDB (Protein Data Bank format) is the go-to format for visualization and sharing coordinates. It's human-readable and universally supported.

**When to use PDB:**
- Visualizing structures in VMD, PyMOL, or ChimeraX
- Exchanging coordinates with collaborators
- Quick debugging of structure building

**Limitations:**
- Limited precision: coordinates are stored with 3 decimal places
- Awkward for large systems: the atom serial field is 5 columns wide, so IDs stop at 99,999
- Doesn't store force field parameters
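
Both numeric limits follow from PDB's fixed-width columns. The sketch below is plain Python and independent of MolPy: `format_atom_line` is a hypothetical helper that lays out an `ATOM` record by the standard column positions (serial in columns 7-11, x/y/z in columns 31-54 as `%8.3f`). Atom-name alignment is simplified.

```python
def format_atom_line(serial, name, res_name, chain, res_seq, x, y, z, element):
    """Lay out one PDB ATOM record using the standard fixed-width columns."""
    if serial > 99_999:
        # The serial field is only 5 characters wide (columns 7-11).
        raise ValueError(f"serial {serial} does not fit in 5 columns")
    return (
        f"{'ATOM':<6}{serial:>5} {name:<4} {res_name:>3} {chain:1}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2}"
    )

line = format_atom_line(1, "C1", "MOL", "A", 1, 1.23456789, 0.0, -2.0004, "C")
x_read_back = float(line[30:38])  # columns 31-38 hold x
print(line)
print(x_read_back)  # digits beyond the third decimal are gone
```

If you need full double precision or more than 99,999 atoms, a format without fixed-width numeric fields is the safer choice.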


```python
import numpy as np
import molpy as mp
from molpy.io import read_pdb, write_pdb

# Build a tiny system
box = mp.Box.cubic(20.0)
frame = mp.Frame()

# Create atoms block with coordinates
# NOTE: PDB reader/writer expect an 'xyz' field (Nx3 array)
atoms = mp.Block()
atoms["xyz"] = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
atoms["element"] = ["C", "C", "C"]

frame["atoms"] = atoms
frame.metadata["box"] = box

# Write and read back
write_pdb("example.pdb", frame)
loaded = read_pdb("example.pdb")

print(f"Original: {frame['atoms'].nrows} atoms")
print(f"Loaded: {loaded['atoms'].nrows} atoms")
print(f"Coordinates shape: {loaded['atoms']['xyz'].shape}")
```


### What Gets Stored in PDB Files

**Reading (`read_pdb`):**
- `frame["atoms"]` – populated from `ATOM`/`HETATM` records
  - `xyz` field: (N, 3) coordinate array
  - Additional fields: `element`, `name`, `resName`, `chainID`, etc.
- `frame["bonds"]` – built from `CONECT` records (if present)
- `frame.metadata["box"]` – parsed from `CRYST1` record

**Writing (`write_pdb`):**
- Writes `ATOM`/`HETATM` records from `frame["atoms"]`
- Writes `CRYST1` for the Box
- Writes `CONECT` records from `frame["bonds"]` (if present)
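
For reference, the `CRYST1` record that carries the box is also fixed-width: three cell lengths (`%9.3f`, in Å) followed by three angles (`%7.2f`, in degrees). The following is a minimal plain-Python sketch of that layout, not MolPy's parser; `parse_cryst1` is a hypothetical helper.

```python
def parse_cryst1(line):
    """Extract cell lengths (Å) and angles (degrees) from a PDB CRYST1 record."""
    if not line.startswith("CRYST1"):
        raise ValueError("not a CRYST1 record")
    lengths = tuple(float(line[i:i + 9]) for i in (6, 15, 24))   # columns 7-33
    angles = tuple(float(line[i:i + 7]) for i in (33, 40, 47))   # columns 34-54
    return lengths, angles

# A 20 Å cubic cell written as a CRYST1 record
record = "CRYST1   20.000   20.000   20.000  90.00  90.00  90.00 P 1           1"
lengths, angles = parse_cryst1(record)
print(lengths, angles)
```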

---

## LAMMPS Data Files: Simulation-Ready Structures

LAMMPS data files store not just coordinates, but also topology (bonds, angles, dihedrals) and atom type information needed for simulations.

**When to use LAMMPS data:**
- Preparing systems for LAMMPS molecular dynamics
- Storing force field typed structures
- Systems with complex topology (polymers, biomolecules)

### Writing LAMMPS Data


```python
from molpy.io import write_lammps_data

# Assumes `frame` already carries the columns the chosen atom_style requires
# The 'full' atom_style requires: id, mol, type, q, x, y, z
write_lammps_data("system.data", frame, atom_style="full")

print("✓ Wrote LAMMPS data file: system.data")
```


**What the writer expects:**

- `frame["atoms"]` with columns matching your chosen `atom_style`
  - `atom_style="full"`: needs `id`, `mol`, `type`, `q`, `x`, `y`, `z`
  - `atom_style="atomic"`: needs `id`, `type`, `x`, `y`, `z`
- Optional blocks: `"bonds"`, `"angles"`, `"dihedrals"`, `"impropers"`
- `frame.metadata["box"]` – defines the simulation cell
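
To make the column requirements concrete, here is a sketch of how an `Atoms` section is laid out for each style. It uses a plain dict of lists as a stand-in for the atoms block and a hypothetical `format_atoms_section` helper. It is not MolPy's writer, only an illustration of why each listed column must be present.

```python
# Column order LAMMPS expects per atom_style (only the two styles listed above)
ATOM_STYLE_COLUMNS = {
    "full": ["id", "mol", "type", "q", "x", "y", "z"],
    "atomic": ["id", "type", "x", "y", "z"],
}

def format_atoms_section(columns, atom_style="full"):
    """Return the lines of an 'Atoms' section, checking required columns first."""
    required = ATOM_STYLE_COLUMNS[atom_style]
    missing = [name for name in required if name not in columns]
    if missing:
        raise KeyError(f"atom_style={atom_style!r} is missing columns: {missing}")
    lines = [f"Atoms # {atom_style}", ""]
    # One output line per atom, values in the order the style prescribes
    for row in zip(*(columns[name] for name in required)):
        lines.append(" ".join(str(value) for value in row))
    return lines

# Stand-in for an atoms block: three atoms in one molecule
atoms = {
    "id": [1, 2, 3],
    "mol": [1, 1, 1],
    "type": [1, 1, 1],
    "q": [0.0, 0.0, 0.0],
    "x": [0.0, 1.0, 2.0],
    "y": [0.0, 0.0, 0.0],
    "z": [0.0, 0.0, 0.0],
}
section = format_atoms_section(atoms, atom_style="full")
print("\n".join(section))
```

Checking for missing columns before writing gives a clearer error than a half-written data file that LAMMPS later rejects.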

### Reading LAMMPS Data


```python
from molpy.io import read_lammps_data

# Keyword spelled `atom_style`, matching write_lammps_data above
frame = read_lammps_data("system.data", atom_style="full")

print(f"Loaded {frame['atoms'].nrows} atoms")
print(f"Has bonds: {'bonds' in list(frame.blocks())}")
print(f"Box: {frame.metadata.get('box')}")
```


> **Note:** For legacy code, `molpy.io.read_lammps(...)` can load both data files and force field scripts. For new projects, prefer `read_lammps_data()` and handle force fields separately.

---

## LAMMPS Trajectories: Streaming Large Simulations

Trajectory files can be huge (millions of atoms × thousands of frames). MolPy's trajectory reader is designed for **lazy, memory-efficient** iteration.

**When to use trajectory readers:**
- Analyzing MD simulation output
- Computing time-averaged properties
- Extracting specific frames for visualization


```python
from molpy.io.trajectory.lammps import LammpsTrajectoryReader

reader = LammpsTrajectoryReader("traj.lammpstrj")

# Iterate through frames - only one frame is in memory at a time
for i, frame in enumerate(reader):
    print(f"Frame {i}: {frame['atoms'].nrows} atoms")
    if i >= 9:
        break  # Process first 10 frames only
```


**Memory efficiency tip:**  
Use this iterator pattern when analyzing long trajectories. Only the current frame is held in memory, so you can process multi-GB files without exhausting RAM.
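
The same idea in plain Python, for intuition: a generator that yields one frame of a LAMMPS text dump at a time. This is a sketch, not how `LammpsTrajectoryReader` is implemented. `iter_dump_frames` is a hypothetical helper and assumes the standard `ITEM:` layout of a text dump with an orthogonal box (three `BOX BOUNDS` lines).

```python
import io
from itertools import islice

def iter_dump_frames(handle):
    """Yield (timestep, column_names, rows) one frame at a time from a text dump."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.startswith("ITEM: TIMESTEP"):
            raise ValueError(f"unexpected line: {header!r}")
        timestep = int(handle.readline())
        handle.readline()                      # ITEM: NUMBER OF ATOMS
        n_atoms = int(handle.readline())
        for _ in range(4):                     # ITEM: BOX BOUNDS + 3 bound lines
            handle.readline()
        column_names = handle.readline().split()[2:]  # names after "ITEM: ATOMS"
        rows = [handle.readline().split() for _ in range(n_atoms)]
        yield timestep, column_names, rows

# Two tiny frames, standing in for a file opened with open("traj.lammpstrj")
one_frame = (
    "ITEM: TIMESTEP\n{step}\n"
    "ITEM: NUMBER OF ATOMS\n2\n"
    "ITEM: BOX BOUNDS pp pp pp\n0 10\n0 10\n0 10\n"
    "ITEM: ATOMS id type x y z\n"
    "1 1 0.0 0.0 0.0\n2 1 {x} 0.0 0.0\n"
)
dump = io.StringIO(one_frame.format(step=0, x=1.0) + one_frame.format(step=100, x=1.5))

# islice stops after at most 10 frames without reading the rest of the file
timesteps = [step for step, _, _ in islice(iter_dump_frames(dump), 10)]
print(timesteps)
```

Because the generator holds only the rows of the current frame, peak memory scales with the number of atoms per frame rather than with the length of the trajectory.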

---

## AMBER & GROMACS Systems

MolPy provides convenience functions for reading complete systems (structure + force field) from AMBER and GROMACS files.

### AMBER (prmtop + inpcrd)

AMBER systems consist of a topology file (`.prmtop`) and a coordinate file (`.inpcrd` or `.crd`).


```python
from molpy.io import read_amber

# Read both topology and coordinates
frame, ff = read_amber("system.prmtop", "system.inpcrd")

print(f"Atoms: {frame['atoms'].nrows}")
print(f"Box: {frame.metadata.get('box')}")
print(f"Force field: {ff}")
```


### GROMACS (.gro + .top)

GROMACS systems combine structure files (`.gro`) with topology files (`.top`).


```python
from molpy.io import read_gro

# Read structure file
frame = read_gro("conf.gro")

print(f"Atoms: {frame['atoms'].nrows}")
print(f"Box: {frame.metadata.get('box')}")

# Note: For topology, use read_top() separately if needed
# from molpy.io import read_top
# ff = read_top("topol.top")
```


**What you get back:**

- `frame` – `Frame` with the atomic structure and the `Box` in its metadata
- `ff` – force field object (atom types, parameters, etc.), returned by `read_amber` or, for GROMACS, by `read_top`, that you can integrate with MolPy's force field system

---

## Other Supported Formats

MolPy's I/O layer is extensible. Beyond the formats above, we support:

- **XSF** – XCrysDen Structure Format (materials science)
  - `read_xsf()`, `write_xsf()`
- **XYZ** – Simple Cartesian coordinates
  - `read_xyz()`, `write_xyz()`
- **CSV-like** – Custom tabular formats
  - `Block.from_csv()` for importing columnar data
- **Additional trajectories** – See `molpy.io.trajectory.*` for specialized readers

### Common Pattern

Single-file structure readers share the same call shape (`read_format` is a placeholder for a concrete reader such as `read_pdb`):

```python
frame = read_format(filename)  # Returns a Frame
```

Writers mirror it (`write_format` is a placeholder for a concrete writer such as `write_pdb`):

```python
write_format(filename, frame)  # Writes Frame to disk
```
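
Because readers produce a `Frame` and writers consume one, format conversion reduces to picking a reader and a writer. Here is a minimal sketch, assuming you register the functions yourself: `convert`, `READERS`, and `WRITERS` are hypothetical names, and stand-in functions are used so the snippet runs without any input files. With MolPy you would register functions such as `read_pdb` and `write_xyz` instead.

```python
from pathlib import Path

def convert(src, dst, readers, writers):
    """Read `src` and write `dst`, choosing functions by file extension."""
    read = readers[Path(src).suffix.lower()]
    write = writers[Path(dst).suffix.lower()]
    frame = read(src)       # reader: filename -> frame
    write(dst, frame)       # writer: (filename, frame) -> None
    return frame

# Stand-ins for real readers/writers; they record calls instead of touching disk
written = {}

def fake_read_pdb(filename):
    return {"source": filename}

def fake_write_xyz(filename, frame):
    written[filename] = frame

READERS = {".pdb": fake_read_pdb}
WRITERS = {".xyz": fake_write_xyz}

convert("example.pdb", "example.xyz", READERS, WRITERS)
print(written)
```

Readers that need extra arguments, such as `atom_style` for LAMMPS data, can be wrapped with `functools.partial` before they are registered.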

**Finding the right format:**
- For **visualization**: use PDB
- For **LAMMPS simulations**: use LAMMPS data
- For **trajectories**: use format-specific trajectory readers
- For **interoperability**: check what your downstream tool prefers

**Need more details?**  
Check the module documentation under `molpy.io.*` or look at the unit tests under `tests/io/` for concrete examples.
