# File I/O

Accessing chemical files with molpy is straightforward. There are several types of chemical files. Files that contain a single frame of simulation data (atoms, box, and perhaps a few forcefield parameters) are called **data** files. Files with multiple frames are **trajectories**. Some files also hold **forcefield** parameters or **log** data.

A snapshot of a simulation read by a molpy reader is returned as a **Frame** object. A Frame contains all the information about the system: atoms, topology, box, and timestep.

**Why standardize on Frame:**
- Consistent interface across all formats
- Easy conversion between formats (just read + write)
- Interoperability with external tools (LAMMPS, GROMACS, VMD, etc.)

This guide provides examples for the most common file formats you'll encounter in molecular simulations.

---

## Data file: let's start with PDB

PDB (Protein Data Bank format) is the go-to format for visualization and sharing coordinates. It's human-readable and universally supported.


```python
import numpy as np
import molpy as mp
from molpy.io import read_pdb, write_pdb

# Build a tiny system
box = mp.Box.cubic(20.0)
frame = mp.Frame()

# Create atoms block with coordinates
# NOTE: PDB reader/writer expect an 'xyz' field (Nx3 array)
atoms = mp.Block()
atoms["xyz"] = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
atoms["element"] = ["C", "C", "C"]

frame["atoms"] = atoms
frame.metadata["box"] = box

frame["bonds"] = mp.Block({
    "atom_i": [0, 1],
    "atom_j": [1, 2],
})

# Write and read back
write_pdb("example.pdb", frame)
loaded = read_pdb("example.pdb")

print(f"Original: {frame['atoms'].nrows} atoms")
print(f"Loaded: {loaded['atoms'].nrows} atoms")
print(f"Coordinates shape: {loaded['atoms']['xyz'].shape}")
```
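
To see why PDB counts as human-readable, it can help to look at a single `ATOM` record. The sketch below is independent of molpy and only illustrates the standard fixed-width column layout (coordinates in columns 31-54, element symbol in columns 77-78). The helper names `format_atom_line` and `parse_atom_line` are hypothetical, and the atom-name alignment is simplified.

```python
def format_atom_line(serial, name, res_name, chain, res_seq, xyz, element):
    """Format one ATOM record using the fixed-width PDB column layout."""
    x, y, z = xyz
    return (
        # cols 1-30: record name, serial, atom name, residue name, chain, residue number
        f"ATOM  {serial:>5d} {name:<4s} {res_name:>3s} {chain:1s}{res_seq:>4d}    "
        # cols 31-66: x, y, z (8.3f each), then occupancy and B-factor (6.2f each)
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}"
        # cols 67-76 are blank; cols 77-78 hold the right-justified element symbol
        f"          {element:>2s}"
    )


def parse_atom_line(line):
    """Recover a few fields from an ATOM record by slicing fixed columns."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "element": line[76:78].strip(),
    }


line = format_atom_line(1, "C1", "MOL", "A", 1, (1.0, 0.0, 0.0), "C")
atom = parse_atom_line(line)
print(line)
print(atom["xyz"], atom["element"])
```

Because every field lives at a fixed column, a reader can slice rather than split on whitespace, which matters when adjacent numeric fields run together.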

## Trajectory file: LAMMPS Dump Files

LAMMPS dump files are commonly used to store a large number of simulation frames. To read and write these large files efficiently, molpy creates an index file when first writing a trajectory. This index file enables random access to any frame in the trajectory without loading the entire file into memory.

```python
from molpy.io.trajectory.lammps import LammpsTrajectoryReader

reader = LammpsTrajectoryReader("traj.lammpstrj")

# Iterate through frames - only one frame is in memory at a time
for i, frame in enumerate(reader):
    print(f"Frame {i}: {frame['atoms'].nrows} atoms")
    if i >= 9:
        break  # Process first 10 frames only
```


**Memory efficiency tip:**  
Use this iterator pattern when analyzing long trajectories. Only the current frame is held in memory, so you can process multi-GB files without exhausting RAM.
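
The random-access idea behind a trajectory index can be sketched in plain Python. This is not molpy's actual index format or implementation; it is a minimal illustration, assuming each frame in a dump file starts with an `ITEM: TIMESTEP` header line. The function names `build_frame_index` and `read_frame_text` are hypothetical.

```python
import os
import tempfile


def build_frame_index(path):
    """Return the byte offset of every 'ITEM: TIMESTEP' header in a dump file."""
    offsets = []
    with open(path, "rb") as fh:
        while True:
            pos = fh.tell()  # offset of the line we are about to read
            line = fh.readline()
            if not line:
                break
            if line.startswith(b"ITEM: TIMESTEP"):
                offsets.append(pos)
    return offsets


def read_frame_text(path, offsets, i):
    """Seek straight to frame i and return only that frame's text."""
    start = offsets[i]
    end = offsets[i + 1] if i + 1 < len(offsets) else os.path.getsize(path)
    with open(path, "rb") as fh:
        fh.seek(start)
        return fh.read(end - start).decode()


def make_frame(step, x):
    """Build the text of a one-atom dump frame (toy data for the demo)."""
    return (
        f"ITEM: TIMESTEP\n{step}\n"
        "ITEM: NUMBER OF ATOMS\n1\n"
        "ITEM: BOX BOUNDS pp pp pp\n0 10\n0 10\n0 10\n"
        "ITEM: ATOMS id type x y z\n"
        f"1 1 {x} 0.0 0.0\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.lammpstrj")
    with open(path, "w", newline="\n") as fh:
        fh.write(make_frame(0, 0.0) + make_frame(100, 0.5) + make_frame(200, 1.0))
    offsets = build_frame_index(path)         # one pass over the file
    last = read_frame_text(path, offsets, 2)  # jump directly to the third frame

timestep = int(last.splitlines()[1])
print(len(offsets), timestep)
```

Building the offsets takes one pass over the file. After that, reaching any frame costs a single `seek`, and saving the offsets to disk avoids rescanning on later runs.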

---

## Forcefield file: XML Forcefield Files and more

Molpy has a flexible forcefield class for storing forcefield parameters, aiming for a general, abstract representation of forcefields. It supports reading and writing forcefield files in several formats, including XML-based forcefield files from OpenMM and parameters in LAMMPS data files.

```python
from molpy.io import read_xml_forcefield
ff = read_xml_forcefield("oplsaa.xml")
print(ff)

from molpy.io import write_lammps_forcefield
write_lammps_forcefield("oplsaa.data", ff)
```
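
For a sense of what such an XML file holds, the sketch below parses a tiny OpenMM-style fragment with the standard library alone. It does not show how molpy reads the file or how its forcefield class stores parameters. The embedded snippet and its numeric values are illustrative and are not taken from a real `oplsaa.xml`.

```python
import xml.etree.ElementTree as ET

# A hand-written fragment in the OpenMM force field XML layout (illustrative values)
XML_TEXT = """
<ForceField>
  <AtomTypes>
    <Type name="opls_135" class="CT" element="C" mass="12.011"/>
    <Type name="opls_140" class="HC" element="H" mass="1.008"/>
  </AtomTypes>
  <HarmonicBondForce>
    <Bond class1="CT" class2="HC" length="0.109" k="284512.0"/>
  </HarmonicBondForce>
</ForceField>
"""

root = ET.fromstring(XML_TEXT)

# Atom types: map each type name to its class, element, and mass
atom_types = {
    t.get("name"): {
        "class": t.get("class"),
        "element": t.get("element"),
        "mass": float(t.get("mass")),
    }
    for t in root.iter("Type")
}

# Harmonic bonds: (class1, class2, equilibrium length, force constant)
bonds = [
    (b.get("class1"), b.get("class2"), float(b.get("length")), float(b.get("k")))
    for b in root.iter("Bond")
]

print(atom_types)
print(bonds)
```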

Sometimes a chemical file contains both structure and forcefield parameters, as with LAMMPS data files and AmberTools files. Molpy can read and write these files as well.

```python
from molpy.io import read_lammps_data
frame, ff = read_lammps_data("system.data")
print(frame)
print(ff)
```