# Frame & Block — the data backbone

This tutorial explains MolPy's two primary containers for tabular data: `Block` and `Frame`.
Each section pairs a short, runnable example with practical tips, so it works well for interactive exploration.

## 1. Block — columnar tables

A `Block` is a small table: keys are column names and values are NumPy arrays (or convertible sequences).
Columns must have the same number of rows (axis 0). Blocks are great for coordinates, types, masks, and other per-entity data.

In [1]:
import molpy as mp
from molpy.core.frame import Block
import numpy as np

# Create a Block from Python lists (MolPy converts them to NumPy arrays)
data = {
    'x': [0.0, 1.0, 2.0],
    'y': [0.0, 0.0, 0.0],
    'z': [0.0, 0.0, 0.0],
    'element': ['O', 'H', 'H'],
}
atoms = Block(data)

print('Number of atoms:', atoms.nrows)
print('Columns:', list(atoms.keys()))
print('dtype:', atoms['x'].dtype)

Number of atoms: 3
Columns: ['x', 'y', 'z', 'element']
dtype: float64
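
The conversion and equal-length rule described above can be sketched in plain NumPy. This is only an illustration of the idea, not MolPy's implementation: `to_columns` is a hypothetical helper, and how `Block` itself reports a length mismatch is not shown in this tutorial.

```python
import numpy as np


def to_columns(data):
    """Hypothetical helper: convert sequences to arrays and check row counts."""
    columns = {name: np.asarray(values) for name, values in data.items()}
    lengths = {name: col.shape[0] for name, col in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns differ in length: {lengths}")
    return columns


columns = to_columns({
    "x": [0.0, 1.0, 2.0],
    "element": ["O", "H", "H"],
})
x_dtype = columns["x"].dtype                  # float64, as in the output above
element_kind = columns["element"].dtype.kind  # 'U': fixed-width Unicode strings

# A column with a different number of rows is rejected.
try:
    to_columns({"x": [0.0, 1.0], "element": ["O", "H", "H"]})
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

Note that a list of Python strings becomes a fixed-width Unicode array, so assigning a longer string into such a column later would truncate it in plain NumPy.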


### Basic access patterns

- Single column: `atoms['x']` returns a NumPy array.
- Multiple columns: `atoms[['x','y','z']]` returns a 2D array shaped `(nrows, ncols)`.
- Row: `atoms[0]` returns a `Block` holding that single row, with every column reduced to shape `()` (convenient for inspection).
- Slice: `atoms[0:3]` returns a new Block with the selected rows.

Below: quick code examples showing these operations and their behaviour.

In [2]:
# Column access
print('x coords:', atoms['x'])

# Multiple columns → (nrows, ncols)
xyz = atoms[['x','y','z']]
print('Coordinates shape:', xyz.shape)

# Row view and slicing
print('First atom (row):', atoms[0])
subset = atoms[0:2]
print('Subset rows:', subset.nrows)

# Add a new column (vectorized)
atoms['r'] = np.sqrt(atoms['x']**2 + atoms['y']**2 + atoms['z']**2)
print('New column r:', atoms['r'])

x coords: [0. 1. 2.]
Coordinates shape: (3, 3)
First atom (row): Block(x: shape=(), y: shape=(), z: shape=(), element: shape=())
Subset rows: 2
New column r: [0. 1. 2.]
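
For comparison, the same access patterns can be reproduced on bare NumPy arrays. This is a sketch of the equivalent array operations, not of how `Block` implements them; in particular, whether MolPy's multi-column access copies data is an assumption that this tutorial does not confirm.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])
y = np.zeros(3)
z = np.zeros(3)

# Stack three 1D columns side by side into an (nrows, ncols) array.
xyz = np.column_stack([x, y, z])
xyz_shape = xyz.shape

# Separate 1D arrays cannot be viewed as one 2D array, so stacking copies.
stack_shares_memory = np.shares_memory(xyz, x)

# Row-wise Euclidean norm, equivalent to sqrt(x**2 + y**2 + z**2).
r = np.linalg.norm(xyz, axis=1)

# Row slicing keeps the column count and selects the first two rows.
first_two = xyz[0:2]
```

Because the stacked array is a copy in plain NumPy, writing into `xyz` here would leave `x`, `y`, and `z` unchanged.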


### Copy vs view — watch out!

Some operations return views, so modifying the result changes the original Block, while others return copies. Column access generally returns an ndarray view where possible, so in-place arithmetic on that array updates the Block. Slicing returns a new Block. If you need a defensive copy, call `.copy()` on the array or wrap it with `np.copy(...)`.

**Tip:** prefer explicit copies for pipelines that expect immutability.

In [3]:
# Example: in-place modification vs copy
x_view = atoms['x']
x_view += 1.0  # modifies atoms in-place
print('atoms.x after in-place += 1:', atoms['x'])

# Defensive copy
x_copy = atoms['x'].copy()
x_copy += 10.0
print('atoms.x unchanged after modifying copy:', atoms['x'])

atoms.x after in-place += 1: [1. 2. 3.]
atoms.x unchanged after modifying copy: [1. 2. 3.]
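
When it is unclear whether an array is a view or a copy, `np.shares_memory` can check directly. The sketch below uses plain NumPy arrays only; it shows NumPy's own rules (basic slices are views, integer-list indexing copies) and makes no claim about which of these a sliced `Block` uses internally.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])

view = x[0:2]      # basic slice: a view onto x
fancy = x[[0, 1]]  # integer-list indexing: always a copy
copy = x.copy()    # explicit defensive copy

view_shares = np.shares_memory(x, view)
fancy_shares = np.shares_memory(x, fancy)
copy_shares = np.shares_memory(x, copy)

view += 1.0    # writes through to x
fancy += 10.0  # leaves x untouched
copy += 100.0  # leaves x untouched
```

After these updates only the first two entries of `x` have changed, and only because of the write through `view`.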


## 2. Frame — bundle Blocks + metadata

A `Frame` groups Blocks (e.g. `atoms`, `bonds`) and stores auxiliary `metadata` such as `box`, `time`, or provenance information. Frames are your main analysis object when reading files or iterating trajectories.

In [4]:
frame = mp.Frame()
frame['atoms'] = atoms
frame.metadata['box'] = mp.Box.cubic(20.0)
frame.metadata['time'] = 0.0
print(frame)
print('Available blocks:', list(frame._blocks.keys()))
print('Box:', frame.metadata['box'])

Frame(
  [atoms] x: shape=(3,)
  [atoms] y: shape=(3,)
  [atoms] z: shape=(3,)
  [atoms] element: shape=(3,)
  [atoms] r: shape=(3,)
)
Available blocks: ['atoms']
Box: <Orthogonal Box: [20. 20. 20.]>
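
The structure of a Frame, named tables plus a metadata dictionary, can be pictured with a few lines of plain Python. `MiniFrame` below is a hypothetical stand-in written for this illustration; it is not MolPy's `Frame`, and the `bonds` columns `i` and `j` and the `box_lengths` key are made-up names.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class MiniFrame:
    """Hypothetical stand-in for a Frame: named tables plus free-form metadata."""

    blocks: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)

    def __setitem__(self, name, table):
        self.blocks[name] = table

    def __getitem__(self, name):
        return self.blocks[name]


frame = MiniFrame()
frame["atoms"] = {"x": np.array([0.0, 1.0, 2.0])}
frame["bonds"] = {"i": np.array([0, 0]), "j": np.array([1, 2])}  # atom index pairs
frame.metadata["time"] = 0.0
frame.metadata["box_lengths"] = np.array([20.0, 20.0, 20.0])

block_names = list(frame.blocks)        # insertion order is preserved
n_bonds = frame["bonds"]["i"].shape[0]  # tables may differ in row count
```

Keeping each table separate lets `atoms` and `bonds` have different row counts, while values that apply to the whole snapshot, such as the time or the box, live in `metadata`.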


## 3. Tips & best practices

- Prefer vectorized operations on columns (NumPy) for speed.
- Use explicit copies when you need immutable intermediate steps.
- Keep column dtypes consistent across pipelines to avoid casts.
- Use `Frame` to hold per-frame metadata and multiple Blocks (atoms, bonds, etc.).
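
The first and third tips can be illustrated with plain NumPy. The sketch below assumes nothing about MolPy; the array names are made up for the example.

```python
import numpy as np

coords32 = np.array(
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]], dtype=np.float32
)
shift64 = np.array([0.5, 0.0, 0.0])  # float64 by default

# Mixing float32 and float64 arrays silently promotes the result to float64.
mixed_dtype = (coords32 + shift64).dtype

# Casting the smaller operand first keeps the pipeline in float32.
kept_dtype = (coords32 + shift64.astype(np.float32)).dtype

# Vectorized row norms: one expression over the whole array...
r_vec = np.sqrt((coords32 ** 2).sum(axis=1))

# ...gives the same values as an explicit Python loop over rows.
r_loop = np.array([np.sqrt(sum(c * c for c in row)) for row in coords32])
```

The loop version produces the same numbers but runs in the Python interpreter row by row, which becomes the bottleneck as the number of rows grows.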

That's it — Blocks + Frames give you a compact, fast way to manipulate per-entity tabular data. Happy exploring!