add_arrays_batch rejects variable-size atom_properties due to per-atom schema slot_bytes #33

## Summary

`Database.add_arrays_batch(...)` rejects valid variable-size molecules when custom `atom_properties` are appended for molecules with different `natoms`.

The failure happens even when each individual atom property has the correct per-structure shape, e.g. `(natoms, 3)` for a vector per-atom property. The first write locks the schema using the full per-record payload size (`natoms * elem_bytes`) instead of the per-atom slot size (`elem_bytes`), so the next molecule with a different `natoms` fails schema validation.

This affects at least `atompack-db==0.3.0`.

## Minimal reproduction

```python
import tempfile
from pathlib import Path

import atompack
import numpy as np

work = Path(tempfile.mkdtemp())
out = atompack.Database(str(work / "merged.atp"), overwrite=True)

# The first append (20 atoms) locks the schema; the second (29 atoms) raises ValueError.
for natoms in [20, 29]:
    positions = np.zeros((1, natoms, 3), dtype=np.float32)
    atomic_numbers = np.ones((1, natoms), dtype=np.uint8)
    energy = np.array([0.0], dtype=np.float64)
    forces = np.zeros((1, natoms, 3), dtype=np.float32)

    out.add_arrays_batch(
        positions,
        atomic_numbers,
        energy=energy,
        forces=forces,
        atom_properties={
            "teacher_forces": np.zeros((1, natoms, 3), dtype=np.float32),
            "hidden_scalar": np.zeros((1, natoms), dtype=np.float32),
        },
    )

out.flush()
```

## Actual result

The second append fails:

```text
ValueError: Invalid data: Schema mismatch for section 'teacher_forces': expected SchemaEntry { type_tag: 4, per_atom: true, elem_bytes: 12, slot_bytes: 240 }, got SchemaEntry { type_tag: 4, per_atom: true, elem_bytes: 12, slot_bytes: 348 }
```

The values correspond to:

- `240 = 20 * 3 * sizeof(float32)`
- `348 = 29 * 3 * sizeof(float32)`

## Expected result

This should succeed. `teacher_forces` is a per-atom vector property with shape `(natoms, 3)` for each molecule. Its schema should be independent of molecule size:

```text
per_atom: true, elem_bytes: 12, slot_bytes: 12
```

Similarly, scalar per-atom properties shaped `(natoms,)` should use `slot_bytes = sizeof(dtype)`, not `natoms * sizeof(dtype)`.
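As a quick check of the arithmetic, the sketch below computes both sizes from NumPy arrays shaped like a single molecule's properties in the reproduction. `per_record_bytes` and `per_atom_slot_bytes` are hypothetical helper names, not part of atompack; the sketch only assumes a per-atom property is laid out as `(natoms, ...)` for each molecule.

```python
import numpy as np


def per_record_bytes(prop: np.ndarray) -> int:
    """Bytes one molecule's per-atom property occupies: natoms * elem_bytes (size-dependent)."""
    return prop.nbytes


def per_atom_slot_bytes(prop: np.ndarray) -> int:
    """Bytes per atom: product of the trailing (non-atom) dims times the dtype size."""
    # For shape (natoms,) the trailing shape is (), whose product is 1.
    return int(np.prod(prop.shape[1:], dtype=np.int64)) * prop.dtype.itemsize


for natoms in (20, 29):
    teacher_forces = np.zeros((natoms, 3), dtype=np.float32)
    hidden_scalar = np.zeros((natoms,), dtype=np.float32)
    print(
        natoms,
        per_record_bytes(teacher_forces),     # 240, then 348
        per_atom_slot_bytes(teacher_forces),  # 12 for both molecules
        per_atom_slot_bytes(hidden_scalar),   # 4 for both molecules
    )
```

Only the per-record size changes with `natoms`, which is why it cannot serve as a schema key.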

## Diagnosis

The issue appears to be in the Python/Rust batch path, not in SOA decoding itself.

In `atompack-py/src/database_batch.rs`, `extract_vec3_column()` stores the per-record payload size in `BatchSectionColumn.slot_bytes`:

```rust
slot_bytes: expected_rows * 3 * std::mem::size_of::<T>(),
```

Then `schema_section_from_column()` passes that same value into the database schema:

```rust
fn schema_section_from_column(column: &BatchSectionColumn) -> DatabaseSchemaSection {
    schema_section(column.kind, &column.key, column.type_tag, column.slot_bytes)
}
```

For `KIND_ATOM_PROP`, this means schema `slot_bytes` becomes molecule-size dependent. The first molecule locks e.g. `240`; the second molecule with a different `natoms` produces e.g. `348`; schema merge rejects it.
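The locking behaviour described above can be modelled in a few lines of plain Python. This is a toy model, assuming the schema merge is an exact equality check on the four fields printed in the error message; `SchemaLock` and `append_teacher_forces` are hypothetical names, not atompack code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SchemaEntry:
    # Same fields as in the error message; a toy model, not atompack's Rust type.
    type_tag: int
    per_atom: bool
    elem_bytes: int
    slot_bytes: int


class SchemaLock:
    """First write for a section key locks its entry; later writes must match exactly."""

    def __init__(self) -> None:
        self.entries: Dict[str, SchemaEntry] = {}

    def merge(self, key: str, entry: SchemaEntry) -> None:
        locked = self.entries.setdefault(key, entry)
        if locked != entry:
            raise ValueError(
                f"Schema mismatch for section {key!r}: expected {locked}, got {entry}"
            )


TYPE_TAG_VEC3_F32 = 4  # type_tag value copied from the error message above
ELEM_BYTES = 12        # 3 * sizeof(float32)


def append_teacher_forces(lock: SchemaLock, natoms: int, per_atom_slot: bool) -> None:
    # Current behaviour: slot_bytes = natoms * elem_bytes; proposed: slot_bytes = elem_bytes.
    slot_bytes = ELEM_BYTES if per_atom_slot else natoms * ELEM_BYTES
    lock.merge("teacher_forces", SchemaEntry(TYPE_TAG_VEC3_F32, True, ELEM_BYTES, slot_bytes))


# Current behaviour: 20 atoms locks slot_bytes=240, then 29 atoms (348) is rejected.
current = SchemaLock()
current_error: Optional[ValueError] = None
for natoms in (20, 29):
    try:
        append_teacher_forces(current, natoms, per_atom_slot=False)
    except ValueError as exc:
        current_error = exc
print("current:", current_error)

# Proposed behaviour: both molecules produce slot_bytes=12, so the lock accepts both.
proposed = SchemaLock()
for natoms in (20, 29):
    append_teacher_forces(proposed, natoms, per_atom_slot=True)
print("proposed:", proposed.entries["teacher_forces"].slot_bytes)
```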

Built-in `forces` do not hit this because the fast path explicitly uses `12` for `TYPE_VEC3_F32`:

```rust
schema_sections.push(schema_section(KIND_BUILTIN, "forces", TYPE_VEC3_F32, 12));
```

The raw SOA/schema parsing path also seems to compute the desired per-atom schema correctly: per-atom arrays and vec3 fields use `elem_bytes` as schema `slot_bytes`.

## Suggested fix direction

Separate the two concepts currently represented by `BatchSectionColumn.slot_bytes`:

1. per-record payload stride used for slicing batch buffers, e.g. `natoms * 3 * sizeof(T)`
2. schema slot bytes used for schema locking, e.g. `3 * sizeof(T)` for per-atom vec3 and `sizeof(T)` for per-atom scalar arrays

For `KIND_ATOM_PROP`, `schema_section_from_column()` should likely emit schema `slot_bytes = type_tag_elem_bytes(type_tag)` for numeric per-atom fields, while preserving the current per-record payload stride for slicing.
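One way to picture the proposed split, as a Python model rather than a patch to `database_batch.rs`: compute both sizes from the batch array, use the stride only for slicing the batch buffer, and use the per-atom size only for the schema entry. `AtomPropLayout`, `atom_prop_layout`, and `split_records` are hypothetical names; the sketch assumes the batch arrays are C-contiguous and shaped `(n_records, natoms, ...)` as in the reproduction.

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass(frozen=True)
class AtomPropLayout:
    # Hypothetical split of the two sizes BatchSectionColumn.slot_bytes currently conflates.
    payload_stride: int     # bytes per record in the batch buffer (natoms * elem_bytes)
    schema_slot_bytes: int  # bytes per atom, used only for the schema entry (elem_bytes)


def atom_prop_layout(batch: np.ndarray) -> AtomPropLayout:
    """`batch` is shaped (n_records, natoms, *trailing), as passed to add_arrays_batch."""
    natoms = batch.shape[1]
    elem_bytes = int(np.prod(batch.shape[2:], dtype=np.int64)) * batch.dtype.itemsize
    return AtomPropLayout(payload_stride=natoms * elem_bytes, schema_slot_bytes=elem_bytes)


def split_records(batch: np.ndarray, layout: AtomPropLayout) -> List[bytes]:
    """Slice the contiguous batch buffer into one payload per record using the stride."""
    raw = np.ascontiguousarray(batch).tobytes()
    stride = layout.payload_stride
    return [raw[i * stride:(i + 1) * stride] for i in range(batch.shape[0])]


# Two 29-atom records: slicing uses a 348-byte stride, the schema sees 12 bytes per atom.
batch = np.zeros((2, 29, 3), dtype=np.float32)
layout = atom_prop_layout(batch)
records = split_records(batch, layout)
print(layout, [len(r) for r in records])
```

With this split, a 20-atom batch would still slice with a 240-byte stride, but both batches would report the same `schema_slot_bytes`, so the schema lock no longer depends on molecule size.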

## Context

This came up while trying to merge AtomPack shards containing variable-size atomistic structures with cached model outputs:

- built-in DFT `energy` and `forces`
- graph-level teacher energy as molecule metadata
- teacher force predictions as atom property `(natoms, 3)`
- hidden representations as atom properties `(natoms,)` or vector channels

Writing same-`natoms` shards works. Merging/appending across different `natoms` fails on the custom atom-property schema lock.
