# Extract and Reconstruct: Scientific Simulation Example

This notebook demonstrates `Packable.extract()` and `Packable.reconstruct()` with a realistic scientific computing scenario:

- A CFD simulation with mesh geometry and field data
- Nested Pydantic classes containing Packables (Mesh)
- Content-addressable storage for deduplication

In [1]:
import numpy as np
from typing import Optional, Dict, List
from pydantic import BaseModel, Field, ConfigDict
from meshly import Mesh, Packable

## 1. Define Scientific Data Structures

We'll model a CFD simulation with:
- `FieldData`: Scalar/vector field on the mesh (temperature, velocity, etc.)
- `SimulationSnapshot`: A single timestep with mesh + fields
- `SimulationCase`: Complete case with metadata and multiple snapshots

In [2]:
class FieldData(BaseModel):
    """A field defined on mesh nodes or cells."""
    model_config = ConfigDict(arbitrary_types_allowed=True)
    
    name: str = Field(..., description="Field name (e.g., 'temperature', 'velocity')")
    field_type: str = Field(..., description="'scalar', 'vector', or 'tensor'")
    location: str = Field("node", description="'node' or 'cell' centered")
    data: np.ndarray = Field(..., description="Field values")
    units: Optional[str] = Field(None, description="Physical units")


class SimulationSnapshot(BaseModel):
    """A single timestep of simulation data.
    
    Note: This is a regular Pydantic BaseModel (not Packable) that contains
    a Mesh (which IS a Packable). This tests the nested Packable extraction.
    """
    model_config = ConfigDict(arbitrary_types_allowed=True)
    
    time: float = Field(..., description="Simulation time")
    iteration: int = Field(..., description="Iteration number")
    mesh: Mesh = Field(..., description="Computational mesh")
    fields: Dict[str, FieldData] = Field(default_factory=dict, description="Field data")
    residuals: Optional[np.ndarray] = Field(None, description="Solver residuals")


class SimulationCase(BaseModel):
    """Complete simulation case with multiple snapshots."""
    model_config = ConfigDict(arbitrary_types_allowed=True)
    
    name: str = Field(..., description="Case name")
    description: str = Field("", description="Case description")
    solver: str = Field(..., description="Solver name")
    parameters: Dict[str, float] = Field(default_factory=dict, description="Solver parameters")
    snapshots: List[SimulationSnapshot] = Field(default_factory=list, description="Time snapshots")

print("Data structures defined")

Data structures defined


## 2. Create Sample Simulation Data

Let's create a simple 2D heat transfer simulation on a quad mesh.

In [3]:
# Create a simple 2D quad mesh (5x5 grid = 25 nodes, 16 quads)
nx, ny = 5, 5
x = np.linspace(0, 1, nx)
y = np.linspace(0, 1, ny)
xx, yy = np.meshgrid(x, y)

vertices = np.column_stack([xx.ravel(), yy.ravel(), np.zeros(nx * ny)]).astype(np.float32)

# Create quad indices
quads = []
for j in range(ny - 1):
    for i in range(nx - 1):
        n0 = j * nx + i
        n1 = n0 + 1
        n2 = n0 + nx + 1
        n3 = n0 + nx
        quads.append([n0, n1, n2, n3])

indices = np.array(quads, dtype=np.uint32)

mesh = Mesh(vertices=vertices, indices=indices)
print(f"Created mesh: {mesh.vertex_count} vertices, {len(indices)} quads")

Created mesh: 25 vertices, 16 quads


In [4]:
# Create simulation snapshots at different times
def create_snapshot(time: float, iteration: int, mesh: Mesh) -> SimulationSnapshot:
    """Create a snapshot with temperature and velocity fields."""
    n_nodes = mesh.vertex_count
    coords = mesh.vertices[:, :2]  # x, y coordinates
    
    # Temperature: diffusing heat from center
    center = np.array([0.5, 0.5])
    r = np.linalg.norm(coords - center, axis=1)
    temperature = 300 + 100 * np.exp(-r**2 / (0.1 + time)) 
    
    # Velocity: rotating flow
    vx = -(coords[:, 1] - 0.5)
    vy = (coords[:, 0] - 0.5)
    velocity = np.column_stack([vx, vy, np.zeros(n_nodes)]).astype(np.float32)
    
    # Residuals (solver convergence)
    residuals = np.array([1e-3 / (iteration + 1), 1e-4 / (iteration + 1)], dtype=np.float32)
    
    return SimulationSnapshot(
        time=time,
        iteration=iteration,
        mesh=mesh,
        fields={
            "temperature": FieldData(
                name="temperature",
                field_type="scalar",
                location="node",
                data=temperature.astype(np.float32),
                units="K"
            ),
            "velocity": FieldData(
                name="velocity",
                field_type="vector",
                location="node",
                data=velocity,
                units="m/s"
            )
        },
        residuals=residuals
    )

# Create snapshots at t=0, 0.1, 0.2
snapshots = [
    create_snapshot(0.0, 0, mesh),
    create_snapshot(0.1, 100, mesh),
    create_snapshot(0.2, 200, mesh),
]

print(f"Created {len(snapshots)} snapshots")
for s in snapshots:
    print(f"  t={s.time}: {list(s.fields.keys())}")

Created 3 snapshots
  t=0.0: ['temperature', 'velocity']
  t=0.1: ['temperature', 'velocity']
  t=0.2: ['temperature', 'velocity']


In [5]:
# Create the complete simulation case
case = SimulationCase(
    name="heat_transfer_2d",
    description="2D heat transfer with rotating flow",
    solver="simpleFoam",
    parameters={
        "dt": 0.001,
        "nu": 1e-5,
        "alpha": 1e-4,
    },
    snapshots=snapshots
)

print(f"Simulation case: {case.name}")
print(f"  Solver: {case.solver}")
print(f"  Parameters: {case.parameters}")
print(f"  Snapshots: {len(case.snapshots)}")

Simulation case: heat_transfer_2d
  Solver: simpleFoam
  Parameters: {'dt': 0.001, 'nu': 1e-05, 'alpha': 0.0001}
  Snapshots: 3


## 3. Extract the Simulation Data

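`Packable.extract()` recursively processes the nested structure and returns JSON-serializable `data` plus an `assets` mapping from checksum to encoded bytes. In the output below, binary payloads are replaced by references:
- Arrays → `{"$ref": checksum}`
- Nested Mesh (Packable) → `{"$ref": checksum}`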
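
The idea behind this replacement can be illustrated without meshly at all. The following is a minimal sketch, not the library's implementation: `checksum_of` and `extract_refs` are hypothetical helpers, raw `bytes` stand in for encoded arrays and meshes, and the truncated SHA-256 digest is an assumed hashing scheme.

```python
import hashlib
import json
from typing import Any, Dict


def checksum_of(blob: bytes) -> str:
    """Content address: a truncated SHA-256 hex digest (an assumed scheme)."""
    return hashlib.sha256(blob).hexdigest()[:16]


def extract_refs(node: Any, assets: Dict[str, bytes]) -> Any:
    """Recursively replace every bytes payload with a {"$ref": checksum} stub."""
    if isinstance(node, (bytes, bytearray)):
        blob = bytes(node)
        key = checksum_of(blob)
        assets[key] = blob  # identical payloads map to the same key
        return {"$ref": key}
    if isinstance(node, dict):
        return {k: extract_refs(v, assets) for k, v in node.items()}
    if isinstance(node, (list, tuple)):
        return [extract_refs(v, assets) for v in node]
    return node  # primitives pass through unchanged


# Two snapshots share one "mesh" payload (hypothetical stand-in data)
shared_mesh = b"mesh-bytes" * 10
case = {
    "name": "demo",
    "snapshots": [
        {"time": 0.0, "mesh": shared_mesh, "residuals": b"\x00\x01"},
        {"time": 0.1, "mesh": shared_mesh, "residuals": b"\x00\x02"},
    ],
}

assets: Dict[str, bytes] = {}
data = extract_refs(case, assets)
print(json.dumps(data, indent=2))  # the stubbed structure is plain JSON
print(f"{len(assets)} unique assets")
```

Because the key is derived from the content, the shared payload is written to `assets` once even though two snapshots reference it.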

In [6]:
# Extract the entire simulation case
extracted = Packable.extract(case)

print(f"Extracted data keys: {list(extracted.data.keys())}")
print(f"\nTotal assets: {len(extracted.assets)}")
print(f"\nAsset sizes:")
for checksum, data in extracted.assets.items():
    print(f"  {checksum}: {len(data):,} bytes")

Extracted data keys: ['name', 'description', 'solver', 'parameters', 'snapshots']

Total assets: 8

Asset sizes:
  4e71a79c2d0fa381: 1,467 bytes
  28dc719a0c8c1387: 200 bytes
  59ffdd6bfac7876a: 250 bytes
  0c345962a52e7e2c: 133 bytes
  292cfc23f6777b02: 200 bytes
  17b38a2f2cbdd0a7: 133 bytes
  145838c08771e6ef: 201 bytes
  ea37b2590dba4b31: 132 bytes


In [7]:
# Examine the extracted data structure
import json

# Pretty print the extracted data (it's JSON-serializable!)
print("Extracted data structure:")
print(json.dumps(extracted.data, indent=2)[:2000] + "\n...")

Extracted data structure:
{
  "name": "heat_transfer_2d",
  "description": "2D heat transfer with rotating flow",
  "solver": "simpleFoam",
  "parameters": {
    "dt": 0.001,
    "nu": 1e-05,
    "alpha": 0.0001
  },
  "snapshots": [
    {
      "time": 0.0,
      "iteration": 0,
      "mesh": {
        "$ref": "4e71a79c2d0fa381"
      },
      "fields": {
        "temperature": {
          "name": "temperature",
          "field_type": "scalar",
          "location": "node",
          "data": {
            "$ref": "28dc719a0c8c1387"
          },
          "units": "K"
        },
        "velocity": {
          "name": "velocity",
          "field_type": "vector",
          "location": "node",
          "data": {
            "$ref": "59ffdd6bfac7876a"
          },
          "units": "m/s"
        }
      },
      "residuals": {
        "$ref": "0c345962a52e7e2c"
      }
    },
    {
      "time": 0.1,
      "iteration": 100,
      "mesh": {
        "$ref": "4e71a79c2d0fa381"
      },
...

In [8]:
# Look at the first snapshot's mesh reference
mesh_ref = extracted.data["snapshots"][0]["mesh"]
print(f"Mesh reference: {mesh_ref}")


Mesh reference: {'$ref': '4e71a79c2d0fa381'}


## 4. Asset Deduplication

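All three snapshots reference the same mesh, so it is stored only once. The velocity field does not depend on time in `create_snapshot`, so it is also identical across snapshots and stored once (checksum `59ffdd6bfac7876a`).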

In [9]:
# Check mesh references across snapshots
mesh_refs = [s["mesh"]["$ref"] for s in extracted.data["snapshots"]]
print(f"Mesh checksums: {mesh_refs}")
print(f"\nAll same? {len(set(mesh_refs)) == 1}")
print(f"\nThe mesh is stored only ONCE in assets, saving {(len(mesh_refs)-1) * len(extracted.assets[mesh_refs[0]]):,} bytes!")

Mesh checksums: ['4e71a79c2d0fa381', '4e71a79c2d0fa381', '4e71a79c2d0fa381']

All same? True

The mesh is stored only ONCE in assets, saving 2,934 bytes!
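
The total saving can be worked out from the numbers already printed. The sketch below is plain arithmetic over values copied from this notebook: asset sizes come from the extraction output, and the per-snapshot reference lists come from the printed JSON and the fetch log in section 6. The variable names are illustrative only.

```python
from collections import Counter

# Asset sizes in bytes, copied from the extraction output
asset_sizes = {
    "4e71a79c2d0fa381": 1467,  # mesh
    "28dc719a0c8c1387": 200,
    "59ffdd6bfac7876a": 250,   # velocity
    "0c345962a52e7e2c": 133,
    "292cfc23f6777b02": 200,
    "17b38a2f2cbdd0a7": 133,
    "145838c08771e6ef": 201,
    "ea37b2590dba4b31": 132,
}

# References held by each snapshot: mesh, temperature, velocity, residuals
snapshot_refs = [
    ["4e71a79c2d0fa381", "28dc719a0c8c1387", "59ffdd6bfac7876a", "0c345962a52e7e2c"],
    ["4e71a79c2d0fa381", "292cfc23f6777b02", "59ffdd6bfac7876a", "17b38a2f2cbdd0a7"],
    ["4e71a79c2d0fa381", "145838c08771e6ef", "59ffdd6bfac7876a", "ea37b2590dba4b31"],
]

ref_counts = Counter(ref for refs in snapshot_refs for ref in refs)
stored = sum(asset_sizes.values())                                   # each asset once
referenced = sum(asset_sizes[r] * n for r, n in ref_counts.items())  # no deduplication
saved = referenced - stored
shared = {r: n for r, n in ref_counts.items() if n > 1}

print(f"Stored: {stored:,} bytes, without deduplication: {referenced:,} bytes")
print(f"Saved: {saved:,} bytes across {len(shared)} shared assets")
```

The mesh contributes 2,934 bytes of the saving and the velocity field another 500, for 3,434 bytes in total.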


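## 5. Reconstruct Back to SimulationCase

Passing the in-memory asset dict to `Packable.reconstruct()` rebuilds the full `SimulationCase` eagerly.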

In [10]:
reconstructed_case = Packable.reconstruct(SimulationCase, extracted.data, extracted.assets)
print(f"\nReconstructed case: {reconstructed_case.name} with {len(reconstructed_case.snapshots)} snapshots")

decoded_mesh = Mesh.decode(reconstructed_case.snapshots[0].mesh.encode())
print(f"Decoded mesh from reconstructed case: {decoded_mesh.vertex_count} vertices, {len(decoded_mesh.indices)} indices")


Reconstructed case: heat_transfer_2d with 3 snapshots
Decoded mesh from reconstructed case: 25 vertices, 64 indices
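
Conceptually, eager reconstruction walks the metadata and swaps each `{"$ref": ...}` stub for the asset it names. The following is a minimal sketch under that assumption; `resolve_refs` is a hypothetical helper, the checksums are placeholders, and it returns raw bytes where the library would also decode arrays and meshes and validate the result into the Pydantic model.

```python
from typing import Any, Dict


def resolve_refs(node: Any, assets: Dict[str, bytes]) -> Any:
    """Recursively replace {"$ref": checksum} stubs with the stored payload."""
    if isinstance(node, dict):
        if set(node) == {"$ref"}:
            return assets[node["$ref"]]  # KeyError signals a missing asset
        return {k: resolve_refs(v, assets) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_refs(v, assets) for v in node]
    return node  # primitives pass through unchanged


# Placeholder checksums and payloads, for illustration only
assets = {"aaaa": b"mesh", "bbbb": b"t0", "cccc": b"t1"}
data = {
    "name": "demo",
    "snapshots": [
        {"time": 0.0, "mesh": {"$ref": "aaaa"}, "temperature": {"$ref": "bbbb"}},
        {"time": 0.1, "mesh": {"$ref": "aaaa"}, "temperature": {"$ref": "cccc"}},
    ],
}

restored = resolve_refs(data, assets)
print(restored["snapshots"][1])
```

Both restored snapshots end up pointing at the same in-memory `mesh` object, since the lookup returns the single stored payload.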


## 6. Lazy Loading with CachedAssetLoader

When working with large datasets, you may want to:
- Load assets on-demand (lazy loading)
- Cache fetched assets to disk for subsequent runs

`Packable.reconstruct()` supports both: pass a plain fetch callable for lazy loading, or wrap that callable in a `CachedAssetLoader` to add a disk cache.

In [11]:
from pathlib import Path
from meshly.packable import CachedAssetLoader
from meshly.data_handler import DataHandler

# Simulate fetching assets from remote storage
fetch_count = [0]

def fetch_from_storage(checksum: str) -> bytes:
    """Simulate fetching from cloud/remote storage."""
    fetch_count[0] += 1
    print(f"  Fetching asset {checksum[:8]}... (fetch #{fetch_count[0]})")
    return extracted.assets[checksum]

# Using a plain callable - lazy loading, assets fetched on field access
print("=== Lazy loading with callable ===")
lazy_case = Packable.reconstruct(SimulationCase, extracted.data, fetch_from_storage)

print(f"\nLazyModel created, no assets fetched yet. Fetch count: {fetch_count[0]}")
print(f"Type: {type(lazy_case)}")

# Access primitive fields - no fetch needed
print(f"\nCase name: {lazy_case.name}")
print(f"Fetch count after accessing name: {fetch_count[0]}")

=== Lazy loading with callable ===

LazyModel created, no assets fetched yet. Fetch count: 0
Type: <class 'meshly.packable.LazyModel'>

Case name: heat_transfer_2d
Fetch count after accessing name: 0


In [12]:
# Accessing `snapshots` resolves the whole list, fetching the assets of every snapshot
print("=== Accessing first snapshot ===")
snapshot = lazy_case.snapshots[0]
print(f"Fetch count after accessing snapshots: {fetch_count[0]}")

# The mesh was already fetched above, so this access triggers no new fetch
print(f"\nSnapshot time: {snapshot.time}")
print(f"Mesh vertices shape: {snapshot.mesh.vertices.shape}")

# To fully resolve and get the actual Pydantic model:
print("\n=== Resolving to full model ===")
resolved_case = lazy_case.resolve()
print(f"Final fetch count: {fetch_count[0]}")
print(f"Resolved type: {type(resolved_case)}")

=== Accessing first snapshot ===
  Fetching asset 4e71a79c... (fetch #1)
  Fetching asset 28dc719a... (fetch #2)
  Fetching asset 59ffdd6b... (fetch #3)
  Fetching asset 0c345962... (fetch #4)
  Fetching asset 4e71a79c... (fetch #5)
  Fetching asset 292cfc23... (fetch #6)
  Fetching asset 59ffdd6b... (fetch #7)
  Fetching asset 17b38a2f... (fetch #8)
  Fetching asset 4e71a79c... (fetch #9)
  Fetching asset 145838c0... (fetch #10)
  Fetching asset 59ffdd6b... (fetch #11)
  Fetching asset ea37b259... (fetch #12)
Fetch count after accessing snapshots: 12

Snapshot time: 0.0
Mesh vertices shape: (25, 3)

=== Resolving to full model ===
Final fetch count: 12
Resolved type: <class '__main__.SimulationCase'>
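
The log above shows 12 fetches for 8 unique assets: a plain callable is invoked once per reference, so the shared mesh and velocity field are each fetched three times. One way to avoid this without a disk cache is to memoize the callable in memory. This is a minimal sketch; `memoize_fetch`, `store`, and `remote_fetch` are hypothetical names, and the keys are the 8-character prefixes from the log rather than full checksums.

```python
from typing import Callable, Dict, List


def memoize_fetch(fetch: Callable[[str], bytes]) -> Callable[[str], bytes]:
    """Wrap a fetch callable so each checksum is fetched at most once."""
    cache: Dict[str, bytes] = {}

    def cached(checksum: str) -> bytes:
        if checksum not in cache:
            cache[checksum] = fetch(checksum)
        return cache[checksum]

    return cached


# Stand-in for remote storage, keyed by the prefixes shown in the log
prefixes = ["4e71a79c", "28dc719a", "59ffdd6b", "0c345962",
            "292cfc23", "17b38a2f", "145838c0", "ea37b259"]
store = {p: p.encode() for p in prefixes}
calls: List[str] = []


def remote_fetch(checksum: str) -> bytes:
    calls.append(checksum)  # record every real fetch
    return store[checksum]


fetch = memoize_fetch(remote_fetch)

# The same 12 requests, in the order they appear in the log
request_order = ["4e71a79c", "28dc719a", "59ffdd6b", "0c345962",
                 "4e71a79c", "292cfc23", "59ffdd6b", "17b38a2f",
                 "4e71a79c", "145838c0", "59ffdd6b", "ea37b259"]
for checksum in request_order:
    fetch(checksum)

print(f"{len(request_order)} requests, {len(calls)} real fetches")
```

Since `Packable.reconstruct()` accepts a plain callable, a wrapper like `memoize_fetch(fetch_from_storage)` could presumably be passed in its place, though that combination is not exercised in this notebook.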


### CachedAssetLoader: Persistent Disk Cache

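A plain callable is invoked for every reference, so shared assets are fetched repeatedly (12 fetches for 8 unique assets above). `CachedAssetLoader` caches fetched assets to disk, so each asset is fetched at most once and later runs can be served entirely from the cache: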

In [13]:
import tempfile

# Reset fetch counter
fetch_count[0] = 0

with tempfile.TemporaryDirectory() as tmpdir:
    cache_path = Path(tmpdir) / "asset_cache"
    
    # Create cache handler and loader
    cache_handler = DataHandler.create(cache_path)
    loader = CachedAssetLoader(fetch=fetch_from_storage, cache=cache_handler)
    
    print("=== First run: fetching and caching ===")
    lazy1 = Packable.reconstruct(SimulationCase, extracted.data, loader)
    _ = lazy1.resolve()  # Fetch all assets
    print(f"Assets fetched: {fetch_count[0]}")
    
    # Finalize to persist cache
    cache_handler.finalize()
    
    # Second run with same cache location
    print("\n=== Second run: reading from cache ===")
    fetch_count[0] = 0
    cache_handler2 = DataHandler.create(cache_path)
    loader2 = CachedAssetLoader(fetch=fetch_from_storage, cache=cache_handler2)
    
    lazy2 = Packable.reconstruct(SimulationCase, extracted.data, loader2)
    resolved2 = lazy2.resolve()
    print(f"Assets fetched from remote: {fetch_count[0]} (all served from cache!)")
    print(f"Resolved case: {resolved2.name} with {len(resolved2.snapshots)} snapshots")

=== First run: fetching and caching ===
  Fetching asset 4e71a79c... (fetch #1)
  Fetching asset 28dc719a... (fetch #2)
  Fetching asset 59ffdd6b... (fetch #3)
  Fetching asset 0c345962... (fetch #4)
  Fetching asset 292cfc23... (fetch #5)
  Fetching asset 17b38a2f... (fetch #6)
  Fetching asset 145838c0... (fetch #7)
  Fetching asset ea37b259... (fetch #8)
Assets fetched: 8

=== Second run: reading from cache ===
Assets fetched from remote: 0 (all served from cache!)
Resolved case: heat_transfer_2d with 3 snapshots
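
The behaviour seen in the two runs can be approximated with a read-through cache keyed by checksum. The sketch below is an illustration of that pattern using only the standard library, not how `CachedAssetLoader` or `DataHandler` are implemented; `make_disk_cached_fetch` and the placeholder checksums are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def make_disk_cached_fetch(
    fetch: Callable[[str], bytes], cache_dir: Path
) -> Callable[[str], bytes]:
    """Return a fetcher that reads from cache_dir first and fills it on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)

    def cached(checksum: str) -> bytes:
        path = cache_dir / checksum
        if path.exists():
            return path.read_bytes()  # cache hit: no remote call
        blob = fetch(checksum)
        path.write_bytes(blob)  # cache miss: persist for later runs
        return blob

    return cached


# Stand-in remote storage with placeholder checksums
remote = {"aaaa": b"mesh", "bbbb": b"temperature"}
remote_calls: List[str] = []


def remote_fetch(checksum: str) -> bytes:
    remote_calls.append(checksum)
    return remote[checksum]


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "asset_cache"

    first = make_disk_cached_fetch(remote_fetch, cache_dir)
    first_run = [first(k) for k in remote]
    calls_after_first = len(remote_calls)

    # A fresh fetcher over the same directory simulates a second run
    second = make_disk_cached_fetch(remote_fetch, cache_dir)
    second_run = [second(k) for k in remote]
    calls_after_second = len(remote_calls)

print(f"Remote calls: {calls_after_first} in run 1, "
      f"{calls_after_second - calls_after_first} in run 2")
```

Because each file is named by the checksum of its content, a cached entry never goes stale, so this kind of cache needs no invalidation logic.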


## Summary

`Packable.extract()` is a **static method** that handles:

| Input | Handling |
|-------|----------|
| Top-level Packable | Expands fields, arrays → refs |
| Nested Packable (in dict/list/BaseModel) | Becomes a `{"$ref": checksum}` reference to its encoded bytes |
| NumPy arrays | Becomes a `{"$ref": checksum}` reference to its encoded bytes |
| BaseModel | Preserves structure with `__model_class__` |
| Primitives | Passed through unchanged |

`Packable.reconstruct()` supports three modes:

| AssetProvider | Result | Use Case |
|--------------|--------|----------|
| `Dict[str, bytes]` | `TModel` | Eager loading, all assets in memory |
| `AssetFetcher` | `LazyModel[TModel]` | Lazy per-field loading |
| `CachedAssetLoader` | `LazyModel[TModel]` | Lazy loading with disk cache |

Key benefits for scientific computing:
- **Deduplication**: Shared meshes/arrays stored once
- **Lazy loading**: Load only the fields you need with `LazyModel`
- **Persistent caching**: `CachedAssetLoader` caches fetched assets to disk
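- **JSON metadata**: Case names, parameters, and snapshot times can be queried or indexed without loading any binary asset
- **Version control friendly**: Small JSON metadata files can be tracked and diffed separately from the large binary assets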
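
As an illustration of the JSON metadata point, the sketch below filters snapshots by time and collects only the asset checksums a later step would need to fetch. The metadata is a hand-trimmed copy of the structure printed in section 3, with checksums taken from the outputs above; in practice it would come from serializing `extracted.data`.

```python
import json

# Trimmed metadata: only the keys needed for this query
metadata_json = """
{
  "name": "heat_transfer_2d",
  "snapshots": [
    {"time": 0.0, "iteration": 0,
     "fields": {"temperature": {"data": {"$ref": "28dc719a0c8c1387"}}}},
    {"time": 0.1, "iteration": 100,
     "fields": {"temperature": {"data": {"$ref": "292cfc23f6777b02"}}}},
    {"time": 0.2, "iteration": 200,
     "fields": {"temperature": {"data": {"$ref": "145838c08771e6ef"}}}}
  ]
}
"""

case = json.loads(metadata_json)

# Select snapshots by time using metadata alone; no asset is touched
late = [s for s in case["snapshots"] if s["time"] >= 0.1]

# Checksums of the temperature fields that would need fetching
needed = sorted({s["fields"]["temperature"]["data"]["$ref"] for s in late})

print(f"{len(late)} snapshots match; assets to fetch: {needed}")
```

Only the two listed temperature assets would then need to be requested from storage, leaving the mesh and all other fields untouched.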