Problem
POST /load in buckaroo.server.handlers.LoadHandler requires path: str — a filesystem path the server can read. For host integrations that produce DataFrames in memory (Rust, Node, Electron renderers, etc.) without going through disk, this forces a temp-file dance:
- Serialize bytes to a temp
.parquet somewhere mutually visible
- POST
/load with that path
- Clean up the temp file later
Concrete pain: the buckaroo-tauri integration shape today is
host (Rust)
├─ produce arrow/parquet bytes
├─ write temp file
├─ invoke buckaroo_load_path({ path: tempfile })
└─ (eventually) delete temp file
Filesystem visibility between renderer/Rust/Python is a per-platform headache (Tauri sandboxes, Electron contextIsolation, Docker mounts, etc.). And for hosts that have already serialized to memory there's no reason to round-trip through disk.
Where this came from
xorq-desktop integration spike — initially planned to send Arrow IPC bytes straight to buckaroo. We worked around it by pointing /load at xorq's snapshot-cache parquet directly (which happens to live on disk because of xorq's ParquetSnapshotCache). That's a convenient coincidence — most hosts won't have it.
Proposal
Extend LoadHandler.post to accept one of:
{ "path": "/abs/path.parquet", ... } // existing
{ "arrow_ipc_b64": "<base64>", ... } // new — Arrow IPC stream bytes
{ "parquet_b64": "<base64>", ... } // new — raw Parquet bytes
Validation: exactly one of path / arrow_ipc_b64 / parquet_b64 required. Error if multiple or none.
Server-side dispatch in _load_file_with_error_handling:
if path:
df = load_file(path)
elif arrow_ipc_b64:
df = load_arrow_bytes(base64.b64decode(arrow_ipc_b64))
elif parquet_b64:
df = load_parquet_bytes(base64.b64decode(parquet_b64))
metadata = get_metadata(df, label or "<inline>")
The "filename" / "path" fields in the response can be a synthesized label (passed in or derived) when bytes are used. Everything downstream (session, ws push, browser window) keeps working unchanged.
Why both arrow_ipc and parquet
Arrow IPC is faster to encode for hosts that already hold an Arrow-backed table (no parquet write overhead). Parquet is what most hosts already have on disk and want to keep compatibility with. Supporting both is a tiny dispatch cost and avoids forcing one transport.
Companion plugin change
buckaroo-tauri::buckaroo_load_path would gain arrow_ipc_b64 / parquet_b64 variants in LoadPathArgs. The plugin already does base64 on the binary frame relay (supervisor.rs::258-323), so the encoding/decoding tooling is in scope.
Out of scope
- Streaming inline data. Single-shot encode/decode is fine for the target use cases (< 100k rows ish).
- WebSocket push of inline data. Same —
/load is the right place.
Happy to draft the buckaroo PR + the buckaroo-tauri companion if direction is acceptable.
Problem
POST /loadinbuckaroo.server.handlers.LoadHandlerrequirespath: str— a filesystem path the server can read. For host integrations that produce DataFrames in memory (Rust, Node, Electron renderers, etc.) without going through disk, this forces a temp-file dance:.parquetsomewhere mutually visible/loadwith that pathConcrete pain: the buckaroo-tauri integration shape today is
Filesystem visibility between renderer/Rust/Python is a per-platform headache (Tauri sandboxes, Electron contextIsolation, Docker mounts, etc.). And for hosts that have already serialized to memory there's no reason to round-trip through disk.
Where this came from
xorq-desktopintegration spike — initially planned to send Arrow IPC bytes straight to buckaroo. We worked around it by pointing/loadat xorq's snapshot-cache parquet directly (which happens to live on disk because of xorq'sParquetSnapshotCache). That's a convenient coincidence — most hosts won't have it.Proposal
Extend
LoadHandler.postto accept one of:{ "path": "/abs/path.parquet", ... } // existing { "arrow_ipc_b64": "<base64>", ... } // new — Arrow IPC stream bytes { "parquet_b64": "<base64>", ... } // new — raw Parquet bytesValidation: exactly one of
path/arrow_ipc_b64/parquet_b64required. Error if multiple or none.Server-side dispatch in
_load_file_with_error_handling:The "filename" / "path" fields in the response can be a synthesized label (passed in or derived) when bytes are used. Everything downstream (session, ws push, browser window) keeps working unchanged.
Why both arrow_ipc and parquet
Arrow IPC is faster to encode for hosts that already hold an Arrow-backed table (no parquet write overhead). Parquet is what most hosts already have on disk and want to keep compatibility with. Supporting both is a tiny dispatch cost and avoids forcing one transport.
Companion plugin change
buckaroo-tauri::buckaroo_load_pathwould gainarrow_ipc_b64/parquet_b64variants inLoadPathArgs. The plugin already does base64 on the binary frame relay (supervisor.rs::258-323), so the encoding/decoding tooling is in scope.Out of scope
/loadis the right place.Happy to draft the buckaroo PR + the buckaroo-tauri companion if direction is acceptable.