Self-packaging #4: embed:// DuckDB FileSystem (with Glob + SeekPosition) #44

@jrosskopf

Part of epic #40. Depends on #3 (ArchiveContents shared state).

Goal

Let SQL templates reference bundled files directly:
read_csv('embed://data/cities.csv'). This bypasses IFileProvider
and goes through DuckDB's VirtualFileSystem, so we need a custom
duckdb::FileSystem subclass registered with the DuckDB instance
flapi already owns.

Scope

  • New src/duckdb_embed_fs.{hpp,cpp} subclassing duckdb::FileSystem.
  • Constructor takes the same std::shared_ptr<ArchiveContents> as
    the embedded file provider (#3), so one decompressed map serves
    two readers.
  • Override:
    • OpenFile: return a custom FileHandle over an in-memory blob.
    • Read (both overloads), Seek, GetFileSize.
    • Glob: read_csv() expands paths before opening, and the base
      implementation throws "not implemented" (spike caught this at
      runtime).
    • SeekPosition: same reason (the base implementation throws).
  • CanHandleFile checks for embed:// scheme prefix.
  • Register on the DuckDB instance once at startup
    (src/main.cpp, after the database manager exists) — only when a
    bundle is present.
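
The handle semantics listed above can be sketched without DuckDB at all. The block below is a minimal, standard-library-only model, not the real override: BlobCursor, ReadAt and HasEmbedScheme are hypothetical names, and the class deliberately does not derive from duckdb::FileHandle. It only illustrates the difference between the two Read overloads (positional versus cursor-advancing), what SeekPosition has to report, and the embed:// prefix check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the custom FileHandle: a read-only cursor over
// one decompressed archive entry. It does NOT derive from duckdb::FileHandle.
class BlobCursor {
public:
    // The blob is borrowed; it must outlive the cursor (in the real design
    // the shared ArchiveContents would keep it alive).
    explicit BlobCursor(const std::vector<char> &blob) : blob_(blob) {}

    // Positional read: copies up to nr_bytes starting at `location` and
    // leaves the cursor untouched. Returns the number of bytes copied.
    std::int64_t ReadAt(void *buffer, std::int64_t nr_bytes, std::uint64_t location) const {
        assert(nr_bytes >= 0);
        if (location >= blob_.size()) {
            return 0;  // reading at or past EOF yields nothing
        }
        const std::uint64_t available = blob_.size() - location;
        const std::uint64_t n =
            std::min<std::uint64_t>(available, static_cast<std::uint64_t>(nr_bytes));
        std::memcpy(buffer, blob_.data() + location, n);
        return static_cast<std::int64_t>(n);
    }

    // Streaming read: reads from the current cursor and advances it.
    std::int64_t Read(void *buffer, std::int64_t nr_bytes) {
        const std::int64_t n = ReadAt(buffer, nr_bytes, position_);
        position_ += static_cast<std::uint64_t>(n);
        return n;
    }

    // Seek clamps to the blob size so the cursor never points past EOF.
    void Seek(std::uint64_t location) {
        position_ = std::min<std::uint64_t>(location, blob_.size());
    }
    std::uint64_t SeekPosition() const { return position_; }
    std::int64_t GetFileSize() const { return static_cast<std::int64_t>(blob_.size()); }

private:
    const std::vector<char> &blob_;
    std::uint64_t position_ = 0;
};

// Scheme check in the spirit of CanHandleFile: true only for embed:// paths.
inline bool HasEmbedScheme(const std::string &path) {
    static const std::string kPrefix = "embed://";
    return path.compare(0, kPrefix.size(), kPrefix) == 0;
}
```

In the real subclass these bodies would sit behind the duckdb::FileSystem virtuals, whose exact signatures depend on the DuckDB version flapi builds against.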

Red tests (test/cpp/duckdb_embed_fs_test.cpp)

  1. Proof-of-life (spike behaviour #9):
    • In-process DuckDB instance.
    • Register EmbeddedFileSystem over a fixture archive containing
      data/cities.csv.
    • Execute SELECT * FROM read_csv('embed://data/cities.csv').
    • Assert the expected rows come back.
  2. Glob('embed://data/*.csv') returns the expected entries
    (catches read_csv glob expansion).
  3. Mixed read: a template that joins embed:// against a regular
    parquet path on disk works (no interference with default
    filesystem).
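
For red test 2, the matching logic behind Glob can be exercised on its own. The sketch below assumes the archive is a map from archive-relative path to bytes; ArchiveMap, GlobMatch and GlobEmbedded are hypothetical names, and the matcher is a simplified one ('*' and '?' only, neither crossing '/'), not DuckDB's own glob implementation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for ArchiveContents: archive-relative path -> bytes.
using ArchiveMap = std::map<std::string, std::vector<char>>;

// Simplified recursive wildcard match. '*' matches any run of characters
// within one path segment, '?' matches one character; neither matches '/'.
inline bool GlobMatch(const char *p, const char *t) {
    assert(p != nullptr && t != nullptr);
    if (*p == '\0') {
        return *t == '\0';
    }
    if (*p == '*') {
        // Try the rest of the pattern at every position the star may reach.
        for (const char *s = t;; ++s) {
            if (GlobMatch(p + 1, s)) {
                return true;
            }
            if (*s == '\0' || *s == '/') {
                return false;  // the star cannot consume EOF or a separator
            }
        }
    }
    if (*t == '\0') {
        return false;
    }
    if (*p == '?') {
        return *t != '/' && GlobMatch(p + 1, t + 1);
    }
    return *p == *t && GlobMatch(p + 1, t + 1);
}

// Returns the embed:// paths of all archive entries matching the pattern.
// Patterns without the embed:// scheme match nothing.
inline std::vector<std::string> GlobEmbedded(const ArchiveMap &archive,
                                             const std::string &pattern) {
    static const std::string kPrefix = "embed://";
    std::vector<std::string> matches;
    if (pattern.compare(0, kPrefix.size(), kPrefix) != 0) {
        return matches;
    }
    const std::string relative = pattern.substr(kPrefix.size());
    for (const auto &entry : archive) {
        if (GlobMatch(relative.c_str(), entry.first.c_str())) {
            matches.push_back(kPrefix + entry.first);
        }
    }
    return matches;  // std::map iteration keeps the result sorted by path
}
```

A pattern with no wildcard falls through to an exact match, which is the case read_csv hits for a plain embed://data/cities.csv path.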

Green criteria

  • All red tests pass.
  • DuckDB tests in the existing suite still pass.
  • make integration-test still green.

Files

  • New: src/duckdb_embed_fs.{hpp,cpp}, test/cpp/duckdb_embed_fs_test.cpp
  • Modified: src/main.cpp (register on DB instance when bundle exists)

Notes

  • DuckDBVFSProvider already touches DuckDB's internal FileSystem
    API (src/vfs_adapter.cpp:218-397) and has TODOs about that being
    unstable. The embed FS uses the same surface — we accept the same
    trade-off, and the round-trip integration test (#6) catches drift
    on every CI run.

Labels: enhancement
