feat(self-packaging #43): EmbeddedArchiveFileProvider + factory dispatch + startup wire-up #53

Merged: jrosskopf merged 1 commit into main from feature/gh-43-embedded-provider-factory on May 22, 2026

@jrosskopf
Contributor

Part of epic #40. Stacked on #52 which is stacked on #51 -- once #51 and #52 merge, GitHub will retarget this to main.

Summary

This is the runtime piece that makes a bundled binary actually serve bundled
content. After this PR, the only things missing for a working
filesystem-or-bundle binary are the flapi pack subcommand to create the
bundle in the first place (#45) and the DuckDB-side embed:// reader (#44).

  • New src/include/embedded_archive_file_provider.hpp +
    src/embedded_archive_file_provider.cpp -- IFileProvider over a
    shared ArchiveEntries.
  • FileProviderFactory learns about bundles via a process-wide static
    (SetBundleContents / GetBundleContents /
    CreateEmbeddedProvider).
  • main.cpp adds detectAndRegisterEmbeddedBundle() that runs once
    after set_log_level and before initializeConfig.

Dispatch order (the load-bearing change)

                       ┌── remote scheme? ──> DuckDBVFSProvider  (unchanged)
CreateProvider(path) ──┤
                       └── bundle present? ──> EmbeddedArchiveFileProvider  (new)
                            else            ──> LocalFileProvider          (unchanged)

Note that a bundle never wins over a remote path -- s3:// etc. still flow to
DuckDB even with a bundle set. That's deliberate: bundled deploys still
need to access cloud data sources at query time.
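For reviewers, the dispatch order above can be sketched as a small decision function. This is a minimal illustration, not the code in src/vfs_adapter.cpp: ProviderKind, DemoEntries, BundleSlot, IsRemoteScheme and ChooseProvider are hypothetical names, and both the scheme list and the path-to-bytes map shape are assumptions.

```cpp
#include <array>
#include <map>
#include <memory>
#include <string>
#include <string_view>

// Hypothetical stand-in: the real factory returns provider objects, not an enum.
enum class ProviderKind { DuckDBVFS, Embedded, Local };

// Assumed shape of bundle contents for this sketch: entry path -> raw bytes.
using DemoEntries = std::map<std::string, std::string>;

// Process-wide bundle pointer; a null pointer means filesystem mode.
inline std::shared_ptr<const DemoEntries>& BundleSlot() {
    static std::shared_ptr<const DemoEntries> slot;
    return slot;
}

// Assumed scheme list, for illustration only.
inline bool IsRemoteScheme(std::string_view path) {
    static constexpr std::array<std::string_view, 4> kSchemes = {
        "s3://", "gs://", "https://", "http://"};
    for (auto scheme : kSchemes) {
        if (path.substr(0, scheme.size()) == scheme) return true;
    }
    return false;
}

// Mirrors the order in the diagram: remote first, then bundle, then local.
inline ProviderKind ChooseProvider(std::string_view path) {
    if (IsRemoteScheme(path)) return ProviderKind::DuckDBVFS;  // remote always wins
    if (BundleSlot()) return ProviderKind::Embedded;           // a bundle is registered
    return ProviderKind::Local;                                // filesystem mode
}
```

Checking the remote scheme before the bundle is what keeps s3:// paths on the DuckDB side even when a bundle is registered.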

Why no churn in config_loader / sql_template_processor

The exploration in #40 confirmed both already accept any IFileProvider:

  • ConfigLoader has an overload that takes a custom provider
    (src/include/config_loader.hpp:37-38).
  • SqlTemplateProcessor already routes through
    config_manager->getFileProvider()
    (src/sql_template_processor.cpp:69).

So we change zero call sites. The factory dispatch is the only seam.

Path normalisation

The embedded provider strips, in order:

  1. file:// scheme prefix (parity with LocalFileProvider).
  2. Leading ./ (potentially repeated -- ././foo -> foo).
  3. Leading / (entries are always relative to the bundle root).

So all of flapi.yaml, ./flapi.yaml, file://flapi.yaml, and
/flapi.yaml resolve to the bundle key flapi.yaml.
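The three stripping steps can be sketched as a single function. NormalizeBundlePath is a hypothetical name for illustration and may not match the helper in src/embedded_archive_file_provider.cpp. Dropping repeated leading slashes is an assumption; the description only says "leading /".

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: applies the three stripping steps in the listed order.
inline std::string NormalizeBundlePath(std::string_view path) {
    constexpr std::string_view kFileScheme = "file://";

    // 1. Drop a file:// scheme prefix.
    if (path.substr(0, kFileScheme.size()) == kFileScheme) {
        path.remove_prefix(kFileScheme.size());
    }
    // 2. Drop every leading "./" (so "././foo" becomes "foo").
    while (path.substr(0, 2) == "./") {
        path.remove_prefix(2);
    }
    // 3. Drop leading '/' characters (assumption: repeated slashes go too).
    while (!path.empty() && path.front() == '/') {
        path.remove_prefix(1);
    }
    return std::string(path);
}
```

With this sketch, flapi.yaml, ./flapi.yaml, file://flapi.yaml and /flapi.yaml all map to the key flapi.yaml.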

ListFiles is non-recursive on purpose -- LocalFileProvider uses
directory_iterator, which doesn't descend into subdirectories, and we want
behavioural parity. The test case "ListFiles is non-recursive" enforces this.
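A non-recursive listing over a flat entry map could look roughly like the sketch below. DemoEntries, WildcardMatch and ListDirect are hypothetical names, the (directory, pattern) signature is an assumption rather than the real ListFiles signature, and only the * wildcard is handled here.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Assumed shape of bundle contents for this sketch: entry path -> raw bytes.
using DemoEntries = std::map<std::string, std::string>;

// Matches name against a pattern in which '*' stands for any run of characters.
inline bool WildcardMatch(std::string_view pattern, std::string_view name) {
    std::size_t p = 0, n = 0, mark = 0;
    std::size_t star = std::string_view::npos;
    while (n < name.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;   // remember the star and try matching zero characters
            mark = n;
        } else if (p < pattern.size() && pattern[p] == name[n]) {
            ++p;
            ++n;
        } else if (star != std::string_view::npos) {
            p = star + 1;  // backtrack: let the last star absorb one more character
            n = ++mark;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;
    return p == pattern.size();
}

// Returns keys directly inside dir that match pattern; nested entries are skipped.
inline std::vector<std::string> ListDirect(const DemoEntries& entries,
                                           std::string_view dir,
                                           std::string_view pattern) {
    std::string prefix(dir);
    if (!prefix.empty() && prefix.back() != '/') prefix += '/';

    std::vector<std::string> out;
    for (const auto& kv : entries) {
        const std::string& key = kv.first;
        if (key.compare(0, prefix.size(), prefix) != 0) continue;  // outside dir
        std::string_view rest = std::string_view(key).substr(prefix.size());
        if (rest.find('/') != std::string_view::npos) continue;    // nested entry
        if (WildcardMatch(pattern, rest)) out.push_back(key);
    }
    return out;  // std::map iterates in key order, so the result is sorted
}
```

The slash check on the remainder is the part that gives directory_iterator-like behaviour: an entry such as sqls/sub/b.sql never appears when listing sqls.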

Startup wire-up

void detectAndRegisterEmbeddedBundle() {
    auto loc = LocateBundleInSelf();          // #42
    if (!loc.has_value()) return;             // filesystem mode

    try {
        // Read slice [loc->offset, loc->offset + loc->size) from GetSelfPath()
        // ReadArchive(...)                   // #41
        // FileProviderFactory::SetBundleContents(entries);
    } catch (...) {
        // Log WARNING, fall back to filesystem mode (the spike safety net)
    }
}

Failure modes (unreadable binary, truncated slice, malformed ZIP) all
log at WARNING and fall back to filesystem mode rather than aborting
startup. Spike behaviour #7 (truncate 1 KiB -> filesystem mode) is covered
by the test that PR #52 added for #42, and the read-slice path here honours it.
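The read-slice-or-fall-back behaviour can be illustrated with the standard library alone. ReadSlice and TryLoadBundleBytes are hypothetical names, not the functions in src/main.cpp, and writing the warning to std::cerr stands in for the project's logger.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <fstream>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <string>

// Reads bytes [offset, offset + size) from path; throws if the slice is short.
inline std::string ReadSlice(const std::string& path,
                             std::uint64_t offset,
                             std::uint64_t size) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    in.seekg(static_cast<std::streamoff>(offset), std::ios::beg);
    std::string buf(static_cast<std::size_t>(size), '\0');
    in.read(buf.data(), static_cast<std::streamsize>(size));

    // gcount() reports how many bytes the last read actually delivered.
    if (static_cast<std::uint64_t>(in.gcount()) != size) {
        throw std::runtime_error("truncated bundle slice in " + path);
    }
    return buf;
}

// Returns the slice, or std::nullopt to signal "stay in filesystem mode".
inline std::optional<std::string> TryLoadBundleBytes(const std::string& path,
                                                     std::uint64_t offset,
                                                     std::uint64_t size) {
    try {
        return ReadSlice(path, offset, size);
    } catch (const std::exception& e) {
        // Stand-in for a WARNING log line; startup continues without a bundle.
        std::cerr << "WARNING: bundle not loaded: " << e.what() << '\n';
        return std::nullopt;
    }
}
```

A caller would register bundle contents only when a value comes back, so a missing file or a short read both leave the process in filesystem mode.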

Tests

11 new test cases in test/cpp/embedded_archive_file_provider_test.cpp:

| #  | Test                                            | What it asserts                                  |
|----|-------------------------------------------------|--------------------------------------------------|
| 1  | ReadFile on a known entry                       | bytes back unchanged                             |
| 2  | path normalisation                              | ./flapi.yaml, file://..., ./sqls/... all resolve |
| 3  | throws on missing entry                         | FileOperationError                               |
| 4  | FileExists present/absent                       | true/false                                       |
| 5  | ListFiles with *.yaml, *.sql, *                 | correct results in sorted order                  |
| 6  | ListFiles non-recursive                         | nested sub/b.sql excluded                        |
| 7  | IsRemotePath always false                       | sanity                                           |
| 8  | GetProviderName == "embedded"                   | sanity                                           |
| 9  | factory picks embedded when bundle set          | dispatch                                         |
| 10 | factory falls back to local when no bundle      | dispatch                                         |
| 11 | factory routes s3:// to DuckDB even with bundle | dispatch                                         |

14/14 Test #143: FileProviderFactory routes remote paths to DuckDB even with a bundle_test  Passed
100% tests passed, 0 tests failed out of 14

Plus the 3 existing FileProviderFactory tests still pass unchanged.

Test plan

  • 11 new tests pass; 14 total including existing factory tests.
  • All other Catch2 tests still pass; the only failures are the same 2
    DuckDB AddressSanitizer leaks (QueryExecutor type coverage,
    DuckDBResult RAII) that already occur on main.
  • Smoke: flapi --validate-config -c examples/flapi.yaml loads
    cleanly with no bundle messages (filesystem mode unchanged).
  • CI cross-platform once stacked PRs land.

Closes #43. Part of #40. Stacked on #52, which is stacked on #51.

Part of #40, depends on #41 (archive_io) and #42 (bundle_locator).
This is the runtime piece that makes configs and SQL templates load
from an appended ZIP without any churn in config_loader, template
processor, or endpoint handlers.

- `src/include/embedded_archive_file_provider.hpp` +
  `src/embedded_archive_file_provider.cpp`:
  - implements IFileProvider against
    `std::shared_ptr<const ArchiveEntries>`
  - path normalisation: strips `file://`, leading `./`, leading `/`
  - `ReadFile` / `FileExists` / `ListFiles` (non-recursive glob,
    parity with LocalFileProvider's directory_iterator semantics)
  - GetProviderName: "embedded"

- `src/vfs_adapter.{hpp,cpp}`:
  - extends FileProviderFactory with a process-wide bundle pointer
    (`SetBundleContents` / `GetBundleContents` /
    `CreateEmbeddedProvider`)
  - new dispatch order in `CreateProvider`:
      1. remote scheme  -> DuckDBVFSProvider (unchanged)
      2. bundle present -> EmbeddedArchiveFileProvider
      3. otherwise      -> LocalFileProvider (unchanged)
  - exposes `using ArchiveEntries = std::map<...>` in the header so
    the factory can refer to bundle contents without dragging in
    archive_io.hpp

- `src/main.cpp`: `detectAndRegisterEmbeddedBundle()` runs after
  `set_log_level`, before `initializeConfig`. Calls
  `LocateBundleInSelf()`, reads the slice from `GetSelfPath()`,
  decompresses with `ReadArchive`, and hands the entries to the
  factory. Any failure -- unreadable self-binary, truncated read,
  bad ZIP -- is logged at WARNING, and startup falls back to
  filesystem mode (the spike safety net behaviour, #42 test #7).

Tests (`test/cpp/embedded_archive_file_provider_test.cpp`, 11 cases):
- ReadFile returns bundled bytes
- path normalisation: `./`, `file://`, plain
- throws on missing entry
- FileExists present/absent
- ListFiles with glob (`*.yaml`, `*.sql`, `*`)
- ListFiles is non-recursive
- IsRemotePath always false
- GetProviderName == "embedded"
- factory picks embedded when bundle set
- factory falls back to local when no bundle set
- factory still routes s3:// to DuckDB when bundle set

Verified:
- 14 / 14 embedded-provider + factory tests pass.
- Filesystem-mode smoke test (`flapi --validate-config -c
  examples/flapi.yaml`) loads cleanly with no bundle messages --
  unchanged behaviour for existing operators.
- Same 2 pre-existing DuckDB AddressSanitizer leaks on `main`
  (`QueryExecutor type coverage`, `DuckDBResult RAII`) remain.

Closes #43.
@jrosskopf jrosskopf force-pushed the feature/gh-43-embedded-provider-factory branch from 3e17d55 to b2be88e Compare May 22, 2026 12:48
@jrosskopf jrosskopf marked this pull request as ready for review May 22, 2026 13:38
@jrosskopf jrosskopf merged commit 1b3a599 into main May 22, 2026
17 checks passed
@jrosskopf jrosskopf deleted the feature/gh-43-embedded-provider-factory branch May 22, 2026 13:38
Development

Successfully merging this pull request may close these issues.

Self-packaging #3: EmbeddedArchiveFileProvider + FileProviderFactory dispatch