feat(self-packaging #43): EmbeddedArchiveFileProvider + factory dispatch + startup wire-up #53
Merged
This was referenced May 22, 2026
Part of #40, depends on #41 (archive_io) and #42 (bundle_locator). This is the runtime piece that makes configs and SQL templates load from an appended ZIP without any churn in config_loader, template processor, or endpoint handlers.

- `src/include/embedded_archive_file_provider.hpp` + `src/embedded_archive_file_provider.cpp`:
  - implements IFileProvider against `std::shared_ptr<const ArchiveEntries>`
  - path normalisation: strips `file://`, leading `./`, leading `/`
  - `ReadFile` / `FileExists` / `ListFiles` (non-recursive glob, parity with LocalFileProvider's directory_iterator semantics)
  - GetProviderName: "embedded"
- `src/vfs_adapter.{hpp,cpp}`:
  - extends FileProviderFactory with a process-wide bundle pointer (`SetBundleContents` / `GetBundleContents` / `CreateEmbeddedProvider`)
  - new dispatch order in `CreateProvider`:
    1. remote scheme -> DuckDBVFSProvider (unchanged)
    2. bundle present -> EmbeddedArchiveFileProvider
    3. otherwise -> LocalFileProvider (unchanged)
  - exposes `using ArchiveEntries = std::map<...>` in the header so the factory can refer to bundle contents without dragging in archive_io.hpp
- `src/main.cpp`: `detectAndRegisterEmbeddedBundle()` runs after `set_log_level`, before `initializeConfig`. Calls `LocateBundleInSelf()`, reads the slice from `GetSelfPath()`, decompresses with `ReadArchive`, and hands the entries to the factory. Any failure -- unreadable self-binary, truncated read, bad ZIP -- is logged at WARNING and silently falls back to filesystem mode (the spike safety net behaviour, #42 test #7).

Tests (`test/cpp/embedded_archive_file_provider_test.cpp`, 11 cases):
- ReadFile returns bundled bytes
- path normalisation: `./`, `file://`, plain
- throws on missing entry
- FileExists present/absent
- ListFiles with glob (`*.yaml`, `*.sql`, `*`)
- ListFiles is non-recursive
- IsRemotePath always false
- GetProviderName == "embedded"
- factory picks embedded when bundle set
- factory falls back to local when no bundle set
- factory still routes s3:// to DuckDB when bundle set

Verified:
- 14 / 14 embedded-provider + factory tests pass.
- Filesystem-mode smoke test (`flapi --validate-config -c examples/flapi.yaml`) loads cleanly with no bundle messages -- unchanged behaviour for existing operators.
- Same 2 pre-existing DuckDB AddressSanitizer leaks on `main` (`QueryExecutor type coverage`, `DuckDBResult RAII`) remain.

Closes #43.
Part of epic #40. Stacked on #52 which is stacked on #51 -- once #51 and #52 merge, GitHub will retarget this to main.
Summary
This is the runtime piece that makes a bundled binary actually serve bundled content. After this PR, the only things missing for a working filesystem-or-bundle binary are the `flapi pack` subcommand to create the bundle in the first place (#45) and the DuckDB-side `embed://` reader (#44).

- `src/include/embedded_archive_file_provider.hpp` + `src/embedded_archive_file_provider.cpp` -- IFileProvider over a shared `ArchiveEntries`.
- `FileProviderFactory` learns about bundles via a process-wide static (`SetBundleContents` / `GetBundleContents` / `CreateEmbeddedProvider`).
- `main.cpp` adds `detectAndRegisterEmbeddedBundle()` that runs once after `set_log_level` and before `initializeConfig`.

Dispatch order (the load-bearing change)
1. remote scheme -> DuckDBVFSProvider (unchanged)
2. bundle present -> EmbeddedArchiveFileProvider
3. otherwise -> LocalFileProvider (unchanged)

Note that bundle never wins over remote -- `s3://` etc still flow to DuckDB even with a bundle set. That's deliberate: bundled deploys still need to access cloud data sources at query time.
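
The precedence above can be sketched as a pure decision function. This is only an illustration, not the code in `src/vfs_adapter.cpp`: `ProviderKind`, `LooksRemote`, `ChooseProvider`, the scheme list, and the `std::map<std::string, std::string>` value type for `ArchiveEntries` are all assumptions made for the example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Assumed shape of the bundle contents (the PR only shows std::map<...>).
using ArchiveEntries = std::map<std::string, std::string>;

// Hypothetical tag for the three providers named in the dispatch order.
enum class ProviderKind { DuckDBVFS, Embedded, Local };

// Hypothetical remote check; the real scheme list is not given in this PR.
inline bool LooksRemote(const std::string& path) {
    static const char* const kSchemes[] = {"s3://", "gs://", "https://", "http://"};
    for (const char* scheme : kSchemes) {
        if (path.rfind(scheme, 0) == 0) return true;  // path starts with scheme
    }
    return false;
}

inline ProviderKind ChooseProvider(const std::string& path,
                                   const std::shared_ptr<const ArchiveEntries>& bundle) {
    if (LooksRemote(path)) return ProviderKind::DuckDBVFS;  // 1. remote always wins
    if (bundle) return ProviderKind::Embedded;              // 2. bundle present
    return ProviderKind::Local;                             // 3. otherwise
}
```

Checking the remote scheme first is what keeps `s3://` paths on the DuckDB route even when a bundle pointer is set.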
Why no churn in `config_loader` / `sql_template_processor`
The exploration in #40 confirmed both already accept any `IFileProvider`:
- `ConfigLoader` has an overload that takes a custom provider (`src/include/config_loader.hpp:37-38`).
- `SqlTemplateProcessor` already routes through `config_manager->getFileProvider()` (`src/sql_template_processor.cpp:69`).

So we change zero call sites. The factory dispatch is the only seam.
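
To illustrate why no call site needs to change, here is a minimal sketch of code written against a provider interface. The two-method `IFileProvider` below is a trimmed-down assumption (the real interface has more members), and `InMemoryProvider`, `LoadTemplate`, and the use of `std::runtime_error` in place of the project's `FileOperationError` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Assumed, reduced interface for illustration only.
class IFileProvider {
public:
    virtual ~IFileProvider() = default;
    virtual std::string ReadFile(const std::string& path) = 0;
    virtual bool FileExists(const std::string& path) = 0;
};

using ArchiveEntries = std::map<std::string, std::string>;  // assumed value type

// Hypothetical provider backed by shared, read-only bundle entries.
class InMemoryProvider : public IFileProvider {
public:
    explicit InMemoryProvider(std::shared_ptr<const ArchiveEntries> entries)
        : entries_(std::move(entries)) {}

    std::string ReadFile(const std::string& path) override {
        auto it = entries_->find(path);
        if (it == entries_->end()) throw std::runtime_error("not in bundle: " + path);
        return it->second;
    }

    bool FileExists(const std::string& path) override {
        return entries_->count(path) > 0;
    }

private:
    std::shared_ptr<const ArchiveEntries> entries_;
};

// A call site that only sees the interface cannot tell where bytes come from.
inline std::string LoadTemplate(IFileProvider& provider, const std::string& path) {
    return provider.FileExists(path) ? provider.ReadFile(path) : std::string();
}
```

Because `LoadTemplate` depends only on the interface, swapping the provider behind it requires no edits there.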
Path normalisation
The embedded provider strips, in order:
1. `file://` scheme prefix (parity with LocalFileProvider).
2. Leading `./` (potentially repeated -- `././foo` -> `foo`).
3. Leading `/` (entries are always relative to the bundle root).

So all of `flapi.yaml`, `./flapi.yaml`, `file://flapi.yaml`, and `/flapi.yaml` resolve to the bundle key `flapi.yaml`.

`ListFiles` is non-recursive on purpose -- `LocalFileProvider` uses `directory_iterator` which doesn't descend, and we want behavioural parity. Test case `ListFiles is non-recursive` enforces this.

Startup wire-up
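
A minimal sketch of the detect-then-fall-back flow, under stated assumptions: `BundleSlice`, `BundleHooks`, and `DetectBundle` are hypothetical names, and the three hooks merely stand in for `LocateBundleInSelf()`, the read of the slice from the self-binary, and `ReadArchive`, whose real signatures are not shown in this description.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

using ArchiveEntries = std::map<std::string, std::string>;  // assumed value type

// Hypothetical description of where the appended ZIP sits in the binary.
struct BundleSlice {
    std::size_t offset = 0;
    std::size_t length = 0;
};

// Hypothetical injection points so the flow can be exercised without a real binary.
struct BundleHooks {
    std::function<bool(BundleSlice&)> locate;                    // false: no bundle appended
    std::function<std::string(const BundleSlice&)> read_slice;   // may throw
    std::function<ArchiveEntries(const std::string&)> read_zip;  // may throw
};

// Returns the bundle contents, or nullptr meaning "stay in filesystem mode".
inline std::shared_ptr<const ArchiveEntries> DetectBundle(const BundleHooks& hooks) {
    try {
        BundleSlice slice;
        if (!hooks.locate(slice)) return nullptr;  // plain binary: no message at all
        const std::string bytes = hooks.read_slice(slice);
        if (bytes.size() != slice.length) {
            throw std::runtime_error("truncated bundle read");
        }
        return std::make_shared<const ArchiveEntries>(hooks.read_zip(bytes));
    } catch (const std::exception& e) {
        // Any failure is reported as a warning and startup continues without a bundle.
        std::cerr << "WARNING: embedded bundle ignored: " << e.what() << '\n';
        return nullptr;
    }
}
```

In this sketch a non-null result would be handed to the factory (via something like `SetBundleContents`), while a null result leaves the factory untouched.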
Failure modes (unreadable binary, truncated slice, malformed ZIP) all log at WARNING and fall back to filesystem mode. Spike behaviour #7 (truncate 1 KiB -> filesystem mode) is the test that PR #52 (for #42) added, and the read-slice path here honours it.
Tests
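11 new test cases in `test/cpp/embedded_archive_file_provider_test.cpp`:
- `ReadFile` on a known entry
- `./flapi.yaml`, `file://...`, `./sqls/...` all resolve
- Missing entry throws `FileOperationError`
- `FileExists` present/absent
- `ListFiles` with `*.yaml`, `*.sql`, `*`
- `ListFiles` non-recursive (`sub/b.sql` excluded)
- `IsRemotePath` always false
- `GetProviderName == "embedded"`
- Factory picks embedded when a bundle is set
- Factory falls back to local when no bundle is set
- Factory still routes `s3://` to DuckDB even with bundle

Plus the 3 existing FileProviderFactory tests still pass unchanged.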
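
The normalisation and non-recursive listing behaviour these cases exercise can be sketched as below. This is an illustration rather than the provider's actual code: `NormalisePath`, `GlobMatch`, `ListDirect`, the `(directory, pattern)` signature, returning full bundle keys, and stripping only a single leading `/` are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using ArchiveEntries = std::map<std::string, std::string>;  // assumed value type

// Strip "file://", then any number of leading "./", then one leading "/".
inline std::string NormalisePath(std::string path) {
    const std::string scheme = "file://";
    if (path.compare(0, scheme.size(), scheme) == 0) path.erase(0, scheme.size());
    while (path.compare(0, 2, "./") == 0) path.erase(0, 2);
    if (!path.empty() && path.front() == '/') path.erase(0, 1);
    return path;
}

// Tiny matcher supporting only '*' (any run of characters, possibly empty).
inline bool GlobMatch(const std::string& pattern, const std::string& name,
                      std::size_t p = 0, std::size_t n = 0) {
    if (p == pattern.size()) return n == name.size();
    if (pattern[p] == '*') {
        for (std::size_t skip = n; skip <= name.size(); ++skip) {
            if (GlobMatch(pattern, name, p + 1, skip)) return true;
        }
        return false;
    }
    return n < name.size() && pattern[p] == name[n] &&
           GlobMatch(pattern, name, p + 1, n + 1);
}

// List entries sitting directly in `dir`; anything in a subdirectory is skipped.
inline std::vector<std::string> ListDirect(const ArchiveEntries& entries,
                                           const std::string& dir,
                                           const std::string& pattern) {
    std::string prefix = NormalisePath(dir);
    if (!prefix.empty() && prefix.back() != '/') prefix += '/';
    std::vector<std::string> out;
    for (const auto& kv : entries) {
        const std::string& key = kv.first;
        if (key.compare(0, prefix.size(), prefix) != 0) continue;  // other directory
        const std::string rest = key.substr(prefix.size());
        if (rest.find('/') != std::string::npos) continue;          // nested: do not descend
        if (GlobMatch(pattern, rest)) out.push_back(key);
    }
    return out;
}
```

With entries `flapi.yaml`, `a.sql`, and `sub/b.sql`, listing the bundle root with `*.sql` yields only `a.sql` in this sketch, matching the "`sub/b.sql` excluded" case.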
Test plan
- Same 2 pre-existing DuckDB AddressSanitizer leaks (`QueryExecutor type coverage`, `DuckDBResult RAII`) on `main`.
- `flapi --validate-config -c examples/flapi.yaml` loads cleanly with no bundle messages (filesystem mode unchanged).
Closes #43. Part of #40. Stacked on #52, which is stacked on #51.