feat(self-packaging #41): libarchive + archive_io RAII wrapper #51
Merged
Foundation for self-packaging (#40). Introduces a thin RAII wrapper around libarchive's C API for reading and writing ZIP archives entirely in memory.

- New `src/include/archive_io.hpp` + `src/archive_io.cpp`:
  - `WriteArchive(entries, options)` and `ReadArchive(buffer)`
  - Entries iterated in `std::map` sort order for deterministic output
  - `archive_write_set_bytes_in_last_block(a, 1)` so the output has no trailing tar-block padding (a spike-caught EOCD-scan requirement)
  - mtime stamped from `source_date_epoch` for reproducible builds
  - RAII scope guards around `archive*`, `archive_entry*`
- `vcpkg.json`: add `libarchive`
- `CMakeLists.txt`: `find_package(LibArchive)`, add `archive_io.cpp` to `flapi-lib`, link `LibArchive::LibArchive`

Tests (`test/cpp/archive_io_test.cpp`, 5 cases / 14 assertions):
- round-trip preserves entry contents
- reproducible output across runs with fixed SOURCE_DATE_EPOCH
- deterministic (sorted) wire order regardless of insertion order
- malformed input (empty / garbage / truncated) throws `ArchiveIOError`
- EOCD sits exactly 22 bytes from EOF (no tar-block padding)

Closes #41
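One common way to build RAII scope guards around C handles such as `archive*` and `archive_entry*` is `std::unique_ptr` with a custom deleter. The PR does not say whether it uses that or a hand-rolled scope guard, so treat this as an illustrative sketch, not the PR's code. To stay compilable without libarchive, `FakeArchive`, `fake_archive_new()` and `fake_archive_free()` are hypothetical stand-ins for `struct archive`, `archive_write_new()` and `archive_write_free()`.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for libarchive's opaque `struct archive`.
struct FakeArchive {
    int id;
};

// Number of handles currently allocated; lets callers check for leaks.
static int g_live_handles = 0;

// Hypothetical stand-in for archive_write_new().
FakeArchive* fake_archive_new() {
    ++g_live_handles;
    return new FakeArchive{42};
}

// Hypothetical stand-in for archive_write_free(), which also returns an int status.
int fake_archive_free(FakeArchive* a) {
    assert(a != nullptr);  // unique_ptr never invokes its deleter on nullptr
    --g_live_handles;
    delete a;
    return 0;
}

// Deleter that forwards to the C-style free function.
struct FakeArchiveDeleter {
    void operator()(FakeArchive* a) const noexcept { fake_archive_free(a); }
};

using ArchivePtr = std::unique_ptr<FakeArchive, FakeArchiveDeleter>;

// A write path that fails midway; the guard still releases the handle.
void WriteThatThrows() {
    ArchivePtr a(fake_archive_new());
    throw std::runtime_error("simulated write error");
}

// Runs the failing path and reports how many handles are still alive.
int LiveHandlesAfterFailure() {
    try {
        WriteThatThrows();
    } catch (const std::runtime_error&) {
        // Expected; only the cleanup matters here.
    }
    return g_live_handles;
}
```

With the real library, the same shape applies to `archive_write_free()` for the archive and `archive_entry_free()` for entries: each handle is released on every exit path, including when an error such as `ArchiveIOError` propagates.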
This was referenced May 21, 2026
This was referenced May 22, 2026
Part of epic #40. First sub-issue of the self-packaging work.
Summary
- `src/include/archive_io.hpp` + `src/archive_io.cpp`: thin RAII C++ wrapper over libarchive's C API for ZIP read/write in memory.
- `test/cpp/archive_io_test.cpp`: 5 cases, 14 assertions.

Why
This is the foundation every later self-packaging sub-issue
(#42–#50) builds on. Bundle locator (#42) needs a deterministic ZIP
writer to produce test fixtures; embedded file provider (#43) needs
the reader to decompress entries at startup; pack (#45) needs both.
Doing it as its own PR keeps each layer reviewable in isolation.
What's inside the wrapper
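- `WriteArchive(entries, opts)`: `ArchiveEntries` (= `std::map<path, bytes>`) → `std::vector<uint8_t>` ZIP.
- `ReadArchive(buffer)`: `std::vector<uint8_t>` → `ArchiveEntries`.
- `ArchiveWriteOptions::source_date_epoch`: the mtime stamped on entries, for reproducible builds.
- `ArchiveIOError`: thrown on malformed input (empty, garbage, or truncated buffers).

Two design choices worth flagging: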
- `archive_write_set_bytes_in_last_block(a, 1)`: the spike caught that libarchive's default 10240-byte tar-block padding pushes the EOCD record away from EOF and breaks the reverse-scan locator that #42 (selfpath + bundle_locator, cross-platform EOCD scan) will use. This wrapper bakes the fix in so callers can't accidentally re-introduce the padding.
- `std::map` for entries: deterministic iteration order, which is half of reproducible builds. The other half is the `source_date_epoch` mtime stamp. Same input + same epoch ⇒ byte-identical output (tested).
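Both halves of reproducibility can be demonstrated without libarchive. The toy serializer below is a hypothetical sketch, not a ZIP writer and not the PR's `WriteArchive`; `ToyEntries`, `ToyWriteOptions` and `ToyWrite` are illustrative names. It shows that a `std::map` produces the same bytes for any insertion order, and that stamping a caller-supplied epoch instead of the wall clock is what makes two runs comparable.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy, NOT a ZIP writer: the names only echo ArchiveEntries /
// ArchiveWriteOptions to show where nondeterminism would creep in.
using Bytes = std::vector<std::uint8_t>;
using ToyEntries = std::map<std::string, Bytes>;  // iterates in sorted key order

struct ToyWriteOptions {
    std::int64_t source_date_epoch = 0;  // seconds since 1970-01-01 UTC
};

// Little-endian encoding keeps output independent of the host's byte order.
void AppendU64(Bytes& out, std::uint64_t v) {
    for (int i = 0; i < 8; ++i) {
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    }
}

// Serializes name, mtime and data for each entry, in map (sorted) order.
Bytes ToyWrite(const ToyEntries& entries, const ToyWriteOptions& opts) {
    Bytes out;
    for (const auto& entry : entries) {
        const std::string& name = entry.first;
        const Bytes& data = entry.second;
        AppendU64(out, name.size());
        out.insert(out.end(), name.begin(), name.end());
        // The caller's epoch, never the wall clock, so reruns match byte for byte.
        AppendU64(out, static_cast<std::uint64_t>(opts.source_date_epoch));
        AppendU64(out, data.size());
        out.insert(out.end(), data.begin(), data.end());
    }
    return out;
}
```

In a libarchive-based writer the epoch would presumably be applied per entry with `archive_entry_set_mtime(entry, seconds, nanoseconds)`; the sorted iteration comes for free from `std::map`.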
Tests (red-then-green TDD)
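- Round trip: `Read(Write(x)) == x`
- Reproducible output across runs with a fixed `source_date_epoch`
- Deterministic (sorted) wire order regardless of insertion order
- Malformed input (empty / garbage / truncated) throws `ArchiveIOError`
- EOCD sits exactly 22 bytes from EOF (no tar-block padding)

Test plan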
- `make test` builds and passes the new tests on Linux x86_64.
- … (`QueryExecutor` type coverage, `DuckDBResult` RAII) are AddressSanitizer leaks in DuckDB internals introduced by 96806ac on main, unrelated to this PR.
- … on first configure; existing builds will need to re-configure after merge.
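The EOCD-at-EOF property (the reason for `archive_write_set_bytes_in_last_block(a, 1)`) is easy to check in plain C++. Below is a minimal sketch of a strict reverse scan, assuming the ZIP layout from PKWARE's APPNOTE: a 22-byte End Of Central Directory record starting with `PK\x05\x06`, whose last two bytes give the length of an optional trailing comment. `FindEocd` and `ToyArchiveError` are hypothetical names, not the API of this PR or of #42.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical error type standing in for the PR's ArchiveIOError.
struct ToyArchiveError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

constexpr std::size_t kEocdSize = 22;            // fixed part of the EOCD record
constexpr std::size_t kMaxCommentSize = 0xFFFF;  // comment length is a 16-bit field

// Scans backwards from EOF for the End Of Central Directory record and returns
// its offset. Only accepts a record whose comment length covers every trailing
// byte, so padding after the EOCD makes the scan fail.
std::size_t FindEocd(const std::vector<std::uint8_t>& buf) {
    if (buf.size() < kEocdSize) {
        throw ToyArchiveError("buffer too small to hold an EOCD record");
    }
    const std::size_t last = buf.size() - kEocdSize;
    const std::size_t first = last > kMaxCommentSize ? last - kMaxCommentSize : 0;
    for (std::size_t i = last + 1; i-- > first;) {
        const bool signature = buf[i] == 'P' && buf[i + 1] == 'K' &&
                               buf[i + 2] == 0x05 && buf[i + 3] == 0x06;
        if (!signature) continue;
        // Bytes 20-21 of the record: little-endian comment length.
        const std::size_t comment_len =
            static_cast<std::size_t>(buf[i + 20]) |
            (static_cast<std::size_t>(buf[i + 21]) << 8);
        if (i + kEocdSize + comment_len == buf.size()) return i;
    }
    throw ToyArchiveError("no EOCD record found");
}
```

For a comment-less archive the returned offset is exactly `size - 22`. Any trailing padding leaves bytes the comment length cannot account for, so this strict scan throws instead of locating the bundle.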
Files
- `vcpkg.json`: + `libarchive`
- `CMakeLists.txt`: `find_package(LibArchive)`, link `LibArchive::LibArchive`, add `archive_io.cpp` to `flapi-lib`
- `src/include/archive_io.hpp`: public API
- `src/archive_io.cpp`: implementation
- `test/cpp/archive_io_test.cpp` + `test/cpp/CMakeLists.txt`: tests

Closes #41. Part of #40.