feat(self-packaging #41): libarchive + archive_io RAII wrapper #51

Merged: jrosskopf merged 1 commit into main from feature/gh-41-libarchive-archive-io on May 22, 2026

@jrosskopf

Part of epic #40. First sub-issue of the self-packaging work.

Summary

  • Adds libarchive to vcpkg.json (vcpkg pulls it on next configure).
  • New src/include/archive_io.hpp + src/archive_io.cpp: thin RAII
    C++ wrapper over libarchive's C API for ZIP read/write in memory.
  • Tests in test/cpp/archive_io_test.cpp: 5 cases, 14 assertions.
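
The "thin RAII wrapper over a C API" idea can be sketched with `std::unique_ptr` and a custom deleter. This is only an illustration of the pattern, not the code in `archive_io.cpp`: `demo_handle`, `demo_open`, `demo_close`, and `CDeleter` are hypothetical stand-ins so the example compiles without libarchive. With libarchive, `archive_write_free` or `archive_read_free` (both take a `struct archive*` and return `int`) would fit the same deleter shape.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for an opaque C handle such as `struct archive`.
struct demo_handle { int id; };

static int g_live_handles = 0;  // demo-only counter of handles not yet closed

// Hypothetical C-style allocate/free pair.
demo_handle* demo_open() { ++g_live_handles; return new demo_handle{1}; }
int demo_close(demo_handle* h) { delete h; --g_live_handles; return 0; }

// Deleter that forwards to a C-style free function returning int.
template <typename T, int (*FreeFn)(T*)>
struct CDeleter {
    void operator()(T* p) const noexcept { FreeFn(p); }
};

using DemoHandlePtr = std::unique_ptr<demo_handle, CDeleter<demo_handle, demo_close>>;

// Returns how many handles are live while the guard is in scope.
int LiveHandlesInsideScope() {
    DemoHandlePtr h(demo_open());
    return g_live_handles;
}  // h goes out of scope here and demo_close runs

// Returns true if the handle was released even though the scope exited by throwing.
bool HandleFreedAfterException() {
    try {
        DemoHandlePtr h(demo_open());
        throw std::runtime_error("simulated failure");
    } catch (const std::runtime_error&) {
        // demo_close already ran during stack unwinding
    }
    return g_live_handles == 0;
}
```

The point of the guard is that every exit path, including the exception path that an error type such as `ArchiveIOError` implies, releases the handle without explicit cleanup code.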

Why

This is the foundation every later self-packaging sub-issue
(#42 through #50) builds on. Bundle locator (#42) needs a deterministic ZIP
writer to produce test fixtures; embedded file provider (#43) needs
the reader to decompress entries at startup; pack (#45) needs both.

Doing it as its own PR keeps each layer reviewable in isolation.

What's inside the wrapper

| Function | Purpose |
|---|---|
| WriteArchive(entries, opts) | ArchiveEntries (= std::map<path, bytes>) → std::vector<uint8_t> ZIP. |
| ReadArchive(buffer) | Reverse: std::vector<uint8_t> → ArchiveEntries. |
| ArchiveWriteOptions::source_date_epoch | mtime stamp for reproducible builds. |
| ArchiveIOError | Thrown on libarchive failure (truncated / malformed input). |
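
For orientation, the public surface described in the table might look roughly like the header sketch below. The names come from the table; everything else is an assumption: the key type (`std::string` here, the real header may use a different path type), the type and default of `source_date_epoch`, the base class of `ArchiveIOError`, and the exact parameter lists.

```cpp
#include <cstdint>
#include <ctime>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Assumption: entry paths are plain strings mapped to raw bytes.
using ArchiveEntries = std::map<std::string, std::vector<std::uint8_t>>;

struct ArchiveWriteOptions {
    // Assumed type and default; the PR only names the field and its purpose
    // (the mtime stamped on every entry).
    std::time_t source_date_epoch = 0;
};

// Assumption: derives from std::runtime_error so callers can catch it generically.
class ArchiveIOError : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Declarations only; the definitions live in src/archive_io.cpp.
std::vector<std::uint8_t> WriteArchive(const ArchiveEntries& entries,
                                       const ArchiveWriteOptions& opts = {});
ArchiveEntries ReadArchive(const std::vector<std::uint8_t>& buffer);
```

Under these assumptions a round trip would read as `ReadArchive(WriteArchive(entries, opts)) == entries`, with `ArchiveIOError` thrown for empty, garbage, or truncated buffers.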

Two design choices worth flagging:

  1. archive_write_set_bytes_in_last_block(a, 1) — the spike
    caught that libarchive's default 10240-byte tar-block padding
    pushes the EOCD record away from EOF and breaks the reverse-scan
    locator (which #42, the selfpath + bundle_locator sub-issue, will use). This wrapper bakes the fix in so callers
    can't accidentally re-introduce the padding.

  2. std::map for entries — deterministic iteration order, which
    is half of reproducible builds. The other half is the
    source_date_epoch mtime stamp. Same input + same epoch ⇒
    byte-identical output (tested).
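
To make the first point concrete, here is a minimal sketch of a strict reverse-scan EOCD locator. It is not the locator planned for #42, whose design this PR does not describe; `FindEocdOffset`, `MakeEmptyZip`, and `kNotFound` are hypothetical names. The sketch accepts a candidate End of Central Directory record only if its comment-length field accounts for every byte up to EOF, which is one way trailing padding can make an otherwise valid archive unlocatable.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kEocdMinSize = 22;  // fixed-size part of the EOCD record
constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

// Scans backward from EOF for the EOCD signature (bytes 50 4B 05 06).
// Returns the record's offset, or kNotFound if no consistent record exists.
std::size_t FindEocdOffset(const std::vector<std::uint8_t>& buf) {
    if (buf.size() < kEocdMinSize) return kNotFound;
    const std::size_t last = buf.size() - kEocdMinSize;
    // The ZIP comment is at most 65535 bytes, so the scan can stop there.
    const std::size_t first = last > 0xFFFF ? last - 0xFFFF : 0;
    for (std::size_t pos = last + 1; pos-- > first;) {
        if (buf[pos] == 0x50 && buf[pos + 1] == 0x4B &&
            buf[pos + 2] == 0x05 && buf[pos + 3] == 0x06) {
            // Little-endian comment length at offset 20 of the record.
            const std::size_t comment_len =
                static_cast<std::size_t>(buf[pos + 20]) |
                (static_cast<std::size_t>(buf[pos + 21]) << 8);
            // Strict check: record plus comment must end exactly at EOF.
            if (pos + kEocdMinSize + comment_len == buf.size()) return pos;
        }
    }
    return kNotFound;
}

// Smallest valid ZIP: an EOCD record describing zero entries.
std::vector<std::uint8_t> MakeEmptyZip() {
    std::vector<std::uint8_t> zip(kEocdMinSize, 0);
    zip[0] = 0x50; zip[1] = 0x4B; zip[2] = 0x05; zip[3] = 0x06;
    return zip;
}
```

With this sketch, an archive appended to arbitrary leading bytes (as in a self-packaged executable) is still found, while the same archive followed by zero padding is rejected because the EOCD no longer ends at EOF.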

Tests (red-then-green TDD)

| # | Test | What it asserts |
|---|---|---|
| 1 | round-trip preserves entry contents | Read(Write(x)) == x |
| 2 | reproducible with fixed source_date_epoch | Two writes with same input + epoch → identical bytes |
| 3 | deterministic sort order | Two maps with same logical contents, different insertion order → identical bytes |
| 4 | rejects malformed input | empty buffer / random garbage / truncated zip → ArchiveIOError |
| 5 | no trailing tar-block padding | EOCD signature sits exactly 22 bytes from EOF (no padding past it) |

$ ctest -V -R "archive_io"
100% tests passed, 0 tests failed out of 5
Total Test time (real) =   3.75 sec
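
The property behind test 3 comes from `std::map` itself and can be shown without libarchive. In the sketch below, `FlattenInIterationOrder` is a hypothetical toy serializer standing in for the real writer, and `DemoEntries` assumes string keys; it emits entries in the map's iteration order, so two maps with the same contents produce identical bytes no matter how they were populated.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Assumption: string paths mapped to raw bytes, mirroring ArchiveEntries.
using DemoEntries = std::map<std::string, std::vector<std::uint8_t>>;

// Toy stand-in for a writer: name, NUL separator, then payload, per entry.
// This is not a ZIP encoder; it only exposes the iteration order.
std::vector<std::uint8_t> FlattenInIterationOrder(const DemoEntries& entries) {
    std::vector<std::uint8_t> out;
    for (const auto& kv : entries) {
        out.insert(out.end(), kv.first.begin(), kv.first.end());
        out.push_back(0);
        out.insert(out.end(), kv.second.begin(), kv.second.end());
    }
    return out;
}

// Builds the same logical contents with two different insertion orders
// and reports whether the flattened bytes match.
bool SameBytesRegardlessOfInsertionOrder() {
    DemoEntries forward;
    forward["a.sql"] = {1, 2};
    forward["b.yaml"] = {3};

    DemoEntries backward;
    backward["b.yaml"] = {3};
    backward["a.sql"] = {1, 2};

    return FlattenInIterationOrder(forward) == FlattenInIterationOrder(backward);
}
```

An unordered container would not give this guarantee, which is why the entry type matters for byte-identical output.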

Test plan

  • make test builds and passes the new tests on Linux x86_64.
  • Existing 65 Catch2 tests run; two pre-existing failures
    (QueryExecutor type coverage, DuckDBResult RAII) are
    AddressSanitizer leaks in DuckDB internals introduced by
    96806ac on main, unrelated to this PR.
  • CI (Linux ARM64, macOS, Windows) — vcpkg fetches libarchive
    on first configure; existing builds will need to re-configure
    after merge.

Files

  • vcpkg.json: + libarchive
  • CMakeLists.txt: find_package(LibArchive), link
    LibArchive::LibArchive, add archive_io.cpp to flapi-lib
  • src/include/archive_io.hpp — public API
  • src/archive_io.cpp — implementation
  • test/cpp/archive_io_test.cpp + test/cpp/CMakeLists.txt — tests

Closes #41. Part of #40.

Foundation for self-packaging (#40). Introduces a thin RAII wrapper
around libarchive's C API for reading and writing ZIP archives
entirely in memory.

- New `src/include/archive_io.hpp` + `src/archive_io.cpp`:
  - `WriteArchive(entries, options)` and `ReadArchive(buffer)`
  - Entries iterated in std::map sort order for deterministic output
  - `archive_write_set_bytes_in_last_block(a, 1)` so the output has no
    trailing tar-block padding (a spike-caught EOCD-scan requirement)
  - mtime stamped from `source_date_epoch` for reproducible builds
  - RAII scope guards around archive*, archive_entry*

- vcpkg.json: add `libarchive`
- CMakeLists.txt: `find_package(LibArchive)`, add archive_io.cpp to
  flapi-lib, link `LibArchive::LibArchive`

Tests (`test/cpp/archive_io_test.cpp`, 5 cases / 14 assertions):
- round-trip preserves entry contents
- reproducible output across runs with fixed SOURCE_DATE_EPOCH
- deterministic (sorted) wire order regardless of insertion order
- malformed input (empty / garbage / truncated) throws ArchiveIOError
- EOCD sits exactly 22 bytes from EOF (no tar-block padding)

Closes #41