feat(self-packaging #42): selfpath + bundle_locator (EOCD reverse scan) #52
Closed
jrosskopf wants to merge 1 commit into
Part of #40, depends on #41. Two small modules that let the running binary discover an appended ZIP bundle.

- `src/selfpath.{hpp,cpp}` -- cross-platform self-binary path:
  - Linux: readlink("/proc/self/exe")
  - macOS: _NSGetExecutablePath
  - Windows: GetModuleFileNameW (with buffer growth loop)
- `src/bundle_locator.{hpp,cpp}` -- reverse-scan a file for a ZIP End-of-Central-Directory record:
  - reads tail buffer of 22 + max_comment + 64 KiB padding budget
  - reverse-scans for the 0x06054b50 signature
  - validates: single-disk archive, entry counts match, comment length fits in tail, anything after comment must be zero (padding tolerance), central directory must fit before EOCD
  - returns BundleLocation{offset, size} or nullopt
- `LocateBundleInSelf()` convenience wrapper over GetSelfPath()

Tests (`test/cpp/bundle_locator_test.cpp`, 7 cases):
- locate a ZIP appended to 4 KiB of random leading bytes
- tolerate 10 KiB of trailing zero padding (the spike-caught case)
- nullopt when no EOCD signature exists in random data
- nullopt when an EOCD signature has impossible cd_size / cd_offset
- nullopt when the bundle is truncated by 1 KiB from EOF
- selfpath returns an existing path
- LocateBundleInSelf returns nullopt against the unbundled test binary

Fixtures reuse `archive_io::WriteArchive` (#41) to produce real ZIPs.

Closes #42.
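For reviewers who want to see the three platform calls side by side, here is a minimal sketch of a self-path lookup. It is not the code from `src/selfpath.cpp`: `SelfPathSketch` is a hypothetical name, the return type is an assumption (the real `GetSelfPath()` signature is not shown in this PR text), and the initial buffer sizes are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <optional>
#include <string>
#include <vector>

#if defined(_WIN32)
#  include <windows.h>
#elif defined(__APPLE__)
#  include <cstdint>
#  include <mach-o/dyld.h>
#else
#  include <unistd.h>
#endif

// Hypothetical stand-in for GetSelfPath(); returns nullopt on failure.
std::optional<std::filesystem::path> SelfPathSketch() {
#if defined(_WIN32)
  std::vector<wchar_t> buf(260);  // arbitrary starting size
  for (;;) {
    const DWORD n = GetModuleFileNameW(nullptr, buf.data(),
                                       static_cast<DWORD>(buf.size()));
    if (n == 0) return std::nullopt;
    // A return value equal to the buffer size means the path was truncated.
    if (n < buf.size()) return std::filesystem::path(std::wstring(buf.data(), n));
    buf.resize(buf.size() * 2);  // buffer growth loop
  }
#elif defined(__APPLE__)
  std::uint32_t size = 0;
  _NSGetExecutablePath(nullptr, &size);  // fails, but reports the required size
  std::vector<char> buf(size);
  if (_NSGetExecutablePath(buf.data(), &size) != 0) return std::nullopt;
  return std::filesystem::path(buf.data());
#else
  std::vector<char> buf(256);  // arbitrary starting size
  for (;;) {
    const ssize_t n = readlink("/proc/self/exe", buf.data(), buf.size());
    if (n < 0) return std::nullopt;
    // readlink does not NUL-terminate; a full buffer may mean truncation.
    if (static_cast<std::size_t>(n) < buf.size()) {
      return std::filesystem::path(
          std::string(buf.data(), static_cast<std::size_t>(n)));
    }
    buf.resize(buf.size() * 2);
  }
#endif
}
```

On macOS the returned path may contain symlinks or `..` components; whether the PR canonicalises it is not stated here.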
This was referenced May 21, 2026
Merged
Part of epic #40. Depends on #51 (sets the base branch -- once #51 lands, GitHub will offer a one-click rebase onto main).
Summary
Two small modules that together let the running binary discover whether
a ZIP archive has been appended to it.
- `src/selfpath.{hpp,cpp}` -- cross-platform self-binary path (`/proc/self/exe` / `_NSGetExecutablePath` / `GetModuleFileNameW`).
- `src/bundle_locator.{hpp,cpp}` -- reverse-scan a file for a ZIP End-of-Central-Directory record, returning `BundleLocation{offset, size}` or `nullopt`.

Why
This is the runtime-detection half of self-packaging. The next sub-issue
(#43, `EmbeddedArchiveFileProvider`) calls `LocateBundleInSelf()` once
at startup; if it returns a location, the bytes at that range are fed to
`archive_io::ReadArchive` (PR #51) and become the in-memory config tree.

What the locator checks
- `0x06054b50` not present in tail
- `entries_this != entries_total`

The padding tolerance is the load-bearing bit. The spike caught
libarchive's default 10240-byte tar-block rounding pushing the EOCD off
file-EOF. We scan a tail of `22 + 65535 + 65536` bytes and accept any
amount of zero padding after the EOCD+comment, up to that budget.
PR #51 already neutralises this on the writer side via
`archive_write_set_bytes_in_last_block(a, 1)`, but the reader is
defensive too.
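The reverse scan with padding tolerance can be sketched roughly as below. This is an illustration, not the code from `src/bundle_locator.cpp`: `EocdSketch`, `FindEocdInTail`, and the little-endian helpers are hypothetical names, the field offsets follow the standard ZIP EOCD layout, and continuing the scan after a failed validation is an assumption (the real locator may return `nullopt` right away). The "central directory must fit before EOCD" check needs the file position of the tail and is left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr std::size_t kEocdFixedSize = 22;  // EOCD record without its comment

struct EocdSketch {            // hypothetical result type
  std::size_t pos;             // index of the signature within the tail buffer
  std::uint32_t cd_size;       // central directory size in bytes
  std::uint32_t cd_offset;     // central directory offset from ZIP start
  std::uint16_t comment_len;   // length of the trailing archive comment
};

static std::uint16_t ReadLe16(const std::uint8_t* p) {
  return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

static std::uint32_t ReadLe32(const std::uint8_t* p) {
  return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
         (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Scan `tail` (the last bytes of the file) backwards for a plausible EOCD.
std::optional<EocdSketch> FindEocdInTail(const std::vector<std::uint8_t>& tail) {
  if (tail.size() < kEocdFixedSize) return std::nullopt;
  for (std::size_t i = tail.size() - kEocdFixedSize + 1; i-- > 0;) {
    const std::uint8_t* p = tail.data() + i;
    if (ReadLe32(p) != 0x06054b50u) continue;            // signature
    if (ReadLe16(p + 4) != 0 || ReadLe16(p + 6) != 0) continue;  // single disk
    const std::uint16_t entries_this = ReadLe16(p + 8);
    const std::uint16_t entries_total = ReadLe16(p + 10);
    if (entries_this != entries_total) continue;
    const std::uint16_t comment_len = ReadLe16(p + 20);
    const std::size_t end = i + kEocdFixedSize + comment_len;
    if (end > tail.size()) continue;                     // comment must fit
    bool padding_ok = true;                              // only zeros may follow
    for (std::size_t j = end; j < tail.size(); ++j) {
      if (tail[j] != 0) { padding_ok = false; break; }
    }
    if (!padding_ok) continue;
    return EocdSketch{i, ReadLe32(p + 12), ReadLe32(p + 16), comment_len};
  }
  return std::nullopt;
}
```

Scanning from the end means trailing zero padding costs only a few failed signature comparisons before the real record is reached.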
How `BundleLocation` is computed
ZIP layout in the file:
So `loc.offset` is the byte you `seek()` to, and `loc.size` is what you
read -- the result is a valid standalone ZIP that `ReadArchive` can
parse directly without any libarchive offset-shifting magic.
Tests (red-then-green TDD)
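| Case | Expected |
|---|---|
| ZIP appended to 4 KiB of random leading bytes | `offset == 4096`, `size == zip.size()` |
| 10 KiB of trailing zero padding | bundle still located |
| No EOCD signature in random data | `nullopt` |
| EOCD with impossible `cd_size = 0xffffffff` | `nullopt` |
| Bundle truncated by 1 KiB from EOF | `nullopt` |
| `GetSelfPath` | returns an existing path |
| `LocateBundleInSelf` against the unbundled test binary | `nullopt` |

Fixtures reuse `archive_io::WriteArchive` from #51 -- so the locator is
tested against real libarchive output, not synthetic ZIPs.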
Test plan
AddressSanitizer leaks in DuckDB internals (`QueryExecutor type coverage`, `DuckDBResult RAII`) from 96806ac on main remain.

Closes #42. Part of #40. Stacked on #51.