Releases: abyo-software/s4
v1.2.0 — day-2 operations: prove it, automate it, keep it healthy
v1.2 — day-2 operations: prove it, automate it, keep it healthy.
Four additive features — the savings ledger + s4 savings (measured
$ saved in production, the counterpart to s4 estimate's prediction),
s4 maintain (policy-driven migrate/recompact/storage-class
transitions, one-shot or resident), dictionary day-2 ops
(s4 dict-status + restart-less SIGHUP rotation), and opt-in s4fs
writes (pandas/pyarrow writing gateway-compatible objects without the
gateway) — hardened by a 3-round dual-reviewer audit (findings
17 → 4 → 5 doc-trivia; zero P1 across all rounds). The v1.0 freeze
contract holds: everything below is additive and default-off;
flag-less behavior is bit-for-bit identical to v1.1.0.
Fixed (v1.2 audit round 2 — adversarial verification of the round-1 fix wave)
- P2 Replication replicas no longer carry the s4-ledger marker
(they are never ledger-counted, so a gateway-routed delete of a
replica would have re-opened the asymmetric-subtraction bug round 1
closed). Both replication metadata capture sites route through a
marker-stripping snapshot helper.
- P2 Ledger add and subtract now resolve the same logical size on
churn: ledger-enabled SSE/versioned multipart re-PUTs stamp
s4-original-size / s4-compressed-size (the exact values the add
used), REPLACE copies stamp the probe-resolved original, and
COPY-directive copies probe the destination for the add. Previously
each add→delete cycle of a sidecar-suppressed multipart object
stranded phantom original bytes (overstated savings). New e2e pins a
versioned-multipart churn returning bucket totals to exactly zero.
- P3 Marker-without-add cases (cap-exceeded multipart Complete,
ledger flag toggles, and replicas written between the round-1 and
round-2 fixes; moot for releases, since v1.2 ships both) are now
disclosed in the module contract, the report notes, and the
Complete-path WARN, but not eliminated (zero-clamp + drift note
remain the guard rails).
- P3 Access-point copy sources: REPLACE copies now strip
client-supplied s4-* metadata regardless of the source-addressing
variant (forged s4-ledger / s4-original-size via AP ARN closed),
and AP sources can no longer read reserved keys (.s4index /
.s4dict/).
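The metadata stripping mentioned in the last item can be pictured with a minimal sketch. The function name is hypothetical and the case-insensitive match is an assumption; the gateway's real implementation is not shown in these notes.

```python
# Hypothetical helper: the notes only say that client-supplied s4-* keys
# are stripped; the name and the case-insensitive match are assumptions.
RESERVED_META_PREFIX = "s4-"


def strip_client_s4_metadata(user_metadata: dict) -> dict:
    """Return a copy of the user metadata without any s4-* keys."""
    return {
        key: value
        for key, value in user_metadata.items()
        if not key.lower().startswith(RESERVED_META_PREFIX)
    }


# A client tries to forge gateway-managed keys alongside a legitimate one.
client_meta = {
    "s4-ledger": "forged",
    "S4-Original-Size": "1",
    "owner": "analytics",
}
cleaned = strip_client_s4_metadata(client_meta)
```

Only the non-reserved key survives, so a forged marker never reaches the stored object regardless of how the copy source was addressed.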
Fixed (v1.2 audit round 1 — 4 reviewers over v1.1.0..HEAD, 2026-06-12)
- P2 The savings ledger no longer subtracts objects it never added:
gateway writes made with the ledger enabled stamp an unforgeable
internal s4-ledger metadata marker, and deletes/overwrites of
unmarked objects (backend-direct, s4fs-written, migrate /
recompact output) skip subtraction with a per-bucket
skipped_unaccounted tally + report note. Ratio and $/month floor at
0 with a drift note. Previously a migrate-baked bucket could report
negative savings after gateway-routed deletes (incl. lifecycle
expiry).
- P2 s4 maintain transition copies are pinned with
x-amz-copy-source-if-match: a concurrent overwrite between the
attribute HEAD and the CopyObject now makes the backend refuse with
412 (counted etag-raced) instead of stamping the old object's
s4-* manifest onto the new bytes (which made the key unreadable
through the gateway until the next rewrite).
- P2 CompleteMultipartUpload no longer runs the ledger's
frame-scan accounting when the ledger flag is off (CPU-only
regression on large multiparts; output bytes were unaffected).
- P2 (s4fs) a sidecar PUT failure after a successful body write now
raises a typed S4SidecarWriteError and still invalidates the
per-path caches (try/finally), so same-instance reads see the new
object; previously stale caches could serve the old manifest against
the new body.
- P2 The README SemVer freeze table now states the Python binding
contract as a guaranteed-minimum set plus CHANGELOG-recorded additive
exports (it still claimed "exactly" the v1.0 names while v1.1/v1.2
shipped additive helpers).
- P3 Maintain transition re-sends Expires /
WebsiteRedirectLocation; report notes now state precisely what a
REPLACE-directive class copy changes (backend-SSE re-encryption under
the bucket default, multipart→single-part checksum/ETag change with
sidecar full-read fallback). Resident s4 maintain no longer risks
sleeping out a full --interval when SIGTERM lands in the
flag-check gap (notify_one permit).
- P3 s4 dict-status: 10 s HTTP timeout (was unbounded), the
Prometheus text parser no longer panics on multibyte escape
sequences in third-party output, and the cumulative-counter semantics
(post-rotation STALE persistence, removed-prefix series lingering)
are documented. s4 savings against a missing state file now says so
in a note instead of silently reporting zeros.
- P3 Ledger internals: Prometheus gauges are stamped inside the
write lock (no transient reordering), SIGUSR1 dumps route through the
ledger's own flush() (no .tmp race with the event flush), and
.s4dict/ propagation objects are explicitly excluded from
accounting (internal keys are never ledger-counted; documented
contract). (s4-codec-py) encode_s4_object passthrough CRC32C now
releases the GIL.
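The marker-gated subtraction and the zero floor from the first item above can be sketched as follows. This is an illustration under assumptions: the class, the marker value "1", and the in-memory layout are hypothetical; only the metadata key names and the skipped_unaccounted tally come from the notes.

```python
from dataclasses import dataclass

LEDGER_MARKER = "s4-ledger"  # key named in the notes; the value "1" is assumed


@dataclass
class BucketLedger:
    """Hypothetical per-bucket totals; not the gateway's real data structure."""

    original_bytes: int = 0
    compressed_bytes: int = 0
    skipped_unaccounted: int = 0

    def add(self, original: int, compressed: int) -> dict:
        """Count a gateway write and return the metadata stamped on the object."""
        self.original_bytes += original
        self.compressed_bytes += compressed
        return {
            LEDGER_MARKER: "1",
            "s4-original-size": str(original),
            "s4-compressed-size": str(compressed),
        }

    def subtract(self, metadata: dict) -> None:
        """Subtract only objects that carry the marker; tally the rest."""
        if LEDGER_MARKER not in metadata:
            self.skipped_unaccounted += 1
            return
        self.original_bytes -= int(metadata["s4-original-size"])
        self.compressed_bytes -= int(metadata["s4-compressed-size"])

    def saved_bytes(self) -> int:
        """Savings floor at zero instead of going negative."""
        return max(0, self.original_bytes - self.compressed_bytes)


ledger = BucketLedger()
stamped = ledger.add(original=1000, compressed=250)
# A migrate-written object has sizes but no marker, so it is skipped.
ledger.subtract({"s4-original-size": "4000", "s4-compressed-size": "900"})
after_unmarked = (ledger.saved_bytes(), ledger.skipped_unaccounted)
ledger.subtract(stamped)
after_marked = (ledger.original_bytes, ledger.compressed_bytes)
```

Because the subtract path reuses the exact sizes stamped by the add, an add followed by a delete returns the totals to zero, which is the property the round-2 churn test pins.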
Added
- s4fs write support (opt-in): S4FileSystem(write_enabled=True)
(or storage_options={"write_enabled": True} through fsspec/pandas)
enables pipe_file / put_file / open(path, "wb"), writing
gateway-compatible S4 objects directly to the backend:
df.to_parquet("s4://bucket/key") now works without the gateway.
The encoder (s4_codec.encode_s4_object, new in the Python binding
alongside bind_index + pick_chunk_size) reproduces the gateway's
single-PUT path byte-compatibly: S4F2 cpu-zstd frames using the
gateway chunk-size policy (1 MiB / 4 MiB / 16 MiB thresholds), the
five manifest metadata keys (s4-codec / s4-original-size /
s4-compressed-size / s4-crc32c / s4-framed), and, for
multi-frame bodies, a <key>.s4index sidecar bound to the backend
ETag + size after the body PUT (single-frame overwrites clean up a
stale sidecar). Verified end-to-end against MinIO + the real
gateway binary: gateway GET and Range GET return the original
bytes, s4 verify-sidecar reports OK with the version binding
intact, and pandas/pyarrow round-trip through both s4fs and the
gateway. write_codec="passthrough" stores raw stamped bodies;
everything else (cpu-gzip / cpu-zstd-dict / nvcomp-* / SSE /
append / versioning) is refused with a typed error pointing at the
gateway. Underlying filesystems that cannot stamp S3 user metadata
are refused with S4MetadataUnsupportedError (an unstamped framed
body would be served raw by the gateway); s3fs is supported out of
the box. Default behaviour without the flag is unchanged
(read-only).
- Per-prefix dictionary metrics: the --zstd-dict PUT branch now
exports s4_dict_put_total{prefix,outcome="win"|"loss"} and
s4_dict_put_bytes_total{prefix,kind="original"|"dict"|"plain"};
both compression results were already measured per PUT, so the byte
counters are exact on wins and losses alike. Cardinality is bounded
by the configured prefix count; with no dict configuration the
series are never registered (default behaviour bit-for-bit
unchanged). The gateway also self-monitors: a prefix whose rolling
win rate over its last 100 dict-path PUTs drops below 0.5 logs a
stale-dictionary WARN, at most once per prefix per hour.
- s4 dict-status --metrics-url <URL> [--warn-win-rate 0.5] [--format table|json]:
scrapes a running gateway's /metrics
(built-in minimal Prometheus text parser, no new dependencies) and
reports per-prefix dictionary win rate, effective compression ratio
(dict bytes / original bytes) and lazy s4_dict_fetch_total error
counts. Prefixes below the win-rate threshold get a "dictionary may
be stale; consider retraining (s4 train-dict)" warning and the
command exits 1 (cron-able drift monitoring).
- --zstd-dict-map <FILE> + SIGHUP reload: TOML [mappings]
table ("<bucket>/<prefix>" = "<dict-id>") as the reloadable twin
of repeated --zstd-dict flags: identical validation, boot-time
fetch + fingerprint verification and 1 MiB dictionary cap; a prefix
configured in both places is a boot error. On SIGHUP the file is
re-read, new dictionaries are fetched + verified (already-loaded
ones are reused), and the store is swapped atomically (arc-swap RCU;
in-flight requests finish on the generation they started with),
so rotation is s4 train-dict → edit map → kill -HUP, no gateway
restart. A failed reload keeps the current mappings live (ERROR log
and s4_dict_reload_total{result="err"}; success bumps
result="ok"). Without the flag, SIGHUP does not touch dictionary
configuration. New library surface:
S4Service::with_shared_zstd_dicts,
dict::{SharedDictStore, DictWinTracker, parse_zstd_dict_map, merge_dict_entries, build_dict_status, parse_prom_sample}.
- Docs: README "Operating dictionaries" section (dict-status /
rotation runbook), plus an explicit note that multipart uploads are
out of the dictionary path by design: parts never consult the
dict store, and S3's 5 MiB minimum part size sits far above the
small-object ceiling (default 1 MiB --zstd-dict-max-bytes).
- E2E: tests/dict_ops_minio.rs (Docker-gated, real s4 binary):
win/loss counters on /metrics, dict-status exit codes 0/1
with the retrain warning, map-file boot, SIGHUP rotation picking up
a new prefix without a restart, and the fail-safe on a broken map
(previous store keeps serving).
- s4 maintain --policy <FILE> [--execute] [--interval <DUR>] [--format table|json]:
policy-driven bucket maintenance. A TOML
file of [[rule]] entries (unique name, bucket, optional
prefix, common older-than age gate) runs sequentially top to
bottom; `action = "...
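The win-rate check that s4 dict-status performs on the s4_dict_put_total counters listed above can be approximated in a few lines. This is a sketch, not the tool's parser: the regular expressions, function names, and sample prefixes are assumptions, and label values containing a closing brace are not handled.

```python
import re
from collections import defaultdict

# Matches one sample line of the counter named in the notes.
SAMPLE = re.compile(r'^s4_dict_put_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def win_rates(metrics_text: str) -> dict:
    """Return {prefix: wins / (wins + losses)} from Prometheus text output."""
    counts = defaultdict(lambda: {"win": 0.0, "loss": 0.0})
    for line in metrics_text.splitlines():
        match = SAMPLE.match(line.strip())
        if match is None:
            continue  # comments, other metrics, blank lines
        labels = dict(LABEL.findall(match["labels"]))
        outcome = labels.get("outcome")
        if outcome in ("win", "loss"):
            counts[labels.get("prefix", "")][outcome] += float(match["value"])
    rates = {}
    for prefix, tally in counts.items():
        total = tally["win"] + tally["loss"]
        if total > 0:
            rates[prefix] = tally["win"] / total
    return rates


def exit_code(rates: dict, warn_win_rate: float = 0.5) -> int:
    """Exit 1 when any prefix sits below the threshold, else 0."""
    return 1 if any(rate < warn_win_rate for rate in rates.values()) else 0


# Illustrative scrape; the prefix values are made up for this example.
scrape = """\
# TYPE s4_dict_put_total counter
s4_dict_put_total{prefix="logs/app/",outcome="win"} 90
s4_dict_put_total{prefix="logs/app/",outcome="loss"} 10
s4_dict_put_total{prefix="tmp/",outcome="win"} 2
s4_dict_put_total{prefix="tmp/",outcome="loss"} 8
"""
rates = win_rates(scrape)
status = exit_code(rates)
```

Since the counters are cumulative, a rate computed this way reflects the whole process lifetime rather than a recent window, which matches the documented post-rotation persistence caveat.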
v1.1.0 — adoption tooling + small-object compression
v1.1 — adoption tooling + small-object compression. Six additive
features (s4 estimate / s4 migrate / zstd dictionaries +
s4 train-dict / s4fs fsspec adapter / s4 recompact / GPU batched
small-PUT compression) hardened by a 3-round dual-reviewer audit
(Claude ×3 + Codex; findings 20 → 7 → 5, P1/P2 zero at round 3). The
v1.0 freeze contract holds: every change below is additive and
default-off; flag-less PUT/GET behavior is bit-for-bit unchanged.
Fixed (audit round 2 — adversarial verification of the round-1 fix wave)
- P2 CreateMultipartUpload now strips client-supplied s4-*
metadata like put_object does: a forged x-amz-meta-s4-encrypted
could otherwise survive onto a completed multipart object and 5xx a
flag-less GET (multipart re-open of the round-1 PUT fix).
- P2 migrate / recompact no longer hard-fail every object when
GetObjectTagging is denied or unimplemented: such objects skip as
tags-unreadable (data is never rewritten tag-less), NoSuchTagSet
counts as "no tags", and a new --no-tags flag opts out of tag
inheritance entirely. Transient tagging errors still fail hard.
- P2 Version-pinned CopyObject (?versionId=) probes the pinned
source version, not the latest, for both the REPLACE metadata merge
and cross-bucket dictionary propagation.
- P3 Dictionary size cap (1 MiB) is now one consistent contract:
train-dict --max-dict-bytes and --zstd-dict boot preload reject
what a flag-less gateway's lazy fetch would refuse.
- P3 Boot-preloaded dictionaries are bucket-scoped, fetched per
(bucket, id) with s4-dict-sha256 verification, and the server
refuses to boot when one dict-id resolves to different bytes across
buckets (16-hex prefix collision).
- P3 s4 estimate excludes already-S4 objects (gateway metadata or
S4F2 / S4P1 / S4E* magic) from sampling so re-estimating a
gateway-operated bucket doesn't measure framed/encrypted bytes as if
they were compressible plaintext (already_s4 count + note).
- P3 (s4fs) the sidecar staleness check reuses a cached live-info
snapshot instead of issuing a second backend HEAD per info().
Trade-off disclosed: external overwrites during one filesystem
instance's lifetime are detected on the next invalidate_cache() /
new instance, not per-read (same contract as the metadata cache).
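The tagging decision described in the migrate / recompact item above amounts to a small classification. The sketch below is hypothetical: the enum, the function, and the error-code strings AccessDenied and NotImplemented are assumptions standing in for "denied or unimplemented"; only NoSuchTagSet, tags-unreadable, and --no-tags come from the notes.

```python
from enum import Enum
from typing import Optional


class TagOutcome(Enum):
    INHERIT = "inherit"                  # tags were read and are copied
    NO_TAGS = "no-tags"                  # object is rewritten without tags
    TAGS_UNREADABLE = "tags-unreadable"  # object is skipped, never rewritten
    FAIL = "fail"                        # run fails hard


# Assumed S3 error codes for "denied or unimplemented".
DENIED_OR_UNIMPLEMENTED = {"AccessDenied", "NotImplemented"}


def classify_tagging(error_code: Optional[str], no_tags: bool = False) -> TagOutcome:
    """Map a GetObjectTagging result to the behaviour described in the notes."""
    if no_tags:
        return TagOutcome.NO_TAGS  # --no-tags opts out of inheritance entirely
    if error_code is None:
        return TagOutcome.INHERIT
    if error_code == "NoSuchTagSet":
        return TagOutcome.NO_TAGS
    if error_code in DENIED_OR_UNIMPLEMENTED:
        return TagOutcome.TAGS_UNREADABLE
    return TagOutcome.FAIL  # transient or unknown errors still fail hard


outcomes = [
    classify_tagging(None),
    classify_tagging("NoSuchTagSet"),
    classify_tagging("AccessDenied"),
    classify_tagging("SlowDown"),
    classify_tagging("AccessDenied", no_tags=True),
]
```

Treating unknown codes as hard failures keeps the conservative default: an object is only rewritten when its tags are known or the operator has explicitly opted out.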
Fixed (audit round 3 — convergence check)
- P3 s4 estimate's already-S4 body detection is structurally
validated (known codec id + payload fits the object for S4F2,
plausible padding length for S4P1) so customer data that merely
starts with the 4-byte magic isn't silently dropped from sampling.
- P3 README/CHANGELOG drift from the round-1/2 fixes corrected:
dictionary 1 MiB cap is documented as one three-surface contract,
migrate/recompact sample outputs show the full current skip taxonomy,
--no-tags / tags-unreadable / already-s4 estimate exclusions
documented.
Fixed (audit round 1 — 4 reviewers over v1.0.0..HEAD, 2026-06-11)
- P1 s4 migrate could rewrite .s4dict/<id> dictionary objects as
S4F2-framed data, breaking every cpu-zstd-dict object in the bucket
(lazy fetch fails fingerprint verification). All three bulk tools
(estimate / migrate / recompact) now exclude S4-internal keys:
*.s4index, .s4dict/, and *.__s4ver__/* versioning shadows.
- P1 A client-supplied x-amz-meta-s4-dict-id on a plain PUT made
the subsequent GET fail 5xx even with --zstd-dict unset (default-off
behavior regression). The GET dict branch is now gated on the
gateway-managed manifest codec (cpu-zstd-dict), and put_object
strips client-supplied s4-* metadata keys up front.
- P1 (s4fs) SSE-encrypted objects could return AES-GCM ciphertext
bytes silently (passthrough + SSE). s4fs now refuses with
NotImplementedError via three layers: s4-encrypted metadata,
sidecar SSE binding, and S4E1–S4E6 magic sniff.
- P1 (s4fs) <key>.__s4ver__/<version> shadow objects were not
hidden from ls/find/glob (prefix check instead of infix), so
directory dataset scans could silently include stale versions.
- P2 migrate / recompact rewrites dropped the source object's
storage class (silent promotion to STANDARD) and object tags; both
are now inherited. ACLs / Object Lock retention remain uninherited
(stated in report notes).
- P2 migrate treated a roundtrip-verify failure as a skip
(exit 0); it is now a hard failure (exit 1), matching recompact.
The skipped_verify_failed JSON field remains (always 0) for shape
compatibility.
- P2 Cross-bucket CopyObject of a dict-compressed object now
propagates .s4dict/<id> to the destination bucket (idempotent,
content-addressed); previously the copy succeeded but every GET on
the destination failed 5xx.
- P2 .s4dict/ joined the reserved-key guard: gateway PUT / DELETE
are rejected with InvalidObjectName (reads still allowed) so a
bucket-wide dictionary can't be destroyed through the data path.
- P2 (s4fs) info() no longer trusts a stale sidecar for object
size (staleness-checked first), and binding-less legacy v1 sidecars
are no longer used for size or partial range reads.
- P2 (s4fs) dependency floor corrected to s4-codec>=1.1.0,<2:
the binding APIs s4fs imports don't exist in the 1.0.0 wheel.
- P3 estimate no longer aborts the whole run when a sampled
object 404s mid-run (skip + note); module/report now disclose the
single-stream measurement bias vs the server's 4 MiB chunking.
- P3 migrate / recompact enforce --max-body-bytes from the
GET Content-Length before buffering; migrate now also cleans up a
stale multi-frame sidecar when its rewrite comes out single-frame.
- P3 recompact no longer auto-promotes backend-written framed
objects that lack gateway metadata (unstamped-framed skip; opt back
in with --assume-unstamped-framed).
- P3 Dict hardening: DictCache is bucket-scoped, train-dict
stamps s4-dict-sha256 (full-digest verification when present), and
lazy fetch caps dictionaries at 1 MiB. (s4fs) open() on a framed
object with inexact size raises instead of silently truncating
(allow_inexact_open=True restores the old clamp).
- P3 nvcomp_batched validates device-reported chunk sizes on the
host before the unsafe copy (typed per-item error instead of a
potential OOB read on driver misbehavior).
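The internal-key exclusion and the prefix-versus-infix bug from the list above can be illustrated with a short sketch. The function names are hypothetical, and treating .s4dict/ as a bucket-root prefix is an assumption drawn from the ".s4dict/<id>" examples.

```python
def is_s4_internal_key(key: str) -> bool:
    """True for keys the bulk tools and listings should exclude."""
    return (
        key.endswith(".s4index")       # *.s4index sidecars
        or key.startswith(".s4dict/")  # dictionary objects (assumed at bucket root)
        or ".__s4ver__/" in key        # versioning shadows: infix, not prefix
    )


def buggy_prefix_only_check(key: str) -> bool:
    """The pre-fix shape: a prefix test cannot see a shadow under a real key."""
    return key.startswith(".__s4ver__/")


# Example keys are made up for illustration.
keys = [
    "data/part-0.parquet",
    "data/part-0.parquet.s4index",
    ".s4dict/0123456789abcdef",
    "data/part-0.parquet.__s4ver__/3f2a",
]
visible = [key for key in keys if not is_s4_internal_key(key)]
missed_by_prefix_check = [
    key for key in keys if is_s4_internal_key(key) and ".__s4ver__/" in key
    and not buggy_prefix_only_check(key)
]
```

The shadow key lives under the user's own key name, so only an infix test hides it from directory scans.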
Added
- --gpu-batch-small-puts (opt-in, requires the nvcomp-gpu build +
a CUDA-capable GPU at boot; the server refuses to start otherwise):
batch concurrent small PUTs into a single nvCOMP batched-zstd
kernel launch so the GPU pays its fixed launch + PCIe cost once per
batch instead of once per object. Eligibility: sampling dispatcher
picked cpu-zstd, no --zstd-dict prefix match, declared
Content-Length in [--gpu-batch-floor-bytes (default 4 KiB), --gpu-min-bytes (default 1 MiB)).
Companion knobs:
--gpu-batch-max-items (flush at N pending bodies, default 32) and
--gpu-batch-window-ms (flush after T ms, default 4; also the
worst-case latency the batch path adds to a PUT). Wire format is
unchanged: batched objects are byte-layout-identical standard
nvcomp-zstd bodies (same FCG1 framing + CodecKind::NvcompZstd
manifest as the per-object GPU path; no new codec id, no new
metadata) and the GET path has zero batch awareness, proven by
GPU-gated tests that decompress batch output through the unmodified
per-object path, plus a MinIO e2e (tests/gpu_batch_e2e.rs).
Fail-open semantics: queue full (backpressure), GPU error, or a
batched result that is not smaller than the input all fall back to
the pre-existing cpu-zstd framed path, observable via the new
s4_gpu_batch_total{result="batched"|"fallback"} counter. Measured
on 1000 × 8 KiB log-like objects (RTX 4070 Ti SUPER, nvCOMP
5.2.0.10): batched GPU = 29.7 ms vs 702 ms per-object GPU (~24×) vs
15.7–19.5 ms single-thread cpu-zstd-3; GPU output ~10% smaller
(12.31× vs 11.14× ratio). Honest verdict in README §"GPU small-PUT
batching": this offloads CPU and improves ratio; it does not beat a
free CPU core on raw wall time at 8 KiB. New public surface:
s4_codec::nvcomp_batched::NvcompZstdBatchEncoder (feature-gated),
s4_server::gpu_batch (aggregator + GpuBatchHandle),
S4Service::with_gpu_batch, and the gpu_small_batch bench. Flag
off (default) = bit-for-bit unchanged PUT behaviour.
- s4 recompact <bucket>[/prefix] --endpoint-url <BACKEND> [--execute]:
rewrite cpu-zstd framed objects at a higher zstd level during a quiet
window (LSM-compaction for S3). The gateway's PUT path favours latency
(--zstd-level, default 3); recompact decodes each S4-framed cpu-zstd
object in-process (same FrameIter walk as the GET path, which doubles
as an integrity check on the stored frames), re-frames the original bytes
with the same streaming_compress_to_frames + pick_chunk_size pair
the PUT path uses at --target-zstd-level (default 19), and overwrites
only when the new frames shrink the stored bytes by
--min-gain-percent (default 3%). Rewritten objects are stamped with
new s4-zstd-level metadata (recompact-only stamp; the gateway
neither reads nor writes it), making re-runs idempotent
(already-compacted skip) with no checkpoint file.
--older-than <DUR> (30d / 12h / 45m / 90s) restricts the run
to cold objects by backend LastModified. Dry-run by default;
mandatory decompress-roundtrip byte comparison before every write (no
off switch) and a pre-PUT HEAD ETag re-check (narrows, does not close,
the concurrent-writer race). Skip taxonomy: not-s4 (run s4 migrate
first) / already-compacted / unsupported-codec (passthrough,
cpu-gzip, nvcomp-*, cpu-zstd-dict; this tool is cpu-zstd →
cpu-zstd only) / `unst...
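Two of the recompact gates described above, the --older-than duration forms and the --min-gain-percent threshold, lend themselves to a short sketch. The helper names are hypothetical, only the four unit suffixes shown in the notes are accepted, and treating the threshold as "at least" is an assumption.

```python
import re
from datetime import timedelta

_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_older_than(text: str) -> timedelta:
    """Parse the duration forms shown in the notes: 30d, 12h, 45m, 90s."""
    match = re.fullmatch(r"(\d+)([dhms])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def gain_percent(stored_bytes: int, recompacted_bytes: int) -> float:
    """Percentage by which the recompacted frames shrink the stored bytes."""
    return (stored_bytes - recompacted_bytes) * 100.0 / stored_bytes


def should_overwrite(stored_bytes: int, recompacted_bytes: int,
                     min_gain_percent: float = 3.0) -> bool:
    """Overwrite only when the gain reaches the threshold (assumed inclusive)."""
    if stored_bytes <= 0:
        return False
    return gain_percent(stored_bytes, recompacted_bytes) >= min_gain_percent


cold_cutoff = parse_older_than("30d")
worth_it = should_overwrite(stored_bytes=1000, recompacted_bytes=960)    # 4% gain
not_worth_it = should_overwrite(stored_bytes=1000, recompacted_bytes=980)  # 2% gain
```

A gain threshold like this keeps a quiet-window run from rewriting objects whose higher-level encoding barely differs from what is already stored.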
v1.0.0 — SemVer-stable surface freeze
[1.0.0] — 2026-06-09
v1.0 — SemVer-stable surface freeze. From v1.0 onward the items
enumerated in README.md §"Stability — v1.0 guarantees"
are frozen for the v1.x line; any incompatible change to them ships
in a v2.0.0 release with migration recipes under docs/migration/.
v1.0 is not a marketing claim that "S4 has been battle-tested at
every Fortune 500." It is a contract that downstream consumers can
pin s4-server = "1" (or s4-codec = "1", or s4-config = "1", or
ghcr.io/abyo-software/s4:1) and rely on the surface listed in
README.md. First public production deployment reference is still
being collected — file an issue tagged production-reference if
you are running S4 at TB scale.
Surface freeze — what's in the v1.0 contract
See README.md for the table. Briefly:
- Wire formats: S4F2 framed body, S4P1 padding, S4IX v1/v2/v3
sidecars, S4E1 / S4E2 / S4E3 / S4E4 / S4E5 / S4E6 SSE envelopes
- s4 binary subcommands (verify-sidecar, repair-sidecar,
sweep-orphan-sidecars, verify-audit-log, plus the server's
documented --<flag> set)
- s4_server::repair::* public API (verify/repair/sweep + all
related error / report / policy types)
- s4_server::service::S4Service shape: new(backend, registry, dispatcher)
constructor + every pub fn with_* builder signature
(23 of them; exact list in README); + the SharedService newtype
at s4_server::service_arc::SharedService; + SigV4aGate /
SigV4aGateError / resolve_range / DEFAULT_MAX_BODY_BYTES /
DEFAULT_REPLICATION_MAX_CONCURRENT
- s4_server::sse public surface (frozen types, functions, constants)
- s4_server::streaming public surface (frozen constants + functions)
- s4-codec codec trait + format constants (Codec trait shape;
CodecKind / CodecError / IndexError / FrameError / GpuSelectError /
CompareOp enums all #[non_exhaustive]; index module's pub structs
+ functions + constants; multipart::FrameHeader layout)
- s4-config: CompressionMode enum (#[non_exhaustive]) +
BackendConfig / S4Config struct field sets
- HTTP API surface: s3s 0.13 trait set (S3 wire compatibility)
- Container image tags + Helm chart values.yaml key set (full
enumeration of 28 top-level keys in README)
Added
- Stability section in README.md (§"Stability — v1.0 guarantees")
enumerating the v1.0 freeze surface with explicit scope rules.
- docs/security/cargo-audit-ignores.md: per-advisory rationale +
mitigation + upstream-tracking for the 4 accepted RUSTSEC ignores
(2026-0098 / 2026-0099 / 2026-0104 / 2025-0134), with verification
commands to re-check each fact.
- README "Backend compatibility matrix" sub-section inside §Stability
documenting CI-verified state honestly: ✓ gating for MinIO; ⚠ opt-in
for AWS/B2/R2/Wasabi (gate only when operator-configured secrets
are set); ⚠ claimed-but-not-CI-verified for Garage + Ceph RGW with
the specific drift symptoms documented.
- README "Modules NOT in the freeze list" sub-section enumerating
the 25 s4_server::* modules that exist as pub mod for binary
+ tests needs but are NOT part of the v1.0 contract.
- README "How to read the freeze table — scope of 'frozen'"
sub-section: items named in the table ARE the v1.0 contract; other
pub items in those modules are NOT; pin =1.x.y if depending on
unlisted items.
- README "v0.x → v1.0 source compatibility note" sub-section listing
all 34 enums annotated #[non_exhaustive] (6 s4-codec + 27
s4-server + 1 s4-config) + the mechanical consumer-side fix
(add _ => arm) for exhaustive matches.
Changed
- 34 public enums on the frozen surface gained #[non_exhaustive]
for forward-compat additive variants. Source-level breaking
change for downstream code with exhaustive match arms; fix is
mechanical (add _ =>). See README §"v0.x → v1.0 source
compatibility note" for the full enum list and rationale.
- pub fn encode_index_v1_for_test (and other _for_test helpers)
gated out of the v1.0 public API via #[cfg(test)] pub(crate)
visibility + #[doc(hidden)].
- crates/s4-codec-py/pyproject.toml PyPI trove classifier bumped
from Development Status :: 3 - Alpha → 5 - Production/Stable
to match the v1.0 frozen-API contract.
- SECURITY.md Supported Versions section rewritten from "pre-1.0,
latest commit on main" → "v1.x rolling window of latest minor +
previous minor; patch releases on the affected minor's release
branch".
- Backend compat matrix table in compat-matrix.yml now reflects
the round-trip-vs-provisioning gate distinction; Garage and Ceph
round-trips are continue-on-error with explicit warning steps
documenting the wire-shape drift symptoms.
- README disclaimers updated from alpha / early-access / pre-1.0
framing to the v1.0 "surface freeze ≠ production track record"
narrative.
- Helm chart values.yaml key set is now frozen at v1.0; key shape
changes are v2.0 territory. Chart's own version stays in 0.2.x
(Helm-side SemVer, independent of appVersion); appVersion bumps
to 1.0.0.
- crates/s4-codec-py/README.md + Cargo.toml + pyproject.toml
metadata updated from "GPU/CPU compression" to "CPU compression"
to match what the Python module actually exports in v1.0
(CpuZstd + CpuGzip only; GPU codec classes are intentionally
NOT exposed in v1.0).
- crates/s4-codec-wasm/README.md status header updated from
"v0.4 #24 — initial cut" to "v1.0 — frozen public API".
- .github/workflows/ci.yml security-audit job comment corrected:
rustls-pemfile is a runtime dep (used by the production HTTPS
listener in tls.rs), not "dev-only" as the prior comment claimed.
Fixed
- compat-matrix.yml Garage start step: replaced over-broad
awk '/HEALTHY|UNHEALTHY|NO ROLE/' that matched the
==== HEALTHY NODES ==== table header line in
dxflrs/garage:v1.1.0 output (producing NODE_ID="====" and a
hard-fail at layout assign). Now uses garage node id -q
directly, which returns <hex>@<addr>.
- compat-matrix.yml Ceph RGW + Garage round-trip steps: marked
continue-on-error because quay.io/ceph/demo:latest-quincy is
unmaintained upstream (XAmzContentSHA256Mismatch) and
dxflrs/garage:v1.1.0 rejects current aws-sdk-rust's
STREAMING-AWS4-HMAC-SHA256-PAYLOAD (Invalid payload signature).
Provisioning steps still gate for both.
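The Garage over-match is easy to reproduce outside awk. In the sketch below only the header line comes from the notes; the other sample rows, and the assumption that the old step took the first whitespace-separated field of the first matching line, are illustrative guesses.

```python
import re
from typing import Optional

# Same alternation as the awk pattern quoted above.
PATTERN = re.compile(r"HEALTHY|UNHEALTHY|NO ROLE")

# Header line is from the notes; the rows below it are made-up sample data.
status_output = """\
==== HEALTHY NODES ====
ID                Hostname  Address         Tags
1a2b3c4d5e6f7a8b  garage    127.0.0.1:3901  NO ROLE ASSIGNED
"""


def first_field_of_first_match(text: str) -> Optional[str]:
    """Mimic an awk '/pattern/ {print $1}' pipeline that keeps the first hit."""
    for line in text.splitlines():
        if PATTERN.search(line):
            return line.split()[0]
    return None


node_id = first_field_of_first_match(status_output)
```

The header itself contains HEALTHY, so it wins before any node row is reached and the extracted id is the row of equals signs. Asking the tool for the id directly avoids parsing a human-oriented table at all.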
Roadmap candidates (v1.x, additive only)
- Chunked SSE-KMS envelope (provisional S4E7) + chunked SSE-C
(provisional S4E8) for Range GET partial-fetch fast-path.
- S4F3 streaming frame format enabling streaming PUT checksum
verify for multipart upload_part.
- 32-bit runtime smoke promoted from advisory to required CI gate.
- Per-action SHA pinning on GHA workflows.
- Cross-region replication promoted from experimental scaffolding
to production-grade with Jepsen-style consistency tests.
- Re-introducing Garage + Ceph as ✓ gating once upstream signature
/ image issues resolve.
- GPU codec exposure in the Python module.
- Streaming decoder API in the WASM module.
- npm publish automation for the WASM package.
- Japanese README (README.ja.md) brought current to v1.0.
Audit history
7 rounds of dual-reviewer (Opus + Codex) adversarial audit drove
~30 individual findings to closure across this cycle:
- R0 (pre-session, on v1.0 draft README): Opus + Codex, 13 findings
spanning enum non_exhaustive coverage, README freeze accuracy,
s3s 0.13 policy, cargo-audit ignores doc, compat-matrix evidence,
cross-major back-compat caveats.
- R1: Cluster A (F1 + F2 + F3 sub-agent parallel fixes) + Cluster B
(main-session README + audit-ignores doc rewrite) + Cluster C
(compat-matrix manual triggers + Garage / Ceph best-effort wrap).
- R2: NF-1, the SharedService path correction (s4_server::service →
s4_server::service_arc).
- R3 (dual reviewer): 11 new findings → fix wave including
S4Service::default fabrication removal, cloud-backend opt-in
honest qualifiers, S4Service builder-param contradiction caveat,
FrameIndex inner-type freeze, v0.x→v1.0 source-break caveat.
- R4 (4 P2 + 1 P3): Python class name correction, enum list
completeness, SECURITY.md update, FrameIndex own-field freeze.
- R5 (dual): scope-explicit freeze sub-section + Python exception
enumeration + binding README updates.
- R6 (dual, split verdict): Codex P1/P2/P3 closures, namely Python pkg
GPU marketing removal, PyPI classifier bump, CompressionMode
non_exhaustive.
- R7 (dual): s4 = "1" → s4-server / s4-codec / s4-config = "1",
freeze-scope enum-list wording correction, Python README GPU
build-recipe v1.0 caveat, EOF whitespace, SOCIAL_POSTS.md
historical-artifact banner.
Cut-commit changes
- Cargo.toml: workspace.version 0.11.0 → 1.0.0
- crates/s4-server/Cargo.toml: internal-dep pins
s4-codec, s4-config "0.11" → "1"
- crates/s4-codec-wasm/Cargo.toml: internal-dep pin
s4-codec "0.11" → "1"
- crates/s4-codec-py/Cargo.toml: internal-dep pin
s4-codec-rs "0.11" → "1" (already landed in round-7 wave;
noted here for completeness)
- charts/s4/Chart.yaml: appVersion 0.11.0 → 1.0.0;
chart's own version 0.2.2 → 0.2.3 (appVersion bump only,
no chart-shape change)
v0.11.0 — polish + maintenance (32-bit + Node 24 + compat matrix, 6-round audit clean)
Third v0.1x-line cut. Polish + maintenance theme — no production code changes, all 9 GHA workflows + docs + composite actions only. Three-theme wave-1 delivery converged by a 6-round integrated audit (4 P2 + 1 P1 real fixes, 2 false-positive rounds caused by Codex review sandbox network limits — documented inline).
Net diff vs v0.10.0: ~12 files / ~1,400 lines across .github/, docs, charts. Published to crates.io as s4-server@0.11.0 + 3 sibling crates. Container images on ghcr.io: ghcr.io/abyo-software/s4:0.11.0 (multi-arch CPU) + :0.11.0-gpu (nvCOMP amd64) — built automatically by the v0.11.0 tag push.
Wave-1 themes
- #A4: 32-bit s4-server runtime end-to-end PUT/GET smoke (ci.yml i686-runtime-smoke job). The v0.10 #A4 --help / --version smoke is now a full MinIO-backed PUT/GET round-trip exercising the i686 hyper/rustls listener, aws-sdk-rust SigV4 signer, and CPU-zstd codec paths. The PUT/GET step lands as advisory (continue-on-error: true) so a first-time 32-bit runtime bug surfaces in the job log + uploaded server artifact without flipping CI red; promote to required after a stretch of green main pushes. README §"Supported targets" 32-bit row: ⚠️ → ✅.
- #A5: GitHub Actions Node.js 24 migration. 11 JavaScript actions bumped to their Node 24-ready majors (closing the 2026-09-16 deprecation gate GHA logs have been warning about):

| Action | v0.10 | v0.11 |
|---|---|---|
| actions/checkout | v4 | v5 |
| actions/upload-artifact | v4 | v6 |
| actions/download-artifact | v4 | v7 |
| actions/github-script | v7 | v8 |
| codecov/codecov-action | v4 | v5 |
| docker/build-push-action | v5 | v7 |
| docker/login-action | v3 | v4 |
| docker/setup-buildx-action | v3 | v4 |
| docker/metadata-action | v5 | v6 |
| aws-actions/configure-aws-credentials | v4 | v6 |
| azure/setup-helm | v4 | v5 |

Unchanged (already Node 24 at floating tag): Swatinem/rust-cache@v2, benchmark-action/github-action-benchmark@v1, dtolnay/rust-toolchain@stable|@nightly (composite). actionlint clean across all 9 workflows.
- #A7: Backend compatibility matrix CI (compat-matrix.yml, weekly schedule + workflow_dispatch). Exercises a PUT/GET + sidecar HEAD round-trip per S3-compatible backend S4 claims support for:
  - Docker tier (no secrets): MinIO + Garage + Ceph RGW (best-effort, upstream demo image unmaintained)
  - Real-cloud tier (operator-provided vars + secrets, silent skip when absent): Backblaze B2 + Cloudflare R2 + Wasabi

Composite local action .github/actions/compat-roundtrip/action.yml factors the per-backend step. README §"How it Compares" gains a 7-row compat matrix (✅ verified / ⚠️ best-effort / 🔧 configurable in operator CI).
Audit closeout (v0.10.0..v0.11.0)
| Round | Severity | Fix |
|---|---|---|
| R1 | P2 | 3fceddd — restore SLSA + SBOM on per-arch builds (imagetools create can't retroactively patch) |
| R2 | P2 | c29d69f — restore OCI image labels on per-arch builds + scope compat-matrix TEST_KEY to ${{ github.run_id }} |
| R3 | P2 | 08545ba — propagate test-key to composite action + flavor-independent merge (CPU arm64 fail no longer skips GPU publish) |
| R4 | P1 | 157d7e7 — expected-digest-count guard: refuse partial multi-arch publish (CPU arm64 fail must not overwrite :<version> as amd64-only) |
| R5 / R6 | false-positive | eebc7e2 — action-version policy comment documents the Codex sandbox network limitation that hallucinated "action versions unpublished" twice |
Two false-positive rounds count as effective 2-round clean — every flagged action major (actions/checkout@v5, upload-artifact@v6, download-artifact@v7, github-script@v8, etc.) was verified via gh api /repos/<owner>/<repo>/releases/latest AND every CI run since wave-1 ship (commit 3332f3e) resolves them cleanly.
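The R4 expected-digest-count guard from the table above boils down to a count check before the manifest merge. The sketch below is hypothetical: the function name, error type, and digest strings are illustrative and do not reflect the workflow's actual shell steps.

```python
def assert_complete_digest_set(digests: list, expected_count: int) -> None:
    """Refuse to publish when fewer per-arch digests arrived than expected."""
    unique = set(digests)
    if len(unique) != expected_count:
        raise RuntimeError(
            f"expected {expected_count} per-arch digests, got {len(unique)}; "
            "refusing partial multi-arch publish"
        )


# Both architectures built: the merge may proceed.
assert_complete_digest_set(["sha256:aaa", "sha256:bbb"], expected_count=2)

# The arm64 build failed, so only one digest exists: the guard must trip.
try:
    assert_complete_digest_set(["sha256:aaa"], expected_count=2)
    partial_publish_blocked = False
except RuntimeError:
    partial_publish_blocked = True
```

Failing closed here is what prevents a version tag from being silently overwritten with an amd64-only manifest.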
Cleanup recipe for already-shipped v0.9.0 / v0.10.0 images
The imagetools create shape introduced in v0.10.0 lost OCI labels + SLSA + SBOM. To re-attach them to the existing tags:
gh workflow run docker.yml --ref main \
-f build_ref=v0.10.0 \
-f image_tag_override=0.10.0 \
-f push=true
gh workflow run docker.yml --ref main \
-f build_ref=v0.9.0 \
-f image_tag_override=0.9.0 \
-f push=true

Each per-arch rebuild attaches the labels + attestations now that the build step has them; the merged manifest under each tag overwrites the prior labels-less manifest.
Coverage
- Workspace tests unchanged (~720 pass, 0 fail) — production code untouched.
- New CI workflows: 1 new (compat-matrix.yml) + 9 modified (Node 24 bumps + i686 PUT/GET).
- v0.11.0 compat-matrix first weekly fire: Sunday 06:00 UTC.
v0.12+ candidates (deferred)
- Chunked SSE-KMS envelope (provisional S4E7) + chunked SSE-C (S4E8) → Range GET partial-fetch for those modes.
- S4F3 streaming frame format → streaming PUT checksum verify for multipart upload_part.
- 32-bit s4-server runtime end-to-end smoke promoted from advisory to required (after green-main stretch observed).
- Per-action SHA pinning instead of floating major tags (security hardening).
Full changelog
See CHANGELOG.md for the per-finding detail.
🤖 Generated with Claude Code
v0.10.0 — encryption-aware completion + Docker distribution + hardening (4-round audit clean)
Second v0.10-line cut (= first v0.10). Two-wave delivery of the encryption-aware sidecar completion + Docker image distribution + hardening theme, converged by a 4-round integrated audit (2 P2 fixes, clean R3 + R4).
Net diff vs v0.9.0: ~12 files / ~1,800 lines across s4-server, the Helm chart, the distribution workflows, and the docs.
Published to crates.io as s4-server@0.10.0, s4-codec@0.10.0, s4-config@0.10.0, s4-codec-py@0.10.0. Container images on ghcr.io: ghcr.io/abyo-software/s4:0.10.0 (multi-arch CPU) + ghcr.io/abyo-software/s4:0.10.0-gpu (nvCOMP, amd64) — built automatically by .github/workflows/docker.yml on this tag push. Install via cargo install s4-server or helm install s4 ./charts/s4 --set image.tag=0.10.0 --set backend.endpointUrl=https://....
Wave-1 — encryption-aware completion + Docker distribution
- s4 repair-sidecar --sse-s4-key <PATH> (--sse-s4-key-rotated id=N,key=PATH) plumbing closes the v0.9 EncryptedSidecarUnsupported reject path. The CLI now decrypts SSE-S4 chunked (S4E6) bodies in-process via the keyring, frame-scans the recovered plaintext, and stamps a v3 sidecar so subsequent Range GETs hit the encryption-aware partial-fetch fast-path. New lib entry s4_server::repair::repair_sidecar_with_keyring; RepairReport::sse_v3_binding exposes the rebuilt SSE binding. RepairError::SseDecryptFailed for keyring mismatches. Hardened against attacker-controlled S4E6 header inflation via SSE_S4_REPAIR_MAX_OVERHEAD_BYTES + SSE_S4_REPAIR_MAX_CHUNK_SLACK_BYTES caps.
- Official container images on GitHub Container Registry. New .github/workflows/docker.yml builds + pushes ghcr.io/abyo-software/s4:<version> (CPU multi-arch linux/amd64 + linux/arm64) and ghcr.io/abyo-software/s4:<version>-gpu (nvCOMP GPU, amd64 only — nvCOMP redist x86_64-only) on every v*.*.* tag push. SLSA build provenance (mode=max) + SPDX SBOM via Buildx. GHA-backed layer cache scoped per flavor. Mutable tags (latest, <major>.<minor>) gated on stable tag-push events only so prereleases (-rc1) and back-fill workflow_dispatch runs can't move them backward. workflow_dispatch supports build_ref + image_tag_override for back-filling images for tags that pre-date the workflow. Helm chart default image.repository flipped to ghcr (chart version 0.1.0 → 0.2.1, appVersion → 0.10.0); docker-compose.{,gpu}.yml add image: alongside build:.
- SSE partial-fetch AEAD constraint documentation — new docs/security/sse-partial-fetch-constraint.md walks the AEAD authenticated-encryption contract (NIST SP 800-38D §7.2 quoted), per-mode wire layout, why only S4E6 escapes the constraint (per-chunk nonce + tag), and provisional S4E7 (chunked-KMS) / S4E8 (chunked-SSE-C) roadmap candidates. README §"Server-side encryption — Range GET fast-path matrix" makes the support matrix explicit.
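The mutable-tag gating described for docker.yml can be sketched as a small decision function. This is an illustration in Rust, not the workflow's actual expression: `Trigger` and `tags_for` are hypothetical names, and the sketch assumes the version string is passed without a flavor suffix such as -gpu.

```rust
/// How the workflow run was triggered (simplified for this sketch).
#[derive(Clone, Copy, PartialEq)]
pub enum Trigger {
    TagPush,
    WorkflowDispatch,
}

/// Tags to publish for `version` (for example "0.10.0" or "0.10.0-rc1").
/// The immutable `<version>` tag is always emitted; `<major>.<minor>` and
/// `latest` are added only for a stable release that arrived via tag push.
pub fn tags_for(version: &str, trigger: Trigger) -> Vec<String> {
    let mut tags = vec![version.to_string()];
    // Assumption: a '-' marks a prerelease suffix such as "-rc1".
    let stable = !version.contains('-');
    if stable && trigger == Trigger::TagPush {
        let mut parts = version.split('.');
        if let (Some(major), Some(minor)) = (parts.next(), parts.next()) {
            tags.push(format!("{}.{}", major, minor));
            tags.push("latest".to_string());
        }
    }
    tags
}
```

With this shape, a back-fill run for 0.9.0 publishes only the 0.9.0 tag, so it cannot move latest backward.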
Wave-2 — hardening
- i686 runtime smoke CI — new i686-runtime-smoke job in .github/workflows/ci.yml installs gcc-multilib + libc6:i386, runs cargo test --target i686-unknown-linux-gnu -p s4-codec -p s4-config --release, builds the s4 binary for i686 (continue-on-error for the aws-sdk-rust / rustls / ring stack), and invokes s4 --help / s4 --version on the i686 ELF. README §"Supported targets" cell flips from "⚠️ compiles, untested at runtime" to "✅ compiles + --help / --version smoke (CI)".
- Docker / Helm distribution smoke CI — new .github/workflows/docker-smoke.yml validates the v0.10 #B1 distribution surface on every push that touches it (path-filtered to charts/**, Dockerfile*, docker-compose*.yml, plus the docker / docker-smoke workflow files). Three independent jobs: helm-lint-template (helm lint + three helm template invocations: default, pinned tag, GPU suffix), docker-compose-config (both compose files + assert ghcr image refs present), image-smoke (docker pull ghcr.io/abyo-software/s4:latest + --help / --version, continue-on-error: true on pull for the not-yet-published case).
- Streaming PUT checksum coverage matrix doc — new docs/security/streaming-checksum-coverage.md documents the codec-API constraint that limits the v0.9 #streaming-checksum tee-into-hasher fast-path to single-PUT cpu-zstd / nvcomp-zstd (Codec::supports_streaming_compress() == true). Same "fundamental contract, not deferred plumbing" framing as the SSE-side #A2-doc. Three preconditions for streaming win (streaming codec + streaming downstream + no full-body framing dependency) + which paths meet how many + roadmap candidates (S4F3 streaming frame, streaming nvCOMP, multipart streaming upload_part) with the upstream API blockers for each.
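The tee-into-hasher idea behind the streaming checksum path can be shown with the standard library alone. The sketch below is an assumption-laden illustration, not S4's implementation: `TeeHasher` and `Crc32` are hypothetical names, and a plain bitwise CRC-32 stands in for whichever checksum the client declared.

```rust
use std::io::{self, Write};

/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
pub struct Crc32(u32);

impl Crc32 {
    pub fn new() -> Self {
        Crc32(0xFFFF_FFFF)
    }

    pub fn update(&mut self, data: &[u8]) {
        for &byte in data {
            self.0 ^= byte as u32;
            for _ in 0..8 {
                // All ones when the low bit is set, zero otherwise.
                let mask = (self.0 & 1).wrapping_neg();
                self.0 = (self.0 >> 1) ^ (0xEDB8_8320 & mask);
            }
        }
    }

    pub fn finish(&self) -> u32 {
        !self.0
    }
}

/// Forwards every chunk to `inner` while feeding the same bytes to the
/// hasher, so the body never has to be buffered in full.
pub struct TeeHasher<W: Write> {
    inner: W,
    crc: Crc32,
}

impl<W: Write> TeeHasher<W> {
    pub fn new(inner: W) -> Self {
        TeeHasher { inner, crc: Crc32::new() }
    }

    /// Returns the downstream writer and the checksum of everything written.
    pub fn finish(self) -> (W, u32) {
        let crc = self.crc.finish();
        (self.inner, crc)
    }
}

impl<W: Write> Write for TeeHasher<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        // Hash only the bytes the downstream writer actually accepted.
        self.crc.update(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```

A caller would stream the request body through the wrapper, then compare the returned checksum with the value claimed in the header or trailer before committing the object. This only helps when the downstream writer is itself streaming, which is the precondition the coverage doc spells out.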
Audit posture
- 6 per-feature audits (15 Codex CLI rounds total): A1 = 5R, B1 = 4R, B2 = 1R, A2-doc = 1R, A3-doc = 0R, A4 = 0R.
- 4-round integrated cross-feature audit on the full v0.9.0..main range. 2 P2 fixes (Dockerfile s4 s4 --help arg dup in the docker-smoke workflow; docker.yml back-fill :main + :sha-<x> mis-tag from dispatcher ref). Clean R3 + R4 — 2 consecutive convergence rounds.
- Zero P1 across all rounds. Both P2 integrated-audit findings caught BEFORE the corresponding image actually shipped (back-fill v0.9.0 image build was in-flight at the time R2 caught the mis-tag; v0.10.0 ships with the fix in place).
- cargo audit clean (same 4 documented ignores as v0.9.0 / v0.8.22: RUSTSEC-2026-0098/0099/0104 in the upstream aws-sdk-rust TLS stack, RUSTSEC-2025-0134 unmaintained dev-only rustls-pemfile).
Coverage
- ~720 workspace tests pass, 0 failed (unchanged from v0.9.0).
- v0.9.0 baseline plus: 4 new A1 unit tests in s4_server::repair, 3 new A1 MinIO E2E tests in sidecar_repair_via_minio.rs. Lib unit count in repair module now ~21.
- New CI workflows: docker-smoke.yml (3 jobs), i686-runtime-smoke (added to ci.yml).
v0.11+ follow-up (deferred, scope-out)
- Chunked SSE-KMS envelope (provisional S4E7) + chunked SSE-C (S4E8) → would enable Range GET partial-fetch for those modes.
- S4F3 streaming frame format → would enable streaming PUT checksum verify for multipart upload_part.
- Streaming nvcomp-bitcomp / nvcomp-gdeflate (= GPU codec API rework upstream of S4).
- 32-bit s4-server runtime: end-to-end PUT/GET smoke (today's smoke is --help / --version only).
- v0.9.0 ghcr.io back-fill: workflow_dispatch in flight at cut time will publish ghcr.io/abyo-software/s4:0.9.0 + :0.9.0-gpu. The 2 P2 fixes in the v0.10 integrated audit mean future back-fills don't mis-tag :main / :sha-<x> — but the v0.9.0 back-fill ran on the pre-fix workflow and may have published those mis-tags. Operator cleanup recipe: trigger gh workflow run docker.yml --ref main -f push=true (no inputs) once the back-fill finishes to refresh :main / :main-gpu to current main HEAD content, overwriting the mis-tagged v0.9.0 entries.
Full changelog
See CHANGELOG.md for the per-finding detail.
v0.9.0 — six-feature roadmap landing + 7-round integrated audit (clean)
First v0.9 minor cut. Six roadmap items shipped in this release line, followed by a 7-round integrated cross-feature audit that converged on round 7 (clean bill of health). Net diff vs v0.8.22: 26 files / +8,500 lines across s4-codec and s4-server, all behind opt-in flags or new subcommands — no behavioral change on existing CLI surface or default-config deployments.
Published to crates.io as s4-server@0.9.0, s4-codec@0.9.0, s4-config@0.9.0, s4-codec-py@0.9.0. Install via cargo install s4-server.
Headline additions
- Operator tooling — s4 verify-sidecar / s4 repair-sidecar / s4 sweep-orphan-sidecars subcommands close the gap that v0.8.x docs/orphan-sidecar-recovery.md left as a manual aws-cli recipe. Library API s4_server::repair::{verify_sidecar, repair_sidecar, sweep_orphan_sidecars} available for programmatic use. DeletePolicy::{DryRun, PairBoundOnly, IncludeUndecodable} tiers protect legacy reserved-name user data (the v0.8.17 --allow-legacy-reserved-key-reads migration scenario) from accidental sweep delete.
- Performance regression gate — criterion-based bench targets (~30 bench points across codec_roundtrip / frame_codec / index_codec) + GitHub-Pages-backed trend chart via benchmark-action/github-action-benchmark. Bench workflow auto-bootstraps the gh-pages branch on first push.
- Encryption-aware sidecar (SSE-S4 chunked / S4E6) — Range GET on --sse-chunk-size > 0 objects now hits a partial-fetch fast path via the new v3 sidecar format (extends v2 with a 30-byte SSE binding block: chunk_size + chunk_count + key_id + salt + plaintext_len + header_bytes). SSE-KMS / SSE-C / SSE-S4 buffered (S4E2) / multipart remain on the v0.8.12 #120 buffered fallback (deferred to v0.10+).
- True streaming PUT checksum verify (tee-into-hasher) for cpu-zstd / nvcomp-zstd single-PUT — closes the v0.8.13 #127 regression that v0.8.14 #129 reverted to a buffered fallback. Honors Content-MD5 + x-amz-checksum-{crc32, crc32c, sha1, sha256, crc64nvme} headers AND SigV4-streaming x-amz-trailer claims. Multipart upload_part keeps the buffered per-part verify (bytes are already in memory there for framing).
- Chaos infrastructure — 5 deterministic backend-fault scenarios (mid-stream GET error, HEAD latency timeout, concurrent overwrite, SSE keyring rotation mid-PUT, multipart Complete failure) replace the v0.8.18 P7 scaffold. In-memory mock backend; no Docker dep, no flake.
- 32-bit cross-compile (i686-unknown-linux-gnu) across every workspace crate. Runtime is NOT claimed — cargo check --target parity only. Closes the v0.8.21 R5-8 regression where the 5 GiB usize const overflowed on 32-bit.
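For the chunked SSE-S4 fast path, the core of a partial fetch is mapping a requested plaintext byte range onto the chunks that contain it, so only those chunks are fetched and decrypted. A minimal sketch of that arithmetic follows; it assumes fixed-size plaintext chunks and is not taken from the S4 source. `chunks_for_range` is a hypothetical name, and the parameters merely echo the chunk_size and plaintext_len fields named in the binding block.

```rust
/// Inclusive range of chunk indices covering plaintext bytes
/// `first..=last` when the object is split into fixed-size chunks.
/// Returns `None` for an empty object, a zero chunk size, an inverted
/// range, or a range that starts past the end of the object.
pub fn chunks_for_range(
    first: u64,
    last: u64,
    chunk_size: u64,
    plaintext_len: u64,
) -> Option<(u64, u64)> {
    if chunk_size == 0 || plaintext_len == 0 || first > last || first >= plaintext_len {
        return None;
    }
    // Clamp the end, as for an HTTP Range request that overshoots the object.
    let last = last.min(plaintext_len - 1);
    Some((first / chunk_size, last / chunk_size))
}
```

Each chunk in the returned span can be authenticated on its own because S4E6 carries a per-chunk nonce and tag, which is what lets the server skip the rest of the object.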
Audit posture
- 6 per-feature audits (11 Codex CLI rounds total) on the roadmap commits.
- 7-round integrated cross-feature audit on the full v0.9 range (142e50e..main). Catches gaps per-feature audits couldn't see: encrypted-body handling in sidecar tooling, trailer-verify dispatch consistency, OOM hardening, HEAD→GET TOCTOU on the bounded sidecar fetch.
- Zero P1 findings across all 18 rounds. 7 P2 + 1 self-review fix in the integrated audit, all landed.
- cargo audit clean (same 4 documented ignores as v0.8.22: RUSTSEC-2026-0098/0099/0104 in the upstream aws-sdk-rust TLS stack, RUSTSEC-2025-0134 unmaintained dev-only rustls-pemfile).
Coverage
- ~720 workspace tests pass, 0 failed.
- 17 new lib unit tests in s4_server::repair (parsing, ETag normalization, Option<&str> equality semantics, DeletePolicy::allows truth table, status truth table including MissingHarmless / MissingDivergent / MissingUnknown, body-cap constant pinning, NotFramed / SidecarTooLarge / EncryptedSidecarUnsupported / OverwrittenDuringRepair wire shapes).
- 14 new MinIO E2E tests covering verify-clean, repair-after-delete, repair-after-clobber, sweep-finds-and-deletes-orphan, sweep-pair-bound-only-preserves-undecodable, post-PUT race detector (best-effort), MissingHarmless on small single-PUT, encrypted-body reject, P2-R3 NotFramed reject (empty + raw body), P2-R4 verify-side MissingHarmless, P2-R5 oversized-sidecar sweep classification, plus 4 server-side encryption-aware sidecar tests (chunked range-GET uses v3 partial-fetch, round-trip correctness, buffered fallback unchanged, non-SSE PUT still emits v2).
- 6 deterministic chaos scenarios + scaffold smoke.
v0.10 follow-up (deferred, scope-out)
- Encrypted sidecar repair via CLI keyring plumbing (--sse-s4-key <path>).
- Encryption-aware sidecar for SSE-KMS / SSE-C / S4E2 / multipart.
- Streaming PUT checksum verify for multipart upload_part + GPU codec non-streaming branch.
- 32-bit runtime smoke test of s4-server (currently cargo check --target parity only).
Full changelog
See CHANGELOG.md for the per-finding detail.
v0.8.22 — eighth-round convergence (clean bill of health)
Convergence reached. Eight consecutive Codex CLI + Claude
Code review rounds ran against this codebase, totalling 130+ fixes
across 5 security audit cycles + 1 production-readiness sweep +
3 doc-accuracy sweeps. Round 8 returned a clean bill of health.
Skipped intermediate versions: v0.8.20 was never published to
crates.io (Round 6 caught a silent-truncation regression in
v0.8.20 R5-8 → reverted in v0.8.21 R6-1) and v0.8.21 was
never published (Round 7 caught that v0.8.21 R6-6 introduced
a fresh fabrication in the SIGUSR1 grep recipe → fixed in
v0.8.22 R7-1). End users go straight from v0.8.19 → v0.8.22.
Published to crates.io as s4-server@0.8.22, s4-codec@0.8.22,
s4-config@0.8.22, s4-codec-py@0.8.22. Install via
cargo install s4-server.
What converged
Round 7 → v0.8.22 (#200-#202)
- #200 R7-1 — Runbook §1 SIGUSR1 grep target corrected to "S4 SIGUSR1: dumped attached-manager snapshots" (the real substring in main.rs:1830). v0.8.21 R6-6 used a hand-written string that never matched.
- #201 R7-2 — README §roadmap "v0.8.8 released (2026-05-20)" replaced with a moving-target reference to CHANGELOG + GitHub Releases. The pinned bullet was 13 patches stale.
- #202 R7-3 — Threat-model + runbook "Last reviewed" stamps both bumped to v0.8.22 with a one-line Stamp policy declaring future cuts bump both in lockstep.
Round 6 → v0.8.21 (#194-#199, rolled into v0.8.22)
- #194 R6-1 — Reverted v0.8.20 R5-8's silent-truncation regression. --max-body-bytes default stays as the bare 5 * 1024 * 1024 * 1024 literal, which is a loud compile error on 32-bit — the correct failure mode.
- #195 R6-2 — Runbook trailing "Metric-naming note" s4_requests_total{status=~"5..."} → result="err".
- #196 R6-3 — Runbook "Last reviewed" stamp bumped (R7-3 then re-bumped in lockstep with threat-model).
- #197 R6-4 — AWS SigV4 vectors docstring reverted get-utf8-path → get-utf8 (R5-7 walk-back; AWS upstream name is get-utf8).
- #198 R6-5 — Orphan-sidecar roadmap aligned with README #106 (v0.9 s4-tool repair-sidecar / verify).
- #199 R6-6 — Runbook §1 SIGUSR1 recipe drops sleep 1 in favour of journalctl ... | grep -m1 ... + sleep 5 fallback. (R7-1 then fixed the grep target itself.)
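The R6-1 revert turns on the difference between a loud failure and a silent one. The sketch below illustrates why routing 5 GiB through a narrowing cast is dangerous, using u32 as a stand-in for a 32-bit usize so it behaves the same on any host. It is an illustration only; the constant and function names are hypothetical and not the S4 definitions.

```rust
use std::convert::TryFrom;

/// 5 GiB, the AWS S3 single-PUT maximum, computed in 64 bits.
pub const FIVE_GIB: u64 = 5 * 1024 * 1024 * 1024;

/// What an `as` cast to a 32-bit size does: it keeps only the low 32 bits.
pub fn truncating_cap() -> u32 {
    FIVE_GIB as u32
}

/// The checked alternative: conversion fails instead of shrinking the cap.
pub fn checked_cap() -> Result<u32, std::num::TryFromIntError> {
    u32::try_from(FIVE_GIB)
}
```

The cast quietly yields 1 GiB (5 GiB modulo 4 GiB), so a 32-bit build would enforce a body cap five times smaller than documented without any diagnostic. A bare literal that does not fit the target type fails at compile time instead, which is the behaviour the revert restores.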
Round 5 → v0.8.20 (#186-#193, rolled into v0.8.22)
- #186 R5-1 — Runbook §1 "graceful shutdown dumps state" claim removed. Only SIGUSR1 dumps; shutdown only drains the access-log buffer.
- #187 R5-2 — Runbook §2 / §3 / §7 / §8 metric names canonicalised. v0.8.19 D-6 only covered §12's dedicated alert table; the other 4 sections shipped fabricated names. Real names now: s4_gpu_oom_total, s4_requests_total{result="err"}, s4_replication_{dropped,replicated,status_swept}_total, s4_tls_cert_reload_total{result="err"}.
- #188 R5-3 — README + SOCIAL_POSTS drop the fabricated s4_codec_chosen_total{codec} — the codec label lives on the real s4_requests_total counter.
- #189 R5-4 — docs/orphan-sidecar-recovery.md shell recipe defines BACKEND_ENDPOINT alongside ENDPOINT (the recipe used $BACKEND_ENDPOINT without ever defining it).
- #190 R5-5 — Orphan-sidecar stale "v0.8.17 may add" claim advanced (then R6-5 / R7 re-aligned to v0.9).
- #191 R5-6 — Threat-model stamp bumped (R7-3 then re-bumped in lockstep with runbook).
- #192 R5-7 — AWS SigV4 vectors docstring (R6-4 walked this back since the AWS upstream name is get-utf8, not get-utf8-path).
- #193 R5-8 — --max-body-bytes default through u64 cast. Reverted in R6-1 — silent truncation on 32-bit was the wrong direction.
Cumulative scope (all 8 audit cycles)
| Round | Issues fixed | Cumulative cuts |
|---|---|---|
| R1 (security cycle 1) | CRIT 5 + HIGH 9 + MED 4 + hotfix | v0.8.11–v0.8.14 |
| R2 (security cycle 2) | HIGH+MED 18 | v0.8.15 |
| R3 (security cycle 3) | follow-up 15 + 5 | v0.8.16, v0.8.17 |
| R4 (production readiness) | P1-P7 + #172 | v0.8.18 |
| R4 (doc audit) | 12 | v0.8.19 |
| R5 (metric fabrication sweep) | 9 | v0.8.20 ⛔ skipped publish |
| R6 (silent-truncation regression) | 6 | v0.8.21 ⛔ skipped publish |
| R7 (fresh-fabrication sweep) | 3 | v0.8.22 ✅ published |
| R8 (convergence check) | 0 — clean | — |
Operator-visible knobs cumulative
--trust-x-forwarded-for (v0.8.11),
--prefer-columnar-gpu (v0.8.13),
--allow-legacy-reserved-key-reads (v0.8.17),
--max-body-bytes (v0.8.19).
Tests
449 lib + 45 integration + 11 AWS SigV4 vectors + 2 server
bolero fuzz + 1 chaos = total target count ≈ 540, all green
under RUSTFLAGS="-D warnings"; cargo clippy --workspace --all-targets clean; cargo fmt --all --check clean; MinIO
E2E + coverage + bench-smoke jobs all green on CI.
Upgrade notes
- No new operator-visible knobs since v0.8.19. The same four opt-ins above.
- The v0.8.20 → v0.8.21 skip on crates.io means end users on v0.8.19 get every fix in #186 through #202 in a single upgrade.
Recommended pre-launch reading order
1. docs/security/threat-model.md
2. docs/ops/runbook.md
3. README.md
4. Per-version per-issue notes: CHANGELOG.md
v0.8.19 — fourth-round doc-accuracy sweep + --max-body-bytes CLI flag
Fourth-round Codex CLI + Claude Code review of v0.8.18 caught
fabrications in the v0.8.18 runbook + threat-model + bolero
module doc (written from memory rather than verified against
the source tree) and one missing CLI flag the threat model
already advertised. v0.8.19 closes all 12 items.
Published to crates.io as s4-server@0.8.19, s4-codec@0.8.19,
s4-config@0.8.19, s4-codec-py@0.8.19. Install via
cargo install s4-server (CPU build).
What's new since v0.8.18
Added (#174)
- #174 D-1 — --max-body-bytes <BYTES> CLI flag. The cap was builder-only before v0.8.19 (with_max_body_bytes), but the threat model already advertised it as an operator-tunable defence — the doc was right; the missing piece was the CLI flag. Default 5 GiB matches the AWS S3 single-PUT max.
Fixed (#175-#185, doc / minor)
- #175 D-2 — docs/security/threat-model.md no longer references a non-existent --state-dir. Replaced with the per-manager --<x>-state-file list (versioning, object_lock, mfa_delete, cors, inventory, notifications, tagging, replication, lifecycle).
- #176 D-3 — Runbook §1 (disk full) rewritten. The pre-D-3 text told operators that systemctl reload would "stop accepting new connections" — SIGHUP only rotates TLS certificates. Mitigation path now correctly says front S4 with a load balancer + drain there, or change --max-concurrent-connections and restart (not reload).
- #177 D-4 — Runbook §6 (MFA-Delete recovery) now points at the --mfa-delete-state-file <PATH> operator-supplied file, not the fictional mfa.json under a fictional --state-dir.
- #178 D-5 — Runbook §12 (signals) SIGUSR1 description was wrong: pre-D-5 it claimed access-log flush; reality is the v0.8.5 #86 helper atomically dumps every in-memory state manager (versioning / object_lock / mfa_delete / cors / inventory / notifications / tagging / replication / lifecycle) to its --<x>-state-file. Access-log buffer drains on shutdown, not on SIGUSR1.
- #179 D-6 — Runbook metric reference table renamed every metric to its canonical name in crates/s4-server/src/metrics.rs. The pre-D-6 table cited s4_backend_error_total, s4_replication_pending_total, s4_replication_completed_total, s4_replication_failed_total, s4_tls_cert_reload_failed_total, s4_gpu_compress_oom_total — none of those exist. Real names: s4_replication_dropped_total, s4_replication_replicated_total, s4_tls_cert_reload_total{result="err"}, s4_gpu_oom_total.
- #180 D-7 — Runbook PromQL alert syntax corrected: action="s3:Bypass*" (literal *, never matches) → action=~"s3:Bypass.*" (regex matcher).
- #181 D-8 — Runbook §4 SSE-S4 rotation typo retiredsl → retired slots.
- #182 D-9 — crates/s4-server/tests/fuzz_bolero.rs module doc trimmed to the 2 targets actually shipped (sigv4a_auth_header_bolero, policy_json_bolero). The pre-D-9 text claimed 4 targets (including a pub(crate)-re-export-based one that doesn't exist). The two missing targets are tagged honestly as v0.8.19+ roadmap.
- #183 D-10 — crates/s4-server/tests/chaos.rs placeholder smoke test now carries concrete assert_eq! checks so a future refactor can't accidentally leave the file compiling-but-useless.
- #184 D-11 — AWS SigV4 vectors module doc no longer claims every vector comes from the AWS-published suite. Split honestly into AWS-published (4 vectors) and S3 spec-derived edge vectors (7 vectors, motivated by the v0.8.16 #150 byte-level fix).
- #185 D-12 — Threat-model residual risk #4 (versioned multipart Range GET fall-back to full read) now includes the cost note about large multipart objects + range-heavy workloads.
Tests
449 lib + 45 integration + 11 SigV4 vectors + 2 bolero + 1
chaos = unchanged from v0.8.18; all green under
RUSTFLAGS="-D warnings"; cargo clippy --workspace --all-targets clean; cargo fmt --all --check clean; MinIO
E2E + coverage + bench-smoke jobs all green on CI.
Notes
- v0.8.19 closes the fourth-round audit. Four full audit cycles (3 security + 1 production-readiness + 1 doc-accuracy) have now run against this codebase. The doc fabrications (#175–#180) were a reminder that runbooks written from memory are unreliable; future doc work will be verified against the source tree before each commit.
- --max-body-bytes is the only new operator-visible knob since v0.8.17. The four opt-ins now available are: --trust-x-forwarded-for (v0.8.11), --prefer-columnar-gpu (v0.8.13), --allow-legacy-reserved-key-reads (v0.8.17), and --max-body-bytes (v0.8.19).
Full per-issue notes: CHANGELOG.md.
v0.8.18 — production-readiness sweep (threat model + runbook + AWS test vectors + server fuzz + coverage CI)
Production-readiness sweep. Three audit cycles (v0.8.11-v0.8.17)
closed every CRIT / HIGH / MED security finding. v0.8.18 lifts the
operational maturity, AWS conformance posture, and
quality-gate infrastructure to match. No code-correctness
changes outside what already shipped in v0.8.17 — this release is
docs, tests, and CI.
Published to crates.io as s4-server@0.8.18, s4-codec@0.8.18,
s4-config@0.8.18, s4-codec-py@0.8.18. Install via
cargo install s4-server (CPU build).
What's new since v0.8.17
Added
- #165 P1 — docs/security/threat-model.md. STRIDE-shape threat model covering 5 attack surfaces (public S3 wire, compressed payload at rest, key handling, backend trust boundary, Object Lock posture). Every mitigation traces to a shipped issue number from the three audit cycles. Explicit non-goals + known residual risks (the rustls-webpki CVE chain etc.) documented so reviewers don't reverse-engineer them.
- #166 P2 — docs/ops/runbook.md. 12 operational procedures (disk full, GPU OOM, backend 5xx storm, SSE key rotation, KMS KEK loss, MFA secret loss, replication backlog, TLS rotation, orphan sweep, legacy reserved-key migration, audit advisory, graceful shutdown) — each in Symptom → Diagnose → Mitigate → Recover → Prevent shape.
- #167 P3 — AWS SigV4 canonical-request test vectors (crates/s4-server/src/routing.rs::aws_sigv4_canonical_vectors). 11 vectors pinning the v0.8.16 #150 byte-level helpers to AWS-published expected outputs (vanilla / vanilla-query-order key + value / utf8 / non-UTF8 byte round-trip / reserved-char encoding / mixed-case percent normalisation / bare key / unreserved set / S3 ListObjectsV2 / path with spaces).
- #168 P4 — server-side bolero fuzz targets (crates/s4-server/tests/fuzz_bolero.rs): sigv4a_auth_header_bolero (SigV4a Authorization parser), policy_json_bolero (IAM bucket-policy JSON parser). Pairs with the existing 7 codec-layer bolero targets so the fuzz farm now covers every untrusted parser on the listener edge.
- #170 P6 — code coverage CI job (cargo-llvm-cov + Codecov upload, push-to-main only) + bench smoke job (runs the three examples/bench_* binaries to surface bit-rot; not a regression gate).
- #171 P7 — chaos / fault-injection test scaffold (crates/s4-server/tests/chaos.rs). Placeholder establishing the target; backend-method-level fault injection populates v0.8.19+.
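Several of the SigV4 vectors listed above (utf8, non-UTF8 bytes, reserved-char encoding, unreserved set, path with spaces) exercise the URI-encoding rule for canonical requests. A minimal byte-level sketch of that rule follows. It implements the publicly documented SigV4 encoding, not the S4 helpers themselves, and `sigv4_uri_encode` is a hypothetical name.

```rust
/// Percent-encodes `input` the way SigV4 canonical requests expect:
/// unreserved characters (A-Z a-z 0-9 - _ . ~) pass through and every
/// other byte becomes %XX with uppercase hex digits. `encode_slash` is
/// false for object key paths, where '/' is left as-is.
pub fn sigv4_uri_encode(input: &[u8], encode_slash: bool) -> String {
    const HEX: &[u8; 16] = b"0123456789ABCDEF";
    let mut out = String::with_capacity(input.len());
    for &b in input {
        let unreserved =
            b.is_ascii_alphanumeric() || matches!(b, b'-' | b'_' | b'.' | b'~');
        if unreserved || (b == b'/' && !encode_slash) {
            out.push(b as char);
        } else {
            out.push('%');
            out.push(HEX[(b >> 4) as usize] as char);
            out.push(HEX[(b & 0x0F) as usize] as char);
        }
    }
    out
}
```

Working on raw bytes rather than on a decoded string is what makes the non-UTF8 round-trip case well defined: a lone 0xFF byte encodes to %FF instead of being replaced or rejected.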
Changed
- #169 P5 — README proptest claim corrected from 38 → 39 properties.
- #172 — .github/workflows/ci.yml notify-on-failure step now deduplicates by SHA prefix before opening an issue; companion .github/workflows/ci-close-resolved.yml auto-closes ci-failure issues once a subsequent main commit lands green. Closes the auto-issue spam observed during the v0.8.13 / v0.8.14 retry cycle.
Fixed
- Stale ci-failure GitHub issues #115 / #116 / #117 closed with the v0.8.13 / v0.8.14 supersession trail.
Tests
449 lib + 45 integration + 11 SigV4 vectors + 2 bolero + 1
chaos scaffold = total test target count climbs from 519 to
~540, all green under RUSTFLAGS="-D warnings"; cargo clippy --workspace --all-targets clean; cargo fmt --all --check clean; MinIO E2E job green on CI; coverage job
green on CI.
Roadmap (deferred from this release)
- criterion regression-tracking benches — needs baseline storage like benchmark-action/github-action-benchmark. The v0.8.18 bench-smoke job is the floor; the regression gate is the ceiling.
- Full chaos scenarios — 5+ tests against backend-method-level fault injection. Scaffold ships here; scenarios populate v0.8.19+.
- Supply-chain hardening — sigstore release signing, reproducible builds, SBOM badge.
Upgrade notes
- No new operator-visible knobs since v0.8.17. The three opt-ins from prior releases (--trust-x-forwarded-for, --prefer-columnar-gpu, --allow-legacy-reserved-key-reads) are the entire knob surface.
- Recommended pre-launch reading order:
Full per-issue notes: CHANGELOG.md.
v0.8.17 — third-round audit closeout (5 follow-up items on v0.8.16) + migration hatch
Third-round audit closeout. A follow-up Codex CLI + Claude
Code review of v0.8.16 caught 5 residual items (2 MED + 3 LOW).
No CRIT / HIGH after the prior two cycles. This is the version
to target for the Reddit launch — three full multi-agent audit
cycles have closed every CRIT / HIGH / MED finding from the
pre-release review.
Published to crates.io as s4-server@0.8.17, s4-codec@0.8.17,
s4-config@0.8.17, s4-codec-py@0.8.17. Install via
cargo install s4-server (CPU build).
What's new since v0.8.16
Fixed (#160-#162)
- #160 G-1 — F-5 presigned-URL 501 is now unconditional. The v0.8.16 check ran AFTER let gate = gate?;, so deployments without --sigv4a-credentials had ?X-Amz-Algorithm=AWS4-ECDSA-P256-SHA256 URLs silently fall through to the SigV4 path (which doesn't understand SigV4a query auth either). The presigned-detect call now runs before the gate guard, so every deployment emits the deterministic 501.
- #161 G-2 — reserved-name guard extended to 8 adjacent per-object endpoints: get_object_acl, put_object_acl, get_object_attributes, get_object_tagging, put_object_tagging, delete_object_tagging, restore_object, and upload_part_copy (both source + destination). The v0.8.16 F-13 fix only covered GET / HEAD / DELETE — a curious client could still GetObjectAcl(<key>.s4index) or PutObjectAcl(<key>.s4index, public-read) to bypass the read-reject via the backend's public-URL path. New shared helper S4Service::check_not_reserved_key(...) + ReservedKeyMode enum so every site uses the same code; the three pre-existing F-13 sites + the M-1 PUT / Copy / CreateMultipart sites refactor through the same helper.
- #162 G-3 — post_magic_entropy_high short-sample guard is now reachable. The v0.8.16 F-12 check inside the helper defaulted to false for <= 48-byte samples but the upstream MIN_SAMPLE_BYTES = 128 short-circuit in pick_from_sample filtered every such sample before it could reach F-12. The magic-byte arm now runs above the MIN_SAMPLE_BYTES gate, so a 40-byte BZh:loglog: user log actually hits the post-magic entropy check and gets routed to the default codec (compressed) rather than passed through uncompressed. Closes the v0.8.15 M-7 motivation that v0.8.16 F-12 thought it had closed.
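The G-1 fix is an ordering change: detect the unsupported presigned algorithm before the early return on a missing gate. The sketch below shows that ordering in isolation. It is a simplified illustration rather than the S4 routing code; `Decision`, `is_sigv4a_presigned`, and `precheck` are hypothetical names, and the gate is modelled as a plain `Option`.

```rust
/// Outcome of the SigV4a pre-check in this sketch.
#[derive(Debug, PartialEq)]
pub enum Decision {
    /// Presigned SigV4a query auth: answer 501 Not Implemented.
    NotImplemented,
    /// No SigV4a gate configured: continue on the regular SigV4 path.
    FallThrough,
    /// Gate configured: let it verify the request.
    UseGate,
}

/// True when the query string carries the SigV4a presigned algorithm.
fn is_sigv4a_presigned(query: &str) -> bool {
    query
        .split('&')
        .any(|pair| pair == "X-Amz-Algorithm=AWS4-ECDSA-P256-SHA256")
}

/// `gate` is `Some` only when SigV4a credentials were configured.
pub fn precheck(query: &str, gate: Option<&str>) -> Decision {
    // Detect first: the answer must not depend on whether a gate exists.
    if is_sigv4a_presigned(query) {
        return Decision::NotImplemented;
    }
    // Only now consult the optional gate. Returning early on `None`
    // before the detection above is the ordering the fix removes.
    match gate {
        None => Decision::FallThrough,
        Some(_) => Decision::UseGate,
    }
}
```

With the detection hoisted, deployments with and without configured credentials give the same deterministic answer for presigned SigV4a URLs.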
Added (#163-#164)
- #163 G-4 — --allow-legacy-reserved-key-reads CLI flag. Migration escape hatch for operators upgrading from pre-v0.8.15 deployments that may carry legitimate user-owned objects whose key ends in .s4index. When set, the reserved-name guard does NOT block GET / HEAD / DELETE on .s4index keys; writes (PUT / Copy / Create-Multipart / tagging-write / ACL-write) stay blocked regardless of the flag so an attacker can't inject into the namespace. Default false matches v0.8.16 behaviour; boot-time info-log is loud when the flag is on so the operator notices the migration window is open.
- #164 G-5 — docs/orphan-sidecar-recovery.md operator recipe for sweeping the orphan <key>.s4index artifacts that v0.8.15 H-g left on versioning-Enabled buckets. v0.8.16 #151 F-7 stopped emitting new orphans by skipping the sidecar block on versioned multipart Complete; this recipe handles the one-time cleanup of pre-F-7 leftovers. A future release may ship a s4 admin sweep-orphan-sidecars subcommand that automates the same loop.
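The read/write asymmetry of the migration hatch reduces to a small truth table. The following is a sketch of that table only, under the assumption that the reserved namespace is identified purely by the .s4index suffix. `Access` and `reserved_key_allowed` are hypothetical names; the real helper is S4Service::check_not_reserved_key, whose signature is not shown in these notes.

```rust
/// Whether the request reads or deletes an existing object, or writes one.
/// Hypothetical type for this sketch.
#[derive(Clone, Copy, PartialEq)]
pub enum Access {
    /// GET / HEAD / DELETE
    Read,
    /// PUT / Copy / Create-Multipart / tagging-write / ACL-write
    Write,
}

/// Returns true when the request may proceed.
pub fn reserved_key_allowed(key: &str, access: Access, allow_legacy_reads: bool) -> bool {
    if !key.ends_with(".s4index") {
        return true; // not in the reserved namespace
    }
    // Writes are blocked regardless of the flag; reads pass only while
    // the migration hatch is open.
    access == Access::Read && allow_legacy_reads
}
```

Keeping writes blocked in every configuration means the flag can only expose objects that already exist; it never lets a client create new keys in the sidecar namespace.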
Upgrade notes
- --allow-legacy-reserved-key-reads is the only new operator-visible knob since v0.8.16. The cumulative audit surface area still totals three opt-ins: --trust-x-forwarded-for (v0.8.11 CRIT-4), --prefer-columnar-gpu (v0.8.13 #125), and this v0.8.17 migration hatch.
- No behavioural breaks since v0.8.16. The G-2 reserved-name guard extension closes a leak that wasn't reachable via the aws s3 cp happy path anyway.
Tests
438 lib + 45 integration tests green under
RUSTFLAGS="-D warnings"; cargo clippy --workspace --all-targets clean; cargo fmt --all --check clean; MinIO
E2E job (cargo test --workspace --release -- --ignored --test-threads=1) green on CI.
Full per-issue notes: CHANGELOG.md.
Recovery recipe for v0.8.15 orphan sidecars:
docs/orphan-sidecar-recovery.md.