v1.2.0 — day-2 operations: prove it, automate it, keep it healthy
Four additive features: the savings ledger + s4 savings (measured
$ saved in production, the counterpart to s4 estimate's prediction);
s4 maintain (policy-driven migrate/recompact/storage-class
transitions, one-shot or resident); dictionary day-2 ops
(s4 dict-status + restart-less SIGHUP rotation); and opt-in s4fs
writes (pandas/pyarrow writing gateway-compatible objects without the
gateway). All were hardened by a 3-round dual-reviewer audit (findings
17 → 4 → 5, the round-3 five being doc trivia; zero P1 across all
rounds). The v1.0 freeze contract holds: everything below is additive
and default-off, and flag-less behavior is bit-for-bit identical to
v1.1.0.
Fixed (v1.2 audit round 2 — adversarial verification of the round-1 fix wave)
- P2 Replication replicas no longer carry the s4-ledger marker.
  Replicas are never ledger-counted, so a gateway-routed delete of a
  replica would have reopened the asymmetric-subtraction bug that
  round 1 closed. Both replication metadata capture sites now route
  through a marker-stripping snapshot helper.
- P2 Ledger add and subtract now resolve the same logical size on
  churn: ledger-enabled SSE/versioned multipart re-PUTs stamp
  s4-original-size/s4-compressed-size (the exact values the add
  used), REPLACE copies stamp the probe-resolved original, and
  COPY-directive copies probe the destination for the add. Previously,
  each add→delete cycle of a sidecar-suppressed multipart object
  stranded phantom original bytes (overstated savings). A new e2e test
  pins a versioned-multipart churn that returns bucket totals to
  exactly zero.
- P3 Marker-without-add cases (cap-exceeded multipart Complete,
  ledger flag toggles, and replicas written between the round-1 and
  round-2 fixes, which is moot for releases since v1.2 ships both) are
  now disclosed in the module contract, the report notes, and the
  Complete-path WARN. They are not eliminated; the zero-clamp and
  drift note remain the guard rails.
- P3 Access-point copy sources: REPLACE copies now strip
  client-supplied s4-* metadata regardless of the source-addressing
  variant (forging s4-ledger/s4-original-size via an AP ARN is
  closed), and AP sources can no longer read reserved keys (.s4index/
  .s4dict/).
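Both round-2 metadata rules above (no ledger marker on replicas, no client-supplied s4-* keys on REPLACE copies) reduce to key filtering. The Python sketch below is illustrative only: the gateway itself is Rust, the function names are hypothetical, and the marker value "1" used in the usage example is an assumption.

```python
from typing import Dict

LEDGER_MARKER = "s4-ledger"   # marker key named in the changelog
RESERVED_META_PREFIX = "s4-"  # gateway-owned metadata namespace


def replica_metadata_snapshot(metadata: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical: copy source metadata for a replica without the marker.

    Replicas are never ledger-counted, so carrying the marker would let a
    later gateway-routed delete subtract bytes that were never added.
    """
    return {k: v for k, v in metadata.items() if k.lower() != LEDGER_MARKER}


def sanitize_replace_copy_metadata(client_metadata: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical: drop every client-supplied s4-* key on a REPLACE copy.

    The filter does not look at how the copy source was addressed
    (bucket/key or access-point ARN), so no addressing variant can be used
    to forge s4-ledger or s4-original-size.
    """
    return {
        k: v
        for k, v in client_metadata.items()
        if not k.lower().startswith(RESERVED_META_PREFIX)
    }
```

For example, replica_metadata_snapshot({"s4-ledger": "1", "s4-codec": "cpu-zstd", "team": "ml"}) keeps only the s4-codec and team entries.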
Fixed (v1.2 audit round 1 — 4 reviewers over v1.1.0..HEAD, 2026-06-12)
- P2 The savings ledger no longer subtracts objects it never added:
  gateway writes made with the ledger enabled stamp an unforgeable
  internal s4-ledger metadata marker, and deletes/overwrites of
  unmarked objects (backend-direct, s4fs-written, migrate/recompact
  output) skip subtraction and increment a per-bucket
  skipped_unaccounted tally, with a report note. Ratio and $/month
  floor at 0 with a drift note. Previously, a migrate-baked bucket
  could report negative savings after gateway-routed deletes
  (including lifecycle expiry).
- P2 s4 maintain transition copies are pinned with
  x-amz-copy-source-if-match: a concurrent overwrite between the
  attribute HEAD and the CopyObject now makes the backend refuse with
  412 (counted as etag-raced) instead of stamping the old object's
  s4-* manifest onto the new bytes, which made the key unreadable
  through the gateway until the next rewrite.
- P2 CompleteMultipartUpload no longer runs the ledger's frame-scan
  accounting when the ledger flag is off (a CPU-only regression on
  large multiparts; output bytes were unaffected).
- P2 (s4fs) A sidecar PUT failure after a successful body write now
  raises a typed S4SidecarWriteError and still invalidates the
  per-path caches (try/finally), so same-instance reads see the new
  object; previously, stale caches could serve the old manifest
  against the new body.
- P2 The README SemVer freeze table now states the Python binding
  contract as a guaranteed-minimum set plus CHANGELOG-recorded
  additive exports (it still claimed "exactly" the v1.0 names while
  v1.1/v1.2 shipped additive helpers).
- P3 Maintain transition re-sends Expires/WebsiteRedirectLocation;
  report notes now state precisely what a REPLACE-directive class copy
  changes (backend-SSE re-encryption under the bucket default,
  multipart→single-part checksum/ETag change with sidecar full-read
  fallback). Resident s4 maintain no longer risks sleeping out a full
  --interval when SIGTERM lands in the flag-check gap (notify_one
  permit).
- P3 s4 dict-status: 10 s HTTP timeout (was unbounded), the
  Prometheus text parser no longer panics on multibyte escape
  sequences in third-party output, and the cumulative-counter
  semantics (post-rotation STALE persistence, removed-prefix series
  lingering) are documented. s4 savings against a missing state file
  now says so in a note instead of silently reporting zeros.
- P3 Ledger internals: Prometheus gauges are stamped inside the write
  lock (no transient reordering), SIGUSR1 dumps route through the
  ledger's own flush() (no .tmp race with the event flush), and
  .s4dict/ propagation objects are explicitly excluded from accounting
  (internal keys are never ledger-counted; documented contract).
  (s4-codec-py) encode_s4_object's passthrough CRC32C now releases the
  GIL.
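The marker rule from the first round-1 entry can be modelled in a few lines. This is a minimal Python sketch, not the gateway's Rust ledger: the class and field names are hypothetical, the HEAD-probed sizes of the object being deleted or overwritten are assumed to arrive as an ObjectInfo, and clamping the counters themselves at zero is an assumption about where the zero-clamp lives.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BucketTotals:
    original_bytes: int = 0
    stored_bytes: int = 0
    objects: int = 0
    skipped_unaccounted: int = 0  # deletes/overwrites of unmarked objects


@dataclass
class ObjectInfo:
    original_size: int
    stored_size: int
    ledger_marked: bool  # True only if a ledger-enabled gateway write stamped it


class Ledger:
    """Hypothetical model of the per-bucket cumulative savings counters."""

    def __init__(self) -> None:
        self.buckets: Dict[str, BucketTotals] = {}

    def _totals(self, bucket: str) -> BucketTotals:
        return self.buckets.setdefault(bucket, BucketTotals())

    def add(self, bucket: str, new: ObjectInfo) -> None:
        # Only gateway writes with the ledger enabled are added (and marked).
        t = self._totals(bucket)
        t.original_bytes += new.original_size
        t.stored_bytes += new.stored_size
        t.objects += 1

    def subtract(self, bucket: str, old: ObjectInfo) -> None:
        t = self._totals(bucket)
        if not old.ledger_marked:
            # Never added, so never subtracted: tally it for the report note.
            t.skipped_unaccounted += 1
            return
        # Assumed zero-clamp guard rail for marker-without-add drift.
        t.original_bytes = max(0, t.original_bytes - old.original_size)
        t.stored_bytes = max(0, t.stored_bytes - old.stored_size)
        t.objects = max(0, t.objects - 1)

    def overwrite(self, bucket: str, old: Optional[ObjectInfo], new: ObjectInfo) -> None:
        # Footprint swap: remove the HEAD-probed old object (if any), add the new one.
        if old is not None:
            self.subtract(bucket, old)
        self.add(bucket, new)
```

Deleting a backend-direct (unmarked) object therefore leaves the totals untouched and only bumps skipped_unaccounted, which is what keeps a migrate-baked bucket from going negative.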
Added
- s4fs write support (opt-in): S4FileSystem(write_enabled=True)
  (or storage_options={"write_enabled": True} through fsspec/pandas)
  enables pipe_file/put_file/open(path, "wb"), writing
  gateway-compatible S4 objects directly to the backend, so
  df.to_parquet("s4://bucket/key") now works without the gateway.
  The encoder (s4_codec.encode_s4_object, new in the Python binding
  alongside bind_index + pick_chunk_size) reproduces the gateway's
  single-PUT path byte-compatibly: S4F2 cpu-zstd frames using the
  gateway chunk-size policy (1 MiB / 4 MiB / 16 MiB thresholds), the
  five manifest metadata keys (s4-codec/s4-original-size/
  s4-compressed-size/s4-crc32c/s4-framed), and, for multi-frame
  bodies, a <key>.s4index sidecar bound to the backend ETag + size
  after the body PUT (single-frame overwrites clean up a stale
  sidecar). Verified end-to-end against MinIO + the real gateway
  binary: gateway GET and Range GET return the original bytes,
  s4 verify-sidecar reports OK with the version binding intact, and
  pandas/pyarrow round-trip through both s4fs and the gateway.
  write_codec="passthrough" stores raw stamped bodies; everything else
  (cpu-gzip / cpu-zstd-dict / nvcomp-* / SSE / append / versioning) is
  refused with a typed error pointing at the gateway. Underlying
  filesystems that cannot stamp S3 user metadata are refused with
  S4MetadataUnsupportedError (an unstamped framed body would be served
  raw by the gateway); s3fs is supported out of the box. Without the
  flag, behaviour is unchanged (read-only).
- Per-prefix dictionary metrics: the --zstd-dict PUT branch now
  exports s4_dict_put_total{prefix,outcome="win"|"loss"} and
  s4_dict_put_bytes_total{prefix,kind="original"|"dict"|"plain"}.
  Both compression results were already measured per PUT, so the byte
  counters are exact on wins and losses alike. Cardinality is bounded
  by the configured prefix count; with no dict configuration the
  series are never registered (default behaviour bit-for-bit
  unchanged). The gateway also self-monitors: a prefix whose rolling
  win rate over its last 100 dict-path PUTs drops below 0.5 logs a
  stale-dictionary WARN, at most once per prefix per hour.
- s4 dict-status --metrics-url <URL> [--warn-win-rate 0.5]
  [--format table|json]: scrapes a running gateway's /metrics
  (built-in minimal Prometheus text parser, no new dependencies) and
  reports per-prefix dictionary win rate, effective compression ratio
  (dict bytes / original bytes) and lazy s4_dict_fetch_total error
  counts. Prefixes below the win-rate threshold get a "dictionary may
  be stale; consider retraining (s4 train-dict)" warning and the
  command exits 1, which makes drift monitoring cron-able.
- --zstd-dict-map <FILE> + SIGHUP reload: a TOML [mappings] table
  ("<bucket>/<prefix>" = "<dict-id>") as the reloadable twin of
  repeated --zstd-dict flags, with identical validation, boot-time
  fetch + fingerprint verification and the 1 MiB dictionary cap; a
  prefix configured in both places is a boot error. On SIGHUP the file
  is re-read, new dictionaries are fetched + verified (already-loaded
  ones are reused), and the store is swapped atomically (arc-swap RCU:
  in-flight requests finish on the generation they started with), so
  rotation is s4 train-dict → edit map → kill -HUP, with no gateway
  restart. A failed reload keeps the current mappings live (ERROR log
  plus s4_dict_reload_total{result="err"}; success bumps
  result="ok"). Without the flag, SIGHUP does not touch dictionary
  configuration. New library surface:
  S4Service::with_shared_zstd_dicts,
  dict::{SharedDictStore, DictWinTracker, parse_zstd_dict_map, merge_dict_entries, build_dict_status, parse_prom_sample}.
- Docs: README "Operating dictionaries" section (dict-status /
  rotation runbook), plus an explicit note that multipart uploads are
  out of the dictionary path by design: parts never consult the dict
  store, and S3's 5 MiB minimum part size sits far above the
  small-object ceiling (default 1 MiB, --zstd-dict-max-bytes).
- E2E: tests/dict_ops_minio.rs (Docker-gated, real s4 binary) covers
  win/loss counters on /metrics, dict-status exit codes 0/1 with the
  retrain warning, map-file boot, SIGHUP rotation picking up a new
  prefix without a restart, and the fail-safe on a broken map (the
  previous store keeps serving).
- s4 maintain --policy <FILE> [--execute] [--interval <DUR>]
  [--format table|json]: policy-driven bucket maintenance. A TOML file
  of [[rule]] entries (unique name, bucket, optional prefix, common
  older-than age gate) runs sequentially top to bottom.
  action = "migrate" | "recompact" reuse the v1.1 library paths with
  the same parameters as their CLI flags (no-tags, target-zstd-level,
  min-gain-percent, …), and the new action = "transition"
  (storage-class = "GLACIER_IR" etc.) changes cold objects' storage
  class via same-key server-side CopyObject, with the <key>.s4index
  sidecar always accompanying its main object into the same class
  (drift from earlier partial runs is realigned; sidecars are never
  moved on their own). Dry-run by default; policy validation reports
  every problem in one pass; --interval keeps the command resident
  (run → sleep → re-run, structured per-cycle logs, graceful
  SIGTERM/SIGINT that finishes the in-flight rule). All three actions
  are idempotent, so re-runs and resident cycles skip settled objects
  (already-s4/already-compacted/already-target-class).
- Savings ledger (--savings-ledger-state-file <PATH>, opt-in,
  default-off): the gateway maintains measured per-bucket cumulative
  counters (original_bytes: logical client-PUT bytes; stored_bytes:
  backend bytes actually written, i.e. frames + SSE envelope +
  sidecars; and objects), updated on PUT / CompleteMultipartUpload /
  CopyObject / DELETE. An overwrite is a footprint swap via a
  best-effort HEAD probe; the extra HEADs exist only with the flag
  set. State is loaded with the standard --*-state-file fault
  isolation, flushed atomically on every write event, and re-dumped on
  SIGUSR1. Scope (honest): gateway-traversing writes only;
  backend-direct writes, s4 migrate/s4 recompact, aborted-multipart
  part bytes and replication replicas are not observed.
- s4 savings --state-file <PATH> [--price-per-gb-month 0.023]
  [--format table|json]: read-only report over the ledger state file
  (per-bucket + total original/stored bytes, savings ratio, $/month at
  the given price), the measured twin of s4 estimate. Works while the
  gateway is running; fixed honesty notes are part of the output.
- Prometheus gauges s4_ledger_{original_bytes,stored_bytes,objects}{bucket}
  mirroring the ledger state file (never registered when the flag is
  off), plus a drop-in Grafana dashboard at
  contrib/grafana/s4-savings-dashboard.json (saved bytes / savings
  ratio / per-bucket split / $-per-month with a price_per_gb_month
  variable; import steps in docs/observability.md).
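The stale-dictionary self-monitor in the per-prefix metrics entry (rolling window of the last 100 dict-path PUTs, WARN below a 0.5 win rate, at most once per prefix per hour) can be sketched as follows. This is a hypothetical Python illustration, not the Rust DictWinTracker: the class name is invented, the injectable clock exists only to make it testable, and judging only full 100-sample windows is an assumption.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class WinRateMonitor:
    """Hypothetical per-prefix stale-dictionary detector."""

    def __init__(self, window: int = 100, threshold: float = 0.5,
                 cooldown_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window = window
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._outcomes: Dict[str, Deque[bool]] = {}
        self._last_warn: Dict[str, float] = {}

    def record(self, prefix: str, win: bool) -> bool:
        """Record one dict-path PUT; return True when a WARN should be logged."""
        q = self._outcomes.setdefault(prefix, deque(maxlen=self.window))
        q.append(win)  # the deque drops the oldest outcome once full
        if len(q) < self.window:
            return False  # assumption: only full windows are judged
        if sum(q) / len(q) >= self.threshold:
            return False
        now = self.clock()
        last = self._last_warn.get(prefix)
        if last is not None and now - last < self.cooldown_s:
            return False  # rate-limited: at most one WARN per prefix per hour
        self._last_warn[prefix] = now
        return True
```

A bounded deque keeps memory per prefix constant, and since the prefix set is fixed by configuration, the monitor's footprint stays bounded the same way the metric cardinality does.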
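The s4 dict-status entry above derives its report from two counters: win rate is wins / (wins + losses) per prefix, and the effective ratio is dict bytes / original bytes. The Python sketch below restates that computation over Prometheus text output. It is an illustration under stated assumptions, not the shipped Rust parse_prom_sample/build_dict_status: the function names are hypothetical, a prefix with no PUTs is assumed not to be flagged, and the lazy s4_dict_fetch_total error counts are left out.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def parse_sample(line: str) -> Optional[Tuple[str, Dict[str, str], float]]:
    """Parse one Prometheus text-format sample; None for blanks and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    i = 0
    while i < len(line) and line[i] not in "{ \t":
        i += 1
    name, labels = line[:i], {}
    if i < len(line) and line[i] == "{":
        i += 1
        while True:
            while line[i] in ", \t":  # skip label separators
                i += 1
            if line[i] == "}":
                i += 1
                break
            eq = line.index("=", i)
            key = line[i:eq].strip()
            i = eq + 1
            if line[i] != '"':
                raise ValueError(f"unquoted label value in {line!r}")
            i += 1
            chars = []
            # Label values may contain backslash escapes (\" \\ \n). Python
            # strings index code points, so multibyte UTF-8 is safe here.
            while line[i] != '"':
                if line[i] == "\\":
                    i += 1
                    chars.append("\n" if line[i] == "n" else line[i])
                else:
                    chars.append(line[i])
                i += 1
            labels[key] = "".join(chars)
            i += 1  # skip the closing quote
    value = float(line[i:].split()[0])  # an optional timestamp may follow
    return name, labels, value


def dict_status(metrics_text: str, warn_win_rate: float = 0.5) -> List[dict]:
    wins: Dict[str, float] = defaultdict(float)
    losses: Dict[str, float] = defaultdict(float)
    original: Dict[str, float] = defaultdict(float)
    dict_bytes: Dict[str, float] = defaultdict(float)
    for line in metrics_text.splitlines():
        sample = parse_sample(line)
        if sample is None:
            continue
        name, labels, value = sample
        prefix = labels.get("prefix")
        if prefix is None:
            continue
        if name == "s4_dict_put_total":
            if labels.get("outcome") == "win":
                wins[prefix] += value
            elif labels.get("outcome") == "loss":
                losses[prefix] += value
        elif name == "s4_dict_put_bytes_total":
            if labels.get("kind") == "original":
                original[prefix] += value
            elif labels.get("kind") == "dict":
                dict_bytes[prefix] += value
    rows = []
    for prefix in sorted(set(wins) | set(losses)):
        total = wins[prefix] + losses[prefix]
        rate = wins[prefix] / total if total else None
        ratio = dict_bytes[prefix] / original[prefix] if original[prefix] else None
        rows.append({"prefix": prefix, "win_rate": rate, "ratio": ratio,
                     "stale": rate is not None and rate < warn_win_rate})
    return rows


def exit_code(rows: List[dict]) -> int:
    # Non-zero when any prefix is below the threshold, so cron can alert on it.
    return 1 if any(row["stale"] for row in rows) else 0
```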
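The s4 savings report turns the two ledger counters into a ratio and a $/month figure, with both floored at 0 and a drift note when stored bytes exceed original bytes. The exact formulas are not spelled out above, so this Python sketch assumes saved = original - stored, ratio = saved / original, and $/month = saved GB × price, with 1 GB taken as 1024³ bytes; the function and class names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

BYTES_PER_GB = 1024 ** 3  # assumption: binary GB; the real report may differ


@dataclass
class SavingsReport:
    saved_bytes: int
    ratio: float
    usd_per_month: float
    notes: List[str] = field(default_factory=list)


def savings(original_bytes: int, stored_bytes: int,
            price_per_gb_month: float = 0.023) -> SavingsReport:
    """Hypothetical per-bucket (or total) savings figures, floored at 0."""
    saved = original_bytes - stored_bytes
    notes: List[str] = []
    if saved < 0:
        # Drift (e.g. marker-without-add cases): report 0 and say why.
        notes.append("drift: stored bytes exceed original bytes; clamped to 0")
        saved = 0
    ratio = saved / original_bytes if original_bytes else 0.0
    usd = saved / BYTES_PER_GB * price_per_gb_month
    return SavingsReport(saved, ratio, usd, notes)
```

With these assumptions, 10 GB original stored as 4 GB gives a 0.6 ratio and about $0.138/month at the default price.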
Fixed
- s4fs: info() no longer poisons the per-instance live-info snapshot
  with the rewritten (decompressed) size. Previously, any info() call
  before a range read made the sidecar source-size binding check fail,
  silently disabling the partial-fetch fast-path (full-object read +
  warning) until the cache was invalidated. Also fixed: reading back
  the gateway's zero-frame body (an empty object PUT through the
  gateway, or written by s4fs) raised S4IoError on the full-read path,
  because the empty framed body carries no S4F2 magic and fell into
  the unframed-decode branch; the s4-framed metadata stamp now routes
  it through the frame parser, which correctly yields b"".
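The zero-frame fix comes down to choosing the decode path from metadata rather than from the body's first bytes. The Python sketch below shows only that routing decision. The frame parser and unframed decoder are passed in as callables because the S4F2 frame layout is not described here; the function name is hypothetical; treating the mere presence of the s4-framed key as "framed" is an assumption; and so is the magic-byte fallback for unstamped legacy objects.

```python
from typing import Callable, Dict, Iterable

FrameParser = Callable[[bytes], Iterable[bytes]]  # yields decoded frame payloads
UnframedDecoder = Callable[[bytes], bytes]
S4F2_MAGIC = b"S4F2"


def decode_full(body: bytes, metadata: Dict[str, str],
                parse_frames: FrameParser, decode_unframed: UnframedDecoder) -> bytes:
    """Hypothetical full-read dispatch: trust the stamp, not the magic bytes."""
    if "s4-framed" in metadata:  # assumption: presence of the key marks framing
        # A zero-frame body has no S4F2 magic but is still framed; parsing it
        # yields no frames, so the joined result is b"".
        return b"".join(parse_frames(body))
    if body.startswith(S4F2_MAGIC):  # assumed fallback for unstamped objects
        return b"".join(parse_frames(body))
    return decode_unframed(body)
```

Magic-byte sniffing alone cannot distinguish an empty framed body from an empty unframed one; the metadata stamp removes that ambiguity.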