v1.2.0 — day-2 operations: prove it, automate it, keep it healthy

@masumi-ryugo released this 11 Jun 13:43

v1.2 — day-2 operations: prove it, automate it, keep it healthy.
Four additive features — the savings ledger + s4 savings (measured
$ saved in production, the counterpart to s4 estimate's prediction),
s4 maintain (policy-driven migrate/recompact/storage-class
transitions, one-shot or resident), dictionary day-2 ops
(s4 dict-status + restart-less SIGHUP rotation), and opt-in s4fs
writes (pandas/pyarrow writing gateway-compatible objects without the
gateway) — hardened by a 3-round dual-reviewer audit (findings
17 → 4 → 5 doc-trivia; zero P1 across all rounds). The v1.0 freeze
contract holds: everything below is additive and default-off;
flag-less behavior is bit-for-bit identical to v1.1.0.

Fixed (v1.2 audit round 2 — adversarial verification of the round-1 fix wave)

  • P2 Replication replicas no longer carry the s4-ledger marker
    (they are never ledger-counted, so a gateway-routed delete of a
    replica would have re-opened the asymmetric-subtraction bug round 1
    closed). Both replication metadata capture sites route through a
    marker-stripping snapshot helper.
  • P2 Ledger add and subtract now resolve the same logical size on
    churn: ledger-enabled SSE/versioned multipart re-PUTs stamp
    s4-original-size/s4-compressed-size (the exact values the add
    used), REPLACE copies stamp the probe-resolved original, and
    COPY-directive copies probe the destination for the add. Previously
    each add→delete cycle of a sidecar-suppressed multipart object
    stranded phantom original bytes (overstated savings). New e2e pins a
    versioned-multipart churn returning bucket totals to exactly zero.
  • P3 Marker-without-add cases (cap-exceeded multipart Complete,
    ledger flag toggles, and replicas written between the round-1 and
    round-2 fixes — moot for releases since v1.2 ships both) are now
    disclosed in the module contract, the report notes, and the
    Complete-path WARN — not eliminated (zero-clamp + drift note remain
    the guard rails).
  • P3 Access-point copy sources: REPLACE copies now strip
    client-supplied s4-* metadata regardless of the source-addressing
    variant (forged s4-ledger/s4-original-size via AP ARN closed),
    and AP sources can no longer read reserved keys (.s4index /
    .s4dict/).
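
The metadata-stripping rule for REPLACE copies can be illustrated with a minimal sketch. The function name and the dict-based metadata model are hypothetical; the only detail taken from these notes is that client-supplied s4-* keys are dropped no matter how the copy source is addressed.

```python
RESERVED_PREFIX = "s4-"  # reserved manifest namespace named in the notes


def strip_client_s4_metadata(client_metadata):
    """Return a copy of client-supplied user metadata without s4-* keys.

    Hypothetical helper, not the gateway's code. S3 user-metadata keys are
    case-insensitive, so the comparison is done on the lowercased key.
    """
    return {
        key: value
        for key, value in client_metadata.items()
        if not key.lower().startswith(RESERVED_PREFIX)
    }


# A REPLACE copy that tries to forge ledger fields keeps only its own keys.
forged = {"s4-ledger": "1", "S4-Original-Size": "999", "owner": "team-a"}
cleaned = strip_client_s4_metadata(forged)
```

The gateway would then stamp its own s4-* values after stripping, so nothing the client sent in that namespace survives the copy.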

Fixed (v1.2 audit round 1 — 4 reviewers over v1.1.0..HEAD, 2026-06-12)

  • P2 The savings ledger no longer subtracts objects it never added:
    gateway writes made with the ledger enabled stamp an unforgeable
    internal s4-ledger metadata marker, and deletes/overwrites of
    unmarked objects (backend-direct, s4fs-written, migrate /
    recompact output) skip subtraction with a per-bucket
    skipped_unaccounted tally + report note. Ratio and $/month floor at
    0 with a drift note. Previously a migrate-baked bucket could report
    negative savings after gateway-routed deletes (incl. lifecycle
    expiry).
  • P2 s4 maintain transition copies are pinned with
    x-amz-copy-source-if-match: a concurrent overwrite between the
    attribute HEAD and the CopyObject now makes the backend refuse with
    412 (counted etag-raced) instead of stamping the old object's
    s4-* manifest onto the new bytes (which made the key unreadable
    through the gateway until the next rewrite).
  • P2 CompleteMultipartUpload no longer runs the ledger's
    frame-scan accounting when the ledger flag is off (CPU-only
    regression on large multiparts; output bytes were unaffected).
  • P2 (s4fs) a sidecar PUT failure after a successful body write now
    raises a typed S4SidecarWriteError and still invalidates the
    per-path caches (try/finally), so same-instance reads see the new
    object; previously stale caches could serve the old manifest against
    the new body.
  • P2 The README SemVer freeze table now states the Python binding
    contract as a guaranteed-minimum set plus CHANGELOG-recorded additive
    exports (it still claimed "exactly" the v1.0 names while v1.1/v1.2
    shipped additive helpers).
  • P3 Maintain transition re-sends Expires /
    WebsiteRedirectLocation; report notes now state precisely what a
    REPLACE-directive class copy changes (backend-SSE re-encryption under
    the bucket default, multipart→single-part checksum/ETag change with
    sidecar full-read fallback). Resident s4 maintain no longer risks
    sleeping out a full --interval when SIGTERM lands in the
    flag-check gap (notify_one permit).
  • P3 s4 dict-status: 10 s HTTP timeout (was unbounded), the
    Prometheus text parser no longer panics on multibyte escape
    sequences in third-party output, and the cumulative-counter semantics
    (post-rotation STALE persistence, removed-prefix series lingering)
    are documented. s4 savings against a missing state file now says so
    in a note instead of silently reporting zeros.
  • P3 Ledger internals: Prometheus gauges are stamped inside the
    write lock (no transient reordering), SIGUSR1 dumps route through the
    ledger's own flush() (no .tmp race with the event flush), and
    .s4dict/ propagation objects are explicitly excluded from
    accounting (internal keys are never ledger-counted; documented
    contract). (s4-codec-py) encode_s4_object passthrough CRC32C now
    releases the GIL.
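
The marker-gated subtraction from the first fix in this list can be modelled in a few lines. This is a toy sketch under stated assumptions: the class and method names are hypothetical, the marker value "1" is made up, and the real gateway resolves sizes through probes that are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class BucketTotals:
    original_bytes: int = 0
    stored_bytes: int = 0
    objects: int = 0
    skipped_unaccounted: int = 0


class LedgerSketch:
    """Toy model of marker-gated accounting, not the gateway's real code."""

    def __init__(self):
        self.buckets = {}

    def _totals(self, bucket):
        return self.buckets.setdefault(bucket, BucketTotals())

    def record_put(self, bucket, original, stored):
        """Count a gateway write and return the metadata it would stamp."""
        totals = self._totals(bucket)
        totals.original_bytes += original
        totals.stored_bytes += stored
        totals.objects += 1
        # The marker value is an assumption; only the key name is in the notes.
        return {"s4-ledger": "1", "s4-original-size": str(original)}

    def record_delete(self, bucket, metadata, stored):
        """Subtract only objects that carry the marker; tally the rest."""
        totals = self._totals(bucket)
        if "s4-ledger" not in metadata:
            # Never added (backend-direct, s4fs-written, migrate output).
            totals.skipped_unaccounted += 1
            return
        original = int(metadata["s4-original-size"])
        # Zero-clamp guards against marker-without-add drift.
        totals.original_bytes = max(0, totals.original_bytes - original)
        totals.stored_bytes = max(0, totals.stored_bytes - stored)
        totals.objects = max(0, totals.objects - 1)


ledger = LedgerSketch()
marked = ledger.record_put("logs", original=1000, stored=400)
ledger.record_delete("logs", {"s4-codec": "cpu-zstd"}, stored=300)  # unmarked
ledger.record_delete("logs", marked, stored=400)
totals = ledger.buckets["logs"]
```

After the add and the matching delete the totals return to zero, while the delete of the unmarked object only bumps skipped_unaccounted.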

Added

  • s4fs write support (opt-in): S4FileSystem(write_enabled=True)
    (or storage_options={"write_enabled": True} through fsspec/pandas)
    enables pipe_file / put_file / open(path, "wb"), writing
    gateway-compatible S4 objects directly to the backend:
    df.to_parquet("s4://bucket/key") now works without the gateway.
    The encoder (s4_codec.encode_s4_object, new in the Python binding
    alongside bind_index + pick_chunk_size) reproduces the gateway's
    single-PUT path byte-compatibly — S4F2 cpu-zstd frames using the
    gateway chunk-size policy (1 MiB / 4 MiB / 16 MiB thresholds), the
    five manifest metadata keys (s4-codec / s4-original-size /
    s4-compressed-size / s4-crc32c / s4-framed), and, for
    multi-frame bodies, a <key>.s4index sidecar bound to the backend
    ETag + size after the body PUT (single-frame overwrites clean up a
    stale sidecar). Verified end-to-end against MinIO + the real
    gateway binary: gateway GET and Range GET return the original
    bytes, s4 verify-sidecar reports OK with the version binding
    intact, and pandas/pyarrow round-trip through both s4fs and the
    gateway. write_codec="passthrough" stores raw stamped bodies;
    everything else (cpu-gzip / cpu-zstd-dict / nvcomp-* / SSE /
    append / versioning) is refused with a typed error pointing at the
    gateway. Underlying filesystems that cannot stamp S3 user metadata
    are refused with S4MetadataUnsupportedError (an unstamped framed
    body would be served raw by the gateway); s3fs is supported out of
    the box. Default behaviour without the flag is unchanged
    (read-only).
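
A minimal usage sketch for the opt-in write path, assuming s4fs is installed and registered with fsspec for the s4:// protocol. The helper names are hypothetical, and forwarding write_codec through storage_options is an assumption (the notes only show write_enabled passed that way).

```python
def s4_storage_options(write_codec=None):
    """Build the fsspec storage_options that opt in to s4fs writes."""
    options = {"write_enabled": True}
    if write_codec is not None:
        # Assumption: write_codec is forwarded like write_enabled.
        options["write_codec"] = write_codec
    return options


def write_parquet(df, url, write_codec=None):
    """Write a pandas DataFrame to an s4:// URL without the gateway."""
    df.to_parquet(url, storage_options=s4_storage_options(write_codec))


# Typical call, left commented out because it needs a reachable backend:
# write_parquet(df, "s4://bucket/key")
```

Without write_enabled the filesystem stays read-only, so the same to_parquet call would be expected to fail rather than write.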

  • Per-prefix dictionary metrics: the --zstd-dict PUT branch now
    exports s4_dict_put_total{prefix,outcome="win"|"loss"} and
    s4_dict_put_bytes_total{prefix,kind="original"|"dict"|"plain"};
    both compression results were already measured per PUT, so the byte
    counters are exact on wins and losses alike. Cardinality is bounded
    by the configured prefix count; with no dict configuration the
    series are never registered (default behaviour bit-for-bit
    unchanged). The gateway also self-monitors: a prefix whose rolling
    win rate over its last 100 dict-path PUTs drops below 0.5 logs a
    stale-dictionary WARN, at most once per prefix per hour.
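
The self-monitoring rule above (rolling win rate over the last 100 PUTs, WARN below 0.5, at most once per prefix per hour) can be sketched as follows. The class is hypothetical and is not the gateway's DictWinTracker; in particular, waiting for a full window before evaluating is an assumption.

```python
from collections import deque

WINDOW = 100          # last 100 dict-path PUTs, per the notes
THRESHOLD = 0.5       # warn when the win rate drops below this
WARN_INTERVAL = 3600  # at most one WARN per prefix per hour (seconds)


class WinTrackerSketch:
    def __init__(self):
        self.outcomes = {}   # prefix -> deque of bools (True = dict won)
        self.last_warn = {}  # prefix -> timestamp of the last WARN

    def record(self, prefix, won, now):
        """Record one PUT outcome; return True when a WARN should be logged."""
        window = self.outcomes.setdefault(prefix, deque(maxlen=WINDOW))
        window.append(won)
        if len(window) < WINDOW:
            return False  # assumption: wait until the window is full
        if sum(window) / len(window) >= THRESHOLD:
            return False
        last = self.last_warn.get(prefix)
        if last is not None and now - last < WARN_INTERVAL:
            return False  # rate-limited
        self.last_warn[prefix] = now
        return True


tracker = WinTrackerSketch()
# One win in every four PUTs, one PUT per second.
warnings = [tracker.record("logs/", i % 4 == 0, float(i)) for i in range(150)]
later = tracker.record("logs/", False, 99.0 + WARN_INTERVAL)
```

Here the first WARN fires on the 100th PUT, the following PUTs are rate-limited, and the next WARN is only possible an hour later.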

  • s4 dict-status --metrics-url <URL> [--warn-win-rate 0.5] [--format table|json]: scrapes a running gateway's /metrics
    (built-in minimal Prometheus text parser, no new dependencies) and
    reports per-prefix dictionary win rate, effective compression ratio
    (dict bytes / original bytes) and lazy s4_dict_fetch_total error
    counts. Prefixes below the win-rate threshold get a "dictionary may
    be stale; consider retraining (s4 train-dict)" warning and the
    command exits 1 — cron-able drift monitoring.
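
To make the win-rate check concrete, here is a small sketch of the same idea: parse s4_dict_put_total samples from Prometheus text and derive an exit code. It is not the built-in parser; the prefix label values are invented, and samples with a trailing timestamp are ignored for brevity.

```python
import re

SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def win_rates(metrics_text):
    """Return {prefix: win_rate} computed from s4_dict_put_total samples."""
    counts = {}  # prefix -> {"win": n, "loss": n}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match or match.group(1) != "s4_dict_put_total":
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        prefix, outcome = labels.get("prefix"), labels.get("outcome")
        if prefix is None or outcome not in ("win", "loss"):
            continue
        per_prefix = counts.setdefault(prefix, {"win": 0.0, "loss": 0.0})
        per_prefix[outcome] += float(match.group(3))
    rates = {}
    for prefix, count in counts.items():
        total = count["win"] + count["loss"]
        if total > 0:
            rates[prefix] = count["win"] / total
    return rates


def exit_code(rates, warn_win_rate=0.5):
    """1 if any prefix is below the threshold, else 0 (cron-friendly)."""
    return 1 if any(rate < warn_win_rate for rate in rates.values()) else 0


SAMPLE = """
# TYPE s4_dict_put_total counter
s4_dict_put_total{prefix="logs/app/",outcome="win"} 90
s4_dict_put_total{prefix="logs/app/",outcome="loss"} 10
s4_dict_put_total{prefix="logs/raw/",outcome="win"} 20
s4_dict_put_total{prefix="logs/raw/",outcome="loss"} 80
"""
rates = win_rates(SAMPLE)
status = exit_code(rates)
```

With these invented counters the second prefix wins only 20% of the time, so the sketch returns exit code 1, matching the behaviour the command documents for prefixes below the threshold.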

  • --zstd-dict-map <FILE> + SIGHUP reload: TOML [mappings]
    table ("<bucket>/<prefix>" = "<dict-id>") as the reloadable twin
    of repeated --zstd-dict flags — identical validation, boot-time
    fetch + fingerprint verification and 1 MiB dictionary cap; a prefix
    configured in both places is a boot error. On SIGHUP the file is
    re-read, new dictionaries are fetched + verified (already-loaded
    ones are reused), and the store is swapped atomically (arc-swap RCU
    — in-flight requests finish on the generation they started with),
    so rotation is s4 train-dict → edit map → kill -HUP, no gateway
    restart. A failed reload keeps the current mappings live (ERROR log
    + s4_dict_reload_total{result="err"}; success bumps
    result="ok"). Without the flag, SIGHUP does not touch dictionary
    configuration. New library surface:
    S4Service::with_shared_zstd_dicts,
    dict::{SharedDictStore, DictWinTracker, parse_zstd_dict_map, merge_dict_entries, build_dict_status, parse_prom_sample}.

  • Docs: README "Operating dictionaries" section (dict-status /
    rotation runbook), plus an explicit note that multipart uploads are
    out of the dictionary path by design — parts never consult the
    dict store, and S3's 5 MiB minimum part size sits far above the
    small-object ceiling (default 1 MiB --zstd-dict-max-bytes).

  • E2E: tests/dict_ops_minio.rs (Docker-gated, real s4 binary)
    — win/loss counters on /metrics, dict-status exit codes 0/1
    with the retrain warning, map-file boot, SIGHUP rotation picking up
    a new prefix without a restart, and the fail-safe on a broken map
    (previous store keeps serving).
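
The fail-safe this test pins (a broken map leaves the previous store serving) follows a build-then-swap pattern that can be sketched in Python. The holder class, the loader callables and the dictionary IDs are hypothetical; a plain reference assignment stands in for the arc-swap store used by the gateway.

```python
class DictStoreHolder:
    """Holds the live mapping snapshot; a request reads `current` once."""

    def __init__(self, mappings):
        self.current = dict(mappings)
        self.reload_total = {"ok": 0, "err": 0}

    def reload(self, load_mappings):
        """Build a complete new snapshot, then swap it in with one assignment."""
        try:
            new_snapshot = dict(load_mappings())
        except Exception:
            # Broken map: keep serving the previous generation.
            self.reload_total["err"] += 1
            return False
        self.current = new_snapshot
        self.reload_total["ok"] += 1
        return True


def broken_map():
    raise ValueError("malformed map file")


holder = DictStoreHolder({"logs-bucket/app/": "dict-a"})
in_flight = holder.current  # a request that started before any reload

holder.reload(broken_map)
after_failure = holder.current

holder.reload(lambda: {"logs-bucket/app/": "dict-b", "logs-bucket/new/": "dict-c"})
```

Because the new snapshot is fully built before the swap, a request that captured the old snapshot keeps using it, and a failed load never leaves a half-updated store behind.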

  • s4 maintain --policy <FILE> [--execute] [--interval <DUR>] [--format table|json]: policy-driven bucket maintenance. A TOML
    file of [[rule]] entries (unique name, bucket, optional
    prefix, common older-than age gate) runs sequentially top to
    bottom; action = "migrate" | "recompact" reuse the v1.1 library
    paths with the same parameters as their CLI flags (no-tags,
    target-zstd-level, min-gain-percent, …), and the new
    action = "transition" (storage-class = "GLACIER_IR" etc.)
    changes cold objects' storage class via same-key server-side
    CopyObject with the <key>.s4index sidecar always accompanying its
    main object into the same class (drift from earlier partial runs is
    realigned; sidecars are never moved on their own). Dry-run by
    default; policy validation reports every problem in one pass;
    --interval keeps the command resident (run → sleep → re-run,
    structured per-cycle logs, graceful SIGTERM/SIGINT that finishes the
    in-flight rule). All three actions are idempotent, so re-runs and
    resident cycles skip settled objects
    (already-s4 / already-compacted / already-target-class).
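
As an illustration of the policy shape and of validation that reports every problem in one pass, here is a hedged sketch. The key names come from the description above, but the value formats (such as "90d"), the specific checks and the function itself are assumptions, not the shipped validator.

```python
# Assumed TOML shape (value formats such as "90d" are guesses):
#
#   [[rule]]
#   name = "cold-logs"
#   bucket = "logs"
#   prefix = "2025/"
#   older-than = "90d"
#   action = "transition"
#   storage-class = "GLACIER_IR"

ACTIONS = {"migrate", "recompact", "transition"}


def validate_policy(rules):
    """Return every problem found instead of stopping at the first one."""
    problems = []
    seen_names = set()
    for index, rule in enumerate(rules):
        label = rule.get("name", f"rule #{index + 1}")
        if "name" not in rule:
            problems.append(f"{label}: missing name")
        elif rule["name"] in seen_names:
            problems.append(f"{label}: duplicate name")
        else:
            seen_names.add(rule["name"])
        if "bucket" not in rule:
            problems.append(f"{label}: missing bucket")
        action = rule.get("action")
        if action not in ACTIONS:
            problems.append(f"{label}: unknown action {action!r}")
        elif action == "transition" and "storage-class" not in rule:
            # Assumption: a transition rule must name its target class.
            problems.append(f"{label}: transition needs storage-class")
    return problems


# Rules as a TOML parser would hand them over (a list of tables).
rules = [
    {"name": "cold-logs", "bucket": "logs", "older-than": "90d",
     "action": "transition", "storage-class": "GLACIER_IR"},
    {"name": "cold-logs", "action": "archive"},
]
problems = validate_policy(rules)
```

The second rule yields three separate problems (duplicate name, missing bucket, unknown action) in a single run, which is the point of one-pass reporting.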

  • Savings ledger (--savings-ledger-state-file <PATH>, opt-in,
    default-off): the gateway maintains measured per-bucket cumulative
    counters — original_bytes (logical client-PUT bytes),
    stored_bytes (backend bytes actually written: frames + SSE
    envelope + sidecars) and objects — updated on PUT /
    CompleteMultipartUpload / CopyObject / DELETE (overwrite = footprint
    swap via a best-effort HEAD probe; the extra HEADs exist only with
    the flag set). State is loaded with the standard --*-state-file
    fault isolation, flushed atomically on every write event, and
    re-dumped on SIGUSR1. Scope (honest): gateway-traversing writes only
    — backend-direct writes, s4 migrate / s4 recompact,
    aborted-multipart part bytes and replication replicas are not
    observed.

  • s4 savings --state-file <PATH> [--price-per-gb-month 0.023] [--format table|json]: read-only report over the ledger state
    file (per-bucket + total original/stored bytes, savings ratio,
    $/month at the given price) — the measured twin of s4 estimate.
    Works while the gateway is running; fixed honesty notes are part of
    the output.
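
The arithmetic behind such a report is straightforward; a minimal sketch follows. It assumes the savings ratio means saved bytes divided by original bytes and that a GB is 10^9 bytes, neither of which these notes spell out. The floor at zero mirrors the ledger fix described above.

```python
GB = 1_000_000_000  # assumption: decimal gigabytes


def savings_report(original_bytes, stored_bytes, price_per_gb_month=0.023):
    """Derive saved bytes, savings ratio and $/month from ledger counters."""
    saved = max(0, original_bytes - stored_bytes)  # floor at 0
    ratio = saved / original_bytes if original_bytes else 0.0
    usd_per_month = saved / GB * price_per_gb_month
    return {"saved_bytes": saved, "ratio": ratio, "usd_per_month": usd_per_month}


# 500 GB of client data stored as 200 GB on the backend.
report = savings_report(500 * GB, 200 * GB)
# A bucket where stored bytes exceed original bytes reports zero, not negative.
floored = savings_report(100, 150)
```

For the first call this gives 300 GB saved, a ratio of 0.6 and roughly $6.90 per month at the default price.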

  • Prometheus gauges
    s4_ledger_{original_bytes,stored_bytes,objects}{bucket} mirroring
    the ledger state file (never registered when the flag is off), plus
    a drop-in Grafana dashboard at
    contrib/grafana/s4-savings-dashboard.json (saved bytes / savings
    ratio / per-bucket split / $-per-month with a price_per_gb_month
    variable; import steps in docs/observability.md).

Fixed

  • s4fs: info() no longer poisons the per-instance live-info
    snapshot with the rewritten (decompressed) size — previously any
    info() call before a range read made the sidecar source-size
    binding check fail, silently disabling the partial-fetch fast-path
    (full-object read + warning) until the cache was invalidated. Also
    fixed: reading back the gateway's zero-frame body (an empty
    object PUT through the gateway, or written by s4fs) raised
    S4IoError on the full-read path because the empty framed body
    carries no S4F2 magic and fell into the unframed-decode branch;
    the s4-framed metadata stamp now routes it through the frame
    parser, which correctly yields b"".