
image/compress-checksum: maximize CPU + memory use#9758

Merged
igorpecovnik merged 1 commit into main from compress-checksum-elastic
May 5, 2026
Conversation

@igorpecovnik
Member

@igorpecovnik igorpecovnik commented May 4, 2026

Benefits

  • Faster wall time on big boxes when running alone. A single runner compressing on a 64-core / 128 GB host now uses every core and the strongest xz preset whose memory footprint will fit, instead of a static -T 0 at level 1. That is an order-of-magnitude reduction in compression time on hosts that were previously CPU-starved by the cap.
  • No more OOM on shared hosts. The previous defaults (xz -T 0) didn't account for memory at all — N peers at level 9 could collectively reserve more RAM than the box has. The new memory budget per-job (MemAvailable × 0.6 / (active+1)) makes that mathematically impossible at any peer count.
  • Smaller nightly images. When memory allows it, xz climbs to -9 (up from -1), shaving ~25% off the compressed size. Critical for staying under GitHub's 2 GB per-asset cap on desktop builds that were previously borderline.
  • Faster stable releases. BETA=no deploys to our own infra (no size cap) and is now pinned to xz -0 — roughly 2× faster than -1 for the same wall-clock budget per release.
  • Recovered memory on small siblings. A 100 MB hyperv.zip at -9 previously reserved ~21 GB for parallelism it couldn't use; the file-size-aware thread cap brings it down to the few hundred MB the file can actually drive. Frees that headroom for whoever's using the box next.
  • Diagnosable from build logs. Four pre-loop alerts and per-file before/after lines mean we can tell exactly why a given build picked level X — useful when the runner is on a server we can't ssh to.

How it works

  • Threads = full nproc. Kernel time-slices when peer xz / zstd jobs contend; we get the cores when they idle.
  • Per-job memory budget = MemAvailable × 0.6 / (active_jobs + 1). Walks xz presets 9 → 6 → 3 → 1 and picks the strongest whose footprint (compress_threads × per_thread_mem) fits.
  • zstd lifts to --ultra -22 when budget ≥ 2 GB, else -19. Always uses --long=27 (128 MB window — exactly the decoder's default cap) so plain zstd -d still works.
  • File-size-aware thread cap (xz only). file_size / block_size, where block size is ~3× the preset's dict (192 MB at -9, 24 MB at -6, 12 MB at -3/4/5, 3 MB at -0/1). zstd auto-scales workers and uses the full budget directly.
  • BETA=no forces xz -0 for stable releases (own infra, speed first).
  • ZSTD_COMPRESSION_LEVEL / IMAGE_XZ_COMPRESSION_RATIO overrides remain honored.
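The budget formula and level walk above can be sketched in bash. The function names here (compute_budget_mb, pick_xz_level) are illustrative, not necessarily those used in lib/functions/image/compress-checksum.sh:

```shell
#!/usr/bin/env bash
# Sketch of the per-job memory budget and the memory-only xz level walk.

# budget = MemAvailable * 0.6 / (active_jobs + 1), all values in MB.
compute_budget_mb() {
	local mem_avail_mb="$1" active_jobs="$2"
	echo $(( mem_avail_mb * 6 / 10 / (active_jobs + 1) ))
}

# Walk presets strongest-first; per-thread MB figures follow the xz manpage.
pick_xz_level() {
	local threads="$1" budget_mb="$2" entry lvl pt
	for entry in 9:674 6:94 3:32 1:9; do
		lvl="${entry%:*}"
		pt="${entry#*:}"
		if (( threads * pt <= budget_mb )); then
			echo "$lvl"
			return
		fi
	done
	echo 1
}

budget=$(compute_budget_mb 157000 15)   # 64-core box, 15 peers
echo "budget=${budget}MB level=$(pick_xz_level 64 "$budget")"
# -> budget=5887MB level=3, matching the resource-share alert shown below
```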

Behaviour matrix

| host | active peers | before | after |
|---|---|---|---|
| 64c / 157 GB | 15 | 4 threads, -9 | 64 threads, -3 |
| 32c / 99 GB | 2 | 10 threads, -9 | 32 threads, -6 |
| 16c / 76 GB | 15 | 1 thread, -9 | 16 threads, -6 |
| 64c / 157 GB | 0 (alone) | 64 threads, -9 | 64 threads, -9 |

CPU stays maxed; level slides down only when memory genuinely can't absorb the strongest preset across all concurrent peers.

Diagnostic output

Four alerts before the loop:

[ Compression host          ] [ nproc=64 loadavg=2.4/2.0/1.7 MemTotal=160000MB MemAvail=157000MB ]
[ Compression resource share ] [ active_xz/zstd=15 -> threads=64, budget=5887MB; pick xz=-3 zstd=-22 ]
[ Compression xz level walk  ] [ -9:43136MB>budget skip; -6:6016MB>budget skip; -3:2048MB<=budget OK ]
[ Compression input          ] [ 4 file(s), 7240MB total ]

Per file:

[ Compressing with xz ] [ rootfs.img.xz (-3, threads: 64/64, size: 5120MB, block: 12MB, peak: ~2048MB) ]
[ Compressed          ] [ rootfs.img.xz: 5120MB -> 1250MB (24%), 41s, 124 MB/s ]
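The ratio and throughput figures in the "Compressed" line are plain integer arithmetic over the before/after sizes and elapsed seconds; a minimal sketch (function name hypothetical):

```shell
# Reproduce the percent and MB/s figures shown in the "Compressed" alert.
# Bash arithmetic is integer-only, so results truncate like the log does.
report_compressed() {
	local in_mb="$1" out_mb="$2" elapsed_s="$3"
	local pct=$(( out_mb * 100 / in_mb ))   # compressed size as % of input
	local mbps=$(( in_mb / elapsed_s ))     # input throughput
	echo "${in_mb}MB -> ${out_mb}MB (${pct}%), ${elapsed_s}s, ${mbps} MB/s"
}

report_compressed 5120 1250 41
# -> 5120MB -> 1250MB (24%), 41s, 124 MB/s
```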

Test plan

  • Trigger a release-targets build on a box where multiple runners hit the compression phase together; confirm the diagnostic alerts show threads=$(nproc) and a sane level pick.
  • Build with BETA=no and confirm the per-file alert reads xz=-0.
  • Build a desktop image and verify the small sibling artefacts (qcow2, vhdx, hyperv.zip) report a threads_used lower than the budget when the chosen level has a large block size.
  • Confirm output .xz / .zst decompresses cleanly with stock xz -d / zstd -d.

Summary by CodeRabbit

  • Refactor
    • Image compression now adapts to available CPU and memory, selecting compression levels and per-file thread counts for improved efficiency.
    • Per-image overrides are honored; zstd uses explicit worker threads and enables ultra mode for very high levels.
    • Compression runs emit host/load/memory and per-job resource-share logs for better performance visibility.

@coderabbitai
Contributor

coderabbitai Bot commented May 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e4c12a52-bb8f-4a6f-9363-dcececb6551e

📥 Commits

Reviewing files that changed from the base of the PR and between afc1e8c and 22ecfcc.

📒 Files selected for processing (1)
  • lib/functions/image/compress-checksum.sh

📝 Walkthrough

output_images_compress_and_checksum() now probes host CPU/load/memory, counts concurrent compressor jobs, computes a per-job memory budget and total input size, selects elastic xz/zstd compression levels, and applies per-file thread caps and memory-aware compression parameters while logging decisions and compression metrics.
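The probe step the walkthrough opens with can be sketched as plain bash (variable names illustrative; the real script may read these slightly differently):

```shell
#!/usr/bin/env bash
# Sketch: probe host CPU, available memory, and concurrent compressor jobs.

host_threads=$(nproc)

# MemAvailable is reported in kB in /proc/meminfo.
mem_avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
mem_avail_mb=$(( mem_avail_kb / 1024 ))

# Count peer xz/zstd processes already running on the box;
# grep -c exits nonzero on zero matches, hence the || true.
active_jobs=$(ps -eo comm= | grep -cE '^(xz|zstd)$' || true)

echo "threads=${host_threads} avail=${mem_avail_mb}MB active=${active_jobs}"
```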

Changes

Resource-Aware Compression Planning and Execution

| Layer / File(s) | Summary |
|---|---|
| Resource Budget Calculation (lib/functions/image/compress-checksum.sh) | Reads /proc/meminfo (MemAvailable) and /proc/loadavg, determines host_threads via nproc, counts concurrent xz/zstd jobs, sums total input MB, computes per-job mem_budget_mb, chooses xz_elastic_level with a walk trace, derives zstd_elastic_level, and logs host/load/memory and selection details. |
| Parameter Determination / Overrides (lib/functions/image/compress-checksum.sh) | Derives per-image xz_default_ratio from the elastic selection but forces -0 when BETA=no and allows the IMAGE_XZ_COMPRESSION_RATIO override; sets zstd_level from ZSTD_COMPRESSION_LEVEL or the elastic default. |
| Per-File Thread Capping / Memory Estimation (lib/functions/image/compress-checksum.sh) | Caps xz threads per file using file-size thresholds and preset-specific block/per-thread memory estimates; clamps threads between 1 and the global compress_threads; estimates peak_mem_mb and logs file size, chosen threads, block size, and estimated peak memory. |
| Compression Invocation (lib/functions/image/compress-checksum.sh) | Replaces fixed invocations with parameterized commands: xz runs with -T ${file_threads} and the computed -${xz_level} (or forced -0); zstdmt runs with explicit -T ${compress_threads}, --long=27, and conditional --ultra for high levels; sets the output suffix (.xz/.zst) and removes the original file on success. |
| Accounting & Logging (lib/functions/image/compress-checksum.sh) | Measures elapsed time, computes compressed size, compression ratio, and throughput (MB/s), and emits a per-output "Compressed" alert including threads, estimated peak memory, and performance metrics. |

Sequence Diagram

sequenceDiagram
    actor Script
    participant Host as "Host (nproc, ps)"
    participant Mem as "/proc/meminfo"
    participant Planner as "Elastic Level Planner"
    participant Images as "Image Files"
    participant xz as "xz"
    participant zstd as "zstd/zstdmt"

    Script->>Host: query nproc & count active xz/zstd jobs
    Script->>Mem: read MemAvailable / MemTotal
    Script->>Planner: compute mem_budget_mb, total input size
    Planner->>Planner: select xz_elastic_level, zstd_elastic_level
    Planner-->>Script: log resource share and selection trace

    loop for each image
        Script->>Images: stat file size
        Script->>Planner: compute file_threads, peak_mem_mb
        Script->>xz: run with -T ${file_threads} and -${xz_level}
        xz-->>Script: compressed chunk
        Script->>zstd: run zstdmt --long=27 -T ${compress_threads} [--ultra?]
        zstd-->>Script: compressed output
        Script->>Script: delete input, compute elapsed & throughput, log result
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I counted cores and memory bright,
Picked elastic levels just right,
Threads tucked to file and size,
Compressing dreams in tidy guise,
A hop, a log, compression done tonight.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'image/compress-checksum: maximize CPU + memory use' directly and concisely summarizes the main change: making the compression script host-aware to elastically allocate CPU and memory resources rather than using static defaults. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@github-actions github-actions Bot added size/large PR with 250 lines or more 05 Milestone: Second quarter release Desktop Graphical user interface Needs review Seeking for review Framework Framework components Documentation Documentation changes or additions BSP Board Support Packages GitHub GitHub-related changes like labels, templates, ... labels May 4, 2026
@igorpecovnik igorpecovnik force-pushed the compress-checksum-elastic branch from cf53f27 to cb2b882 Compare May 4, 2026 19:02
@github-actions github-actions Bot added size/medium PR with more then 50 and less then 250 lines and removed size/large PR with 250 lines or more labels May 4, 2026
@igorpecovnik igorpecovnik force-pushed the compress-checksum-elastic branch from cb2b882 to 4c394dd Compare May 4, 2026 19:04
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
lib/functions/image/compress-checksum.sh (1)

43-69: ⚡ Quick win

DRY violation: combine level selection and trace building into a single loop.

The xz level selection (lines 43-50) and trace building (lines 56-69) iterate over the same 9:674 6:94 3:32 1:9 list with nearly identical logic. If the memory values are updated later, both loops must be kept in sync.

♻️ Proposed refactor to single loop
-	# Pick the strongest xz preset whose footprint (compress_threads * per_thread_MB)
-	# fits the per-job memory budget. Per-thread mem from xz manpage.
-	declare xz_elastic_level="1"
-	for lvl_mem in 9:674 6:94 3:32 1:9; do
-		lvl="${lvl_mem%:*}"
-		pt="${lvl_mem#*:}"
-		if (( compress_threads * pt <= mem_budget_mb )); then
-			xz_elastic_level="$lvl"
-			break
-		fi
-	done
-
-	# zstd: bump to --ultra -22 only when there's clear memory headroom.
-	declare zstd_elastic_level="19"
-	(( mem_budget_mb >= 2048 )) && zstd_elastic_level="22"
-
-	# Trace which xz levels were considered against the budget; useful when
-	# the picked level seems surprising in a build log.
-	declare xz_walk_trace=""
-	for lvl_mem in 9:674 6:94 3:32 1:9; do
-		lvl="${lvl_mem%:*}"
-		pt="${lvl_mem#*:}"
-		need=$(( compress_threads * pt ))
-		if (( need <= mem_budget_mb )); then
-			xz_walk_trace+="-${lvl}:${need}MB<=budget OK; "
-			break
-		else
-			xz_walk_trace+="-${lvl}:${need}MB>budget skip; "
-		fi
-	done
+	# Pick the strongest xz preset whose footprint (compress_threads * per_thread_MB)
+	# fits the per-job memory budget. Per-thread mem from xz manpage.
+	# Also build a trace of considered levels for diagnostics.
+	declare xz_elastic_level="1" xz_walk_trace=""
+	for lvl_mem in 9:674 6:94 3:32 1:9; do
+		lvl="${lvl_mem%:*}"
+		pt="${lvl_mem#*:}"
+		need=$(( compress_threads * pt ))
+		if (( need <= mem_budget_mb )); then
+			xz_elastic_level="$lvl"
+			xz_walk_trace+="-${lvl}:${need}MB<=budget OK; "
+			break
+		else
+			xz_walk_trace+="-${lvl}:${need}MB>budget skip; "
+		fi
+	done
+
+	# zstd: bump to --ultra -22 only when there's clear memory headroom.
+	declare zstd_elastic_level="19"
+	(( mem_budget_mb >= 2048 )) && zstd_elastic_level="22"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/functions/image/compress-checksum.sh` around lines 43 - 69, The two loops
that iterate over "9:674 6:94 3:32 1:9" should be merged into one so level
selection and trace building stay in sync: in a single for loop over lvl_mem
(used by xz_elastic_level and xz_walk_trace) extract lvl and pt as before,
compute need=$(( compress_threads * pt )), append the appropriate trace fragment
("-${lvl}:${need}MB<=budget OK; " or "-${lvl}:${need}MB>budget skip; "), and
when need <= mem_budget_mb set xz_elastic_level="$lvl" and break; preserve
existing variables compress_threads, mem_budget_mb, and that zstd_elastic_level
logic remains unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e4aa7a60-39d2-43e3-af62-84471d4ea9db

📥 Commits

Reviewing files that changed from the base of the PR and between a6cb68a and 4c394dd.

📒 Files selected for processing (1)
  • lib/functions/image/compress-checksum.sh

@iav
Contributor

iav commented May 4, 2026

The default zstd level is raised from -9 to -19, and to -22 --ultra when mem_budget_mb >= 2048. For small ARM build hosts with 2-4 weak cores, this is a clear regression: the final build stage will take noticeably longer and will keep CPU utilization pinned much longer. The code does not make any allowance for weak CPUs: zstd_level depends only on available memory, not on nproc, host class, or current CPU pressure.
How to avoid it: do not raise the default zstd level for the small-host class. A practical option is to keep a safe default such as -9 or -12, and only enable -19/-22 via explicit opt-in or for publish/CI scenarios.

For xz, the default changes from the previous -1 to an elastic level for nightly and user-built images. This is also a regression on weak ARM hosts: a typical 2-core/2GB or 4-core/2GB machine can easily end up with xz -6, and a 2-core/4GB machine may reach xz -9 with this heuristic. That is substantially heavier than the previous -1, while the benefit is mainly relevant where artifact size is critical.

The "resource share" heuristic looks at MemAvailable and the number of already running xz/zstd processes, but it does not really model weak hosts as a separate class. For xz, this is partially mitigated by the level selection logic, but for zstd the minimum default remains too heavy. On small ARM machines, CPU is usually the primary bottleneck, not only RAM, so the current logic will systematically overestimate how aggressive compression can safely be.

@igorpecovnik igorpecovnik force-pushed the compress-checksum-elastic branch from 4c394dd to afc1e8c Compare May 4, 2026 20:39
@igorpecovnik
Member Author

@iav — good catch, agree the elastic decision was memory-only and over-escalated on weak ARM hosts. Pushed a fix in afc1e8c5d that adds a CPU floor to each level. Each preset now requires both a memory ceiling and a minimum thread count:

| preset | min threads | memory |
|---|---|---|
| xz -9 | 8 | 674 MB per thread |
| xz -6 | 4 | 94 MB per thread |
| xz -3 | 2 | 32 MB per thread |
| xz -1 | any | 9 MB per thread (final fallback) |
| zstd -22 --ultra | 16 | needs 4 GB budget |
| zstd -19 | 8 | needs 2 GB budget |
| zstd -12 | 4 | - |
| zstd -9 | any | - (matches old static default) |

Behaviour at the cases you raised, plus a few sanity points:

| host | budget (alone) | xz | zstd |
|---|---|---|---|
| 2c / 2 GB | 1228 MB | -3 | -9 |
| 4c / 2 GB | 1228 MB | -6 | -12 |
| 2c / 4 GB | 2457 MB | -3 | -9 |
| 4c / 4 GB | 2457 MB | -6 | -12 |
| 8c / 8 GB | 4915 MB | -6 | -19 |
| 16c / 16 GB | 9830 MB | -6 | -22 --ultra |
| 64c / 157 GB | 94200 MB | -9 | -22 --ultra |
| 1c / any (single-core) | - | -1 | -9 |

The 2-core cases now max at xz -3 / zstd -9, and 4-core at -6 / -12 — substantially lighter than the previous -6/-19 they would have hit. Single-core falls through to the old static defaults verbatim (xz -1, zstd -9).
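The floored walk can be sketched and checked against the rows above (function name hypothetical; the table entries are level:per_thread_MB:min_threads as described in the fix):

```shell
# Level walk requiring BOTH a CPU floor and a memory ceiling per preset.
pick_xz_floored() {
	local threads="$1" budget_mb="$2" entry lvl rest pt floor
	for entry in 9:674:8 6:94:4 3:32:2 1:9:1; do
		lvl="${entry%%:*}"
		rest="${entry#*:}"
		pt="${rest%%:*}"
		floor="${rest##*:}"
		(( threads >= floor )) || continue     # CPU floor not met: skip level
		if (( threads * pt <= budget_mb )); then
			echo "$lvl"
			return
		fi
	done
	echo 1
}

# Reproduce a few rows of the behaviour table (threads:budget_MB).
for case in 2:1228 4:1228 8:4915 64:94200 1:9999; do
	echo "${case%:*}c -> xz -$(pick_xz_floored "${case%:*}" "${case#*:}")"
done
# -> 2c: -3, 4c: -6, 8c: -6, 64c: -9, 1c: -1
```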

The xz level-walk trace logged before each compression phase now shows both the memory and CPU constraint per preset, e.g.:

[ Compression xz level walk ] [ -9:cpu<8 skip; -6:cpu<4 skip; -3:64MB<=budget OK ]

So when something looks surprising on a given host, the reason is in the build log without an ssh trip.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
lib/functions/image/compress-checksum.sh (1)

50-88: ⚡ Quick win

Preset table embedded twice — divergence risk when updating levels.

The selection loop (lines 50–59) and the trace loop (lines 72–88) parse the same 9:674:8 6:94:4 3:32:2 1:9:1 table and apply identical logic. If a new level or adjusted per-thread memory value is added to one loop, the other silently diverges, and the logged trace will no longer match the actual selection.

Merging into a single pass (trace everything, set xz_elastic_level on first fit) eliminates the duplication:

♻️ Proposed refactor: single combined loop
-	declare xz_elastic_level="1"
-	for entry in 9:674:8 6:94:4 3:32:2 1:9:1; do
-		lvl="${entry%%:*}"
-		rest="${entry#*:}"
-		pt="${rest%%:*}"
-		floor="${rest##*:}"
-		if (( compress_threads >= floor )) && (( compress_threads * pt <= mem_budget_mb )); then
-			xz_elastic_level="$lvl"
-			break
-		fi
-	done
-
-	# ...zstd block stays unchanged...
-
-	declare xz_walk_trace=""
-	for entry in 9:674:8 6:94:4 3:32:2 1:9:1; do
-		lvl="${entry%%:*}"
-		rest="${entry#*:}"
-		pt="${rest%%:*}"
-		floor="${rest##*:}"
-		need=$(( compress_threads * pt ))
-		if (( compress_threads < floor )); then
-			xz_walk_trace+="-${lvl}:cpu<${floor} skip; "
-			continue
-		fi
-		if (( need <= mem_budget_mb )); then
-			xz_walk_trace+="-${lvl}:${need}MB<=budget OK; "
-			break
-		else
-			xz_walk_trace+="-${lvl}:${need}MB>budget skip; "
-		fi
-	done
+	declare xz_elastic_level="1" xz_walk_trace=""
+	for entry in 9:674:8 6:94:4 3:32:2 1:9:1; do
+		lvl="${entry%%:*}"
+		rest="${entry#*:}"
+		pt="${rest%%:*}"
+		floor="${rest##*:}"
+		need=$(( compress_threads * pt ))
+		if (( compress_threads < floor )); then
+			xz_walk_trace+="-${lvl}:cpu<${floor} skip; "
+			continue
+		fi
+		if (( need <= mem_budget_mb )); then
+			xz_walk_trace+="-${lvl}:${need}MB<=budget OK; "
+			xz_elastic_level="$lvl"
+			break
+		else
+			xz_walk_trace+="-${lvl}:${need}MB>budget skip; "
+		fi
+	done
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/functions/image/compress-checksum.sh` around lines 50 - 88, The xz preset
table is duplicated across two loops which can diverge; consolidate into a
single loop that iterates the same entries (e.g. "for entry in 9:674:8 6:94:4
3:32:2 1:9:1"), computes lvl/pt/floor and need once, appends the same trace
snippets to xz_walk_trace, and sets xz_elastic_level to lvl on the first entry
that satisfies (( compress_threads >= floor )) && (( need <= mem_budget_mb ))
then break; remove the separate selection loop so xz_elastic_level and
xz_walk_trace are derived from the same pass (use the existing variable names
compress_threads, mem_budget_mb, xz_elastic_level, xz_walk_trace to locate and
update the code).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/functions/image/compress-checksum.sh`:
- Around line 155-161: The unconditional addition of --long=27 to zstd_args
forces a 128MB decoder window for all compressed images; modify the logic in the
compression block (where zstd_args, zstdmt and zstd_level are used) to only
append --long=27 for high compression levels (e.g., when zstd_level >= 19) so
that lower levels (like 9–18) keep the smaller default window; keep the existing
--ultra handling (which already gates levels >=20) and ensure the conditional
mirrors that threshold to avoid increasing decompressor memory requirements for
low/medium levels.

---

Nitpick comments:
In `@lib/functions/image/compress-checksum.sh`:
- Around line 50-88: The xz preset table is duplicated across two loops which
can diverge; consolidate into a single loop that iterates the same entries (e.g.
"for entry in 9:674:8 6:94:4 3:32:2 1:9:1"), computes lvl/pt/floor and need
once, appends the same trace snippets to xz_walk_trace, and sets
xz_elastic_level to lvl on the first entry that satisfies (( compress_threads >=
floor )) && (( need <= mem_budget_mb )) then break; remove the separate
selection loop so xz_elastic_level and xz_walk_trace are derived from the same
pass (use the existing variable names compress_threads, mem_budget_mb,
xz_elastic_level, xz_walk_trace to locate and update the code).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8c6327ce-6623-4783-8da1-4314ba1e386e

📥 Commits

Reviewing files that changed from the base of the PR and between 4c394dd and afc1e8c.

📒 Files selected for processing (1)
  • lib/functions/image/compress-checksum.sh

@iav
Contributor

iav commented May 4, 2026

@igorpecovnik Thanks for the quick turnaround — the CPU-floor table makes the decision logic clear and flexible.

@github-actions github-actions Bot added the Ready to merge Reviewed, tested and ready for merge label May 4, 2026
@github-actions
Contributor

github-actions Bot commented May 4, 2026

✅ This PR has been reviewed and approved — all set for merge!

@github-actions github-actions Bot removed the Needs review Seeking for review label May 4, 2026
Replace static `xz -T 0` / `zstdmt -9` with a CPU- and memory-aware
share so a single runner alone on a big box uses every core, peers
share fairly without OOMing each other, and small sibling artefacts
don't reserve per-thread memory they can't use.

Threads = full nproc (kernel time-slices when peer xz/zstd jobs contend;
we get the cores when they idle). Memory is fair-shared among peers:
budget = MemAvailable * 0.6 / (active+1).

The level walk requires BOTH a memory ceiling and a CPU floor — without
the floor, a 2-core ARM box with a few GB free would have escalated to
xz -6 / zstd -22 --ultra, which is much heavier on weak CPUs than the
old static defaults. Floors map per-thread compress throughput to
roughly equivalent wall-time bands:

  xz -9   needs 8 threads + memory
  xz -6   needs 4 threads + memory
  xz -3   needs 2 threads + memory
  xz -1   any (final fallback)

  zstd -22 --ultra  16 threads, 4 GB budget
  zstd -19           8 threads, 2 GB budget
  zstd -12           4 threads
  zstd -9            any (matches the old static default)

zstd's --long=27 (128 MB matching window) is gated to level >= 19. At
lower levels the size win doesn't justify the 128 MB decoder memory
requirement, and zstd's default windowLog (~4-16 MB) decompresses much
cheaper on small devices. --ultra stays gated at level >= 20.

Per-file thread cap on the xz path: file_size / block_size, where
block_size is ~3x the preset's dict (192 MB at -9, 48 MB at -7/8,
24 MB at -6, 12 MB at -3/4/5, 3 MB at -0/1). Threads beyond that
count sit idle and reserve per-thread memory for nothing — capping
trims ~21 GB on a 100 MB hyperv.zip at -9. zstd auto-scales workers
to input size, so it keeps the full compress_threads.
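The per-file cap in that paragraph reduces to threads = clamp(file_MB / block_MB, 1, compress_threads); a sketch using the block sizes listed above (function name hypothetical):

```shell
# Cap xz threads so no thread sits idle reserving memory for a block
# it will never receive. block_mb is ~3x the preset's dictionary size.
xz_file_threads() {
	local level="$1" file_mb="$2" max_threads="$3" block_mb
	case "$level" in
		9)     block_mb=192 ;;
		7|8)   block_mb=48 ;;
		6)     block_mb=24 ;;
		3|4|5) block_mb=12 ;;
		*)     block_mb=3 ;;
	esac
	local t=$(( file_mb / block_mb ))
	(( t < 1 )) && t=1                     # at least one thread
	(( t > max_threads )) && t="$max_threads"   # never exceed the global cap
	echo "$t"
}

xz_file_threads 9 100 64    # -> 1: a 100 MB hyperv.zip at -9 drives one block
xz_file_threads 3 5120 64   # -> 64: a 5 GB rootfs at -3 keeps every thread busy
```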

Stable images (BETA=no) force xz -0 — they deploy to our own infra
without a size cap and benefit most from speed. Nightly/user builds
use the elastic level to fit GitHub's 2 GB per-asset cap as snugly
as memory allows.

Diagnostics, four alerts before the loop:
  Compression host          nproc, loadavg, MemTotal, MemAvail
  Compression resource share  active peers -> threads, budget, picks
  Compression xz level walk   per-level cpu/budget skip/OK trace
  Compression input          file count + total MB

Plus per-file:
  Compressing with xz/zstd  level, threads_used/budgeted, file size,
                            block size, predicted peak MB
  Compressed                input -> output, ratio %, elapsed s, MB/s

Makes it possible to diagnose why a given build picked level X without
ssh to the runner. ZSTD_COMPRESSION_LEVEL / IMAGE_XZ_COMPRESSION_RATIO
overrides remain honored.
@igorpecovnik igorpecovnik force-pushed the compress-checksum-elastic branch from afc1e8c to 22ecfcc Compare May 4, 2026 21:10
@github-actions github-actions Bot added Needs review Seeking for review and removed Ready to merge Reviewed, tested and ready for merge labels May 4, 2026
@igorpecovnik
Member Author

Follow-up — --long=27 is now also gated, so it only kicks in when the level can actually justify the decoder cost. Pushed in 22ecfcc0a.

| zstd level | flags emitted |
|---|---|
| -9 (default on weak hosts) | `-T<N>` |
| -12 | `-T<N>` |
| -15 (user override, mid range) | `-T<N>` |
| -19 | `-T<N> --long=27` |
| -22 | `-T<N> --long=27 --ultra` |

-9 / -12 keep zstd's default window (~4–16 MB), so plain zstd -d decompresses cheap on small devices. --long=27 (128 MB decoder window) only appears at -19+, mirroring the existing --ultra gate as you suggested.
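The gating in that table can be sketched as a small flag builder (function name hypothetical; thresholds follow the commit message: --long=27 at level >= 19, --ultra at level >= 20):

```shell
# Assemble zstd flags with the decoder-cost gates described above.
zstd_flags() {
	local level="$1" threads="$2"
	local args="-${level} -T${threads}"
	(( level >= 19 )) && args+=" --long=27"   # 128 MB window only at high levels
	(( level >= 20 )) && args+=" --ultra"     # levels above 19 require --ultra
	echo "$args"
}

zstd_flags 9 8     # -> -9 -T8
zstd_flags 19 8    # -> -19 -T8 --long=27
zstd_flags 22 16   # -> -22 -T16 --long=27 --ultra
```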

@igorpecovnik igorpecovnik added Ready to merge Reviewed, tested and ready for merge and removed Needs review Seeking for review labels May 4, 2026
@igorpecovnik igorpecovnik merged commit 5644c7f into main May 5, 2026
12 checks passed
@igorpecovnik igorpecovnik deleted the compress-checksum-elastic branch May 5, 2026 10:59

Labels

05 Milestone: Second quarter release BSP Board Support Packages Desktop Graphical user interface Documentation Documentation changes or additions Framework Framework components GitHub GitHub-related changes like labels, templates, ... Ready to merge Reviewed, tested and ready for merge size/medium PR with more then 50 and less then 250 lines
