v4.0d — LTS speed-polish release
Speed update for the v4.0 LTS line. Format unchanged from v4.0c — .pjg output is byte-exact, full backward compatibility with all v4.0/4.0a/4.0b/4.0c streams.
What's new
-
Link-Time Optimization —
-fltoadded to default build flags. Cross-translation-unit inlining of small accessors. -
Branch hints on the arithmetic-coder hot path. New
PJG_LIKELY/PJG_UNLIKELYmacros (gcc/clang__builtin_expect, MSVC fallback) annotate the non-escape branch inconvert_int_to_symboland the escape branch inconvert_symbol_to_int. -
New
make pgotarget — two-phase profile-guided build. Phase 1 compiles with-fprofile-generate, runs an encode + decode workload, and emits.gcdaprofile data. Phase 2 rebuilds with-fprofile-uselinking the profile. OverridePGO_WORKLOADto point at a richer corpus. -
New
make nativetarget — adds-march=native -mtune=nativefor max-performance binaries built from source. Distribution releases stay portable.
Benchmark (8.59 MB / 20-file JPEG corpus, mean of 5 runs)
| Build | Encode | Decode | Ratio Δ |
|---|---|---|---|
| v4.0c (baseline) | 5.26 s | 5.04 s | 0 (ref) |
| v4.0d default (LTO + branch hints) | 4.93 s | 4.83 s | 0 ✓ |
v4.0d make pgo (balanced) |
4.50 s | 4.44 s | 0 ✓ |
- Default: -6.3% encode / -4.2% decode, no rebuild ceremony required.
- PGO: -14.4% encode / -11.9% decode, two-phase make.
All variants roundtrip-verified bit-exact 20/20 vs v4.0c.