Skip to content

v4.0d — LTS speed-polish release

Choose a tag to compare

@YadeWira YadeWira released this 06 May 15:33
· 7 commits to master since this release

Speed update for the v4.0 LTS line. Format unchanged from v4.0c — .pjg output is byte-exact, full backward compatibility with all v4.0/4.0a/4.0b/4.0c streams.

What's new

  1. Link-Time Optimization-flto added to default build flags. Cross-translation-unit inlining of small accessors.

  2. Branch hints on the arithmetic-coder hot path. New PJG_LIKELY / PJG_UNLIKELY macros (gcc/clang __builtin_expect, MSVC fallback) annotate the non-escape branch in convert_int_to_symbol and the escape branch in convert_symbol_to_int.

  3. New make pgo target — two-phase profile-guided build. Phase 1 compiles with -fprofile-generate, runs an encode + decode workload, and emits .gcda profile data. Phase 2 rebuilds with -fprofile-use linking the profile. Override PGO_WORKLOAD to point at a richer corpus.

  4. New make native target — adds -march=native -mtune=native for max-performance binaries built from source. Distribution releases stay portable.

Benchmark (8.59 MB / 20-file JPEG corpus, mean of 5 runs)

Build Encode Decode Ratio Δ
v4.0c (baseline) 5.26 s 5.04 s 0 (ref)
v4.0d default (LTO + branch hints) 4.93 s 4.83 s 0 ✓
v4.0d make pgo (balanced) 4.50 s 4.44 s 0 ✓
  • Default: -6.3% encode / -4.2% decode, no rebuild ceremony required.
  • PGO: -14.4% encode / -11.9% decode, two-phase make.

All variants roundtrip-verified bit-exact 20/20 vs v4.0c.