Performance
This release brings major rendering speed improvements across the PNG/JPEG pipeline.
Gradient Strip Precomputation
Instead of interpolating colors for every pixel, gradients now build a 1D color lookup table along the gradient line and sample from it. This eliminates redundant srgbToLinear/linearToSrgb conversions and interpolateLinearStops calls.
- Linear gradients: 8x faster (49ms → 6ms at component level)
- Radial gradients: 7x faster (54ms → 8ms at component level)
Buffer Pooling (sync.Pool)
Temporary full-viewport RGBA images (3MB+ each at 1200×630) used for opacity compositing, border-radius clipping, overflow:hidden, and border rendering are now pooled and reused instead of allocated fresh each time.
- Up to 56% memory reduction per render
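
A pooled scratch buffer along these lines might look like the sketch below. The names rgbaPool, getRGBA, and putRGBA are hypothetical and the renderer's real helpers may differ. The sketch assumes callers want a zeroed (fully transparent) image and return it to the pool once compositing is done.

```go
package main

import (
	"fmt"
	"image"
	"sync"
)

// rgbaPool holds reusable scratch images (hypothetical name).
// Storing pointers avoids an extra allocation on every Put.
var rgbaPool = sync.Pool{
	New: func() any { return new(image.RGBA) },
}

// getRGBA returns a zeroed w×h image, reusing a pooled pixel buffer
// when its capacity is large enough.
func getRGBA(w, h int) *image.RGBA {
	img := rgbaPool.Get().(*image.RGBA)
	need := 4 * w * h
	if cap(img.Pix) < need {
		img.Pix = make([]uint8, need) // fresh buffers are already zeroed
	} else {
		img.Pix = img.Pix[:need]
		for i := range img.Pix {
			img.Pix[i] = 0 // clear leftovers from the previous user
		}
	}
	img.Stride = 4 * w
	img.Rect = image.Rect(0, 0, w, h)
	return img
}

// putRGBA hands the image back. The caller must not touch it afterwards.
func putRGBA(img *image.RGBA) {
	rgbaPool.Put(img)
}

func main() {
	layer := getRGBA(1200, 630)
	fmt.Println(len(layer.Pix), "bytes in scratch layer")
	putRGBA(layer)

	// A later request of the same or smaller size can reuse the buffer.
	clip := getRGBA(600, 315)
	fmt.Println(clip.Bounds())
	putRGBA(clip)
}
```

sync.Pool may drop idle items at any garbage collection, so it reduces allocation churn without guaranteeing reuse. Zeroing on Get rather than on Put keeps correctness independent of whether a buffer was recycled.
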
Rounded Mask Cache
Supersampled anti-aliased border-radius masks are cached by (width, height, tl, tr, br, bl). Elements sharing the same dimensions and border-radius reuse a single cached mask instead of recomputing.
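
A cache keyed this way can be sketched with a comparable struct as the map key. Everything below is illustrative: maskKey, maskCache, and buildMask are hypothetical names, the 4×4 supersampling factor is an assumption, and the sketch assumes corner radii do not overlap. It holds the lock while building, which is simple but serializes concurrent misses.

```go
package main

import (
	"fmt"
	"image"
	"sync"
)

// maskKey mirrors the cache key from the notes: size plus four corner radii.
type maskKey struct{ w, h, tl, tr, br, bl int }

type maskCache struct {
	mu     sync.Mutex
	masks  map[maskKey]*image.Alpha
	builds int // number of cache misses, for illustration
}

func newMaskCache() *maskCache {
	return &maskCache{masks: make(map[maskKey]*image.Alpha)}
}

// get returns the shared mask for k, building it on first use.
// Callers must treat the returned mask as read-only.
func (c *maskCache) get(k maskKey) *image.Alpha {
	c.mu.Lock()
	defer c.mu.Unlock()
	if m, ok := c.masks[k]; ok {
		return m
	}
	m := buildMask(k)
	c.builds++
	c.masks[k] = m
	return m
}

// inside reports whether point (x, y) lies within the rounded rectangle.
func inside(x, y float64, k maskKey) bool {
	w, h := float64(k.w), float64(k.h)
	inCircle := func(cx, cy, r float64) bool {
		dx, dy := x-cx, y-cy
		return dx*dx+dy*dy <= r*r
	}
	tl, tr, br, bl := float64(k.tl), float64(k.tr), float64(k.br), float64(k.bl)
	switch {
	case x < tl && y < tl:
		return inCircle(tl, tl, tl)
	case x > w-tr && y < tr:
		return inCircle(w-tr, tr, tr)
	case x > w-br && y > h-br:
		return inCircle(w-br, h-br, br)
	case x < bl && y > h-bl:
		return inCircle(bl, h-bl, bl)
	}
	return true
}

const ss = 4 // subsamples per axis (assumed)

// buildMask computes per-pixel coverage by testing ss×ss points per pixel.
func buildMask(k maskKey) *image.Alpha {
	m := image.NewAlpha(image.Rect(0, 0, k.w, k.h))
	for y := 0; y < k.h; y++ {
		for x := 0; x < k.w; x++ {
			hits := 0
			for sy := 0; sy < ss; sy++ {
				for sx := 0; sx < ss; sx++ {
					px := float64(x) + (float64(sx)+0.5)/ss
					py := float64(y) + (float64(sy)+0.5)/ss
					if inside(px, py, k) {
						hits++
					}
				}
			}
			m.Pix[y*m.Stride+x] = uint8(hits * 255 / (ss * ss))
		}
	}
	return m
}

func main() {
	cache := newMaskCache()
	k := maskKey{w: 40, h: 20, tl: 8, tr: 8, br: 8, bl: 8}
	a, b := cache.get(k), cache.get(k)
	fmt.Println("same mask:", a == b, "builds:", cache.builds)
}
```
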
Parallel Box Blur
Shadow blur passes now split rows and columns across goroutines and access the Pix slice directly, which eliminates per-pixel method call overhead.
- Shadow blur: 4.3x faster (5ms → 1.15ms at component level)
End-to-End Results
Benchmarked on AMD Ryzen 5 5600H, 1200×630 PNG output:
| Fixture | Before | After | Improvement |
|---|---|---|---|
| Linear gradient | 145ms | 100ms | 31% faster |
| Radial gradient | 319ms | 266ms | 17% faster |
| Box shadow | 33ms | 24ms | 28% faster |
| Product pricing | 129ms | 83ms | 36% faster |
| Blog card | 73ms | 53ms | 27% faster |
| Dashboard stat | 24ms / 12MB | 19ms / 5MB | 22% faster, 58% less memory |

Component-level benchmarks were added in render/bench_test.go for ongoing tracking.