Make PCF Loop Bounds Constant #22
Labels
Priority: Low
Project: Renderer
Issues relating to the core hair renderer itself.
Type: Optimize
Suggestion to optimize a specific algorithm.
Type: Refactor
Make this thing a bit less shitty, pretty please!
I've found that we can save 3-4ms on the strand self-shadows of the ADSM approach if the PCF kernel radius is known at shader compilation time; it looks like `glslc` is unrolling the loop and making more assumptions. I know for sure this makes a difference on my laptop, but I still need to profile the benefits on the AMD machine (I've heard that performance can sometimes even decrease on GCN if the compiler is too aggressive with loop unrolling, so I'll profile to make sure this is worth it).

For reference, on my laptop, in the worst case (i.e. hair occupies the entire screen), rendering the ponytail takes ~9.8ms. Without any shadows it takes 5.8ms, and without any shading at all 4.8ms (an unavoidable cost of rasterizing 1.8M lines). So a performance breakdown gives us:

- Rasterization: 4.8ms
- Shading (excluding shadows): 5.8ms - 4.8ms = 1.0ms
- Self-shadows (PCF): 9.8ms - 5.8ms = 4.0ms

With constant loop bounds the entire pass instead takes ~6.8ms, and therefore we get these results:

- Rasterization: 4.8ms
- Shading (excluding shadows): 1.0ms
- Self-shadows (PCF): 6.8ms - 5.8ms = 1.0ms
For further reference, on the beefy AMD machine, the unoptimized version (the one that took 9.8ms on my laptop) takes around 1.5ms for the entire pass. I haven't tested the optimized version there yet, but if we assume the speedup translates to AMD hardware as well, we'd go from 1.5ms to ~1ms. That would leave us more room to do other cool things (and we still need to leave time for the OIT), so I think this optimization is worth trying out.
I'll see if I can make the offending calculation go away, or just assume that we use 3x3 kernels all the time. Another way would be to use specialization constants and re-compile the shaders whenever the PCF radius is modified.
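As a rough sketch of the specialization-constant idea, the fragment shader below takes the PCF radius from a `constant_id` so the loop bounds are constant by the time the shader is turned into GPU code. This is not the renderer's actual shader: `PCF_RADIUS`, `shadow_map`, `light_space_position` and `pcf_visibility` are all made-up names, and it assumes a plain hardware-compared shadow map (`sampler2DShadow`) rather than the real ADSM lookup.

```glsl
#version 450

// Hypothetical shader: only the constant_id mechanism is standard Vulkan GLSL.
// The host can override constant 0 through VkSpecializationInfo when creating
// the pipeline; the default of 1 corresponds to a 3x3 kernel.
layout(constant_id = 0) const int PCF_RADIUS = 1;

// Assumed binding: a depth texture sampled with hardware depth comparison.
layout(set = 0, binding = 0) uniform sampler2DShadow shadow_map;

// Assumed input: fragment position in light space, already biased to [0, 1].
layout(location = 0) in  vec4 light_space_position;
layout(location = 0) out vec4 color;

// Averages the depth-comparison results over a (2r+1) x (2r+1) kernel.
float pcf_visibility(vec3 shadow_coord) {
    vec2 texel_size = 1.0 / vec2(textureSize(shadow_map, 0));
    float visibility = 0.0;

    // Both loop bounds depend only on PCF_RADIUS, so they are constant
    // once the specialization value has been supplied.
    for (int y = -PCF_RADIUS; y <= PCF_RADIUS; ++y) {
        for (int x = -PCF_RADIUS; x <= PCF_RADIUS; ++x) {
            vec2 offset = vec2(x, y) * texel_size;
            visibility += texture(shadow_map,
                                  vec3(shadow_coord.xy + offset, shadow_coord.z));
        }
    }

    int kernel_width = 2 * PCF_RADIUS + 1;
    return visibility / float(kernel_width * kernel_width);
}

void main() {
    vec3 shadow_coord = light_space_position.xyz / light_space_position.w;
    color = vec4(vec3(pcf_visibility(shadow_coord)), 1.0);
}
```

One thing to keep in mind with this route: the value is supplied at pipeline creation rather than when `glslc` runs, so any unrolling would happen in the driver's compiler, and changing the radius means rebuilding the pipeline rather than re-running `glslc`. Whether the driver unrolls as aggressively as `glslc` does with a literal constant would have to be profiled.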