Big Picture uses unoptimised shaders (YUV, particle) on Linux and probably Mac #2574

Closed
johndrinkwater opened this Issue Jun 8, 2013 · 7 comments

Member

johndrinkwater commented Jun 8, 2013

Reported by @siplus in a pull request #2569

YUV shader

In the YUV shader used by Big Picture on Linux (and, I think, Mac), I replaced scalar operations with vector ones, because vector operations are mostly single instructions and are faster than their unrolled scalar equivalents.

Here is the original code (uniforms, varyings and compiler directives are omitted):

void main (void) 
{
    vec2 texHalf = tex.st/2;
    float y = texture2DRect( Texture0, tex.st ).r;
    float u = texture2DRect( Texture1, texHalf ).r;
    float v = texture2DRect( Texture2, texHalf ).r;

    y = 1.1643*(y-0.0625);
    u = u-0.5;
    v = v-0.5;

    gl_FragColor.r = y+1.5958*v;
    gl_FragColor.g = y-0.39173*u-0.81290*v;
    gl_FragColor.b = y+2.017*u;
    gl_FragColor.a = 1.0;
}

And here is the vectorized code:

const vec3 mulRed = vec3( 1.0, 0.0, 1.5958 );
const vec3 mulGreen = vec3( 1.0, -0.39173, -0.8129 );
const vec3 mulBlue = vec3( 1.0, 2.017, 0.0 );

void main (void) 
{
    vec2 texHalf = tex.st * 0.5;
    vec3 yuv = vec3( 
        1.1643 * texture2DRect( Texture0, tex.st ).r + 0.42723,
        texture2DRect( Texture1, texHalf ).r,
        texture2DRect( Texture2, texHalf ).r ) - 0.5;
    gl_FragColor = vec4( dot( yuv, mulRed ), dot( yuv, mulGreen ), dot( yuv, mulBlue ), 1.0 );
}

As you can see, I removed the separate scalar steps for y, u and v. dot also usually compiles to a single instruction (often one cycle), so I used it instead of manual multiply-add sequences.

The constant 0.42723 is 0.5 - 1.1643 * 0.0625, precalculated so that a single - 0.5 can be applied to the whole vector.
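One way to sanity-check the precalculated constant and the rewrite is to compare both formulas numerically on a few inputs. The sketch below is in Python rather than GLSL so it can be run outside a driver. The function names and sample values are made up for illustration; only the coefficients come from the shader code above.

```python
# Numeric comparison of the scalar and vectorized YUV -> RGB formulas.
# Hypothetical helper names; coefficients are copied from the shader snippets.

MUL_RED = (1.0, 0.0, 1.5958)
MUL_GREEN = (1.0, -0.39173, -0.8129)
MUL_BLUE = (1.0, 2.017, 0.0)

# Arbitrary (y, u, v) samples in the [0, 1] range a texture lookup returns.
SAMPLES = [
    (0.0, 0.0, 0.0),
    (0.0625, 0.5, 0.5),
    (0.5, 0.25, 0.75),
    (1.0, 0.0, 1.0),
    (1.0, 1.0, 1.0),
]


def dot(a, b):
    """Three-component dot product, standing in for GLSL dot()."""
    return sum(x * y for x, y in zip(a, b))


def yuv_to_rgb_scalar(y, u, v):
    """Mirrors the original shader: one scalar step per component."""
    y = 1.1643 * (y - 0.0625)
    u = u - 0.5
    v = v - 0.5
    return (y + 1.5958 * v,
            y - 0.39173 * u - 0.81290 * v,
            y + 2.017 * u)


def yuv_to_rgb_vector(y, u, v):
    """Mirrors the vectorized shader: bias folded into y, then three dots."""
    yuv = (1.1643 * y + 0.42723 - 0.5, u - 0.5, v - 0.5)
    return (dot(yuv, MUL_RED), dot(yuv, MUL_GREEN), dot(yuv, MUL_BLUE))


def max_abs_difference(samples):
    """Largest per-channel difference between the two versions."""
    worst = 0.0
    for sample in samples:
        scalar = yuv_to_rgb_scalar(*sample)
        vector = yuv_to_rgb_vector(*sample)
        worst = max(worst, max(abs(a - b) for a, b in zip(scalar, vector)))
    return worst
```

Because 0.42723 is a rounded value, the two versions are not expected to match bit for bit, only to within a tolerance far below one 8-bit colour step.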

The same unoptimized code (with some very minor differences in texture lookups) is also used in fancyquaduber.frag at #elif defined(TEXTURETYPE_YUV), so the same optimization can be applied there. Maybe I'll optimize that shader later too, but it's really huge.

Particle shader

The particle shader in the pull request is optimized much more heavily than the YUV shader.

13 statement lines with branching were replaced with 4 lines without branching.

The original code:

    vec4 texcol = color;

    vec2 uv = tex.st - 0.5;

    float radius = sqrt( dot( uv, uv ) );

    float flSharpRadius = ( clamp( particleSharpness, 0.0, 0.98 ) ) / 2.0; 
    float alpha = 1.0; 
    if ( radius < flSharpRadius )
    {
        alpha = 1.0;
    }
    else
    {
        alpha = clamp( (1.0 - ( (radius - flSharpRadius) / (0.5 - flSharpRadius ) ) ), 0.0, 1.0 );
    }

    gl_FragColor.r = color.r * color.a * alpha;
    gl_FragColor.g = color.g * color.a * alpha;
    gl_FragColor.b = color.b * color.a * alpha;
    gl_FragColor.a = color.a * alpha;

The optimized code:

    vec2 uv = tex.st - 0.5;
    float radius = sqrt( dot( uv, uv ) );
    float flSharpRadius = clamp( particleSharpness, 0.0, 0.98 ) * 0.5; 
    gl_FragColor = color * vec4( color.aaa, 1.0 ) *
        mix( 1.0, clamp( 1.0 - ( radius - flSharpRadius ) / ( 0.5 - flSharpRadius ), 0.0, 1.0 ), step( flSharpRadius, radius ) );

Branching can hurt performance badly, because pixels are processed in parallel batches. It usually costs much more than redundant arithmetic and clamping.

If there's no branching, the GPU can decode and use one instruction for multiple pixels.

If there is branching and pixels in the same batch take different paths, the GPU has to execute the instructions of both paths for that batch.
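To check that the branch-free particle expression computes the same alpha as the original if/else, the two can be compared on a grid of inputs. This is a Python sketch rather than GLSL. clamp, step and mix are re-implemented here following their GLSL definitions, and the function names and grid are assumptions made for illustration.

```python
# Compare the branching and branch-free alpha computations of the particle shader.
# clamp/step/mix follow the GLSL built-in definitions; other names are hypothetical.


def clamp(x, lo, hi):
    """GLSL clamp(): min(max(x, lo), hi)."""
    return min(max(x, lo), hi)


def step(edge, x):
    """GLSL step(): 0.0 if x < edge, else 1.0."""
    return 0.0 if x < edge else 1.0


def mix(a, b, t):
    """GLSL mix(): linear blend a * (1 - t) + b * t."""
    return a * (1.0 - t) + b * t


def alpha_branching(radius, sharpness):
    """Mirrors the original shader with its if/else."""
    sharp_radius = clamp(sharpness, 0.0, 0.98) / 2.0
    if radius < sharp_radius:
        return 1.0
    return clamp(1.0 - (radius - sharp_radius) / (0.5 - sharp_radius), 0.0, 1.0)


def alpha_branchless(radius, sharpness):
    """Mirrors the optimized shader: mix() selected by step()."""
    sharp_radius = clamp(sharpness, 0.0, 0.98) * 0.5
    falloff = clamp(1.0 - (radius - sharp_radius) / (0.5 - sharp_radius), 0.0, 1.0)
    return mix(1.0, falloff, step(sharp_radius, radius))


def max_alpha_mismatch(steps=50):
    """Largest difference over a grid of radius and sharpness values."""
    worst = 0.0
    for i in range(steps + 1):
        radius = 0.75 * i / steps              # covers the quad corner (~0.707)
        for j in range(steps + 1):
            sharpness = -0.5 + 2.0 * j / steps  # includes out-of-range values
            diff = abs(alpha_branching(radius, sharpness)
                       - alpha_branchless(radius, sharpness))
            worst = max(worst, diff)
    return worst
```

A check like this only shows that the results match. It says nothing about which form a given driver's compiler turns into faster code, which would need profiling on real hardware.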

Triang3l commented Jun 8, 2013

Why doesn't the name say ANYTHING about code being attached?

Please let @alfred-valve choose what to do with the pull request.

Member

johndrinkwater commented Jun 8, 2013

The name doesn’t need to; when the bug is read, it’ll be considered. Alfred doesn’t work on Big Picture. I’ve asked you to read our conduct policy; please do so.

@ghost ghost self-assigned this Jun 9, 2013

I added one more commit to the pull request referenced in this issue.

Mostly micro-optimizations (such as 1-argument vector constructors), but I also merged my YUV code with fancyquaduber.frag.

You can test the shaders by overriding them while Steam is running in desktop mode.
Restarting Steam brings back the old shaders.
chattr +i doesn't work; it only makes Steam try to update endlessly.

Owner

Plagman commented Jul 29, 2013

Hi SiPlus,

Thanks a lot for your interest. Do you have any data showing real performance gains from these changes? As far as the particle shader goes, these aren't "real" branches, and the compiler should fold them into a construct similar to your refactored version. It's just a lot more legible to have the shaders laid out that way; I'm afraid merging this would only make the code harder to grok. Thanks for taking the time to look into it, however.

@Plagman Plagman closed this Jul 29, 2013

Triang3l commented Aug 9, 2013

Never rely on the compiler in OpenGL, because every driver has its own GLSL compiler. There is no single shared compiler like Direct3D's FXC.
