Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

video_core: Attempt to remove the accurate multiplication setting #5216

Closed

Conversation

BreadFish64
Copy link
Contributor

@BreadFish64 BreadFish64 commented Apr 16, 2020

Why?

The accurate multiplication setting has very little performance impact, but sanitize_mul doesn't work properly on some GPUs. We obviously want games to work correctly all the time, so ideally we would get rid of this setting. Removing it would also simplify the shader cache and resolve a bug where it grows constantly with accurate multiplication off.

How?

From my understanding, the problems on some GPUS happen because mix with NaN inputs is undefined behavior. The current iteration of this PR doesn't get rid of mix entirely, but if it's necessary, this should definitely work:

    bvec4 condition = equal(ivec4(0), ivec4(isnan(lhs)) & ivec4(isnan(rhs)) & ivec4(not(isnan(product))));
    return vec4(
    condition.x ? 0.0 : product.x,
    condition.y ? 0.0 : product.y,
    condition.z ? 0.0 : product.z,
    condition.w ? 0.0 : product.w
    );

I'm just less confident in some drivers' abilities to optimize this, despite the Mali Compiler claiming it's .1 cycle faster.

In its current state, this PR only enables accurate multiplication by default so we can get user reports if it's not working. If this fixes the problems, the next steps will be

  • Remove the setting
  • Gut any related code from the disk shader cache.

This change is Reviewable

@BreadFish64
Copy link
Contributor Author

BreadFish64 commented Apr 16, 2020

After some performance testing on an Intel UHD 630, I noticed that it's faster to skip the full 0 * infinity = 0 check unless the result is NaN. It could be that the compiler is able to fold multiplication with uniforms more effectively, or the branch just isn't a big deal to the vertex processor. Either way, that brings the performance penalty from ~10% to almost negligible.

@wwylele
Copy link
Member

wwylele commented Apr 17, 2020

Flipping default value in the setting and do partial merge in canary (but not nightly) is going to end badly. Please make a brand new setting for testing.

@FearlessTobi
Copy link
Contributor

FearlessTobi commented Apr 17, 2020

Untagging till then.

@B3n30
Copy link
Contributor

B3n30 commented May 9, 2020

@BreadFish64 what's the status of this PR?

@BreadFish64
Copy link
Contributor Author

BreadFish64 commented May 9, 2020

It's not a priority at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants