Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change pixel processing to use integer arithmetic. #68

Merged
merged 45 commits into from Mar 14, 2014

Conversation

neobrain
Copy link
Member

Flipper uses fixed-point arithmetic for most of the pixel-processing calculations, while Dolphin currently uses floating-point arithmetic all over the place. There are three major issues with this:

  • integer overflows need to be emulated with cumbersome hacks. E.g. value - 2.0 * round(0.5 * value * (256.0/255.0)) * (255.0/256.0) is the code theoretically required to emulate overflows of unsigned 8 bit integers. With this branch, we can just do "value & 0xFF" instead.
  • integers don't have a fractional part, while floats of course do: This is causing huge issues with the alpha testing code. E.g. when the alpha test only succeeds if the test value is EQUAL to the reference value, the current Dolphin code hence checks if the test value is "close enough" to the reference value (with a very badly defined sense of "closeness"). This branch can use the == operator for this just fine.
  • 32 bit floats only have 23 bits of mantissa, hence they cannot be used to describe 24 bit integers properly.

This branch attempts to fix all of these issues by using actual integer arithmetic in our shaders instead of floating-point arithmetic. The most important implications of this are:

  • Vastly improved compatibility: More than 20 games have been reported to be in some way fixed - it's expected that a much larger number of games is in some way affected. Usually the previous code had small mistakes which only showed up slightly (e.g. the goop bubbles in Super Mario Sunshine), other issues were very noticeable (e.g. the pink water highlights in Zelda WW) but harmless, and in quite a few cases objects just completely disappeared (e.g. due to the alpha testing issues pointed out above).
  • Vastly cleaner code. This is because we don't have to use the incredibly confusing code required to emulate integer overflows with floats anymore. Additionally, a lot of magic numbers which were required for proper normalization before are no longer needed, or at least make actual sense now (e.g. the magic numbers used for calculating the texture coordinate output of indirect tev stages were just plain wrong before).
  • Decreased performance: Integers have a small performance penalty compared to floats (tl;dr; GPUs use the same ALUs for both, hence ALUs sometimes need to be double-pumped for proper integer functionality) and hence this branch slows down performance quite a bit. The penalty scales with the IR, so hopefully the slowdown is not too dramatic at native resolution. (Note that this slowdown is only partially due to the reduced integer performance, it's also caused by the more accurate implementation - hence a "floating point" version of this branch would've seen a similar slowdown)
  • Regressions: There is a very small number of known regressions in this branch. First, the Simpsons ( http://code.google.com/p/dolphin-emu/issues/detail?id=4570 ) is completely broken now (more consistent with the software renderer though, so I'd guess we're still lacking understanding of hardware there). Second, regressions in the games NBA Live 2005/2006 have been reported where the floor is very glitchy (making that game unbearable to play). Another game has been reported to be broken slightly, but I already debugged the issue and found it to suffer from the same issue as NBA live. Those latter regressions have been fixed after initial code review :) Only the regression in Simpsons remains, but I consider the number of fixes in this branch to clearly outweigh that issue.

The pipeline stages affected by this change are:

  • The whole tev pipeline: inputs and outputs of color combiners in direct stages; calculated texture coordinates from the indirect stages; konstant colors (sic); alpha bump values; ...
  • alpha testing (which uses the tev output as its input)
  • lighting: Materials and calculated color values are integers now.
  • ConstantManager: Most of the pixel shader inputs are integers now.
  • z textures: depth is converted to int, then z textures are applied, then it's converted back to float.
  • fog: fog color is integer now
  • probably a few other things that I just missed

I hope the code is clear enough for everyone to make sense out of it. I fear actual understanding of it requires understanding of the Flipper GPU, but I'll gladly explain any questions about it.

NOTE: Please don't merge this branch until I give a green light. A blog article is still being worked on, and we want to publish that one roughly at the same time this branch gets merged.

"*1", // SCALE_1
"*2", // SCALE_2
"*4", // SCALE_4
"/ 2", // DIVIDE_2

This comment was marked as off-topic.

This comment was marked as off-topic.

@degasus
Copy link
Member

degasus commented Feb 15, 2014

There are lots of float->integer converstion in the code:
int myint = int(myfloat);
But everytime we've calucate anything with the float, we must round to get the correct result:
int myint = int(round(myfloat));

@neobrain
Copy link
Member Author

@degasus Your suggestion about float->integer conversions actually fixed the outstanding issues in NBA live. Code review ftw! :)

indtevtrans[0] = s * indcoord[1];
indtevtrans[1] = t * indcoord[1];
shift = (17 - scale);
indtevtrans[0] = s * indcoord[1] / 256;

This comment was marked as off-topic.

This comment was marked as off-topic.

@neobrain
Copy link
Member Author

neobrain commented Mar 1, 2014

I addressed all but two of @degasus's comments. Apart from these (for which I'll wait for an answer), are there any other things blocking the merge?

@Sonicadvance1
Copy link
Contributor

Breaks Qualcomm :P

@delroth
Copy link
Member

delroth commented Mar 2, 2014

Onoes, breaks Qualcomm support, let's cancel the plans to merge TFN!!1!111!!1!111!

@neobrain
Copy link
Member Author

neobrain commented Mar 4, 2014

Rebased branch on current master and went through the patches again such that (for the most) all of the patches are regression-free now. I also reworded some of the commit messages, since some of them don't apply anymore after these fixes (e.g. there was a series of three commits which was known to be broken at the time, but likely is not anymore, hence I removed the note from the commit message).

For reference, the old branch was revision 83ba475 .

I'll have this branch tested a final time by JMC, then it should be good to go.

@degasus
Copy link
Member

degasus commented Mar 5, 2014

LGTM

@comex
Copy link
Contributor

comex commented Mar 5, 2014

Not that I'm really paying any attention to Dolphin at the moment, but are you really merging a branch that "slows down performance quite a bit"? For a project which emphasizes rendering at high IR, slowdown at high IR sounds like a big deal.

(Yes, I know, accuracy is usually a good thing. But what kind of regression are we talking about? Any approximate benchmarks on different GPUs?)

@degasus
Copy link
Member

degasus commented Mar 5, 2014

@neobrain Are you sure that this "integer divide by 255" are accurate? I haven't found any accurate way to optimize such a division, neither for software nor hardware, so I doubt the wii gpu do.

@neobrain
Copy link
Member Author

neobrain commented Mar 5, 2014

@degasus How else would I implement linear interpolation between two U8s?

"}\n");

out.Write( "int idot(int4 x, int4 y)\n"
"{\n"

This comment was marked as off-topic.

@degasus
Copy link
Member

degasus commented Mar 5, 2014

Hm, I don't see another way to interpolate. But I still doubt the hardware does it in the current way either.

}
if (cc.clamp)
out.Write(", 0.0, 1.0)");
out.Write(", int3(0,0,0), int3(255,255,255))");

This comment was marked as off-topic.

This comment was marked as off-topic.

@neobrain
Copy link
Member Author

neobrain commented Mar 6, 2014

@comex Ask JMC on IRC to get a picture of how big the regression is. AFAIK it's not "devastating" and the performance penalty scales with the IR, so 1x IR performance is mostly unaffected on reasonably fast GPUs.

That said, future generations of GPUs will likely get better at integer math.

@neobrain
Copy link
Member Author

neobrain commented Mar 6, 2014

@delroth Good enough?

object.Write("lacc.%s += %sdot(ldir, _norm0)) * " LIGHT_COL";\n",
swizzle, chan.diffusefunc != LIGHTDIF_SIGN ? "max(0.0," :"(", LIGHT_COL_PARAMS(lightsName, index, swizzle));
object.Write("lacc.%s += int%s(round(%sdot(ldir, _norm0)) * float%s(" LIGHT_COL")));\n",
swizzle, swizzle_components, chan.diffusefunc != LIGHTDIF_SIGN ? "max(0.0," :"(",

This comment was marked as off-topic.

A few vague lines of comments cannot replace an afternoon reading of how TEV works.
The prefix was just required in the development stage to reduce the risk of regressions.
Most of these weren't even introduced by me, but hey - I'm nice and love wasting my time :p
neobrain added a commit that referenced this pull request Mar 14, 2014
Change pixel processing to use integer arithmetic.
@neobrain neobrain merged commit a9a8c73 into dolphin-emu:master Mar 14, 2014
@lostromb
Copy link

Is it just me, or does this update actually improve speed for a lot of games? I'm running a Geforce 9800 and it seems like suddenly I'm getting full speed in F-Zero, Prime 2, and Mario Sunshine. Maybe it's just in my head...

@neobrain
Copy link
Member Author

@LoganStromberg Might be that your 9800 was always running in low-power mode (because the nvidia driver doesn't recognize Dolphin as a demanding application). Now that GPU usage is up, the driver puts Dolphin into high-perf mode, hence (maybe) the increase in performance for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
8 participants