Change pixel processing to use integer arithmetic. #68
| "*1", // SCALE_1 | ||
| "*2", // SCALE_2 | ||
| "*4", // SCALE_4 | ||
| "/ 2", // DIVIDE_2 |
|
There are lots of float->integer conversions in the code: |
|
@degasus Your suggestion about float->integer conversions actually fixed the outstanding issues in NBA live. Code review ftw! :) |
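To illustrate why float->integer conversions deserve a second look, here is a minimal C++ sketch contrasting truncation with rounding when a normalized float is turned into an 8-bit value. The helper names `to_u8_truncate` and `to_u8_round` are hypothetical and do not come from this branch; which conversions the review comment actually referred to is not shown in this conversation.

```cpp
#include <cmath>

// Hypothetical helper (not from the branch): convert a normalized float in
// [0, 1] to an 8-bit value by truncation, which is what a plain cast does.
int to_u8_truncate(float x)
{
    return static_cast<int>(x * 255.0f);
}

// Hypothetical helper (not from the branch): same conversion, but rounding
// to the nearest integer before the cast.
int to_u8_round(float x)
{
    return static_cast<int>(std::round(x * 255.0f));
}
```

With these helpers, an input of 0.999f gives 254 when truncated and 255 when rounded, so a value that is only slightly off after floating-point math can land on a different integer depending on the conversion used.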
| indtevtrans[0] = s * indcoord[1]; | ||
| indtevtrans[1] = t * indcoord[1]; | ||
| shift = (17 - scale); | ||
| indtevtrans[0] = s * indcoord[1] / 256; |
|
I addressed all but two of @degasus's comments. Apart from these (for which I'll wait for an answer), are there any other things blocking the merge? |
|
Breaks Qualcomm :P |
|
Onoes, breaks Qualcomm support, let's cancel the plans to merge TFN!!1!111!!1!111! |
|
Rebased the branch on current master and went through the patches again so that (for the most part) all of the patches are regression-free now. I also reworded some of the commit messages, since some of them don't apply anymore after these fixes (e.g. there was a series of three commits which was known to be broken at the time, but likely is not anymore, hence I removed the note from the commit message). For reference, the old branch was revision 83ba475. I'll have this branch tested a final time by JMC, then it should be good to go. |
|
LGTM |
|
Not that I'm really paying any attention to Dolphin at the moment, but are you really merging a branch that "slows down performance quite a bit"? For a project which emphasizes rendering at high IR, slowdown at high IR sounds like a big deal. (Yes, I know, accuracy is usually a good thing. But what kind of regression are we talking about? Any approximate benchmarks on different GPUs?) |
|
@neobrain Are you sure that this "integer divide by 255" is accurate? I haven't found any accurate way to optimize such a division, neither in software nor in hardware, so I doubt the Wii GPU does it that way. |
|
@degasus How else would I implement linear interpolation between two U8s? |
| "}\n"); | ||
|
|
||
| out.Write( "int idot(int4 x, int4 y)\n" | ||
| "{\n" |
|
Hm, I don't see another way to interpolate. But I still doubt the hardware does it in the current way either. |
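For readers following the interpolation question, this is a minimal C++ sketch of a linear interpolation between two U8 values using an integer divide by 255. The function name `lerp_u8` and the exact formula are assumptions for illustration; the sketch is not taken from the branch and makes no claim about what the hardware does.

```cpp
// Hypothetical sketch (not from the branch): interpolate between two 8-bit
// values a and b with an 8-bit weight c. A weight of 0 yields a, a weight of
// 255 yields b. The intermediate sum is at most 255 * 255, so it fits in int.
int lerp_u8(int a, int b, int c)
{
    return (a * (255 - c) + b * c) / 255;
}
```

Dividing by 255 rather than shifting by 8 is what lets both endpoints be reached exactly: `lerp_u8(10, 200, 0)` is 10 and `lerp_u8(10, 200, 255)` is 200. Whether Flipper computes it this way is exactly the open question in the comments above.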
| } | ||
| if (cc.clamp) | ||
| out.Write(", 0.0, 1.0)"); | ||
| out.Write(", int3(0,0,0), int3(255,255,255))"); |
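The diff fragment above swaps a clamp to the float range 0.0..1.0 for a clamp to the integer range 0..255 per component. A rough C++ equivalent of the integer version might look like the sketch below; the `Int3` struct and `clamp_u8` function are hypothetical stand-ins for the shader's `int3` type and are not part of the branch.

```cpp
#include <algorithm>

// Hypothetical stand-in for the shader's int3 type.
struct Int3
{
    int r, g, b;
};

// Hypothetical sketch: clamp each component to the 8-bit range [0, 255],
// mirroring the int3(0,0,0) / int3(255,255,255) bounds in the diff.
Int3 clamp_u8(Int3 v)
{
    return Int3{
        std::min(std::max(v.r, 0), 255),
        std::min(std::max(v.g, 0), 255),
        std::min(std::max(v.b, 0), 255),
    };
}
```

For example, `clamp_u8(Int3{-5, 128, 300})` yields the components 0, 128 and 255.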
|
@comex Ask JMC on IRC to get a picture of how big the regression is. AFAIK it's not "devastating" and the performance penalty scales with the IR, so 1x IR performance is mostly unaffected on reasonably fast GPUs. That said, future generations of GPUs will likely get better at integer math. |
|
@delroth Good enough? |
| object.Write("lacc.%s += %sdot(ldir, _norm0)) * " LIGHT_COL";\n", | ||
| swizzle, chan.diffusefunc != LIGHTDIF_SIGN ? "max(0.0," :"(", LIGHT_COL_PARAMS(lightsName, index, swizzle)); | ||
| object.Write("lacc.%s += int%s(round(%sdot(ldir, _norm0)) * float%s(" LIGHT_COL")));\n", | ||
| swizzle, swizzle_components, chan.diffusefunc != LIGHTDIF_SIGN ? "max(0.0," :"(", |
…ll stored as float, though).
A few vague lines of comments cannot replace an afternoon reading of how TEV works.
…d with hardware tests.
The prefix was just required in the development stage to reduce the risk of regressions.
Most of these weren't even introduced by me, but hey - I'm nice and love wasting my time :p
Change pixel processing to use integer arithmetic.
|
Is it just me, or does this update actually improve speed for a lot of games? I'm running a Geforce 9800 and it seems like suddenly I'm getting full speed in F-Zero, Prime 2, and Mario Sunshine. Maybe it's just in my head... |
|
@LoganStromberg Might be that your 9800 was always running in low-power mode (because the nvidia driver doesn't recognize Dolphin as a demanding application). Now that GPU usage is up, the driver puts Dolphin into high-perf mode, hence (maybe) the increase in performance for you. |
Flipper uses fixed-point arithmetic for most of the pixel-processing calculations, while Dolphin currently uses floating-point arithmetic all over the place. There are three major issues with this:
value - 2.0 * round(0.5 * value * (256.0/255.0)) * (255.0/256.0) is the code theoretically required to emulate overflows of unsigned 8-bit integers. With this branch, we can just do "value & 0xFF" instead.
This branch attempts to fix all of these issues by using actual integer arithmetic in our shaders instead of floating-point arithmetic. The most important implications of this are:
Second, regressions in the games NBA Live 2005/2006 have been reported where the floor is very glitchy (making that game unbearable to play). Another game has been reported to be broken slightly, but I already debugged the issue and found it to suffer from the same issue as NBA Live.
Those latter regressions have been fixed after initial code review :) Only the regression in Simpsons remains, but I consider the number of fixes in this branch to clearly outweigh that issue.
The pipeline stages affected by this change are:
I hope the code is clear enough for everyone to make sense out of it. I fear actual understanding of it requires understanding of the Flipper GPU, but I'll gladly explain any questions about it.
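As a small illustration of the unsigned 8-bit overflow point made in the description, the following C++ sketch shows the integer form of the wraparound. The helper names `wrap_u8` and `add_wrap_u8` are hypothetical and only demonstrate the "value & 0xFF" idea; they are not code from the branch.

```cpp
// Hypothetical helper: keep only the low 8 bits of a value, which is how an
// unsigned 8-bit quantity behaves when it overflows.
int wrap_u8(int value)
{
    return value & 0xFF;
}

// Hypothetical helper: add two 8-bit values with wraparound.
int add_wrap_u8(int a, int b)
{
    return wrap_u8(a + b);
}
```

For instance, `add_wrap_u8(200, 100)` gives 44, because 300 wraps around past 255. With integers the mask is a single operation and is exact, which is the simplification the description refers to.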
NOTE: Please don't merge this branch until I give a green light. A blog article is still being worked on, and we want to publish that one roughly at the same time this branch gets merged.