New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reimplement Bounding Box calculation using the software renderer. #1095
Conversation
|
I'll test this. Are there any other known games than the Paper Mario series to check? |
|
Yes, Mickey's Magical Mirror and Disney's Hide & Sneak also use Bounding Box for the character's shadows. |
|
Do I have to do anything special to enable it? In Paper Mario and the Thousand Year Door, the punies were glitched up even worse than master when I went and tried it? I don't have any saves in other areas where there were problems at the moment, so I haven't been able to test them. |
|
One way to implement z testing would be to cache all triangle to be rendered. Then when you want to check the depth of a pixel, you work out which triangles intercept that pixel and calculate their depths at that pixel. Wouldn't even need to run any of Tev for those triangles, unless they have late_depth and ztextures. |
|
@JMC47 that is strange, punies are working for me! It needs EFB to RAM enabled to work, though. @phire I'm not sure I understand your idea. z-testing is harder because it depends on what is rendered before bbox is calculated, and since rendering is done in hardware I'm not really sure how to implement that. Also, I'll be pulling a new commit that allows the code to build on Linux (I forgot to add BoundingBox.cpp to CMake). |
|
I must have done something wrong the first time, I replayed up there and they were fine. Maybe a settings issue! I made sure to be on EFB2Ram the second time. Nevermind my first response. Everything appears to be working correctly, though we probably need to force EFB2Ram in Bounding Box games |
|
@crudelios But just doing the caching and transforming the vertices in software could get quite costly. |
|
There's really absolutely no performance issue in Paper Mario: The Thousand Year Door - I still get well over 200 fps in the game. |
|
In general, how many triangles are generally inside a bounding box query? Because if that number is lots, you might be better of doing a delayed calculation. Sort all the triangles into a spacial data structure until the bounding box is queried and then you should be able to get away with only running the outer fringes of the outer polygons through TEV. |
|
Can you please test the speed? It's rather slow on my aging Intel Core i5 750 at stock speeds. Mario's flip mode is especially prone to slowdowns, as well as entering/exiting pipes. @phire from what I can tell, there aren't many triangles drawn during a bounding box query. I'll confirm it though! |
|
mmhmm; I didn't notice it. During transitions there's a heavy slowdown (to about 80ish fps) for me, but I didn't notice because I was still above full speed. Again, sometimes I don't know what I'm performance testing! So, this does bring a fairly hard hit to speed then. If I disable bounding box emulation, it's closer to 124 - 130 fps during a transition. With it on, I drop to around 78 - 84 fps during transitions. |
| // four loops are run: one for the top pixel, one for the left, one for | ||
| // the bottom and one for the right. As soon as a pixel that is to be | ||
| // drawn is found, the loop breaks. This enables a ~150% speedbost in | ||
| // bbox calculation, albeit at the cost of some ugly repetitive code. |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
|
It looks like you are currently triggering the bounding box calculations during vertex loading, which is delaying the delivery of vertices to the GPU and potentially starving it of things to render. The introduction of Software renderer and Tev into the vertex translation loop also significantly increases the L1 instruction cache usage of this loop (which was already very expensive), delaying the movement of vertices to the GPU even further. There are potential speed increases to be found by spitting this loop into two parts, Transforming the vertices and sending them to the gpu, then while the gpu is busy rendering those vertices you loop over them again and send them through software renderer for the bounding box calculations. From that point (if there were still performance issues), it would be reasonably easy to split the bounding box calculations into a separate thread, to run on another cpu core. |
|
@dolphin-emu-bot rebuild |
|
You should look into those lint errors. |
|
@phire I had no idea includes had to be added alphabetically, hopefully by changing their order the problem will be fixed. It compiles on linux here, so I thought it would build. Guess I was wrong. About your suggestions, I checked the numbers and, in average, ~60 triangles are drawn during a bbox query in Paper Mario (min was 22, max was about 160). Delaying the calculation might actually work, if I manage to figure out a good and cheap way to sort the triangles. Also, I can change the code so vertexes are only checked right after they are flushed to the renderer and see if that improves speed. |
|
@phire I've been trying the suggestions you mentioned and unfortunately for many of them I didn't manage to get any increase in performance. Multi-threading didn't help either, as bbox calculations went haywire (that could probably be fixed with proper thread signalling) and the performance didn't improve. Skipping tev when alpha testing is disabled is a great idea and it does increase performance in various cases (not in Mario's flip mode or pipe transitions, though). |
|
So, we're at the point where this is more accurate, but makes things much slower? I can confirm this fixes all the outstanding glitches in Paper Mario Thousand Year Door that I know about. |
|
Yes, that's it. I have a few small optimizations pending, and I've been trying different approaches without any luck so far. |
|
Small git question, is it ok to rebase my branch (using "git rebase -i upstream/master") so the PR merges? |
|
@crudelios: Once comments are addressed, it should be ok to do so. Try not to rebase when there are still comments pending, or they will be hidden/dropped from the diffs. |
|
This looks good to me now. I tested the latest version and it doesn't dip down as much anymore. I don't drop below 150 - 170 fps during transitions now. |
|
@FioraAeterna if you want, issue 6168 has a savefile for Paper Mario TTYD (NTSC version) where you can test the bounding box performance (use mario's flip mode or enter pipes to trigger bounding box calculation). @BhaaLseN thanks, I won't rebase for now then! |
|
I've been testing pipe positions. Originally in this PR, I was getting around 70 fps during transitions, and now it drops to about 150 fps. I don't know if there was anything to make it faster though. Does internal resolution or anything matter? Or did you really optimize it that much with small optimizations? |
|
Hummm... I hasn't expecting such a speed improvement. I did manage to reduce the number of alpha tests, but nothing major. In my computer I get a small speed difference. However, I noticed that using dual core makes FPS differ from VPS, so the game may seem to go faster when in fact it goes slower. That may be the cause for such a perceived jump in performance. Try disabling dual core, that way FPS will always be equal to VPS and the actual performance hit should be more noticeable. Also, I think Mario's flip mode is better for testing bbox than pipe transitions, because you can use it for as long as you want/where you want. |
|
I'm keeping an eye on the FPS/VPS; I know to watch for it! But, you're right, I'll try it and observe behaviors. If the performance is what I think it is compared to master, then I think this would be okay with the current performance. |
|
@crudelios You are correct about the dualcore thing, that's nasty, it doesn't show up in FPS/VPS, but does show up in gameplay. Turning on Dolphin's FPS counter helps, though. I am also correct; Internal resolution affects the speed. At 1x IR, there is absolutely no speed loss on my computer. at 2x IR, it starts to seriously cause slowdowns (Single Core, but, seems to be about the same as dualcore anyway.) to about 70 fps. at 3x IR it drops to about 34 fps. Is this acceptable? That's up to you and reviewers, really. I think it is. |
|
This patch shouldn't be effected by IR, software renderer has a fixed 1x IR. |
|
@crudelios I never benchmarked Master for some reason. If it's only that little of a tradeoff, I really think it should be okay. It can be optimized later if need be. |
|
@degasus when used, bbox values are typically read once or twice per frame. Bbox itself is only used sparingly in 4 games. I must confess I didn't know about atomicMin/atomicMax, I only checked atomic counters which don't really work. After reading about atomicMin/atomicMax, I feel that using them poses two problems: first, they are slow as they will be called for every drawn pixel when bbox is active, which will stall the gpu as atomicMin/atomicMax cannot be run in paralel. I don't know enough about shaders to be sure, though. |
| @@ -312,11 +313,27 @@ static void BuildBlock(s32 blockX, s32 blockY) | |||
| } | |||
| } | |||
|
|
|||
| inline void PrepareBlock(s32 blockX, s32 blockY) | |||
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
|
I think because this fixes a hang in Super Paper Mario, the amount of slowdown it causes really shouldn't matter that much considering the benchmarks show the old bounding box emulation was quite a bit slower. Considering how many issues there are with the old bounding box solution, this should be okay, right? A roughly 25% slowdown in very few games(4? I know of the two Paper Mario games on Magical Mirror ,but what's the fourth?) should be acceptable to finally get the emulation working properly in them. |
|
The fourth game is Disney's Hide & Sneak for GC, it works like Disney's Magical Mirror. |
|
Ah, okay. So, if my testing is correct and you're correct about it fixing Super Paper Mario's crash, if there anything left to do? Everything looks good for me, just the performance conundrum, but even that I'm not too concerned about. If users are using dualcore, they don't notice the slowdown because the audio is still going full speed... that's just how it works sometimes. |
|
Also, if this doesn't affect performance outside of the 4 bounding box games, could we just remove the option altogether and just have this always on? |
|
Apart from the performance issues and the lack of z-testing (which doesn't seem to be needed anyway, but could be important for the sake of compatibility), there is nothing left to do, as far as I can tell. The option was there from the days when bbox was more of a hack than an actual feature. I guess it's safe to remove the option now, maybe it fixes some issues in other games that might use bbox as well. |
|
You need to rebase, though. I'll do the performance testing to see if we can make another PR to just make Bounding Box always on. |
|
Tested a few games that are GPU heavy, no noticeable performance changes by having Bounding Box on or Off. I did hear some concerns about games enabling bounding box and doing nothing with them, and having them on by default causing issues. I haven't found a scenario like that yet, though. Either way, everything is LGTM at this point! |
|
I'm rebasing the branch so it can be merged. I agree with you that removing the option to disable bbox should be in another PR, so it can be fully tested. |
|
Can this close all the Paper Mario Issues on there once it's merged? The only one I'm not sure on is the viewport one, and I don't have a savefile around there to check. |
|
Which one is the viewport issue? Apart from that, I think all bbox related issues should be gone, but to confirm that the whole game would have to be replayed... |
|
It's during the chapter where you're in the fighting competition place. You sneak into the vents and spy on a guy, and during that you can only see a quarter of the screen. It has almost the same behavior as the posters that you blow off later in the chapter. |
|
Oh, I checked that out a few months ago with an older version of the code and it was working properly, so I'm guessing it still does. I too ave no savefile to confirm it though.. |
|
Okay, I didn't know if the older build fixed it either. That's enough confirmation for me, as I figured the old one would have affected it. |
|
I have a save file before the point where you drop down the vents and spy on Grubba, but I cannot download the build from the buildbot to test it out and I don't know how to manually build it. Here is a link to my memory card: http://www.mediafire.com/download/1l5y5iyxit40o3z/MemoryCardA.USA.raw.zip Unfortunately it is not right before this point as it was saved before the 10th match in the glitz pit (used to test for the yoshi gulp regression), so you will need to play a bit to get to the point where you spy on Grubba. Hope this helps! |
|
thank you so much! |
|
@dolphin-emu-bot rebuild |
|
Confirmed fixed thanks to the save file. The only bug I ran into was that crowd members holding boulders would effectively make every member of that type (toad, koopa, etc.) hold the boulder until you attack them. It was unreproduceable after using safe texture cache, but I felt like mentioning it anyway. This effectively closes every bounding box related issue in Thousand Year Door.. I was able to test a ton of the glitches thanks to the save file being right in the middle of the game. Amazing job. It also maintained full speed on my computer, even with safe texture cache at 1x IR, even though I'm sure that doesn't matter much in terms of mergability, I felt it would be nice to mention. |
Reimplement Bounding Box calculation using the software renderer.
This PR completely changes the implementation of bounding box calculation.
In the current master, bbox is calculated by finding a primitive's edges and using those values as a comparison for the bounding box calculation.
However, that code is innacurate as pixels inside the primitive may not be drawn, mainly because the old code bypassed alpha testing and z-testing (for example, if the primitive's color or texture is transparent, it is not drawn and bounding box regs should not be changed). This caused issues with the punies in Paper Mario: TTYD and also caused a hang in Super Paper Mario's world 6-1.
As far as I can tell, there is no way to find which pixels have been drawn using the hardware renderers, so I resorted to the software renderer to calculate if a pixel is going to be drawn by the hardware renderer. It works by reading the vertex buffer from VertexLoader and passing all the required vertex information (matrices, position, normals, colors and texture coordinates) to the software renderer's SetupUnit.
Since the new code relies heavily on the software renderer, bbox calculation is twice to three times as slow as the old implementation (there isn't any decrease in performance when bbox is not needed, though). However, as far as I can tell, it is nearly fully accurate (with the exception of z-testing, which cannot be trivially implemented) and it fixes all the remaining bounding box bugs.
Please note that this changes the software renderer code a little bit. I'm not entirely happy with the result, and I'm open to comments about how integration with the software renderer could be improved.