[WIP] [Not for Merge] Vertex Ubershaders #3185
I know it's not ready for public consumption, but I would like to thank you anyway, phire. Tatsunoko vs. Capcom stuttering is hugely improved. It still has a little stutter here and there with this PR, but compared to master, where every single action you do in the game causes shader compilation stuttering, it's thousands of times better...
Someone's reports about that game made it a testing candidate for the ubershaders. If you were the one who reported the severe shader generation issues in the game, I thank you for bringing it to our attention.
I hope you tested the other PR. This one currently has no ubershaders at all enabled.
Yeah, a majority of the stuttering in TvC is actually pixel shaders. The comment by @mbc07 points toward them using the other PR, as I have the same experience.
I tested the main Ubershaders PR as well. It also improves TvC (despite some funny colors in some characters' faces, probably because it's not finished)...
The funny colors on faces should be fixed. I tried a few random characters and it was working. The rest of the stuttering is vertex shaders being generated, so once this is done the game should NEVER stutter!
Force-pushed from 6539c23 to 5d9549e.
The only code which touches xfmem is the code which writes directly into uid_data. Everything else now reads its parameters out of uid_data. I also simplified the lighting code so it always generates separate codepaths for the alpha and color channels instead of trying to combine them on the off-chance that the same equation works for all 4 channels. As modern (post-2008) GPUs generally don't calculate all 4 channels in a single vector, this optimisation is pointless. The shader compiler will undo it during the GLSL/HLSL to IR step. Bug Fix: The above optimisation was also broken, applying the color light equation to the alpha light channel instead of the alpha light equation. But it doesn't look like anything triggered this bug.
This frees up 21 bits and allows us to shorten the UID struct by an entire 32 bits. It's not strictly needed (as it's encoded into the length), but I added a bit for per-pixel lighting to make my life easier in the following commits.
Bug Fix: The normal stage UIDs were randomly overwriting indirect stage texture map UID fields. It was possible for multiple shaders with different indirect texture targets to map to the same UID. Once again, it doesn't look like this bug was ever triggered.
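The collision described in this bug fix can be sketched as follows. This is only an illustration under assumed names: the 3-bit field width, the shift constants, and both `PackUid...` helpers are hypothetical and do not reflect Dolphin's actual UID layout.

```cpp
#include <cstdint>

// Hypothetical 3-bit fields; this is NOT Dolphin's real UID layout.
constexpr std::uint32_t kFieldMask = 0x7;
constexpr std::uint32_t kIndirectShift = 0;  // indirect stage texture map
constexpr std::uint32_t kNormalShift = 3;    // normal stage texture map

// Buggy packing: the normal stage value is stored at the indirect field's
// position, clobbering whatever the indirect stage wrote there.
inline std::uint32_t PackUidBuggy(std::uint32_t indirect_texmap, std::uint32_t normal_texmap)
{
  std::uint32_t uid = (indirect_texmap & kFieldMask) << kIndirectShift;
  uid &= ~(kFieldMask << kIndirectShift);                 // indirect value is lost here
  uid |= (normal_texmap & kFieldMask) << kIndirectShift;  // wrong bit range
  return uid;
}

// Fixed packing: each field owns its own bit range, so distinct states
// always produce distinct UIDs.
inline std::uint32_t PackUidFixed(std::uint32_t indirect_texmap, std::uint32_t normal_texmap)
{
  return ((indirect_texmap & kFieldMask) << kIndirectShift) |
         ((normal_texmap & kFieldMask) << kNormalShift);
}
```

With the buggy packing, two shaders that differ only in their indirect texture target end up with the same UID, so a cache keyed on the UID would hand back the wrong shader for one of them.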
Bug Fix: It was theoretically possible for a shader with depth writes disabled to map to the same UID as a shader with late depth writes. No known test cases trigger this.
Bug Fix: Previously vertex shaders and geometry shaders didn't track antialiasing state in their UIDs, which could cause AA bugs on DirectX.
As much as possible, the asserts have been moved out of the GetUID function. But there are some places where asserts depend on variables that aren't stored in the shader UID.
Note: It's not 100% perfect, as some of the GPU capabilities leak into the pixel shader UID. Currently our UIDs don't get exported, so there is no issue. But someone might want to fix this in the future.
Kind of pointless now that multiple shaders with the same UID are fundamentally impossible.
Or anything else which doesn't use textures (Basically nothing)
This allows a large number of games to be semi-playable. Also fixed up which registers Konst was being written to.
I did this mostly so it would work on llvmpipe and fifoci.
This fixes up the remaining alpha problems, particularly in N64 games.
See Source/Core/VideoCommon/DriverDetails.h:140 for the horrible details
Until now we have generated one ubershader per PixelShaderGen UID, which meant we would compile exactly as many shaders as before. Oh, and these shaders were way more complex than the old ShaderGen shaders, so we were spending way more time compiling. The llvmpipe-based FifoCI's runtime had jumped from 6min to 36min. With this commit we generate a much simpler ubershader UID with only 4 bits of state (16 shaders). We still generate them at runtime when needed, but only ever 16 of them. 16 is not the final number.
So I pulled up a profiler and found that UberShaders were completely memory bound, because the ColorInput and AlphaInput arrays were stored in main memory. It worked fine at 1xIR, but by 2xIR it was trying to write 2GB every frame to main memory and read back 500MB for a test scene on Wind Waker's Outset Island. So we rewrite everything to use switch statements, so it compiles to uniform control flow selecting registers instead of indexed reads/writes to main memory. The result is impressively fast.
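A minimal sketch of why a 4-bit UID caps the number of compiles at 16, no matter how many distinct states the game uses. The class, the choice of the low 4 bits, and the string standing in for a compiled shader are all assumptions made for illustration; the commit does not say which 4 bits of state are kept.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical cache keyed on a 4-bit UID; names are illustrative only.
class UberShaderCache
{
public:
  // Returns the variant for this state, "compiling" it on first use.
  const std::string& Get(std::uint32_t full_state)
  {
    const std::uint32_t uid = full_state & 0xF;  // assumed: only 4 bits pick the variant
    auto it = m_variants.find(uid);
    if (it == m_variants.end())
    {
      ++m_compiles;  // stand-in for an expensive driver compile
      it = m_variants.emplace(uid, "ubershader #" + std::to_string(uid)).first;
    }
    return it->second;
  }

  int compiles() const { return m_compiles; }

private:
  std::map<std::uint32_t, std::string> m_variants;
  int m_compiles = 0;
};

// Runs `draws` distinct state values through a fresh cache and reports
// how many compiles were needed.
inline int CompilesAfterDraws(std::uint32_t draws)
{
  UberShaderCache cache;
  for (std::uint32_t state = 0; state < draws; ++state)
    cache.Get(state);
  return cache.compiles();
}
```

Variants are still created lazily, which matches "we still generate them at runtime when needed", but the count can never exceed 16 in this sketch.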
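The shape of that rewrite can be sketched in C++ as a stand-in for the shader code. The struct, its field names, and both selector functions are hypothetical; the real shaders operate on the ColorInput and AlphaInput values mentioned above. On a CPU both forms behave the same; the difference described in the commit only shows up in how a GPU shader compiler allocates the dynamically indexed array.

```cpp
// Hypothetical register set; field names are illustrative only.
struct TevRegs
{
  int prev;
  int c0;
  int c1;
  int c2;
};

constexpr TevRegs kSampleRegs{10, 20, 30, 40};

// Indexed form: builds an array and reads it with a dynamic index.
// In a shader, an array like this may end up living in memory.
inline int SelectIndexed(const TevRegs& r, unsigned index)
{
  const int inputs[4] = {r.prev, r.c0, r.c1, r.c2};
  return inputs[index & 3u];
}

// Switch form: every case names a register directly, so no indexable
// array is needed and the selection is plain control flow.
inline int SelectSwitch(const TevRegs& r, unsigned index)
{
  switch (index & 3u)
  {
  case 0:
    return r.prev;
  case 1:
    return r.c0;
  case 2:
    return r.c1;
  default:
    return r.c2;
  }
}
```

Because the selector comes from per-draw state, every pixel in a draw takes the same branch, which is what makes the control flow uniform.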
They aren't needed and hide errors from DirectX. Oh, and Mesa gets annoyed if you use the wrong type.
Well OK, I admit it was a pretty major fail. We weren't uploading ksel, so we were using the wrong konsts like all the time. Wind Waker now renders almost perfectly apart from cel shading. I assume most other games will be pretty close to correct too.
Self-shadowing in Rogue Squadron 2 works.
Swizzling - Character lighting now shows the correct color (not red) in Wind Waker. Konsts - Had an off-by-2 error, so we were using the wrong Konsts. Fixes the black borders around the end of the water in Wind Waker (yes, that is the only game I'm currently testing).
Now the shader always outputs the second color, as dual source blending is turned on/off in ogl/dx state.
Should fix those single bit errors with pixel colors.
D wasn't getting scaled.
We were indexing the texture coordinates by sampler_num rather than tex_coord. This happened to line up for a lot of games. Fixes the red-tinted videos in Mario Sunshine (and other games) and the clouds in the distance in Wind Waker.
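A small sketch of why this bug stayed hidden in many games: when a stage's sampler number and texture coordinate index happen to match, both lookups return the same value. Only the field names `sampler_num` and `tex_coord` come from the commit message; the struct, the array of sample coordinates, and the helper functions are hypothetical.

```cpp
#include <array>

// Hypothetical per-stage configuration.
struct StageConfig
{
  unsigned sampler_num;  // which texture sampler the stage reads
  unsigned tex_coord;    // which texture coordinate the stage should use
};

using TexCoords = std::array<float, 8>;

constexpr TexCoords kSampleCoords{{0.0f, 0.25f, 0.5f, 0.75f, 1.0f, 1.25f, 1.5f, 1.75f}};
constexpr StageConfig kAlignedStage{2, 2};     // indices match: bug is invisible
constexpr StageConfig kMisalignedStage{1, 4};  // indices differ: bug shows up

// Buggy lookup: uses the sampler number as the coordinate index.
inline float CoordBySampler(const TexCoords& coords, const StageConfig& stage)
{
  return coords[stage.sampler_num & 7u];
}

// Fixed lookup: uses the stage's own texture coordinate index.
inline float CoordByTexCoord(const TexCoords& coords, const StageConfig& stage)
{
  return coords[stage.tex_coord & 7u];
}
```

For `kAlignedStage` both functions agree, while for `kMisalignedStage` the buggy lookup reads coordinate 1 instead of coordinate 4.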
This should also give a nice speedup.
So Rogue Squadron works.
So we can test vertex changes without our incomplete pixel ubershader getting in the way for fifoci.
Currently only for DirectX. Only implements per-vertex colors.
I'm really surprised this works as well as it does. It assumes that each texcoordX is connected to texgenX and all texgens are in the simplest mode.
Force-pushed from 5d9549e to c9a9e00.
FifoCI detected that this change impacts graphical rendering. Here are the behavior differences detected by the system:
automated-fifoci-reporter
Closed in favor of #5702
This is a second PR for tracking the Vertex Ubershaders, so we can get fifoci runs without pixel ubershaders enabled.
This PR is more or less the same as #3163 but with ShaderGen for pixel shaders.