New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accurately handle the copy filter and gamma for EFB and XFB copies #10466
Conversation
ee3c209
to
b7bbce3
Compare
b7bbce3
to
be26203
Compare
23bf063
to
c65307c
Compare
c65307c
to
ca4caa1
Compare
ca4caa1
to
ebccd12
Compare
ebccd12
to
2b3ab71
Compare
09738c0
to
c4976e9
Compare
Is this ready to go at this point? |
It could probably use a review just for code quality, but it's passed all of my hardware tests. (The hardware tests themselves also need to be merged at some point, but they're finished apart from a review.) |
c4976e9
to
05b60bb
Compare
I've fixed a few issues and re-tested it with all backends.
|
05b60bb
to
41086af
Compare
a116055
to
8577841
Compare
a1c4007
to
a86499b
Compare
107bbf7
to
f1df630
Compare
I've merged the hardware tests (dolphin-emu/hwtests#44) and confirmed that they still pass. It'd be nice to get a review on this, since functionally everything seems good to go. (Of the fifoci results above, the important ones are |
@Pokechu22 - I've been planning on reviewing it, just was waiting for the HLSL thing to merge and then for this to be rebased. I'm a bit behind on my reviews, I plan on looking it over this week. Sorry for the delay! |
@@ -62,7 +62,7 @@ static void WriteHeader(ShaderCode& code, APIType api_type) | |||
" float y_scale;\n" | |||
" float gamma_rcp;\n" | |||
" float2 clamp_tb;\n" | |||
" float3 filter_coefficients;\n" | |||
" uint3 filter_coefficients;\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the switch to integers give us some advantage? I only ask because I'd assume there'd be a (slight) performance hit by dividing by 64.0 in the shaders at each pixel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It allows using combined_rows.rgb >> 6
to divide by 64 and round down (which is the hardware's behavior), and it also makes writing texcol_raw &= 0x1ffu
easier.
I vaguely remember some annoyance about round
having implementation-defined behavior with .5, but looking at it further it's floor
that would be needed here instead and that doesn't have the same issue so I'm not sure if there's a reason why it couldn't be done with floats; it'd just be a bit messier. (Perhaps it was the part with YUV conversion, since that does have rounding?)
|
||
bool TextureCacheBase::CopyFilterCanOverflow(const std::array<u32, 3>& coefficients) | ||
{ | ||
// Normally, the copy filter coefficients will sum to at most 64. If the sum is higher than that, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I recall this overflow behavior but I forget the details I know games can go higher than 64 to induce brightening of the overall image. Can you give any more details - how does the clamping break?
And are there any known occurrences of a game using 128 or higher?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the value for a color channel is 512 or higher, it ends up wrapping around to 0 again (so presumably when the clamping to [0, 255] is applied, they use a field that can store [0, 511]). I don't know of any situation where this behavior would be useful, but it's something I encountered in my hardware test, so I implemented it. (It's conditionally enabled so that there's no performance cost for situations where it's impossible for the value to reach 512.)
(I actually don't know of any games that go above 64 to produce brightening, only of ones that go below 64 to produce darkening - they probably exist, though.)
I tested a handful of games. Brawl / Arc Rise / other games that have a sharpening filter work fine. The thing that doesn't work (and it's not your PR, master is also broken) is the brightness settings in many games. I don't know why? I recall it working at one point but I could be misremembering. Maybe the coefficients need to apply to EFB copies as opposed to XFBs. I'm not sure. I will give the code one more pass tomorrow. |
Hmm. Can you give an example of a game where it's not working? I'd like to investigate that further.
That's something that was implemented in #10204 (and that this PR improves - my hardware test mostly uses EFB copies instead of XFB copies because it's a lot easier to work with texture data in EFB copies). |
Pandora's Tower and the Last Story. Possibly Xenoblade. Little King's Story / Monster Lab work but they work even with the copy filter disabled, so probably using a different approach. EDIT:
Completely forgot I reviewed that PR. But yeah, ignore that comment, I'm too sleepy. Time for bed! |
One other possibility is EDIT: If you could record a few fifologs of the same game with the brightness settings at different values, then I could check via the hardware fifoplayer whether it makes a difference. If the brightness doesn't change using the hardware fifoplayer, it's probably one of those registers since those aren't stored in fifologs. |
Thanks @Pokechu22 . I opened issue 12974, interestingly, this ticket did exist before ( 5379 ) but it was coupled with the copy filter issue. The first build of the copy filter didn't work either. Though I suppose it might be worthwhile for me to check on console to make sure what I'm seeing is unexpected. Should I finish my review on this one or do you want to wait until you've investigated the issue? |
The only remaining casts for these types that I know of are in TextureInfo (where format_name is set to the int version of the format, and since that affects filenames and probably would break resource packs, I'm not changing it) and in TextureDecoder_Common's TexDecoder_DrawOverlay, which will be handled separately.
This required adding parentheses to the font used by that.
It was named yuv in 522746b, but hardware testing indicates that that bit does nothing (the intensity format bit enables YUV conversion, instead).
auto_conv is normally always set for EFB copies, but hardware testing indicates that intensity_fmt does nothing if auto_conv is not set.
This also changes the behavior for the invalid gamma value, which was confirmed to behave the same as 2.2. Note that currently, the gamma value is only used for XFB copies, even though hardware testing indicates it also works for EFB copies. This will be changed in a later commit.
This struct is the only one in BPMemory that uses u64 as its base. These fields are to allow viewing it as two u32s instead. It's not used by Dolphin right now, but it is used in the copy of BPMemory.h used by hwtests.
I took a quick look at that issue and it doesn't seem to be related to the copy filter. I'll put more info in a comment on that ticket. |
@Pokechu22 - sounds good. I will finish my review of this then. Thank you for your work! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM.
…Shader This will be used for later refactoring for increased accuracy.
f1df630
to
a6e06f3
Compare
FifoCI detected that this change impacts graphical rendering. Here are the behavior differences detected by the system:
automated-fifoci-reporter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note, you can do a / 256 with rounding in slightly less instructions with (texcol_raw.rgb + 128u) >> 8
" dot(u_const, float4(texcol_raw.rgb, 256)),\n" | ||
" dot(v_const, float4(texcol_raw.rgb, 256)));\n" | ||
" // Divide by 256 and round .5 and higher up\n" | ||
" texcol_raw.rgb = (texcol_raw.rgb >> 8) + ((texcol_raw.rgb >> 7) & 1);\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
macOS OpenGL unhappy here, wants & 1u
" dot(u_const, float4(texcol_raw.rgb, 256)),\n" | ||
" dot(v_const, float4(texcol_raw.rgb, 256)));\n" | ||
" // Divide by 256 and round .5 and higher up\n" | ||
" texcol_raw.rgb = (texcol_raw.rgb >> 8) + ((texcol_raw.rgb >> 7) & 1);\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
macOS OpenGL unhappy here, wants & 1u
Followup to dolphin-emu#10466. Resolves the following error: ERROR: 0:85: '&' does not operate on 'uvec3' and 'int'
Followup to dolphin-emu#10466. Resolves the following error: ERROR: 0:85: '&' does not operate on 'uvec3' and 'int'
This PR increases the accuracy of the way the copy filter works, rounding in the same way and emulating an unusual overflow behavior I found in testing. It also implements the gamma option for EFB copies (before, it was only handled with XFB copies). Everything matches the result of dolphin-emu/hwtests#44.
This PR is based on #10439, so fifoci will provide a diff compared to that currently. The actual implementation from that PR isn't directly relevant anymore, though.This PR replaces #10439; however, I do recommend looking at #10439 (comment) for a somewhat more visual representation of how rounding matters here.Note that apart from one intensity-format related change, the software renderer has not been updated; it didn't receive the earlier copy filter update either. This is simply because I don't want to deal with its texture encoding code currently; it will eventually be handled in a later pull request. However, that intensity format change does result in large numbers of differences because it also applies to XFB copies (which was not the case for other backends, as the other backends don't use the encoded XFB for rendering in most cases but instead use an RGB version of it).