New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor various bits of graphics code for readability #10549
Conversation
9c46d47
to
2638d15
Compare
|
The diff for |
|
@Pokechu22 - are you planning on investigating / resolving the fifo change or do you still think this is worthwhile to merge as is? |
|
The FIFO change is caused by "SW/Rasterizer: Use RAS1_IREF::getTexCoord and getTexMap". I've attempted to investigate it further and reproduce the issue on hardware backends, but I haven't been able to. As far as I can tell, the difference is caused by a bug where the software renderer would use the LOD and linear filtering configuration of textures 0-3 when an indirect stage used texture 4-7 (but it would still read texture data from textures 4-7 properly). TLA's menu uses different textures from EFB copies for all 4 quadrants of the screen, and those textures seem to have the texture number shuffled between quadrants and frames. But without porting PR 10204 to the software renderer, it's hard to be 100% sure what impact this had. I think it's fine to merge even with that difference. It would be something to double-check when the software renderer has the copy filter implemented correctly, though. Attempt at reproducing on hardware backends (for my reference)diff --git a/Source/Core/VideoCommon/BPStructs.cpp b/Source/Core/VideoCommon/BPStructs.cpp
index d31bf59f13..a5b03d5a64 100644
--- a/Source/Core/VideoCommon/BPStructs.cpp
+++ b/Source/Core/VideoCommon/BPStructs.cpp
@@ -280,7 +280,7 @@ static void BPWritten(const BPCmd& bp, int cycles_into_future)
destAddr, PE_copy.tp_realFormat(), copy_width, copy_height, destStride, is_depth_copy,
srcRect, PE_copy.intensity_fmt, PE_copy.half_scale, 1.0f, 1.0f,
bpmem.triggerEFBCopy.clamp_top, bpmem.triggerEFBCopy.clamp_bottom,
- bpmem.copyfilter.GetCoefficients());
+ {0, 0, 21, 22, 21, 0, 0});
}
else
{
diff --git a/Source/Core/VideoCommon/PixelShaderGen.cpp b/Source/Core/VideoCommon/PixelShaderGen.cpp
index 114bd2a7e5..d53aa6c7df 100644
--- a/Source/Core/VideoCommon/PixelShaderGen.cpp
+++ b/Source/Core/VideoCommon/PixelShaderGen.cpp
@@ -636,7 +636,7 @@ uint WrapCoord(int coord, uint wrap, int size) {{
if (api_type == APIType::OpenGL || api_type == APIType::Vulkan)
{
- out.Write("\nint4 sampleTexture(uint texmap, in sampler2DArray tex, int2 uv, int layer) {{\n");
+ out.Write("\nint4 sampleTexture(uint texmap, in sampler2DArray tex, uint texmap2, int2 uv, int2 uv2, int layer) {{\n");
}
else if (api_type == APIType::D3D)
{
@@ -674,7 +674,8 @@ uint WrapCoord(int coord, uint wrap, int size) {{
{
out.Write(R"(
uint texmode0 = samp_texmode0(texmap);
- uint texmode1 = samp_texmode1(texmap);
+ uint texmode1 = samp_texmode1(texmap2);
+ uint texmode02 = samp_texmode0(texmap2);
uint wrap_s = {};
uint wrap_t = {};
@@ -690,13 +691,13 @@ uint WrapCoord(int coord, uint wrap, int size) {{
)",
BitfieldExtract<&SamplerState::TM0::wrap_u>("texmode0"),
BitfieldExtract<&SamplerState::TM0::wrap_v>("texmode0"),
- BitfieldExtract<&SamplerState::TM0::mag_filter>("texmode0"),
- BitfieldExtract<&SamplerState::TM0::mipmap_filter>("texmode0"),
- BitfieldExtract<&SamplerState::TM0::min_filter>("texmode0"),
- BitfieldExtract<&SamplerState::TM0::diag_lod>("texmode0"),
- BitfieldExtract<&SamplerState::TM0::lod_bias>("texmode0"),
- // BitfieldExtract<&SamplerState::TM0::max_aniso>("texmode0"),
- BitfieldExtract<&SamplerState::TM0::lod_clamp>("texmode0"),
+ BitfieldExtract<&SamplerState::TM0::mag_filter>("texmode02"),
+ BitfieldExtract<&SamplerState::TM0::mipmap_filter>("texmode02"),
+ BitfieldExtract<&SamplerState::TM0::min_filter>("texmode02"),
+ BitfieldExtract<&SamplerState::TM0::diag_lod>("texmode02"),
+ BitfieldExtract<&SamplerState::TM0::lod_bias>("texmode02"),
+ // BitfieldExtract<&SamplerState::TM0::max_aniso>("texmode02"),
+ BitfieldExtract<&SamplerState::TM0::lod_clamp>("texmode02"),
BitfieldExtract<&SamplerState::TM1::min_lod>("texmode1"),
BitfieldExtract<&SamplerState::TM1::max_lod>("texmode1"));
@@ -760,15 +761,15 @@ uint WrapCoord(int coord, uint wrap, int size) {{
// exist. The GPU may still implement dFdx using coarse derivatives; we just don't have the
// ability to specifically require it.
out.Write(R"(
- float2 uv_delta_x = abs(dFdxCoarse(float2(uv)));
- float2 uv_delta_y = abs(dFdyCoarse(float2(uv)));
+ float2 uv_delta_x = abs(dFdxCoarse(float2(uv2)));
+ float2 uv_delta_y = abs(dFdyCoarse(float2(uv2)));
)");
}
else
{
out.Write(R"(
- float2 uv_delta_x = abs(dFdx(float2(uv)));
- float2 uv_delta_y = abs(dFdy(float2(uv)));
+ float2 uv_delta_x = abs(dFdx(float2(uv2)));
+ float2 uv_delta_y = abs(dFdy(float2(uv2)));
)");
}
}
@@ -884,8 +885,8 @@ ShaderCode GeneratePixelShaderCode(APIType api_type, const ShaderHostConfig& hos
if (api_type == APIType::OpenGL || api_type == APIType::Vulkan)
{
- out.Write("\n#define sampleTextureWrapper(texmap, uv, layer) "
- "sampleTexture(texmap, samp[texmap], uv, layer)\n");
+ out.Write("\n#define sampleTextureWrapper(texmap, texmap2, uv, uv2, layer) "
+ "sampleTexture(texmap, samp[texmap], texmap2, uv, uv2, layer)\n");
}
else if (api_type == APIType::D3D)
{
@@ -1125,7 +1126,7 @@ ShaderCode GeneratePixelShaderCode(APIType api_type, const ShaderHostConfig& hos
"\tint3 comp16 = int3(1, 256, 0), comp24 = int3(1, 256, 256*256);\n"
"\tint alphabump=0;\n"
"\tint3 tevcoord=int3(0, 0, 0);\n"
- "\tint2 wrappedcoord=int2(0,0), tempcoord=int2(0,0);\n"
+ "\tint2 wrappedcoord=int2(0,0), tempcoord=int2(0,0), tempcoord2 = int2(0, 0);\n"
"\tint4 "
"tevin_a=int4(0,0,0,0),tevin_b=int4(0,0,0,0),tevin_c=int4(0,0,0,0),tevin_d=int4(0,0,0,"
"0);\n\n"); // tev combiner inputs
@@ -1196,9 +1197,11 @@ ShaderCode GeneratePixelShaderCode(APIType api_type, const ShaderHostConfig& hos
out.SetConstantsUsed(C_INDTEXSCALE + i / 2, C_INDTEXSCALE + i / 2);
out.Write("\ttempcoord = fixpoint_uv{} >> " I_INDTEXSCALE "[{}].{};\n", texcoord, i / 2,
(i & 1) ? "zw" : "xy");
+ out.Write("\ttempcoord2 = fixpoint_uv{} >> " I_INDTEXSCALE "[{}].{};\n", texcoord & 3, i / 2,
+ (i & 1) ? "zw" : "xy");
- out.Write("\tint3 iindtex{0} = sampleTextureWrapper({1}u, tempcoord, layer).abg;\n", i,
- texmap);
+ out.Write("\tint3 iindtex{0} = sampleTextureWrapper({1}u, {2}u, tempcoord, tempcoord2, layer).abg;\n", i,
+ texmap, texmap & 3);
}
}
@@ -1605,8 +1608,8 @@ static void WriteStage(ShaderCode& out, const pixel_shader_uid_data* uid_data, i
if (stage.tevorders_enable && uid_data->genMode_numtexgens > 0)
{
// Generate swizzle string to represent the texture color channel swapping
- out.Write("\ttextemp = sampleTextureWrapper({}u, tevcoord.xy, layer).{}{}{}{};\n",
- stage.tevorders_texmap, rgba_swizzle[stage.tex_swap_r],
+ out.Write("\ttextemp = sampleTextureWrapper({}u, {}u, tevcoord.xy, tevcoord.xy, layer).{}{}{}{};\n",
+ stage.tevorders_texmap, stage.tevorders_texmap, rgba_swizzle[stage.tex_swap_r],
rgba_swizzle[stage.tex_swap_g], rgba_swizzle[stage.tex_swap_b],
rgba_swizzle[stage.tex_swap_a]);
} |
|
I did a quick and hacky implementation of the copy filter for Hacky implementationdiff --git a/Source/Core/VideoBackends/Software/TextureEncoder.cpp b/Source/Core/VideoBackends/Software/TextureEncoder.cpp
index 2624bf0e04..bb9c4f4afa 100644
--- a/Source/Core/VideoBackends/Software/TextureEncoder.cpp
+++ b/Source/Core/VideoBackends/Software/TextureEncoder.cpp
@@ -273,6 +273,10 @@ static void EncodeRGBA6(u8* dst, const u8* src, EFBCopyFormat format, bool yuv)
u32 readStride = 3;
u8* dstBlockStart = dst;
+ const u32 copy_filter_sum = bpmem.copyfilter.w0 + bpmem.copyfilter.w1 + bpmem.copyfilter.w2 +
+ bpmem.copyfilter.w3 + bpmem.copyfilter.w4 + bpmem.copyfilter.w5 +
+ bpmem.copyfilter.w6;
+
switch (format)
{
case EFBCopyFormat::R4:
@@ -470,7 +474,7 @@ static void EncodeRGBA6(u8* dst, const u8* src, EFBCopyFormat format, bool yuv)
ENCODE_LOOP_BLOCKS
{
u32 srcColor = *(u32*)src;
- *dst++ = Convert6To8((srcColor >> 6) & 0x3f);
+ *dst++ = (Convert6To8((srcColor >> 6) & 0x3f) * copy_filter_sum) / 64;
src += readStride;
}
ENCODE_LOOP_SPANS
@@ -753,6 +757,9 @@ static void EncodeRGB8(u8* dst, const u8* src, EFBCopyFormat format, bool yuv)
s32 tSpan, sBlkSpan, tBlkSpan, writeStride;
u32 readStride = 3;
u8* dstBlockStart = dst;
+ const u32 copy_filter_sum = bpmem.copyfilter.w0 + bpmem.copyfilter.w1 + bpmem.copyfilter.w2 +
+ bpmem.copyfilter.w3 + bpmem.copyfilter.w4 + bpmem.copyfilter.w5 +
+ bpmem.copyfilter.w6;
switch (format)
{
@@ -927,7 +934,7 @@ static void EncodeRGB8(u8* dst, const u8* src, EFBCopyFormat format, bool yuv)
SetSpans(sBlkSize, tBlkSize, &tSpan, &sBlkSpan, &tBlkSpan, &writeStride);
ENCODE_LOOP_BLOCKS
{
- *dst++ = src[0];
+ *dst++ = (src[0] * copy_filter_sum) / 64;
src += readStride;
}
ENCODE_LOOP_SPANSNotice that the bottom-left quadrant was brighter on the first frame than the others (most obvious in the animation, where it flickers). That's no longer happening. (These images are with the object range 0-348, hiding the menu text.) |
|
|
||
| constexpr static TevColor All(s16 value) { return TevColor(value, value, value, value); } | ||
|
|
||
| constexpr s16& operator[](int index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
constexpr s16& operator[](int index) const ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would need to be constexpr const s16& operator[](int index) const, but it's also assigned to in some cases so that won't work. Theoretically both could be added, but there doesn't seem to be a case where it's needed.
170785e
to
c8f3302
Compare
|
I added one more change to the last commit fixing various XF mistakes: |
46dfeba
to
9bebd0c
Compare
|
The fifoci difference is explained above. NVM. Is this ready for merge now? |
|
I think it's best to wait for an approving review on this one. |
e103b1d
to
675308e
Compare
675308e
to
9a7eff7
Compare
f9d051e
to
5ba6e02
Compare
03a9e6f
to
f91a631
Compare
e2f237e
to
9dcd16e
Compare
9dcd16e
to
1244618
Compare
1244618
to
3eb5060
Compare
3eb5060
to
3359f8a
Compare
3359f8a
to
2eef14a
Compare
2eef14a
to
a4a4267
Compare
a4a4267
to
e983c18
Compare
I search for these somewhat often with control+f, but tend to default to adding a 0x prefix, which means I don't find them until I remove it.
It stores both the konst selection value for alpha and color channels (for two tev stages per ksel), and half of a swap table row (there are 4 total swap tables, which can be used for swizzling the rasterized color and the texture color, and indices selecting which tables to use are stored per tev stage in the alpha combiner). Since these are indexed very differently, the old code was hard to follow.
Mainly, the improvements are passing in the current command ID so that e.g. stage numbers can be directly included, instead of saying 0/1 or even/odd.
* 'hangle' was a typo * Light colors include an alpha value, so they should be 8 characters, not 6 * The XF command format adds 1 to the count internally (so 0 is one word), but we need to subtract that back to produce a valid command * XFMEM_POSTMATRICES was calculating the row by subtracting XFMEM_POSMATRICES (POS vs POST), resulting in incorrect row numbering
e983c18
to
698def6
Compare








This is mainly focused on software renderer changes, combined with some various fifo analyzer changes and typo fixes.