Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor various bits of graphics code for readability #10549

Merged
merged 14 commits into from Sep 8, 2022

Conversation

Pokechu22
Copy link
Contributor

This is mainly focused on software renderer changes, combined with some various fifo analyzer changes and typo fixes.

@Pokechu22 Pokechu22 force-pushed the sw-tev-enum-map branch 2 times, most recently from 9c46d47 to 2638d15 Compare April 2, 2022 02:13
@Pokechu22
Copy link
Contributor Author

The diff for tla-menu comes from "SW/Rasterizer: Use RAS1_IREF::getTexCoord and getTexMap" - I'm not 100% sure what's causing it (other than that the image is blue because the software renderer doesn't have #10204), but the game does use textures 4-7 for some parts of the effect.

@iwubcode
Copy link
Contributor

iwubcode commented Apr 2, 2022

@Pokechu22 - are you planning on investigating / resolving the fifo change or do you still think this is worthwhile to merge as is?

@Pokechu22
Copy link
Contributor Author

Pokechu22 commented Apr 3, 2022

The FIFO change is caused by "SW/Rasterizer: Use RAS1_IREF::getTexCoord and getTexMap". I've attempted to investigate it further and reproduce the issue on hardware backends, but I haven't been able to. As far as I can tell, the difference is caused by a bug where the software renderer would use the LOD and linear filtering configuration of textures 0-3 when an indirect stage used texture 4-7 (but it would still read texture data from textures 4-7 properly). TLA's menu uses different textures from EFB copies for all 4 quadrants of the screen, and those textures seem to have the texture number shuffled between quadrants and frames. But without porting PR 10204 to the software renderer, it's hard to be 100% sure what impact this had.

I think it's fine to merge even with that difference. It would be something to double-check when the software renderer has the copy filter implemented correctly, though.

Attempt at reproducing on hardware backends (for my reference)
diff --git a/Source/Core/VideoCommon/BPStructs.cpp b/Source/Core/VideoCommon/BPStructs.cpp
index d31bf59f13..a5b03d5a64 100644
--- a/Source/Core/VideoCommon/BPStructs.cpp
+++ b/Source/Core/VideoCommon/BPStructs.cpp
@@ -280,7 +280,7 @@ static void BPWritten(const BPCmd& bp, int cycles_into_future)
           destAddr, PE_copy.tp_realFormat(), copy_width, copy_height, destStride, is_depth_copy,
           srcRect, PE_copy.intensity_fmt, PE_copy.half_scale, 1.0f, 1.0f,
           bpmem.triggerEFBCopy.clamp_top, bpmem.triggerEFBCopy.clamp_bottom,
-          bpmem.copyfilter.GetCoefficients());
+          {0, 0, 21, 22, 21, 0, 0});
     }
     else
     {
diff --git a/Source/Core/VideoCommon/PixelShaderGen.cpp b/Source/Core/VideoCommon/PixelShaderGen.cpp
index 114bd2a7e5..d53aa6c7df 100644
--- a/Source/Core/VideoCommon/PixelShaderGen.cpp
+++ b/Source/Core/VideoCommon/PixelShaderGen.cpp
@@ -636,7 +636,7 @@ uint WrapCoord(int coord, uint wrap, int size) {{
 
   if (api_type == APIType::OpenGL || api_type == APIType::Vulkan)
   {
-    out.Write("\nint4 sampleTexture(uint texmap, in sampler2DArray tex, int2 uv, int layer) {{\n");
+    out.Write("\nint4 sampleTexture(uint texmap, in sampler2DArray tex, uint texmap2, int2 uv, int2 uv2, int layer) {{\n");
   }
   else if (api_type == APIType::D3D)
   {
@@ -674,7 +674,8 @@ uint WrapCoord(int coord, uint wrap, int size) {{
   {
     out.Write(R"(
   uint texmode0 = samp_texmode0(texmap);
-  uint texmode1 = samp_texmode1(texmap);
+  uint texmode1 = samp_texmode1(texmap2);
+  uint texmode02 = samp_texmode0(texmap2);
 
   uint wrap_s = {};
   uint wrap_t = {};
@@ -690,13 +691,13 @@ uint WrapCoord(int coord, uint wrap, int size) {{
 )",
               BitfieldExtract<&SamplerState::TM0::wrap_u>("texmode0"),
               BitfieldExtract<&SamplerState::TM0::wrap_v>("texmode0"),
-              BitfieldExtract<&SamplerState::TM0::mag_filter>("texmode0"),
-              BitfieldExtract<&SamplerState::TM0::mipmap_filter>("texmode0"),
-              BitfieldExtract<&SamplerState::TM0::min_filter>("texmode0"),
-              BitfieldExtract<&SamplerState::TM0::diag_lod>("texmode0"),
-              BitfieldExtract<&SamplerState::TM0::lod_bias>("texmode0"),
-              // BitfieldExtract<&SamplerState::TM0::max_aniso>("texmode0"),
-              BitfieldExtract<&SamplerState::TM0::lod_clamp>("texmode0"),
+              BitfieldExtract<&SamplerState::TM0::mag_filter>("texmode02"),
+              BitfieldExtract<&SamplerState::TM0::mipmap_filter>("texmode02"),
+              BitfieldExtract<&SamplerState::TM0::min_filter>("texmode02"),
+              BitfieldExtract<&SamplerState::TM0::diag_lod>("texmode02"),
+              BitfieldExtract<&SamplerState::TM0::lod_bias>("texmode02"),
+              // BitfieldExtract<&SamplerState::TM0::max_aniso>("texmode02"),
+              BitfieldExtract<&SamplerState::TM0::lod_clamp>("texmode02"),
               BitfieldExtract<&SamplerState::TM1::min_lod>("texmode1"),
               BitfieldExtract<&SamplerState::TM1::max_lod>("texmode1"));
 
@@ -760,15 +761,15 @@ uint WrapCoord(int coord, uint wrap, int size) {{
         // exist.  The GPU may still implement dFdx using coarse derivatives; we just don't have the
         // ability to specifically require it.
         out.Write(R"(
-  float2 uv_delta_x = abs(dFdxCoarse(float2(uv)));
-  float2 uv_delta_y = abs(dFdyCoarse(float2(uv)));
+  float2 uv_delta_x = abs(dFdxCoarse(float2(uv2)));
+  float2 uv_delta_y = abs(dFdyCoarse(float2(uv2)));
 )");
       }
       else
       {
         out.Write(R"(
-  float2 uv_delta_x = abs(dFdx(float2(uv)));
-  float2 uv_delta_y = abs(dFdy(float2(uv)));
+  float2 uv_delta_x = abs(dFdx(float2(uv2)));
+  float2 uv_delta_y = abs(dFdy(float2(uv2)));
 )");
       }
     }
@@ -884,8 +885,8 @@ ShaderCode GeneratePixelShaderCode(APIType api_type, const ShaderHostConfig& hos
 
   if (api_type == APIType::OpenGL || api_type == APIType::Vulkan)
   {
-    out.Write("\n#define sampleTextureWrapper(texmap, uv, layer) "
-              "sampleTexture(texmap, samp[texmap], uv, layer)\n");
+    out.Write("\n#define sampleTextureWrapper(texmap, texmap2, uv, uv2, layer) "
+              "sampleTexture(texmap, samp[texmap], texmap2, uv, uv2, layer)\n");
   }
   else if (api_type == APIType::D3D)
   {
@@ -1125,7 +1126,7 @@ ShaderCode GeneratePixelShaderCode(APIType api_type, const ShaderHostConfig& hos
             "\tint3 comp16 = int3(1, 256, 0), comp24 = int3(1, 256, 256*256);\n"
             "\tint alphabump=0;\n"
             "\tint3 tevcoord=int3(0, 0, 0);\n"
-            "\tint2 wrappedcoord=int2(0,0), tempcoord=int2(0,0);\n"
+            "\tint2 wrappedcoord=int2(0,0), tempcoord=int2(0,0), tempcoord2 = int2(0, 0);\n"
             "\tint4 "
             "tevin_a=int4(0,0,0,0),tevin_b=int4(0,0,0,0),tevin_c=int4(0,0,0,0),tevin_d=int4(0,0,0,"
             "0);\n\n");  // tev combiner inputs
@@ -1196,9 +1197,11 @@ ShaderCode GeneratePixelShaderCode(APIType api_type, const ShaderHostConfig& hos
       out.SetConstantsUsed(C_INDTEXSCALE + i / 2, C_INDTEXSCALE + i / 2);
       out.Write("\ttempcoord = fixpoint_uv{} >> " I_INDTEXSCALE "[{}].{};\n", texcoord, i / 2,
                 (i & 1) ? "zw" : "xy");
+      out.Write("\ttempcoord2 = fixpoint_uv{} >> " I_INDTEXSCALE "[{}].{};\n", texcoord & 3, i / 2,
+                (i & 1) ? "zw" : "xy");
 
-      out.Write("\tint3 iindtex{0} = sampleTextureWrapper({1}u, tempcoord, layer).abg;\n", i,
-                texmap);
+      out.Write("\tint3 iindtex{0} = sampleTextureWrapper({1}u, {2}u, tempcoord, tempcoord2, layer).abg;\n", i,
+                texmap, texmap & 3);
     }
   }
 
@@ -1605,8 +1608,8 @@ static void WriteStage(ShaderCode& out, const pixel_shader_uid_data* uid_data, i
   if (stage.tevorders_enable && uid_data->genMode_numtexgens > 0)
   {
     // Generate swizzle string to represent the texture color channel swapping
-    out.Write("\ttextemp = sampleTextureWrapper({}u, tevcoord.xy, layer).{}{}{}{};\n",
-              stage.tevorders_texmap, rgba_swizzle[stage.tex_swap_r],
+    out.Write("\ttextemp = sampleTextureWrapper({}u, {}u, tevcoord.xy, tevcoord.xy, layer).{}{}{}{};\n",
+              stage.tevorders_texmap, stage.tevorders_texmap, rgba_swizzle[stage.tex_swap_r],
               rgba_swizzle[stage.tex_swap_g], rgba_swizzle[stage.tex_swap_b],
               rgba_swizzle[stage.tex_swap_a]);
   }

@Pokechu22
Copy link
Contributor Author

I did a quick and hacky implementation of the copy filter for EFBCopyFormat::B8, which allows seeing what is actually changing with this.

Hacky implementation
diff --git a/Source/Core/VideoBackends/Software/TextureEncoder.cpp b/Source/Core/VideoBackends/Software/TextureEncoder.cpp
index 2624bf0e04..bb9c4f4afa 100644
--- a/Source/Core/VideoBackends/Software/TextureEncoder.cpp
+++ b/Source/Core/VideoBackends/Software/TextureEncoder.cpp
@@ -273,6 +273,10 @@ static void EncodeRGBA6(u8* dst, const u8* src, EFBCopyFormat format, bool yuv)
   u32 readStride = 3;
   u8* dstBlockStart = dst;
 
+  const u32 copy_filter_sum = bpmem.copyfilter.w0 + bpmem.copyfilter.w1 + bpmem.copyfilter.w2 +
+                              bpmem.copyfilter.w3 + bpmem.copyfilter.w4 + bpmem.copyfilter.w5 +
+                              bpmem.copyfilter.w6;
+
   switch (format)
   {
   case EFBCopyFormat::R4:
@@ -470,7 +474,7 @@ static void EncodeRGBA6(u8* dst, const u8* src, EFBCopyFormat format, bool yuv)
     ENCODE_LOOP_BLOCKS
     {
       u32 srcColor = *(u32*)src;
-      *dst++ = Convert6To8((srcColor >> 6) & 0x3f);
+      *dst++ = (Convert6To8((srcColor >> 6) & 0x3f) * copy_filter_sum) / 64;
       src += readStride;
     }
     ENCODE_LOOP_SPANS
@@ -753,6 +757,9 @@ static void EncodeRGB8(u8* dst, const u8* src, EFBCopyFormat format, bool yuv)
   s32 tSpan, sBlkSpan, tBlkSpan, writeStride;
   u32 readStride = 3;
   u8* dstBlockStart = dst;
+  const u32 copy_filter_sum = bpmem.copyfilter.w0 + bpmem.copyfilter.w1 + bpmem.copyfilter.w2 +
+                              bpmem.copyfilter.w3 + bpmem.copyfilter.w4 + bpmem.copyfilter.w5 +
+                              bpmem.copyfilter.w6;
 
   switch (format)
   {
@@ -927,7 +934,7 @@ static void EncodeRGB8(u8* dst, const u8* src, EFBCopyFormat format, bool yuv)
     SetSpans(sBlkSize, tBlkSize, &tSpan, &sBlkSpan, &tBlkSpan, &writeStride);
     ENCODE_LOOP_BLOCKS
     {
-      *dst++ = src[0];
+      *dst++ = (src[0] * copy_filter_sum) / 64;
       src += readStride;
     }
     ENCODE_LOOP_SPANS
Old images

00000000_2022-04-05_16-29-43
00000000_2022-04-05_16-29-47
00000000_2022-04-05_16-29-51
old

New images

00000000_2022-04-05_16-34-00
00000000_2022-04-05_16-34-05
00000000_2022-04-05_16-34-08
new

Notice that the bottom-left quadrant was brighter on the first frame than the others (most obvious in the animation, where it flickers). That's no longer happening. (These images are with the object range 0-348, hiding the menu text.)


constexpr static TevColor All(s16 value) { return TevColor(value, value, value, value); }

constexpr s16& operator[](int index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constexpr s16& operator[](int index) const ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would need to be constexpr const s16& operator[](int index) const, but it's also assigned to in some cases so that won't work. Theoretically both could be added, but there doesn't seem to be a case where it's needed.

@Pokechu22 Pokechu22 force-pushed the sw-tev-enum-map branch 3 times, most recently from 170785e to c8f3302 Compare April 11, 2022 21:50
@Pokechu22
Copy link
Contributor Author

I added one more change to the last commit fixing various XF mistakes: XFMEM_POSTMATRICES was calculating the row by subtracting XFMEM_POSMATRICES (POS vs POST), resulting in incorrect row numbering.

@Pokechu22 Pokechu22 force-pushed the sw-tev-enum-map branch 2 times, most recently from 46dfeba to 9bebd0c Compare April 11, 2022 22:01
@JMC47
Copy link
Contributor

JMC47 commented Apr 14, 2022

The fifoci difference is explained above. NVM. Is this ready for merge now?

@Pokechu22
Copy link
Contributor Author

I think it's best to wait for an approving review on this one.

@Pokechu22 Pokechu22 force-pushed the sw-tev-enum-map branch 2 times, most recently from e103b1d to 675308e Compare April 23, 2022 04:01
@Pokechu22 Pokechu22 force-pushed the sw-tev-enum-map branch 2 times, most recently from f9d051e to 5ba6e02 Compare May 19, 2022 22:53
@Pokechu22 Pokechu22 force-pushed the sw-tev-enum-map branch 4 times, most recently from 03a9e6f to f91a631 Compare June 3, 2022 02:17
@Pokechu22 Pokechu22 force-pushed the sw-tev-enum-map branch 2 times, most recently from e2f237e to 9dcd16e Compare June 25, 2022 02:49
I search for these somewhat often with control+f, but tend to default to adding a 0x prefix, which means I don't find them until I remove it.
It stores both the konst selection value for alpha and color channels (for two tev stages per ksel), and half of a swap table row (there are 4 total swap tables, which can be used for swizzling the rasterized color and the texture color, and indices selecting which tables to use are stored per tev stage in the alpha combiner).  Since these are indexed very differently, the old code was hard to follow.
Mainly, the improvements are passing in the current command ID so that e.g. stage numbers can be directly included, instead of saying 0/1 or even/odd.
* 'hangle' was a typo
* Light colors include an alpha value, so they should be 8 characters, not 6
* The XF command format adds 1 to the count internally (so 0 is one word), but we need to subtract that back to produce a valid command
* XFMEM_POSTMATRICES was calculating the row by subtracting XFMEM_POSMATRICES (POS vs POST), resulting in incorrect row numbering
@Pokechu22 Pokechu22 merged commit 2dfe913 into dolphin-emu:master Sep 8, 2022
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
4 participants