Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fifo analyzer improvements, part 3 #9718

Merged
merged 23 commits into from Dec 20, 2021

Conversation

Pokechu22
Copy link
Contributor

Features:

  • GP opcodes are decoded in a single place, OpcodeDecoding.cpp, instead of also being decoded for the FIFO player and recorder and in the FIFO analyzer.
  • The new decoder uses a callback which has different methods for each type of decoded command. The callback also has a function that gets a reference to the current CP state (and updates that for CP commands), so that it can properly decode vertices. It also can handle display lists decently well (by re-using the same callback).
  • Commands after the final object (the copy to the XFB) is now shown. Other copies also create new objects. Unfortunately, this affects both the object range and the analyzer view; since the objects for copies don't have any vertices, excluding or including them in the range has no effect. But it makes analyzing a bit easier.
  • Object info for each frame is now stored in a vector of structs containing the start position, offset to primitive data, size, and CP memory state. This, combined with the extra final object, simplifies the playback code a fair bit.

This is currently a draft to make sure this builds on all platforms. I still need to do a lot of cleanup, and may use EnumMap in a few more places and/or convert some other things (e.g. the BP and XF enums) to enum classes. Feedback is still appreciated (in particular, I have mixed feelings about what I needed to do to get nested EnumMaps in the various VertexLoaders, and opinions on the whole OpcodeDecoding callback structure would also be useful).

Copy link
Member

@lioncash lioncash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a very preliminary review. Feel free to disregard any that would otherwise be addressed during cleanup

Source/Core/Common/EnumMap.h Outdated Show resolved Hide resolved
Source/Core/Core/FifoPlayer/FifoPlayer.cpp Outdated Show resolved Hide resolved
Source/Core/Core/FifoPlayer/FifoRecorder.h Outdated Show resolved Hide resolved
Source/Core/Core/FifoPlayer/FifoRecorder.h Outdated Show resolved Hide resolved
Source/Core/Core/FifoPlayer/FifoRecorder.h Outdated Show resolved Hide resolved
Source/Core/VideoCommon/CPMemory.h Show resolved Hide resolved
Source/Core/VideoCommon/VertexLoaderManager.h Outdated Show resolved Hide resolved
Source/Core/VideoCommon/VertexLoaderManager.cpp Outdated Show resolved Hide resolved
Source/Core/VideoCommon/VertexLoaderManager.cpp Outdated Show resolved Hide resolved
Source/Core/Core/FifoPlayer/FifoRecorder.h Outdated Show resolved Hide resolved
@Pokechu22 Pokechu22 force-pushed the better-fifo-analyzer-part-3 branch 4 times, most recently from fbd7f3c to 9786b35 Compare May 17, 2021 00:08
@Pokechu22
Copy link
Contributor Author

One other thing I did was switch from

template <auto last_member, typename T = decltype(last_member),
          size_t size = static_cast<size_t>(last_member) + 1,
          std::enable_if_t<std::is_enum_v<T>, bool> = true>
class EnumFormatter {}

to

template <auto last_member>
class EnumFormatter
{
  using T = decltype(last_member);
  static_assert(std::is_enum_v<T>);
}

but it turns out that that causes issues with GCC prior to version 8; it treats different enums with the same numeric value as the same and then can't find the right things in the template (minimal case). I've fixed this by changing it to template <auto last_member, typename = decltype(last_member)> (godbolt), but this is a bit of an awkward fix. The problem is solved on GCC 8 (though I can't find a specific bug report for it; updating to GCC 8 would also remove all of the misleading "'vertex_shader_uid_data::::texgentype' is too small to hold all values of 'enum class TexGenType'" warnings (bug 61414).

Source/Core/Core/FifoPlayer/FifoRecorder.h Outdated Show resolved Hide resolved
Source/Core/DolphinQt/FIFO/FIFOAnalyzer.cpp Outdated Show resolved Hide resolved
Source/Core/DolphinQt/FIFO/FIFOAnalyzer.cpp Outdated Show resolved Hide resolved
// TODO: Is this really needed? Couldn't we just copy all of the frame memory updates? We're
// not associating memory updates with commands...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially not.

Though in my PR, I modify this loop to pull-out and apply display-list memory updates to a temporary memory mirror so I can analyze the display lists too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least, this TODO: should be changed to describe the current issue with memory update timings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick test to see what happens if this is changed to just copy all memory updates at the same time, and fifoci shows no differences. Note that this is used during analysis (when converting FifoFrameInfo into AnalyzedFrameInfo), not playback. I don't think the memory update timings issue comes from here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, on further inspection, I've realized that there's no reason for AnalyzedFrameInfo to have its own copy of memoryUpdates, as the frame's memory updates are identical. I've removed it.


offset += cmd_size;

if (analyzer.efb_copy)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to handle other BP triggers like tmem DMAs and cache flushes the same way.

Source/Core/Core/FifoPlayer/FifoPlayer.cpp Show resolved Hide resolved
Source/Core/Core/FifoPlayer/FifoPlayer.cpp Outdated Show resolved Hide resolved
const u32 array_start = m_cpmem.array_bases[array_index];
const u32 array_size = m_cpmem.array_strides[array_index] * (max_index + 1);

FifoRecorder::GetInstance().UseMemory(array_start, array_size, MemoryUpdate::VERTEX_STREAM);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out aloud here.

This is going to result in creating individual memory update for every single triangle strip or other primitive group out of the same vertex buffer.

It might be smarter to simply cache the min_index and max_index from function and make the memory as used the next time CPmem is updated to point at new vertex buffers.

@phire
Copy link
Member

phire commented May 17, 2021

One thing I'm slightly worried about is a performance impact from rewriting OpcodeDecoder to be more generic, especially in single-threaded mode. It's a quite hot bit of code.

@Pokechu22 Pokechu22 force-pushed the better-fifo-analyzer-part-3 branch from e0e8660 to 19f09a6 Compare May 18, 2021 06:16
@Pokechu22
Copy link
Contributor Author

Pokechu22 commented May 18, 2021

I've done a bit of inline hackery to tell the compiler to inline everything in the main case for OpcodeDecoder (it generates OpcodeDecoder::RunFifo<0> for the main fifo and then OpcodeDecoder::RunCallback<0>::OnDisplayList for the display list call, checked using Visual Studio's disassembler (alt+8 while debugging)). The new uses in the fifo player/recorder/analyzer don't have the same aggressive inlining, but I don't think they need to be as optimized. I haven't done any actual performance testing (as my machine isn't really well suited to that), though.

Oddly it doesn't seem to be linking on the windows buildbot, though it does work on my machine. I'm not sure why.

@Pokechu22 Pokechu22 force-pushed the better-fifo-analyzer-part-3 branch 7 times, most recently from 5e54994 to 88a3588 Compare May 22, 2021 22:26
void FifoRecordAnalyzer::Initialize(const u32* cpMem)
{
s_DrawingObject = false;

FifoAnalyzer::LoadCPReg(VCD_LO, cpMem[VCD_LO], s_CpMem);
FifoAnalyzer::LoadCPReg(VCD_HI, cpMem[VCD_HI], s_CpMem);
for (u32 i = 0; i < CP_NUM_VAT_REG; ++i)
FifoAnalyzer::LoadCPReg(CP_VAT_REG_A + i, cpMem[CP_VAT_REG_A + i], s_CpMem);

const u32* const bases_start = cpMem + ARRAY_BASE;
const u32* const bases_end = bases_start + s_CpMem.arrayBases.size();
std::copy(bases_start, bases_end, s_CpMem.arrayBases.begin());

const u32* const strides_start = cpMem + ARRAY_STRIDE;
const u32* const strides_end = strides_start + s_CpMem.arrayStrides.size();
std::copy(strides_start, strides_end, s_CpMem.arrayStrides.begin());
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this code failed to load CP_VAT_REG_B and CP_VAT_REG_C, which caused errors when recording for some games where the initial CP state matters (for instance, Need for Speed: Most Wanted was affected). This has been fixed with the new code, which also made this kind of loading happen in only one place.

@Pokechu22 Pokechu22 force-pushed the better-fifo-analyzer-part-3 branch 2 times, most recently from 0961002 to d3d9e6d Compare May 30, 2021 22:29
@Pokechu22 Pokechu22 force-pushed the better-fifo-analyzer-part-3 branch 4 times, most recently from 0086cf8 to 36f1d22 Compare June 13, 2021 21:29
@Pokechu22 Pokechu22 force-pushed the better-fifo-analyzer-part-3 branch from 36f1d22 to 47743ac Compare July 11, 2021 22:28
@Pokechu22 Pokechu22 force-pushed the better-fifo-analyzer-part-3 branch 2 times, most recently from 4d31b39 to f28d724 Compare July 21, 2021 22:11
This also adds the commands after the last primitive data but before the next frame as a unique object; this is mainly just the XFB copy.  It's nice to have these visible, though disabling the object does nothing since only primitive data is disabled and there is no primitive data in this case.
Previously, EFB copies would be in the middle of other objects, as objects were only split on primitive data.  A distinct object for each EFB copy makes them easier to spot, but does also mean there are more objects that do nothing when disabled (as disabling an object only skips primitive data, and there is no primitive data for EFB copies).
Videocommon also depends on core, which resulted in linking errors (though I'm not sure why).  Ideally, dolphintool woudln't depend on videocommon... but some stuff in core does.
@dolphin-emu-bot
Copy link
Contributor

FifoCI detected that this change impacts graphical rendering. Here are the behavior differences detected by the system:

  • DKCR-Char on ogl-lin-mesa: diff
  • inverted-depth-range on ogl-lin-mesa: diff
  • mp3-bloom on ogl-lin-mesa: diff
  • nfsu-reflections on ogl-lin-mesa: diff
  • DKCR-Char on sw-lin-mesa: diff
  • inverted-depth-range on sw-lin-mesa: diff
  • mp3-bloom on sw-lin-mesa: diff
  • nfsu-reflections on sw-lin-mesa: diff
  • soa-black on sw-lin-mesa: diff
  • DKCR-Char on ogl-lin-radeon: diff
  • inverted-depth-range on ogl-lin-radeon: diff
  • mp3-bloom on ogl-lin-radeon: diff
  • nfsu-reflections on ogl-lin-radeon: diff
  • DKCR-Char on uberogl-lin-radeon: diff
  • inverted-depth-range on uberogl-lin-radeon: diff
  • mp3-bloom on uberogl-lin-radeon: diff
  • nfsu-reflections on uberogl-lin-radeon: diff

automated-fifoci-reporter

@JMC47
Copy link
Contributor

JMC47 commented Dec 20, 2021

This has been tested locally and has seen a lot of reviews from trusted developers.

@JMC47 JMC47 merged commit 32fed91 into dolphin-emu:master Dec 20, 2021
9 of 10 checks passed
@shehzman
Copy link

shehzman commented Jan 13, 2022

Ever since this branch, Mario Galaxy water shaders don't seem to load properly in Sea Slide Galaxy and Loopdeswoop/Loopdeloop Galaxy. Attached is the dump file and a screenshot of what the issue looks like.
Screenshot (5)
bad_ps_5_0_D3D_0.txt

@Pokechu22
Copy link
Contributor Author

Thanks; I've created #10366 which should fix that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
7 participants