Fifo analyzer improvements, part 3 #9718
Just a very preliminary review. Feel free to disregard any comments that would otherwise be addressed during cleanup.
fbd7f3c
to
9786b35
Compare
One other thing I did was switch from

```cpp
template <auto last_member, typename T = decltype(last_member),
          size_t size = static_cast<size_t>(last_member) + 1,
          std::enable_if_t<std::is_enum_v<T>, bool> = true>
class EnumFormatter {}
```

to

```cpp
template <auto last_member>
class EnumFormatter
{
  using T = decltype(last_member);
  static_assert(std::is_enum_v<T>);
}
```

but it turns out that this causes issues with GCC prior to version 8; it treats different enums with the same numeric value as the same and then can't find the right things in the template (minimal case). I've fixed this by changing it to
```cpp
// TODO: Is this really needed? Couldn't we just copy all of the frame memory updates? We're
// not associating memory updates with commands...
```
Potentially not.
Though in my PR, I modify this loop to pull out display-list memory updates and apply them to a temporary memory mirror so I can analyze the display lists too.
At the very least, this TODO should be changed to describe the current issue with memory update timings.
I did a quick test to see what happens if this is changed to just copy all memory updates at the same time, and fifoci shows no differences. Note that this is used during analysis (when converting FifoFrameInfo into AnalyzedFrameInfo), not playback. I don't think the memory update timings issue comes from here.
Actually, on further inspection, I've realized that there's no reason for AnalyzedFrameInfo to have its own copy of memoryUpdates, as the frame's memory updates are identical. I've removed it.
```cpp
offset += cmd_size;

if (analyzer.efb_copy)
```
You might want to handle other BP triggers like tmem DMAs and cache flushes the same way.
```cpp
const u32 array_start = m_cpmem.array_bases[array_index];
const u32 array_size = m_cpmem.array_strides[array_index] * (max_index + 1);

FifoRecorder::GetInstance().UseMemory(array_start, array_size, MemoryUpdate::VERTEX_STREAM);
```
Just thinking out loud here.
This is going to result in creating an individual memory update for every single triangle strip or other primitive group drawn out of the same vertex buffer.
It might be smarter to simply cache the min_index and max_index from this function and mark the memory as used the next time CPmem is updated to point at new vertex buffers.
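To make the suggestion concrete, here is a rough sketch of what such caching could look like. Everything in it (`ArrayUseTracker`, `MemoryUse`, `NoteIndices`, `SetArray`, `Flush`) is a hypothetical name invented for illustration, not something from this PR, and it assumes a single vertex array with a base address and stride. Unlike the snippet above, it starts the range at `min_index` rather than at the array base.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical record of one "this memory was used" notification.
struct MemoryUse
{
  uint32_t start;
  uint32_t size;
};

// Hypothetical helper: accumulates the index range drawn from one vertex array
// and reports a single MemoryUse when the array is re-pointed.
class ArrayUseTracker
{
public:
  // Called for each primitive group drawn with the current base/stride.
  void NoteIndices(uint32_t min_index, uint32_t max_index)
  {
    if (!m_range)
    {
      m_range = Range{min_index, max_index};
    }
    else
    {
      m_range->min = std::min(m_range->min, min_index);
      m_range->max = std::max(m_range->max, max_index);
    }
  }

  // Called when the array base or stride changes; flushes the pending range first.
  void SetArray(uint32_t base, uint32_t stride, std::vector<MemoryUse>& out)
  {
    Flush(out);
    m_base = base;
    m_stride = stride;
  }

  // Emits at most one MemoryUse covering every index noted since the last flush.
  void Flush(std::vector<MemoryUse>& out)
  {
    if (!m_range)
      return;
    const uint32_t start = m_base + m_stride * m_range->min;
    const uint32_t size = m_stride * (m_range->max - m_range->min + 1);
    out.push_back({start, size});
    m_range.reset();
  }

private:
  struct Range
  {
    uint32_t min;
    uint32_t max;
  };

  uint32_t m_base = 0;
  uint32_t m_stride = 0;
  std::optional<Range> m_range;
};
```

A real version would also need to flush at the end of the frame, since the last vertex buffer is never followed by another CPmem update.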
One thing I'm slightly worried about is a performance impact from rewriting OpcodeDecoder to be more generic, especially in single-threaded mode. It's a quite hot bit of code.
e0e8660
to
19f09a6
Compare
I've done a bit of inline hackery to tell the compiler to inline everything in the main case for OpcodeDecoder (it generates Oddly it doesn't seem to be linking on the Windows buildbot, though it does work on my machine. I'm not sure why.
5e54994
to
88a3588
Compare
```cpp
void FifoRecordAnalyzer::Initialize(const u32* cpMem)
{
  s_DrawingObject = false;

  FifoAnalyzer::LoadCPReg(VCD_LO, cpMem[VCD_LO], s_CpMem);
  FifoAnalyzer::LoadCPReg(VCD_HI, cpMem[VCD_HI], s_CpMem);
  for (u32 i = 0; i < CP_NUM_VAT_REG; ++i)
    FifoAnalyzer::LoadCPReg(CP_VAT_REG_A + i, cpMem[CP_VAT_REG_A + i], s_CpMem);

  const u32* const bases_start = cpMem + ARRAY_BASE;
  const u32* const bases_end = bases_start + s_CpMem.arrayBases.size();
  std::copy(bases_start, bases_end, s_CpMem.arrayBases.begin());

  const u32* const strides_start = cpMem + ARRAY_STRIDE;
  const u32* const strides_end = strides_start + s_CpMem.arrayStrides.size();
  std::copy(strides_start, strides_end, s_CpMem.arrayStrides.begin());
}
```
Note that this code failed to load CP_VAT_REG_B and CP_VAT_REG_C, which caused errors when recording for some games where the initial CP state matters (for instance, Need for Speed: Most Wanted was affected). This has been fixed with the new code, which also made this kind of loading happen in only one place.
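For illustration only, loading all three VAT register groups in one loop could look roughly like the sketch below. The register addresses, `CpState`, `VatGroup`, and `LoadInitialCpState` are assumptions made for this example (the real code goes through its own CP state type and loader), so treat it as a picture of the idea rather than the code in this PR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Register addresses assumed for illustration; check them against the real CP headers.
constexpr uint32_t VCD_LO = 0x50;
constexpr uint32_t VCD_HI = 0x60;
constexpr uint32_t CP_VAT_REG_A = 0x70;
constexpr uint32_t CP_VAT_REG_B = 0x80;
constexpr uint32_t CP_VAT_REG_C = 0x90;
constexpr uint32_t CP_NUM_VAT_REG = 8;

// Hypothetical container for the three VAT words of one vertex format.
struct VatGroup
{
  uint32_t a = 0;
  uint32_t b = 0;
  uint32_t c = 0;
};

// Hypothetical, trimmed-down CP state.
struct CpState
{
  uint32_t vcd_lo = 0;
  uint32_t vcd_hi = 0;
  std::array<VatGroup, CP_NUM_VAT_REG> vat{};
};

// Copies the initial vertex descriptor and all three VAT groups out of a raw
// CP register dump, so groups B and C are no longer skipped.
CpState LoadInitialCpState(const uint32_t* cp_mem)
{
  CpState state;
  state.vcd_lo = cp_mem[VCD_LO];
  state.vcd_hi = cp_mem[VCD_HI];
  for (uint32_t i = 0; i < CP_NUM_VAT_REG; ++i)
  {
    state.vat[i].a = cp_mem[CP_VAT_REG_A + i];
    state.vat[i].b = cp_mem[CP_VAT_REG_B + i];
    state.vat[i].c = cp_mem[CP_VAT_REG_C + i];
  }
  return state;
}
```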
0961002
to
d3d9e6d
Compare
0086cf8
to
36f1d22
Compare
36f1d22
to
47743ac
Compare
4d31b39
to
f28d724
Compare
This also adds the commands after the last primitive data but before the next frame as a unique object; this is mainly just the XFB copy. It's nice to have these visible, though disabling the object does nothing since only primitive data is disabled and there is no primitive data in this case.
3576347
to
d4a9c7a
Compare
d4a9c7a
to
72bbb27
Compare
Previously, EFB copies would be in the middle of other objects, as objects were only split on primitive data. A distinct object for each EFB copy makes them easier to spot, but does also mean there are more objects that do nothing when disabled (as disabling an object only skips primitive data, and there is no primitive data for EFB copies).
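As a rough illustration of the splitting rule described in the two notes above, the sketch below cuts a command stream into objects: an object ends when non-primitive commands follow primitive data, each EFB copy closes its own object, and whatever trails the last primitive data becomes a final object. `CmdKind`, `ObjectRange`, and `SplitIntoObjects` are hypothetical names, and grouping the state commands that precede an EFB copy with the copy itself is an assumption of this sketch, not necessarily what the analyzer does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical classification of FIFO commands.
enum class CmdKind
{
  State,
  Primitive,
  EfbCopy,
};

// Inclusive range of command indices forming one object.
struct ObjectRange
{
  std::size_t first;
  std::size_t last;
  bool has_primitives;  // Disabling an object only has an effect when this is true.
};

std::vector<ObjectRange> SplitIntoObjects(const std::vector<CmdKind>& cmds)
{
  std::vector<ObjectRange> objects;
  std::size_t start = 0;
  bool in_primitives = false;

  for (std::size_t i = 0; i < cmds.size(); ++i)
  {
    if (cmds[i] == CmdKind::Primitive)
    {
      in_primitives = true;
      continue;
    }

    // A non-primitive command after primitive data ends the previous object.
    if (in_primitives)
    {
      objects.push_back({start, i - 1, true});
      start = i;
      in_primitives = false;
    }

    // An EFB copy closes an object of its own (plus any state leading up to it).
    if (cmds[i] == CmdKind::EfbCopy)
    {
      objects.push_back({start, i, false});
      start = i + 1;
    }
  }

  // Commands after the last object boundary (e.g. the XFB copy) form a final object.
  if (start < cmds.size())
    objects.push_back({start, cmds.size() - 1, in_primitives});

  return objects;
}
```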
Videocommon also depends on core, which resulted in linking errors (though I'm not sure why). Ideally, dolphintool wouldn't depend on videocommon... but some stuff in core does.
72bbb27
to
ffa512f
Compare
FifoCI detected that this change impacts graphical rendering. Here are the behavior differences detected by the system:
automated-fifoci-reporter
This has been tested locally and has seen a lot of reviews from trusted developers.
Ever since this branch, Mario Galaxy water shaders don't seem to load properly in Sea Slide Galaxy and Loopdeswoop/Loopdeloop Galaxy. Attached is the dump file and a screenshot of what the issue looks like.
Thanks; I've created #10366 which should fix that.

Features:
OpcodeDecoding.cpp, instead of also being decoded for the FIFO player and recorder and in the FIFO analyzer.

This is currently a draft to make sure this builds on all platforms. I still need to do a lot of cleanup, and may use `EnumMap` in a few more places and/or convert some other things (e.g. the BP and XF enums) to `enum class`es. Feedback is still appreciated (in particular, I have mixed feelings about what I needed to do to get nested `EnumMap`s in the various VertexLoaders, and opinions on the whole OpcodeDecoding callback structure would also be useful).
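For readers unfamiliar with the idea, an enum-indexed map can be sketched as below. This is a minimal illustration assuming C++17 and an `enum class` whose enumerators run contiguously from zero; it is not the `EnumMap` from this PR, and `Channel` and `Stage` are made-up enums. Note that it takes `last_member` as an `auto` template parameter, the pattern reported earlier in this conversation to misbehave on GCC before version 8.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Minimal sketch of an array indexed by an enum class.
// V is the stored value type; last_member is the highest enumerator.
template <typename V, auto last_member>
class EnumMap
{
  using T = decltype(last_member);
  static_assert(std::is_enum_v<T>, "last_member must be an enumerator");
  static constexpr std::size_t s_size = static_cast<std::size_t>(last_member) + 1;

public:
  constexpr V& operator[](T key) { return m_data[static_cast<std::size_t>(key)]; }
  constexpr const V& operator[](T key) const { return m_data[static_cast<std::size_t>(key)]; }
  static constexpr std::size_t size() { return s_size; }

private:
  std::array<V, s_size> m_data{};  // Value-initialized, so ints start at 0.
};

// Made-up enums used only to exercise the sketch.
enum class Channel
{
  Red,
  Green,
  Blue,
};

enum class Stage
{
  First,
  Second,
};

// Nesting works by using one EnumMap as the value type of another.
using PerStageChannels = EnumMap<EnumMap<int, Channel::Blue>, Stage::Second>;
```

Indexing a nested map then reads as `table[Stage::Second][Channel::Green]`, and passing the wrong enum type to either subscript fails to compile.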