New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not use magic numbers for RAM sizes/masks in PPCCache #11764
Do not use magic numbers for RAM sizes/masks in PPCCache #11764
Conversation
| @@ -168,11 +171,11 @@ void Cache::Invalidate(u32 addr) | |||
| if (valid[set] & (1U << way)) | |||
| { | |||
| if (addrs[set][way] & CACHE_VMEM_BIT) | |||
| lookup_table_vmem[(addrs[set][way] >> 5) & 0xfffff] = 0xff; | |||
| lookup_table_vmem[(addrs[set][way] >> 5) & (memory.GetFakeVMemMask() >> 5)] = 0xff; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps make local helpers like:
static u32 GoodName(const Memory::MemoryManager& memory)
{
return memory.GetFakeVMemMask() >> 5;
}
static u32 GoodName2(const Memory::MemoryManager& memory)
{
return memory.GetExRamMask() >> 5;
}
static u32 GoodName3(const Memory::MemoryManager& memory)
{
return memory.GetRamMask() >> 5;
}so all the repeated shifting in 8 places isn't necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, the symmetry of the statement is pleasing. If helpers were necessary, though, I believe a single helper which performs the right shift by 5 to align to an eight-word boundary would be all we need. But then, any decent name describing this action will undoubtedly make the statement even wider than it already is.
|
Can't this cause out of bounds accesses to these arrays? dolphin/Source/Core/Core/PowerPC/PPCCache.h Lines 37 to 41 in 1a2dcc5
IMO it'd be better to get rid of them entirely like I tried to do in #10826 - and now that the data cache is implemented I can get decent performance data without needing to run it on pure interpreter for icache. |
|
Whoops, you are right. Swapping |
2cbef2f
to
d688a73
Compare
|
I think the lookup table sizes are meant to match the emulator-allocated size of memory, not the real size of memory. The original magic numbers would suggest such. Is this correct? |
2a0b4ac
to
3e90a03
Compare
|
isn't this code in a hot path? |
|
A very hot path, though I didn't notice a severe decrease in emulation speed. Certainly nothing more significant than what Write-Back Cache emulation already causes. The only optimization I could offer would be more functions which return RAM sizes/masks that are already right-shifted by five, since the RAM sizes never changes after MemoryManager::Init(). Are you requesting that? |
|
This change probably saves a single instruction per PPCCache method execution, so the difference is completely imperceptible to me. Still, an instruction saved is an instruction earned. |
2d5f5f7
to
acd823f
Compare
|
Why not just store the masks in the Cache struct itself? |
|
Storing them in the Cache / InstructionCache classes meant having two copies of the pre-calculated values, which I felt could mislead someone to believe there is a difference between the data and instruction cache eight-word ram masks. I really wanted to store them in a relevant class which encompassed both the dCache and iCache, but none I knew of were easily accessible from the functions that need the values. |
|
Is this final solution satisfactory? |
e5a3a17
to
20c6c0b
Compare
|
Yeah I like that better personally, @Pokechu22 and @lioncash, opinions? |
|
(though I don't really understand why they're called |
|
An L1 cache block contains eight words (32 bytes). All of this right-shifting by five is to convert the size/mask in bytes to the size/mask in cache blocks. |
|
To be clear: I don't think that change makes a difference in terms of perf, the compiler can just perform the AND before the right shift as they're both unsigned and shift amount is the same - making the logical code identical in either case. The overhead of this PR was/is the indirect mem access to fetch the mask since it's no longer an immediate. The change moves the indirect access to |
Memory Override + Write-Back Cache emulation = game crash
20c6c0b
to
9ba2667
Compare
|
Thank you, shuffle2, for having the first good suggestion so far. I was totally blind to that potential logic optimization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some measurements with this patch:
Patch
diff --git a/Source/Core/Core/HW/DVD/DVDThread.cpp b/Source/Core/Core/HW/DVD/DVDThread.cpp
index d861bba2eb..036b60a0ec 100644
--- a/Source/Core/Core/HW/DVD/DVDThread.cpp
+++ b/Source/Core/Core/HW/DVD/DVDThread.cpp
@@ -332,6 +332,13 @@ void DVDThread::DVDThreadMain()
{
Common::SetCurrentThreadName("DVD thread");
+ m_file_logger.m_time_to_open = 0;
+ m_file_logger.m_time_to_boot = 0;
+ m_file_logger.m_time_to_intro = 0;
+ m_file_logger.m_time_to_menu = 0;
+ m_file_logger.m_time_to_attract = 0;
+ m_file_logger.m_timer.Start();
+
while (true)
{
m_request_queue_expanded.Wait();
diff --git a/Source/Core/Core/HW/DVD/FileMonitor.cpp b/Source/Core/Core/HW/DVD/FileMonitor.cpp
index 6e08006a6a..1efd711662 100644
--- a/Source/Core/Core/HW/DVD/FileMonitor.cpp
+++ b/Source/Core/Core/HW/DVD/FileMonitor.cpp
@@ -88,6 +88,29 @@ void FileLogger::Log(const DiscIO::Volume& volume, const DiscIO::Partition& part
// Update the last accessed file
m_previous_partition = partition;
m_previous_file_offset = file_offset;
+
+ if (path == "opening.bnr")
+ {
+ m_time_to_open = m_timer.ElapsedMs();
+ }
+ else if (path == "data/nintendo.szs")
+ {
+ m_time_to_boot = m_timer.ElapsedMs();
+ }
+ else if (path == "data/Entrance.thp")
+ {
+ m_time_to_intro = m_timer.ElapsedMs();
+ }
+ else if (path == "data/particle.szs")
+ {
+ m_time_to_menu = m_timer.ElapsedMs();
+ }
+ else if (path == "data/autodemoA.thp")
+ {
+ m_time_to_attract = m_timer.ElapsedMs();
+ SuccessAlertFmt("Open: {}\nBoot: {}\nIntro: {}\nMenu: {}\nAttract: {}", m_time_to_open,
+ m_time_to_boot, m_time_to_intro, m_time_to_menu, m_time_to_attract);
+ }
}
} // namespace FileMonitor
diff --git a/Source/Core/Core/HW/DVD/FileMonitor.h b/Source/Core/Core/HW/DVD/FileMonitor.h
index 38de351460..84016af971 100644
--- a/Source/Core/Core/HW/DVD/FileMonitor.h
+++ b/Source/Core/Core/HW/DVD/FileMonitor.h
@@ -5,6 +5,7 @@
#include "Common/CommonTypes.h"
#include "DiscIO/Volume.h"
+#include "Common/Timer.h"
namespace FileMonitor
{
@@ -16,6 +17,13 @@ public:
void Log(const DiscIO::Volume& volume, const DiscIO::Partition& partition, u64 offset);
+ u64 m_time_to_open;
+ u64 m_time_to_boot;
+ u64 m_time_to_intro;
+ u64 m_time_to_menu;
+ u64 m_time_to_attract;
+ Common::Timer m_timer;
+
private:
DiscIO::Partition m_previous_partition;
u64 m_previous_file_offset = 0;
diff --git a/Source/Core/Core/PowerPC/MMU.cpp b/Source/Core/Core/PowerPC/MMU.cpp
index a1597b96a5..09c4eccb66 100644
--- a/Source/Core/Core/PowerPC/MMU.cpp
+++ b/Source/Core/Core/PowerPC/MMU.cpp
@@ -1099,7 +1099,9 @@ void MMU::StoreDCacheLine(u32 address)
if (m_ppc_state.msr.DR)
{
- auto translated_address = TranslateAddress<XCheckTLBFlag::Write>(address);
+ // Note: per ppc_750cl.pdf, dcbst is treated as a load, not a store
+ // (presumably because it doesn't change the value in memory that the CPU sees)
+ auto translated_address = TranslateAddress<XCheckTLBFlag::Read>(address);
if (translated_address.result == TranslateAddressResultEnum::DIRECT_STORE_SEGMENT)
{
return;
@@ -1107,7 +1109,7 @@ void MMU::StoreDCacheLine(u32 address)
if (translated_address.result == TranslateAddressResultEnum::PAGE_FAULT)
{
// If translation fails, generate a DSI.
- GenerateDSIException(address, true);
+ GenerateDSIException(address, false);
return;
}
address = translated_address.address;
@@ -1145,15 +1147,22 @@ void MMU::FlushDCacheLine(u32 address)
if (m_ppc_state.msr.DR)
{
- auto translated_address = TranslateAddress<XCheckTLBFlag::Write>(address);
+ // Note: per ppc_750cl.pdf, dcbf is treated as a load, not a store
+ // (presumably because it doesn't change the value in memory that the CPU sees)
+ auto translated_address = TranslateAddress<XCheckTLBFlag::Read>(address);
if (translated_address.result == TranslateAddressResultEnum::DIRECT_STORE_SEGMENT)
{
return;
}
if (translated_address.result == TranslateAddressResultEnum::PAGE_FAULT)
{
- // If translation fails, generate a DSI.
- GenerateDSIException(address, true);
+ // XXX Based on ppc_750cl.pdf, a DSI exception should occur if translation fails.
+ // However, on startup, Super Mario Sunshine calls OSProtectRange(0,0,0x80000000,0) and
+ // OSProtectRange(1,0x83000000,0x7d000000,0) on startup, which in turn call
+ // DCFlushRange(0,0x80000000) and DCFlushRange(0x83000000,0x7d000000), which then runs
+ // dcbf for every block from 0 to 0x7fffffe0 and 0x83000000 to 0xffffffe0.
+ // Since this does not crash, presumably no exception should be generated.
+ // GenerateDSIException(address, false);
return;
}
address = translated_address.address;
@@ -1169,15 +1178,18 @@ void MMU::TouchDCacheLine(u32 address, bool store)
if (m_ppc_state.msr.DR)
{
- auto translated_address = TranslateAddress<XCheckTLBFlag::Write>(address);
+ // Note: per ppc_750cl.pdf, dcbt and dcbtst are treated as loads, not stores
+ // (presumably because it doesn't change the value in memory that the CPU sees)
+ auto translated_address = TranslateAddress<XCheckTLBFlag::Read>(address);
if (translated_address.result == TranslateAddressResultEnum::DIRECT_STORE_SEGMENT)
{
return;
}
if (translated_address.result == TranslateAddressResultEnum::PAGE_FAULT)
{
- // If translation fails, generate a DSI.
- GenerateDSIException(address, true);
+ // Per ppc_750cl.pdf: "Additionally, no exception occurs in the case of a translation fault or
+ // protection violation."
+ // GenerateDSIException(address, false);
return;
}
address = translated_address.address;This patch adds some rough time measuring using the file monitor log system (for which the corresponding log type must be enabled) to measure how long it takes for various parts of Super Mario Sunshine to load. However, Super Mario Sunshine fails to boot with the cache enabled on master, and this patch also includes a workaround for that (which I'll PR in a bit once I'm more confident of it).
Specifically, I measured how long it took for the game to boot (via the GC IPL), then play through the airplane cutscene (since that video is decompressed on the CPU), then idle on the title screen, then start the attract mode cutscene. I used the null video backend and had the speed limit at unlimited.
Originally, it took between 230.6-233.5 seconds for this to happen on my machine. Now it takes between 228.9-231.3 seconds. (I'm not sure if this difference is statistically significant with my sample size of 5.) Note that with the cache disabled, at 100% speed, it takes about 160.5 seconds. Still, this seems to be faster, or at least not slower, than it was before, and for a bugfix that's useful.
Raw data:
Master:
---------------------------
Information
---------------------------
Open: 1860
Boot: 15099
Intro: 23967
Menu: 181464
Attract: 231831
---------------------------
OK
---------------------------
---------------------------
Information
---------------------------
Open: 1870
Boot: 15039
Intro: 23778
Menu: 180367
Attract: 230788
---------------------------
OK
---------------------------
---------------------------
Information
---------------------------
Open: 1805
Boot: 14955
Intro: 23746
Menu: 179866
Attract: 230558
---------------------------
OK
---------------------------
---------------------------
Information
---------------------------
Open: 1894
Boot: 15278
Intro: 24269
Menu: 182882
Attract: 233477
---------------------------
OK
---------------------------
---------------------------
Information
---------------------------
Open: 1867
Boot: 15121
Intro: 23906
Menu: 180851
Attract: 230978
---------------------------
OK
---------------------------
PR:
---------------------------
Information
---------------------------
Open: 1858
Boot: 15248
Intro: 24218
Menu: 181120
Attract: 231344
---------------------------
OK
---------------------------
---------------------------
Information
---------------------------
Open: 1809
Boot: 15040
Intro: 23859
Menu: 179272
Attract: 229530
---------------------------
OK
---------------------------
---------------------------
Information
---------------------------
Open: 1830
Boot: 14969
Intro: 23858
Menu: 178834
Attract: 228870
---------------------------
OK
---------------------------
---------------------------
Information
---------------------------
Open: 1806
Boot: 14923
Intro: 23736
Menu: 180230
Attract: 230363
---------------------------
OK
---------------------------
---------------------------
Information
---------------------------
Open: 1773
Boot: 15164
Intro: 23895
Menu: 179846
Attract: 229977
---------------------------
OK
---------------------------
I also tried removing the lookup tables entirely or switching to std::unordered_map, but both of those seemed to have worse performance. The test wasn't super scientific though. (I was doing other stuff on my PC at the time, specifically watching YouTube videos, but I did try to make sure the load was about the same in all cases).
Memory Override + Write-Back Cache emulation = game crash