Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jit64: dcbf + dcbi #2663

Merged
merged 3 commits into from Aug 25, 2015
Merged

Jit64: dcbf + dcbi #2663

merged 3 commits into from Aug 25, 2015

Conversation

degasus
Copy link
Member

@degasus degasus commented Jun 26, 2015

Both dcbx instructions are called quite often and usually NOPs. So inline to first check within the JIT itself to skip over them much faster.

@degasus degasus added the WIP / do not merge Work in progress (do not merge) label Jun 26, 2015
ABI_PushRegistersAndAdjustStack({}, 0);
MOV(32, R(ABI_PARAM1), R(addr));
MOV(32, R(ABI_PARAM2), Imm32(32));
MOV(32, R(ABI_PARAM3), Imm32(0));

This comment was marked as off-topic.

@JMC47
Copy link
Contributor

JMC47 commented Jun 26, 2015

Rogue Squadron 3 crashes with idleskipping off with this pull request.

@degasus
Copy link
Member Author

degasus commented Jun 26, 2015

@JMC47 sounds like a synchronization issue. This shouldn't affect idle skipping. Does it also crash on single core?

@JMC47
Copy link
Contributor

JMC47 commented Jun 26, 2015

@degasus it doesn't crash with just "sync on idle skipping" disabled. It only crashes with full idle skipping disabled.

Not a sync issue.

@@ -67,7 +68,6 @@ class ValidBlockBitSet final
};
std::unique_ptr<u32[]> m_valid_block;

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

@degasus
Copy link
Member Author

degasus commented Aug 6, 2015

@JMC47 Is Rogue Squadron 3 fixed now?

@JMC47
Copy link
Contributor

JMC47 commented Aug 6, 2015

Yeah, it's fixed.

@JMC47
Copy link
Contributor

JMC47 commented Aug 6, 2015

Seems to randomly crash when playing various games now, though.

@JMC47
Copy link
Contributor

JMC47 commented Aug 7, 2015

Summer Games II (Virtual Console) crashes on this PR but not on master.

@degasus degasus changed the title Jit64: dcbf + dcbi Jit64: dcbf + dcbi + icbi Aug 8, 2015
// dcbi
if (inst.SUBOP10 == 470)
{
TEST(16, M(&DSP::g_dspState), Imm16(1 << 9));

This comment was marked as off-topic.

@degasus
Copy link
Member Author

degasus commented Aug 8, 2015

flacs> i like it

@degasus degasus removed the WIP / do not merge Work in progress (do not merge) label Aug 8, 2015
@JMC47
Copy link
Contributor

JMC47 commented Aug 8, 2015

Summer Games II fixed, now Super Mario 64 is crashing.

@degasus degasus changed the title Jit64: dcbf + dcbi + icbi Jit64: dcbf + dcbi Aug 23, 2015
@degasus
Copy link
Member Author

degasus commented Aug 23, 2015

I've dropped the ICBI commit, seems like I was wrong with the icache.
@JMC47 Are there still some broken games?

@JMC47
Copy link
Contributor

JMC47 commented Aug 24, 2015

Super Mario 64 no longer crashing.

@@ -145,7 +145,7 @@ struct ARAMInfo

// STATE_TO_SAVE
static ARAMInfo g_ARAM;
static UDSPControl g_dspState;
UDSPControl g_dspState;

This comment was marked as off-topic.

degasus and others added 3 commits August 24, 2015 18:33
- dynamically allocate third scratch register instead of forcing ECX
- use LEA as 3 operand add if possible
- use BT,JC instead of SHR,TEST,JNZ
- merge MOV,TEST
- use appropriate ABI function (no asm change)
@degasus
Copy link
Member Author

degasus commented Aug 24, 2015

btw, I think it's worth to also implement this in another way, so a very common code seems to be:

dcbi r0, r3
addi r3, r3, 32
bdnz+

dcbf r0, r3
addi r3, r3, 32
bdnz+

dcbst r0, r3
addi r3, r3, 32
bdnz+

dcbz r0, r3
addi r3, r3, 32
bdnz+

dcbt r0, r3
dcbst r0, r3
addi r3, r3, 32
bdnz+

@degasus
Copy link
Member Author

degasus commented Aug 24, 2015

ready to merge

degasus added a commit that referenced this pull request Aug 25, 2015
@degasus degasus merged commit 24cb650 into dolphin-emu:master Aug 25, 2015
@degasus degasus deleted the dcbx branch February 3, 2019 22:17
smurf3tte added a commit to smurf3tte/dolphin that referenced this pull request Dec 6, 2020
PR dolphin-emu#2663 added a Jit64 implementation of dcbX and a fast path to skip JIT cache invalidation. Unfortunately, there is a mismatch between address spaces in this optimization. It tests the effective address (with the top 3 bits cleared) against the valid block bitset which is intended to be indexed by physical address. While this works in the common case, it fails (for example) when the effective address is in the 7E... region (a.k.a. "fake VMEM"). This may also fall apart under more complex memory mapping scenarios requiring full MMU emulation.

The good news is that even without this fast path, the underlying call to JitInterface::InvalidateICache() still does very little work in the common case. It correctly translates the effective address to a physical address which it tests against the valid block bitset, skipping invalidation if it is not necessary. As such, the cost of removing the fast path should not be too high.

The Jit64 implementation is retained, though all it does now is emit a call. This may be marginally more efficient than simple interpreter fallback, which involves an extra call. The JitArm64 implementation is removed, as I am not set up for testing that platform. I don't expect this to make a meaningful difference.

The game Happy Feet is fixed by this change, as it loads code in the 7E... address region and depends upon JIT cache invalidation in reponse to dcbf.

https://bugs.dolphin-emu.org/issues/12133
smurf3tte added a commit to smurf3tte/dolphin that referenced this pull request Dec 6, 2020
…ath to skip JIT cache invalidation. Unfortunately, there is a mismatch between address spaces in this optimization. It tests the effective address (with the top 3 bits cleared) against the valid block bitset which is intended to be indexed by physical address. While this works in the common case, it fails (for example) when the effective address is in the 7E... region (a.k.a. "fake VMEM"). This may also fall apart under more complex memory mapping scenarios requiring full MMU emulation.

The good news is that even without this fast path, the underlying call to JitInterface::InvalidateICache() still does very little work in the common case. It correctly translates the effective address to a physical address which it tests against the valid block bitset, skipping invalidation if it is not necessary. As such, the cost of removing the fast path should not be too high.

The Jit64 implementation is retained, though all it does now is emit a call. This may be marginally more efficient than simple interpreter fallback, which involves an extra call. The JitArm64 implementation is removed, as I am not set up for testing that platform. I don't expect this to make a meaningful difference.

The game Happy Feet is fixed by this change, as it loads code in the 7E... address region and depends upon JIT cache invalidation in reponse to dcbf.

https://bugs.dolphin-emu.org/issues/12133
smurf3tte added a commit to smurf3tte/dolphin that referenced this pull request Dec 6, 2020
PR dolphin-emu#2663 added a Jit64 implementation of dcbX and a fast path to skip JIT cache invalidation. Unfortunately, there is a mismatch between address spaces in this optimization. It tests the effective address (with the top 3 bits cleared) against the valid block bitset which is intended to be indexed by physical address. While this works in the common case, it fails (for example) when the effective address is in the 7E... region (a.k.a. "fake VMEM"). This may also fall apart under more complex memory mapping scenarios requiring full MMU emulation.

The good news is that even without this fast path, the underlying call to JitInterface::InvalidateICache() still does very little work in the common case. It correctly translates the effective address to a physical address which it tests against the valid block bitset, skipping invalidation if it is not necessary. As such, the cost of removing the fast path should not be too high.

The Jit64 implementation is retained, though all it does now is emit a call. This may be marginally more efficient than simple interpreter fallback, which involves an extra call. The JitArm64 implementation is removed, as I am not set up for testing that platform. I don't expect this to make a meaningful difference.

The game Happy Feet is fixed by this change, as it loads code in the 7E... address region and depends upon JIT cache invalidation in reponse to dcbf.

https://bugs.dolphin-emu.org/issues/12133
smurf3tte added a commit to smurf3tte/dolphin that referenced this pull request Dec 20, 2020
PR dolphin-emu#2663 added a Jit64 implementation of dcbX and a fast path to skip JIT cache invalidation. Unfortunately, there is a mismatch between address spaces in this optimization. It tests the effective address (with the top 3 bits cleared) against the valid block bitset which is intended to be indexed by physical address. While this works in the common case, it fails (for example) when the effective address is in the 7E... region (a.k.a. "fake VMEM"). This may also fall apart under more complex memory mapping scenarios requiring full MMU emulation.

The good news is that even without this fast path, the underlying call to JitInterface::InvalidateICache() still does very little work in the common case. It correctly translates the effective address to a physical address which it tests against the valid block bitset, skipping invalidation if it is not necessary. As such, the cost of removing the fast path should not be too high.

The Jit64 implementation is retained, though all it does now is emit a call. This may be marginally more efficient than simple interpreter fallback, which involves an extra call. The JitArm64 implementation is removed, as I am not set up for testing that platform. I don't expect this to make a meaningful difference.

The game Happy Feet is fixed by this change, as it loads code in the 7E... address region and depends upon JIT cache invalidation in reponse to dcbf.

https://bugs.dolphin-emu.org/issues/12133
smurf3tte added a commit to smurf3tte/dolphin that referenced this pull request Jan 23, 2021
PR dolphin-emu#2663 added a Jit64 implementation of dcbX and a fast path to skip JIT cache invalidation. Unfortunately, there is a mismatch between address spaces in this optimization. It tests the effective address (with the top 3 bits cleared) against the valid block bitset which is intended to be indexed by physical address. While this works in the common case, it fails (for example) when the effective address is in the 7E... region (a.k.a. "fake VMEM"). This may also fall apart under more complex memory mapping scenarios requiring full MMU emulation.

The good news is that even without this fast path, the underlying call to JitInterface::InvalidateICache() still does very little work in the common case. It correctly translates the effective address to a physical address which it tests against the valid block bitset, skipping invalidation if it is not necessary. As such, the cost of removing the fast path should not be too high.

The Jit64 implementation is retained, though all it does now is emit a call. This is marginally more efficient than simple interpreter fallback, which involves an extra call. The JitArm64 implementation has also been fixed.

The game Happy Feet is fixed by this change, as it loads code in the 7E... address region and depends upon JIT cache invalidation in reponse to dcbf.

https://bugs.dolphin-emu.org/issues/12133
Miksel12 pushed a commit to Miksel12/dolphin that referenced this pull request Jan 31, 2021
PR dolphin-emu#2663 added a Jit64 implementation of dcbX and a fast path to skip JIT cache invalidation. Unfortunately, there is a mismatch between address spaces in this optimization. It tests the effective address (with the top 3 bits cleared) against the valid block bitset which is intended to be indexed by physical address. While this works in the common case, it fails (for example) when the effective address is in the 7E... region (a.k.a. "fake VMEM"). This may also fall apart under more complex memory mapping scenarios requiring full MMU emulation.

The good news is that even without this fast path, the underlying call to JitInterface::InvalidateICache() still does very little work in the common case. It correctly translates the effective address to a physical address which it tests against the valid block bitset, skipping invalidation if it is not necessary. As such, the cost of removing the fast path should not be too high.

The Jit64 implementation is retained, though all it does now is emit a call. This is marginally more efficient than simple interpreter fallback, which involves an extra call. The JitArm64 implementation has also been fixed.

The game Happy Feet is fixed by this change, as it loads code in the 7E... address region and depends upon JIT cache invalidation in reponse to dcbf.

https://bugs.dolphin-emu.org/issues/12133
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
6 participants