Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "DSPLLE: Add assertion for bad DMA alignment" #11505

Merged
merged 1 commit into from Jan 29, 2023

Conversation

Pokechu22
Copy link
Contributor

This reverts commit e140516. This assert triggers for AX and AXWii uCode games (including the Wii System Menu) for various addresses that seem to be 4-byte aligned. Worse still, if the DSP thread is in use (i.e. for DSP LLE recompiler, but not for DSP LLE interpreter), Dolphin completely hangs after the panic alert. Perhaps the data DMA has fewer restrictions compared to the instruction DMA?

The change to DSPTool (e391a28) has not been reverted, as it still fixes broken behavior for DSPSpy at -O0 on real hardware.

This reverts commit e140516. This assert triggers for AX and AXWii uCode games (including the Wii System Menu) for various addresses that seem to be 4-byte aligned. Worse still, if the DSP thread is in use (i.e. for DSP LLE recompiler, but not for DSP LLE interpreter), Dolphin completely hangs after the panic alert. Perhaps the data DMA has fewer restrictions compared to the instruction DMA?

The change to DSPTool (e391a28) has not been reverted, as it still fixes broken behavior for DSPSpy at -O0 on real hardware.
@delroth delroth merged commit 19d16b1 into dolphin-emu:master Jan 29, 2023
14 checks passed
@shuffle2
Copy link
Contributor

maybe on dspspy you hit some cache issue instead?

@Pokechu22
Copy link
Contributor Author

I think I remembered the other issue I was having more clearly: some kind of infinite loop with mail (while also messing with external interrupts). The cache is flushed correctly:

if (mail == 0x8071feed)
{
// DSP ready for task. Let's send one.
// First, prepare data.
for (int n = 0; n < 32; n++)
dspbufC[0x00 + n] = dspreg_in[n];
DCFlushRange(dspbufC, 0x2000);
// Then send the code.
DCFlushRange((void*)dsp_code[curUcode], 0x2000);
// DMA ucode to iram base, entry point is just after exception vectors...0x10
// NOTE: for any ucode made by dsptool, the block length will be 8191
real_dsp.SendTask((void*)MEM_VIRTUAL_TO_PHYSICAL(dsp_code[curUcode]), 0,
sizeof(dsp_code[curUcode]) - 1, 0x10);
runningUcode = curUcode + 1;

And for what it's worth, dspbufC is aligned as it's the same as dspbuffer, declared like this:

// Used for communications with the DSP, such as dumping registers etc.
u16 dspbuffer[16 * 1024] __attribute__((aligned(0x4000)));

dspbufP = (u16*)MEM_VIRTUAL_TO_PHYSICAL(dspbuffer); // physical
dspbufC = dspbuffer; // cached
dspbufU = (u32*)(MEM_K0_TO_K1(dspbuffer)); // uncached

I also found explicit confirmation of the alignment requirements, both in yagcd and in https://patents.google.com/patent/US6606689 (control+F for "DSPA: DSp dma dsP memory Address High DSPaddress 0xFFCD" - no idea why the capitalization is like that; it's the same way in the PDF and original application as far as I can tell). I'll still try to make a hardware test to confirm these though.

@shuffle2
Copy link
Contributor

shuffle2 commented Feb 4, 2023

I think I explained before but I would really not take either yagcd nor the patents at face value. I also wouldn't be surprised if some stuff in this area slightly changes between gc and wii

@Pokechu22
Copy link
Contributor Author

I'm more willing to trust the patents saying it must be 4-byte aligned to my own complete guess of 32-byte aligned (which had nothing significant to back it up other than I knew that that worked properly), but it definitely does need to be tested further.

I think they're at least a good starting point for behaviors to check, though it definitely can't be assumed to be perfect (especially since they go out of their way to be weaselly yet extremely specific with "for example" and the like). Regarding GC/Wii changes, almost certainly the main memory address behaves differently since (to my understanding) the DSP can access MEM1 and MEM2 now, but I'm doing my testing on a Wii so that shouldn't make much of a difference.

For what it's worth, the patent also has these paragraphs:

The data RAM is organized as 4 pages, each page being 1 kword in size. The data ROM is organized as 1 page having a size of 2 kword. One data RAM page is made up of four copies of 256×16-bit synchronous one way dual port SRAM and the data ROM page is made up of a copy of 2048×16-bit synchronous single port ROM. Each page is independent of the other pages so that each page has its own data, address busses and read, write control signals to connect to the three requesters. Data in/out ports for DSP buses 1 and 2 are 16 bits wide and the data in/out ports for DSP DMA 819 are 64 bits. In this arrangement, up to three pages can be active simultaneously for three requesters.
In this example system, each SRAM page can be accessed by one read or one write or one read and one write, but cannot be accessed by two reads or two writes. The reads could be DSP bus 1 or 2 or DSP DMA read and the writes could be DSP bus 1 or 2 or DSP DMA write. The ROM page can only be accessed by one read and the read can be a DSP bus 1 or 2 read. DSP DMA 819 cannot read the data ROM. If a page is being read by DSP DMA, DSP 811 can still write the page or read/write other pages. If a page is being written by DSP DMA 819, DSP 811 can still read the page or read/write other pages. To avoid hardware conflicts, the DSP read and the DMA write or the DSP write and DMA read should not occur on the same address location. DSP 811 is not allowed to read the page that the DMA is reading and the DSP is not allowed to write the page to which the DMA is writing.

This describes something dolphin implemented (see LD in 503bf54, and also 4d5eba2), but it seems like we don't handle all of the edge-cases for it (just the case of two loads from the same DRAM page, though I don't think there are any instructions that can do two stores at the same time, only a load and a store; we don't handle two reads from ROM and we do DMAs instantly).

Another edge case I accidentally found while trying to test the DMA alignment behavior is that bus 2 can't actually read the IFX registers; LRI $AR0, #0 LRI $AR3, #DSBL NX'LD : $AX0.L, $AX1.L, @$AR0 will end up loading the value 0xbeef into $AX1.L for some reason. The patent text itself doesn't mention this, but figure 6 only shows bus 1 being connected to the accelerator and decoder, but both bus 1 and bus 2 are connected to data memory (which I take to mean only bus 1 is connected to IFX in general, as there's no block for that).

I'm not going to investigate those in detail for now, because even this DMA behavior was a side investigation from the thing I actually wanted to fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants