docs/DSP: Trivial adjustment to BLOOP{,I} sub-operation order #11106
@Pokechu22 You probably know what to do with this? |
|
I checked the old docs, and it seems like this code snippet has always been like that (which doesn't necessarily have to mean it was correct before). The change makes sense, but obviously a DSP Test to confirm this would be nice. |
|
I assume by "using DSPSpy with $st{0..3} unmasked" you mean changing this code to eliminate the The actual change makes sense. There are two differences in behavior: This is consistent with how we implement it in the interpreter and (I think) in the JIT. I do have one change I'd like you to make: please update the version number and version history table in the manual. |
1ba2bbc to 0815b43
While I am familiar enough with DSPSpy to run code, I am not quite sure how such a test should behave: should the DSP code detect a failure, or should it just dump registers and let something else (apparently not DSPSpy itself, maybe something checking a file with all dumps) check whether the content is as expected? Checking Could you describe a bit how tests should be written (maybe DSPSpy is not even the correct place to look)? In any case, I can implement such a test. I'm thinking about a 2-iteration loop dumping registers right after BLOOP{,I}, and some code checking the value of
Correct. I have a patch dropping this condition that I hesitated to submit. For reference, the condition comes from b0bb4e6 which indeed hides them because of the noise they cause. Personally (as a DSP neophyte, running DSPSpy interactively) I found them useful for telling quickly which dump call produced a given result (with the return stack tip) and in which loop I am (with the other registers).
This is exactly what I observed.
Done. Tangential topics:
|
|
One other thing to note is that Dolphin doesn't handle the stacks quite right; it allows more to be pushed onto them than actually should be possible. This is compounded by the fact that DSPSpy loads initial values into the stack registers, i.e. they already have one thing pushed into them at the start (you can edit the initial register values, though I don't know if you can do that with the stack registers when they're hidden). But on real hardware with DSPSpy, you can only nest loops 3 deep (presumably 4 deep if DSPSpy didn't push initially), while on Dolphin it lets you nest further. I haven't investigated this further because nothing seems to rely on the stack overflow interrupts, though.
There isn't any consistency here :) When I wrote tests, I sometimes tried to detect failures if it was easy, but if that seemed like it would be more difficult, I instead opted to always send back the results so that the correct results could be saved on real hardware using DSPSpy's save functionality, and then (manually) compared with results saved from Dolphin.
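The manual comparison described above could be automated along these lines. This is only a sketch: it assumes each saved result has already been parsed into a mapping from register name to 16-bit value. The `diff_dumps` helper and the sample values are hypothetical, and DSPSpy's actual save format is not modelled here.

```python
from typing import Dict, List, Optional, Tuple

Mismatch = Tuple[str, Optional[int], Optional[int]]


def diff_dumps(hw: Dict[str, int], emu: Dict[str, int]) -> List[Mismatch]:
    """Return (register, hardware value, emulator value) for each difference.

    A register missing from one dump is reported with None on that side.
    """
    mismatches: List[Mismatch] = []
    for name in sorted(set(hw) | set(emu)):
        a, b = hw.get(name), emu.get(name)
        if a != b:
            mismatches.append((name, a, b))
    return mismatches


# Made-up sample values, for illustration only.
hardware = {"st0": 0x0012, "st1": 0x0034, "st2": 0x0056}
dolphin = {"st0": 0x0012, "st1": 0x0036, "st2": 0x0056}

result = diff_dumps(hardware, dolphin)
for name, hw_val, emu_val in result:
    print(f"{name}: hardware={hw_val:#06x} dolphin={emu_val:#06x}")
```

Reporting every mismatch instead of stopping at the first one keeps the output useful when several registers diverge at once.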
Probably they would be more useful if we included symbols of some sort with the assembled DSP binaries. There's some vague code in DSPTool to support this, but I don't think it ever was fully implemented (and I don't think DSPSpy uses it in any way). If you find them useful, it's probably fine to revert it. (I mainly use DSPSpy interactively too, but from what I've seen the result dumps still include the stack registers, so it's not like hiding them from the UI is helpful for automation.)
That link seems to have gotten broken (I think it's supposed to be jamchamb/gc-memcard-adapter#5?). In any case, @xperia64 previously did some investigation into those formats in #10766 while reverse-engineering devolution. That PR never got fully finished, though. Note that Dolphin currently cheats at implementing memory card unlocking; it pretends that cards are always unlocked, so the card uCode never actually runs. I have an incomplete branch where I partially implemented it, but I never finished it either. |
Are they actually pushed? My (unverified) mental model for the stack registers is that the instructions control the push/pop (so the address of the stack top) and the registers only show and modify that cell. I'll try to check this later (got to go to work):
Whoops, forgot to check the link before posting. You are absolutely correct. With that patch I can unlock official memory cards on a Raspberry Pi. My intent was to avoid having to purchase a memory card with an exploit and Swiss on it, but I ended up purchasing a Wii to step through the DSP code and created the memory card that way... While maybe not the sanest path from a financial point of view, it was at least a lot of fun. And now I have a Wii.
I spotted this trick, yes. Which makes perfect sense, it seems very unlikely anything would depend on the actual unlocking process. |
The way Dolphin implements it is that reads pop and writes push, which of course doesn't mean that's how it actually works, but it matches what I remember seeing on real hardware a while ago. The manual doesn't mention which way it works, though. |
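As a rough illustration of the "reads pop, writes push" behaviour, here is a toy model of a single stack register. The depth of 4 is an assumption taken from the "presumably 4 deep" guess earlier in the thread, and the `StackRegister` class is hypothetical. It reflects neither Dolphin's code nor verified hardware behaviour.

```python
class StackRegister:
    """Toy model of one stack register: a write pushes, a read pops."""

    def __init__(self, depth: int = 4) -> None:
        # depth=4 is an assumption, not a documented hardware limit.
        self.depth = depth
        self.cells: list[int] = []

    def write(self, value: int) -> None:
        if len(self.cells) >= self.depth:
            raise OverflowError("stack register overflow")
        self.cells.append(value & 0xFFFF)  # registers are 16 bits wide

    def read(self) -> int:
        if not self.cells:
            raise IndexError("stack register underflow")
        return self.cells.pop()


st0 = StackRegister(depth=4)
st0.write(0x0000)  # stands in for the initial value DSPSpy loads

nested = 0
try:
    for level in range(1, 10):
        st0.write(0x0100 + level)  # one push per nested loop
        nested += 1
except OverflowError:
    pass

top = st0.read()  # reading pops the most recent push
print(f"nested {nested} levels before overflow, top was {top:#06x}")
```

With one entry already occupied by the initial value, the model accepts three further pushes, which matches the observation that loops nest only 3 deep on real hardware when running under DSPSpy.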
You are correct and my mental model is wrong:
So loads and stores have side effects on at least Which brings me back to:
...and what I observed may have been a mirage. If I run:
But if I run: then Modifying So, somehow, calling within a My plan for tomorrow:
|
|
A somewhat similar DSP (μPD77210, used by the Wii Speak, see this) requires that branch instructions not happen within 3 instructions of a loop's end (see this, page 75 and this, page 111 (these refer to the text page numbers; they're PDF pages 77 and 113 respectively)). I think that if you add an extra EDIT: The Zelda uCode (24B22038 specifically, used by the NTSC GameCube IPL) has a function at 0470 (CMD02?) which uses a BLOOP that has a lot of CALLs, including one at the almost-last instruction (followed by 2 NOPs). (There are also cases where it pads with several NOPs even though it's not using any calls (00ae and 00d0, which interact with the accelerator - this might be a timing thing of some sort, or just jank from an early revision).) |
The screen real-estate is already reserved, the values are dumped and
restored by the on-DSP code, why not make something out of these values ?
Allows following:
- where exactly send_back was called from ($st1)
- the boundaries and progress of the innermost BLOOP{,I} ($st0, 2 and 3)
up to send_back's call
Noticed while tracing in a BLOOP using DSPSpy with $st{0..3} unmasked.
BLOOPI assumed to follow the pattern.
0815b43 to 099e6c9
I added 2 Then I checked Then I re-enabled all which confirmed my assumed sub-operation order: I added a test for this, which in turn means that I included my change un-hiding |
Ooh, so that is the meaning of setting the accelerator address's MSb. I wondered, but never got around to testing and instead just focused on replicating the code (in Python) and then comparing the output (and intermediate steps) with the firmware's. I am also guessing that I triggered exception 3 because I did not pay enough attention to the order in which I was configuring the accelerator before writing to/reading from D3. I ended up re-assembling the disassembled relevant IROM functions into IRAM, so that I could insert
Oh, I also started out assuming the accelerator reads were fetching bytes, but the output took so few possible values that I thought some brute force would be enough to unlock memory cards (as I found, almost, on the PS1/PS2 USB memory card reader for the PS3). About the length, I assumed that the length of the input value would always be 8 bytes (which I got reading About the memory cards being detected as corrupted, could this be because of the first 12 bytes read from the card during the unlocking procedure? These are XOR'ed with a keystream computed on the PPC (...at least in the case of |
Just to clarify, the limit of 3 was for an unrelated DSP (albeit one that has a lot of similar features, including a loop stack and extended opcodes of a sort), not the one on the GameCube/Wii. I think the limit is 2 in this case, as in this would also be legal: but I haven't actually tested that (this is just roughly what I saw on the Zelda uCode).
Just as a note, the dolphin/Source/DSPSpy/main_spy.cpp Lines 66 to 70 in 487a11f
But good to see that this is consistent with my understanding of it.
Yeah, I think this is true in practice for games unlocking memory cards (at least I haven't seen a counterexample), and I think that's the only place where that function is used in practice.
My WIP branch only implemented the DSP part of it, and doesn't do anything with it on the memory card itself; I haven't looked into what actually happens on the memory card or how it produces the same key, or where the PPC side of it gets its values. That part does sound plausible though. |
While it is still fresh in my mind, and in case you are interested, here are some pointers in libogc2: There are two read operations from the card which influence the keystream, with 3 inputs:
The third parameter only influences the keystream after the card sends the 12 bytes which I suspect influence the console into declaring the card as corrupt: it is the data returned by the second read, before it knows how many bytes will be read beyond the first fixed 20 bytes (containing the 12 bytes I'm talking about, followed by the 8 bytes which get fed to the DSP). Those 12 bytes, once decrypted, become From there, it is easy to decode the stored serial using the stored time and check that it matches the decrypted first 12 bytes... If it actually does this. |
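The layout of the fixed 20 bytes described above (12 encrypted bytes followed by the 8 bytes fed to the DSP) can be sketched as follows. The keystream here is a placeholder, since how the PPC derives the real one is not covered, and the helper names are hypothetical.

```python
def split_unlock_header(header: bytes) -> tuple[bytes, bytes]:
    """Split the fixed 20 bytes into (12 encrypted bytes, 8 DSP input bytes)."""
    if len(header) != 20:
        raise ValueError("expected exactly 20 bytes")
    return header[:12], header[12:]


def xor_bytes(data: bytes, keystream: bytes) -> bytes:
    """XOR data with the leading len(data) bytes of keystream."""
    if len(keystream) < len(data):
        raise ValueError("keystream shorter than data")
    return bytes(d ^ k for d, k in zip(data, keystream))


# Dummy data: neither the header nor the keystream comes from a real card.
header = bytes(range(20))
keystream = bytes([0xFF] * 12)  # placeholder, not the PPC-computed keystream

encrypted, dsp_input = split_unlock_header(header)
decrypted = xor_bytes(encrypted, keystream)
print(decrypted.hex(), dsp_input.hex())
```

Because XOR is its own inverse, applying `xor_bytes` again with the same keystream recovers the encrypted bytes, which is a convenient sanity check when comparing against intermediate values from the firmware.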
Looks good to me. (I haven't tried the test out on my own console, but I expect it works correctly.)