Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DSP unit testing concept #15

Open
JayFoxRox opened this issue Aug 4, 2018 · 1 comment
Open

DSP unit testing concept #15

JayFoxRox opened this issue Aug 4, 2018 · 1 comment

Comments

@JayFoxRox
Copy link
Owner

JayFoxRox commented Aug 4, 2018

Created from a XboxDev Discord log where I explained the concept.

The actual code to be written is quite simple - it's mostly about setting regs, stepping, and then returning regs (for all unit tests, including GPU or APU).

In case of DSP (part of APU) it's generating assembly for a program like that, so basically a string of format:

"""
<reg setup>
<code to test>
<reg readback>
"""

where we have to hardcode setup and readback, and then only write lines like: unit_test(<code to test>).

We probably would generate that unit_test call by just emitting every instruction we encounter during XQEMU execution. So for each DSP instruction XQEMU encounters we generate a unit_test call.

We'd then run the unit tests on XQEMU and real xbox and compare. then fix all XQEMU bugs and record a new script to make calls to unit_test (as it might see different instructions once the DSP is less buggy).

That sounds somewhat complicated and tedious. As long as it's more tedious than complicated, I'm in!

It's actually very simple!

Imagine this (constructed high level example):

  • We know the x register is 32 bits, and y register is 32 bits. we take this from the DSP datasheet or existing DSP emulation code

  • Let's say our DSP instruction to be tested is x = 5; (but it could also do stuff like y = 10; - really anything).

  • then our reg setup is:

    uint32_t x = rand();
    uint32_t y = rand();
  • then our reg readback is:

    printf("x: 0x%08X", x);
    printf("y: 0x%08X", y);

so when we generate code we get:

// reg setup
srand(seed); // Make sure the rand() will always return random, but known values
uint32_t x = rand();
uint32_t y = rand();

// unit test 1 instruction
x = 5;

// reg readback
printf("x: 0x%08X", x);
printf("y: 0x%08X", y);

aka sprintf(code, "%s\n\n%s\n\n%s", reg_setup, instruction_to_test, reg_readback); or because reg_setup and reg_readback might be constant , we can directly put them into the string

when we execute it we get readback like:

x = 5
y = 182172414;

When implementin the actual DSP test, instead of our reg setup, instruction and reg readback being in C code, they'll be in assembly. but the same principle still applies. We also still use python to generate the assembly code strings.

Filling reg_setup with value

we'd run it with different seed [or dumped / special values instead of rand()] so we definitely know what happens. if we get a different result on XQEMU and a real DSP we know our emu code is bad.

instead of calling uint32_t x = rand(); we'll probably have something like move $x:000000, x0 ; Load value for x0 from memory address 0 (bank x) in the DSP code, and our script which loads the DSP code we will have: xbox.write_apu('x', 0x000000, value_for_x0)

so we first setup memory on the CPU, then reg setup will load the memory into DSP registers and then we step forward, then we move the values from DSP registers back into CPU accessible memory, the CPU will read it and printf()

alternatively we simple generate code like move #%08X, x0 ; Move 32bit immediate value into x0, so we don't have to prepare memory from the CPU - instead the values would be stored in the code; generator code would then look like this:

def unit_test(value_for_x0, value_for_y0, instruction_to_test):
  reg_setup = ""
  reg_setup += "move #%06X, x0\n" % value_for_x0
  reg_setup += "move #%06X, y0\n" % value_for_y0
  ...

That sounds interesting, if not fun.

yeah, it's pretty simple. you just have to read the datasheets to know which registers exist [and their size] and then figure out if they are accessible from the CPU or if you need to access them differently

for example, there are 24 but also 8 bit registers. and some registers can not be accessed using a standard move (such as the flags register which is used to branch)

Special cases we have to skip during tests

similarily we might not be able to accept everything for instruction_to_test: for example, we don't want to test branching instructions which would jump randomly (as they could skip the reg_readback or jump backwards and result in an endless loop) [we probably want to specialize tests for them so we can tell if the branch was taken or not]. we also don't want to test instructions which might talk to other hardware [OR we have to ensure the hardware is in a state where the test can run properly]

we can detect most of these bad unit tests manually though - just try running all of them, and when one fails, we can check why it failed (and then ban it, either manually or automatically detect them + create specialized test / setup if necessary)

we can't just blindly reg_setup ALL registers either. imagine that there's a register which just turns off the hardware. if we'd set that register, suddenly all registers would always read-back 0xFFFFFFFF, independent of the instruction [because the hardware just turns off]. this would be easy to emulate (and emu and hw would react the same), but our test would be utterly broken because we would no longer test what we try to test. similar issues exist when a register controls endianess, bitwidth, where the program executes from, ...

Requirements

a56 is optional, because aimed with a DSP datasheet we could also write our own assembler or use raw output from XQEMU (without having to worry what the actual instruction is - we just need it's binary representation). I strongly recommend using a56 for bootstrapping though.

We already know that XQEMU is enough to at least run a handful of DSP scripts, so we can already start working with XQEMU, without a real xbox:

  • With XQEMU we can develop the unit testing framework and already log XQEMU output.
  • Once we have access to a real Xbox, we can re-run the tool on it to confirm the framework works.
  • We can then also verify XQEMU works correctly (by comparing the output).

On a real xbox you definitely want a debug bios though - because nxdk-rdt is so unstable that it has to be rebooted often (although the unit test could automate this step).
On XQEMU using nxdk-rdt is fine as you can boot straight into it (and you will reboot XQEMU quite often, which can also be annoying; so you might still want to automate this).

Generalization

GPU VS Stateshaders (partially done in nv2a-re)

The same method can be used to observe vertex shaders.

GPU FS and per-pixel operations and rendering (planned for nv2a-trace)

We can apply the same to draw calls. In this case we'd prepare an empty canvas image and prepare renderstates, then step using a draw-call (which is the "instruction" we are testing) and then we observe the actual output image.

GPU FIFO and GPU PGRAPH methods (partially done in nv2a-trace)

See screenshot and code in JayFoxRox/nv2a-trace#1 :

  • 0x1D90: 0xFF00FF00 = set color to clear to pink instruction to test
  • 0x1D94: 0x000000F0 = clear (color buffers for R,G,B,A) with clear color [nv2a-trace shows the current surface state from before] reg readback
  • 0x1D94: 0x00000000 = clear (no buffers) again so we can see the output of previous command [nv2a-trace shows the current surface state from before]

in the 0x1D90 you can see the same thing happen on a lower level too: PGRAPH state is dumped (reg setup alternative where we keep the current value, but remember what it was) and comparison starts; then the instruction runs and we dump PGRAPH again and see that the PGRAPH register at address 0x0040186C was changed from 0x00000000 [black] to 0xFF00FF00 [pink] (PGRAPH comparison is then finished)

so we know that GPU command 0x1D90 modifies the PGRAPH register at 0x0040186C. the reason why we want a more flexible reg_setup phase is because if we reran 0x1D90: 0xFF00FF00 then 0x0040186C would not be changed [because it already contains 0xFF00FF00]. Similarily, it's possible that it does not set (=) the value, but instead it might OR (|=) the value

so in order to catch all cases, we want to set 0x0040186C to 0xFFFFFFFF and run the command [if it does OR, nothing happens. if it does SET, it will change, if it does AND it would also change]

then we'd repeat it after manually setting 0x0040186C to 0x00000000 [AND wouldn't do anything this time, but OR or SET might change it]

we'd also probably do it for random values of 0x0040186C and for various arguments to catch as many special cases as possible

(this also shows that 0x1D90 might also affect other registers, but we are not aware of it yet btw - it might also set 0x00401337 to 0xDEADBEEF every time - but nothing ever sets 0x00401337 to anything else, so we simply can't tell. that's why we want to control as many registers in our reg_setup and reg_readback as possible (which might be tricky to do as they are not easy to set or they might be used elsewhere which changes behaviour or crashes the xbox)
-->

JayFoxRox pushed a commit that referenced this issue Sep 11, 2018
With a Spice port chardev, it is possible to reenter
monitor_qapi_event_queue() (when the client disconnects for
example). This will dead-lock on monitor_lock.

Instead, use some TLS variables to check for recursion and queue the
events.

Fixes:
 (gdb) bt
 #0  0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0
 #1  0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0
 #2  0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220 <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66
 #3  0x0000563302fa6c25 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0, errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645
 #4  0x0000563303549aca in qapi_event_send_spice_disconnected (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-ui.c:149
 #5  0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at /home/elmarco/src/qq/ui/spice-core.c:235
 #6  0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x5633061b0050) at reds.c:316
 #7  0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event (info=0x5633061b0050, event=3, self=0x563304e088c0) at main-dispatcher.c:197
 #8  0x00007fa69f6b193b in main_dispatcher_channel_event (self=0x563304e088c0, event=event@entry=3, info=0x5633061b0050) at main-dispatcher.c:197
 #9  0x00007fa69f6d0833 in red_stream_push_channel_event (s=s@entry=0x563305ad8f50, event=event@entry=3) at red-stream.c:414
 #10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at red-stream.c:388
 #11 0x00007fa69f6b7ddc in red_channel_client_finalize (object=0x563304df2360) at red-channel-client.c:347
 #12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0
 #13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at red-channel-client.c:1341
 #14 0x00007fa69f68b259 in red_char_device_send_msg_to_client (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
 #15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
 #16 0x00007fa69f68b259 in red_char_device_read_from_device (dev=0x563304e08bc0) at char-device.c:353
 #17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199
 #18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at /home/elmarco/src/qq/chardev/char.c:112
 #19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, write_all=false) at /home/elmarco/src/qq/chardev/char.c:147
 #20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42
 #21 0x0000563302fa6334 in monitor_flush_locked (mon=0x563304dbb800) at /home/elmarco/src/qq/monitor.c:425
 #22 0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800, str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468
 #23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517
 #24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538
 #25 0x0000563302fa6b5b in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at /home/elmarco/src/qq/monitor.c:624
 #26 0x0000563302fa6c4b in monitor_qapi_event_queue (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at /home/elmarco/src/qq/monitor.c:649
 #27 0x0000563303548cce in qapi_event_send_shutdown (guest=false, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58
 #28 0x000056330313bcd7 in main_loop_should_exit () at /home/elmarco/src/qq/vl.c:1822
 #29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862
 #30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068, envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644

Note that error report is now moved to the first caller, which may
receive an error for a recursed event. This is probably fine (95% of
callers use &error_abort, the rest have NULL error and ignore it)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180731150144.14022-1-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[*_no_recurse renamed to *_no_reenter, local variables reordered]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
@JayFoxRox
Copy link
Owner Author

Partially implemented by JayFoxRox/xbox-tools#70

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant