Add a high performance, general purpose RSP command queue engine #253
This is really well implemented and documented! Loved it!
I was only able to review the header files for now (and a few other places here and there) and tried to write down every question that came to mind. I will continue with the implementation, but that's something I'm not very proficient in. I will eventually go over the whole thing, but I'm OK to merge once someone goes over my current comments if I lag behind.
This is huge! Congrats to both of you @rasky & @snacchus! Can't tell how happy I am having you building upon frankenGAS :)
The only downside is we have more to do on the pre-emptive multithreading now :) One step at a time though.
As mentioned in a reply above, I revised some of the API in rsp_queue.inc to make writing overlays a little bit easier and less error prone. The location of the saved state now doesn't need to be explicitly passed to the I also added a bunch of documentation to the macros in rsp_queue.inc and a short guide on how to write overlays.
I went over the implementation rather quickly as well. Please forgive me if some of the questions do not make sense. I believe most of these things might feel trivial to you in hindsight.
Can't believe how much work you have put into this!
I'm ready to merge this whenever you feel comfortable. I will also run the tests on my hardware, just in case we find something different with the setup.
Co-authored-by: Giovanni Bajo <rasky@develer.com>
This PR introduces the rspq library (short for "RSP command queue"), which provides the basic infrastructure for using the RSP coprocessor very efficiently. On the CPU side, it implements an API to enqueue "commands" for the RSP into a ring buffer that the RSP consumes concurrently in the background. On the RSP side, it provides the core loop that reads and executes the queue prepared by the CPU, and an infrastructure for writing "RSP overlays", that is, libraries that plug into the RSP command queue to perform actual RSP jobs (e.g. 3D graphics, audio, etc.).
The library is extremely efficient. It is designed for very high throughput and low latency, as the RSP pulls from the queue while the CPU fills it. Through some complex synchronization paradigms, both CPU and RSP run fully lockless, that is, they never need to explicitly synchronize with each other (unless requested by the user). The CPU can keep filling the queue and must wait for the RSP only when the queue becomes full; on the other side, the RSP can keep processing the queue without ever talking to the CPU.
The library has been designed to enqueue thousands of RSP commands per frame without measurable overhead, which should be more than enough for most use cases.
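To make the "producer only waits when the queue is full" idea concrete, here is a minimal sketch of a generic single-producer/single-consumer ring buffer using C11 atomics. It is not the code in src/rspq.c: the names ring_t, ring_init, ring_push and ring_pop, the capacity, and the use of host-side atomics are all assumptions made for illustration, and the real library has to coordinate two different processors.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_WORDS 8u /* hypothetical capacity, chosen only for the example */

/* Free-running counters index the buffer modulo RING_WORDS, so the
   capacity must be a power of two for unsigned wraparound to stay correct. */
static_assert((RING_WORDS & (RING_WORDS - 1u)) == 0u, "capacity must be a power of two");

typedef struct {
    uint32_t words[RING_WORDS];
    atomic_uint head; /* total words written; stored only by the producer */
    atomic_uint tail; /* total words read; stored only by the consumer */
} ring_t;

static void ring_init(ring_t *r)
{
    atomic_init(&r->head, 0u);
    atomic_init(&r->tail, 0u);
}

/* Producer side. Returns false only when the ring is full, which is the
   one situation in which the producer has to wait for the consumer. */
static bool ring_push(ring_t *r, uint32_t word)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_WORDS)
        return false;
    r->words[head % RING_WORDS] = word;
    /* Release: the word must be visible before the new head is. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side. Returns false when there is nothing to read. */
static bool ring_pop(ring_t *r, uint32_t *out)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->words[tail % RING_WORDS];
    /* Release: the slot is free again only after it has been read. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

Neither side takes a lock: each one writes only its own counter and reads the other's, which is the general shape of the lockless behaviour described above.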
Commands
Each command in the queue is made of one or more 32-bit words (currently up to 15). The most significant byte of the first word is the command ID. The upper 4 bits of the command ID are called the "overlay ID" and identify the overlay that is able to execute the command; the lower 4 bits are the command index, which identifies the command within the overlay. For instance, command ID 0x37 is command index 7 in overlay 3.
As the RSP executes the queue, it will parse the command ID and dispatch it for execution. When required, the RSP will automatically load the RSP overlay needed to execute a command. In the previous example, the RSP will load overlay 3 into IMEM/DMEM (unless it was already loaded) and then dispatch command 7 to it.
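The bit layout of the command ID can be sketched in a few lines of C. The helper names below are hypothetical and are not part of the rspq API. Treating the low 24 bits of the first word as free payload is also an assumption, since the description only specifies the top byte.

```c
#include <assert.h>
#include <stdint.h>

/* Command ID: most significant byte of the first 32-bit word. */
static inline uint8_t cmd_id(uint32_t first_word)
{
    return (uint8_t)(first_word >> 24);
}

/* Overlay ID: upper 4 bits of the command ID. */
static inline uint8_t cmd_overlay_id(uint8_t id)
{
    return (uint8_t)(id >> 4);
}

/* Command index: lower 4 bits of the command ID. */
static inline uint8_t cmd_index(uint8_t id)
{
    return (uint8_t)(id & 0x0F);
}

/* Build a first word from its parts. The 24-bit payload is an assumption
   of this sketch; the description only defines the top byte. */
static inline uint32_t cmd_first_word(uint8_t overlay_id, uint8_t index, uint32_t payload)
{
    assert(overlay_id < 16 && index < 16 && payload <= 0xFFFFFFu);
    return ((uint32_t)overlay_id << 28) | ((uint32_t)index << 24) | payload;
}
```

With these helpers, a first word of 0x37ABCDEF yields command ID 0x37, overlay ID 3 and command index 7, matching the example in the text.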
Higher-level libraries and overlays
Higher-level libraries that come with their RSP ucode can be designed to use the RSP command queue to efficiently coexist with all other RSP libraries provided by libdragon. In fact, by using the overlay mechanism, each library can obtain its own overlay ID, and enqueue commands to be executed by the RSP through the same unique queue. Overlay IDs are allocated dynamically by rspq in registration order, to avoid conflicts between libraries.
End-users can then use all these libraries at the same time, without having to arrange for complex RSP synchronization or asynchronous execution, or to plan for efficient context switching. In fact, they don't even need to be aware that the libraries are using the RSP. Through the unified command queue, the RSP can be used efficiently and effortlessly, without idle time and without wasting CPU cycles waiting for a task to complete before switching to another one.
Higher-level libraries that are designed to use the RSP command queue must:
- Call rspq_init at initialization. The function can be called multiple times by different libraries, with no side effects.
- Call rspq_overlay_register to register an rsp_ucode_t as an RSP command queue overlay, obtaining an overlay ID to use.
- Call rspq_write and rspq_flush to enqueue commands for the RSP. For instance, a matrix library might provide a "matrix_mult" function that internally calls rspq_write to enqueue a command for the RSP to perform the calculation.
To be compatible with the queue engine, ucodes must simply include rsp_queue.inc at the top of the file and define a header and a command table at the beginning of their data section using RSPQ_BeginOverlayHeader, RSPQ_DefineCommand and RSPQ_EndOverlayHeader. An overlay ucode doesn't have a single entry point: it exposes multiple functions bound to different commands, which will be called by the queue engine when the commands are enqueued. See tests/rsp_test.S for an example.
Blocks
A block (rspq_block_t) is a prerecorded sequence of RSP commands that can be played back. Blocks can be created via rspq_block_begin / rspq_block_end, and then executed by rspq_block_run. It is also possible to do nested calls (a block can call another block), up to 8 levels deep.
A block is very efficient to run because it is played back by the RSP itself. The CPU just enqueues a single command that "calls" the block. It is thus much faster than enqueuing the same commands every frame.
Notice that this library does not support static (compile-time) blocks. Blocks must always be created at runtime once (eg: at init time) before being used.
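As a rough mental model of nested playback with a bounded depth, the following host-side sketch plays back a block using a fixed-size return stack. Everything in it is hypothetical (sim_block_t, sim_cmd_t, sim_block_run, the three opcodes) and it does not reflect how the RSP ucode stores or executes blocks. In particular, whether the outermost call counts towards the 8 levels is an assumption; here it does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NESTING 8 /* nesting limit quoted in the description */

typedef enum { OP_EMIT, OP_CALL, OP_END } sim_op_t;

typedef struct sim_block sim_block_t;

typedef struct {
    sim_op_t op;
    uint32_t value;            /* payload for OP_EMIT */
    const sim_block_t *callee; /* target block for OP_CALL */
} sim_cmd_t;

struct sim_block {
    const sim_cmd_t *cmds; /* command list terminated by OP_END */
};

/* Play back `root`, appending every emitted value to `out`.
   Returns the number of values emitted, or -1 if the nesting limit is
   exceeded or `out` is too small. */
static int sim_block_run(const sim_block_t *root, uint32_t *out, int out_cap)
{
    assert(root != NULL && out != NULL);
    const sim_cmd_t *return_stack[MAX_NESTING];
    int depth = 0;
    int count = 0;
    const sim_cmd_t *pc = root->cmds;

    for (;;) {
        switch (pc->op) {
        case OP_EMIT:
            if (count == out_cap)
                return -1;
            out[count++] = pc->value;
            pc++;
            break;
        case OP_CALL:
            if (depth == MAX_NESTING)
                return -1;                  /* too many nested calls */
            return_stack[depth++] = pc + 1; /* resume after the call */
            pc = pc->callee->cmds;
            break;
        case OP_END:
            if (depth == 0)
                return count;               /* end of the root block */
            pc = return_stack[--depth];
            break;
        }
    }
}
```

The caller hands over one reference to the root block and the player walks the nested blocks on its own, which mirrors the point that the CPU enqueues a single command to run a whole block.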
Syncpoints
The RSP command queue is designed to be fully lockless, but sometimes it is necessary to know whether the RSP has actually executed an enqueued command (e.g. to use its result). To do so, this library offers a synchronization primitive called "syncpoint" (rspq_syncpoint_t). A syncpoint can be created via rspq_syncpoint and records the current writing position in the queue. It is then possible to call rspq_check_syncpoint to check whether the RSP has reached that position, or rspq_wait_syncpoint to wait for the RSP to reach that position.
Syncpoints are implemented using RSP interrupts, so their overhead is small but still measurable. They should not be abused.
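The "record the write position, later compare it with the read position" idea can be sketched with two counters. This is a single-threaded simulation under stated assumptions: the sim_ names are hypothetical stand-ins, not the rspq functions, positions are counted in whole commands, and no interrupts are involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t sim_syncpoint_t;

typedef struct {
    uint32_t write_pos; /* commands enqueued so far (producer side) */
    uint32_t read_pos;  /* commands executed so far (consumer side) */
} sim_queue_t;

/* Producer: enqueue one command. */
static void sim_enqueue(sim_queue_t *q)
{
    q->write_pos++;
}

/* Consumer: execute one pending command. */
static void sim_execute_one(sim_queue_t *q)
{
    assert(q->read_pos != q->write_pos); /* there must be something pending */
    q->read_pos++;
}

/* Create a syncpoint: remember the current writing position. */
static sim_syncpoint_t sim_syncpoint(const sim_queue_t *q)
{
    return q->write_pos;
}

/* Has the consumer reached the recorded position? The signed difference
   keeps the comparison correct when the 32-bit counters wrap around. */
static bool sim_check_syncpoint(const sim_queue_t *q, sim_syncpoint_t sp)
{
    return (int32_t)(q->read_pos - sp) >= 0;
}
```

A waiting variant would simply poll sim_check_syncpoint until it returns true. It is left out here because nothing advances read_pos concurrently in this simulation.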
High-priority queue
This library offers a mechanism to preempt the execution of the RSP to give priority to very urgent tasks: the high-priority queue. From the moment a high-priority queue is created via rspq_highpri_begin, the RSP immediately suspends execution of the command queue and switches to the high-priority queue, waiting for commands. All commands added via standard APIs (rspq_write) are then directed to the high-priority queue, until rspq_highpri_end is called. Once the RSP has finished executing all the commands enqueued in the high-priority queue, it resumes execution of the standard queue.
If required, it is possible to call rspq_highpri_sync to wait for the high-priority queue to be fully executed.
Final notes
For an explanation of implementation details, see the documentation in src/rspq.c.
In addition to the core library, this PR also ports the existing RSP mixer library to be compatible with rspq.
The implementation of this library was a combined effort between @rasky and me. I prototyped the command queue, implemented overlay support and ported the mixer library. Rasky added the blocks and highpri queue, drove the API design and wrote most of the absolutely incredible rspq ucode, which is both extremely fast and as efficient on instruction size as humanly possible, leaving plenty of IMEM for overlays.