This is the best performance I have been able to get out of the RP2040 PRU. I now have a much better understanding of Remora and how it interacts with LinuxCNC. The main point of all of these changes is to reduce jitter in the base thread running on the MCU, and so eliminate timing jitter on the step pulses. There were 3 major sources of timing jitter:
To address the data copy delays (having to pause the base thread while data is copied in and out with the host) I made a simple double buffer, so the base thread now only needs to be interrupted long enough to switch the pointers over. This did require some small changes to the stepgen code. I am not blocking the servo thread at the moment as I don't think it needs it, but this may be worth revisiting if you start doing more than basic I/O/Blink in the servo thread.
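For illustration, here is a minimal sketch of the pointer-swap idea, not the code in this PR. The `RxPacket` layout and the `DoubleBuffer` class are hypothetical names made up for the example; on the RP2040 the `swap()` call would additionally be wrapped in whatever short critical section the firmware uses.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical packet layout for the example; the real Remora structs differ.
struct RxPacket {
    int32_t jointFreqCmd[4];
    uint8_t jointEnable;
};

// Two buffers plus an index. The comms code fills the back buffer at its own
// pace; the base thread only ever reads the front buffer.
template <typename T>
class DoubleBuffer {
public:
    // Comms side: the buffer the base thread is NOT currently reading.
    T& back() { return buffers_[other(front_.load(std::memory_order_relaxed))]; }

    // Base thread side: the most recently published buffer.
    const T& front() const { return buffers_[front_.load(std::memory_order_acquire)]; }

    // Publish the back buffer. This single store is the only step the base
    // thread has to be held off for, instead of a whole packet copy.
    void swap() {
        front_.store(other(front_.load(std::memory_order_relaxed)),
                     std::memory_order_release);
    }

private:
    static uint8_t other(uint8_t index) { return static_cast<uint8_t>(index ^ 1u); }

    T buffers_[2]{};
    std::atomic<uint8_t> front_{0};
};
```

The comms code would write a complete packet into `back()` and then call `swap()`, so the base thread never sees a half-copied packet.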
To address number 2 I made the following changes:
The above changes are fine and good, but there are some gotchas now (issue 3) because the base thread runs in an interrupt context and does floating-point math. Since the RP2040 doesn't have an FPU, this math requires calls to the software floating-point libraries. The issue is that those libraries are stored in the SPI flash. That bus is not very fast to begin with and sees contention (mainly from the networking code on the other core), so the base thread can still be delayed by the rest of the system even if you use critical sections. Because of the library access, it is also non-trivial to get specific functions (stepgen) loaded into RAM. So I just load the entirety of Remora into RAM with "set(PICO_COPY_TO_RAM 1)". It all fits, and this eliminates the jitter from the FP instructions.
Might need to keep an eye on this as the networking packet buffers are dynamic, but there is a decent amount of memory left as Remora is fairly compact and the config file is still stored in flash:
Running from flash:
[build] Memory region Used Size Region Size %age Used
[build] FLASH: 168008 B 2 MB 8.01%
[build] RAM: 45788 B 256 KB 17.47%
[build] SCRATCH_X: 2 KB 4 KB 50.00%
[build] SCRATCH_Y: 0 GB 4 KB 0.00%
Running from RAM:
[build] Memory region Used Size Region Size %age Used
[build] FLASH: 167452 B 2 MB 7.98%
[build] RAM: 202908 B 256 KB 77.40%
[build] SCRATCH_X: 2 KB 4 KB 50.00%
[build] SCRATCH_Y: 0 GB 4 KB 0.00%
So the Remora application is taking up about 157 KB of RAM (202908 B - 45788 B = 157120 B).
A fixed-point implementation of the stepgen would be pretty helpful here, as it would avoid software FP on the M0 processors, but that's outside my scope for now. I am pretty happy with the performance of the little RP2040 PRU with the above optimizations. I think there is still work to be done on the component side to tune the gains, but I am getting pretty good results with a 1 ms servo thread on an RPi 4.
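As a rough idea of what a fixed-point stepgen could look like, here is a sketch built on a 32-bit phase accumulator. It is not Remora's stepgen and is not part of this PR; `FixedStepgen`, `setFrequency` and `tick` are hypothetical names, and it assumes the commanded step rate stays below the base thread rate.

```cpp
#include <cstdint>

// Hypothetical fixed-point step generator. The accumulator is a 32-bit phase
// that wraps once per step, so the per-tick work is an integer add and compare.
struct FixedStepgen {
    uint32_t accumulator = 0;  // phase; one full wrap of 2^32 equals one step
    uint32_t increment = 0;    // phase added on every base-thread tick
    int32_t position = 0;      // step count, signed by direction
    bool forward = true;

    // Convert a step rate into a per-tick phase increment. Integer only, and
    // meant to run outside the base thread (for example when a new command
    // arrives). Assumes |stepsPerSec| < baseHz, otherwise the result would
    // not fit in 32 bits.
    void setFrequency(int32_t stepsPerSec, uint32_t baseHz) {
        forward = stepsPerSec >= 0;
        const int64_t signedRate = stepsPerSec;
        const uint64_t magnitude =
            static_cast<uint64_t>(forward ? signedRate : -signedRate);
        increment = static_cast<uint32_t>((magnitude << 32) / baseHz);
    }

    // Called once per base-thread tick; returns true when a step is due.
    bool tick() {
        const uint32_t previous = accumulator;
        accumulator += increment;      // unsigned overflow wraps by design
        if (accumulator < previous) {  // wrapped, so one step has elapsed
            position += forward ? 1 : -1;
            return true;
        }
        return false;
    }
};
```

With this layout the rate resolution is baseHz / 2^32 steps per second, and no floating-point library calls happen inside `tick()`.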