boards/raspberrypi-4b: Implement SMP support #18861
|
Thank you @linguini1, I will take a look and test on hardware in a few hours :-) I also found and ordered an ESP32-P4-Nano board for ~25 EUR, so we will be able to test this hw too :-) |
|
Awesome news! Thank you! |
|
Okay here goes the testing :-) All seems to work well :-) Great work @linguini1 =) We may want to put some more benchmarks into this config to see what the performance gain with SMP is? :-) I need to get some sleep; tomorrow I will try to read more about this |
|
In this approach looks like all cores will execute NuttX in parallel?
for (uint8_t cpu = 0; cpu < CONFIG_SMP_NCPUS; cpu++)
  {
    putreg64((uint64_t)_start, BCM_SPINTBL_CPU(cpu));
  }
Shouldn't CPU0 launch the OS/RTOS and other CPUs stay idle and wait for tasking? :-) reading more.. will update here with more details :-) |
|
Pretty much what happens @cederom. However, CPU0 does all the initial boot, and the remaining 3 cores then run in parallel after the early init function is called. This is why we see the garbled console output right at the beginning of the console launch. From reading the code, my understanding is that the remaining 3 CPUs will hit a point in the init process where, based on their core number, they run the idle thread instead of the actual OS. Only CPU0 continues into the application. But this is why I'm wondering if it's better to make the other cores jump to the idle thread starting point instead, or where exactly they should jump to when booting (as opposed to |
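The core-number check described in this comment can be sketched in C. This is only a host-side illustration under stated assumptions, not the NuttX arm64 code: `core_id_from_mpidr`, `boot_role_for`, and `SKETCH_NCPUS` are hypothetical names, and the sketch assumes the core number sits in the Aff0 field (bits [7:0]) of MPIDR_EL1, as it does on the Cortex-A72 cores of the BCM2711.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NCPUS 4u /* Hypothetical stand-in for CONFIG_SMP_NCPUS */

enum boot_role
{
  BOOT_PRIMARY,        /* CPU0: continues into OS and application start */
  BOOT_SECONDARY_IDLE  /* CPU1..3: end up running their idle thread */
};

/* Hypothetical helper, not a NuttX API: extract the core number from an
 * MPIDR_EL1 value. Assumes the core number is the Aff0 field, bits [7:0].
 */

static inline unsigned int core_id_from_mpidr(uint64_t mpidr)
{
  return (unsigned int)(mpidr & 0xffu);
}

/* Decide what a core does once it reaches the common entry point. */

static inline enum boot_role boot_role_for(uint64_t mpidr)
{
  unsigned int cpu = core_id_from_mpidr(mpidr);

  assert(cpu < SKETCH_NCPUS);
  return cpu == 0 ? BOOT_PRIMARY : BOOT_SECONDARY_IDLE;
}
```

On real hardware the MPIDR_EL1 value would come from an `mrs` instruction early in the entry code; here it is passed in as a plain integer so the branching logic can be checked on a host.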
|
Never implemented SMP on NuttX so we both learn here :D Maybe What I found in the docs:
This wiki document https://cwiki.apache.org/confluence/display/NUTTX/SMP mentions SMP implementation pretty well:
Let's take a look at other architectures implementation :-)
Hmm, looks like SMP-related code is somewhere else, let's take a look at arch/arm64/src/common: There it is, let's take a look :-) nuttx/arch/arm64/src/common/Make.defs Lines 98 to 101 in b17e448 If we look at nuttx/arch/arm64/src/common/arm64_cpustart.c Lines 161 to 239 in b17e448 and search who is using
Okay so long story short it seems that SMP is done somewhere around If that does not work out-of-the-box then probably we need some custom pieces in Also Does that make sense? :-P |
|
Makes sense! I did try just letting the existing arm64 logic handle it, but it didn't work properly. I'll take a look at the rest of the implementations you linked and try to follow those instead, with the scheduling profiler enabled to see what's going on. |
hartmannathan
left a comment
Thanks for working on this. I can't wait to try it!
|
Some hints also here https://cwiki.apache.org/confluence/display/NUTTX/NuttX+Initialization+Sequence :-) |
That's funny, I don't see that in the Documentation in the repository. Was this one never migrated from the CWIKI to Documentation? (It also contains a little bit of obsolete info, like board_app_initialize() which was recently replaced with board_late_initialize().) |
|
Yeah we need to just copy paste the missing stuff and then update the whole thing.. in a "free moment".. but you know.. new board.. some project.. etc etc.. there are not many "free moments" when you get old :-P |
|
Okay after reading through the examples, it seems the arm64 architecture makes the assumption that all arm64 chips can have their cores started the same way (done via My solution at the moment is just to add the spintable modification step in
#ifdef CONFIG_ARCH_CHIP_BCM2711
/* According to the bcm2711.dtsi file in the Linux source tree [1], the
* CPUs on the BCM2711 use a spin-table enable method and poll the
* following addresses:
*
* CPU0: 0x000000d8 (BCM_MBOX_CLR06)
* CPU1: 0x000000e0 (BCM_MBOX_CLR08)
* CPU2: 0x000000e8 (BCM_MBOX_CLR10)
* CPU3: 0x000000f0 (BCM_MBOX_CLR12)
*
* Some kernel docs about booting [2] have a handy explanation of how this
* works:
*
* "polling their cpu-release-addr location, which must be contained in the
* reserved region ... when a read of the location pointed to by the
* cpu-release-addr returns a non-zero value, the CPU must jump to this
* value" [2]
*
* In our case, we want these CPUs to load the NuttX kernel defined by
* `_start`. We don't need to worry about CPU0, that one always starts.
* These are 64-bit words (hence skipping every second register).
*
* [1] https://github.com/raspberrypi/linux/blob/rpi-6.12.y/
* arch/arm/boot/dts/broadcom/bcm2711.dtsi
*
* [2] https://www.kernel.org/doc/Documentation/arm64/booting.txt
*/
putreg64((uint64_t)__start, BCM_SPINTBL_CPU(cpu_num));
#endif
which should hopefully do all the CPU startup at the right time and it will use the |
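The spin-table handshake quoted in the code comment can be modelled on a host in C. This is a sketch under stated assumptions, not the BCM2711 driver code: `spintbl_addr` is a hypothetical stand-in for `BCM_SPINTBL_CPU()`, built only from the four addresses listed in the comment, and the `g_spintbl` array replaces the real memory-mapped release words so the logic can run without hardware.

```c
#include <assert.h>
#include <stdint.h>

#define SPINTBL_BASE   0xd8u /* CPU0 release address from the comment */
#define SPINTBL_STRIDE 8u    /* One 64-bit word per CPU */
#define SPINTBL_NCPUS  4u    /* Hypothetical stand-in for CONFIG_SMP_NCPUS */

/* Hypothetical equivalent of BCM_SPINTBL_CPU(): release address of a CPU. */

static inline uintptr_t spintbl_addr(unsigned int cpu)
{
  assert(cpu < SPINTBL_NCPUS);
  return (uintptr_t)SPINTBL_BASE + (uintptr_t)cpu * SPINTBL_STRIDE;
}

/* Host-side model of the table: one slot per CPU, zero means "keep waiting". */

static volatile uint64_t g_spintbl[SPINTBL_NCPUS];

/* Primary side: publish the entry point for one secondary CPU. */

static inline void release_cpu(unsigned int cpu, uint64_t entry)
{
  assert(cpu < SPINTBL_NCPUS);
  g_spintbl[cpu] = entry;
}

/* Secondary side: a single poll of the slot. Returns the address to jump
 * to, or 0 while the CPU is still parked. On hardware this read sits in a
 * loop, typically with a wait-for-event instruction between polls.
 */

static inline uint64_t poll_release(unsigned int cpu)
{
  assert(cpu < SPINTBL_NCPUS);
  return g_spintbl[cpu];
}
```

For example, `spintbl_addr(1)` yields `0xe0`, matching the CPU1 entry in the comment, and a CPU whose slot has not been written keeps reading zero from `poll_release()` until `release_cpu()` stores a non-zero entry address for it.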
|
So for some reason, the approach of putting the spin table initialization in the |
9eaff8b to bbfd632
|
After further experimenting, it appears that the spin table has to be dealt with early in the boot process or the CPUs do not start properly. I've put the logic in a |
|
Need to fix the CI error: |
Adds SMP support and an `smp` configuration for the RPi4B. Signed-off-by: Matteo Golin <matteo.golin@gmail.com>
Documents the new SMP configuration. Signed-off-by: Matteo Golin <matteo.golin@gmail.com>
Summary
Implements SMP support for the Raspberry Pi 4B on all four cores using the SMP
spin tables on the BCM2711.
Impact
Users can leverage all four cores on NuttX now! 4x more powerful.
Closes #16954.
Part of GSoC #18507!
Testing
SMP test: