Skip to content

fix(rgbled): double-buffer WS2812 strip to eliminate races#7304

Merged
pfeerick merged 2 commits intomainfrom
ws2812-double-buffering
Apr 29, 2026
Merged

fix(rgbled): double-buffer WS2812 strip to eliminate races#7304
pfeerick merged 2 commits intomainfrom
ws2812-double-buffering

Conversation

@raphaelcoeffic
Copy link
Copy Markdown
Member

@raphaelcoeffic raphaelcoeffic commented Apr 19, 2026

The WS2812 LED strip was driven from two contexts without any serialisation: the FreeRTOS refresh timer (50ms) and Lua's applyRGBLedColors. Both called ws2812_update directly, racing on the DMA arming sequence, and the DMA ISR read the same _led_colors buffer that callers were mid-writing — producing torn pixels and partial-frame flicker.

Switch to a lock-free single-driver design:

  • Board (rgb_leds.cpp) now owns a back buffer (writes) and a front buffer (DMA). Setters touch only the back buffer; the timer task is the sole arm point for ws2812_update at runtime.
  • A seqlock-style counter (_commit_seq) publishes batches: setters mark the LSB to flag mid-batch; rgbLedColorApply rounds up to the next even value to publish; the timer copies back→front only when the seq is even, differs from the last consumed value, and DMA is idle. Compiler barriers prevent reordering of plain back-buffer writes around the volatile seq updates.
  • Pre-OS callers (charging UI) bypass the timer: rgbLedColorApply detects scheduler_is_running() == false and flushes synchronously.

To make the board the sole entry point, the no-buf ws2812_set_color / get_color / get_state are replaced by buffer-parameterised variants; all direct callers in horus/taranis/pl18/pa01/rm-h750/jumper-h750 boards and the generic_stm32 fsLedRGB now go through rgbSetLedColor. fsLedRGB and pa01's status-LED helpers gain explicit rgbLedColorApply calls to preserve their fire-and-forget semantics.

pa01's separate _led_charge_colors buffer and ad-hoc ws2812_init are removed; rgbLedHwInit is exposed so the charging UI can bring up the WS2812 hardware before the scheduler is running while still sharing the rgb_leds.cpp front buffer.

V12 / V14 / V16 6POS LED fix

On radios that define SIXPOS_SWITCH_INDEX, getSixPosAnalogValue() ran in the mixer task and wrote directly to the WS2812 back buffer on every switch change. That made the mixer a second writer racing with Lua's setRGBLedColor / applyRGBLedColors batches — the seqlock above only serialises publication, not concurrent back-buffer writes.

The function is split:

  • The sticky position tracker stays in the ADC hot path (a single volatile uint8_t write, read elsewhere as an atomic byte).
  • The LED paint moves into a new weak rgbLedOnUpdate() hook that the rgb_leds.cpp refresh timer invokes before publishing. The 6POS LEDs (indices 0..5) now have the LED timer task as their sole writer, while Lua addresses map through BLING_LED_STRIP_START=6 onto the remaining range — so the two writable ranges are disjoint and cannot race.

The WS2812 LED strip was driven from two contexts without any
serialisation: the FreeRTOS refresh timer (50ms) and Lua's
applyRGBLedColors. Both called ws2812_update directly, racing on the
DMA arming sequence, and the DMA ISR read the same _led_colors buffer
that callers were mid-writing — producing torn pixels and partial-frame
flicker.

Switch to a lock-free single-driver design:

- Board (rgb_leds.cpp) now owns a back buffer (writes) and a front
  buffer (DMA). Setters touch only the back buffer; the timer task is
  the sole arm point for ws2812_update at runtime.
- A seqlock-style counter (_commit_seq) publishes batches: setters
  mark the LSB to flag mid-batch; rgbLedColorApply rounds up to the
  next even value to publish; the timer copies back→front only when
  the seq is even, differs from the last consumed value, and DMA is
  idle. Compiler barriers prevent reordering of plain back-buffer
  writes around the volatile seq updates.
- Pre-OS callers (charging UI) bypass the timer: rgbLedColorApply
  detects scheduler_is_running() == false and flushes synchronously.

To make the board the sole entry point, the no-buf ws2812_set_color /
get_color / get_state are replaced by buffer-parameterised variants;
all direct callers in horus/taranis/pl18/pa01/rm-h750/jumper-h750
boards and the generic_stm32 fsLedRGB now go through rgbSetLedColor.
fsLedRGB and pa01's status-LED helpers gain explicit rgbLedColorApply
calls to preserve their fire-and-forget semantics.

pa01's separate _led_charge_colors buffer and ad-hoc ws2812_init are
removed; rgbLedHwInit is exposed so the charging UI can bring up the
WS2812 hardware before the scheduler is running while still sharing
the rgb_leds.cpp front buffer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@philmoz
Copy link
Copy Markdown
Collaborator

philmoz commented Apr 19, 2026

This may still have issues on V14 & V16 radios which define SIXPOS_SWITCH_INDEX and also have bling LED's.

The LED's for the 6POS switch are set in the getSixPosAnalogValue function which I think gets called from the mixer.

For scripts that use a two stage update pattern (turn off all, turn on those needed, call applyRGBLedColors) it can still cause LED flickering if the mixer interrupts at the wrong time,

getSixPosAnalogValue() ran in the mixer task and wrote directly to the
WS2812 back buffer. On V12/V14/V16 that made the mixer a second writer
racing with Lua's setRGBLedColor / applyRGBLedColors batches — the
double-buffer seqlock only serialises publication, not concurrent
back-buffer writes.

Split the function: keep the sticky position tracker in the ADC hot
path (a single volatile uint8_t write, read by other tasks as an
atomic byte), and move the LED paint into a new weak rgbLedOnUpdate()
hook called by the rgb_leds.cpp refresh timer. The 6POS LEDs (indices
0..5) now have the LED timer task as their sole writer, while Lua
addresses map through BLING_LED_STRIP_START=6 onto the remaining
range — so the two writable ranges are disjoint and cannot race.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@raphaelcoeffic
Copy link
Copy Markdown
Member Author

raphaelcoeffic commented Apr 20, 2026

The LED's for the 6POS switch are set in the getSixPosAnalogValue function which I think gets called from the mixer.

Right, the mixer has no business touching WS2812 buffers. 793e0de should solve this, though I cannot test if it even works since I don't have any V14 or V16.

@raphaelcoeffic
Copy link
Copy Markdown
Member Author

raphaelcoeffic commented Apr 20, 2026

Follow-up idea (not part of this PR): the sticky 6POS state update could move out of getAnalogValue entirely and into the board-level adc_wait_completion callback, matching the pattern #7263 established for the Flysky gimbal sync.

The tracker would rewrite adcValues[SIXPOS_SWITCH_INDEX] in place with the scaled value once per ADC conversion, and getAnalogValue would drop its #if defined(SIXPOS_SWITCH_INDEX) branch and become a plain array read.

This ties the state update to actual ADC events instead of to arbitrary callers (cli, diaganas, etc. currently re-trigger the tracker on every read). Keeping it out of this PR since it's orthogonal to the race fix and touches the public ADC read path.

See #7309

@gismo2004
Copy link
Copy Markdown
Contributor

gismo2004 commented Apr 20, 2026

@raphaelcoeffic i have tested this on my V16 and it seems to work as before! However, I have something that is reproducible. If i press e.g. button 4 multiple times (the same using 6), it sometimes happens that it is showing 1 as active. This seems to be a bad reading from ADC? Maybe because its measuring the "transition" while a cap is loaded somewhere?
Video

edit: sometimes it is also 2 instead of 1 but it is always lower than the button pressed (it is never 6 when pressing 3)
edit-edit: FYI: the artifacts link below leads to a 404 (link)

@3djc 3djc added the backport/2.12 To be backported to a 2.12 release also. label Apr 20, 2026
@gismo2004
Copy link
Copy Markdown
Contributor

With #7309 on top, I am not able to reproduce the issue from above... looks good!

@raphaelcoeffic
Copy link
Copy Markdown
Member Author

With #7309 on top, I am not able to reproduce the issue from above... looks good!

Ok, so we need both PRs cherry-picked to 2.12.

@pfeerick pfeerick added this to the 2.12.1 milestone Apr 28, 2026
@pfeerick pfeerick added the bug 🪲 Something isn't working label Apr 28, 2026
@helloradiosky
Copy link
Copy Markdown
Contributor

The LED's for the 6POS switch are set in the getSixPosAnalogValue function which I think gets called from the mixer.

Right, the mixer has no business touching WS2812 buffers. 793e0de should solve this, though I cannot test if it even works since I don't have any V14 or V16.

@raphaelcoeffic :Thank you so much for helping me solve these problems with this device. Could you please send me your address so I can send you some samples for testing? info@helloradiosky.com

@helloradiosky
Copy link
Copy Markdown
Contributor

@raphaelcoeffic i have tested this on my V16 and it seems to work as before! However, I have something that is reproducible. If i press e.g. button 4 multiple times (the same using 6), it sometimes happens that it is showing 1 as active. This seems to be a bad reading from ADC? Maybe because its measuring the "transition" while a cap is loaded somewhere? Video

edit: sometimes it is also 2 instead of 1 but it is always lower than the button pressed (it is never 6 when pressing 3) edit-edit: FYI: the artifacts link below leads to a 404 (link)

I've also encountered this issue. It requires using getSixPosAnalogValue. Adding a little debouncing when reading the value can stabilize it. The key is that the values ​​read on the first and second reads should be similar to indicate a correct button press; otherwise, it's very easy to make mistakes.

@pfeerick
Copy link
Copy Markdown
Member

This seems to be working ok on V16 and TX16SMK3 was fine.

@pfeerick pfeerick merged commit ac4c4d7 into main Apr 29, 2026
37 checks passed
@pfeerick pfeerick deleted the ws2812-double-buffering branch April 29, 2026 10:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport/2.12 To be backported to a 2.12 release also. bug 🪲 Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants