Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Idle core0 on esp32 platform #2021

Merged
merged 3 commits into from Mar 2, 2023

Conversation

CapnBry
Copy link
Member

@CapnBry CapnBry commented Dec 28, 2022

Adds a throttle to the main loop() code on ESP32 platform (both TX and RX) to allow core0 to idle and reduce power consumption by roughly 20mA at the 3.3V rail. Did anyone miss my wordy comprehensive PRs?

Details

ExpressLRS runs all cores on all platforms balls-to-the-wall (except when in wifi mode). This prevents the ESP32 from enabling any power save functionality despite the fact that we don't need to run all our tasks at 100000Hz. Each core's power can be reduced by up to 25mA (50mA total) by simply allowing it to idle. This PR reduces the update frequency of the core0 tasks down to a target of either 500Hz or 1000Hz depending on the packet rate, and makes use of the FreeRTOS function xTaskDelayUntil() to idle the remainder of loop duration. Compare this to simply calling delay() every loop, which will always delay and possibly greatly reduce the loop frequency under periods of high load-- delayUntil will not delay if the scheduler is already behind.

I've been experimenting with various power save methods for half the year and this seems to meet my requirements for enough savings with what appears to be absolutely no discernable impact to operation. I've tried downclocking, periodic downclocking, straight delays, single-core task aggregation, undervolting, and even a more complicated timeslice scheduler but all of those had disadvantages. I can't find any subsystem that is degraded by the chosen solution.

A 1000Hz loop rate achieves over 90% of the power savings of a 500Hz loop rate, and slower rates didn't appear to show any more power decrease (tested down to ~91Hz / 11ms delay). Therefore, I chose to use a 1000Hz loop for all 1000Hz modes (F1000, D500, D250) and a 500Hz loop for all other packet rates.

Results

Every ESP32 target I tested showed measurable reduction in power usage. In receivers, there is approximately 20-23mA current reduction (250Hz 10mW) that lowers the load on the LDO, reducing heat generation, and the ESP32 chip itself also runs cooler. The transmitter module current savings are roughly the same, but the amount is less noticeable in the overall handset power usage where 20mA @ 3.3V is only maybe 8mA @ 8V. I found it easiest to test both RX and TX by tying the delay to a switch to be able to turn it on and off (rxtx_common.cpp).

#include "devCRSF.h"
void throttleMainLoop()
{
    if (CRSF_to_BIT(CRSF::ChannelData[AUX1])) // disable if AUX1 high
        return;
...
  • Reduces the power of a True Diversity RX ~20mA from 592mW to 486mW a 21% power reduction (100Hz 10mW Std). Flipping the delay on and off goes from 80-110mA to 100-130mA. We have a weird sinusoidal power draw on the RX that slowly goes up and down over a several minute period which I do not understand but is pre-existing. shrug emoji
  • RadioMaster Zorro total system power consumption reduced by ~11mA, a roughly 4% gain in overall battery life (250Hz 10mW Std). This pushed my modded Zorro with 1400mAh 18350s from 5h32m to 5h55m. That's 190mA @ 8.0V, and F500 is just 139mA @ 8.0V.
  • HappyModel ES900TX reduction of ~20mA @4.8V from 1.824W to 1.726W (200Hz 10mW Std)
  • Jumper T-Lite total system draw reduced from 203mA to 187mA @ 4.0V (250Hz 10mW Std), almost 8% system power decrease. How do they run a whole handset and ExpressLRS TX on 0.748W?! Some magic power BS in that shell.

Tested

  • RadioMaster TX16S internal TX (ESP32)
  • RadioMaster Zorro internal TX (ESP32)
  • Jumper T-Lite internal TX (ESP32)
  • HappyModel ES900TX TX (ESP32)
  • HappyModel EP2 RX (ESP8285)
  • RadioMaster ER5A PWM RX (ESP8285)
  • FrSky R9MM (or is it the R9 mini?) RX (STM32)
  • Undisclosed ESP32 True Diversity RX (ESP32)
  • Undisclosed ESP32 Vario RX (ESP32)
  • Over 25 hours of flight time with PWM fixed wing and Race quads

Future Work

At some point we need to get core1 away from while (1) if (Serial.available()) process(Serial.read()) with a DMA-based solution with an interruptible sleep. This can achieve similar power savings as in the testing I did with a bare ESP32-PICO-D4 dev board, each core that runs at 100% utilization draws roughly 25mA extra power over one that loops at 1000Hz.

@CapnBry
Copy link
Member Author

CapnBry commented Dec 28, 2022

I also tested the Lua to make sure adding a delay didn't have some weird effect on it. Code for profiling:

diff --git a/src/lua/elrsV3.lua b/src/lua/elrsV3.lua
index 649a3e04..343b461f 100644
--- a/src/lua/elrsV3.lua
+++ b/src/lua/elrsV3.lua
@@ -571,11 +571,38 @@ local function refreshNext()
     linkstatTimeout = time + 100
   elseif time > fieldTimeout and fields_count ~= 0 then
     if #loadQ > 0 then
+      -- init stats
+      if lstats == nil then
+        lstats = {}
+        lstats.start = time
+        lstats.cnt = 0
+        lstats.tot = 0
+        lstats.currstart = 0
+        lstats.timeout = 0
+      end
+
+      -- collect stats
+      if fieldTimeout ~= 0 then -- timeout or reload field
+        lstats.timeout = lstats.timeout + 1
+      elseif lstats.currstart ~= 0 then
+        local currtime = time - lstats.currstart
+        lstats.cnt = lstats.cnt + 1
+        lstats.tot = lstats.tot + currtime
+      end
+
+      lstats.currstart = time
       crossfireTelemetryPush(0x2C, { deviceId, handsetId, loadQ[#loadQ], fieldChunk })
       fieldTimeout = time + 50 -- 0.5s
     end
   end
 
+  -- report stats
+  if #loadQ == 0 and lstats ~= nil then
+    print(string.format("LoadStats wall=%d cnt=%d tot=%d (%.1fea) to=%d",
+      time - lstats.start, lstats.cnt, lstats.tot, lstats.tot / lstats.cnt, lstats.timeout))
+    lstats = nil
+  end
+
   if time > titleShowWarnTimeout then
     -- if elrsFlags bit set is bit higher than bit 0 and bit 1, it is warning flags
     titleShowWarn = (elrsFlags > 3 and not titleShowWarn) or nil

The value here I was checking was tot (total load time in 10ms ticks) and ea (load time per chunk). It seems to be about 138ms to 149ms per chunk or 4.5-4.6s for a full load, except for 1000Hz which is always slower. I don't see any difference in the before and after load times. No extra timeouts (to) were observed.

** Before ** 3.75Mbps
333Hz
LoadStats cnt=33 tot=458 (13.9ea) to=0
250Hz
LoadStats cnt=33 tot=456 (13.8ea) to=0
150Hz
LoadStats cnt=33 tot=451 (13.7ea) to=0
100Hz
LoadStats cnt=33 tot=462 (14.0ea) to=0
50Hz
LoadStats cnt=33 tot=454 (13.8ea) to=0

** Before ** 920kbit
100Hz
LoadStats cnt=33 tot=462 (14.0ea) to=0
50Hz
LoadStats cnt=33 tot=465 (14.1ea) to=0
1000Hz
LoadStats cnt=51 tot=747 (14.6ea) to=0

** After ** 3.75Mbit
1000Hz
LoadStats cnt=33 tot=493 (14.9ea) to=2
LoadStats cnt=33 tot=519 (15.7ea) to=4
333Hz
LoadStats cnt=33 tot=465 (14.1ea) to=0
250Hz
LoadStats cnt=33 tot=453 (13.8ea) to=0
100Hz
LoadStats cnt=33 tot=458 (13.9ea) to=0

** After ** 920kbit
250Hz 
LoadStats cnt=33 tot=462 (14.0ea) to=0
1000Hz
LoadStats cnt=51 tot=746 (14.6ea) to=0

@CapnBry CapnBry changed the base branch from 3.x.x-maintenance to master December 31, 2022 14:46
@aowi7280
Copy link

Glad to see someone doing this. My QX7 eats batteries fast since changing over from R9M. Now have a Happymodel es900tx

@CapnBry
Copy link
Member Author

CapnBry commented Jan 19, 2023

nonchalantly clicks the v3.3 tag

@wvarty
Copy link
Collaborator

wvarty commented Mar 2, 2023

Will this potentially slow down / delay things that need to run in loop()? E.g. UART read / writes, etc?

@pkendall64
Copy link
Collaborator

Will this potentially slow down / delay things that need to run in loop()? E.g. UART read / writes, etc?

No, core 1 is the loop core which runs all the essential devices and core 0 is used for all the ancillary devices.

@wvarty
Copy link
Collaborator

wvarty commented Mar 2, 2023

Will this potentially slow down / delay things that need to run in loop()? E.g. UART read / writes, etc?

No, core 1 is the loop core which runs all the essential devices and core 0 is used for all the ancillary devices.

So there are no impacts at all to any code that is in loop() right now, or in future, in terms of delays to execution?
My main concern is that someone adds something to the main loop expecting it to fire off near immediately (ignoring ISR execution times), and it doesn't. You're saying that shouldn't be affected?

@pkendall64
Copy link
Collaborator

So there are no impacts at all to any code that is in loop() right now, or in future, in terms of delays to execution? My main concern is that someone adds something to the main loop expecting it to fire off near immediately (ignoring ISR execution times), and it doesn't. You're saying that shouldn't be affected?

Correct, loop is not affected by the PR. loop runs full steam ahead all the time.

@pkendall64 pkendall64 merged commit 8c7693b into ExpressLRS:master Mar 2, 2023
@CapnBry CapnBry deleted the esp32-core0-throttle branch June 22, 2023 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants