Add main loop load average #4431

Merged
merged 2 commits into from
Nov 24, 2018

Conversation

andrethomas
Contributor

Add main loop average duty cycle, measured against the SetOption36 value, to telemetry data as LoadAvg

andrethomas and others added 2 commits November 24, 2018 18:12
Add main loop average duty cycle measured against setoption36 value to telemetry data as LoadAvg
@arendst arendst merged commit b061f8d into arendst:development Nov 24, 2018
@andrethomas andrethomas deleted the patch-1 branch November 24, 2018 16:17
arendst added a commit that referenced this pull request Nov 24, 2018
Add CPU average load to state message (#4431)
@synekvl
Contributor

synekvl commented Nov 25, 2018

I assumed the value would be a percentage, yet on some devices I get LoadAvg above 800 (e.g. 849, 851, 829). This happens with a Wemos D1, a Wemos D1 mini and a Sonoff Touch. The others are below 100.

@andrethomas
Contributor Author

Hi, no, it is not a % value out of 100. It is an indicative load on the main loop, measured against the target loop delay in milliseconds set using SetOption36, so it can be higher than 100. Ideally, to give the SDK enough time for WiFi functions etc., you should adjust SetOption36's value to achieve a LoadAvg of less than 100 over an extended period of time, for example over a teleperiod of 10 seconds.

You will find that the same setting (say 50, which is the default) is sufficient for most devices, but when you start adding a lot of sensors and actively using the web UI you will definitely see LoadAvg values above 100. Either this is OK with you from a power management and WiFi stability perspective, or you have the option to deprioritise driver polling by increasing the SetOption36 value.
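As a rough illustration of that trade-off, the sketch below estimates the smallest SetOption36 value that would bring LoadAvg under 100. It rests on two assumptions that are not confirmed by the firmware source: that LoadAvg is the average loop duration expressed as a percentage of the SetOption36 target, and that the loop duration itself would not change when the target changes. The function name `suggested_target_ms` is hypothetical.

```python
def suggested_target_ms(load_avg: int, target_ms: int = 50) -> int:
    """Smallest whole-millisecond target that would put LoadAvg below 100.

    Assumption: LoadAvg is the loop duration as a percentage of the current
    SetOption36 value, and the loop duration does not depend on the target.
    """
    if load_avg < 100:
        return target_ms  # already under 100, nothing to change
    # load_avg * target_ms is the implied loop duration in units of 0.01 ms.
    return load_avg * target_ms // 100 + 1


print(suggested_target_ms(130))  # loop of about 65 ms, so a target of 66
print(suggested_target_ms(78))   # already below 100, target stays at 50
```

Treat the result as a starting point for experimentation only; real devices will not follow this model exactly.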

@Jason2866 also observed last night on Discord chat that not all ESP chips are made equal. For example, he found that the load average is higher on a Sonoff Basic R2 than on a Wemos D1 mini, and I also have a Sonoff R1 which behaves basically the same as my NodeMCU v2 boards.

It is still in open development but I'll wiki this feature in more detail when Theo makes a release.

@synekvl
Contributor

synekvl commented Nov 25, 2018

@andrethomas Thank you for your explanation. Frankly, this figure is difficult for me to understand. In my case, two of the devices with a value of 800+ have sensors attached: the D1 mini has a BME280 and the Wemos D1 has an Si7021. The third one above 800, a Sonoff Touch (1T), has nothing attached and shows LoadAvg=820...

So those are my values. I will try to understand the meaning of SetOption36, which I have not changed yet.

Cheers
Vladimir

@andrethomas
Contributor Author

andrethomas commented Nov 25, 2018

With SetOption36 at 50, a LoadAvg of 820 implies that the main processing loop is taking approximately 410 ms to complete, which is much longer than expected. It means some iteration of a driver is consuming time in the main loop; this could indicate a malfunction, or perhaps one of the drivers in use requires optimisation. For comparison, most Sonoff Basics have a LoadAvg of around 40 with SetOption36 = 50. So although it is not a direct indication that something is wrong, it does suggest that something is not happening as fast as would normally be expected.
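The arithmetic behind that estimate can be written out as a short sketch. It assumes LoadAvg is read as the average loop duration expressed as a percentage of the SetOption36 target, which is how the figure is interpreted in this comment; `estimated_loop_ms` is a hypothetical helper, not part of the firmware.

```python
def estimated_loop_ms(load_avg: float, target_ms: float = 50) -> float:
    """Estimate the average main-loop duration implied by a LoadAvg reading.

    Assumption: LoadAvg is the loop duration as a percentage of the
    SetOption36 target (50 ms by default).
    """
    return load_avg * target_ms / 100


print(estimated_loop_ms(820))  # 410.0 ms, the case discussed here
print(estimated_loop_ms(40))   # 20.0 ms, a typical Sonoff Basic reading
```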

It is not clear from the datasheet how the ESP chip responds to heat build-up inside a closed housing with insufficient cooling, in particular whether or not it automatically scales down the clock speed.

The general observation we've made is that if the LoadAvg is way in excess of 100 you may experience secondary problems such as MQTT reconnects and/or WiFi reconnects. SetOption36 therefore allows you to adjust the target loop speed depending on what the device itself is used for, while ensuring that the SDK core has sufficient CPU time to process WiFi and related network processes.

@andrethomas
Contributor Author

Maybe this will clarify it better.

I flashed a NodeMCU V2 board with http://thehackbox.org/tasmota/sonoff.bin, configured only the WiFi and MQTT settings, and set the device as Generic module.

SetOption36 = 50 (The default)
Before configuring I2C, I get a 6 sample load average of 38
After just selecting I2C pins in GPIO (No device connected yet), I get a 6 sample load average of 48
Connecting an LM75AD temperature sensor to the I2C pins does not change - stays at 48
Now for the interesting part - I navigate to the main web ui page where it displays the temperature reading on a 2-second interval (iirc)
For this, I now see a 6 sample load average of 78 :)

So this does compute: the more work the device is doing, the higher the load average will be, because it spends more time per loop cycle. Because the indicated load average is still below 100 (if you want to view it as a % of time spent compared to the SetOption36 value, even though this is not the intended purpose), my main loop is on average still completing well under the targeted 50 ms, which leaves a fair margin for the SDK to do its background work.

For reference, when I say 6 sample average I mean that I set the teleperiod = 10 seconds, and took the 6 values provided over that 1 minute period and averaged it out to get the above 6 sample averages.
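For anyone wanting to repeat the measurement, the averaging step could be sketched as below. Only the `LoadAvg` field name comes from this pull request; the payload shape and the sample values are hypothetical stand-ins for six telemetry messages collected over one minute.

```python
import json
from statistics import mean


def average_load(payloads):
    """Average the LoadAvg field over a list of JSON telemetry payloads."""
    return mean([json.loads(p)["LoadAvg"] for p in payloads])


# Hypothetical payloads: six messages at teleperiod = 10 s cover one minute.
samples = ['{"LoadAvg": %d}' % value for value in (36, 40, 38, 37, 39, 38)]
print(average_load(samples))
```

With real devices the payloads would come from the telemetry messages themselves, which carry other fields that this sketch simply ignores.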

@andrethomas
Contributor Author

@synekvl Can you confirm that sleep = 0 on the devices with high load averages? Having sleep > 0 makes the main loop much longer/slower, so you will get a higher LoadAvg for the main loop, as it is time based.

@synekvl
Contributor

synekvl commented Nov 26, 2018

@andrethomas Hi, I was quite busy today, so I could only check that parameter just now. And ta-da! You were right: all three devices had the Sleep parameter set to 20. After I set it to 0, LoadAvg on the D1 mini and the Wemos D1 dropped below 50, and on the Sonoff Touch T1 even to 27...

So your idea was 100% correct. Thank you for your advice and for working out how it behaves and how the parameters are connected.

Cheers
Vladimir

@andrethomas
Contributor Author

@synekvl You're welcome. We made some more changes so that it behaves more like a % indicator now.

With SetOption39 set there is no need for the sleep setting anymore, so you can update to the new firmware from the dev binaries to get a more %-like result.

See wiki here: https://github.com/arendst/Sonoff-Tasmota/wiki/SetOption36

gemu2015 pushed a commit to gemu2015/Sonoff-Tasmota that referenced this pull request Jan 27, 2019
Add CPU average load to state message (arendst#4431)