-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Commander: cpuResourcesCheck: add 2 seconds hysteresis to trigger failure #22304
Conversation
src/modules/commander/HealthAndArmingChecks/checks/cpuResourceCheck.cpp
Outdated
Show resolved
Hide resolved
Alternatively you could add a load average to |
so e.g. putting a filter in I don't exactly remember the thought process that brought us to hysteresis over a filter before - but in general I don't have strong opinions on this one. Just as long as we're targeting the particular behavior we care about - i.e. that momentary spikes shouldn't be flagged, but consistent high load should. Hysteresis seems the most direct way to do that, as it requires the load to be constantly over the threshold for x time. But if you would also want to flag "mostly" over the thresshold for some time (e.g. with some couple down spikes over that hysteresis period) then a filter would be the way to go. If using the filter, then for sure would want it logged, as I suppose we wont expose the filter gains for reconstruction post processing. |
I'd vote for the filter because of the exact case where you could have constant 100% CPU load with some short downward spikes in between and that would pass the check. |
src/modules/commander/HealthAndArmingChecks/checks/cpuResourceCheck.hpp
Outdated
Show resolved
Hide resolved
src/modules/commander/HealthAndArmingChecks/checks/cpuResourceCheck.cpp
Outdated
Show resolved
Hide resolved
bdec3f6
to
d5b42d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with some short downward spikes in between and that would pass the check
I covered the case where the CPU is high for 2 seconds and then has downwards spikes by making the hysteresis apply both ways.
d5b42d4
to
a7c4e6f
Compare
so this works if the true state was already reached. but not if we have e.g. high (1 sec), single low. high (1 sec). Probably we'd still want to flag this 2 second portion, but in the current case it will reset the timer after the single low. if we're worried about this being the false negative report.. then probably as @MaEtUgR said the filter is the better option. |
Thanks for the clean up @MaEtUgR. To me this symmetrical 2s hysteresis seems to be a good compromise. For further refinement of the check I would try to pull in the learnings of real-world failure cases. |
src/modules/commander/HealthAndArmingChecks/checks/cpuResourceCheck.cpp
Outdated
Show resolved
Hide resolved
…lure Signed-off-by: Silvan Fuhrer <silvan@auterion.com>
Signed-off-by: Silvan Fuhrer <silvan@auterion.com>
a7c4e6f
to
0713847
Compare
Solved Problem
Momentary CPU spikes are not an issue and should not prevent arming or alarm the operator.
Solution
This PR adds a 2 second hysteresis to the high CPU load preflight check.
Changelog Entry
For release notes:
Alternatives
Have high_cpu_load_start_time_us_ as as class member instead of a static variable?
Testing
Tested on a PH4, with COM_CPU_MAX set to 60 and then doing some param changes. On main this then throws warnings, with this PR not anymore.