Disarm failure on HGLRCF722 #11344
Comments
IMHO, this is not, and should never be, expected behavior. A disarm should always be registered and actioned by the firmware. |
Betaflight requires a certain number of disarm packets to enter disarm (4 I think?), so because no new RC data was received, disarm did not happen? |
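To make the question above concrete, here is a minimal sketch of a packet-count debounce, assuming disarm only fires after a fixed number of consecutive RC frames with the arm switch off. The class and function names are hypothetical, and the count of 4 is only the guess from the comment above, not a confirmed Betaflight value.

```python
class DisarmDebounce:
    """Hypothetical model: disarm fires only after `required`
    consecutive RC frames that report the arm switch as off."""

    def __init__(self, required=4):  # 4 is the commenter's guess, not confirmed
        self.required = required
        self.count = 0

    def on_rc_frame(self, arm_switch_on):
        """Feed one freshly received RC frame; return True once disarm should fire."""
        if arm_switch_on:
            self.count = 0  # any 'armed' frame restarts the count
            return False
        self.count += 1
        return self.count >= self.required


def disarm_latency_s(frame_interval_s, required=4):
    """Time from flipping the switch until the required frame count is reached,
    assuming the first 'off' frame arrives one interval after the flip."""
    debounce = DisarmDebounce(required)
    elapsed = 0.0
    while True:
        elapsed += frame_interval_s
        if debounce.on_rc_frame(arm_switch_on=False):
            return elapsed
```

Under these assumptions, four frames at 150 Hz arrive in roughly 27 ms, while four frames at one frame every 300 ms take about 1.2 s, so a debounce like this stretches with whatever rate the flight controller actually sees.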
I believe there were packets sent, looking at the logs. The disarm registered on the graph I pasted in the bug description above, just, it didn't disarm at all. Not sure it's an RX thing. |
At the time of the disarm, throttle had very extended steps, meaning that the RC signals were not being seen to change by the FC. |
@pjpei - could you please confirm that your build would have included PR #11319 - you should see CLI parameter If it is, please confirm that the value is the default of 25, and maybe try some cautious test flights with that value reduced from default of 25 to a value of 1, and see if that fixes this problem? Could you also please post a link to your log (eg dropbox or google drive) and perhaps keep an eye on signal strength and link quality in dBm in the OSD? |
https://drive.google.com/file/d/1T_A4dRjF80UKHrA0iGKTbsNt6HbpLLZo/view?usp=sharing That's the blackbox log file of the very short flight. I doubt link quality is an issue as I was standing with a meter between the transmitter and receiver, which is why I could (stupidly) catch it before it flew away. |
It looks like you had a 150hz RC link which then almost totally failed some time soon after arming. Also logging wasn't started until after arming, and it is at 1:1 PID rate, so perhaps that, and the OSD task being initialised with the motors on arming, somehow overloaded the scheduler. In any case, from the moment you provided any input, the RC data was very much delayed, with no change in set point of any kind except in widely spaced steps - up to and longer than 400ms between any sign of data being received. It's not clear why this should have happened. Most likely the disarm failure was because there were not the required number of disarm packets received before hitting the ground and initiating runaway protection. The root cause is whatever caused the Rx link to fail. That is not obvious from the log. |
One possibility is too strong signal. Some receivers have problems with input stage saturation ... |
The RC link did not fail. I can reliably cause the RC link to stutter in the GUI of Betaflight Configurator by loading the CPU. |
Also, note that it took 1.1 seconds for the thing to actually stop the motors. The failsafe is set to 0.4s, which clearly also did not work. |
Also please note that the drone never hit the ground. I had to grab it from the air, as it was doing strange things while going up and down. With effort I flipped it over and then after a while it stopped by itself. |
What I meant by the RC link 'failing' is that the flight controller 'appeared' to only get RC data about every 300ms. Now the receiver may well have been receiving a 150hz signal, but either the link itself failed or the flight controller didn't run the RC task at the normal frequency. The end result was that the flight controller only saw a handful of control values during the 4s from arm to runaway prevention. The link was logged as starting at 150hz. Here I zoom in enough to show every single step change in 'actioned' RC Command. There was none of the usual variability at 150hz; just a handful of steps in the 3s flight. Functionally you had a 2-3hz RC control signal, not slow enough to cause failsafe, but so incredibly slow that it was not possible to control the quad. I don't know what the timeout is on the count for the three disarm packets, but most likely with the 'link' appearing to be so slow, the FC had difficulty getting them. Note that of the 4s flight time, throttle was only above zero for a total of 2.6ms until you held it. The PID system was working normally, but only had very intermittent RC steps to work with. The image shows the whole flight from arm to runaway triggering was a little under 4s, with throttle above zero for 2.6s. It went up, then it tightly followed a step in yaw lasting 300ms at about 110deg/s as you cut throttle. Then there was a small roll input. It seems to me that it was in level mode. The PID system was working perfectly normally all the time. So whatever the problem was, it affected handling of the RC data, nothing much else. |
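The "effective" RC rate described above can be estimated from a log by looking only at the moments the RC command actually changes. The following is a rough sketch under the assumption that the log has been exported as (time in ms, RC command) pairs; the function names and the sample data are made up for illustration and are not taken from the posted blackbox log.

```python
def step_intervals_ms(samples):
    """samples: list of (time_ms, rc_command) pairs in time order.
    Returns the gaps, in ms, between successive changes of rc_command."""
    change_times = []
    previous = None
    for time_ms, value in samples:
        if previous is not None and value != previous:
            change_times.append(time_ms)
        previous = value
    return [later - earlier for earlier, later in zip(change_times, change_times[1:])]


def effective_rate_hz(samples):
    """Average rate at which new RC values were seen, or 0.0 if fewer than two changes."""
    gaps = step_intervals_ms(samples)
    if not gaps:
        return 0.0
    mean_gap_ms = sum(gaps) / len(gaps)
    return 1000.0 / mean_gap_ms


# Synthetic example: logged every 1 ms, but the value only steps every 300 ms.
synthetic = [(t, t // 300) for t in range(0, 1500)]
print(effective_rate_hz(synthetic))  # about 3.3 Hz
```

A link that is nominally 150 Hz but only produces a step every 300 ms shows up here as roughly 3 Hz, which matches the "2-3hz" description rather than the logged link rate.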
That you held it in the air explains why the gyro responses are not exactly typical of a bouncy landing. |
Just a nitpick, you mean 2.6 seconds throttle above zero, not 2.6 ms? Ah, you corrected to 2.6s later. Apologies. |
Not sure if this is related, but it might give some clues:
"I've been experiencing delayed disarm on multiple models, and I'm now able to reproduce the issue consistently. I often hand catch the landing when flying LOS, and the delayed disarm makes my heart skip a beat when it occurs in this scenario, and in any other precision landing scenario as well. I first acknowledged this problem on v2.0. Once I could reproduce it, I tried updating to 2.0.1, and the problem persists. TX is HM Slim Pro with hardware fix. RX is HM EP2 confirmed, and very likely EP1, and PP too. I was using an OTX nightly for a long time, as it had been solid. Attempting to address this issue, I updated to OTX 2.3.14. I thought this fixed the issue, but at that time, I hadn't discovered what actually triggered the state in which the delayed disarm occurs. Indeed the issue persists in latest stable OTX. Radio is X-Lite S with Pro gimbals. To reproduce the issue, establish a connection between tx and aircraft, and then make one or more adjustments to the accelerometer calibration via stick inputs (see image.) Thereafter, disarm will be delayed by like .25-.5 seconds, which is an eternity in precision landing scenarios, and is at least mildly disturbing in any other landing/disarm scenario. I believe I may have experienced this with SPI rx too, but this is harder to reproduce, as acc. adjustment writes to eeprom (?) which currently causes an instant failsafe. Since the connection is unlikely to reestablish in any reasonable amount of time, or perhaps at all, (at least with my current settings,) power cycling the tx is required. I believe this power cycle may be clearing the delayed disarm state, wherever it may reside, since upon reconnection to the spi rx, disarm behavior is as expected." and also, when he posted the log he said: "I took off, then landed/disarmed, and made a single acc adjustment. I then took off and landed/disarmed 2-3 more times. Each time, the delayed disarm occurred.
While capturing the log data, I started noticing/experiencing what might be the same delay affecting all input after triggering the delayed disarm state. Maybe I only noticed at disarm initially, but now appears it may be a general delay of all inputs. I can't be sure, currently. Definitely noticed on disarm though, consistently." https://cdn.discordapp.com/attachments/797109686285107241/927818141436870696/btfl_all.bbl The comment "To reproduce the issue, establish a connection between tx and aircraft, and then make one or more adjustments to the accelerometer calibration via stick inputs (see image.) Thereafter, disarm will be delayed by like .25-.5 seconds" seems related. |
@pjpei have you tried with the master branch if you can reproduce it? |
@pjpei Its default value is 1 on master (on my Matek F411 with CRSF) |
@pjpei can you please try the image from #11340 (comment) |
@haslinghuis See above, I don't have that setting on the release I was on. @SteveCEvans I've looked at github history in an attempt to find the issue, and I'm afraid the non-hard-realtime scheduler is putting me off from trying anything related to the new Betaflight again. Having worked on firmware specifically and software generally for more than 2 decades, having a non-deterministic scheduler is a disaster waiting to happen as it starves processes randomly. I hope you get it right eventually, but I'm out. |
@pjpei Where do you want to go with your comments? First, if you don't have this setting, please try an OFFICIAL dev build from master before opening an issue. Second, @SteveCEvans' test proposal with this image is really important. |
@asizon I have reported a severe bug that caused me literal injury. You can reproduce it by loading the CPU heavily. It is unreasonable to expect me to risk an expensive drone for this, so I believe I'm done here, having given you the information you need. |
@pjpei But this bug happens to you using a non-official and non-latest Betaflight build.... |
The only thing I changed on my side was adding a new filter, which is cpu-intensive. Given that the core issue is a scheduler design flaw, I have no confidence in this build of betaflight, and I'm not the only person complaining with disarm issues. I'm not doing this, sorry. I hope you get it working right, but I doubt it. |
If you don't want to use the latest version with a potential fix we can't help you very well. You're using a custom build with outdated firmware... how are we supposed to know what's wrong or if it's fixed already? I tried overloading my CPU with the latest master but I cannot reproduce a flyaway. If you don't want to do it yourself - which is understandable after your unexpected injury - please at least show us your codebase so we can compile an up-to-date version of your code to reproduce the error more easily. Or please just rebase on today's master, include #11340 and provide us with the compiled hex files if you don't want to show your code. |
This is my file change list below the comment. I was previously willing to share, but given the adversity I've experienced here while reporting a critical bug, I'd rather not give my code. Refer to #11338 for someone else also experiencing this issue. I haven't verified the mathematical correctness of my filter yet, the code is in an indeterminate state. I added a setting to be able to choose the filter in "settings.c", added the filter code in "filter.c", added structures in "filter.h" and "gyro.h", then changed "gyro_init.c" to enable the new filter if it's selected in the CLI. The core thing is the filter is slow, so I need to drop the PID frequency, and the CPU stays on 72% use. I'm now disengaging here as it feels like I'm being blamed for this, but others have also experienced disarm issues. Please take ownership of this problem and I recommend fixing the design of the scheduler.
|
@pjpei now given you forked our repo you kinda have to make it public.. if you want help then work with the guys.. otherwise the team is in the middle of trying to get 4.3 out the door.... |
Unfortunately I have removed the code from my repo. It was never ready for prime-time, but I accidentally exposed a bug in Betaflight while my experimental (and not usable) filter was still in an incomplete state. No patience for it left, as with Betaflight in general. Please stop tagging me from here on. |
@pjpei I am closing this issue as without access to the source code causing the issue we cannot comment further. You criticise the scheduler and yet you have not even detailed where you call your new filter (is it from the filter task) or how many us it runs for. Given the load impact I suspect that you are stretching the execution time of the filter task excessively. As we have a cooperative multi-tasking scheme your code needs to cooperate and from the little evidence we have, it doesn’t. |
@SteveCEvans Perhaps there is a better approach to this issue - if the code that @pjpei was using highlights an issue with the scheduler then it's probably fairly trivial to just add a 'nop' loop in a gyro task such that the CPU load is similar or higher. We have details of the target and exact config he was using so I feel it should be fairly easy to replicate. @pjpei We're not blaming you for the issue at all, and thank you for your time in reporting the issue. If you still have the firmware in question, then please give us the output of the 'tasks' command in CLI. i.e. go to cli, run 'tasks', wait 10 seconds, run 'tasks' again, give us both outputs. If we see similar issues in the future we can still reference this discussion. (sorry for tag, last one, feel free to unsubscribe from github notifications too). |
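As a way to reason about the replication idea above before touching firmware, here is a toy model of a cooperative loop in which a gyro/filter task always runs first and an RX task only runs when it is due and fits in the time left over. This is a deliberately simplified sketch and not Betaflight's actual scheduler; the function name, the loop structure, and all timing values are assumptions chosen for illustration.

```python
def simulate_rx_runs(loop_period_us, gyro_cost_us, rx_cost_us, rx_period_us, duration_us):
    """Toy cooperative loop (NOT the real Betaflight scheduler).
    Each loop period the gyro task runs first; the RX task runs only if it
    is due and fits in the leftover time. Returns the loop start times (us)
    at which the RX task ran."""
    rx_run_times = []
    next_rx_due_us = 0
    for loop_start_us in range(0, duration_us, loop_period_us):
        leftover_us = loop_period_us - gyro_cost_us
        rx_is_due = loop_start_us >= next_rx_due_us
        if rx_is_due and leftover_us >= rx_cost_us:
            rx_run_times.append(loop_start_us)
            next_rx_due_us = loop_start_us + rx_period_us
    return rx_run_times


# Hypothetical numbers: 4 kHz loop, RX wanted at about 150 Hz, over one second.
healthy = simulate_rx_runs(250, 100, 50, 6667, 1_000_000)
# Same loop with the gyro task stretched (the 'nop' loop idea) to 220 us.
overloaded = simulate_rx_runs(250, 220, 50, 6667, 1_000_000)
print(len(healthy), len(overloaded))
```

In this model a stretched gyro task starves the RX task completely, which is harsher than the 2-3 Hz seen in the log. It only illustrates the mechanism a 'nop' loop test would probe; the 'tasks' output requested above would be needed to say anything about the real firmware.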
Hopefully my frustration with this firmware will prevent someone from being injured, which is literally the only reason I've made the effort to do this from the old drone binary loaded on it which I'll soon permanently erase from the drone. Thanks for the politeness, @hydra , but I mean it's quite clear that I've been blamed for causing the issue I've had by myself, and it seems that I'm not the only person being disregarded after others have also complained about the same issue, which you can clearly see mentioned by multiple other people. They have also been disregarded, referring to the "unsupported" tag on #11338 for example, and looking at the objections by @etracer65 on the scheduler overhaul. I'm pretty much over this. I wish you all the best with your endeavors, and I now wash my hands of the bug that I've found and duly informed you about.
|
@pjpei , GROW UP! They just asked you to update YOUR FORK which has the issue as there is new RX handling in the current BF Master. Then you went off. (p.s. like what other FW are you going to use? Ridiculous. Be a part of the solution. Either way, this will be looked at without you more closely.) |
While it's unfortunate that the original code related to this issue isn't provided, it should be easy to replicate the problem based on the task timings provided above. Clearly the At this point I don't consider Betaflight to be safe for general use and would not recommend it be released until these issues are fundamentally resolved. Otherwise people are going to get hurt and Betaflight's reputation will be permanently tarnished. |
@etracer yup, I agree and commented as follows in slack on this thread: "IMHO, any form of safety related issues should be treated as a higher priority than anything else. Our hobby doesn't need any more negative attention in the media". and: "it would be good to have tests that ensure a disarm is always processed even under high-load conditions". |
No one is blaming you for this. Thanks for raising the issue. We are trying to get to the bottom of it but we need your help to do it. I agree with @hydra Edit: |
IMO disarm and failsafe should explicitly be guaranteed regardless of changes made to any task. I don't know how this could be enforced with cooperative multitasking, but otherwise there should maybe be some sanity check at compile time. I think it will be very difficult in the long run to enforce implicit assumptions in an open source project involving many different people and changes. At least maybe some document should describe task and scheduler requirements. I don't actually have time to look into this myself, but it would be great to have a discussion about the general priorities and possible solutions in this regard. |
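One possible shape for the sanity check suggested above is a build-time budget test: given a worst-case execution time for each task, verify that the safety-critical task can still get its turn within a deadline. The sketch below is only an illustration of that idea; the task names, timings, and the deadline are hypothetical, and the bound assumes every other task runs at most once before the safety-critical task is scheduled.

```python
def worst_case_latency_us(other_task_wcet_us, own_cost_us):
    """Conservative bound: every other task runs once, back to back,
    before the safety-critical task runs to completion."""
    return sum(other_task_wcet_us.values()) + own_cost_us


def check_deadline(other_task_wcet_us, own_cost_us, deadline_us):
    """Raise ValueError if the bound exceeds the deadline; otherwise return the bound."""
    worst = worst_case_latency_us(other_task_wcet_us, own_cost_us)
    if worst > deadline_us:
        raise ValueError(
            f"worst-case latency {worst} us exceeds deadline {deadline_us} us"
        )
    return worst


# Hypothetical worst-case execution times in microseconds.
tasks = {"gyro_filter": 120, "osd": 300, "blackbox": 80}
print(check_deadline(tasks, own_cost_us=50, deadline_us=6667))  # 550
```

A check like this would fail the build if someone added a slow filter that pushed the total over the budget, which turns an implicit assumption about task timing into an explicit, documented one.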
@sugaarK finally is here lol |
I had the same non-disarm issue one week ago with my 7 inch. Perfect flight, prepared for landing but then the quad wasn't disarming. Burned 2 motors with the propellers stopped between the feet (still waiting for the replacement). Fortunately I had strong shoes and the battery was empty, so no injuries. The setup: FC: MAMBA Basic F722 MK3 with Betaflight 4.3 RC2 |
Describe the bug
I tested a few new PID settings with a custom gyro filter I wrote, based on the Betaflight firmware's recent branch. The drone started losing control so I disarmed the drone, but the disarm did not register and the drone kept rising. I have been taking logs, here's a screenshot of them
My CPU load was around 72% through the (short) flight.
To Reproduce
I have not reproduced the issue, and I've flown with these settings before with no issue recently enough on the same firmware. It might be an intermittent-type bug.
Expected behavior
I expect the motors to stop on disarm. They kept spinning, and I caught the drone with my hand, which is not to be recommended.
Flight controller configuration
Flight controller
HGLRCF722
Other components
BetaFPV ELRS Lite Receiver 2.0.1
BetaFPV ELRS Micro TX Module 2.0.1
Radiomaster TX16S Hall running EdgeTX
Analogue VTX
How are the different components wired up
Soldered by hand, I've flown with these in the past. Here's my port settings, I guess I can send pictures? I don't believe there are soldering issues.
Add any other context about the problem that you think might be relevant here
The CPU is loaded at about 72% in flight. This might be part of the problem, but my expectation is that high CPU loads should not cause disarm to fail.
If this is expected behaviour, I recommend a warning on Betaflight if the CPU load is too high to avoid injury and crashes.