Disarm failure on HGLRCF722 #11344

Closed
pjpei opened this issue Jan 25, 2022 · 40 comments

Comments

@pjpei

pjpei commented Jan 25, 2022

Describe the bug

I tested a few new PID settings with a custom gyro filter I wrote, based on a recent branch of the Betaflight firmware. The drone started losing control, so I disarmed it, but the disarm did not register and the drone kept rising. I have been taking logs; here's a screenshot of them:

image

My CPU load was around 72% through the (short) flight.

To Reproduce

I have not reproduced the issue, and I've flown with these settings recently on the same firmware without problems. It might be an intermittent bug.

Expected behavior

I expect the motors to stop on disarm. They kept spinning, and I caught the drone with my hand, which is not to be recommended.

Flight controller configuration

# version
# Betaflight / STM32F7X2 (S7X2) 4.3.0 Jan 24 2022 / 12:30:31 (norevision) MSP API: 1.44
# config: manufacturer_id: HGLR, board_name: HGLRCF722, version: 836acaf9, date: 2021-10-31T19:48:38Z

# start the command batch
batch start

board_name HGLRCF722
manufacturer_id HGLR

# name: BZZZZ

# feature
feature -RX_PARALLEL_PWM
feature RX_SERIAL
feature GPS
feature TELEMETRY
feature OSD

# serial
serial 1 2 115200 57600 0 115200
serial 2 2048 115200 57600 0 115200
serial 5 1024 115200 57600 0 115200

# beacon
beacon RX_LOST
beacon RX_SET

# aux
aux 0 0 0 1900 2100 0 0
aux 1 1 3 900 1100 0 0
aux 2 2 3 1175 1350 0 0
aux 3 13 3 1900 2100 0 0
aux 4 13 6 1300 1700 0 0
aux 5 26 7 1300 1700 0 0
aux 6 31 7 1875 2100 0 0
aux 7 35 3 1900 2100 0 0
aux 8 36 1 1900 2100 0 0
aux 9 39 3 1650 1800 0 0

# vtxtable
vtxtable bands 5
vtxtable channels 8
vtxtable band 1 BOSCAM_A A FACTORY 5865 5845 5825 5805 5785 5765 5745 5725
vtxtable band 2 BOSCAM_B B FACTORY 5733 5752 5771 5790 5809 5828 5847 5866
vtxtable band 3 BOSCAM_E E FACTORY 5705 5685 5665    0 5885 5905 5925    0
vtxtable band 4 FATSHARK F FACTORY 5740 5760 5780 5800 5820 5840 5860 5880
vtxtable band 5 RACEBAND R FACTORY 5658 5695 5732 5769 5806 5843 5880 5917
vtxtable powerlevels 4
vtxtable powervalues 14 20 23 25
vtxtable powerlabels 25 100 200 MAX

# master
set gyro_lpf1_static_hz = 0
set gyro_lpf2_type = LULU
set gyro_lpf2_static_hz = 700
set dyn_notch_count = 0
set dyn_notch_max_hz = 1000
set gyro_lpf1_dyn_min_hz = 0
set gyro_lpf1_dyn_max_hz = 700
set acc_calibration = 30,-8,-40,1
set mag_hardware = NONE
set baro_bustype = SPI
set baro_spi_device = 1
set baro_i2c_device = 0
set fpv_mix_degrees = 35
set serialrx_provider = CRSF
set blackbox_sample_rate = 1/1
set min_throttle = 1070
set dshot_idle_value = 750
set dshot_bidir = ON
set use_unsynced_pwm = OFF
set motor_pwm_rate = 480
set failsafe_throttle = 1050
set failsafe_procedure = AUTO-LAND
set yaw_motors_reversed = ON
set gps_provider = UBLOX
set gps_sbas_mode = AUTO
set gps_ublox_use_galileo = ON
set pid_process_denom = 2
set simplified_gyro_filter_multiplier = 140
set osd_warn_link_quality = ON
set osd_warn_over_cap = ON
set osd_cap_alarm = 1550
set osd_vbat_pos = 2104
set osd_ah_sbar_pos = 2254
set osd_ah_pos = 2126
set osd_mah_drawn_pos = 2113
set osd_craft_name_pos = 2049
set osd_gps_speed_pos = 2168
set osd_gps_lon_pos = 2442
set osd_gps_lat_pos = 2409
set osd_home_dir_pos = 2145
set osd_home_dist_pos = 2146
set osd_altitude_pos = 2081
set osd_log_status_pos = 2136
set osd_stat_battery = ON
set debug_mode = GYRO_RAW
set vtx_band = 5
set vtx_channel = 4
set vtx_power = 4
set vtx_freq = 5769
set vcd_video_system = NTSC
set name = BZZZZ

profile 1

# profile 1
set dterm_lpf1_dyn_min_hz = 112
set dterm_lpf1_dyn_max_hz = 225
set dterm_lpf1_static_hz = 112
set dterm_lpf2_static_hz = 225
set vbat_sag_compensation = 100
set iterm_relax_cutoff = 25
set yaw_lowpass_hz = 130
set p_pitch = 54
set i_pitch = 96
set d_pitch = 52
set f_pitch = 143
set p_roll = 51
set i_roll = 92
set d_roll = 46
set f_roll = 138
set p_yaw = 51
set i_yaw = 92
set f_yaw = 138
set d_min_roll = 34
set d_min_pitch = 39
set simplified_master_multiplier = 115
set simplified_dterm_filter_multiplier = 150

rateprofile 2

# end the command batch
batch end
# resource show all
Currently active IO resource assignments:
(reboot to update)
--------------------
A00: FREE
A01: FREE
A02: SERIAL_TX 2
A03: SERIAL_RX 2
A04: BARO_CS
A05: SPI_SCK 1
A06: SPI_MISO 1
A07: SPI_MOSI 1
A08: FREE
A09: SERIAL_TX 1
A10: SERIAL_RX 1
A11: USB
A12: USB
A13: LED 2
A14: LED 1
A15: FREE
B00: MOTOR 3
B01: MOTOR 4
B02: GYRO_CS 1
B03: FREE
B04: MOTOR 1
B05: MOTOR 2
B06: FREE
B07: FREE
B08: I2C_SCL 1
B09: I2C_SDA 1
B10: SERIAL_TX 3
B11: FREE
B12: OSD_CS
B13: SPI_SCK 2
B14: SPI_MISO 2
B15: SPI_MOSI 2
C00: FREE
C01: ADC_CURR
C02: ADC_BATT
C03: FREE
C04: GYRO_EXTI
C05: FREE
C06: FREE
C07: FREE
C08: PINIO 1
C09: PINIO 2
C10: SPI_SCK 3
C11: SPI_MISO 3
C12: SPI_MOSI 3
C13: BEEPER
C14: FREE
C15: FREE
D00: FREE
D01: FREE
D02: FLASH_CS
D03: FREE
D04: FREE
D05: FREE
D06: FREE
D07: FREE
D08: FREE
D09: FREE
D10: FREE
D11: FREE
D12: FREE
D13: FREE
D14: FREE
D15: FREE
E00: FREE
E01: FREE
E02: FREE
E03: FREE
E04: FREE
E05: FREE
E06: FREE
E07: FREE
E08: FREE
E09: FREE
E10: FREE
E11: FREE
E12: FREE
E13: FREE
E14: FREE
E15: FREE
F00: FREE
F01: FREE
F02: FREE
F03: FREE
F04: FREE
F05: FREE
F06: FREE
F07: FREE
F08: FREE
F09: FREE
F10: FREE
F11: FREE
F12: FREE
F13: FREE
F14: FREE
F15: FREE

Currently active Timers:
-----------------------
TIM1: FREE
TIM2: FREE
TIM3: FREE
TIM4: FREE
TIM5: FREE
TIM6: FREE
TIM7: FREE
TIM8:
    CH1 : DSHOT_BITBANG 2
TIM9: FREE
TIM10: FREE
TIM11: FREE
TIM12: FREE
TIM13: FREE
TIM14: FREE

Currently active DMA:
--------------------
DMA1 Stream 0: SPI_MISO 3
DMA1 Stream 1: FREE
DMA1 Stream 2: FREE
DMA1 Stream 3: SPI_MISO 2
DMA1 Stream 4: SPI_MOSI 2
DMA1 Stream 5: SPI_MOSI 3
DMA1 Stream 6: FREE
DMA1 Stream 7: FREE
DMA2 Stream 0: SPI_MISO 1
DMA2 Stream 1: FREE
DMA2 Stream 2: DSHOT_BITBANG 2
DMA2 Stream 3: SPI_MOSI 1
DMA2 Stream 4: ADC
DMA2 Stream 5: FREE
DMA2 Stream 6: FREE
DMA2 Stream 7: FREE

Flight controller

HGLRCF722

Other components

BetaFPV ELRS Lite Receiver 2.0.1
BetaFPV ELRS Micro TX Module 2.0.1
Radiomaster TX16S Hall running EdgeTX
Analogue VTX

How are the different components wired up

Soldered by hand; I've flown with these in the past. Here are my port settings. I can send pictures if needed. I don't believe there are soldering issues.

Screenshot from 2022-01-25 19-52-31

Add any other context about the problem that you think might be relevant here

The CPU is loaded at about 72% in flight. This might be part of the problem, but my expectation is that high CPU loads should not cause disarm to fail.

If this is expected behaviour, I recommend a warning on Betaflight if the CPU load is too high to avoid injury and crashes.

@hydra
Contributor

hydra commented Jan 26, 2022

> The CPU is loaded at about 72% in flight. This might be part of the problem, but my expectation is that high CPU loads should not cause disarm to fail.

> If this is expected behaviour, I recommend a warning on Betaflight if the CPU load is too high to avoid injury and crashes.

IMHO, this is not, and should never be, expected behavior. A disarm should always be registered and actioned by the firmware.

@AlessandroAU
Contributor

Betaflight requires a certain number of disarm packets to enter disarm (4, I think?), so because no new RC data was received, the disarm did not happen?

@pjpei
Author

pjpei commented Jan 26, 2022

I believe there were packets sent, looking at the logs. The disarm registered on the graph I pasted in the bug description above; the drone just didn't actually disarm. I'm not sure it's an RX thing.

@ctzsnooze
Member

At the time of the disarm, throttle had very extended steps, meaning that the RC signals were not being seen to change by the FC.

@ctzsnooze
Member

ctzsnooze commented Jan 26, 2022

@pjpei - could you please confirm that your build would have included PR #11319 - you should see the CLI parameter scheduler_relax_rx if it is included.

If it is, please confirm that the value is the default of 25, and maybe try some cautious test flights with that value reduced from default of 25 to a value of 1, and see if that fixes this problem?

Could you also please post a link to your log (e.g. Dropbox or Google Drive) and perhaps keep an eye on link quality and signal strength (in dBm) in the OSD?

@pjpei
Author

pjpei commented Jan 26, 2022

image
I don't think the scheduler_relax_rx is included here. Linking log in next message....

@pjpei
Author

pjpei commented Jan 26, 2022

https://drive.google.com/file/d/1T_A4dRjF80UKHrA0iGKTbsNt6HbpLLZo/view?usp=sharing

That's the blackbox log file of the very short flight. I doubt link quality is an issue as I was standing with a meter between the transmitter and receiver, which is why I could (stupidly) catch it before it flew away.

@ctzsnooze
Member

It looks like you had a 150hz RC link which then almost totally failed some time soon after arming.

Also logging wasn't started until after arming, and it is at 1:1 PID rate, so perhaps that, and the OSD task being initialised with the motors on arming, somehow overloaded the scheduler.

In any case, from the moment you provided any input, the RC data was very much delayed, with no change in set point of any kind except in widely spaced steps - up to and longer than 400ms between any sign of data being received.

It's not clear why this should have happened.

Most likely the disarm failure was because there were not the required number of disarm packets received before hitting the ground and initiating runaway protection.

The root cause is whatever caused the Rx link to fail. That is not obvious from the log.

Screen Shot 2022-01-27 at 00 59 51

@etracer65
Member

#11228 (comment)

@ledvinap
Contributor

One possibility is a signal that is too strong. Some receivers have problems with input stage saturation ...

@pjpei
Author

pjpei commented Jan 26, 2022

> It looks like you had a 150hz RC link which then almost totally failed some time soon after arming.

> Also logging wasn't started until after arming, and it is at 1:1 PID rate, so perhaps that, and the OSD task being initialised with the motors on arming, somehow overloaded the scheduler.

> In any case, from the moment you provided any input, the RC data was very much delayed, with no change in set point of any kind except in widely spaced steps - up to and longer than 400ms between any sign of data being received.

> It's not clear why this should have happened.

> Most likely the disarm failure was because there were not the required number of disarm packets received before hitting the ground and initiating runaway protection.

> The root cause is whatever caused the Rx link to fail. That is not obvious from the log.

> Screen Shot 2022-01-27 at 00 59 51

The RC link did not fail. I can reliably cause the RC link to stutter in the GUI of Betaflight Configurator by loading the CPU.

image

@pjpei
Author

pjpei commented Jan 26, 2022

Also, note that it took 1.1 seconds for the thing to actually stop the motors. The failsafe is set to 0.4s, which clearly also did not work.

@pjpei
Author

pjpei commented Jan 26, 2022

Also please note that the drone never hit the ground. I had to grab it from the air, as it was doing strange things while going up and down. With effort I flipped it over and then after a while it stopped by itself.

@ctzsnooze
Member

ctzsnooze commented Jan 26, 2022

What I meant by the RC link 'failing' is that the flight controller 'appeared' to only get RC data about every 300ms. Now the receiver may well have been receiving a 150hz signal, but either the link itself failed or the flight controller didn't run the RC task at the normal frequency. The end result was that the flight controller only saw a handful of control values during the 4s from arm to runaway prevention.

The link was logged as starting at 150hz.

Here I zoom in enough to show every single step change in 'actioned' RC Command. There was none of the usual variability at 150hz; just a handful of steps in the 3s flight.

Functionally you had a 2-3hz RC control signal, not slow enough to cause failsafe, but so incredibly slow that it was not possible to control the quad.

I don't know what the timeout is on the count for the three disarm packets, but most likely with the 'link' appearing to be so slow, the FC had difficulty getting them.
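To put rough numbers on this, here is a minimal Python sketch of the timing. It is a toy model, not Betaflight's actual code. It assumes disarm needs three consecutive RC frames (the exact count is uncertain in this thread), it uses the 0.4 s failsafe timeout mentioned earlier, and the function name `simulate_disarm` is made up for illustration.

```python
def simulate_disarm(frame_interval_ms, frames_required=3,
                    failsafe_timeout_ms=400, max_ms=5000):
    """Toy model: the pilot flips the disarm switch at t=0.

    RC frames reach the flight controller every frame_interval_ms.
    Returns ("disarm", t), ("failsafe", t) or ("none", max_ms).
    All parameter values are assumptions, not Betaflight constants.
    """
    last_frame_ms = 0   # time the most recent RC frame was processed
    disarm_frames = 0   # frames seen with the switch in the disarm position
    for now_ms in range(1, max_ms + 1):
        # Failsafe fires only if the gap since the last frame exceeds the timeout.
        if now_ms - last_frame_ms > failsafe_timeout_ms:
            return ("failsafe", now_ms)
        if now_ms % frame_interval_ms == 0:  # a frame is processed now
            last_frame_ms = now_ms
            disarm_frames += 1
            if disarm_frames >= frames_required:
                return ("disarm", now_ms)
    return ("none", max_ms)


# Healthy 150 Hz link (about 7 ms), starved link (300 ms), dead link (500 ms).
for interval in (7, 300, 500):
    print(interval, simulate_disarm(interval))
```

In this model, frames every 300 ms never trip the failsafe, yet the disarm only lands after 900 ms. With frames every 7 ms the same logic disarms after 21 ms, and only a gap longer than the timeout (the 500 ms case) ends in failsafe. The real counter and timeout logic may differ.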

Note that of the 4s flight time, throttle was only above zero for a total of 2.6ms until you held it.

The PID system was working normally, but only had very intermittent RC steps to work with.

Screen Shot 2022-01-27 at 01 53 55

The image shows the whole flight from arm to runaway triggering was a little under 4s, with throttle above zero for 2.6s. It went up, then it tightly followed a step in yaw lasting 300ms at about 110deg/s as you cut throttle. Then there was a small roll input. It seems to me that it was in level mode. The PID system was working perfectly normally all the time. So whatever the problem was, it affected handling of the RC data, nothing much else.

@ctzsnooze
Member

That you held it in the air explains why the gyro responses are not exactly typical of a bouncy landing.

@pjpei
Author

pjpei commented Jan 26, 2022

Just a nitpick, you mean 2.6 seconds throttle above zero, not 2.6 ms?

Ah, you corrected to 2.6s later. Apologies.

@hydra
Contributor

hydra commented Jan 26, 2022

Not sure if this is related, but it might give some clues:

@HighFunction from ELRS discord reported:

"I've been experiencing delayed disarm on multiple models, and I'm now able to reproduce the issue consistently. I often hand catch the landing when flying LOS, and the delayed disarm makes my heart skip a beat when it occurs in this scenario, and in any other precision landing scenario as well. I first acknowledged this problem on v2.0. Once I could reproduce it, I tried updating to 2.0.1, and the problem persists. TX is HM Slim Pro with hardware fix. RX is HM EP2 confirmed, and very likely EP1, and PP too. I was using an OTX nightly for a long time, as it had been solid. Attempting to address this issue, I updated to OTX 2.3.14. I thought this fixed the issue, but at that time, I hadn't discovered what actually triggered the state in which the delayed disarm occurs. Indeed the issue persists in latest stable OTX. Radio is X-Lite S with Pro gimbals. To reproduce the issue, establish a connection between tx and aircraft, and then make one or more adjustments to the accelerometer calibration via stick inputs (see image.) Thereafter, disarm will be delayed by like .25-.5 seconds, which is an eternity in precision landing scenarios, and is at least mildly disturbing in any other landing/disarm scenario. I believe I may have experienced this with SPI rx too, but this is harder to reproduce, as acc. adjustment writes to eeprom (?) which currently causes an instant failsafe. Since the connection is unlikely to reestablish in any reasonable amount of time, or perhaps at all, (at least with my current settings,) power cycling the tx is required. I believe this power cycle may be clearing the delayed disarm state, wherever it may reside, since upon reconnection to the spi rx, disarm behavior is as expected."

and also, when he posted the log he said:

"I took off, then landed/disarmed, and made a single acc adjustment. I then took off and landed/disarmed 2-3 more times. Each time the, the delayed disarm occurred. While capturing the log data, I started noticing/experiencing what might be the same delay affecting all input after triggering the delayed disarm state. Maybe I only noticed at disarm initially, but now appears it may be a general delay of all inputs. I can't be sure, currently. Definitely noticed on disarm though, consistently."

https://cdn.discordapp.com/attachments/797109686285107241/927818141436870696/btfl_all.bbl

The comment "To reproduce the issue, establish a connection between tx and aircraft, and then make one or more adjustments to the accelerometer calibration via stick inputs (see image.) Thereafter, disarm will be delayed by like .25-.5 seconds" seems related.

@hydra
Contributor

hydra commented Jan 26, 2022

@HighFunction's issues were also in #11226 and fixed by #11228.

However, it highlights what we're seeing in that when the scheduler doesn't schedule the RX task you don't get a disarm when you want it.

@IllusionFpv
Contributor

@pjpei have you tried to reproduce it with the master branch?

@haslinghuis
Member

@pjpei scheduler_relax_rx is a CLI setting.

Its default value is 1 on master (on my Matek F411 with CRSF)
Use 0 to disable.

@SteveCEvans
Member

@pjpei can you please try the image from #11340 (comment)

@pjpei
Author

pjpei commented Jan 27, 2022

@haslinghuis See above, I don't have that setting on the release I was on.

@SteveCEvans I've looked at the GitHub history in an attempt to find the issue, and I'm afraid the non-hard-realtime scheduler is putting me off from trying anything related to the new Betaflight again. Having worked on firmware specifically and software generally for more than two decades, I consider a non-deterministic scheduler a disaster waiting to happen, as it starves processes randomly. I hope you get it right eventually, but I'm out.

@asizon
Member

asizon commented Jan 27, 2022

@pjpei Where do you want to go with your comments? First, if you don't have this setting, please try an OFFICIAL dev build from master before opening an issue. Second, @SteveCEvans's test proposal with this image is really important.

@pjpei
Author

pjpei commented Jan 27, 2022

@asizon I have reported a severe bug that caused me literal injury. You can reproduce it by loading the CPU heavily. It is unreasonable to expect me to risk an expensive drone for this, so I believe I'm done here, having given you the information you need.

@asizon
Member

asizon commented Jan 27, 2022

@pjpei But this bug happens to you using a non-official and non-latest Betaflight build...

@pjpei
Author

pjpei commented Jan 27, 2022

The only thing I changed on my side was adding a new filter, which is CPU-intensive. Given that the core issue is a scheduler design flaw, I have no confidence in this build of Betaflight, and I'm not the only person complaining about disarm issues. I'm not doing this, sorry. I hope you get it working right, but I doubt it.

@KarateBrot
Member

KarateBrot commented Jan 27, 2022

@pjpei

If you don't want to use the latest version with a potential fix, we can't help you very well. You're using a custom build with outdated firmware... how are we supposed to know what's wrong or if it's fixed already? I tried overloading my CPU with the latest master but I cannot reproduce a flyaway.

If you don't want to do it yourself - which is understandable after your unexpected injury - please at least show us your codebase so we can compile an up-to-date version of your code to reproduce the error more easily.

Or please just rebase on today's master, include #11340 and provide us with the compiled hex files if you don't want to show your code.

@pjpei
Author

pjpei commented Jan 27, 2022

My file change list is below this comment. I was previously willing to share, but given the adversity I've experienced here while reporting a critical bug, I'd rather not give my code.

Refer to #11338 for someone else also experiencing this issue.

I haven't verified the mathematical correctness of my filter yet, the code is in an indeterminate state. I added a setting to be able to choose the filter in "settings.c", added the filter code in "filter.c", added structures in "filter.h" and "gyro.h", then changed "gyro_init.c" to enable the new filter if it's selected in the CLI.

The core thing is that the filter is slow, so I need to drop the PID frequency, and the CPU stays at 72% load.

I'm now disengaging here as it feels like I'm being blamed for this, but others have also experienced disarm issues. Please take ownership of this problem and I recommend fixing the design of the scheduler.


# On branch master
# Your branch is up to date with 'origin/master'.
#
# Changes to be committed:
#       modified:   src/main/cli/settings.c
#       modified:   src/main/common/filter.c
#       modified:   src/main/common/filter.h
#       modified:   src/main/sensors/gyro.h
#       modified:   src/main/sensors/gyro_init.c
#

@sugaarK
Member

sugaarK commented Jan 27, 2022

@pjpei Now, given you forked our repo, you kinda have to make it public. If you want help, then work with the guys. Otherwise, the team is in the middle of trying to get 4.3 out the door.

@pjpei
Author

pjpei commented Jan 27, 2022

Unfortunately I have removed the code from my repo. It was never ready for prime-time, but I accidentally exposed a bug in Betaflight while my experimental (and not usable) filter was still in an incomplete state. No patience for it left, as with Betaflight in general. Please stop tagging me from here on.

@SteveCEvans
Member

@pjpei I am closing this issue as without access to the source code causing the issue we cannot comment further. You criticise the scheduler and yet you have not even detailed where you call your new filter (is it from the filter task) or how many us it runs for. Given the load impact I suspect that you are stretching the execution time of the filter task excessively. As we have a cooperative multi-tasking scheme your code needs to cooperate and from the little evidence we have, it doesn’t.
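As a toy illustration of what "cooperating" means here, the following Python sketch models a non-preemptive loop in which the gyro work always runs first and a low-priority RX task runs only if it fits in the gap before the next gyro cycle. This is an assumed, heavily simplified model and not the Betaflight scheduler. The function name `count_rx_runs` and all costs and periods are hypothetical values chosen for illustration.

```python
def count_rx_runs(gyro_cost_us, rx_cost_us=17, gyro_period_us=125,
                  rx_period_us=6667, total_us=1_000_000):
    """Count how often a low-priority RX task runs in one simulated second.

    Assumptions (illustrative only): the gyro work starts every
    gyro_period_us and cannot be interrupted; RX wants to run every
    rx_period_us (about 150 Hz) but is only started when it fits
    entirely before the next gyro cycle.
    """
    now = 0
    next_rx_due = 0
    rx_runs = 0
    while now < total_us:
        next_gyro = now + gyro_period_us
        now += gyro_cost_us                  # gyro/filter/PID work
        # Cooperative rule: start RX only if it is due and fits in the gap.
        if now >= next_rx_due and now + rx_cost_us <= next_gyro:
            now += rx_cost_us
            rx_runs += 1
            next_rx_due = now + rx_period_us
        now = max(now, next_gyro)            # idle until the next cycle
    return rx_runs


print(count_rx_runs(60))    # plenty of slack in every cycle
print(count_rx_runs(115))   # gap (10 us) is smaller than the RX cost (17 us)
```

In this model a gyro cost of 60 us lets RX run close to 150 times per simulated second, while a cost of 115 us leaves a gap that never fits the RX task, so it is never run. A real scheduler has more ways to cope with this, so the sketch only shows why stretching one task's execution time can starve another.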

@hydra
Contributor

hydra commented Jan 27, 2022

@SteveCEvans Perhaps there is a better approach to this issue - if the code that @pjpei was using highlights an issue with the scheduler then it's probably fairly trivial to just add a 'nop' loop in a gyro task such that the CPU load is similar or higher. We have details of the target and exact config he was using so I feel it should be fairly easy to replicate.

@pjpei We're not blaming you for the issue at all, and thank you for your time in reporting the issue. If you still have the firmware in question, then please give us the output of the 'tasks' command in the CLI, i.e. go to the CLI, run 'tasks', wait 10 seconds, run 'tasks' again, and give us both outputs.

If we see similar issues in the future we can still reference this discussion.

(sorry for tag, last one, feel free to unsubscribe from github notifications too).

@pjpei
Author

pjpei commented Jan 27, 2022

Hopefully my frustration with this firmware will prevent someone from being injured, which is literally the only reason I've made the effort to do this from the old drone binary loaded on it which I'll soon permanently erase from the drone.

Thanks for the politeness, @hydra , but I mean it's quite clear that I've been blamed for causing the issue I've had by myself, and it seems that I'm not the only person being disregarded after others have also complained about the same issue, which you can clearly see mentioned by multiple other people.

They have also been disregarded, referring to the "unsupported" tag on #11338 for example, and looking at the objections by @etracer65 on the scheduler overhaul.

I'm pretty much over this. I wish you all the best with your endeavors, and I now wash my hands of the bug that I've found and duly informed you about.

# tasks
Task list             rate/hz  max/us  avg/us maxload avgload  total/ms   late    run reqd/us
00 - (         SYSTEM)     10       1       1    0.0%    0.0%         0      0    248       0
01 - (         SYSTEM)    977       8       2    0.7%    0.1%        21      7  23961       2
02 - (           GYRO)   8013      73      61   58.4%   48.8%      9720      0 197270       0
03 - (         FILTER)   4008      31      19   12.4%    7.6%      1620      0  98635       0
04 - (            PID)   4008      70      55   28.0%   22.0%      4734      0  98635       0
05 - (            ACC)    978      15       4    1.4%    0.3%        61      5  23791       4
06 - (       ATTITUDE)     95      18      15    0.1%    0.1%        26      4   2431      15
07 - (             RX)     14      34       5    0.0%    0.0%        24      6   1476      17
08 - (         SERIAL)     94  380124       8 3573.1%    0.0%       747      0   2142       8
09 - (       DISPATCH)    979       7       2    0.6%    0.1%        12      4  24002       2
10 - (BATTERY_VOLTAGE)    197       9       3    0.1%    0.0%        10      1   4869       3
11 - (BATTERY_CURRENT)     45       5       1    0.0%    0.0%         1      0   1227       1
12 - ( BATTERY_ALERTS)      6       4       3    0.0%    0.0%         0      0    125       3
13 - (         BEEPER)     94      17      11    0.1%    0.1%         7     12   2141      11
14 - (            GPS)    105      40      36    0.4%    0.3%        22     15   2542      36
16 - (           BARO)     35      28      26    0.0%    0.0%        38      4   2866      27
17 - (       ALTITUDE)     35      16      15    0.0%    0.0%        10      3    961      15
18 - (      TELEMETRY)    490      17       1    0.8%    0.0%        47     28  11183       0
20 - (            OSD)     12      45       6    0.0%    0.0%        36     58   3790      30
22 - (            CMS)     18       6       2    0.0%    0.0%         0      0    488       2
23 - (        VTXCTRL)      5      27      11    0.0%    0.0%         2      2    123      11
24 - (        CAMCTRL)      6       4       3    0.0%    0.0%         0      0    124       3
26 - (    ADCINTERNAL)      3       4       4    0.0%    0.0%         0      0     25       3
28 - (SPEED_NEGOTIATION)     93       4       1    0.0%    0.0%         1      0   2137       0
RX Check Function                  14       4                        17
Total (excluding SERIAL)                                79.4%

# tasks
Task list             rate/hz  max/us  avg/us maxload avgload  total/ms   late    run reqd/us
00 - (         SYSTEM)     10       6       5    0.0%    0.0%         0      1    161       5
01 - (         SYSTEM)    981       7       4    0.6%    0.3%        36     12  15792       4
02 - (           GYRO)   8019      61      55   48.9%   44.1%     16242      0 129458       0
03 - (         FILTER)   4010      26      25   10.4%   10.0%      2677      0  64729       0
04 - (            PID)   4008      62      56   24.8%   22.4%      7811      0  64729       0
05 - (            ACC)    974      10       6    0.9%    0.5%       100      4  15715       6
06 - (       ATTITUDE)     99      18      15    0.1%    0.1%        43      0   1597      15
07 - (             RX)     15      25       2    0.0%    0.0%        39      0    968      17
08 - (         SERIAL)     99    4331       5   42.8%    0.0%       754      0   1596       5
09 - (       DISPATCH)    976       8       4    0.7%    0.3%        19     23  15825       4
10 - (BATTERY_VOLTAGE)    198       7       7    0.1%    0.1%        17      0   3201       6
11 - (BATTERY_CURRENT)     50       3       1    0.0%    0.0%         1      0    805       1
12 - ( BATTERY_ALERTS)      5       2       3    0.0%    0.0%         0      0     80       3
13 - (         BEEPER)     99      16      15    0.1%    0.1%        12      0   1595      15
14 - (            GPS)    110      39      37    0.4%    0.4%        39      0   1749      37
16 - (           BARO)     40      36      36    0.1%    0.1%        63      1   1917      28
17 - (       ALTITUDE)     40      19      16    0.0%    0.0%        17      0    640      16
18 - (      TELEMETRY)    490       6       0    0.2%    0.0%        51     11   7896       1
20 - (            OSD)     12      34       6    0.0%    0.0%        64      7   2768      32
22 - (            CMS)     20       6       4    0.0%    0.0%         1      1    323       4
23 - (        VTXCTRL)      5       1      10    0.0%    0.0%         2      0     81      10
24 - (        CAMCTRL)      5       6       5    0.0%    0.0%         0      0     81       5
26 - (    ADCINTERNAL)      2       6       5    0.0%    0.0%         0      0     17       5
28 - (SPEED_NEGOTIATION)     99       5       3    0.0%    0.0%         2      3   1593       3
RX Check Function                  10       4                        29
Total (excluding SERIAL)                                78.4%

# tasks
Task list             rate/hz  max/us  avg/us maxload avgload  total/ms   late    run reqd/us
00 - (         SYSTEM)     10       1       4    0.0%    0.0%         0      0    203       4
01 - (         SYSTEM)    976       4       1    0.3%    0.0%        54      0  19938       1
02 - (           GYRO)   8013      63      55   50.4%   44.0%     24469      0 163271       0
03 - (         FILTER)   4008      27      20   10.8%    8.0%      4105      0  81635       0
04 - (            PID)   4008      61      53   24.4%   21.2%     11593      0  81636       0
05 - (            ACC)    974       6       4    0.5%    0.3%       148      0  19856       4
06 - (       ATTITUDE)     99      15      12    0.1%    0.1%        64      0   2019      12
07 - (             RX)     15      26       2    0.0%    0.0%        59      0   1220      17
08 - (         SERIAL)     99    4330       9   42.8%    0.0%       762      0   2008       9
09 - (       DISPATCH)    981       3       1    0.2%    0.0%        29      1  19968       0
10 - (BATTERY_VOLTAGE)    198       5       4    0.0%    0.0%        26      0   4036       4
11 - (BATTERY_CURRENT)     50       3       0    0.0%    0.0%         2      0   1015       1
12 - ( BATTERY_ALERTS)      5       2       2    0.0%    0.0%         0      0    102       2
13 - (         BEEPER)     99      14      13    0.1%    0.1%        19      0   2007      13
14 - (            GPS)    105      38      37    0.3%    0.3%        60      0   2184      37
16 - (           BARO)     40      22      11    0.0%    0.0%        95      0   2419      36
17 - (       ALTITUDE)     40      16      13    0.0%    0.0%        26      0    810      13
18 - (      TELEMETRY)    492       3       1    0.1%    0.0%        55      1   9966       0
20 - (            OSD)     12      28       6    0.0%    0.0%       100      1   3660      32
22 - (            CMS)     20       2       2    0.0%    0.0%         2      0    406       1
23 - (        VTXCTRL)      5       2       9    0.0%    0.0%         2      0    102       9
24 - (        CAMCTRL)      5       3       5    0.0%    0.0%         0      0    102       5
26 - (    ADCINTERNAL)      2       4       5    0.0%    0.0%         0      0     20       5
28 - (SPEED_NEGOTIATION)     99       3       1    0.0%    0.0%         4      0   2007       0
RX Check Function                  10       3                        45
Total (excluding SERIAL)                                74.0%

# tasks
Task list             rate/hz  max/us  avg/us maxload avgload  total/ms   late    run reqd/us
00 - (         SYSTEM)     10       1       2    0.0%    0.0%         0      0    290       2
01 - (         SYSTEM)    979       7       2    0.6%    0.1%        81     27  28441       2
02 - (           GYRO)   8019      62      61   49.7%   48.9%     36217      0 233204       0
03 - (         FILTER)   4008      22      20    8.8%    8.0%      6009      0 116602       0
04 - (            PID)   4010      62      57   24.8%   22.8%     17137      0 116602       0
05 - (            ACC)    978       9       6    0.8%    0.5%       218      1  28306       6
06 - (       ATTITUDE)     99      20      18    0.1%    0.1%        96      1   2874      18
07 - (             RX)     15      26       2    0.0%    0.0%        86      0   1740      16
08 - (         SERIAL)     99    6330       4   62.6%    0.0%       774      0   2873       4
09 - (       DISPATCH)    983       6       2    0.5%    0.1%        43     40  28504       2
10 - (BATTERY_VOLTAGE)    198       9       6    0.1%    0.1%        39      1   5764       6
11 - (BATTERY_CURRENT)     50       3       1    0.0%    0.0%         3      0   1451       1
12 - ( BATTERY_ALERTS)      5       5       4    0.0%    0.0%         0      0    145       4
13 - (         BEEPER)     99      18      15    0.1%    0.1%        29      0   2870      15
14 - (            GPS)    105      41      39    0.4%    0.4%        91      0   3141      39
16 - (           BARO)     40      35      36    0.1%    0.1%       141      0   3455      28
17 - (       ALTITUDE)     40      19      15    0.0%    0.0%        38      1   1155      15
18 - (      TELEMETRY)    489       6       3    0.2%    0.1%        63     26  14229       3
20 - (            OSD)     12      32       9    0.0%    0.0%       152      0   5235      32
22 - (            CMS)     20       7       6    0.0%    0.0%         3      0    580       6
23 - (        VTXCTRL)      5       1       8    0.0%    0.0%         2      0    145       8
24 - (        CAMCTRL)      5       1       3    0.0%    0.0%         0      0    145       3
26 - (    ADCINTERNAL)      1       6       5    0.0%    0.0%         0      0     29       5
28 - (SPEED_NEGOTIATION)     99       5       2    0.0%    0.0%         6      5   2866       2
RX Check Function                  12       3                        69
Total (excluding SERIAL)                                81.3%

@spatzengr

@pjpei, GROW UP! They just asked you to update YOUR FORK, which has the issue, because there is new RX handling in the current BF master. Then you went off.

(p.s. like what other FW are you going to use? Ridiculous. Be a part of the solution. Either way, this will be looked at without you more closely.)

@etracer65
Member

While it's unfortunate that the original code related to this issue isn't provided, it should be easy to replicate the problem based on the task timings provided above. Clearly the GYRO task is running longer than normal. And the combination of gyro/filter/PID is more than 125 µs, so with an 8 kHz PID loop I would suppose the new scheduler would have problems with that. But the point is that regardless of whatever misbehavior happens in a task, it shouldn't prevent critical functions like being able to disarm. That's the fundamental underlying concern that keeps being ignored. No matter how "delayed" the tasks might get, the scheduler should NEVER prevent tasks from running. Continuing to stick your heads in the sand won't solve the problems.
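The arithmetic behind the "more than 125 µs" remark can be checked with a short sketch. It uses the GYRO, FILTER and PID rows of the second `# tasks` dump above; the 8000 Hz nominal loop rate and the helper names are assumptions made for illustration, not Betaflight code.

```python
# Timings in microseconds, copied from the GYRO, FILTER and PID rows
# of the second "# tasks" dump in this thread.
TASKS = {
    "GYRO":   {"rate_hz": 8019, "max_us": 62, "avg_us": 61},
    "FILTER": {"rate_hz": 4008, "max_us": 22, "avg_us": 20},
    "PID":    {"rate_hz": 4010, "max_us": 62, "avg_us": 57},
}

LOOP_HZ = 8000                        # assumed nominal gyro loop rate
period_us = 1_000_000 / LOOP_HZ       # time budget of one loop iteration


def busy_cycle_us(field):
    """Time spent in a cycle where gyro, filter and PID all run back to back."""
    return sum(task[field] for task in TASKS.values())


def average_load(field="avg_us"):
    """Fraction of CPU time used by the three tasks, weighted by their rates."""
    return sum(task["rate_hz"] * task[field] for task in TASKS.values()) / 1_000_000


if __name__ == "__main__":
    print(f"budget per cycle : {period_us:.0f} us")
    print(f"busy cycle (avg) : {busy_cycle_us('avg_us')} us")
    print(f"busy cycle (max) : {busy_cycle_us('max_us')} us")
    print(f"average load     : {average_load():.1%}")
```

Because FILTER and PID run at roughly half the gyro rate in this dump, the average load stays below 100%, but any cycle in which all three run exceeds the per-cycle budget on these numbers.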

At this point I don't consider Betaflight to be safe for general use and would not recommend it be released until these issues are fundamentally resolved. Otherwise people are going to get hurt and Betaflight's reputation will be permanently tarnished.

@hydra
Contributor

hydra commented Jan 27, 2022

@etracer yup, I agree and commented as follows in Slack on this thread: "IMHO, any form of safety related issues should be treated as a higher priority than anything else. Our hobby doesn't need any more negative attention in the media".

and: "it would be good to have tests that ensure a disarm is always processed even under high-load conditions".

@KarateBrot
Member

KarateBrot commented Jan 27, 2022

@pjpei "I'm now disengaging here as it feels like I'm being blamed for this, but others have also experienced disarm issues. Please take ownership of this problem and I recommend fixing the design of the scheduler."

No one is blaming you for this. Thanks for raising the issue. We are trying to get to the bottom of it but we need your help to do it.

I agree with @hydra

Edit:
Now that we can see what your code is about, I immediately see what the issue was: Your filter update function is blocking the firmware from running for a very long time and likely starves a lot of other tasks. Raising the issue made the scheduler less susceptible to unexpectedly long tasks and as far as I can see starving tasks are now self-regulating. So thanks for that :)

@mathiasvr
Contributor

IMO disarm and failsafe should explicitly be guaranteed regardless of changes made to any task. I don't know how this could be enforced with cooperative multitasking, but otherwise there should maybe be some sanity check at compile time. I think it will be very difficult in the long run to enforce implicit assumptions in an open source project involving many different people and changes. At least maybe some document should describe task and scheduler requirements.
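One way to picture such a guarantee under cooperative multitasking is a starvation guard: a task that has waited longer than a deadline gets to run even when the cycle has no slack left. The toy simulation below is only a sketch of that idea and is not Betaflight's scheduler. The 20 ms deadline and the `simulate` function are hypothetical; the 26 µs RX cost and the 138 µs overloaded real-time cost are taken from the task dump earlier in this thread.

```python
CYCLE_US = 125            # budget of one 8 kHz loop iteration
RX_COST_US = 26           # RX task max time from the dump in this thread
RX_DEADLINE_US = 20_000   # hypothetical: RX must run at least every 20 ms


def simulate(realtime_cost_us, guard, duration_us=100_000):
    """Return how often the RX task (which would process a disarm) ran.

    Each cycle runs the real-time work (gyro/filter/PID) first. RX runs
    only if enough slack remains, or, with the guard enabled, if it has
    been waiting longer than its deadline.
    """
    now = 0
    rx_last_run = 0
    rx_runs = 0
    while now < duration_us:
        cycle_start = now
        now += realtime_cost_us                      # real-time tasks always run
        slack = CYCLE_US - (now - cycle_start)       # negative when overrunning
        overdue = now - rx_last_run >= RX_DEADLINE_US
        if slack >= RX_COST_US or (guard and overdue):
            now += RX_COST_US
            rx_last_run = now
            rx_runs += 1
        now = max(now, cycle_start + CYCLE_US)       # idle until the next cycle
    return rx_runs


if __name__ == "__main__":
    print("healthy, no guard   :", simulate(80, guard=False))
    print("overloaded, no guard:", simulate(138, guard=False))
    print("overloaded, guard   :", simulate(138, guard=True))
```

In this model the overloaded, unguarded case never services RX at all, while the guarded case services it at roughly the deadline interval at the cost of delaying the real-time loop slightly. An assertion that the RX count is non-zero under overload is the kind of check a high-load disarm test could make.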

I don't actually have time to look into this myself, but it would be great to have a discussion about the general priorities and possible solutions in this regard.

@asizon
Member

asizon commented Feb 3, 2022

@sugaarK finally is here lol

emuflight/EmuFlight#748

@PaulFPV

PaulFPV commented Mar 14, 2022

I had the same non-disarm issue one week ago with my 7 inch. Perfect flight, prepared for landing, but then the quad wasn't disarming. Burned 2 motors with the propellers stopped between my feet (still waiting for the replacement). Fortunately I had strong shoes and the battery was empty, so no injuries.

The setup

FC: MAMBA Basic F722 MK3 with Betaflight 4.3 RC2
Radio: TX16s with Crossfire tx module and crossfire diversity nano rx on the quad
