Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad ignition module /noisy RPM input can crash AP by generating too many interrupts #15384

Closed
AndKe opened this issue Sep 20, 2020 · 8 comments
Closed
Assignees

Comments

@AndKe
Copy link
Contributor

AndKe commented Sep 20, 2020

Issue details

At least using PCI 1.3 ICE ignition module , this issue is 100% reproducible.
http://www.falkon.cz/op.htm

RPM_TYPE=2
RPM_PIN=54
When this ignition module is unpowered, / looses power, it will try to come to life using the weak pullup power from the Cube.
Feeding this module weak power from a IO pin, it will try to come to life, and pull down the weak pullup, effectively generating a 1Mhz signal (for this to work, the ignition module must not have external power, just GND and RPM signal connected.)
In real life, this happens if the ignition module die, or it's power supply fail.

This happens because the ArduPilot spends way too much time in the interrupt handling routine.

Reproducing on bench, no ignition module needed:
Configure as mentioned above.
From a signal generator;
1: apply 12kHz signal, observe that the CPU load barely increases
2: apply 35kHz signal, observe that the CPU load reach 96%
3: apply 100kHz signal, observe that autopilot becomes unresponsive/crashes repetitively

Summary:
a ground loop, disconnected long wire in a noisy environment or defective RPM sensor or ignition module , can produce more interrupts that ArduPilot can handle, resulting in repetitive reboots and sluggish behavior in between those reboots.

Solution would be to temporarily disable IRQ if incoming signal is too fast.

Version
AP 4.0.6

Platform
[x] All
[ ] AntennaTracker
[ ] Copter
[ ] Plane
[ ] Rover
[ ] Submarine

Airframe type
5.5m Plane

Hardware type
Cube2

Logs
(none attached)

@tridge tridge self-assigned this Sep 20, 2020
@tridge
Copy link
Contributor

tridge commented Sep 21, 2020

I've reproduced the problem by connecting a 921600 baud uart to the RPM input pin

@tridge
Copy link
Contributor

tridge commented Sep 21, 2020

@AndKe can you please test the fix in #15388

@AndKe
Copy link
Contributor Author

AndKe commented Sep 21, 2020

@tridge will check soon, please notify me when what @peterbarker pointed out is fixed.

@tridge
Copy link
Contributor

tridge commented Sep 22, 2020

@AndKe it's fixed, ready for your testing

@AndKe
Copy link
Contributor Author

AndKe commented Sep 23, 2020

@tridge - I tried to ask you on gitter;

git pull upstream master
git submodule update
git fetch --all --tags
git checkout tags/ArduPlane-4.0.6 -b plane4.0.6-irq
git remote add tridge https://github.com/tridge/ardupilot.git
git fetch tridge 
git cherry-pick f131f94c078f00cccc56afcc277376ed86b7aed6
fatal: bad object f131f94c078f00cccc56afcc277376ed86b7aed6
  • is it something I am missing?

tridge added a commit that referenced this issue Sep 26, 2020
This implements a max quota of GPIO interrupts per 100ms period to
prevent high interrupt counts from consuming all CPU and causing a
lockup. The limit is set as 10k interrupts per 0.1s period. That limit
should be high enough for all reasonable uses of GPIO interrupt
handlers while being below the level that causes significant CPU loads
and flight issues

This addresses issue #15384
@tridge
Copy link
Contributor

tridge commented Sep 26, 2020

@AndKe use gitk to cherry pick, and click the patch. You want this for 4.0.6 I presume?
I've now updated the plane4.0 branch to include this fix btw, in preparation for a new release once you've tested
https://github.com/ArduPilot/ardupilot/commits/plane4.0
I'll close this issue now. Please report your test results on the PR #15388

@stephendade
Copy link
Contributor

For Dev Call: Discussion as to adding this as our first User Alert.

@AndKe
Copy link
Contributor Author

AndKe commented Oct 1, 2020

@tridge
Screenshot from 2020-10-01 15-32-01

The patch works as intended, somewhere before 600kRPM pulses (RPM_SCALING = 1) there is no more RPM data - then you can see the signal generator approach pass 100khz around 1h 07m CPU load increases, and then the interrupt is not handled anymore until rebooted, saving the CubeBlack from reboot.

tested on:

ChipDes:
  family: STM32F42x
  revision: 3
Chip:
  20016419 STM32F42x_43x rev3 (no 1M flaw)
Info:
  flash size: 2080768
  board_type: 9 (fmuv3)
  board_rev: 0
Identification complete


Thank you, please pull the patch into master.

tridge added a commit to tridge/ardupilot that referenced this issue Oct 1, 2020
This implements a max quota of GPIO interrupts per 100ms period to
prevent high interrupt counts from consuming all CPU and causing a
lockup. The limit is set as 10k interrupts per 0.1s period. That limit
should be high enough for all reasonable uses of GPIO interrupt
handlers while being below the level that causes significant CPU loads
and flight issues

This addresses issue ArduPilot#15384
tridge added a commit that referenced this issue Oct 1, 2020
This implements a max quota of GPIO interrupts per 100ms period to
prevent high interrupt counts from consuming all CPU and causing a
lockup. The limit is set as 10k interrupts per 0.1s period. That limit
should be high enough for all reasonable uses of GPIO interrupt
handlers while being below the level that causes significant CPU loads
and flight issues

This addresses issue #15384
tridge added a commit that referenced this issue Oct 1, 2020
This implements a max quota of GPIO interrupts per 100ms period to
prevent high interrupt counts from consuming all CPU and causing a
lockup. The limit is set as 10k interrupts per 0.1s period. That limit
should be high enough for all reasonable uses of GPIO interrupt
handlers while being below the level that causes significant CPU loads
and flight issues

This addresses issue #15384
tridge added a commit that referenced this issue Oct 2, 2020
This implements a max quota of GPIO interrupts per 100ms period to
prevent high interrupt counts from consuming all CPU and causing a
lockup. The limit is set as 10k interrupts per 0.1s period. That limit
should be high enough for all reasonable uses of GPIO interrupt
handlers while being below the level that causes significant CPU loads
and flight issues

This addresses issue #15384
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants