Restart autopilot on consecutive heartbeat failures#3773
Merged
patrickelectric merged 1 commit intobluerobotics:masterfrom Feb 11, 2026
Merged
Restart autopilot on consecutive heartbeat failures#3773patrickelectric merged 1 commit intobluerobotics:masterfrom
patrickelectric merged 1 commit intobluerobotics:masterfrom
Conversation
If we get no heartbeat from the subprocess, there is no way to ensure it is running safely. Killing ardupilot will now agressively prune ardupilot processes if it cannot stop the one it is attached to. Prior to this change, if we could not stop our own process, we would be in limbo, with no way to recover.
Reviewer's GuideAdds consecutive heartbeat failure monitoring to automatically restart Ardupilot when MAVLink heartbeats stop, and makes Ardupilot shutdown more robust by always pruning processes even if graceful termination fails. Sequence diagram for autopilot auto-restart on consecutive heartbeat failuressequenceDiagram
participant AutoRestartTask
participant AutopilotManager
participant VehicleManager
participant ArduPilotProcess
loop every_5_seconds
AutoRestartTask->>AutopilotManager: auto_restart_ardupilot()
alt should_be_running_and_process_not_running
AutopilotManager->>AutopilotManager: start_ardupilot()
end
alt should_be_running_and_is_running
AutopilotManager->>VehicleManager: is_heart_beating()
alt heartbeat_ok
VehicleManager-->>AutopilotManager: true
AutopilotManager->>AutopilotManager: _heartbeat_fail_count = 0
else heartbeat_missing
VehicleManager-->>AutopilotManager: false
AutopilotManager->>AutopilotManager: _heartbeat_fail_count += 1
end
else not_running_or_should_not_run
AutopilotManager->>AutopilotManager: skip_heartbeat_check
end
alt _heartbeat_fail_count >= _max_heartbeat_failures
AutopilotManager->>AutopilotManager: restart_ardupilot()
AutopilotManager->>AutopilotManager: _heartbeat_fail_count = 0
end
end
Updated class diagram for AutopilotManager heartbeat monitoring and shutdownclassDiagram
class AutopilotManager {
bool should_be_running
int _heartbeat_fail_count
int _max_heartbeat_failures
VehicleManager vehicle_manager
async auto_restart_ardupilot()
async start_ardupilot()
async restart_ardupilot()
async kill_ardupilot()
async terminate_ardupilot_subprocess()
async prune_ardupilot_processes()
bool is_running()
}
class VehicleManager {
async bool is_heart_beating()
async shutdown_vehicle()
}
class AutoPilotProcessKillFail {
}
AutopilotManager *-- VehicleManager : manages
AutopilotManager ..> AutoPilotProcessKillFail : handles_exception
Flow diagram for robust kill_ardupilot shutdown behaviorflowchart TD
A[Start kill_ardupilot] --> B[Log Terminating Ardupilot subprocess.]
B --> C[Call terminate_ardupilot_subprocess]
C -->|success| D[Log Ardupilot subprocess terminated.]
C -->|AutoPilotProcessKillFail| E[Log error about failed termination]
D --> F[Log Pruning Ardupilot's system processes.]
E --> F
F --> G[Call prune_ardupilot_processes]
G --> H[Log Ardupilot's system processes pruned.]
H --> I[End kill_ardupilot]
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- Consider making
_max_heartbeat_failuresconfigurable via settings instead of a hardcoded10, so the restart sensitivity can be tuned per deployment or vehicle type. - The
_heartbeat_fail_countis only reset on successful heartbeats and after a threshold-triggered restart; you may want to also reset it when (re)starting Ardupilot or whenshould_be_runningtransitions to false/true to avoid stale counts carrying across distinct runs.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider making `_max_heartbeat_failures` configurable via settings instead of a hardcoded `10`, so the restart sensitivity can be tuned per deployment or vehicle type.
- The `_heartbeat_fail_count` is only reset on successful heartbeats and after a threshold-triggered restart; you may want to also reset it when (re)starting Ardupilot or when `should_be_running` transitions to false/true to avoid stale counts carrying across distinct runs.
## Individual Comments
### Comment 1
<location> `core/services/ardupilot_manager/autopilot_manager.py:206-215` </location>
<code_context>
+ # Monitor MAVLink heartbeat while autopilot is supposed to be running
</code_context>
<issue_to_address>
**issue (bug_risk):** Heartbeat failure counter may carry over across stop/start cycles and trigger premature restarts.
Because `_heartbeat_fail_count` is only reset on a successful heartbeat or after a restart, any accumulated failures can persist when `should_be_running` flips to `False` (e.g., user stops Ardupilot). On the next start, normal connection delays might quickly push the stale count past `_max_heartbeat_failures`, causing premature restarts right after a clean start.
Consider resetting `_heartbeat_fail_count` when `should_be_running` transitions to `True` (start) and/or when it becomes `False` (intentional stop), so the counter only tracks failures for the current run session.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
patrickelectric
approved these changes
Feb 11, 2026
Member
|
Thanks for the contributions @goasChris! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
If we get no heartbeat from the subprocess, there is no way to ensure it is running safely.
Killing ardupilot will now agressively prune ardupilot processes if it cannot stop the one it is attached to. Prior to this change, if we could not stop our own process, we would be in limbo, with no way to recover.
Testing
Summary by Sourcery
Add heartbeat-based monitoring to automatically restart Ardupilot on consecutive MAVLink heartbeat failures and harden shutdown behavior to avoid lingering or zombie Ardupilot processes.
New Features:
Bug Fixes:
Enhancements: