New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DroneKit has no method for detecting command failure - do we need one? #114

Open
hamishwillee opened this Issue May 14, 2015 · 18 comments

Comments

Projects
None yet
9 participants
@hamishwillee
Contributor

hamishwillee commented May 14, 2015

Currently our instructions suggest that you change the mode like this:

v.mode = VehicleMode("AUTO")
v.flush() # After this, write to vehicle has been made

The problem is that after this code the write has succeeded, but the mode may not actually have been changed (and if you read back v.mode.name it may well be the old name). A better coding paradigm is to not proceed to the next step until all your conditions are met:

print " Get ready to take off" 
v.mode = VehicleMode("AUTO")
v.armed= True
v.flush() # After this, write to vehicle has been made, but no guarantee change succeeded.
while not v.mode.name=='AUTO' and not v.armed==False and not api.exit:
    print " Waiting for AUTO and ARMED..."
    time.sleep(1)
print "Ready to take off"

Note: I will change our docs and examples to show the above.

There are two problems with this. Firstly it involves polling, which I object to on principle. Secondly, this will sit in a loop forever if the arming/mode change fails. This can happen because it is not possible to set modes/armed from certain states. We don't process the message that tells us if the other commands failed, so there is no way of determining what is going on.

So the question is, can/should we process command failure, and if so how?

The easiest way would probably be to make the flush() api synchronous and wait on responses to all preceding set messages (leaving exception if they fail). I wouldn't like to do it using callbacks because that is very messy code. We could just live with the code above.

@mrpollo - any ideas?

P.S. Don't know if it is possible in Python, but .NET does this quite nicely - you can make asynchronous requests and (at some point) wait on the result synchronously. This would allow us to write the code above without polling, and still return a value on failure. This is the problem with spending time in other languages, you learn about things you wish you had :-)

@ggregory8

This comment has been minimized.

Show comment
Hide comment
@ggregory8

ggregory8 May 18, 2015

Hi, I agree there should be some method to identify whether a command has failed.

Another common example is arming failing. As mentioned above an easy workaround is issuing the command then polling the state until it is set, an timeout would allow you to detect this has failed but is messy also.

Are there currently any functions to handle command acknowledges?

ggregory8 commented May 18, 2015

Hi, I agree there should be some method to identify whether a command has failed.

Another common example is arming failing. As mentioned above an easy workaround is issuing the command then polling the state until it is set, an timeout would allow you to detect this has failed but is messy also.

Are there currently any functions to handle command acknowledges?

@zlite

This comment has been minimized.

Show comment
Hide comment
@zlite

zlite May 18, 2015

Contributor

Adding Fredia, the Droidplanner/Tower lead, who has been thinking about
the same thing.
On May 17, 2015 7:33 PM, "ggregory8" notifications@github.com wrote:

Hi, I agree there should be some method to identify whether a command has
failed.

Another common example is arming failing. As mentioned above an easy
workaround is issuing the command then polling the state until it is set,
an timeout would allow you to detect this has failed but is messy also.

Are there currently any functions to handle command acknowledges?


Reply to this email directly or view it on GitHub
#114 (comment)
.

Contributor

zlite commented May 18, 2015

Adding Fredia, the Droidplanner/Tower lead, who has been thinking about
the same thing.
On May 17, 2015 7:33 PM, "ggregory8" notifications@github.com wrote:

Hi, I agree there should be some method to identify whether a command has
failed.

Another common example is arming failing. As mentioned above an easy
workaround is issuing the command then polling the state until it is set,
an timeout would allow you to detect this has failed but is messy also.

Are there currently any functions to handle command acknowledges?


Reply to this email directly or view it on GitHub
#114 (comment)
.

@ne0fhyk

This comment has been minimized.

Show comment
Hide comment
@ne0fhyk

ne0fhyk May 18, 2015

Member

If the command (i.e: arm) is sent to the autopilot via the COMMAND_LONG mavlink message, as done in DroneKit-Android, then the autopilot will reply with a COMMAND_ACK message to give feedback whether the command was executed.

This mechanism can be used to detect if the command successfully reached the autopilot, and the COMMAND_ACK result field can also indicate the execution result, as described by the MAV_STATUS mavlink enum.

Member

ne0fhyk commented May 18, 2015

If the command (i.e: arm) is sent to the autopilot via the COMMAND_LONG mavlink message, as done in DroneKit-Android, then the autopilot will reply with a COMMAND_ACK message to give feedback whether the command was executed.

This mechanism can be used to detect if the command successfully reached the autopilot, and the COMMAND_ACK result field can also indicate the execution result, as described by the MAV_STATUS mavlink enum.

@hamishwillee

This comment has been minimized.

Show comment
Hide comment
@hamishwillee

hamishwillee May 18, 2015

Contributor

Yes. I've been thinking it might be good to change our generic send_mavlink to being a message that has a callback, passing that response on return.

The problem would be to decide how you might make that work well with our current model of having settable python attributes - it might be that we need to have another model in this case.

Contributor

hamishwillee commented May 18, 2015

Yes. I've been thinking it might be good to change our generic send_mavlink to being a message that has a callback, passing that response on return.

The problem would be to decide how you might make that work well with our current model of having settable python attributes - it might be that we need to have another model in this case.

@ne0fhyk

This comment has been minimized.

Show comment
Hide comment
@ne0fhyk

ne0fhyk May 18, 2015

Member

I'm not familiar with the python threading model, but using the described mechanism, it's possible to implement a synchronous api in java by having the executing thread block on a lock, and have it awaken by a watchdog thread when the result is confirmed, to continue execution.

However with DroneKit-Android, I decided to go with an asynchronous api as blocking on the executing thread is a no-go on mobile, and it nicely fits the exhibited client-server interaction.

Member

ne0fhyk commented May 18, 2015

I'm not familiar with the python threading model, but using the described mechanism, it's possible to implement a synchronous api in java by having the executing thread block on a lock, and have it awaken by a watchdog thread when the result is confirmed, to continue execution.

However with DroneKit-Android, I decided to go with an asynchronous api as blocking on the executing thread is a no-go on mobile, and it nicely fits the exhibited client-server interaction.

@hamishwillee

This comment has been minimized.

Show comment
Hide comment
@hamishwillee

hamishwillee May 18, 2015

Contributor

Thanks! AFAIK both models are possible on Python. Its really more a question of how we want people to be able to use the API - and move away from our dead simple but not quite effective approach. A question for @mrpollo

Contributor

hamishwillee commented May 18, 2015

Thanks! AFAIK both models are possible on Python. Its really more a question of how we want people to be able to use the API - and move away from our dead simple but not quite effective approach. A question for @mrpollo

@AkshayBudhiraja

This comment has been minimized.

Show comment
Hide comment
@AkshayBudhiraja

AkshayBudhiraja May 30, 2015

Has there been any progress with regards to the direction DroneKit might be taking with this?

Command failure detection is absolutely crucial be it through a having a callback for send_mavlink or having other asynchronous calls for critical applications.

AkshayBudhiraja commented May 30, 2015

Has there been any progress with regards to the direction DroneKit might be taking with this?

Command failure detection is absolutely crucial be it through a having a callback for send_mavlink or having other asynchronous calls for critical applications.

@ne0fhyk

This comment has been minimized.

Show comment
Hide comment
@ne0fhyk

ne0fhyk May 31, 2015

Member

@AkshayBudhiraja DroneKit-Android is in the process of adding the implementation discussed above, which will allow support for explicit callbacks.

Member

ne0fhyk commented May 31, 2015

@AkshayBudhiraja DroneKit-Android is in the process of adding the implementation discussed above, which will allow support for explicit callbacks.

@hamishwillee

This comment has been minimized.

Show comment
Hide comment
@hamishwillee

hamishwillee May 31, 2015

Contributor

@AkshayBudhiraja Not implemented yet on DroneKit-Python. We do consider it important and will be prioritising it in the coming weeks.

Contributor

hamishwillee commented May 31, 2015

@AkshayBudhiraja Not implemented yet on DroneKit-Python. We do consider it important and will be prioritising it in the coming weeks.

@hamishwillee

This comment has been minimized.

Show comment
Hide comment
@hamishwillee

hamishwillee Aug 10, 2015

Contributor

@tcr3dr I quite like the "futures" approach described here #209 (comment)

Contributor

hamishwillee commented Aug 10, 2015

@tcr3dr I quite like the "futures" approach described here #209 (comment)

@atomictom

This comment has been minimized.

Show comment
Hide comment
@atomictom

atomictom Aug 12, 2015

This futures library might be worth looking into.

atomictom commented Aug 12, 2015

This futures library might be worth looking into.

@tcr3dr tcr3dr modified the milestones: Minor, v1.5.0 Aug 12, 2015

@atomictom

This comment has been minimized.

Show comment
Hide comment
@atomictom

atomictom Dec 29, 2015

For those who are looking at this and want a solution for the moment, I've found that using

on_message('COMMAND_ACK')

and

on_message('STATUSTEXT')

work relatively well for detecting failure. However, as mentioned in the best practices guide, I think checking state often and explicitly, as well as polling on command success is the most reliable. That said, being able to see when there is a problem early is nice. It's also nice to be able to log the cause of failure (I send all output from both listeners to logs).

COMMAND_ACK is the MAVLink message for acknowledging the success or failure of a command, while STATUSTEXT is the MAVLink message for delivering plain text (as in, the output you see in MAVProxy). For me, on COMMAND_ACK, if the command is an arm or takeoff command and has a non-zero result (i.e. failure), I send the LAND command to immediately abort. STATUSTEXT also reports a severity which you can possibly act on. You can read about it under the section "MAV_SEVERITY" in the specification here.

atomictom commented Dec 29, 2015

For those who are looking at this and want a solution for the moment, I've found that using

on_message('COMMAND_ACK')

and

on_message('STATUSTEXT')

work relatively well for detecting failure. However, as mentioned in the best practices guide, I think checking state often and explicitly, as well as polling on command success is the most reliable. That said, being able to see when there is a problem early is nice. It's also nice to be able to log the cause of failure (I send all output from both listeners to logs).

COMMAND_ACK is the MAVLink message for acknowledging the success or failure of a command, while STATUSTEXT is the MAVLink message for delivering plain text (as in, the output you see in MAVProxy). For me, on COMMAND_ACK, if the command is an arm or takeoff command and has a non-zero result (i.e. failure), I send the LAND command to immediately abort. STATUSTEXT also reports a severity which you can possibly act on. You can read about it under the section "MAV_SEVERITY" in the specification here.

@hamishwillee

This comment has been minimized.

Show comment
Hide comment
@hamishwillee

hamishwillee Dec 29, 2015

Contributor

Thanks @atomictom . It may well be worth updating http://python.dronekit.io/develop/best_practice.html#monitor-and-react-to-state-changes with some of those ideas ... or adding them into logging/debugging etc. I'll think on it.

Note, we already allow you to monitor the system status using the system_status attribute. We don't explicitly expose the severity, but in most cases it is obvious.+

Contributor

hamishwillee commented Dec 29, 2015

Thanks @atomictom . It may well be worth updating http://python.dronekit.io/develop/best_practice.html#monitor-and-react-to-state-changes with some of those ideas ... or adding them into logging/debugging etc. I'll think on it.

Note, we already allow you to monitor the system status using the system_status attribute. We don't explicitly expose the severity, but in most cases it is obvious.+

@atomictom

This comment has been minimized.

Show comment
Hide comment
@atomictom

atomictom Jan 7, 2016

@hamishwillee No problem!

On the system_status. That's a good point, and something I also monitor. But it's nice to get the STATUSTEXT for logging which includes things like PreArm failure messages since the system_status is more of a summary indicating if there are issues and how bad they are, but doesn't tell any of the extra information found in STATUSTEXT.

atomictom commented Jan 7, 2016

@hamishwillee No problem!

On the system_status. That's a good point, and something I also monitor. But it's nice to get the STATUSTEXT for logging which includes things like PreArm failure messages since the system_status is more of a summary indicating if there are issues and how bad they are, but doesn't tell any of the extra information found in STATUSTEXT.

@waTeim

This comment has been minimized.

Show comment
Hide comment
@waTeim

waTeim Apr 11, 2016

This might be an X-Y problem here, but I am also considering if this is necessary to avoid overrunning the ability of the autopilot to process commands. I've added a topic to the forum as well. If there were a mechanism for acknowledgement of a command, then would that be sufficient to allow queuing of another command and expect that it would also be processed? Or alternatively, is there a way to examine the command queue depth and only issue a new command if the depth is <= 2 or some other heuristic? If such were the case then my need for this issue to be solved is reduced greatly.

waTeim commented Apr 11, 2016

This might be an X-Y problem here, but I am also considering if this is necessary to avoid overrunning the ability of the autopilot to process commands. I've added a topic to the forum as well. If there were a mechanism for acknowledgement of a command, then would that be sufficient to allow queuing of another command and expect that it would also be processed? Or alternatively, is there a way to examine the command queue depth and only issue a new command if the depth is <= 2 or some other heuristic? If such were the case then my need for this issue to be solved is reduced greatly.

@hamishwillee

This comment has been minimized.

Show comment
Hide comment
@hamishwillee

hamishwillee Apr 11, 2016

Contributor

@waTeim Essentially there is no way to do this right in every case. The autopilot acknowledges only some commands (those sent with a commandlong) and that acknowledgement is pretty useless in that

  1. it only tells you that something was invalid or was received - it doesn't tell you that the command succeeded or failed.
  2. The command format is lossy, so you can't guarantee the message will arrive at the vehicle or that you will receive the acknowledgement.
  3. There is no way to associate a message to its acknowledgement - though it is most likely that as long as you aren't sending messages too quickly (and that the script and the autopilot are the only things communicating on the network) that the current ack will match last sent.

And yes, it is also quite possible that messages will be swamped. In general commands are replaced by later commands "silently".

Contributor

hamishwillee commented Apr 11, 2016

@waTeim Essentially there is no way to do this right in every case. The autopilot acknowledges only some commands (those sent with a commandlong) and that acknowledgement is pretty useless in that

  1. it only tells you that something was invalid or was received - it doesn't tell you that the command succeeded or failed.
  2. The command format is lossy, so you can't guarantee the message will arrive at the vehicle or that you will receive the acknowledgement.
  3. There is no way to associate a message to its acknowledgement - though it is most likely that as long as you aren't sending messages too quickly (and that the script and the autopilot are the only things communicating on the network) that the current ack will match last sent.

And yes, it is also quite possible that messages will be swamped. In general commands are replaced by later commands "silently".

@waTeim

This comment has been minimized.

Show comment
Hide comment
@waTeim

waTeim Apr 11, 2016

@hamishwillee Hey, thanks for the quick response. Luckily I have a restricted set of commands I'm concerned about. Velocity, Yaw, and gimbal -- well at least for now. I can avoid worrying about item 2 for the most part since this is using localhost. But items 1,3 and the follow-up is a problem because that's exactly what I'm worried about. Sending commands that don't actually get executed. In practice, it looks like if I wait long enough, then it's almost guaranteed, but waiting too long is also a problem because then the dynamics are incorrect.

Kinda OT but, earlier in this (or perhaps related threads) it was suggested you could observe vehicle state and wait for matching the command. Is that reliable w.r.t. velocity and yaw like in the docs? Is the autopilot capable of processing many things in parallel?

waTeim commented Apr 11, 2016

@hamishwillee Hey, thanks for the quick response. Luckily I have a restricted set of commands I'm concerned about. Velocity, Yaw, and gimbal -- well at least for now. I can avoid worrying about item 2 for the most part since this is using localhost. But items 1,3 and the follow-up is a problem because that's exactly what I'm worried about. Sending commands that don't actually get executed. In practice, it looks like if I wait long enough, then it's almost guaranteed, but waiting too long is also a problem because then the dynamics are incorrect.

Kinda OT but, earlier in this (or perhaps related threads) it was suggested you could observe vehicle state and wait for matching the command. Is that reliable w.r.t. velocity and yaw like in the docs? Is the autopilot capable of processing many things in parallel?

@hamishwillee

This comment has been minimized.

Show comment
Hide comment
@hamishwillee

hamishwillee Apr 11, 2016

Contributor

Hi @waTeim

It just so happens I'm between contracts :-) Hence the speed of reply.

So I've looked into this topic quite extensively when thinking about creating safe "synchronous" functions for setting attributes - #566 (ie that either return when command has succeeded or raise and exception on failure). The conclusion I've come to is that at the moment it is not possible to create robust functions in all cases. It is easy enough in cases like mode changes where the value changes quickly and you can read a value to detect success within a second or so, but not so good for cases like velocity control.

In the case of position, velocity and yaw this is problematic if your changes are rapid so measuring the response takes too long, or if the delta itself is small or within normal deviation for the measured value. You might be able to do it in your specific case, but I haven't been able to do "generically".

Is the autopilot capable of processing many things in parallel?

I don't know. My understanding though is that it does only 2 or three "known things" in parallel - a movement commands and a yaw or gimbal command can be done in parallel. Essentially the autopilot loop runs over incoming messages and sets the target yaw. The system then decides if it is in a state where it can yaw (ie it can only do so after the first movement command has been issued) and then does so.

Hope that helps.

Contributor

hamishwillee commented Apr 11, 2016

Hi @waTeim

It just so happens I'm between contracts :-) Hence the speed of reply.

So I've looked into this topic quite extensively when thinking about creating safe "synchronous" functions for setting attributes - #566 (ie that either return when command has succeeded or raise and exception on failure). The conclusion I've come to is that at the moment it is not possible to create robust functions in all cases. It is easy enough in cases like mode changes where the value changes quickly and you can read a value to detect success within a second or so, but not so good for cases like velocity control.

In the case of position, velocity and yaw this is problematic if your changes are rapid so measuring the response takes too long, or if the delta itself is small or within normal deviation for the measured value. You might be able to do it in your specific case, but I haven't been able to do "generically".

Is the autopilot capable of processing many things in parallel?

I don't know. My understanding though is that it does only 2 or three "known things" in parallel - a movement commands and a yaw or gimbal command can be done in parallel. Essentially the autopilot loop runs over incoming messages and sets the target yaw. The system then decides if it is in a state where it can yaw (ie it can only do so after the first movement command has been issued) and then does so.

Hope that helps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment