Added the ability to see command ack #168

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
6 participants
@djnugent

A quick attempt at showing how I think command acknowledgment should be handled. This was denied as an option for issue #114 since mavlink is a "lossy protocol." I am not aware of the properties which make mavlink "lossy", but I feel this method would allow the user to see that the connection is acting "lossy" if the COMMAND_ACK message times out.

Here is an example: https://gist.github.com/djnugent/edfbab1b2458d5773490

@mrpollo @billbonney thoughts?

@eliao

This comment has been minimized.

Show comment
Hide comment
@eliao

eliao Jun 18, 2015

It seems problematic to block on the dronekit script's thread waiting on an ack. Perhaps that works for some use cases, but it would not work for real time applications working over a lossy link. In that case this would cause delays while waiting for an ack that will never come. Over a lossy connection, one needs to assume that any and all packets can be dropped.

eliao commented Jun 18, 2015

It seems problematic to block on the dronekit script's thread waiting on an ack. Perhaps that works for some use cases, but it would not work for real time applications working over a lossy link. In that case this would cause delays while waiting for an ack that will never come. Over a lossy connection, one needs to assume that any and all packets can be dropped.

@mrpollo

This comment has been minimized.

Show comment
Hide comment
@mrpollo

mrpollo Jun 18, 2015

Member

I agree with Eric, the "lossy" really comes from the link between mavproxy and the autopilot, the transport can be anything from a 3DR Radio to Wifi, we can't control the time it takes to get an ACK, some of the symptoms are what @eliao describe, and yes all packets can be dropped.

@geeksville probably has something to say here too

Member

mrpollo commented Jun 18, 2015

I agree with Eric, the "lossy" really comes from the link between mavproxy and the autopilot, the transport can be anything from a 3DR Radio to Wifi, we can't control the time it takes to get an ACK, some of the symptoms are what @eliao describe, and yes all packets can be dropped.

@geeksville probably has something to say here too

@djnugent

This comment has been minimized.

Show comment
Hide comment
@djnugent

djnugent Jun 18, 2015

I put a timeout in so we don't block the thread forever. The documentation would just address the downsides of waiting for a command ack and explain the timeout. And I feel you are less likely to drop a packet if the connection is hard wire serial. i.e. a companion computer to pixhawk which is going to be the main audience for dronekit-python. wait_valid() is a bigger sin with no timeout(unless downloading waypoints is different).

Thanks for the input!

I put a timeout in so we don't block the thread forever. The documentation would just address the downsides of waiting for a command ack and explain the timeout. And I feel you are less likely to drop a packet if the connection is hard wire serial. i.e. a companion computer to pixhawk which is going to be the main audience for dronekit-python. wait_valid() is a bigger sin with no timeout(unless downloading waypoints is different).

Thanks for the input!

@billbonney

This comment has been minimized.

Show comment
Hide comment
@billbonney

billbonney Jun 18, 2015

Checking the ACK form sending a command would confirm the command was accepted etc., but not necessarily acted upon. ie you could see an ACK to send a command to TAke Off, but then it won't Take Off as other conditions will not have been met. As I have written before, when issuing a command the only indication that it was accepted, is that the vehicle properties change to reflect that action.

Checking the ACK form sending a command would confirm the command was accepted etc., but not necessarily acted upon. ie you could see an ACK to send a command to TAke Off, but then it won't Take Off as other conditions will not have been met. As I have written before, when issuing a command the only indication that it was accepted, is that the vehicle properties change to reflect that action.

@billbonney

This comment has been minimized.

Show comment
Hide comment
@billbonney

billbonney Jun 18, 2015

PS: And qhat should be the value of the timeout? 0.5s or 5s or 10s. Timeouts are really tricky to gauge in protocols. A timeout for arming is actually about 5s and then variable.

PS: And qhat should be the value of the timeout? 0.5s or 5s or 10s. Timeouts are really tricky to gauge in protocols. A timeout for arming is actually about 5s and then variable.

@djnugent

This comment has been minimized.

Show comment
Hide comment
@djnugent

djnugent Jun 18, 2015

Why would the vehicle not take off if it sent a MAV_RESULT_ACCEPT??? A MAV_RESULT_FAILED is sent if you are not in guided mode, not armed, and have already taken off a previously. I am confused...

Why would the vehicle not take off if it sent a MAV_RESULT_ACCEPT??? A MAV_RESULT_FAILED is sent if you are not in guided mode, not armed, and have already taken off a previously. I am confused...

@djnugent

This comment has been minimized.

Show comment
Hide comment
@djnugent

djnugent Jun 18, 2015

Yes default timeouts are hard. We could also just set the default to 0 and make the user figure out the best timeout for their command.

Yes default timeouts are hard. We could also just set the default to 0 and make the user figure out the best timeout for their command.

@tcr3dr

This comment has been minimized.

Show comment
Hide comment
@tcr3dr

tcr3dr Jun 18, 2015

Contributor

Throwing in some comments off the top of my head, ideally you'd want something you can poll for success, so it could easily integrate into gevent.

import gevent

def command_1():
   await_ack(task())

def command_2():
   await_ack(task())

gevent.spawn(command_1)
gevent.spawn(command_2)
# Nothing is blocked, it's just in the event loop
gevent.wait()

Do we want to highly discourage people from just blocking waiting for an ack, but instead allow them to poll if they want complete error handling?

Contributor

tcr3dr commented Jun 18, 2015

Throwing in some comments off the top of my head, ideally you'd want something you can poll for success, so it could easily integrate into gevent.

import gevent

def command_1():
   await_ack(task())

def command_2():
   await_ack(task())

gevent.spawn(command_1)
gevent.spawn(command_2)
# Nothing is blocked, it's just in the event loop
gevent.wait()

Do we want to highly discourage people from just blocking waiting for an ack, but instead allow them to poll if they want complete error handling?

@eliao

This comment has been minimized.

Show comment
Hide comment
@eliao

eliao Jun 18, 2015

Ideally, user code shouldn't have to deal with this. In a non-lossy environment, users don't have to. In a lossy environment, this could be solved at the protocol level via retries (preferred solution). A somewhat worse solution is having DroneKit itself spawn a thread for these commands and ensuring they get completed. This is what many of our applications currently do.

eliao commented Jun 18, 2015

Ideally, user code shouldn't have to deal with this. In a non-lossy environment, users don't have to. In a lossy environment, this could be solved at the protocol level via retries (preferred solution). A somewhat worse solution is having DroneKit itself spawn a thread for these commands and ensuring they get completed. This is what many of our applications currently do.

@hamishwillee

This comment has been minimized.

Show comment
Hide comment
@hamishwillee

hamishwillee Jun 18, 2015

Contributor

I'm most interested in the MAV_RESULT_FAILED as a way to understand there is a problem earlier than a timeout and get some idea of what the failure might be. The API should be asynchronous with some way to make it blockable if that is the way you want to code. It would be "best" if the callback is associated with a particular sent command and if for whatever reason it doesn't complete there is some cleanup on it.

Contributor

hamishwillee commented Jun 18, 2015

I'm most interested in the MAV_RESULT_FAILED as a way to understand there is a problem earlier than a timeout and get some idea of what the failure might be. The API should be asynchronous with some way to make it blockable if that is the way you want to code. It would be "best" if the callback is associated with a particular sent command and if for whatever reason it doesn't complete there is some cleanup on it.

@hamishwillee

This comment has been minimized.

Show comment
Hide comment
@hamishwillee

hamishwillee Jun 18, 2015

Contributor

Somewhat related, our solution should fit in with whatever we need to deliver #169

Contributor

hamishwillee commented Jun 18, 2015

Somewhat related, our solution should fit in with whatever we need to deliver #169

@djnugent

This comment has been minimized.

Show comment
Hide comment
@djnugent

djnugent Jun 18, 2015

@tcr3dr I'm not familiar with events so you lost me there.
@eliao Retries on the lower lever is ideal, but that is beyond me. This was just an experiment.

I think call backs are great to prevent blocking but I imagine we are reaching out to "arduino" type developers who are coding amateurs. I bet they wouldn't care to wait for command_ack. It's just part of the API! The idea is to be able to do a,b,c,d... in sequential order. Dealing with mavlinks lossy connection is no different than dealing with a noisy sensor with arduino. You have to filter out bad behavior. This method is not super effective or ideal but it makes it easy for the new user.

Obviously we still need call back support for the advanced users.

So is this a good temporary solution or does it cause more problems than benefits?

@tcr3dr I'm not familiar with events so you lost me there.
@eliao Retries on the lower lever is ideal, but that is beyond me. This was just an experiment.

I think call backs are great to prevent blocking but I imagine we are reaching out to "arduino" type developers who are coding amateurs. I bet they wouldn't care to wait for command_ack. It's just part of the API! The idea is to be able to do a,b,c,d... in sequential order. Dealing with mavlinks lossy connection is no different than dealing with a noisy sensor with arduino. You have to filter out bad behavior. This method is not super effective or ideal but it makes it easy for the new user.

Obviously we still need call back support for the advanced users.

So is this a good temporary solution or does it cause more problems than benefits?

@billbonney

This comment has been minimized.

Show comment
Hide comment
@billbonney

billbonney Jun 19, 2015

The callback ack from the command long doesn't add much to the logic. Even for the advanced user. It's indicative in a log for example. But in the end most actions commanded are only relevant when you look at the result from the vehicle status. I.e. A change in mode by monitoring the heartbeat and the status text.
If you want to make this easier for the Arduino crowd. Just make the default to ignore the ack as a false sense that a command has been sent and communicate in the docs how this works. Command long even has a bit to request the ack is not even sent.
In AP2 rarely do we check to see the result other than to put it in the log sometimes

The callback ack from the command long doesn't add much to the logic. Even for the advanced user. It's indicative in a log for example. But in the end most actions commanded are only relevant when you look at the result from the vehicle status. I.e. A change in mode by monitoring the heartbeat and the status text.
If you want to make this easier for the Arduino crowd. Just make the default to ignore the ack as a false sense that a command has been sent and communicate in the docs how this works. Command long even has a bit to request the ack is not even sent.
In AP2 rarely do we check to see the result other than to put it in the log sometimes

@djnugent

This comment has been minimized.

Show comment
Hide comment
@djnugent

djnugent Jul 1, 2015

Not a complete solution.

djnugent commented Jul 1, 2015

Not a complete solution.

@djnugent djnugent closed this Jul 1, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment