-
Notifications
You must be signed in to change notification settings - Fork 16.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GCS_Common: whitelist AUTOPILOT_VERSION for in_delay_callback sending #13727
GCS_Common: whitelist AUTOPILOT_VERSION for in_delay_callback sending #13727
Conversation
GCSs may request this very early on in the boot process, particularly for SITL. If we try to send it during a delay callback then we end up dropping it at the moment - but we'd already sent the ack in response to the request.
So the issue here is that a gcs really wants to know capability bits asap. In most cases it will delay completion of a vehicle connection state machine to wait for it. And depending on how the gcs is implemented with respect to timeouts and retries it may slow down the process of completing the vehicle connection. Which in turn could make the user experience crappy/slow. |
@DonLakeFlyer Yep - thus whitelisting it here while we look at the deeper issue I think is a good idea :-) Can you confirm this has the autopilot spit the message out as well as the ack for your test cases, please? |
I'll give it a try tomorrow. I"m throwing in the towel for the day. Tired of chasing crazed race conditions in spaghetti QGC code. |
I don't really know what I'm talking about but maybe AP should just proactively send the capabilities soon after connecting? |
FYI: I've been testing current ArduPilot with respect to when it finally will send AUTOPILOT_VERSION. It takes a full 5 seconds from the Vehicle showing up in QGC after connect USB to before it will reponsd to the request and send the AUTOPILOT_VERSION. That's a really nasty delay. Haven't had time to try this new code. Still fighting with this on QGC Stable trying to get it to work with QGC and current ArduPilot releases. |
Unless it will contiuously stream that won't work. Need to guarantee delivery. |
I tested this with ArduCopter and I get AUTOPILOT_VERSION back the first time I ask for it. But it reports support for rally, but no support for fence. So not sure what that's about. Master/Stable ardupilot reports correct bits for copter. |
Sorry, I'm having trouble parsing that. Your second sentence seems to say it doesn't work but your third says it does seem to work? |
Correct :)
|
@DonLakeFlyer Hmph. So we probably shouldn't merge this until we're returning the correct bits. I can't see anything here by inspection. Copter.h has AC_Fence as a normal member of the Copter object - not a pointer. So it should be constructed at program startup. Its constructor sets the singleton pointer - which is what we're getting with saying "AP::fence()". So I can't see the race condition here :-( I'll have to try to reproduce. |
@DonLakeFlyer I've just tried to reproduce here and failed. I:
So I'm unable to reproduce what you're seeing - the fence bit changing value during the boot process. If we ever moved to dynamically allocate these objects we could certainly see the problem - and that's a real possibility on Plane - but we're not doing that AFAICS. Any ideas? |
GCSs may request this very early on in the boot process, particularly
for SITL.
If we try to send it during a delay callback then we end up dropping it
at the moment - but we'd already sent the ack in response to the
request.
The sequence here is that we receive the request for the message from the GCS, and put that into the pushed_ap_message_ids bitmask. Something then calls
delay_ms(...)
for some period of time, and we start to push messages out. We choose this one to go out - but intry_send_message
we discover that we're not allowed to send it. We return true, saying we sent it - lying - if we don't do this then at the moment we would effectively have "blocker" messages which stop any of the in-delay-callback-whitelisted-messages (e.g. heartbeat) getting out.The more elaborate fix for this is probably to have
try_send_message
return an enumeration value instead of true/false. There are three callers, however, and things sending from a queue might have problems moving onto the next entry while preserving the need to send the first. An alternate fix might be for the caller to check if we're in a delay callback itself and change its behaviour appropriately - so, for example:would become more elaborate.
Another option might be to whitelist virtually all of the messages - that requires an audit of the message sending to see how long they could possibly take and/or modifying the existing timing code to ensure the deadlines are not exceeded.