Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCS_MAVLink: respect txbuf flow control for FTP messages #26098

Merged
merged 3 commits into from
Apr 15, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 2 additions & 3 deletions libraries/GCS_MAVLink/GCS.h
Original file line number Diff line number Diff line change
Expand Up @@ -319,6 +319,7 @@ class GCS_MAVLINK
return last_radio_status.remrssi_ms;
}
static float telemetry_radio_rssi(); // 0==no signal, 1==full signal
static bool last_txbuf_is_greater(uint8_t txbuf_limit);

// mission item index to be sent on queued msg, delayed or not
uint16_t mission_item_reached_index = AP_MISSION_CMD_INDEX_NONE;
Expand Down Expand Up @@ -481,7 +482,6 @@ class GCS_MAVLINK

virtual uint64_t capabilities() const;
uint16_t get_stream_slowdown_ms() const { return stream_slowdown_ms; }
uint8_t get_last_txbuf() const { return last_txbuf; }

MAV_RESULT set_message_interval(uint32_t msg_id, int32_t interval_us);

Expand Down Expand Up @@ -768,6 +768,7 @@ class GCS_MAVLINK
uint32_t remrssi_ms;
uint8_t rssi;
uint32_t received_ms; // time RADIO_STATUS received
uint8_t txbuf = 100;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new timeout we don't need to init to 100 anymore.

Suggested change
uint8_t txbuf = 100;
uint8_t txbuf;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add the check for 0 ms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems more efficient to initialize just once than to execute the extra check every time.

} last_radio_status;

enum class Flags {
Expand Down Expand Up @@ -813,8 +814,6 @@ class GCS_MAVLINK

// number of extra ms to add to slow things down for the radio
uint16_t stream_slowdown_ms;
// last reported radio buffer percent available
uint8_t last_txbuf = 100;

// outbound ("deferred message") queue.

Expand Down
13 changes: 11 additions & 2 deletions libraries/GCS_MAVLink/GCS_Common.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -833,6 +833,15 @@ float GCS_MAVLINK::telemetry_radio_rssi()
return last_radio_status.rssi/254.0f;
}

bool GCS_MAVLINK::last_txbuf_is_greater(uint8_t txbuf_limit)
{
if (AP_HAL::millis() - last_radio_status.received_ms > 5000) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (AP_HAL::millis() - last_radio_status.received_ms > 5000) {
if ((last_radio_status.received_ms == 0) || ((AP_HAL::millis() - last_radio_status.received_ms) > 5000)) {

Check for zero to make sure we do the correct thing if never received.

Copy link
Contributor Author

@brad112358 brad112358 Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a comment pointing out that the initialization of txbuf accomplishes this be OK instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, comment is OK too. Really the key thing is to be able to understand what is going on without having to go look in the .h.

// stale report
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// stale report
// stale report, assume 100% buffer space

return true;
}
return last_radio_status.txbuf > txbuf_limit;
}

void GCS_MAVLINK::handle_radio_status(const mavlink_message_t &msg)
{
mavlink_radio_t packet;
Expand All @@ -849,7 +858,7 @@ void GCS_MAVLINK::handle_radio_status(const mavlink_message_t &msg)
last_radio_status.remrssi_ms = now;
}

last_txbuf = packet.txbuf;
last_radio_status.txbuf = packet.txbuf;

// use the state of the transmit buffer in the radio to
// control the stream rate, giving us adaptive software
Expand Down Expand Up @@ -1164,7 +1173,7 @@ uint16_t GCS_MAVLINK::get_reschedule_interval_ms(const deferred_message_bucket_t
interval_ms *= 4;
}
#if AP_MAVLINK_FTP_ENABLED
if (AP_HAL::millis() - ftp.last_send_ms < 500) {
if (AP_HAL::millis() - ftp.last_send_ms < 1000) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems odd that we would use 1 second here but 3 seconds to time out a FTP session.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only looked at QGroundControl which has a 1 second ack/nak timeout with 3 retries. Does Mission Planner use a 3 second timeout?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of this timeout:

#define FTP_SESSION_TIMEOUT 3000

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(our responses crossed here) OK. I think things should work OK with this changed to 3 seconds. The stream rates will just take 3 seconds to go back to normal instead of 1. Should we move the define to a header file and use FTP_SESSION_TIMEOUT instead of 500 or 1000 in GCS_Common.cpp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after more consideration, I think there is little reason that this timeout should be the same as the session timeout. This is essentially a heuristic and should be based on the expected behavior of the GCS, including radio latency, but doesn't need to match the session timeout which really needs to be conservative (long) as discarding the open state too early would be bad. With the current GCS behavior, I think 1 second is fine.
On the other hand, it would be easier for slower radios to make this work well if the GCS FTP retry timeout was a bit longer, say 2 seconds. Bumping this from 500 to 1000 was enough for mLRS which was already working hard to keep buffered data to a minimum. But, ELRS is even slower than mLRS. Should we increase this to 2 seconds in anticipation that some work may also be required on the GCS side to help out ELRS?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is that if FTP thinks the session is still active then we should still lower its data rates. However a quick test shows the MAVProxy terminates the session after param download whereas QGC and MP do not. If we get the GCS to terminate correctly there should be no problem in using the longer time out here.

// we are sending ftp replies
interval_ms *= 4;
}
Expand Down
29 changes: 18 additions & 11 deletions libraries/GCS_MAVLink/GCS_FTP.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -97,12 +97,8 @@ void GCS_MAVLINK::handle_file_transfer_protocol(const mavlink_message_t &msg) {

bool GCS_MAVLINK::send_ftp_reply(const pending_ftp &reply)
{
/*
provide same banner we would give with old param download
*/
if (ftp.need_banner_send_mask & (1U<<reply.chan)) {
ftp.need_banner_send_mask &= ~(1U<<reply.chan);
send_banner();
if (!last_txbuf_is_greater(33)) { // It helps avoid GCS timeout if this is less than the threshold where we slow down normal streams (<=49)
return false;
}
WITH_SEMAPHORE(comm_chan_lock(reply.chan));
if (!HAVE_PAYLOAD_SPACE(chan, FILE_TRANSFER_PROTOCOL)) {
Expand All @@ -121,11 +117,6 @@ bool GCS_MAVLINK::send_ftp_reply(const pending_ftp &reply)
reply.chan,
0, reply.sysid, reply.compid,
payload);
if (reply.req_opcode == FTP_OP::TerminateSession) {
ftp.last_send_ms = 0;
} else {
ftp.last_send_ms = AP_HAL::millis();
}
return true;
}

Expand Down Expand Up @@ -155,9 +146,25 @@ void GCS_MAVLINK::ftp_error(struct pending_ftp &response, FTP_ERROR error) {
// send our response back out to the system
void GCS_MAVLINK::ftp_push_replies(pending_ftp &reply)
{
ftp.last_send_ms = AP_HAL::millis(); // Used to detect active FTP session
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we want to detect the active FTP session as soon as possible in get_reschedule_interval_ms(), even if we are blocked from sending the first response because of txbuf flow control. This slows down normal streams sooner and allows the radio's buffer to drain more quickly and reduces the delay the GCS sees, which helps avoid GCS timout on the first FTP response.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that makes sense. But its there now a risk of of FTP timing out if we sit here waiting to send for a while. Looks like the threshold is 3 seconds? Maybe we need a better way to check if there is FTP stuff happening than just looking at last_send_ms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately, since messages can be lost and the GCS may give up on FTP, testing for FTP activity will need to involve a timeout and I think this may be the simplest way. Where is the timeout 3 seconds? QGC seems to use 1 second timeouts.


while (!send_ftp_reply(reply)) {
hal.scheduler->delay(2);
}

if (reply.req_opcode == FTP_OP::TerminateSession) {
ftp.last_send_ms = 0;
}
IamPete1 marked this conversation as resolved.
Show resolved Hide resolved
/*
provide same banner we would give with old param download
Do this after send_ftp_reply() to get the first FTP response out sooner
on slow links to avoid GCS timeout. The slowdown of normal streams in
get_reschedule_interval_ms() should help for subsequent responses.
*/
if (ftp.need_banner_send_mask & (1U<<reply.chan)) {
ftp.need_banner_send_mask &= ~(1U<<reply.chan);
send_banner();
}
}

void GCS_MAVLINK::ftp_worker(void) {
Expand Down
2 changes: 1 addition & 1 deletion libraries/GCS_MAVLink/GCS_Param.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,7 @@ GCS_MAVLINK::queued_param_send()
}
count -= async_replies_sent_count;

while (count && _queued_parameter != nullptr && get_last_txbuf() > 50) {
while (count && _queued_parameter != nullptr && last_txbuf_is_greater(33)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a static constexpr for the magic number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would we call it? one_third? Is that really better? Anything less obvious and I would be annoyed when reading the code to have to look up the constant. I considered eliminating the parameter and hard code 33 in last_txbuf_is_greater(), but I wanted to allow for other threshold values, even though I doubt we will need them.

But I'll make the change if you let me know what I should call it.

char param_name[AP_MAX_NAME_SIZE];
_queued_parameter->copy_name_token(_queued_parameter_token, param_name, sizeof(param_name), true);

Expand Down
Loading