recovery: refactor sent packets tracking by ghedo · Pull Request #469 · cloudflare/quiche

ghedo · 2020-04-20T11:59:10Z

Instead of using a BTreeMap, use a VecDeque to track sent packets, which
makes it easier to keep track of acked and lost packets for a short time
after they have been acked or declared lost instead of immediately
removing them from the list, which in turn will allow detecting things
like DSACKs and spurious losses.

Similarly to how the BTreeMap was used, sent packets are appended to the
back of the queue (so they end up being sorted by packet number / sent
time). Access now happen by iterating over the queue from the start
rather than looking up specific packets by packet number. When acked or
declared lost, the packets are simply marked accordingly without being
removed from the queue.

At predefined times, the semt packets marked as acked or lost are then
removed from the start of the queue in contiguous blocks, in order to
avoid having to copy elements around to cover gaps in the queue caused
by removed elements. Later this can be changed to e.g. delay cleaning of
lost packets for 1 RTT after they were marked as lost.

This new strcture also allows us to follow the -recovery draft
pseudo-code somewhat more closely, and as a side-effect, there are no
more potentially expensive BTreeMap lookups in hot codepaths, which
might help reduce CPU usage when e.g. processing ACK frames (flamegraphs
generated on my laptop show that the time spent inside
Recovery::on_ack_received() is cut in half).

This is based on #468. Note that before merging we should do more testing in the lab to make sure this doesn't have any adverse side-effects.

junhochoi · 2020-04-22T00:33:39Z

While the CC will not have access to the full acked packet information
anymore, I believe that simply passing the cumulative acked bytes count
and largest acked sent time should be enough for both Reno and CUBIC
(and Hystart as well, though need to check that more carefully). Not
sure about BBR though.

I think r/w access to the packet is required for various congestion control related implementation. For example CUBIC (#475) need packet.time and packet.size which can be covered in this PR but HyStart++ (#476) need a new packet.pkt_num of the packet to detect the RTT round. Also BBR need to access/update additional fields such as packet.{delivered,delivered_rate,rtt} etc - see ttps://tools.ietf.org/html/draft-cardwell-iccrg-bbr-congestion-control-00

ghedo · 2020-04-22T12:04:17Z

HyStart++ (#476) need a new packet.pkt_num of the packet to detect the RTT round

The new on_packets_acked() callback has access to the largest_acked_pkt field in Recovery like any other callback, is that not enough? We could also explicitly pass the largest acked number from the ACK frame itself, but that doesn't seem to be needed.

It also seems to me that Linux works the same way as this PR, that is, it first processes SACKs in a loop, and then calls tcp_cong_control() once per-ACK packet, so their Hystart implementation needs to work that way too, no? Is Hystart++ so different from Hystart that it cannot work like that?

Though, in the worst case scenario, each callback can still loop through the r.sent[epoch] map for packets that have time_acked.is_some(), though instead of doing that always like we do now, they could do that just when they actually need it.

Also BBR need to access/update additional fields such as packet.{delivered,delivered_rate,rtt} etc

Why would it need to update information of acked packets? Like, it seems pointless to modify the Sent struct within on_packet_acked() / on_packets_acked() given that the packet is being removed from the sent list. Do you mean for delivery rate estimation? AFAICT that is done separately from on_packet_acked() / on_packets_acked() right now, will BBR need to change that?

But in any case I think the point from before applies, Linux seems to follow the same process as what is implemented in this PR, so if they can implement BBR like that why can't we?

junhochoi · 2020-04-22T19:00:01Z

I think now I understand what you try to do.

First of all I don't think we want to follow what linux tcp does - different OS has different tcp implementation. However while I agree we may not need to access Sent structure of acked packet fully, it's still valuable to follow the spec first. For BBR, I need to correctly myself that we don't need to update fields on the acked packet (it need to access in on_packet_sent()).

I will add a comment in the code instead.

It also seems to me that Linux works the same way as this PR, that is, it first processes SACKs in a loop, and then calls tcp_cong_control() once per-ACK packet, so their Hystart implementation needs to work that way too, no? Is Hystart++ so different from Hystart that it cannot work like that?

In linux tcp, it pass the number of packet acked by the ack additionally. Hystart++ need the same information but it's ok for now because current master iterates over acked packet so we can get acked packet count from there.

Instead of using a BTreeMap, use a VecDeque to track sent packets, which makes it easier to keep track of acked and lost packets for a short time after they have been acked or declared lost instead of immediately removing them from the list, which in turn will allow detecting things like DSACKs and spurious losses. Similarly to how the BTreeMap was used, sent packets are appended to the back of the queue (so they end up being sorted by packet number / sent time). Access now happen by iterating over the queue from the start rather than looking up specific packets by packet number. When acked or declared lost, the packets are simply marked accordingly _without_ being removed from the queue. At predefined times, the semt packets marked as acked or lost are then removed from the start of the queue in contiguous blocks, in order to avoid having to copy elements around to cover gaps in the queue caused by removed elements. Later this can be changed to e.g. delay cleaning of lost packets for 1 RTT after they were marked as lost. This new strcture also allows us to follow the -recovery draft pseudo-code somewhat more closely, and as a side-effect, there are no more potentially expensive BTreeMap lookups in hot codepaths, which might help reduce CPU usage when e.g. processing ACK frames (flamegraphs generated on my laptop show that the time spent inside Recovery::on_ack_received() is cut in half).

ghedo requested a review from a team as a code owner April 20, 2020 11:59

ghedo mentioned this pull request Apr 20, 2020

Adaptive reordering thresholds #470

Merged

ghedo force-pushed the recovery-update branch from b8b50fb to 8953886 Compare April 21, 2020 13:36

ghedo force-pushed the recovery-refactor-sent branch from 0808c46 to 3043f90 Compare April 21, 2020 13:38

ghedo force-pushed the recovery-update branch from 8953886 to 7a5c79d Compare April 22, 2020 10:44

ghedo changed the base branch from recovery-update to master April 22, 2020 10:49

ghedo force-pushed the recovery-refactor-sent branch from 3043f90 to ecdde0f Compare April 22, 2020 10:49

junhochoi reviewed Apr 22, 2020

View reviewed changes

Comment thread src/recovery.rs Outdated

junhochoi reviewed Apr 23, 2020

View reviewed changes

Comment thread src/recovery.rs

junhochoi reviewed Apr 23, 2020

View reviewed changes

Comment thread src/recovery.rs

ghedo force-pushed the recovery-refactor-sent branch 2 times, most recently from 0b7493c to ff6719e Compare April 24, 2020 17:33

ghedo force-pushed the recovery-refactor-sent branch 2 times, most recently from 0415612 to 1135e74 Compare May 12, 2020 10:36

ghedo force-pushed the recovery-refactor-sent branch 3 times, most recently from 48e7409 to f2049a6 Compare June 1, 2020 11:57

LPardue reviewed Jun 1, 2020

View reviewed changes

Comment thread src/recovery.rs Outdated

LPardue reviewed Jun 1, 2020

View reviewed changes

Comment thread src/recovery.rs Outdated

LPardue previously approved these changes Jun 1, 2020

View reviewed changes

junhochoi reviewed Jun 2, 2020

View reviewed changes

Comment thread src/recovery.rs

ghedo dismissed LPardue’s stale review via c1bd7eb June 2, 2020 10:25

ghedo force-pushed the recovery-refactor-sent branch from f2049a6 to c1bd7eb Compare June 2, 2020 10:25

junhochoi approved these changes Jun 3, 2020

View reviewed changes

ghedo merged commit 6259a29 into master Jun 3, 2020

ghedo deleted the recovery-refactor-sent branch June 3, 2020 13:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

recovery: refactor sent packets tracking#469

recovery: refactor sent packets tracking#469
ghedo merged 1 commit intomasterfrom
recovery-refactor-sent

ghedo commented Apr 20, 2020 •

edited

Loading

Uh oh!

junhochoi commented Apr 22, 2020

Uh oh!

ghedo commented Apr 22, 2020

Uh oh!

junhochoi commented Apr 22, 2020 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ghedo commented Apr 20, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

junhochoi commented Apr 22, 2020

Uh oh!

ghedo commented Apr 22, 2020

Uh oh!

junhochoi commented Apr 22, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ghedo commented Apr 20, 2020 •

edited

Loading

junhochoi commented Apr 22, 2020 •

edited

Loading