Skip to content

recovery: refactor sent packets tracking#469

Merged
ghedo merged 1 commit intomasterfrom
recovery-refactor-sent
Jun 3, 2020
Merged

recovery: refactor sent packets tracking#469
ghedo merged 1 commit intomasterfrom
recovery-refactor-sent

Conversation

@ghedo
Copy link
Copy Markdown
Member

@ghedo ghedo commented Apr 20, 2020

Instead of using a BTreeMap, use a VecDeque to track sent packets, which
makes it easier to keep track of acked and lost packets for a short time
after they have been acked or declared lost instead of immediately
removing them from the list, which in turn will allow detecting things
like DSACKs and spurious losses.

Similarly to how the BTreeMap was used, sent packets are appended to the
back of the queue (so they end up being sorted by packet number / sent
time). Access now happen by iterating over the queue from the start
rather than looking up specific packets by packet number. When acked or
declared lost, the packets are simply marked accordingly without being
removed from the queue.

At predefined times, the semt packets marked as acked or lost are then
removed from the start of the queue in contiguous blocks, in order to
avoid having to copy elements around to cover gaps in the queue caused
by removed elements. Later this can be changed to e.g. delay cleaning of
lost packets for 1 RTT after they were marked as lost.

This new strcture also allows us to follow the -recovery draft
pseudo-code somewhat more closely, and as a side-effect, there are no
more potentially expensive BTreeMap lookups in hot codepaths, which
might help reduce CPU usage when e.g. processing ACK frames (flamegraphs
generated on my laptop show that the time spent inside
Recovery::on_ack_received() is cut in half).


This is based on #468. Note that before merging we should do more testing in the lab to make sure this doesn't have any adverse side-effects.

@ghedo ghedo requested a review from a team as a code owner April 20, 2020 11:59
@ghedo ghedo force-pushed the recovery-update branch from b8b50fb to 8953886 Compare April 21, 2020 13:36
@ghedo ghedo force-pushed the recovery-refactor-sent branch from 0808c46 to 3043f90 Compare April 21, 2020 13:38
@junhochoi
Copy link
Copy Markdown
Contributor

While the CC will not have access to the full acked packet information
anymore, I believe that simply passing the cumulative acked bytes count
and largest acked sent time should be enough for both Reno and CUBIC
(and Hystart as well, though need to check that more carefully). Not
sure about BBR though.

I think r/w access to the packet is required for various congestion control related implementation. For example CUBIC (#475) need packet.time and packet.size which can be covered in this PR but HyStart++ (#476) need a new packet.pkt_num of the packet to detect the RTT round. Also BBR need to access/update additional fields such as packet.{delivered,delivered_rate,rtt} etc - see ttps://tools.ietf.org/html/draft-cardwell-iccrg-bbr-congestion-control-00

@ghedo ghedo force-pushed the recovery-update branch from 8953886 to 7a5c79d Compare April 22, 2020 10:44
@ghedo ghedo changed the base branch from recovery-update to master April 22, 2020 10:49
@ghedo ghedo force-pushed the recovery-refactor-sent branch from 3043f90 to ecdde0f Compare April 22, 2020 10:49
@ghedo
Copy link
Copy Markdown
Member Author

ghedo commented Apr 22, 2020

HyStart++ (#476) need a new packet.pkt_num of the packet to detect the RTT round

The new on_packets_acked() callback has access to the largest_acked_pkt field in Recovery like any other callback, is that not enough? We could also explicitly pass the largest acked number from the ACK frame itself, but that doesn't seem to be needed.

It also seems to me that Linux works the same way as this PR, that is, it first processes SACKs in a loop, and then calls tcp_cong_control() once per-ACK packet, so their Hystart implementation needs to work that way too, no? Is Hystart++ so different from Hystart that it cannot work like that?

Though, in the worst case scenario, each callback can still loop through the r.sent[epoch] map for packets that have time_acked.is_some(), though instead of doing that always like we do now, they could do that just when they actually need it.

Also BBR need to access/update additional fields such as packet.{delivered,delivered_rate,rtt} etc

Why would it need to update information of acked packets? Like, it seems pointless to modify the Sent struct within on_packet_acked() / on_packets_acked() given that the packet is being removed from the sent list. Do you mean for delivery rate estimation? AFAICT that is done separately from on_packet_acked() / on_packets_acked() right now, will BBR need to change that?

But in any case I think the point from before applies, Linux seems to follow the same process as what is implemented in this PR, so if they can implement BBR like that why can't we?

@junhochoi
Copy link
Copy Markdown
Contributor

junhochoi commented Apr 22, 2020

I think now I understand what you try to do.

First of all I don't think we want to follow what linux tcp does - different OS has different tcp implementation. However while I agree we may not need to access Sent structure of acked packet fully, it's still valuable to follow the spec first. For BBR, I need to correctly myself that we don't need to update fields on the acked packet (it need to access in on_packet_sent()).

I will add a comment in the code instead.

It also seems to me that Linux works the same way as this PR, that is, it first processes SACKs in a loop, and then calls tcp_cong_control() once per-ACK packet, so their Hystart implementation needs to work that way too, no? Is Hystart++ so different from Hystart that it cannot work like that?

In linux tcp, it pass the number of packet acked by the ack additionally. Hystart++ need the same information but it's ok for now because current master iterates over acked packet so we can get acked packet count from there.

Comment thread src/recovery.rs Outdated
Comment thread src/recovery.rs
Comment thread src/recovery.rs
@ghedo ghedo force-pushed the recovery-refactor-sent branch 2 times, most recently from 0b7493c to ff6719e Compare April 24, 2020 17:33
@ghedo ghedo force-pushed the recovery-refactor-sent branch 2 times, most recently from 0415612 to 1135e74 Compare May 12, 2020 10:36
@ghedo ghedo force-pushed the recovery-refactor-sent branch 3 times, most recently from 48e7409 to f2049a6 Compare June 1, 2020 11:57
Comment thread src/recovery.rs Outdated
Comment thread src/recovery.rs Outdated
LPardue
LPardue previously approved these changes Jun 1, 2020
Comment thread src/recovery.rs
Instead of using a BTreeMap, use a VecDeque to track sent packets, which
makes it easier to keep track of acked and lost packets for a short time
after they have been acked or declared lost instead of immediately
removing them from the list, which in turn will allow detecting things
like DSACKs and spurious losses.

Similarly to how the BTreeMap was used, sent packets are appended to the
back of the queue (so they end up being sorted by packet number / sent
time). Access now happen by iterating over the queue from the start
rather than looking up specific packets by packet number. When acked or
declared lost, the packets are simply marked accordingly _without_ being
removed from the queue.

At predefined times, the semt packets marked as acked or lost are then
removed from the start of the queue in contiguous blocks, in order to
avoid having to copy elements around to cover gaps in the queue caused
by removed elements. Later this can be changed to e.g. delay cleaning of
lost packets for 1 RTT after they were marked as lost.

This new strcture also allows us to follow the -recovery draft
pseudo-code somewhat more closely, and as a side-effect, there are no
more potentially expensive BTreeMap lookups in hot codepaths, which
might help reduce CPU usage when e.g. processing ACK frames (flamegraphs
generated on my laptop show that the time spent inside
Recovery::on_ack_received() is cut in half).
@ghedo ghedo merged commit 6259a29 into master Jun 3, 2020
@ghedo ghedo deleted the recovery-refactor-sent branch June 3, 2020 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants