Skip to content
This repository has been archived by the owner on Aug 28, 2023. It is now read-only.

RACK (and QUIC) #87

Closed
goelvidhi opened this issue Aug 31, 2021 · 13 comments
Closed

RACK (and QUIC) #87

goelvidhi opened this issue Aug 31, 2021 · 13 comments
Assignees

Comments

@goelvidhi
Copy link
Contributor

goelvidhi commented Aug 31, 2021

Markku Kojo said,

The draft states that RACK (and QUIC loss detection) can be used
with CUBIC to detect losses. However, it seems to have gone
unnoticed that RACK may also detect loss of a retransmission in
which case the congestion control response is required to be taken
twice, i.e., ssthresh and cwnd must be lowered again (MUST in
RFC 5681 Sec. 4.3). Once RACK got published all new congestion
controls and updates to existing RFCs must include this essential
congestion control response, if the congestion control mechanism
intends to use RACK for loss detection.

This draft does not have any such requirement nor does it specify
how this is done?

@goelvidhi goelvidhi self-assigned this Sep 1, 2021
@bbriscoe
Copy link
Contributor

bbriscoe commented Sep 15, 2021

RACK detects a loss whether retransmission or not, and §3.4 of RACK [RFC8985] says:

Note that [RFC5681] mandates a
principle that loss in two successive windows of data or the loss of
a retransmission must be taken as two indications of congestion and
therefore results in two separate congestion control reactions.

So surely it's not right to say "all new congestion controls and updates to existing RFCs must include this essential congestion control response." A congestion control that uses RACK surely doesn't need to say anything special about lost retransmission, because just referring to the RACK RFC will be enough.

If I'm right, I suggest this issue is just closed with no action.

@larseggert
Copy link
Contributor

@markkukojo are you OK with the resolution @bbriscoe proposes?

@goelvidhi
Copy link
Contributor Author

goelvidhi commented Sep 15, 2021

In addition to what Bob said, a packet loss whether of original packet or of a retransmission - both are considered loss. So, ssthresh and cwnd will be reduced twice. And as Bob said just mentioning RACK as a method to detect loss should be enough.

After a window reduction in response to a congestion event detected by duplicate ACKs, Explicit Congestion Notification-Echo (ECN-Echo, ECE) ACKs {{!RFC3168}}, TCP RACK {{!RFC8985}} or QUIC loss detection {{!RFC9002}}

@goelvidhi
Copy link
Contributor Author

@markkukojo can you check the above comments and let us know if there is still some unresolved?

@markkukojo
Copy link

Thanks for reminding and apologies for replying so slowly.

Unfortunately Bob is very wrong here, or he just misunderstood my comment.

RACK is a pure loss detection algorithm and RFC 8985 states it explicitly. So, it does not and cannot provide any congestion control response, i.e., it does not provide any algorithm nor actions to take when a loss of retransmission is detected. Therefore, RFC 8985 just reminds that when RACK is used for loss detection the congestion control response in use MUST follow this essential principle of reacting twice. For example, RFC 8985 does not say how CUBIC should select the new value of W_max, when a loss of rexmit is detected, so referring to RFC 8985 cannot be enough.

When RACK was published, it introduced a way to detect a loss of a rexmit that is something that never existed in the RFC series before and hence no congestion control response in the RFC series is prepared to handle it. In other words, we do not have any RFC that advises an implementer how to do it correctly. Moreover, RFC 6675 machinery simply does not work if a loss of rexmit is detected. And, it is not a trivial "just reduce twice" modification but the whole machinery needs to be carefully rethought. Therefore, RFC 8985 explicitly prohibits using RACK with RFC 6675 (MUST NOT). Consequently, a possible update to RFC 6675 needs to include the response to a lost rexmit, if it intends to allow the use of RACK with it. Correspondingly, any new congestion control response algorithm must include correct actions to implement this "reduce twice" correctly in all scenarios so that any implementer has no problem in implementing it correctly. Otherwise, it must say that RACK MUST NOT be used with it.

Hope this clarifies the issue.

@goelvidhi
Copy link
Contributor Author

goelvidhi commented Sep 27, 2021

@markkukojo What Bob and I are trying to say here is that CUBIC provides guidance to packet losses - whether it is for original data packet or retransmission - CUBIC will reduce congestion window in the same way for both events. We are using RACK to detect packet losses and then providing guidance what to do after that.
Please see Section 4.6,

When a congestion event is detected by mechanisms described in Section 3.1, CUBIC updates Wmax and reduces cwnd and ssthresh immediately ..

Still not convinced with the below comment you made,

And, it is not a trivial "just reduce twice" modification but the whole machinery needs to be carefully rethought.

Why do you think it is not as simple as reduce twice? For a network, packet is some bytes traveling through it - there is no difference in original data packet vs retransmission when it drops a packet. I am not getting why you think loss of a retransmissions should have a different response as compared to original data packet.

@larseggert
Copy link
Contributor

The proposal here is still to close the issue with no action. If anyone objects, a PR would be nice.

@goelvidhi
Copy link
Contributor Author

@markkukojo waiting to hear from you :-)

@markkukojo
Copy link

Why do you think it is not as simple as reduce twice? For a network, packet is some bytes traveling through it - there is no difference in original data packet vs retransmission when it drops a packet. I am not getting why you think loss of a retransmissions should have a different response as compared to original data packet.

This is a bit tricky because it's not only about this draft but concerns more the other existing RFCs.
First, no CUBIC nor any other algorithm does reduce cwnd and ssthresh always when there is a loss; They reduce only when first entering loss recovery and additional losses during the loss recovery do not result in additional cwnd reduction as these are losses within the same RTT. However, loss of a rexmit is a loss on another RTT that occurs during loss recovery and needs a special attention and that is not currently present in the RFC series for any fast recovery algo:

  1. we need a detection algorithm that a) detects a loss of rexmit and b) indicates to the loss recovery and congestion control algorithms if it happened;
  2. we need a loss recovery algorithm that correctly selects what to (re)send;
  3. we need a congestion control algorithm to a) react to congestion events and b) correctly determine how much to send during recovery

For 1 a) we have RACK but for 1 b) we do not have anything because RFC 8985 does not track whether a loss it detects is an "original" loss or a loss of rexmit nor does it indicate when a loss of a rexmit occurs. It just marks a segment lost and that works in Linux to resend (again).

For 2 we only have RFC 6675 in the RFC series because RACK requires SACK. The fast recovery algorithm in RFC 6675 is not written like Linux code, so e.g., NextSeg() does not work with lost rexmits. Similarly, the pipe algorithm does not count pipe correctly with lost rexmits so the congestion control does not work properly (for 3 b). Also, other modifications are likely needed as well. That is why RFC 8985 says it must not be used with RFC 6675 (but RFC 6675 needs an update).

For 3 a) it is pretty much "reduce again" like you say, but depending on the cc algo one also needs to see how to correctly calculate the new values for cwnd and ssthresh as well as for any other variables that the cc algo needs in all possible scenarios. But the crucial info in this draft (and in others) is to be able to say exactly when to do this "reduce again".

While RACK is fine as a detection algo I feel uneasy to say that one MAY use it with this draft as currently written because an implementer does not have necessary instructions in the RFC series how to do it correctly until RFC 6675 gets updated (or another RFC is published to describe how to implement loss recovery with lost rexmits correctly) and necessary extensions are added to RACK to track lost rexmits and to indicate when a loss of rexmit occurs.
Because RFC 8985 and RFC 6675 do not give this necessary info, this draft basically should provide it but it is not the role of this draft as a pure congestion control algorithm. OTOH, if it is left unspecified, it is quite likely some implementers will try to figure it out themselves and would quite likely get it wrong (the experienced developers for major stacks are likely to get it right but there are tons of other stacks with developers to whom the RFCs should give precise implementation instructions).

So, this is maybe more of a process question what should be done to have all necessary pieces described properly and where to write it.

@larseggert
Copy link
Contributor

@markkukojo thanks for the context. Could we scope this back to 8312bis and decide whether concrete changes are needed or not?

@goelvidhi
Copy link
Contributor Author

goelvidhi commented Oct 15, 2021

Thank you @markkukojo for the detailed explanation.

Because RFC 8985 and RFC 6675 do not give this necessary info, this draft basically should provide it but it is not the role of this draft as a pure congestion control algorithm.

I agree with this statement and the reason is QUIC (recovery time is set to when the loss was first detected on ACK and retransmitted packets are sent after that with new packet numbers and sent timestamp) can very clearly detect original losses and rexmit losses and that is documented well in RFC 9002. So, CUBIC just works with QUIC without any modification. Although the response in congestion control is the same for both type of losses which I think is the right thing.

So, I think there is no update needed in CUBIC for this issue.

@larseggert
Copy link
Contributor

@markkukojo do you agree with closing? If not, what concrete text changes would you like to propose?

@nsdyoshi
Copy link

nsdyoshi commented Nov 4, 2021

+1 for Vidhi. It seems to me that we will need recovery algorithm for RACK with TCP which is outside of this draft.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants