This repository has been archived by the owner on Aug 28, 2023. It is now read-only.

Overly aggressive window increase #14

Closed
lisongxu opened this issue Nov 17, 2020 · 19 comments
Labels
design Normative change relative to RFC8312 or earlier bis versions


@lisongxu
Contributor

Since we are revising this RFC, I guess it is a good time to fix some Cubic bugs reported in our NSDI 2019 paper.

This RFC sets W_cubic(t + RTT) as the target window size for the next RTT. However, this target size may be too high, even higher than 2 * cwnd (i.e., more aggressive than slow start), in the following special cases:

  • case 1: RTT is extremely long. An extremely long RTT is very likely an indication of network congestion, and in such an environment it is dangerous to set a very high target.

  • case 2: after a long idle period (i.e., a big increase of t). This is a bug reported and fixed by Google.

  • case 3: after a long application rate-limited period (i.e., a big increase of t). Similar to case 2.

To be safer, we may change Equation (1) as follows to fix all of the above bugs:

    W_cubic(t) = C*(t-K)^3 + origin_point    (Eq. 1)
    if (W_cubic(t) > 2 * cwnd)
        W_cubic(t) = 2 * cwnd

Note that Linux Cubic already does something similar (line 328) by limiting the target to no more than 1.5 * cwnd.
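A minimal Python sketch of the proposed bound, assuming cwnd and origin_point are in segments, t, K and RTT are in seconds, and C = 0.4 (the RFC 8312 default). The helper names `w_cubic` and `bounded_target` are hypothetical; `cap` is 2.0 as proposed here and would be 1.5 to mirror the Linux limit.

```python
C = 0.4  # CUBIC scaling constant (RFC 8312 default)


def w_cubic(t, k, origin_point, c=C):
    """Unbounded CUBIC window from Eq. 1 (segments); t and k in seconds."""
    return c * (t - k) ** 3 + origin_point


def bounded_target(t, rtt, k, origin_point, cwnd, cap=2.0):
    """Target window for the next RTT, W_cubic(t + RTT), capped at cap * cwnd."""
    target = w_cubic(t + rtt, k, origin_point)
    return min(target, cap * cwnd)


# Case 2: after ~100 s of idle time, t is huge and Eq. 1 explodes.
print(w_cubic(100.1, 2.0, 100.0))                    # unbounded: hundreds of thousands of segments
print(bounded_target(100.0, 0.1, 2.0, 100.0, 50.0))  # capped at 2 * cwnd = 100.0
```

In the normal case (t close to K) the cap is inactive and Eq. 1 is used unchanged; the cap only bites in the runaway cases listed above.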

Thanks

@larseggert larseggert added the design Normative change relative to RFC8312 or earlier bis versions label Nov 17, 2020
@sangtaeha
Contributor

@lisongxu Agreed. We also include this change in the list of changes for this new revision.

@goelvidhi
Contributor

goelvidhi commented Nov 18, 2020

@lisongxu Below is my response to the three cases you mentioned:

  1. At least we use SRTT instead of RTT, which mitigates the problem somewhat.
  2. For any idle period, we reset the epoch period to 0.
  3. Same as 2.
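To illustrate point 1, a minimal sketch of the RFC 6298 smoothed RTT estimator (gain alpha = 1/8). The function name `update_srtt` is hypothetical. It shows that a single extreme RTT sample moves SRTT only partway, while a sustained long RTT still pulls SRTT all the way up, which is why SRTT only mitigates case 1 somewhat.

```python
ALPHA = 1 / 8  # RFC 6298 gain for the smoothed RTT


def update_srtt(srtt, sample):
    """Return the new SRTT (seconds) after one RTT sample; None means no sample yet."""
    if srtt is None:
        return sample  # the first measurement initializes SRTT
    return (1 - ALPHA) * srtt + ALPHA * sample


srtt = None
for sample in [0.1, 0.1, 2.0, 0.1]:  # one 2 s spike among 100 ms samples
    srtt = update_srtt(srtt, sample)
    print(round(srtt, 4))
```

The 2 s spike lifts SRTT to about 0.34 s rather than 2 s; only if long samples keep arriving does SRTT converge to the long RTT.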

Having said that, your suggestion is a safe option regardless.

@goelvidhi
Contributor

@larseggert I can do this change. Feel free to assign :-)

@lisongxu
Contributor Author

What I proposed is a simple fix, mainly for other implementations. For bug 2, Google already proposed a fix that has been implemented in Linux and adopted into the Cubic RFC (Section 5.8). Thanks
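A minimal Python model of the idle-period fix for case 2, assuming it works by shifting the epoch start forward by the idle time when transmission resumes (our reading of the Linux `bictcp_cwnd_event` approach), rather than resetting the epoch as in point 2 above. The class and method names are hypothetical, and times are in seconds rather than kernel jiffies.

```python
class CubicClock:
    """Hypothetical tracker for the CUBIC epoch; t in Eq. 1 is now - epoch_start (seconds)."""

    def __init__(self):
        self.epoch_start = None  # start of the current congestion-avoidance epoch
        self.last_send = None    # time the most recent packet was sent

    def start_epoch(self, now):
        self.epoch_start = now

    def on_send(self, now):
        self.last_send = now

    def on_tx_restart(self, now):
        """Called when transmission resumes after the connection went idle."""
        if self.epoch_start is None or self.last_send is None:
            return
        idle = now - self.last_send
        if idle > 0:
            # Shift the epoch forward by the idle time so it is not counted in t,
            # but never past the current time.
            self.epoch_start = min(self.epoch_start + idle, now)

    def elapsed(self, now):
        """Value of t to use in Eq. 1."""
        return now - self.epoch_start


clock = CubicClock()
clock.start_epoch(0.0)
clock.on_send(5.0)           # last packet before the connection goes idle
clock.on_tx_restart(105.0)   # sending resumes 100 s later
print(clock.elapsed(105.0))  # 5.0 rather than 105.0
```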

@larseggert
Contributor

It would be good to give this issue a more descriptive title.

@lisongxu lisongxu changed the title Some bugs to fix Overly aggressive window increase Nov 19, 2020
@lisongxu
Contributor Author

@larseggert done

@lisongxu
Contributor Author

@yuchungcheng Case 2 is already fixed by Google in function bictcp_cwnd_event. How about case 3? Does bictcp_cwnd_event also fix case 3? Thanks

@goelvidhi
Contributor

@lisongxu The fix to bound the CUBIC target cwnd is added to #3. Let me know if there are any remaining items for this issue.

@yuchungcheng

yuchungcheng commented Nov 20, 2020 via email

@larseggert
Contributor

Could I ask that we do individual PRs to address individual issues? It sometimes causes a little rebasing effort, but such PRs are much easier to review than ones that contain changes for multiple issues.

@lisongxu
Contributor Author

@yuchungcheng Thanks.

The reason I am not sure about case 3 is that Cubic still sends packets, just at a low rate limited by the application instead of by cwnd. If Cubic remains in case 3 for a long time, cwnd will not increase for a long time, yet the connection is not idle. So I guess bictcp_cwnd_event does not fix case 3, because it only detects idle periods? Am I missing something? Thanks
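To make case 3 concrete, a hypothetical sketch comparing two ways of computing t for Eq. 1: excluding only idle time, versus also excluding application-limited time (RFC 8312 Section 5.8 says t must not include periods where cwnd was not updated because of the application rate limit). The timeline, K, origin point, and the `elapsed_t` helper are all assumptions for illustration, not Linux code.

```python
C = 0.4  # RFC 8312 default scaling constant


def elapsed_t(timeline, exclude):
    """Sum the durations (s) of timeline segments whose state is not in `exclude`."""
    return sum(d for d, state in timeline if state not in exclude)


# Hypothetical timeline: 1 s cwnd-limited, 60 s sending at a low
# application-limited rate (never fully idle), then 1 s cwnd-limited again.
timeline = [(1.0, "cwnd_limited"), (60.0, "app_limited"), (1.0, "cwnd_limited")]

t_idle_rule = elapsed_t(timeline, exclude={"idle"})              # idle detection only
t_strict = elapsed_t(timeline, exclude={"idle", "app_limited"})  # also skip app-limited time

k, w_max = 2.0, 100.0  # assumed K (s) and origin point (segments)
for name, t in [("idle-only", t_idle_rule), ("exclude app-limited", t_strict)]:
    print(name, t, C * (t - k) ** 3 + w_max)
```

Because the app-limited phase never looks idle, the idle-only rule counts all 62 s and Eq. 1 yields a target of tens of thousands of segments, while excluding the app-limited phase keeps t at 2 s.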

@goelvidhi
Contributor

Sorry about that, but the fix in #3 is incomplete without the bounds on the target. I will try to keep them separate as much as possible.

@yuchungcheng

yuchungcheng commented Nov 20, 2020 via email

@goelvidhi
Contributor

@yuchungcheng Doesn't https://tools.ietf.org/html/rfc7661 solve the different app-limited scenarios?

@lisongxu
Contributor Author

@yuchungcheng Thank you!

Yes, we should avoid being overly conservative (e.g., 1s in your example) and overly aggressive (e.g., 5m in your example). This is why I am suggesting a simple fix that sets a lower bound (cwnd) and an upper bound (2*cwnd) on the target cwnd for the next RTT; detailed discussion can be found in issue #1. Thanks
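A worked sketch of why the bounds [cwnd, 2*cwnd] matter. Under RFC 8312, cwnd grows by (W_cubic(t+RTT) - cwnd)/cwnd per ACK, so clamping the target keeps the per-ACK increment between 0 and 1 segment, i.e., never faster than slow start. The function names are hypothetical, and the sketch assumes one ACK per segment (no delayed or stretch ACKs).

```python
def clamp_target(target, cwnd, lo=1.0, hi=2.0):
    """Bound the next-RTT target to [lo * cwnd, hi * cwnd] (segments)."""
    return max(lo * cwnd, min(target, hi * cwnd))


def per_ack_increment(target, cwnd):
    """Per-ACK cwnd increase (segments): (W_cubic(t + RTT) - cwnd) / cwnd with the bounded target."""
    return (clamp_target(target, cwnd) - cwnd) / cwnd


cwnd = 100.0
for target in (50.0, 120.0, 5000.0):  # below cwnd, moderate, far above 2 * cwnd
    print(target, per_ack_increment(target, cwnd))
```

With roughly cwnd ACKs per RTT, an increment of at most 1 segment per ACK means cwnd at most doubles per RTT; passing `hi=1.5` would give the Linux-style 1.5x bound instead.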

@lisongxu
Contributor Author

@goelvidhi Thank you. RFC 7661 would require a major change to Cubic, and I believe its basic idea is consistent with what we have been discussing (i.e., avoiding being overly conservative or overly aggressive, and being responsive to congestion). Thanks

@nealcardwell
Contributor

nealcardwell commented Nov 20, 2020

AFAICT the effect of the proposed fix in the first message in this thread -- #14 (comment) -- would be to bound the rate of increase of the cwnd to at most doubling each round trip time. Is that the intent?

The Linux TCP CUBIC implementation has already always bounded the rate of increase of the cwnd to at most 1.5x per round trip time. Initially it did this implicitly (the logic only allowed cwnd increases on at most every alternate ACK), and then when the stretch ACKs fixes were put in place the bound became explicit. See:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/net/ipv4/tcp_cubic.c?id=d578e18ce93f5d33a7120fd57c453e22a4c0fc37

Given that separate pre-existing bound of 1.5x per round trip, the proposed fix of bounding growth to 2x per round trip would be a no-op, AFAICT? Or perhaps I misunderstand the proposal.
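A small simulation of the bound @nealcardwell describes, assuming cwnd is increased by at most 1 segment per 2 segments ACKed (the "every alternate ACK" behavior) and one ACK per segment. The function and its `cnt`/`min_cnt` parameters are hypothetical, not the kernel code. Over one RTT about cwnd segments are ACKed, so this floor caps growth at cwnd/2 (1.5x), while a floor of 1 corresponds to the proposed 2x (slow-start) bound.

```python
def grow_one_rtt(cwnd, cnt, min_cnt=2):
    """
    Simulate one RTT in which every segment of the initial cwnd is ACKed once,
    adding 1 segment to cwnd for every `cnt` segments ACKed. `cnt` is floored at
    `min_cnt`; min_cnt=2 models "at most one increase per two ACKed segments".
    """
    cnt = max(cnt, min_cnt)
    acked = 0
    new_cwnd = cwnd
    for _ in range(cwnd):
        acked += 1
        if acked >= cnt:
            acked -= cnt
            new_cwnd += 1
    return new_cwnd


print(grow_one_rtt(100, cnt=1))             # floored to cnt=2: 150 segments (1.5x)
print(grow_one_rtt(100, cnt=1, min_cnt=1))  # no floor: 200 segments (2x, slow-start rate)
```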

By the way, for YouTube we found this implicit bound of 1.5x per round trip was important for keeping losses at a reasonable level. This has a large impact on behavior, and presumably big implications for fairness between CUBIC implementations. So this may be important to document in the RFC, if it is not already (I couldn't find it, but I may have just missed it).

@lisongxu
Contributor Author

@nealcardwell Yes, you are right that Linux has already implemented the 1.5x upper bound. I am fine with changing the upper bound from 2.0x to 1.5x. What is important is to add an upper bound to the RFC to make sure that all other implementations also implement something similar. Thanks

@goelvidhi
Contributor

Closed by #3
