
Contribution to buffer bloat and slower convergence due to larger decrease factor #89

Closed
goelvidhi opened this issue Aug 31, 2021 · 9 comments · Fixed by #123

@goelvidhi
Contributor

goelvidhi commented Aug 31, 2021

Markku Kojo said,

This draft uses a larger cwnd decrease factor, resulting in a larger
average cwnd and buffer occupation. This means it is likely to
contribute significantly to buffer bloat, particularly when also
considering the concave increase function at the beginning of
congestion avoidance, which keeps the cwnd close to its maximum most
of the time, as carefully explained in the draft. In other words,
CUBIC also keeps buffer-bloated router queues very efficiently full
at all times.

Currently the draft mentions slower convergence speed as the only
side effect of the larger decrease factor and does not discuss the
contribution to buffer bloat. It would be important to assess this,
together with measurement data to back up any observations.

Do we have data from different environments, including buffer-bloated
environments, that shows how much effect CUBIC has compared to
AIMD TCP?
And how does the larger decrease factor impact convergence speed,
particularly in buffer-bloated environments?
Many people have complained that window-based (TCP) congestion
control drives buffer bloat. Of course, the current standard
AIMD TCP also tends to fill buffer-bloated queues, but it likely
does not do so as effectively as CUBIC. This would be good to
understand better.
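
As a rough, back-of-the-envelope illustration of the "larger average cwnd" point (my own sketch, not from the draft): for an idealized linear sawtooth the time-averaged cwnd is (1 + β)/2 of W_max, and CUBIC's concave growth keeps the average even closer to W_max than that approximation suggests.

```python
# Hypothetical illustration: average cwnd over one idealized linear sawtooth,
# i.e. cwnd grows linearly from beta * W_max back up to W_max.
def avg_cwnd_linear_sawtooth(w_max: float, beta: float) -> float:
    return (1 + beta) / 2 * w_max

w_max = 100.0
print(avg_cwnd_linear_sawtooth(w_max, beta=0.5))  # Reno  (beta = 0.5): 75.0
print(avg_cwnd_linear_sawtooth(w_max, beta=0.7))  # CUBIC (beta = 0.7): 85.0
```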

@lisongxu lisongxu self-assigned this Sep 1, 2021
@goelvidhi
Contributor Author

goelvidhi commented Sep 14, 2021

@markkukojo,
@bbriscoe pointed me to this paper https://www.simula.no/sites/default/files/publications/Simula.simula.618.pdf. If you look at Figure 7, CUBIC has a much lower queuing delay variation than New Reno which means that CUBIC has a shallower saw-teeth than New Reno. But as you said, due to higher β (0.7) and concave increase function, CUBIC can reach W_max faster. This means that although in this paper, CUBIC shows a higher queuing delay than New Reno, this can be easily solved by configuring a lower threshold/interval for loss / ECN at the bottleneck AQM which would make CUBIC perform decreases more often without losing link utilization as significantly (as compared to New Reno). This would also help to solve buffer bloat better than New Reno.

@bbriscoe
Contributor

I would add to Vidhi's comment by explaining that network operators are not expected to 'solve' this issue by setting a lower AQM threshold wherever Cubic is used (which is clearly impractical, but might be how someone awkward could interpret Vidhi's words). Nonetheless, widespread deployment of Cubic (2006+ timeframe) gives more scope for setting AQM thresholds lower for the same utilization. Indeed, the subsequent round of AQM implementations (2012+ timeframe) could set AQM thresholds to a lower default than if Reno had still been widely deployed.

I don't think 'bufferbloat' is even the right word for an AQM that is set for the amplitude of Reno's sawteeth rather than Cubic's. Bloat implies excessively large, not just a little too large.

The lesson here is that we need to be careful attributing blame. Once AQMs are deployed to address real bufferbloat, the root cause of the residual 'bufferbloat' is not in the buffer, it's in the large variations of congestion control sawteeth (in slow-start as well as congestion avoidance). It is inappropriate to blame Cubic for squeezing the sawteeth up nearer to the threshold - the blame for the threshold needing to be that high in the first place falls on the predecessor to Cubic (Reno).

Should the draft say something about this? I think it should (briefly - to counter any future criticism similar to Markku's). But that would require a new section. It doesn't fit under any of the headings in the existing 'Discussion' section, which follows the structure suggested by RFC5033. But not saying anything would also be OK for me.

@bbriscoe
Contributor

Regarding slower convergence, the same section could also say that the smaller reduction per round (larger β) means that it takes more rounds to reduce in response to continuing congestion or to another flow trying to push in. Nonetheless, convergence speed prior to Cubic was primarily limited by Reno's slow additive increase, which Cubic can exceed once it gets into its true cubic growth mode.
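
As a rough illustration of the "more rounds to reduce" point (my own arithmetic, not from the draft): if each congestion event multiplies cwnd by β, reaching a given fraction of the starting window takes about log(fraction)/log(β) events.

```python
import math

# Hypothetical helper: congestion events needed to shrink cwnd to `fraction`
# of its starting value when each event multiplies cwnd by `beta`.
def events_to_reach(fraction: float, beta: float) -> int:
    return math.ceil(math.log(fraction) / math.log(beta))

print(events_to_reach(0.5, beta=0.5))  # Reno-style: 1 event to halve cwnd
print(events_to_reach(0.5, beta=0.7))  # CUBIC:      2 events to halve cwnd
print(events_to_reach(0.1, beta=0.5))  # 4 events to reach 10% ...
print(events_to_reach(0.1, beta=0.7))  # ... versus 7 events with beta = 0.7
```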

This was referenced Sep 15, 2021
@bbriscoe
Contributor

Any new text about slower convergence obviously ought to mention Cubic's optional fast convergence mechanism.

A good paper that is mostly an evaluation of Cubic's convergence (with the fast convergence mechanism enabled) is:
Leith, D. J.; Shorten, R. N. & McCullagh, G., "Experimental evaluation of Cubic-TCP", Proc. Int'l Workshop on Protocols for Future, Large-scale & Diverse Network Transports (PFLDNeT 2007), 2007.
It's not particularly complimentary about Cubic's convergence.

On that subject, this text in the fast convergence section is infeasible to comply with:

Fast Convergence is designed for network environments with multiple CUBIC flows. In network
environments with only a single CUBIC flow and without any other traffic, Fast Convergence SHOULD
be disabled.

This ought to say whether fast convergence is recommended for use over the public Internet, or not (given the public Internet is designed for both single flows and multiple flows). I believe fast convergence is generally enabled, so this is the sort of thing that ought to be recommended in an RFC that wraps up more than a decade of experience using Cubic.
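
For context, the fast convergence heuristic under discussion works roughly as follows (a sketch of the mechanism described in RFC 8312, Section 4.6; the function and variable names are mine):

```python
BETA = 0.7  # CUBIC multiplicative decrease factor

def fast_convergence_update(w_max: float, w_last_max: float):
    """At a congestion event, possibly shrink the W_max CUBIC will grow back toward.

    If the window maximum at this event is lower than at the previous one,
    assume another flow is ramping up and release extra capacity.
    Returns (new_w_max, new_w_last_max).
    """
    if w_max < w_last_max:
        return w_max * (1.0 + BETA) / 2.0, w_max
    return w_max, w_max
```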

@larseggert
Contributor

@lisongxu since you self-assigned this some time ago, would you please prepare a PR to close this issue?

@lisongxu
Contributor

Thanks, @larseggert. I will.

@lisongxu
Contributor

@larseggert This issue overlaps mainly with two other issues. The buffer bloat part overlaps with issue #94, and the convergence part overlaps with issue #96.

@larseggert larseggert linked a pull request Oct 18, 2021 that will close this issue
@larseggert
Contributor

@lisongxu do you think anything more needs to be done here after those issues are closed?

@lisongxu
Contributor

@larseggert No, thank you
