
Congestion window may grow too slowly #458

Closed
WesleyRosenblum opened this issue Jan 22, 2021 · 5 comments
Labels
loss_recovery Pertaining to the QUIC Loss Detection and Congestion Control RFC MVP

Comments

@WesleyRosenblum
Contributor

WesleyRosenblum commented Jan 22, 2021

If the application sends a burst of packets up to the congestion window size, only the first one or two of their acks may result in an increased window, because by the time the later acks are processed the window will be considered "under-utilized". The acks may be processed in a burst before the application is able to send again. As a result, the congestion window may increase by only one MTU despite a full congestion window's worth of data being acknowledged.

The purpose of checking whether the congestion window is under-utilized before increasing the window is to ensure that we don't increase the window if the current window size has not been "tested" or "validated" by actually sending and acknowledging that many bytes. Without this check, the application could suddenly send a large amount of data, much larger than it had successfully sent before, and congestion could result. On the other hand, if this check is too strict (as it is now), we will not allow the window to grow fast enough.
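The failure mode above can be sketched with a minimal model. All names and the strict "full window" check below are illustrative, not the actual s2n-quic code:

```rust
const MTU: u64 = 1200;

struct Cc {
    congestion_window: u64,
    bytes_in_flight: u64,
}

impl Cc {
    // Strict check: the window is "utilized" only when completely full.
    fn is_under_utilized(&self) -> bool {
        self.bytes_in_flight < self.congestion_window
    }

    // Slow-start ack handling: grow by the acked bytes unless under-utilized.
    fn on_packet_ack(&mut self, acked: u64) {
        if !self.is_under_utilized() {
            self.congestion_window += acked;
        }
        self.bytes_in_flight -= acked;
    }
}

fn main() {
    // The app sends a full window as 10 packets, then all 10 acks are
    // processed in a burst before the app can send again.
    let mut cc = Cc { congestion_window: 10 * MTU, bytes_in_flight: 10 * MTU };
    let start = cc.congestion_window;
    for _ in 0..10 {
        cc.on_packet_ack(MTU);
    }
    // Only the first ack saw a fully-utilized window, so despite a full
    // window's worth of data being acknowledged, cwnd grew by just one MTU.
    assert_eq!(cc.congestion_window, start + MTU);
}
```

Each ack drains `bytes_in_flight`, so every ack after the first sees the window as under-utilized and growth stalls, which is exactly the "extra slow slow start" behavior reported in quinn-rs/quinn#975.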

See quinn-rs/quinn#975 ("Extra slow slow start")

QUIC WG References

  • Discuss Application-Limited Sending #1637:

Naively, if this function is called for two different packets in a row, without a corresponding send operation being performed between them, then only the first OnPacketAckedCC ever actually increases congestion_window because the previous call just decremented bytes_in_flight and incremented congestion_window guaranteeing that IsAppLimited() will always return false for future packets.

In a slightly more intelligent interpretation of this, where you actually perform this logic for all the bytes acknowledged in a whole ACK frame in a single call, you can still have the same problem if you end up processing two ACK frames back to back before you end up doing a send.

my read of the Linux code tells me that at ack receipt time, the sender considers itself app-limited if in slow-start and cwnd > 2 x inflight, or if in cong-avoid and cwnd > inflight.

  • Define "under-utilization" of cwnd #2630

If we need the entire cwnd to be used before increasing it, we lose out on 1 RTT of growth during slow start. Both Windows and Linux TCP compensate for this by doing something like increase it when > 1/2 cwnd is used... we need to write something here that's at least as performant, and not just "when inflight >= cwnd"

Good point about the in_flight > 1/2 CWND check in Linux. In BBR, that only applies to the first doubling of IW, and after that it's fully bandwidth based, FWIW.

- Keep an in-flight high watermark, InFlightHigh, that's reset on each CWND change.
- When considering growing the CWND, only let it go as large as InFlightHigh*2.
- Justification for the "*2": it allows base-2 exponential growth, which is the fastest growth we ever want.

EDIT: expire on CWND change might not work. I think InFlightHigh expiry is the hard part here.
Possible expiration events: persistent congestion, ssthresh changes, app idle.
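The watermark proposal above can be sketched as follows. This is a hedged sketch: `Cc`, `in_flight_high`, and `grow_window` are illustrative names, not from any real implementation, and the expiry question raised in the discussion is left open:

```rust
struct Cc {
    congestion_window: u64,
    in_flight_high: u64, // high watermark of bytes actually in flight
}

impl Cc {
    fn on_packet_sent(&mut self, bytes_in_flight: u64) {
        // Remember the most bytes we have ever had in flight.
        self.in_flight_high = self.in_flight_high.max(bytes_in_flight);
    }

    fn grow_window(&mut self, acked: u64) {
        // Grow by the acked bytes, but never beyond 2 * InFlightHigh: this
        // still permits base-2 exponential growth (the fastest we ever
        // want) while ensuring the window stays "tested" by real traffic.
        let cap = self.in_flight_high.saturating_mul(2);
        let grown = self.congestion_window.saturating_add(acked);
        self.congestion_window = grown.min(cap).max(self.congestion_window);
    }
}

fn main() {
    let mut cc = Cc { congestion_window: 12_000, in_flight_high: 0 };
    cc.on_packet_sent(12_000); // a full window was actually in flight
    cc.grow_window(12_000);    // a full window was acked: window doubles
    assert_eq!(cc.congestion_window, 24_000);
    cc.grow_window(12_000);    // no new sends: capped at 2 * InFlightHigh
    assert_eq!(cc.congestion_window, 24_000);
}
```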

Other Implementations

s2n-quic Implementation Options

  1. Track application idle state in the Congestion Controller
  • Connections express transmission interest. When a connection has a transmission interest, it is added to the waiting-for-transmissions list in the Connection Container. In the with_connection method of the Connection Container, we could call a new method on the connection that would check for transmission interests and, if there are none, tell the Congestion Controller on each path that it is idle. The idle flag would be reset whenever a packet is sent.
  • This gets a little tricky, as there are several reasons why a non-idle application may not be able to send data (amplification limited, retransmissions only, etc). We'd only want to consider the connection to be application idle if the connection does not want to send anything, regardless of other limitations.
   fn update_app_idle_state(&mut self) {
        let mut transmission = Interest::default();
        transmission += self.path_manager.transmission_interest();
        transmission += shared_state.space_manager.transmission_interest();
        transmission += self.local_id_registry.transmission_interest();

        if matches!(transmission, Interest::None) {
            self.path_manager.update_app_idle_state(true);
        }
    }
  2. Follow the Chromium approach and just have more generous thresholds for deciding when application-limited
   fn is_congestion_limited(&self) -> bool {
        let available_congestion_window =
            self.congestion_window.saturating_sub(*self.bytes_in_flight);
        if available_congestion_window == 0 {
            return true;
        }

        if matches!(self.state, SlowStart) && self.bytes_in_flight > self.congestion_window / 2 {
            return true;
        }
        
        available_congestion_window <= self.max_datagram_size * MAX_BURST_MULTIPLIER
    }
  3. In on_packet_ack, we check if we are under_utilized as we do today. If not, we set a timestamp to 1 RTT from now, and consider under_utilized to be false until we reach that timestamp, at which point we check it again.
  4. Combine the Chromium approach with checking whether the congestion window is utilized in on_packet_sent, and cache that result in the congestion controller. The cached value would then be checked in on_packet_ack. Since the change from under-utilized to utilized occurs before the congestion controller prevents sending, I believe this would work: as long as the application is able to send data, under_utilized should flip to false and on_packet_ack would be allowed to grow the window.
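Option 4 can be sketched by caching the utilization state at send time and consulting it at ack time. This is a hedged sketch: the struct, field names, and the `MAX_BURST_MULTIPLIER` value are illustrative, not the actual s2n-quic code:

```rust
const MAX_BURST_MULTIPLIER: u64 = 3; // illustrative value

struct Cc {
    congestion_window: u64,
    bytes_in_flight: u64,
    max_datagram_size: u64,
    under_utilized: bool, // cached at send time
}

impl Cc {
    // Chromium-style generous threshold, evaluated when a packet is sent.
    fn is_under_utilized(&self) -> bool {
        let available = self.congestion_window.saturating_sub(self.bytes_in_flight);
        available > self.max_datagram_size * MAX_BURST_MULTIPLIER
    }

    fn on_packet_sent(&mut self, bytes: u64) {
        self.bytes_in_flight += bytes;
        // Cache utilization while the application is still able to send:
        // if it keeps sending, this flips to false before the acks arrive.
        self.under_utilized = self.is_under_utilized();
    }

    fn on_packet_ack(&mut self, acked: u64) {
        self.bytes_in_flight = self.bytes_in_flight.saturating_sub(acked);
        if !self.under_utilized {
            // Growth is allowed even though this burst of acks has already
            // drained bytes_in_flight well below the window.
            self.congestion_window += acked;
        }
    }
}

fn main() {
    let mtu: u64 = 1_200;
    let mut cc = Cc {
        congestion_window: 10 * mtu,
        bytes_in_flight: 0,
        max_datagram_size: mtu,
        under_utilized: true,
    };
    for _ in 0..10 { cc.on_packet_sent(mtu); } // fill the window
    for _ in 0..10 { cc.on_packet_ack(mtu); }  // acks arrive in a burst
    // Because utilization was cached at send time, the burst of acks is
    // able to double the window, as expected in slow start.
    assert_eq!(cc.congestion_window, 20 * mtu);
}
```

Contrast this with the strict per-ack check: here the burst of acks cannot flip the connection to "under-utilized" mid-burst, because that decision was already made while the application was sending.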
@WesleyRosenblum WesleyRosenblum added loss_recovery Pertaining to the QUIC Loss Detection and Congestion Control RFC MVP labels Jan 22, 2021
@camshaft
Contributor

I really like the simplicity of the Chromium/Linux approach. It makes the change self-contained rather than introducing more complex state tracking. It's also something we can always go back and change later if we're ok with the added complexity.

@WesleyRosenblum
Contributor Author

Agreed. And given that I think we'll need to revisit some of this code anyway when implementing pacing, it makes sense to start with a simpler approach until we have a better idea of how pacing will work.

@WesleyRosenblum
Contributor Author

I implemented something close to the Chromium approach, and while the results are much better, in the "Slow Start Unlimited" simulation we are still not doubling the congestion window each transmission round. The reason for this is that this simulation sends the full congestion window, and then acks the full congestion window (send and receive are not interleaved at all). Is such a scenario realistic?

Chromium approach:
[screenshot: congestion window growth with the Chromium approach]

Completely unconstrained:
[screenshot: congestion window growth, completely unconstrained]

@WesleyRosenblum
Contributor Author

I've implemented approach 4 in #474

@WesleyRosenblum
Contributor Author

With #474 the congestion window grows as expected in slow start:

[screenshot: congestion window growth in slow start with #474]
