
Congestion window may grow too slowly #458

Closed
WesleyRosenblum opened this issue Jan 22, 2021 · 5 comments
Labels
loss_recovery Pertaining to the QUIC Loss Detection and Congestion Control RFC MVP

Comments

@WesleyRosenblum
Contributor

WesleyRosenblum commented Jan 22, 2021

If the application sends a burst of packets up to the congestion window size, only the first one or two of their acks may result in an increased window, because by the time the later acks are processed the window will be considered "under-utilized". The acks may be processed in a burst before the application is able to send again. As a result, the congestion window may increase by only one MTU despite a full congestion window's worth of data being acknowledged.

The purpose of checking whether the congestion window is under-utilized before increasing the window is to ensure that we don't increase the window if the current window size has not been "tested" or "validated" by actually sending and acknowledging that many bytes. Without this check, the application could suddenly send a large amount of data, much larger than it had successfully sent before, and congestion could result. On the other hand, if this check is too strict (as it is now), we will not allow the window to grow fast enough.
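The failure mode above can be sketched with a minimal model. All names and the strict "full window" check below are illustrative, not the actual s2n-quic code:

```rust
const MTU: u64 = 1200;

struct Cc {
    congestion_window: u64,
    bytes_in_flight: u64,
}

impl Cc {
    // Strict check: the window is "utilized" only when completely full.
    fn is_under_utilized(&self) -> bool {
        self.bytes_in_flight < self.congestion_window
    }

    // Slow-start ack handling: grow by the acked bytes unless under-utilized.
    fn on_packet_ack(&mut self, acked: u64) {
        if !self.is_under_utilized() {
            self.congestion_window += acked;
        }
        self.bytes_in_flight -= acked;
    }
}

fn main() {
    // The app sends a full window as 10 packets, then all 10 acks are
    // processed in a burst before the app can send again.
    let mut cc = Cc { congestion_window: 10 * MTU, bytes_in_flight: 10 * MTU };
    let start = cc.congestion_window;
    for _ in 0..10 {
        cc.on_packet_ack(MTU);
    }
    // Only the first ack saw a fully-utilized window, so despite a full
    // window's worth of data being acknowledged, cwnd grew by just one MTU.
    assert_eq!(cc.congestion_window, start + MTU);
}
```

Each ack drains `bytes_in_flight`, so every ack after the first sees the window as under-utilized and growth stalls, which is exactly the "extra slow slow start" behavior reported in quinn-rs/quinn#975.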

See quinn-rs/quinn#975 ("Extra slow slow start")

QUIC WG References

  • Discuss Application-Limited Sending #1637:

Naively, if this function is called for two different packets in a row, without a corresponding send operation being performed between them, then only the first OnPacketAckedCC ever actually increases congestion_window because the previous call just decremented bytes_in_flight and incremented congestion_window guaranteeing that IsAppLimited() will always return false for future packets.

In a slightly more intelligent interpretation of this, where you actually perform this logic for all the bytes acknowledged in a whole ACK frame in a single call, you can still have the same problem if you end up processing two ACK frames back to back before you end up doing a send.

my read of the Linux code tells me that at ack receipt time, the sender considers itself app-limited if in slow-start and cwnd > 2 x inflight, or if in cong-avoid and cwnd > inflight.

  • Define "under-utilization" of cwnd #2630

If we need the entire cwnd to be used before increasing it, we lose out on 1 RTT of growth during slow start. Both Windows and Linux TCP compensate for this by doing something like increase it when > 1/2 cwnd is used... we need to write something here that's at least as performant, and not just "when inflight >= cwnd"

Good point about the in_flight > 1/2 CWND check in Linux. In BBR, that only applies to the first doubling of IW, and after that it's fully bandwidth based, FWIW.

- Keep an in-flight high watermark, InFlightHigh, that's reset on each CWND change.
- When considering growing the CWND, only let it go as large as InFlightHigh*2.
- Justification for the "*2": it allows base-2 exponential growth, which is the fastest growth we ever want.

EDIT: expire on CWND change might not work. I think InFlightHigh expiry is the hard part here.
Possible expiration events: persistent congestion, ssthresh changes, app idle.
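The watermark proposal above can be sketched as follows. This is a hedged sketch: `Cc`, `in_flight_high`, and `grow_window` are illustrative names, not from any real implementation, and the expiry question raised in the discussion is left open:

```rust
struct Cc {
    congestion_window: u64,
    in_flight_high: u64, // high watermark of bytes actually in flight
}

impl Cc {
    fn on_packet_sent(&mut self, bytes_in_flight: u64) {
        // Remember the most bytes we have ever had in flight.
        self.in_flight_high = self.in_flight_high.max(bytes_in_flight);
    }

    fn grow_window(&mut self, acked: u64) {
        // Grow by the acked bytes, but never beyond 2 * InFlightHigh: this
        // still permits base-2 exponential growth (the fastest we ever
        // want) while ensuring the window stays "tested" by real traffic.
        let cap = self.in_flight_high.saturating_mul(2);
        let grown = self.congestion_window.saturating_add(acked);
        self.congestion_window = grown.min(cap).max(self.congestion_window);
    }
}

fn main() {
    let mut cc = Cc { congestion_window: 12_000, in_flight_high: 0 };
    cc.on_packet_sent(12_000); // a full window was actually in flight
    cc.grow_window(12_000);    // a full window was acked: window doubles
    assert_eq!(cc.congestion_window, 24_000);
    cc.grow_window(12_000);    // no new sends: capped at 2 * InFlightHigh
    assert_eq!(cc.congestion_window, 24_000);
}
```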

Other Implementations

s2n-quic Implementation Options

  1. Track application idle state in the Congestion Controller
  • Connections express transmission interest. When a connection has a transmission interest, it is added to the waiting-for-transmissions list in the Connection Container. In the with_connection method of the Connection Container, we could call a new method on the connection that would check for transmission interests and, if there are none, tell the Congestion Controller on each path that it is idle. The idle flag would be reset whenever a packet is sent.
  • This gets a little tricky, as there are several reasons why a non-idle application may not be able to send data (amplification limited, retransmissions only, etc). We'd only want to consider the connection to be application idle if the connection does not want to send anything, regardless of other limitations.
   fn update_app_idle_state(&mut self) {
        let mut transmission = Interest::default();
        transmission += self.path_manager.transmission_interest();
        transmission += shared_state.space_manager.transmission_interest();
        transmission += self.local_id_registry.transmission_interest();

        if matches!(transmission, Interest::None) {
            self.path_manager.update_app_idle_state(true);
        }
    }
  2. Follow the Chromium approach and just have more generous thresholds for deciding when application-limited
   fn is_congestion_limited(&self) -> bool {
        let available_congestion_window =
            self.congestion_window.saturating_sub(*self.bytes_in_flight);
        if available_congestion_window == 0 {
            return true;
        }

        if matches!(self.state, SlowStart) && self.bytes_in_flight > self.congestion_window / 2 {
            return true;
        }
        
        available_congestion_window <= self.max_datagram_size * MAX_BURST_MULTIPLIER
    }
  3. In on_packet_ack, we check if we are under_utilized as we do today. If not, we set a timestamp to 1 RTT from now, and consider under_utilized to be false until we reach that timestamp, at which point we check it again.
  4. Combine the Chromium approach with checking whether the congestion window is utilized in on_packet_sent, and cache that result in the congestion controller. The cached value would then be checked in on_packet_ack. Since the change from under-utilized to utilized occurs before the congestion controller prevents sending, I believe this would work: as long as the application is able to send data, under_utilized should flip to false and on_packet_ack would be allowed to grow the window.
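Option 4 can be sketched by caching the utilization state at send time and consulting it at ack time. This is a hedged sketch: the struct, field names, and the `MAX_BURST_MULTIPLIER` value are illustrative, not the actual s2n-quic code:

```rust
const MAX_BURST_MULTIPLIER: u64 = 3; // illustrative value

struct Cc {
    congestion_window: u64,
    bytes_in_flight: u64,
    max_datagram_size: u64,
    under_utilized: bool, // cached at send time
}

impl Cc {
    // Chromium-style generous threshold, evaluated when a packet is sent.
    fn is_under_utilized(&self) -> bool {
        let available = self.congestion_window.saturating_sub(self.bytes_in_flight);
        available > self.max_datagram_size * MAX_BURST_MULTIPLIER
    }

    fn on_packet_sent(&mut self, bytes: u64) {
        self.bytes_in_flight += bytes;
        // Cache utilization while the application is still able to send:
        // if it keeps sending, this flips to false before the acks arrive.
        self.under_utilized = self.is_under_utilized();
    }

    fn on_packet_ack(&mut self, acked: u64) {
        self.bytes_in_flight = self.bytes_in_flight.saturating_sub(acked);
        if !self.under_utilized {
            // Growth is allowed even though this burst of acks has already
            // drained bytes_in_flight well below the window.
            self.congestion_window += acked;
        }
    }
}

fn main() {
    let mtu: u64 = 1_200;
    let mut cc = Cc {
        congestion_window: 10 * mtu,
        bytes_in_flight: 0,
        max_datagram_size: mtu,
        under_utilized: true,
    };
    for _ in 0..10 { cc.on_packet_sent(mtu); } // fill the window
    for _ in 0..10 { cc.on_packet_ack(mtu); }  // acks arrive in a burst
    // Because utilization was cached at send time, the burst of acks is
    // able to double the window, as expected in slow start.
    assert_eq!(cc.congestion_window, 20 * mtu);
}
```

Contrast this with the strict per-ack check: here the burst of acks cannot flip the connection to "under-utilized" mid-burst, because that decision was already made while the application was sending.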
@WesleyRosenblum WesleyRosenblum added loss_recovery Pertaining to the QUIC Loss Detection and Congestion Control RFC MVP labels Jan 22, 2021
@camshaft
Contributor

I really like the simplicity of the Chromium/Linux approach. It makes the change self-contained rather than introducing more complex state tracking. It's also something we can always go back and change later if we're ok with the added complexity.

@WesleyRosenblum
Contributor Author

Agreed. And given that I think we'll need to revisit some of this code anyway when implementing pacing, it makes sense to start with a simpler approach until we have a better idea of how pacing will work.

@WesleyRosenblum
Contributor Author

I implemented something close to the Chromium approach, and while the results are much better, in the "Slow Start Unlimited" simulation we are still not doubling the congestion window each transmission round. The reason for this is that this simulation sends the full congestion window, and then acks the full congestion window (send and receive are not interleaved at all). Is such a scenario realistic?

Chromium approach:
[screenshot: congestion window growth with the Chromium approach]

Completely unconstrained:
[screenshot: congestion window growth, completely unconstrained]

@WesleyRosenblum
Contributor Author

I've implemented approach 4 in #474

@WesleyRosenblum
Contributor Author

With #474 the congestion window grows as expected in slow start:

[screenshot: congestion window growth in slow start with #474]
