
Backport connection stream counter to v1.17 #991

Merged

Conversation


@lijunwangs lijunwangs commented Apr 23, 2024

Problem

There are two issues in v1.17, both bugs in the QoS accounting:

  1. Stream counts are not aggregated per stake across all connections, causing over-allocation of stream bandwidth.
  2. Stream counts are not aggregated per unstaked node across all connections, incentivizing unstaked nodes to spam connections.

In master this is encapsulated in ConnectionStreamCounter, which tracks the stream counts across connections for a key (pubkey or IpAddr).
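For illustration only, here is a minimal sketch of the idea; the names and structure below are hypothetical, not the actual ConnectionStreamCounter implementation. Every connection from the same key resolves to one shared counter, so the stream budget is enforced per key rather than per connection.

```rust
use std::{
    collections::HashMap,
    net::IpAddr,
    sync::{
        atomic::{AtomicU64, Ordering},
        Arc, Mutex,
    },
};

/// Hypothetical per-key counter: every connection from the same key (stake
/// pubkey, or IP address for unstaked peers) shares one counter, so the
/// stream budget is enforced across connections rather than per connection.
#[derive(Default)]
struct StreamCounter {
    stream_count: AtomicU64,
}

#[allow(dead_code)]
#[derive(Clone, Hash, PartialEq, Eq)]
enum ConnectionKey {
    Staked([u8; 32]), // raw pubkey bytes, stand-in for a real Pubkey type
    Unstaked(IpAddr),
}

#[derive(Default)]
struct StreamCounterTable {
    counters: Mutex<HashMap<ConnectionKey, Arc<StreamCounter>>>,
}

impl StreamCounterTable {
    /// Return the shared counter for this key, creating it on first use.
    fn get_or_create(&self, key: ConnectionKey) -> Arc<StreamCounter> {
        self.counters
            .lock()
            .unwrap()
            .entry(key)
            .or_default()
            .clone()
    }
}

fn main() {
    let table = StreamCounterTable::default();
    let key = ConnectionKey::Unstaked("127.0.0.1".parse().unwrap());

    // Two connections from the same IP resolve to the same counter.
    let c1 = table.get_or_create(key.clone());
    let c2 = table.get_or_create(key);
    c1.stream_count.fetch_add(1, Ordering::Relaxed);
    assert_eq!(c2.stream_count.load(Ordering::Relaxed), 1);
}
```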

Summary of Changes

Back-port ConnectionStreamCounter; the following 3 issues are addressed:

  1. Over-allocation of PPS for staked nodes.
  2. Over-allocation of PPS for unstaked nodes.
  3. Backport of sleep instead of drop when the stream rate exceeds the limit (#939: use sleep instead of drop when throttling streams); a sketch of this throttling approach follows below.
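A minimal sketch of the sleep-instead-of-drop idea; the window length, limit, and names here are illustrative assumptions, not the validator's actual throttling code. When a key exceeds its per-interval stream budget, the handler waits out the remainder of the interval instead of dropping the stream.

```rust
use std::time::{Duration, Instant};
use tokio::time::sleep;

/// Hypothetical throttle window; the real interval and limits live in the
/// validator's QUIC streamer code and may differ.
const THROTTLE_INTERVAL: Duration = Duration::from_millis(100);

struct ThrottleState {
    interval_start: Instant,
    streams_in_interval: u64,
}

impl ThrottleState {
    fn new() -> Self {
        Self {
            interval_start: Instant::now(),
            streams_in_interval: 0,
        }
    }

    /// Record one stream open and return how long the caller should sleep
    /// before accepting it; Duration::ZERO means it is within the budget.
    fn register_stream(&mut self, max_streams_per_interval: u64) -> Duration {
        if self.interval_start.elapsed() >= THROTTLE_INTERVAL {
            // Start a new accounting window.
            self.interval_start = Instant::now();
            self.streams_in_interval = 0;
        }
        self.streams_in_interval += 1;
        if self.streams_in_interval > max_streams_per_interval {
            // Over the limit: instead of dropping the stream, wait out the
            // remainder of the window and then accept it.
            THROTTLE_INTERVAL.saturating_sub(self.interval_start.elapsed())
        } else {
            Duration::ZERO
        }
    }
}

#[tokio::main]
async fn main() {
    let mut state = ThrottleState::new();
    for i in 0..5 {
        let wait = state.register_stream(2);
        if !wait.is_zero() {
            // Previously an over-limit stream would have been dropped here.
            sleep(wait).await;
        }
        println!("accepted stream {i}");
    }
}
```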

Fixes #

@lijunwangs lijunwangs changed the title Backport connection stream counter Backport connection stream counter to v1.17 Apr 23, 2024
@lijunwangs lijunwangs marked this pull request as draft April 23, 2024 06:41
@lijunwangs lijunwangs force-pushed the backport_ConnectionStreamCounter branch from 02c8c2a to 56f9211 Compare April 24, 2024 02:44
@@ -0,0 +1,47 @@
use std::{


can this be done without the refactor? this is stable branch. we need the absolutely most concise effective diff

Author (lijunwangs)

This is not exactly a refactoring: the ConnectionStreamCounter is needed to do the counting across connections. Even though I understand the desire to keep changes minimal, I think doing special handling for the backport just to keep it small is bad practice, as it results in two different implementations. The divergence makes maintaining the two branches very difficult and makes it easier to slip in new bugs.


this file was not introduced by either of the PRs that this one is backporting. no tests? straight to mb?

you can try to get @CriesofCarrots or @sakridge to sign off

@codecov-commenter commented Apr 24, 2024

Codecov Report

Attention: Patch coverage is 70.58824% with 15 lines in your changes missing coverage. Please review.

Project coverage is 81.6%. Comparing base (4f9c393) to head (c0ef443).
Report is 1 commit behind head on v1.17.

Additional details and impacted files
@@            Coverage Diff            @@
##            v1.17     #991     +/-   ##
=========================================
- Coverage    81.6%    81.6%   -0.1%     
=========================================
  Files         806      807      +1     
  Lines      219348   219370     +22     
=========================================
+ Hits       179078   179093     +15     
- Misses      40270    40277      +7     

@lijunwangs lijunwangs requested review from a team as code owners April 29, 2024 21:28
@lijunwangs lijunwangs force-pushed the backport_ConnectionStreamCounter branch from c4e7475 to 2661360 Compare April 29, 2024 21:28
pgarg66 previously approved these changes Apr 29, 2024
@lijunwangs lijunwangs marked this pull request as draft April 30, 2024 19:26
@lijunwangs lijunwangs force-pushed the backport_ConnectionStreamCounter branch from 2661360 to c0ef443 Compare April 30, 2024 19:36
@lijunwangs lijunwangs marked this pull request as ready for review April 30, 2024 20:40
@sakridge commented May 3, 2024

I think I'm ok with the refactor. I would just like to see the testing it has gone through: a unit test which stresses the streams, maybe creating connections which get dropped and then recreated, so that we could have a bunch of tasks pending while we are sleeping; and also time on testnet and on nodes experiencing mainnet traffic, checking the counter values to see that it's working as expected.

@lijunwangs

I have one jito-relayer running the latest v1.17 and one running v1.17 plus the changes in this PR. You can see that the throttled streams with this change are much more stable compared to without it, indicating less thrashing in the system.
Screenshot 2024-05-03 at 11 53 52 AM

@lijunwangs

The change has been running on 3gG; packets sent and streams throttled are both down.
Screenshot 2024-05-03 at 12 14 51 PM

@lijunwangs

Regarding sleep vs drop: I do not think the change makes the worst case any worse. The worst case is bounded by max_uni_streams per connection, i.e. the maximum number of streams that can be concurrently open on a connection. Even without the sleep, a malicious client can drag out feeding data for a stream to simulate keeping it open for a "long" time. As for opening new connections while we sleep, that is also not a net-new change: while we are sleeping we hold a reference to the connection object, so even if the client closes it, on the server side the connection is still counted in the connection cache table and is limited by the number of concurrent connections per IP/pubkey. As for pure QUIC connection spamming, we already have that issue even without this change; it has to be addressed by early filtering via a Quinn enhancement. So I do not think this makes it any worse.
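For context on the per-connection bound referenced above, the limit comes from QUIC's cap on concurrently open unidirectional streams. A minimal sketch of setting such a cap with the quinn crate; the value and helper function are illustrative assumptions, not the validator's actual configuration:

```rust
use quinn::{TransportConfig, VarInt};
use std::sync::Arc;

/// Build a transport config with a cap on concurrently open unidirectional
/// streams per connection: even if the server sleeps before accepting a
/// stream (or a peer stalls), one connection cannot hold more than this
/// many streams open at a time.
fn transport_config(max_uni_streams: u32) -> Arc<TransportConfig> {
    let mut config = TransportConfig::default();
    config.max_concurrent_uni_streams(VarInt::from_u32(max_uni_streams));
    Arc::new(config)
}

fn main() {
    // Illustrative value only; the validator configures its own limits.
    let _config = transport_config(8);
}
```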

@lijunwangs lijunwangs requested a review from joeaba May 7, 2024 14:57
@lijunwangs lijunwangs merged commit 58916d9 into anza-xyz:v1.17 May 7, 2024
33 checks passed