multiquic support in solana_streamer -- rebase of #634 to latest master #1452

Merged
merged 13 commits into from
Jun 21, 2024

@lijunwangs lijunwangs commented May 22, 2024

Problem

In production, we have found that the QUIC streamer endpoint is overwhelmed by connections and transaction packets, which causes a large number of connection timeouts and send-transaction timeouts. With multiple endpoints, multiple threads can be used to ingest the incoming packets.

Summary of Changes

  • tpu: use multiple quic endpoints

  • cluster-info: manage port range by hand...

  • local-cluster: keep udp tpu socket around for tests

  • Set the number of endpoints to 1 for now, pending more test results on the latest v1.18 regarding the skip rate problem.
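
The idea behind the first bullet, one receive thread per endpoint, can be sketched roughly as follows. This is only an illustration under simplifying assumptions: it uses plain UDP sockets from the standard library rather than QUIC endpoints, it binds each socket to its own OS-assigned loopback port instead of the ports the streamer actually uses, and `run_endpoints` is a hypothetical name that does not appear in this PR.

```rust
use std::net::UdpSocket;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: bind `n` UDP sockets on loopback and spawn one
/// receiver thread per socket. Each worker forwards the first datagram it
/// receives, tagged with its endpoint index.
fn run_endpoints(n: usize) -> std::io::Result<Vec<(usize, Vec<u8>)>> {
    let (tx, rx) = mpsc::channel();
    let mut addrs = Vec::new();
    let mut workers = Vec::new();

    for idx in 0..n {
        // Assumption: port 0 lets the OS pick a free port for the sketch.
        let socket = UdpSocket::bind("127.0.0.1:0")?;
        socket.set_read_timeout(Some(Duration::from_secs(2)))?;
        addrs.push(socket.local_addr()?);
        let tx = tx.clone();
        workers.push(thread::spawn(move || {
            let mut buf = [0u8; 1500];
            if let Ok((len, _from)) = socket.recv_from(&mut buf) {
                let _ = tx.send((idx, buf[..len].to_vec()));
            }
        }));
    }
    // Drop the original sender so the channel closes once all workers finish.
    drop(tx);

    // Send one datagram to each endpoint from a single client socket.
    let client = UdpSocket::bind("127.0.0.1:0")?;
    for (idx, addr) in addrs.iter().enumerate() {
        client.send_to(format!("packet-{idx}").as_bytes(), addr)?;
    }

    for worker in workers {
        let _ = worker.join();
    }
    let mut received: Vec<(usize, Vec<u8>)> = rx.into_iter().collect();
    received.sort();
    Ok(received)
}

fn main() -> std::io::Result<()> {
    for (idx, payload) in run_endpoints(3)? {
        println!("endpoint {idx} got {}", String::from_utf8_lossy(&payload));
    }
    Ok(())
}
```

Because each socket is owned by its own thread, a slow or flooded endpoint does not block the others from draining their receive queues, which is the property the PR is after.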

Fixes #

@lijunwangs lijunwangs changed the title from "multiquic support in solana_streamer -- rebase of #634" to "multiquic support in solana_streamer -- rebase of #634 to latest master" on Jun 4, 2024
@alessandrod

I've re-re-re-reviewed this. It looks great (😏), except for the manual range thing, which I would try to undo.

See this commit: solana-labs@fb0eea2. I think if we revert this - so we don't set reuseport=true in multi_bind_in_range - it should work (tm).

I don't think that reverting that commit actually breaks anything. In most other places we don't set reuseport=true, so WSL will break there anyway. I'd just revert it.

@lijunwangs lijunwangs merged commit 2443048 into anza-xyz:master Jun 21, 2024
51 checks passed
samkim-crypto pushed a commit to samkim-crypto/agave that referenced this pull request Jul 31, 2024
…est master (anza-xyz#1452)

* net-utils: support SO_REUSEPORT

tpu: use multiple quic endpoints

cluster-info: manage port range by hand...

local-cluster: keep udp tpu socket around for tests

* Missing cargo file

* sort cargo.toml

* divide the concurrent_connections among the endpoints for multiquic

* Change default multiquic endpoint count to 1

* Missing Cargo.lock changes

* revert reuseaddr changes

* revert reuseaddr changes;fmt code

* reverted port range changes

* revert DEFAULT_TPU_ENABLE_UDP change in local_cluster

* Turn tpu_enable_udp to true to prevent concurrent local cluster tests from using the same QUIC ports

* changed QUIC_ENDPOINTS to 10 for testing

* Turn QUIC_ENDPOINTS to 1 for now

---------

Co-authored-by: Trent Nelson <trent@solana.com>
Co-authored-by: Lijun Wang <lijun.wang@oracle.com>
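
The commit "divide the concurrent_connections among the endpoints for multiquic" amounts to splitting a single connection budget across the endpoints. A minimal sketch of that arithmetic follows; the function name `per_endpoint_limit`, the rounding policy, and the numbers are assumptions for illustration and are not taken from the PR's code.

```rust
/// Hypothetical helper: split a total connection budget evenly across
/// `num_endpoints` endpoints. Integer division rounds down, so the sum of
/// the per-endpoint limits never exceeds the original budget.
fn per_endpoint_limit(total_connections: usize, num_endpoints: usize) -> usize {
    assert!(num_endpoints > 0, "need at least one endpoint");
    total_connections / num_endpoints
}

fn main() {
    // Assumed budget, chosen only to make the arithmetic easy to follow.
    let total = 2_000;
    for endpoints in [1, 4, 10] {
        println!(
            "{endpoints} endpoint(s): {} connections each",
            per_endpoint_limit(total, endpoints)
        );
    }
}
```

With a single endpoint, the default this PR settles on, the per-endpoint limit equals the full budget, so behavior is unchanged until the endpoint count is raised.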