
Add get_delegators_for_validator #3219

Merged
5 commits merged into mainnet-staging from get_delegators_for_validator on May 9, 2024

Conversation

@vicsn (Contributor) commented Apr 16, 2024

Motivation

Adds an endpoint to get all delegators for a particular validator.

Test Plan

See the sister PR: https://github.com/AleoHQ/snarkVM/pull/2434

@vicsn requested a review from raychu86 on April 16, 2024 at 14:01
@vicsn marked this pull request as ready for review on April 17, 2024 at 16:26
@iamalwaysuncomfortable (Contributor) left a comment

The time it takes for this endpoint to turn the response into JSON should be benchmarked here, as it is potentially more costly than the actual RocksDB access.
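
For reference, a rough way to isolate the serialization cost is sketched below; the map shape and sizes are placeholders, not the endpoint's actual response type, and only the `serde_json` crate is assumed.

```rust
use std::collections::HashMap;
use std::time::Instant;

fn main() {
    // Placeholder data roughly shaped like a large delegator -> microcredits map.
    let delegators: HashMap<String, u64> =
        (0..17_000).map(|i| (format!("aleo1delegator{i}"), 10_000_000u64)).collect();

    // Time only the JSON serialization, separately from any storage access.
    let start = Instant::now();
    let json = serde_json::to_string(&delegators).expect("serialization failed");
    println!("serialized {} bytes in {:?}", json.len(), start.elapsed());
}
```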

@iamalwaysuncomfortable (Contributor) commented Apr 23, 2024

This is awaiting a benchmark. We will perform this soon.

@howardwu (Contributor) commented:

What's the status on this PR? I see failing CIs across the board and no benchmarks.

@iamalwaysuncomfortable (Contributor) left a comment

A 6-hour network test was run against this PR.

Network Configuration

10 Validators: m5.8xlarge (32 vCPU, 128 GB RAM)
30 Clients: m5.4xlarge (16 vCPU, 64 GB RAM)
17,000 Delegators bonded to a single validator

Tests

The following tests were run in serial; a minimal request-cannon sketch follows the test list.

Test 1: 1 request cannon sent 100 requests at 1 request per second at a single client
Test 2: 30 request cannons attempting to send 100 blocking HTTP requests at 1 request per second against a single client
Test 3: 50 request cannons attempting to send 100 blocking HTTP requests at 1 request per second at the load balancer distributing traffic among all client nodes equally.
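
For context, a minimal request-cannon sketch along these lines is shown below; it assumes the `tokio` and `reqwest` crates, and the target URL and endpoint path are placeholders rather than the actual snarkOS route.

```rust
use std::time::{Duration, Instant};

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    // Placeholder target; point this at a client node's REST endpoint.
    let url = "http://127.0.0.1:3030/mainnet/delegators/aleo1examplevalidator";
    let client = reqwest::Client::new();
    let mut latencies = Vec::new();

    // Send 100 requests at roughly 1 request per second, waiting for each response.
    let mut ticker = tokio::time::interval(Duration::from_secs(1));
    for _ in 0..100 {
        ticker.tick().await;
        let start = Instant::now();
        let body = client.get(url).send().await?.bytes().await?;
        let elapsed = start.elapsed();
        println!("{} bytes in {:?}", body.len(), elapsed);
        latencies.push(elapsed);
    }

    let mean = latencies.iter().sum::<Duration>() / latencies.len() as u32;
    println!("mean latency: {:?}", mean);
    Ok(())
}
```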

Test 1 Results:

REQUEST STATISTICS:
  requests: 100
  request-cannons: 1
  avg response payload size: 1.4 mb

TIME:
  min: 4228 ms
  max: 6597 ms
  mean: 5460 ms
  std dev: 1234 ms
Number of Errors: 0
Number of Requests: 11

NOTES: 
This test caused the node to fall slightly behind its peers in sync rate during the test.

Test 2 Results

REQUEST STATISTICS:
  requests: 3000 (via 30 cannons)
  request-cannons: 30
  avg response payload size: 1.4 mb

TIME:
  request roundtrip min: 44483 ms
  request roundtrip max: 109171 ms
  request roundtrip mean: 52374 ms
  request roundtrip std dev: 19126 ms
Number of Requests: 100
Number of Errors: 0

NOTES: 
This test caused the node to stop syncing during the requests.

It also caused response times on calls to any other endpoint to extend into the range of `50000 ms`.

After the requests stopped the node continued to sync.

Test 3 Results

REQUEST STATISTICS:
  requests: 5000
  request cannons: 50
  avg response payload size: 1.4 mb

TIME:
  min: 4109 ms
  max: 25658 ms
  mean: 13243 ms
  std dev: 3712 ms

NOTES: This test caused the block sync speed of several client nodes to fall during the test. Block sync speed recovered after the test concluded. Sustained heavy load on this endpoint would likely degrade client performance significantly.

Conclusion

When there are a lot of delegators attached to a validator, this endpoint can make nodes fall out of sync and block other tasks. This isn't entirely surprising, given that querying RocksDB for this information is a compute-heavy task and the task is spawned onto the main tokio task queue.

This endpoint needs to be throttled heavily.

Recommendations

  1. Spawn this task onto the blocking threadpool via `spawn_blocking` (Addressed by commit 1c74880); see the sketch after this list.
  2. This endpoint should return an error if the block height is too far behind, since sending outdated committee data isn't a good idea (Addressed by commit 1c74880).
  3. Limit this endpoint to serving a maximum of 5 concurrent invocations of this task (not per IP, but 5 invocations total).
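
For illustration, here is a minimal sketch of how recommendations 1 and 3 might be wired into an axum handler (snarkOS's REST layer is axum-based); the state type, handler signature, and return type are simplified placeholders rather than the actual snarkOS code, and recommendation 2 corresponds to the `SYNC_LENIENCY` check visible in the review excerpt further below.

```rust
use std::sync::Arc;

use axum::{extract::State, http::StatusCode, Json};
use tokio::sync::Semaphore;

/// Simplified application state; the real snarkOS `Rest` state differs.
#[derive(Clone)]
struct AppState {
    /// Caps the endpoint at 5 concurrent invocations in total (recommendation 3).
    delegator_permits: Arc<Semaphore>,
}

async fn get_delegators_for_validator(
    State(state): State<AppState>,
) -> Result<Json<Vec<String>>, (StatusCode, String)> {
    // Reject the request outright if all permits are in use, rather than queueing it.
    let _permit = state
        .delegator_permits
        .try_acquire()
        .map_err(|_| (StatusCode::TOO_MANY_REQUESTS, "too many concurrent requests".to_string()))?;

    // Recommendation 1: run the heavy storage scan on the blocking threadpool so it
    // cannot stall the async runtime that drives syncing and networking.
    let delegators = tokio::task::spawn_blocking(|| {
        // Placeholder for the expensive walk over all bonded delegators in RocksDB.
        vec!["aleo1exampledelegator".to_string()]
    })
    .await
    .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;

    Ok(Json(delegators))
}
```

The semaphore would be created once, e.g. `Arc::new(Semaphore::new(5))`, and stored in the shared router state so that all requests count against the same limit.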

@vicsn (Contributor, Author) commented May 4, 2024

Thanks for pushing on this. Looks like a valid DoS vector if someone exposes their endpoints.

I think it's likely other REST endpoints (e.g. GET /<network>/block/{height}/transactions and GET /<network>/blocks?) would cause similar performance issues. If you agree, it might be good to tackle defenses which apply across multiple REST endpoints in a separate PR so it's easier to review.

Or perhaps our recommendation should be for clients with a public endpoint to have a very low general rate limit. Infura doesn't let you query geth directly either.

@iamalwaysuncomfortable (Contributor) left a comment

Another test was run on the latest changes.

It seems like these changes take care of the previous issues by not blocking the main thread and by denying traffic if the node falls too far behind in sync.

As is, the rate-limiting defaults should prevent DoS-ing.

Details of the tests are below.

Test 1

REQUEST STATISTICS:
  requests: 100
  request-cannons: 1
  avg response payload size: 1.6 mb


TIME:
  min: 4462 ms
  max: 5760 ms
  mean: 5007 ms
  std dev: 448 ms

NOTES: No degradation in block_sync time was seen

Test 2

REQUEST STATISTICS:
  requests: 3000
  request-cannons: 30
  avg response payload size: 1.6 mb

TIME:
  min: 6389 ms
  max: 30107 ms
  mean: 17809.54 ms
  std dev: 3377 ms


NOTES: The client under attack experienced lag in block sync rate. However, after experiencing the lag in sync rate, it denied further traffic and was then able to continue syncing. This lag-and-catch-up cycle continued for as long as the load did. After the load relented, the node was able to catch back up.

Test 3

REQUEST STATISTICS:
  requests: 5000
  request-cannons: 30
  avg response payload size: 1.6 mb

TIME:
  min: 12481 ms
  max: 50167 ms
  mean: 34967 ms
  std dev: 3249 ms

NOTES: Clients stayed at tip during the entire test

    Path(validator): Path<Address<N>>,
) -> Result<ErasedJson, RestError> {
    // Do not process the request if the node is too far behind to avoid sending outdated data.
    if rest.routing.num_blocks_behind() > SYNC_LENIENCY {
@vicsn (Contributor, Author):

Nit: can you update the doc comment on SYNC_LENIENCY?

Contributor:

What would you like it to say?

@vicsn (Contributor, Author) commented May 7, 2024

Don't have a specific preference, but for example:

/// The maximum number of blocks the client can be behind its latest peer before it skips
/// processing incoming transactions, solutions and GET delegator requests.

Or:

/// The maximum number of blocks the client can be behind its latest peer before it skips
/// processing incoming transactions, solutions and heavy REST requests.
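
Either way, the resulting declaration might look like the following; the numeric value is a placeholder for illustration, not the value snarkOS actually uses.

```rust
/// The maximum number of blocks the client can be behind its latest peer before it skips
/// processing incoming transactions, solutions and heavy REST requests.
const SYNC_LENIENCY: u32 = 10; // placeholder value for illustration
```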

@howardwu merged commit d6a441b into mainnet-staging on May 9, 2024
24 checks passed
@howardwu deleted the get_delegators_for_validator branch on May 9, 2024 at 00:04