getnetworksolps & getnetworkhashps RPCs hang with large num_blocks #6688

Closed
3 of 4 tasks
Tracked by #7366
teor2345 opened this issue May 15, 2023 · 6 comments
Labels

  • A-rpc (Area: Remote Procedure Call interfaces)
  • C-bug (Category: This is a bug)
  • C-security (Category: Security issues)
  • I-hang (A Zebra component stops responding to requests)
  • S-needs-triage (Status: A bug report needs triage)

Comments

@teor2345
Contributor

teor2345 commented May 15, 2023

Description

This bug is potentially remotely triggerable via a mining pool web or query interface, then via JSON-RPC to zebrad.

Some pool software could provide a query interface that takes arbitrary height ranges, or have a large default range. It wouldn't be a typical implementation, but there's a lot of weird mining software out there. If that happens, Zebra would hang.

This impacts Zebra with the getblocktemplate-rpcs feature enabled:

  • getnetworksolps & getnetworkhashps:
    • large num_blocks: hangs the RPC request rather than crashing, because it fetches the data for all the blocks in the range
    • large height: provides an incorrect answer, likely based on an incorrect block calculation (not a security issue)

Scheduling

This might be a blocker for asking mining pools to run Zebra on production mainnet miners. We might want to check with any pools that start running Zebra.

Steps to Reproduce

When I run zebrad on RPC port 28232 and send any of the following requests with a large num_blocks or height:

zcash-cli --rpcport=28232 getnetworksolps `python -c 'print(2**31 - 1)'`
zcash-cli --rpcport=28232 getnetworkhashps `python -c 'print(2**31 - 1)'`
zcash-cli --rpcport=28232 getnetworksolps 120 `python -c 'print(2**31 - 1)'`
zcash-cli --rpcport=28232 getnetworkhashps 120 `python -c 'print(2**31 - 1)'`
zcash-cli --rpcport=28232 getnetworksolps 1000000 2000000
zcash-cli --rpcport=28232 getnetworkhashps 1000000 2000000

Expected Behaviour

I expect the RPC to return an error or a valid value.

Actual Behaviour

Instead, the RPCs hang or return an invalid answer.

Analysis

This happens because we read the entire block from the state for every height in the RPC range (1.5 kB to 2 MB per block).

Instead, we could do these quick fixes:

  • only read the block headers (1.5 kB per header), and
  • put a limit on the length of the block range sent to the RPCs, and error otherwise
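
A minimal sketch of the second quick fix, the range limit, might look like the following. It is not Zebra's code: `MAX_BLOCK_RANGE`, `RangeError`, and `checked_block_range` are hypothetical names, and the limit value is an arbitrary placeholder rather than a number taken from Zebra or zcashd.

```rust
/// Hypothetical upper bound on the number of blocks one request may scan.
/// The value is a placeholder chosen for illustration only.
pub const MAX_BLOCK_RANGE: usize = 10_000;

/// Hypothetical error returned when a request is rejected.
#[derive(Debug, PartialEq, Eq)]
pub enum RangeError {
    /// The caller asked for more blocks than the limit allows.
    TooManyBlocks { requested: usize, limit: usize },
}

/// Returns `num_blocks` unchanged if it is within the limit,
/// or an error that the RPC layer can pass back to the caller.
pub fn checked_block_range(num_blocks: usize) -> Result<usize, RangeError> {
    if num_blocks > MAX_BLOCK_RANGE {
        return Err(RangeError::TooManyBlocks {
            requested: num_blocks,
            limit: MAX_BLOCK_RANGE,
        });
    }
    Ok(num_blocks)
}
```

Rejecting the request before any state read means an oversized range costs a single comparison instead of a scan over every block in the range.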

Testing

We should test that the range limit works using unit tests, and check that it also matches zcashd using zcash-rpc-diff.

We should test every RPC that accepts heights with a real state service and heights from 2^22 to 2^32. Some of these values are outside the valid height range, but some RPC methods don't check the valid height range. (The valid range might also increase in the future.) We could use a proptest for this.
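
As a rough illustration of the height coverage described above, the sketch below generates heights around each power of two from 2^22 to 2^32 and runs them through a stand-in check. It is a deterministic substitute for a proptest, not one. `ASSUMED_MAX_HEIGHT`, `check_height`, and `boundary_heights` are hypothetical names, and the maximum of 2^31 - 1 is an assumption made for this sketch. A real test would call the RPC methods against a state service instead of `check_height`.

```rust
/// Assumed maximum valid height for this sketch (2^31 - 1).
/// This is an assumption, not a value read from Zebra's source.
pub const ASSUMED_MAX_HEIGHT: u64 = (1 << 31) - 1;

/// Hypothetical height check: accept in-range heights and reject
/// everything else with an error instead of hanging or wrapping.
pub fn check_height(height: u64) -> Result<u32, String> {
    if height > ASSUMED_MAX_HEIGHT {
        return Err(format!(
            "height {} is above the maximum {}",
            height, ASSUMED_MAX_HEIGHT
        ));
    }
    // The cast is lossless because height <= 2^31 - 1.
    Ok(height as u32)
}

/// Heights just below, at, and just above each power of two
/// from 2^22 to 2^32.
pub fn boundary_heights() -> Vec<u64> {
    (22..=32u32)
        .flat_map(|exp| {
            let power = 1u64 << exp;
            [power - 1, power, power + 1]
        })
        .collect()
}
```

Each generated height should produce either a valid value or an error, which is the expected behaviour stated in this report.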

Environment

Zebra Version

2023-03-28T06:11:25.432588Z  INFO zebrad::application: Diagnostic Metadata:
version: 1.0.0-rc.6+16.g96e5b54
Zcash network: Mainnet
state version: 25
branch: main
git commit: 96e5b54
commit timestamp: 2023-03-28T04:13:26Z
target triple: x86_64-unknown-linux-gnu
build profile: release

Zebra Features

This only happens with getblocktemplate-rpcs.

Operating System

$ uname -a
Linux x 5.15.85 #1-NixOS SMP Wed Dec 21 16:36:38 UTC 2022 x86_64 GNU/Linux
@teor2345 added the C-bug, S-needs-triage, P-Low ❄️, C-security, I-hang, and A-rpc labels on May 15, 2023
@teor2345
Contributor Author

This is low priority because these RPCs are behind the getblocktemplate-rpcs feature.

@teor2345
Contributor Author

teor2345 commented Jun 29, 2023

I think I found the cause of this bug. We're loading entire blocks (up to 2 MB) just to read the difficulties from their headers:

let mut block_iter = any_ancestor_blocks(non_finalized_state, db, start_hash)
    .take(num_blocks.checked_add(1).unwrap_or(num_blocks))
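
One way to avoid loading whole blocks is to iterate over headers only and keep just the difficulty field. The sketch below illustrates that idea and is not the fix that was merged. `Header` and `recent_difficulties` are hypothetical stand-ins for Zebra's types and state calls. `saturating_add(1)` gives the same count as the `checked_add(1).unwrap_or(num_blocks)` expression quoted above, because the addition can only overflow when `num_blocks` is already `usize::MAX`.

```rust
/// Hypothetical stand-in for a block header.
/// Only the field this calculation needs is modelled.
#[derive(Clone, Debug)]
pub struct Header {
    pub difficulty: u64,
}

/// Collects difficulties from at most `num_blocks + 1` ancestor headers.
/// No transaction data is read, so the cost per height is bounded by
/// the header size rather than the block size.
pub fn recent_difficulties<I>(ancestor_headers: I, num_blocks: usize) -> Vec<u64>
where
    I: Iterator<Item = Header>,
{
    ancestor_headers
        // Saturate instead of overflowing when num_blocks is usize::MAX.
        .take(num_blocks.saturating_add(1))
        .map(|header| header.difficulty)
        .collect()
}
```

The iterator still stops when the chain runs out of ancestors, so a huge `num_blocks` is bounded by the chain length. That is why a separate range limit is still worth having.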

@mpguerra
Contributor

Hey team! Please add your planning poker estimate with Zenhub @arya2 @oxarbitrage @teor2345 @upbqdn

@teor2345
Contributor Author

I moved the optional parts of this ticket into #7403 before estimating it. Estimates are only for the quick fix.

@teor2345
Contributor Author

I added an extra test case with a large valid block range.

@teor2345
Contributor Author

We decided PR #7647 was enough to fix these bugs. If anyone needs good performance for large ranges, they can tell us.
