Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add scanblocks RPC call #20664

Closed

Conversation

jonasschnelli
Copy link
Contributor

@jonasschnelli jonasschnelli commented Dec 15, 2020

The scanblocks RPC call allows one to get relevant blockhashes from a set of descriptors by scanning all blockfilters in a given range.

Example:

scanblocks start '["addr(<bitcoin_address>)"]' 661000
(returns relevant blockhashes for <bitcoin_address> from blockrange 661000->tip)

Why is this useful?

Fast wallet rescans: get the relevant blocks and only rescan those via rescanblockchain getblockheader(<hash>)[height] getblockheader(<hash>)[height]). A future PR may add an option to allow to provide an array of blockhashes to rescanblockchain.

prune wallet rescans: (needs additional changes): together with a call to fetch blocks from the p2p network if they have been pruned, it would allow to rescan wallets back to the genesis block in pruned mode (relevant #15946).

SPV mode (needs additional changes): it would be possible to build the blockfilterindex from the p2p network (rather then deriving them from the blocks) and thus allow some sort of hybrid-SPV mode with moderate bandwidth consumption (related #9483)

@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 16, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #23549 (Add scanblocks RPC call (attempt 2) by jamesob)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link
Member

@Sjors Sjors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

When I call without a height param it doesn't actually scan (documentation suggests otherwise).

I was able to find some blocks used by addresses in my own wallet, both receiving on and sending from. If anyone wants to ACK #19136 I'll test with a ranged descriptor too.

Sadly the RPC call timed out, so it didn't return the result. You may need to add an abstraction like for rescanwallet where one can query rescans in progress, but also obtain the result later. That might be a lot of extra code though.

#20295 should also help with prune wallet rescans

Don't forget to drop Conflicts from the second commit message.

Can you also add a test for passing a descriptor object (with range)?

Other RPC methods that involve descriptors require a checksum; we should probably do that here as well.

For a followup it would be nice if descriptors are automatically expanded, so we don't need to specify a range. This comes with some headaches, see #15845

The documentation should remind the user about false positive block matches.

test/functional/rpc_scanblockfilters.py Outdated Show resolved Hide resolved
test/functional/rpc_scanblockfilters.py Outdated Show resolved Hide resolved
test/functional/rpc_scanblockfilters.py Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested concept ACK

Naming-wise, scanobjects, startheight and filtertype should probably be snakecase.

src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
test/functional/rpc_scanblockfilters.py Outdated Show resolved Hide resolved
@luke-jr
Copy link
Member

luke-jr commented Jan 3, 2021

Seems like block filters are just a potential implementation detail/optimisation here, not part of the fundamental concept...

Maybe rename it to scanblocks (and consider supporting a slow scan when the filters are disabled?)

@fjahr
Copy link
Contributor

fjahr commented Jan 6, 2021

Concept ACK

@jonasschnelli
Copy link
Contributor Author

Maybe rename it to scanblocks (and consider supporting a slow scan when the filters are disabled?)

I can do that. But the call doesn't scan blocks (hence your proposed name scanblocks). It looks for relevant blockhashes based on descriptors.
scanblocks doesn't sound after what this call is doing.

@flack
Copy link
Contributor

flack commented Jan 6, 2021

maybe call it findblockhashes then? Or findblocks even?

@Sjors
Copy link
Member

Sjors commented Jan 6, 2021

I think scanblocks is fine. Without filters it "scans blocks" and returns block hashes. With filter it doesn't scan raw block data, but it scans filters block by block.

@jonasschnelli
Copy link
Contributor Author

Without filters it "scans blocks" and returns block hashes. With filter it doesn't scan raw block data, but it scans filters block by block.

Wouldn't it be inconsistent to have scanblocks (general) in relation to rescanblockchain (wallet)?
But yes... using scanblock perhaps is acceptable.

@jonasschnelli jonasschnelli changed the title Add scanblockfilters RPC call Add scanblocks RPC call Jan 6, 2021
Copy link

@felipsoarez felipsoarez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK, not tested

@jonasschnelli jonasschnelli force-pushed the 2020/12/filterblocks_rpc branch 2 times, most recently from c6f0c5d to f22a048 Compare January 8, 2021 10:42
@jonasschnelli
Copy link
Contributor Author

Overhauled this PR. Fixed all the reported issues.

Added the same interface then scantxoutset (with action "start", "abort", "status"). Allows to get progress reports via a different RPC call (or abort the current scan). No concurrent scans possible.

Also added stop_height.

Some performance reports on an i7-8700 CPU @ 3.20GHz with NVme disk:

  • Scanning an pkh xpub (m/0/* & m/1/*, range 0-220 == 440 sPUk) on mainnet from genesis to tip: 2m19.176s
  • Same xpub, scan only the last three years: 1m0.729s
  • Same xpub, "catch up"-scan the last 5 weeks (scan the 5040 latest blocks): 0m2.465s

Single sPUk:

  • Scanning an single address on mainnet from genesis to tip: 1m20.258s
  • Same address, scan only the last three years: 0m41.519s
  • Same address, "catch up"-scan the last 5 weeks: 0m1.744s

@Sjors
Copy link
Member

Sjors commented Jan 8, 2021

Thanks for the update! Unfortunately unlike rescanblockchain we actually need the result of this call. Just having status is not enough to obtain the result if RPC times out. I think we need to store the result somewhere too, perhaps in a way that you can only read once. However if that's too difficult we could instead recommend that the user pays attention to the start_height and stop_height and calls the RPC in batches for very old wallets.

@jonasschnelli
Copy link
Contributor Author

Thanks for the update! Unfortunately unlike rescanblockchain we actually need the result of this call. Just having status is not enough to obtain the result if RPC times out.

I think we can handle this the same way as scantxoutset (both call are in the same group of how much time they consume).
You just need to set the rpctimeout correctly (as one has to do for the three waitfor* calls).

I thought about adding dispatched threads. start could return a UUID where status and abort would take this as an argument. But it seems overengineering this.

Let's just keep it identical to scantxoutset (they are pretty much identical, one scan the blocks, the other the utxo set).

@Sjors
Copy link
Member

Sjors commented Jan 8, 2021

Having long running tasks that you can manage via a UUID does sound useful, but I agree it would be overkill in this PR.

luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request Jan 28, 2021
@jonasschnelli
Copy link
Contributor Author

Rebased

},
"[scanobjects,...]"},
{"start_height", RPCArg::Type::NUM, /*default*/ "0", "height to start to filter from"},
{"stop_height", RPCArg::Type::NUM, /*default*/ "<tip>", "height to stop to scan"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above.

{
return RPCHelpMan{"scanblocks",
"\nReturn relevant blockhashes for given descriptors.\n"
"This call may take serval minutes. Make sure to use no RPC timeout (bitcoin-cli -rpcclienttimeout=0)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"This call may take serval minutes. Make sure to use no RPC timeout (bitcoin-cli -rpcclienttimeout=0)",
"This call may take several minutes. Make sure to use no RPC timeout (bitcoin-cli -rpcclienttimeout=0)",

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #23549


// loop through the scan objects, add scripts to the needle_set
GCSFilter::ElementSet needle_set;
for (const UniValue& scanobject : request.params[1].get_array().getValues()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my information: This array seems to unbounded. The bound is probably some maximum size of RPC requests in general. Anyway, I wonder whether it is an issue or not. I guess not because nobody mentioned it here. Could anybody elaborate on this a bit, please?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only subject to authenticated RPC user input and isn't handled in previous RPC calls (e.g. scantxoutset) so if someone wants to DoS their own node with a massive input array that seems okay to me.

std::vector<BlockFilter> filters;
const CBlockIndex* start_block = block; // for progress reporting
const CBlockIndex* last_scanned_block = block;
g_scanfilter_should_abort_scan = false;
Copy link
Contributor

@kiminuo kiminuo Apr 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my information: Would it make sense to move this line to L2414 and to make the for-loop on L2417 abortable too? Or is it an overkill? I'm not sure how long the for loop can take for "many scanobjects".

If the for-loop can take a long time then this g_scanfilter_should_abort_scan = false; assignment may mean that a user abort request might be ignored.

Copy link
Member

@jamesob jamesob Nov 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is overkill IMO; even for a long list, EvalDescriptorStringOrObject should be quite fast since it doesn't rely on any disk IO, unlike the calls below.

# Copyright (c) 2021 The Bitcoin Core developers
# Distributed under the MIT software license, see the accompanying
# file COPYING or http://www.opensource.org/licenses/mit-license.php.
"""Test the scanblocks rpc call."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: What about:

Suggested change
"""Test the scanblocks rpc call."""
"""Test the scanblocks RPC call."""

?

It looks like it is mostly written in this way in tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #23549

from test_framework.test_framework import BitcoinTestFramework
from test_framework.util import assert_equal, assert_raises_rpc_error

class scanblocksTest(BitcoinTestFramework):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
class scanblocksTest(BitcoinTestFramework):
class ScanblocksTest(BitcoinTestFramework):

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #23549

{"stop_height", RPCArg::Type::NUM, /*default*/ "<tip>", "height to stop to scan"},
{"filtertype", RPCArg::Type::STR, /*default*/ "basic", "The type name of the filter"}
},
RPCResult{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 6a69dd2 "Add scanblocks RPC call - scan for relevant blocks with descriptors"

This RPCResult does not reflect the actual results.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #23549

block = next;
}
ret.pushKV("from_height", start_block->nHeight);
ret.pushKV("to_height", last_scanned_block->nHeight);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 6a69dd2 "Add scanblocks RPC call - scan for relevant blocks with descriptors"

I agree with @fjahr here. last_scanned_block isn't actually the last block we have scanned, and if we abort, then to_height does not accurately reflect where we have actually scanned up to. It would be better to move it up into the if the guards the actual scanning. But in that case, it's the same as g_scanfilter_progress_height, so it can just be removed.

self.nodes[0].sendtoaddress("mkS4HXoTYWRTescLGaUTGbtTTYX5EjJyEE", 1.0) # childkey 5 of tpubD6NzVbkrYhZ4WaWSyoBvQwbpLkojyoTZPRsgXELWz3Popb3qkjcJyJUGLnL4qHHoQvao8ESaAstxYSnhyswJ76uZPStJRJCTKvosUCJZL5B

# mine a block and assure that the mined blockhash is in the filterresult
blockhash = self.nodes[0].generate(1)[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 71b7cdb "Add scanblock functional test"

Since the indexes are built asynchronously, it is possible that after generating the block, the block filter index has not updated to include the new block, so scanblocks might not find anything. Instead, we need to wait for it to become synced:

Suggested change
blockhash = self.nodes[0].generate(1)[0]
blockhash = self.nodes[0].generate(1)[0]
self.wait_until(lambda: all(i["synced"] for i in self.nodes[0].getindexinfo().values()))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #23549

@theStack
Copy link
Contributor

Concept ACK

There have been various code reviews but no changes since months. @jonasschnelli: are you still working on this PR? (Just checking out before I start to code-review).

@maflcko
Copy link
Member

maflcko commented Jul 31, 2021

Removed from high-prio for now, but can be added back any time.

@Kirandevraj
Copy link

Compiled and tested Successfully 6a69dd2 at Ubuntu 18.04.
Started Bitcoin Core in regtest with keypool=5. Created and mined blocks to addresses in descriptor wallet after creating addresses more than the keypool size using getnewaddress. Scanned blocks for all the addresses using scanblocks RPC and were able to get the right blocks that contained the transactions in it for all the addresses.

Copy link
Member

@darosior darosior left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

@jamesob
Copy link
Member

jamesob commented Oct 5, 2021

Concept ACK. Would love to see this, or any other mechanism that allows performing wallet rescans in ~2min on good hardware, make it in.

@luke-jr
Copy link
Member

luke-jr commented Oct 5, 2021

Rebased here: master...luke-jr:rpc_filterblocks

@maflcko
Copy link
Member

maflcko commented Oct 6, 2021

Should this be marked "up for grabs" @jonasschnelli ?

# also test the stop height
assert(blockhash in self.nodes[0].scanblocks("start", ["addr("+addr_1+")"], self.nodes[0].getblockheader(blockhash)['height'], self.nodes[0].getblockheader(blockhash)['height'])['relevant_blocks'])

# use the stop_height to exclude the relevent block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

relevent ==> relevant

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #23549

@jamesob
Copy link
Member

jamesob commented Oct 10, 2021

I haven't started review yet, but given @MarcoFalke has done some previous work on this, we should ensure that this change isn't subject to the bugs he points out here: #15845 (comment)

@jamesob
Copy link
Member

jamesob commented Nov 18, 2021

For what it's worth, I'm working on resurrecting this PR. Hoping to have a branch pushed by end of day.

@meshcollider
Copy link
Contributor

Please see #23549

@jamesob jamesob mentioned this pull request Dec 9, 2021
4 tasks
achow101 added a commit that referenced this pull request Oct 13, 2022
626b7c8 fuzz: add scanblocks as safe for fuzzing (James O'Beirne)
94fe545 test: rpc: add scanblocks functional test (Jonas Schnelli)
6ef2566 rpc: add scanblocks - scan for relevant blocks with descriptors (Jonas Schnelli)
a4258f6 rpc: move-only: consolidate blockchain scan args (James O'Beirne)

Pull request description:

  Revives #20664. All feedback from the previous PR has either been responded to inline or incorporated here.

  ---

  Major changes from Jonas' PR:
  - consolidated arguments for scantxoutset/scanblocks
  - substantial cleanup of the functional test

  Here's the range-diff (`git range-diff master jonasschnelli/2020/12/filterblocks_rpc jamesob/2021-11-scanblocks`): https://gist.github.com/jamesob/aa4a975344209f0316444b8de2ec1d18

  ### Original PR description

  > The `scanblocks` RPC call allows one to get relevant blockhashes from a set of descriptors by scanning all blockfilters in a given range.
  >
  > **Example:**
  >
  > `scanblocks start '["addr(<bitcoin_address>)"]' 661000` (returns relevant blockhashes for `<bitcoin_address>` from blockrange 661000->tip)
  >
  > ## Why is this useful?
  > **Fast wallet rescans**: get the relevant blocks and only rescan those via `rescanblockchain getblockheader(<hash>)[height] getblockheader(<hash>)[height])`. A future PR may add an option to allow to provide an array of blockhashes to `rescanblockchain`.
  >
  > **prune wallet rescans**: (_needs additional changes_): together with a call to fetch blocks from the p2p network if they have been pruned, it would allow to rescan wallets back to the genesis block in pruned mode (relevant #15946).
  >
  > **SPV mode** (_needs additional changes_): it would be possible to build the blockfilterindex from the p2p network (rather then deriving them from the blocks) and thus allow some sort of hybrid-SPV mode with moderate bandwidth consumption (related #9483)

ACKs for top commit:
  furszy:
    diff re-ACK 626b7c8

Tree-SHA512: f84e4dcb851b122b39e9700c58fbc31e899cdcf9b587df9505eaf1f45578cc4253e89ce2a45d1ff21bd213e31ddeedbbcad2c80810f46755b30acc17b07e2873
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Oct 23, 2022
626b7c8 fuzz: add scanblocks as safe for fuzzing (James O'Beirne)
94fe545 test: rpc: add scanblocks functional test (Jonas Schnelli)
6ef2566 rpc: add scanblocks - scan for relevant blocks with descriptors (Jonas Schnelli)
a4258f6 rpc: move-only: consolidate blockchain scan args (James O'Beirne)

Pull request description:

  Revives bitcoin#20664. All feedback from the previous PR has either been responded to inline or incorporated here.

  ---

  Major changes from Jonas' PR:
  - consolidated arguments for scantxoutset/scanblocks
  - substantial cleanup of the functional test

  Here's the range-diff (`git range-diff master jonasschnelli/2020/12/filterblocks_rpc jamesob/2021-11-scanblocks`): https://gist.github.com/jamesob/aa4a975344209f0316444b8de2ec1d18

  ### Original PR description

  > The `scanblocks` RPC call allows one to get relevant blockhashes from a set of descriptors by scanning all blockfilters in a given range.
  >
  > **Example:**
  >
  > `scanblocks start '["addr(<bitcoin_address>)"]' 661000` (returns relevant blockhashes for `<bitcoin_address>` from blockrange 661000->tip)
  >
  > ## Why is this useful?
  > **Fast wallet rescans**: get the relevant blocks and only rescan those via `rescanblockchain getblockheader(<hash>)[height] getblockheader(<hash>)[height])`. A future PR may add an option to allow to provide an array of blockhashes to `rescanblockchain`.
  >
  > **prune wallet rescans**: (_needs additional changes_): together with a call to fetch blocks from the p2p network if they have been pruned, it would allow to rescan wallets back to the genesis block in pruned mode (relevant bitcoin#15946).
  >
  > **SPV mode** (_needs additional changes_): it would be possible to build the blockfilterindex from the p2p network (rather then deriving them from the blocks) and thus allow some sort of hybrid-SPV mode with moderate bandwidth consumption (related bitcoin#9483)

ACKs for top commit:
  furszy:
    diff re-ACK 626b7c8

Tree-SHA512: f84e4dcb851b122b39e9700c58fbc31e899cdcf9b587df9505eaf1f45578cc4253e89ce2a45d1ff21bd213e31ddeedbbcad2c80810f46755b30acc17b07e2873
@bitcoin bitcoin locked and limited conversation to collaborators Dec 3, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet