Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add address-based index (attempt 4?) #14053

Closed
wants to merge 4 commits into from

Conversation

marcinja
Copy link
Contributor

@marcinja marcinja commented Aug 24, 2018

Adds index to transactions by scriptPubKey. Based off of #2802. Stores hashes of scripts (hashed using Murmurhash3, with a hash seed that is stored in the index database), along with the COutPoint for the output which was spent/created, and the CDiskTxPos for the transaction in which this happened.

@marcinja marcinja mentioned this pull request Aug 24, 2018
@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 24, 2018

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed most of the code, but just skimmed tests. It looks to me like this PR could be merged basically in its current form, so I'm curious if you're intending to make the improvements cited in the PR description above here or in a separate PRs.

I left a few minor suggestions about the code, which you should feel free to ignore. The only changes I would definitely like to see here are:

  • adding some python code in test/functional/ to call the new rpc method
  • adding a blurb in doc/release notes.md to describe the feature and maybe mention some use-cases

src/index/base.h Outdated Show resolved Hide resolved
std::unique_ptr<AddrIndex> g_addrindex;

/**
* Access to the addrindex database (indexes/addrindex/)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Introduce address index" (3c7cc3c)

Note: new index/addrindex.cpp, index/addrindex.h, and test/addrindex_tests.cpp files in this commit mirror existing index/txindex.cpp and index/txindex.h, test/txindex_tests.cpp files and have some code and comments in common. It can help to diff the addr files against the tx files when reviewing this PR.

src/index/addrindex.cpp Outdated Show resolved Hide resolved
src/index/addrindex.cpp Outdated Show resolved Hide resolved
src/index/addrindex.cpp Outdated Show resolved Hide resolved
src/index/addrindex.cpp Outdated Show resolved Hide resolved
src/index/addrindex.cpp Outdated Show resolved Hide resolved
src/rpc/rawtransaction.cpp Outdated Show resolved Hide resolved
src/init.cpp Outdated Show resolved Hide resolved
"\nArguments:\n"
"1. \"address\" (string, required) The address to search for\n"
"2. \"verbose\" (bool, optional, default = false) If set to false, only returns data for hex-encoded `txid`s. \n"
"3. \"skip\" (numeric, optional, default = 0) If set, the result skips this number of initial values. \n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skip and count probably make sense on an options object instead.

src/rpc/rawtransaction.cpp Outdated Show resolved Hide resolved
src/index/addrindex.cpp Outdated Show resolved Hide resolved
src/index/addrindex.cpp Outdated Show resolved Hide resolved
src/index/addrindex.cpp Outdated Show resolved Hide resolved
src/index/addrindex.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@jimpo jimpo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I'm glad someone's working on this. Concept ACK.

  1. The AddrIndex should return information about the outpoints and differentiate between outputs and spends, not just return the raw transactions. In fact, the AddrIndex could just return outpoints, then the client code could use the TxIndex could to fetch the tx bodies. It'd involve a separate lookup though.
  2. I don't think it's necessary to delete keys from the database when a block is disconnected. There's no harm in leaving it. The higher level methods to search the index can then filter for results that are on the main chain if that's what the client wants. It'd have to do this anyway to avoid races with reorgs and such.
  3. I'm worried about collisions on address IDs because they are only 64 bits. I can think of three options, 1) use a 32 byte cryptographic hash, 2) use a 20 byte cryptographic hash of the script plus some randomly generated, database-global salt, 3) use a 32- or 64-bit non-cryptographic hash (might as well use Murmur3 or SipHash, not truncated SHA256), then store the full script as the database value to double check against. Option 3 feels best to me.
  4. What's the purpose of having the first byte of the block hash as the value? It doesn't seem robust nor particularly useful.

src/index/addrindex.cpp Outdated Show resolved Hide resolved
src/index/addrindex.cpp Outdated Show resolved Hide resolved
src/index/addrindex.h Outdated Show resolved Hide resolved
src/index/addrindex.cpp Outdated Show resolved Hide resolved
src/index/addrindex.cpp Outdated Show resolved Hide resolved
src/index/base.cpp Outdated Show resolved Hide resolved
src/index/base.h Outdated Show resolved Hide resolved
src/index/base.h Outdated Show resolved Hide resolved
src/index/addrindex.h Outdated Show resolved Hide resolved
src/index/addrindex.cpp Outdated Show resolved Hide resolved
@marcinja
Copy link
Contributor Author

Thanks for all the reviews.

To answer some of @jimpo's questions:
2 & 4. I included part of the block hash so that in BlockDisconnected we ae sure to remove the entries in the index from this block only (that's where filter_by_value is used). The reason I chose to remove entries from the database is to prevent reading into a block file using an old CDiskTxPos that may no longer be a valid position. Otherwise in FindTxsByScript you could run into errors. You're right that this problem would be better handled by higher level methods.

I think that returning just the outpoint is a better idea than the current choice so I'll switch to that and try to incorporate all the other feedback here.

@Talkless
Copy link

@marcinja will this allow other wallets like Electrum to utilize this feature (RPC credentials provided, of course) and avoid having to run ElecturmX (https://github.com/spesmilo/electrumx), EPS (https://github.com/chris-belcher/electrum-personal-server) or BWT (https://github.com/shesek/bwt) intermediary software?

@ecdsa @SomberNight @chris-belcher @shesek could you provide your input on how this feature might be useful (or not)?

@romanz
Copy link
Contributor

romanz commented Dec 13, 2020

The main issue AFAIU is that Electrum is using SHA256(scriptPubKey) while this PR is using MurmurHash3(scriptPubKey).
Also, ElectrumX & electrs are using RocksDB for the index storage - resulting in better performance and disk usage (compared to LevelDB).

@SomberNight
Copy link
Contributor

@Talkless note that while some Electrum users run their own bitcoind, many do not. Electrum wants to support both use cases, and in fact the suspicion is that most users just use a public server. When using a public server, the client cannot use bitcoind RPC, hence in that case I don't see how the middleware (e.g. ElectrumX) could be avoided.

For the own bitcoind use case, maybe the client could have another optional mode of operation where it uses bitcoind RPC directly, which is I guess what you are asking about. For that, an address-index in bitcoind is the main thing missing indeed, however not the only one. For one, the electrum protocol (the client<->"middleware server" connection) has address subscriptions - the client gets a notification when a history of one its addresses changes. We are also planning on soon adding another method into the protocol that allows txoutpoint->spender_txid lookups (and notifications); I guess that could be implemented using the index in this PR albeit in a very inefficient way for heavily reused addresses.

IMHO there are multiple upsides for having this middleware setup for Electrum:

  • for the project, it keeps the codebase simpler (again, we want to support users without own bitcoind)
  • for the project, it also allows for more flexibility for implementing new functionality: just consider present PR here, we have needed such an index for 9 years but bitcoind did not have it or want it, so we could just implement it ourselves
  • for the server operator, even if they don't want to open up the server for the public, they could share it with their friends and family. I don't think that's feasible with bitcoind RPC. I think this is a common use case.

Nevertheless if someone steps up and contributes patches, this kind of thing could be added.


The total size of the address index right now is 223GB

That sounds much larger than expected.
Even with the txoutpoint->spender_txid map I mentioned above, when using LevelDB, ElectrumX uses around 90 GiB of disk space.

The DB keys are structured as follows: <addr_id, key_type, outpoint>
The DB values are simply: <CDiskTxPos, CScript>

Instead of having both addr_id and CScript, why not just put a long hash, e.g. sha256(CScript) into the key?

Also, I expect most users of this index would also want txindex enabled. You might want to consider making address index dependent on txindex. Have you investigated how much space that would save? There would be no need to store CDiskTxPos.

Another trick that ElectrumX uses is that only the best chain is indexed. We have a tx_num->txid map as raw files on disk. A tx_num is the index of the transaction in the linear history of the blockchain, a 5-byte integer (so e.g. the genesis block coinbase tx has tx_num=0). This map uses around 17 GiB atm. With this, you can encode the txid part of the outpoint as 5 bytes instead of 32 bytes.

@maflcko
Copy link
Member

maflcko commented Dec 17, 2020

Concept ACK (might have already done that)

@sipa
Copy link
Member

sipa commented Dec 17, 2020

I'm concept -0 on this.

My primary objection is that I think it's a bad idea for any infrastructure to be built all that relies on having fully-indexed blockchain data available (this also applies to txindex, but we can't just remove support...).

However, it seems many people want something like this, and are going to use it anyway. The question is then whether it belongs in the bitcoin-core codebase. Alternative, and more performant presumably, like electrs exist already too, so it isn't exactly impossible to do this elsewhere.

Still, given that we now have the indexes infrastructure, it means that things like this are easy to add in a fairly modular way without invading consensus code. So if people really want this, fine.

Overall approach comment: I don't think MurmurHash should be used for anything new; there are strictly better hash functions available. I'd suggest SipHash if that's fast enough.

@jamesob
Copy link
Member

jamesob commented Dec 17, 2020

I'm also a little more negative on having this in Core than I previously was. After working in a few industrial contexts on wallet stuff, it's clear to me that an address index is really only required if you want to implement a block explorer or do chain analysis. For both of these applications, using something like electrs seems sufficient.

For personal wallet management, a full address index is not required. I think the origin of some confusion is that things like the Electrum Personal Server have become synonymous with this kind of usage, but in reality a full index is overkill when descriptor-specific rescans can be done once for a historical backfill and then per-block scanning can be done from there on out.

I want to point out that this is a nice implementation and good work by @marcinja, but I'm leaning slightly against the inclusion of such an index in Core at this point.

@jonasschnelli
Copy link
Contributor

My primary objection is that I think it's a bad idea for any infrastructure to be built that relies on having fully-indexed blockchain data available.

I agree on this.

IMO the only use cases to ever use a full address index are:

    1. Instant seed/xpriv backup recovery including spent history
    1. Backend service for thousands of wallets
    1. Debug/explore purposes

1 (instant backup recovery) could be solved with either scantxoutset (take a minute or two) or by an address-index for the utxo set only. But both would not restore the spent history.
A scalable non-enterprise solution to restore the spent history is using blockfilters. Scan through the filters and rescan only the relevant blocks (a matter of minutes), see #20664.

2 (a backend for thousands of wallets): out of scope for this project.

3 (explore purposes): I think this is a valid use case. Though adding this PR to Bitcoin Core will lead to many many projects using it in production increasing the traffic in this project and eventually steal time from existing contributors (rebase, maintenance, drag-along)

My main fear is that people are going to use this index (a full address index) to use it as an electrum(ish) backend for a handful of wallets.

With multiwallet, watch-only-wallets, PSBT, we have all tools to server multiple wallets in a scalable way for external applications.

I also think merging this as it is, would be in contradiction to the process- and repository-separation effort.

Therefore I'm ~0 (slightly towards NACK) to add this.
If this would be in another repository (still under bitcoin/*) and process separated, I would ACK it.

@marcinja
Copy link
Contributor Author

marcinja commented Jan 4, 2021

Hi all, thanks for the feedback and review. This was an enjoyable PR to work on and I learned a lot from all your comments.

I'm closing this PR because its size probably requires stronger support from contributors to get in. It also seems more clear now that all of the practical use-cases are covered by existing features and some lightweight alternatives (#20664) .

I also agree that it would be bad to incentivize using an address index to support an external electrum wallet, when it's not the intended use-case and would cause unnecessary burden on contributors and maintainers in this project, e.g. from users of those wallets wanting new features or updates.

@marcinja marcinja closed this Jan 4, 2021
@jonatack
Copy link
Contributor

jonatack commented Jan 4, 2021

@marcinja thank you and I hope to see more contributions of this quality from you.

vijaydasmp pushed a commit to vijaydasmp/dash that referenced this pull request Sep 15, 2021
8ed2f1e Remove unused includes (Marcin Jachymiak)
cf095a5 Move comment about BaseIndex::DB from TxIndex::DB (Marcin Jachymiak)

Pull request description:

  Moves a comment about the `BaseIndex::DB` from the `TxIndex::DB` into the correct place. Originally part of bitcoin#14053.

ACKs for top commit:
  fanquake:
    ACK 8ed2f1e

Tree-SHA512: cb4e2b916c7ab996961cc2e1d910bc4b8a1700eb32b70fc1657ca720117a7a84f7337fe5e4fb30e047aa92c31eaa976eaaa5cb8f861877f2ff6f4a59bb94f4e9
vijaydasmp pushed a commit to vijaydasmp/dash that referenced this pull request Sep 16, 2021
8ed2f1e Remove unused includes (Marcin Jachymiak)
cf095a5 Move comment about BaseIndex::DB from TxIndex::DB (Marcin Jachymiak)

Pull request description:

  Moves a comment about the `BaseIndex::DB` from the `TxIndex::DB` into the correct place. Originally part of bitcoin#14053.

ACKs for top commit:
  fanquake:
    ACK 8ed2f1e

Tree-SHA512: cb4e2b916c7ab996961cc2e1d910bc4b8a1700eb32b70fc1657ca720117a7a84f7337fe5e4fb30e047aa92c31eaa976eaaa5cb8f861877f2ff6f4a59bb94f4e9
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Sep 17, 2021
7668db3 Move only: Move CDiskTxPos to its own file (Marcin Jachymiak)

Pull request description:

  Moves `CDiskTxPos` it its own file so it can be used without the `txindex.h` include elsewhere. Originally part of bitcoin#14053.

ACKs for top commit:
  jnewbery:
    utACK 7668db3
  promag:
    ACK 7668db3.

Tree-SHA512: b108e980ad04e43d1323410c3683a82bed70aee7795f5d8a2afbaf32a07ba598571f00b047bdde15048124b17178bcbd10654c48461beac988e9643cb2df664c
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Sep 17, 2021
8ed2f1e Remove unused includes (Marcin Jachymiak)
cf095a5 Move comment about BaseIndex::DB from TxIndex::DB (Marcin Jachymiak)

Pull request description:

  Moves a comment about the `BaseIndex::DB` from the `TxIndex::DB` into the correct place. Originally part of bitcoin#14053.

ACKs for top commit:
  fanquake:
    ACK 8ed2f1e

Tree-SHA512: cb4e2b916c7ab996961cc2e1d910bc4b8a1700eb32b70fc1657ca720117a7a84f7337fe5e4fb30e047aa92c31eaa976eaaa5cb8f861877f2ff6f4a59bb94f4e9
vijaydasmp pushed a commit to vijaydasmp/dash that referenced this pull request Sep 20, 2021
7668db3 Move only: Move CDiskTxPos to its own file (Marcin Jachymiak)

Pull request description:

  Moves `CDiskTxPos` it its own file so it can be used without the `txindex.h` include elsewhere. Originally part of bitcoin#14053.

ACKs for top commit:
  jnewbery:
    utACK 7668db3
  promag:
    ACK 7668db3.

Tree-SHA512: b108e980ad04e43d1323410c3683a82bed70aee7795f5d8a2afbaf32a07ba598571f00b047bdde15048124b17178bcbd10654c48461beac988e9643cb2df664c
vijaydasmp pushed a commit to vijaydasmp/dash that referenced this pull request Sep 20, 2021
8ed2f1e Remove unused includes (Marcin Jachymiak)
cf095a5 Move comment about BaseIndex::DB from TxIndex::DB (Marcin Jachymiak)

Pull request description:

  Moves a comment about the `BaseIndex::DB` from the `TxIndex::DB` into the correct place. Originally part of bitcoin#14053.

ACKs for top commit:
  fanquake:
    ACK 8ed2f1e

Tree-SHA512: cb4e2b916c7ab996961cc2e1d910bc4b8a1700eb32b70fc1657ca720117a7a84f7337fe5e4fb30e047aa92c31eaa976eaaa5cb8f861877f2ff6f4a59bb94f4e9
vijaydasmp pushed a commit to vijaydasmp/dash that referenced this pull request Sep 20, 2021
7668db3 Move only: Move CDiskTxPos to its own file (Marcin Jachymiak)

Pull request description:

  Moves `CDiskTxPos` it its own file so it can be used without the `txindex.h` include elsewhere. Originally part of bitcoin#14053.

ACKs for top commit:
  jnewbery:
    utACK 7668db3
  promag:
    ACK 7668db3.

Tree-SHA512: b108e980ad04e43d1323410c3683a82bed70aee7795f5d8a2afbaf32a07ba598571f00b047bdde15048124b17178bcbd10654c48461beac988e9643cb2df664c
vijaydasmp pushed a commit to vijaydasmp/dash that referenced this pull request Sep 20, 2021
8ed2f1e Remove unused includes (Marcin Jachymiak)
cf095a5 Move comment about BaseIndex::DB from TxIndex::DB (Marcin Jachymiak)

Pull request description:

  Moves a comment about the `BaseIndex::DB` from the `TxIndex::DB` into the correct place. Originally part of bitcoin#14053.

ACKs for top commit:
  fanquake:
    ACK 8ed2f1e

Tree-SHA512: cb4e2b916c7ab996961cc2e1d910bc4b8a1700eb32b70fc1657ca720117a7a84f7337fe5e4fb30e047aa92c31eaa976eaaa5cb8f861877f2ff6f4a59bb94f4e9
vijaydasmp pushed a commit to vijaydasmp/dash that referenced this pull request Sep 22, 2021
7668db3 Move only: Move CDiskTxPos to its own file (Marcin Jachymiak)

Pull request description:

  Moves `CDiskTxPos` it its own file so it can be used without the `txindex.h` include elsewhere. Originally part of bitcoin#14053.

ACKs for top commit:
  jnewbery:
    utACK 7668db3
  promag:
    ACK 7668db3.

Tree-SHA512: b108e980ad04e43d1323410c3683a82bed70aee7795f5d8a2afbaf32a07ba598571f00b047bdde15048124b17178bcbd10654c48461beac988e9643cb2df664c
vijaydasmp pushed a commit to vijaydasmp/dash that referenced this pull request Sep 22, 2021
8ed2f1e Remove unused includes (Marcin Jachymiak)
cf095a5 Move comment about BaseIndex::DB from TxIndex::DB (Marcin Jachymiak)

Pull request description:

  Moves a comment about the `BaseIndex::DB` from the `TxIndex::DB` into the correct place. Originally part of bitcoin#14053.

ACKs for top commit:
  fanquake:
    ACK 8ed2f1e

Tree-SHA512: cb4e2b916c7ab996961cc2e1d910bc4b8a1700eb32b70fc1657ca720117a7a84f7337fe5e4fb30e047aa92c31eaa976eaaa5cb8f861877f2ff6f4a59bb94f4e9
kwvg pushed a commit to kwvg/dash that referenced this pull request Oct 12, 2021
7668db3 Move only: Move CDiskTxPos to its own file (Marcin Jachymiak)

Pull request description:

  Moves `CDiskTxPos` it its own file so it can be used without the `txindex.h` include elsewhere. Originally part of bitcoin#14053.

ACKs for top commit:
  jnewbery:
    utACK 7668db3
  promag:
    ACK 7668db3.

Tree-SHA512: b108e980ad04e43d1323410c3683a82bed70aee7795f5d8a2afbaf32a07ba598571f00b047bdde15048124b17178bcbd10654c48461beac988e9643cb2df664c
kwvg pushed a commit to kwvg/dash that referenced this pull request Oct 12, 2021
8ed2f1e Remove unused includes (Marcin Jachymiak)
cf095a5 Move comment about BaseIndex::DB from TxIndex::DB (Marcin Jachymiak)

Pull request description:

  Moves a comment about the `BaseIndex::DB` from the `TxIndex::DB` into the correct place. Originally part of bitcoin#14053.

ACKs for top commit:
  fanquake:
    ACK 8ed2f1e

Tree-SHA512: cb4e2b916c7ab996961cc2e1d910bc4b8a1700eb32b70fc1657ca720117a7a84f7337fe5e4fb30e047aa92c31eaa976eaaa5cb8f861877f2ff6f4a59bb94f4e9
gades pushed a commit to cosanta/cosanta-core that referenced this pull request May 2, 2022
7668db3 Move only: Move CDiskTxPos to its own file (Marcin Jachymiak)

Pull request description:

  Moves `CDiskTxPos` it its own file so it can be used without the `txindex.h` include elsewhere. Originally part of bitcoin#14053.

ACKs for top commit:
  jnewbery:
    utACK 7668db3
  promag:
    ACK 7668db3.

Tree-SHA512: b108e980ad04e43d1323410c3683a82bed70aee7795f5d8a2afbaf32a07ba598571f00b047bdde15048124b17178bcbd10654c48461beac988e9643cb2df664c
gades pushed a commit to cosanta/cosanta-core that referenced this pull request May 2, 2022
8ed2f1e Remove unused includes (Marcin Jachymiak)
cf095a5 Move comment about BaseIndex::DB from TxIndex::DB (Marcin Jachymiak)

Pull request description:

  Moves a comment about the `BaseIndex::DB` from the `TxIndex::DB` into the correct place. Originally part of bitcoin#14053.

ACKs for top commit:
  fanquake:
    ACK 8ed2f1e

Tree-SHA512: cb4e2b916c7ab996961cc2e1d910bc4b8a1700eb32b70fc1657ca720117a7a84f7337fe5e4fb30e047aa92c31eaa976eaaa5cb8f861877f2ff6f4a59bb94f4e9
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet