Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(chain): calculate DescriptorId as the sha256 hash of spk at index 0 #1486

Merged

Conversation

notmandatory
Copy link
Member

@notmandatory notmandatory commented Jun 24, 2024

Description

Rename DescriptorId to KeychainId and descriptor_id() to keychain_id().

Calculate keychain ids as the hash of the spk derived from its descriptor as index 0.

Added docs to Wallet and KeychainTxOutIndex::insert_descriptor() explaining that it's the users responsibility not to use wildcard and non-wildcard descriptors that can derive the same script pubkey. I also recommended for Wallet that legacy non-wildcard descriptors be added to a temporary Wallet and swept into a modern wildcard descriptor.

Notes to the reviewers

fixes #1483

Changelog notice

changed

  • Calculate DescriptorId as the sha256 hash of the spk at index 0.

Checklists

All Submissions:

  • I've signed all my commits
  • I followed the contribution guidelines
  • I ran cargo fmt and cargo clippy before committing

Bugfixes:

  • This pull request breaks the existing API
  • I've added tests to reproduce the issue which are now passing
  • I'm linking the issue being fixed by this PR

@notmandatory notmandatory self-assigned this Jun 24, 2024
@notmandatory notmandatory added bug Something isn't working module-blockchain labels Jun 24, 2024
@notmandatory notmandatory added this to the 1.0.0-alpha milestone Jun 24, 2024
@LLFourn
Copy link
Contributor

LLFourn commented Jun 24, 2024

Description

This is a proof of concept and likely incomplete!

Refactored DescriptorExt::descriptor_id() to calculate descriptor ids as the hash of the SPK as index 0. For any xkeys in the descriptor that don't end in a wildcard, replaces the last derivation index with a wildcard for the purpose of calculating the descriptor id.

I don't feel doing a blind wildcard replacement is worth the trouble here and actually interferes with the design @evanlinjin is implementing I think. It should just be the SPK at index 0 while leaving the descriptor as is.

@evanlinjin
Copy link
Member

evanlinjin commented Jun 24, 2024

I'm not a fan of doing the wildcard-replacement as it creates a behavior that I find hard to reason with: We cannot have 2 non-wildcard descriptors that are part of the same keychain (even though their derived spks do not conflict).

I think it's better to disallow non-wildcard descriptors in KeychainTxOutIndex. This way, DesciptorId is always guaranteed to be unique. If the caller wishes to use single addresses, they can use SpkTxOutIndex. This means bdk_wallet cannot have non-wildcard descriptors. I think it's okay because bdk_wallet is meant to be opinionated and simple.

I'm going to provide a few suggestions that I think is enough for us to move into bdk_wallet beta territory.

  1. Rename DescriptorId into KeychainId. We only want unique id's for keychains (chains of keys). Single-spk descriptors are not keychains (by this definition).
  2. Not all descriptors have a KeychainId. Refactor DescriptorExt::descriptor_id() to DescriptorExt::keychain_id() -> Option<KeychainId>.
  3. KeychainTxOutIndex should error out when we try introduce a non-wildcard descriptor (therefore, so does bdk_wallet). If it is introduced in the changeset, we debug_assert! or ignore it (for non-debug).

I'm already working on renaming the K generic of KeychainTxOutIndex to L for "label" (which is used to label keychains). We can cherry-pick commits to combine these into one PR.

@notmandatory
Copy link
Member Author

Thanks for the feedback, I'll update this PR as suggested.

@evanlinjin I'd still like to have some way to sweep single SPK descriptors into a bdk_wallet, can that be done with SpkTxOutIndex somehow?

@notmandatory notmandatory changed the title refact(chain): Use hash of SPK at index 0 as Descriptor unique ID refactor(chain)!: use hash of spk at index 0 as keychain's unique id Jun 25, 2024
@notmandatory
Copy link
Member Author

There are still a few tests failing that I could use some help fixing.

@notmandatory notmandatory force-pushed the refact/spk_descriptor_id branch 2 times, most recently from 4377f16 to 74e498b Compare June 25, 2024 05:05
@notmandatory
Copy link
Member Author

I had to ignore 3 PSBT tests and 1 Wallet test, added TODOs to fix them during 1.0.0-beta phase.

@storopoli
Copy link
Contributor

storopoli commented Jun 25, 2024

I had to ignore 3 PSBT tests and 1 Wallet test, added TODOs to fix them during 1.0.0-beta phase.

We should also open an issue no (if this is merged)? TODO comments get lost and forgotten quite quickly

@notmandatory notmandatory force-pushed the refact/spk_descriptor_id branch 2 times, most recently from 2425c01 to 518f3e9 Compare June 25, 2024 16:48
@notmandatory
Copy link
Member Author

I simplified this PR back to allowing wildcard and non-wildcard descriptors in a Wallet or KeychainTxOutIndex. We determine the KeychainId based on the hash of the derived index 0 script pubkey. No new errors or other API changes are needed and all existing tests pass.

I update docs to explain that it's the developer's responsibility to ensure that wildcard and non-wildcard descriptors don't derive the same spk if used in the same Wallet or the same KeychainTxOutIndex.

@notmandatory notmandatory marked this pull request as ready for review June 25, 2024 17:29
@notmandatory
Copy link
Member Author

@evanlinjin if you're OK with this solution I'd like to move it to the beta milestone since it won't change the wallet API.

@evanlinjin
Copy link
Member

@notmandatory this is a breaking change because KeychainId is part of the changeset.

@notmandatory
Copy link
Member Author

@evanlinjin ah ok I wasn't thinking about the changeset, I'll keep this is the alpha milestone.

@evanlinjin
Copy link
Member

@notmandatory I've been thinking about this. The problem with your solution is that any descriptor/key can be turned into a DefiniteDescriptor/DefiniteDescriptorKey. If done so, we cannot detect conflicts of spks.

If we want to support single spk descriptors, I think we should do it properly. The way to do this is to have two versions of Wallet or do something like Wallet<Indexer>. Also, I imagine the behaviour of single-spk wallets will be quite different to keychain wallets.

I would first focus on a keychain wallet first and get it correct. Then we can build a single-spk wallet later (it shouldn't be too hard?).

@notmandatory
Copy link
Member Author

notmandatory commented Jun 26, 2024

@evanlinjin I'd rather not do any more redesign on Wallet right now for this when BDK pre-1.0 has also likely had this issue and it's never been an problem. The single key functionality can still be safely used if the few users who might need it mind the caveats I put in the docs. I'd be happy to make fixing this properly one of the 2.0 milestone goals.

@evanlinjin
Copy link
Member

evanlinjin commented Jun 26, 2024

The problem is the KeychainId definition cannot easily be changed later.

Edit: Okay I guess for actual keychains it'll be the same. We won't have KeychainIds for non-keychains in the future.

Copy link
Member

@evanlinjin evanlinjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some docs to update, but looks good to me.

crates/chain/src/descriptor_ext.rs Outdated Show resolved Hide resolved
crates/chain/src/descriptor_ext.rs Outdated Show resolved Hide resolved
@notmandatory
Copy link
Member Author

The problem is the KeychainId definition cannot easily be changed later.

Edit: Okay I guess for actual keychains it'll be the same. We won't have KeychainIds for non-keychains in the future.

Maybe this is just semantics, but technically even a non-wildcard descriptor could be called a keychain since as currently coded it's pks still derive a key at each index, they're just all the same. 🙃

https://docs.rs/miniscript/latest/miniscript/descriptor/enum.DescriptorPublicKey.html#method.at_derivation_index

@evanlinjin
Copy link
Member

Maybe this is just semantics, but technically even a non-wildcard descriptor could be called a keychain since as currently coded it's pks still derive a key at each index, they're just all the same. 🙃

https://docs.rs/miniscript/latest/miniscript/descriptor/enum.DescriptorPublicKey.html#method.at_derivation_index

Sure, but as long as there is a pathway to achieve correctness (perfect uniqueness) of KeychainId, I am happy to have this as a compromise for now. Also, I would argue that having keychain-wallet and spk-wallet separate would be a better API. I.e. you don't need to "reveal" more spks if there is no keychain.

@evanlinjin
Copy link
Member

I would be happy to ACK and merge this soon so I can base my changes on this. I am moving keychains_added out of keychain::ChangeSet and into CombinedChangeSet (since we can't do this later on as it's a breaking change).

@notmandatory
Copy link
Member Author

I also added a little test to clarify what happens if overlapping keychains are used in a Wallet, at least in dev mode this situation will panic, and in non-dev the correct balance is still given. We can build on this test when developing the future redesign.

Copy link
Contributor

@LLFourn LLFourn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR was better when it didn't rename DescriptorId to KeychainId. There is not concept of a Keychain in this PR and only descriptors have a KeychainId apparently. In my mind this should have just been about calculating DescriptorId differently. Anyway @evanlinjin explicitly requested it....

I think it's better to disallow non-wildcard descriptors in KeychainTxOutIndex. This way, DesciptorId is always guaranteed to be unique.

Unfortunately this is not the case. wsh(multi(2,xpubA/42, xpubB/*) and wsh(multi(2, xpubA/*, xpubB/42) have the same spk at index 42 but different at every other index. It is actually impossible to ensure that a malicious party cannot generate two completely different descriptors that happen to have the same spks at two different derivation indices. This is because wpkh, pkh and sh descriptors all use 20 byte hashes which are not sufficiently collision resistant.

Once again in order to have this "zero collisions" property you must enforce it at the application level. If this needs to be enforced against a malicious adversary who can insert whatever descriptor they want then you must disallow all the 20-byte hash spk descriptor types and make sure that all keys are xpubs with wildcards...I think! We can document this but there's nothing else we can do. The design here is still right. If we can make what happens when there are collisions less surprising that's good but otherwise we should document and ignore this problem.

fn keychain_id(&self) -> KeychainId {
let spk = self.at_derivation_index(0).unwrap().script_pubkey();
let spk_bytes = <Vec<u8>>::from(spk.as_bytes());
KeychainId(Hash::hash(&spk_bytes))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ can you put sha256::Hash::hash or something. It's not clear which hash algorithm you're using.

⛏️ you shouldn't need to allocate a Vec here. Why can't you just put the as_bytes into the hash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

@notmandatory
Copy link
Member Author

I think this PR was better when it didn't rename DescriptorId to KeychainId. There is not concept of a Keychain in this PR and only descriptors have a KeychainId apparently. In my mind this should have just been about calculating DescriptorId differently.

I rolled back the DescriptorId name change. If @evanlinjin or anyone else has really strong opinions about the name we can discuss but for now I agree it doesn't add anything to the main purpose of this PR.

@notmandatory notmandatory changed the title refactor(chain)!: use hash of spk at index 0 as keychain's unique id refactor(chain): use sha256 hash of spk at index 0 as keychain's unique id Jun 27, 2024
@notmandatory notmandatory changed the title refactor(chain): use sha256 hash of spk at index 0 as keychain's unique id refactor(chain): calculate DescriptorId as the sha256 hash of the spk at index 0 Jun 27, 2024
@notmandatory
Copy link
Member Author

Once again in order to have this "zero collisions" property you must enforce it at the application level. If this needs to be enforced against a malicious adversary who can insert whatever descriptor they want then you must disallow all the 20-byte hash spk descriptor types and make sure that all keys are xpubs with wildcards...I think! We can document this but there's nothing else we can do. The design here is still right. If we can make what happens when there are collisions less surprising that's good but otherwise we should document and ignore this problem.

Is my current warning for Wallet and KeychainTxOutIndex::insert_descriptor() adequate for now? For a typical wallet use case I expect the most likely collisions would be someone trying to use/misuse non-wildcard descriptors.

@LLFourn
Copy link
Contributor

LLFourn commented Jun 27, 2024

Thanks. The PR is now much easier to review without the name change.

Here's my attempt at documenting this:

KeychainTxOutIndex will prevent you from inserting two descriptors which derive the same script pubkey at index 0 but it's up to you to ensure that descriptors don't collide at other indices. If they do nothing catastrophic happens at the KeychainTxOutIndex level (one keychain just becomes the defacto owner of that spk arbitrarily) but this may have subtle implications up the application stack like one UTXO being missing from one keychain because it has been assigned to another which produces the same script pubkey.

Also update docs to explain how KeychainTxOutIndex handles descriptors that
generate the same spks.
@notmandatory
Copy link
Member Author

@LLFourn I updated the docs with your much more nuanced explanation.

Copy link
Contributor

@oleonardolima oleonardolima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK 8f5b172

@LLFourn
Copy link
Contributor

LLFourn commented Jun 27, 2024

ACK 8f5b172

@notmandatory notmandatory merged commit 22368ab into bitcoindevkit:master Jun 28, 2024
12 checks passed
@notmandatory notmandatory mentioned this pull request Jul 20, 2024
30 tasks
@notmandatory notmandatory changed the title refactor(chain): calculate DescriptorId as the sha256 hash of the spk at index 0 refactor(chain): calculate DescriptorId as the sha256 hash of spk at index 0 Jul 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working module-blockchain
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

DescriptorId is not unique enough
5 participants