Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Implement BIP 340-342 validation (Schnorr/taproot/tapscript) #17977

Open
wants to merge 19 commits into
base: master
from

Conversation

@sipa
Copy link
Member

sipa commented Jan 21, 2020

This is an implementation of the Schnorr/taproot consensus rules proposed by BIPs 340, 341, and 342 (see current bitcoin/bips#876).

It consists of:

  • #16902 to avoid the O(n^2) behavior in IF/ELSE/END handling that would be exacerbated by the BIP 342 changes.
  • Addition of Schnorr signatures and 32-byte pubkey support to libsecp256k1 subtree (bitcoin-core/secp256k1#558 PR 558), following BIP 340.
  • The taproot validation specified in BIP 341.
  • Script validation under taproot (aka tapscript), specified in BIP 342.
  • Addition of signing logic for Schnorr/Taproot to the Python test framework, and tests for the above.

This does not include any wallet support.

Merging this is obviously conditional on getting community support for the proposal. It's opened here to demonstrate the code changes that it would imply.

@fanquake fanquake added the Consensus label Jan 21, 2020
@sipa sipa force-pushed the sipa:taproot branch 4 times, most recently from c9b9958 to 7b11d12 Jan 21, 2020
@DrahtBot

This comment has been minimized.

Copy link
Contributor

DrahtBot commented Jan 21, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #18071 (Refactoring CHashWriter & Get{Prevouts,Sequence,Outputs}Hash to SHA256 by JeremyRubin)
  • #18011 (Replace current benchmarking framework with nanobench by martinus)
  • #18002 (Abstract out script execution out of VerifyWitnessProgram() by sipa)
  • #17319 (Tests: remove bignum module by jnewbery)
  • #16902 (O(1) OP_IF/NOTIF/ELSE/ENDIF script implementation by sipa)
  • #16653 (script: add simple signature support (checker/creator) by kallewoof)
  • #16440 (BIP-322: Generic signed message format by kallewoof)
  • #16411 (BIP-325: Signet support by kallewoof)
  • #13533 ([tests] Reduced number of validations in tx_validationcache_tests by lucash-dev)
  • #13062 (Make script interpreter independent from storage type CScript by sipa)
  • #10729 (Wrap EvalScript in a ScriptExecution class by luke-jr)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@sipa sipa force-pushed the sipa:taproot branch 2 times, most recently from 4ba9fcf to 1f499c5 Jan 22, 2020
src/script/interpreter.cpp Outdated Show resolved Hide resolved
src/chainparams.cpp Outdated Show resolved Hide resolved
@MaxHillebrand MaxHillebrand mentioned this pull request Jan 22, 2020
1 of 14 tasks complete
Copy link

MaxHillebrand left a comment

As the BIPs have now been assigned numbers, this changes...:

  • BIP-Schnorr to BIP-340
  • BIP-Taproot to BIP-341
  • BIP-Tapscript to BIP-342
src/consensus/params.h Outdated Show resolved Hide resolved
src/hash.h Outdated Show resolved Hide resolved
test/functional/test_framework/key.py Outdated Show resolved Hide resolved
// Taproot spend
const auto& stack = tx.vin[i].scriptWitness.stack;
size_t stack_size = stack.size();
if (stack_size >= 2 && !stack[stack_size - 1].empty() && stack[stack_size - 1][0] == ANNEX_TAG) {

This comment has been minimized.

Copy link
@skwp

skwp Jan 22, 2020

Is there a name for this condition that can be explained with a function name or at least a comment?

This comment has been minimized.

Copy link
@kallewoof

kallewoof Jan 22, 2020

Member

I don't see the problem, personally.

This comment has been minimized.

Copy link
@skwp

skwp Jan 22, 2020

hey kalle, please see my comment below to pieter. I'm seeing if people are open to using a more consistent single level of abstraction to make the source easier to read.

e.g. this can be evaluated by a newb (ignore if I got the concepts wrong, I'm trying to illustrate a more literate / single level of abstraction style as per http://principles-wiki.net/principles:single_level_of_abstraction)

if (payToTapRoot)
  if annexOnStack
    stack_size--; // ignore annex
  else 
    if (OpWhateverOnStack)
      checkWhateverConditionWithLoop
    end

Again I could have totally gotten the details wrong but if the program looked like that at a single level of abstraction, with magic numbers given names and conditionals given names, many more people could actually participate in the PR process and grok what's going on.

This comment has been minimized.

Copy link
@sipa

sipa Jan 22, 2020

Author Member

I've added some comments. Better now?

This comment has been minimized.

Copy link
@skwp

skwp Jan 22, 2020

Thanks definitely helps, though I would still love to see some vars extracted to named for readability and deduplication. I'm not gonna nitpick your code for style, I'm just a humble bitcoin pleb. But I do know what it's like to scale codebases to be friendly to onboarding new devs. I do think readability should be a concern if we want more devs in bitcoin. e.g. stack[stack_size - 1] is repeated all over the place. call it top_of_stack and so much more readable! 🤷‍♂ 😄

This comment has been minimized.

Copy link
@sipa

sipa Jan 22, 2020

Author Member

Well the reason is that the stack_size variable changes, so a "top_of_stack" one would need to be updated, kinda defeating its meaning.

This comment has been minimized.

Copy link
@sipa

sipa Jan 23, 2020

Author Member

I've changed this code quite a bit. What do you think?

src/policy/policy.cpp Outdated Show resolved Hide resolved
@sipa

This comment has been minimized.

Copy link
Member Author

sipa commented Jan 22, 2020

@MaxHillebrand A few overall comments:

  • I don't think all references to (bip-)taproot/tapscript/schnorr should be changed to the BIP numbers; in some cases maybe we should just drop the "bip-" prefix (e.g. I think talking about a "taproot spend" is more clear than "bip341 spend").
  • All changes in the src/secp256k1 directory should go to bitcoin-core/secp256k1#558 instead (the src/secp256k1 is a git subtree imported from there).
  • The "BIPSchnorr" and "BIPSchnorrDerive" tagged hash tags are part of the spec, which I don't think should be changed.
@MaxHillebrand

This comment has been minimized.

Copy link

MaxHillebrand commented Jan 22, 2020

Thanks @sipa, I agree with your comments.
I have deleted my suggestions to change the tagged hashes, the others are still open. Please ACK/NACK and commit what you think is correct.

src/hash.cpp Outdated Show resolved Hide resolved
@sipa sipa force-pushed the sipa:taproot branch from 1f499c5 to 9f578bf Jan 22, 2020
src/consensus/params.h Outdated Show resolved Hide resolved
src/script/interpreter.h Outdated Show resolved Hide resolved
src/script/script.cpp Outdated Show resolved Hide resolved
src/script/script.h Outdated Show resolved Hide resolved
test/functional/p2p_segwit.py Outdated Show resolved Hide resolved
ss << epoch;

// Hash type
if ((hash_type > 3) && (hash_type < 0x81 || hash_type > 0x83)) return false;

This comment has been minimized.

Copy link
@Empact

Empact Jan 22, 2020

Member

nit: Perhaps more readable if we avoid magic numbers with something like:

static_assert(SIGHASH_TAPOUTPUTMASK < SIGHASH_TAPINPUTMASK);
if ((hash_type > SIGHASH_TAPOUTPUTMASK) && (hash_type <= SIGHASH_TAPINPUTMASK || hash_type > (SIGHASH_TAPINPUTMASK | SIGHASH_TAPOUTPUTMASK))) return false;

Alternatively, a comment could be helpful.

This comment has been minimized.

Copy link
@gmaxwell

gmaxwell Jan 23, 2020

Contributor

I find that much less readable, particularly since the symbolic values you define wouldn't be used anywhere else. More comments, however, would be fine.

This comment has been minimized.

Copy link
@sipa

sipa Jan 25, 2020

Author Member

I've rewritten this code in an overall more readable way, I think.

src/script/interpreter.cpp Outdated Show resolved Hide resolved
Copy link

MaxHillebrand left a comment

Similar to Taproot, this unifies the upper case Tapscript consistently in the comments.

src/policy/policy.cpp Outdated Show resolved Hide resolved
src/policy/policy.h Outdated Show resolved Hide resolved
src/script/interpreter.h Outdated Show resolved Hide resolved
src/script/script.h Outdated Show resolved Hide resolved
src/script/script.h Outdated Show resolved Hide resolved
test/functional/test_framework/script.py Outdated Show resolved Hide resolved
src/policy/policy.cpp Outdated Show resolved Hide resolved
@sipa sipa force-pushed the sipa:taproot branch from 7cd6e7d to b8048b8 Jan 25, 2020

from test_framework.blocktools import create_coinbase, create_block, add_witness_commitment
from test_framework.messages import CTransaction, CTxIn, CTxOut, COutPoint, CTxInWitness
from test_framework.script import CScript, TaprootSignatureHash, taproot_construct, OP_0, OP_1, OP_CHECKSIG, OP_CHECKSIGVERIFY, OP_CHECKSIGADD, OP_IF, OP_CODESEPARATOR, OP_ELSE, OP_ENDIF, OP_DROP, LEAF_VERSION_TAPSCRIPT, SIGHASH_SINGLE, is_op_success, CScriptOp, OP_RETURN, OP_VERIF, OP_1NEGATE, OP_EQUAL, OP_SWAP, OP_CHECKMULTISIG, OP_CHECKMULTISIGVERIFY, OP_NOTIF, OP_2DROP, OP_NOT, OP_2DUP, OP_1SUB, OP_DUP, MAX_SCRIPT_ELEMENT_SIZE, LOCKTIME_THRESHOLD, ANNEX_TAG

This comment has been minimized.

Copy link
@kallewoof

kallewoof Jan 25, 2020

Member
Suggested change
from test_framework.script import CScript, TaprootSignatureHash, taproot_construct, OP_0, OP_1, OP_CHECKSIG, OP_CHECKSIGVERIFY, OP_CHECKSIGADD, OP_IF, OP_CODESEPARATOR, OP_ELSE, OP_ENDIF, OP_DROP, LEAF_VERSION_TAPSCRIPT, SIGHASH_SINGLE, is_op_success, CScriptOp, OP_RETURN, OP_VERIF, OP_1NEGATE, OP_EQUAL, OP_SWAP, OP_CHECKMULTISIG, OP_CHECKMULTISIGVERIFY, OP_NOTIF, OP_2DROP, OP_NOT, OP_2DUP, OP_1SUB, OP_DUP, MAX_SCRIPT_ELEMENT_SIZE, LOCKTIME_THRESHOLD, ANNEX_TAG
from test_framework.script import (
ANNEX_TAG,
CScript,
CScriptOp,
LEAF_VERSION_TAPSCRIPT,
LOCKTIME_THRESHOLD,
MAX_SCRIPT_ELEMENT_SIZE,
OP_0,
OP_1,
OP_1SUB,
OP_1NEGATE,
OP_2DROP,
OP_2DUP,
OP_CHECKMULTISIG,
OP_CHECKMULTISIGVERIFY,
OP_CHECKSIG,
OP_CHECKSIGADD,
OP_CHECKSIGVERIFY,
OP_CODESEPARATOR,
OP_DROP,
OP_DUP,
OP_ELSE,
OP_ENDIF,
OP_EQUAL,
OP_IF,
OP_NOT,
OP_NOTIF,
OP_RETURN,
OP_SWAP,
OP_VERIF,
SIGHASH_SINGLE,
TaprootSignatureHash,
is_op_success,
taproot_construct,
)

This comment has been minimized.

Copy link
@sipa

sipa Jan 25, 2020

Author Member

What a waste of lines. Is this more readable?

This comment has been minimized.

Copy link
@kallewoof

kallewoof Jan 25, 2020

Member

Lines don't really cost anything. This is definitely readable, compared to the big blob it replaces.

This comment has been minimized.

Copy link
@sipa

sipa Jan 25, 2020

Author Member

They cost screen space, and this isn't something that really requires reading.

This comment has been minimized.

Copy link
@kallewoof

kallewoof Jan 25, 2020

Member

It feels like people can just scroll down if they aren't interested, or not if they are. I prefer to let the reader decide whether they desire to read or not. Anyway, it's not big enough to waste time over. Do ignore.

CHashWriter ss_branch = HasherTapBranch;
auto node_begin = control.data() + TAPROOT_CONTROL_BASE_SIZE + TAPROOT_CONTROL_NODE_SIZE * i;
if (std::lexicographical_compare(k.begin(), k.end(), node_begin, node_begin + TAPROOT_CONTROL_NODE_SIZE)) {
ss_branch << k << Span<const unsigned char>(node_begin, TAPROOT_CONTROL_NODE_SIZE);

This comment has been minimized.

Copy link
@kallewoof

kallewoof Jan 25, 2020

Member

Nit: indentation (here and L1781).

@bitcoin bitcoin deleted a comment from Tyrone1976 Jan 25, 2020

//! Whether m_codeseparator_pos is initialized.
bool m_codeseparator_pos_init = false;
//! Opcode position of the last executed OP_CODESEPARATOR (or -1 if none executed).

This comment has been minimized.

Copy link
@JeremyRubin

JeremyRubin Jan 25, 2020

Contributor

documentation nit: preference to not use -1 to indicate a sentinel for a unsigned type.

TAPSCRIPT = 3, //!< Witness v1 with non-P2SH 32 byte program (Taproot), script path spending, leaf version 0xc0 (Tapscript); see BIP 342
};

struct ScriptExecutionData

This comment has been minimized.

Copy link
@JeremyRubin

JeremyRubin Jan 25, 2020

Contributor

I'm generally not a fan of these Data interfaces which have all the bool x_init fields.

As an alternative, we could use an optional type wrapper.

However, optional types depend on boost until c++17, which we shouldn't introduce consensus dependencies on.

My proposal would be to copy the definition of c++17 optional into https://github.com/bitcoin/bitcoin/blob/99813a9745fe10a58bedd7a4cb721faf14f907a4/src/optional.h and drop the boost dependency, and then use this here. This can be done as a PR separate from Taproot so as not to further burden this PR.

Here, optionals express both the nothing set and not initialized state (hence dual-wrapped options) but perhaps we'd be OK in collapsing the wrapped ones to represent (not initialized or not set) OR value.

 struct ScriptExecutionData
 {
     //! The tapleaf hash.
     Option<uint256> m_tapleaf_hash;
     Option<Option<uint32_t>> m_codeseparator_pos;
     //! Hash of the annex data, if any
     Option<Option<uint256>> m_annex_hash;
     /** How much validation weight is left (decremented for every successful signature check). */
     Option<int64_t> m_validation_weight_left;
 };

This would seem, to me, to make the interface quite a bit safer and eliminates a bunch of initialization logic & prevents UB from cropping up.

It would also be, in my opinion, a larger project (perhaps worthwhile) to generally rethink this state object to be accessed through an interface that guarantees the fields are accessed correctly -- e.g., once a m_validation_weight_left is set it can only be read or decreased, and the function to decrease can set error if the condition isn't met on underflow. Maybe not worth that large of a refactor though, I think optionals would already be an improvement.

This comment has been minimized.

Copy link
@sipa

sipa Jan 25, 2020

Author Member

I agree to an extent that all the _init variables are ugly, but I don't think that they're that bad if you see them as just runtime checks for invalid code. The same can be accomplished without them using ubsan/valgrind, in many cases.

About using Option... It's arguably the right tool for the job, but there is no singular "definition of c++17 optional"; there is the libstdc++ implementation, and the libc++ one, both of which may be relying on platform specific extensions (I haven't checked). Even ignoring that, the libstdc++ one is 1500 lines of code, which I think is really overkill for what we'd get. Since the Boost one can't be used in libconsensus, I think the current code is not too bad.

This comment has been minimized.

Copy link
@StEvUgnIn

StEvUgnIn Jan 27, 2020

I examined libstdc++ implementation for RISC-V architechture and it seems portable and independent from any physical architecture
https://github.com/riscv/riscv-gcc/commits/riscv-gcc-8.3.0/libstdc%2B%2B-v3

This comment has been minimized.

Copy link
@sipa

sipa Jan 27, 2020

Author Member

@StEvUgnIn We can't just copy GPL code, unfortunately. Using libc++'s version may be an option, but it still seems overkill to me (it also contains lots of libc++ specific macros, that may require additional files to be included).

This comment has been minimized.

Copy link
@elichai

elichai Jan 30, 2020

Contributor

@JeremyRubin Don't forget that the std namespace get special treatment from the compiler.
so copying code from libstd without checking it might actually contain implementation defined behavior that won't necessarily work correctly with a different compiler(or even just by being outside of the std namespace).

*
* If the signature is not 64 bytes, or the public key is not fully valid, false is returned.
*/
bool VerifySchnorr(const uint256& hash, const std::vector<unsigned char>& vchSig) const;

This comment has been minimized.

Copy link
@kallewoof

kallewoof Jan 27, 2020

Member

It would be nice if there was an IsValid() function to check if the pubkey is actually valid, as opposed to the signature being invalid.

This comment has been minimized.

Copy link
@sipa

sipa Jan 27, 2020

Author Member

There should be no need.

Public keys as specified by BIP340 are 32-byte arrays, not points or X coordinates or whatever. The signature validation algorithm takes as input a 32-byte public key, and 64-byte signature, and a 32-byte message.

This is intentionally different from how ECDSA works, where differences between encoding errors and invalid signatures are significant, and arguably the cause of all the trouble we went through with ECDSA encodings, and BIP66...

So, to be clear: there should never be a behavior difference between an invalid signature and invalidity due to invalid public keys. Both are just inputs for which the verification returns false. Having a way to distinguish the two would be a risk.

This comment has been minimized.

Copy link
@StEvUgnIn

StEvUgnIn Jan 27, 2020

It would be nice if there was an IsValid() function to check if the pubkey is actually valid, as opposed to the signature being invalid.

class XOnlyPubKey {
private:
    uint256 m_keydata;

public:
    XOnlyPubKey(const uint256& in) : m_keydata(in) {}

    /** Verify a 64-byte Schnorr signature.
     *
     * If the signature is not 64 bytes, or the public key is not fully valid, false is returned.
     */
    bool VerifySchnorr(const uint256& hash, const std::vector<unsigned char>& vchSig) const;
    bool CheckPayToContract(const XOnlyPubKey& base, const uint256& hash, bool sign) const;

    const unsigned char& operator[](int pos) const { return *(m_keydata.begin() + pos); }
    const unsigned char* data() const { return m_keydata.begin(); }
    size_t size() const { return 32; }
};

This on-line instruction is already doing the job
XOnlyPubKey(const uint256& in) : m_keydata(in) {}

This comment has been minimized.

Copy link
@kallewoof

kallewoof Jan 27, 2020

Member

@sipa It is not useful to be able to distinguish between an invalid pubkey and an invalid signature? I don't understand the reasoning there. Indeed it should be helpful to be able to tell a user which part they fat fingered, if any.

This comment has been minimized.

Copy link
@kallewoof

kallewoof Jan 27, 2020

Member

@StEvUgnIn No, I am talking about the call to secp256k1_xonly_pubkey_parse failing.

This comment has been minimized.

Copy link
@sipa

sipa Jan 27, 2020

Author Member

Sure, for users it may be useful to have a sanity check. But consensus code should never distinguish between invalid public key and invalid signature.

This comment has been minimized.

Copy link
@StEvUgnIn
assert(false);
}
assert(in_pos < tx_to.vin.size());
assert(cache.ready && cache.m_amounts_spent_ready);

This comment has been minimized.

Copy link
@kallewoof

kallewoof Jan 28, 2020

Member

This assertion will be hit, because cache is null when using the below constructor:

GenericTransactionSignatureChecker(const T* txToIn, unsigned int nInIn, const CAmount& amountIn) : txTo(txToIn), nIn(nInIn), amount(amountIn), txdata(nullptr) {}

Maybe checking if txdata is non-null and complain if it isn't, at

bool ret = SignatureHashSchnorr(sighash, execdata, *txTo, nIn, hashtype, sigversion, *this->txdata);

This comment has been minimized.

Copy link
@sipa

sipa Jan 28, 2020

Author Member

Well yes it would assert because that would be invalid. It seems it's just doing its job?

This comment has been minimized.

Copy link
@kallewoof

kallewoof Jan 28, 2020

Member

I'm not arguing against the assert, I'm saying that we should catch 'cache is null' before this point.

This comment has been minimized.

Copy link
@kallewoof

kallewoof Jan 28, 2020

Member

Because it's misleading and confusing. The assert is not triggering for the right reasons (I don't even think it's triggering at all, though tbh I'm not sure how C++ deals with being passed a null value for a referenced non-reference like this).

It's checking whether the cache is ready, but in reality the cache is null. This method does not even take a ref to begin with. The one place it's called is L1638, which could just do an assert(txdata) before calling SignatureHashSchnorr.

This comment has been minimized.

Copy link
@sipa

sipa Jan 28, 2020

Author Member

I don't think that helps. If you reach this point without the cache initialized, you're using the code incorrectly.

What is perhaps not clear is that this is not just a cache - we can't recompute data in it if it missing based on other sources. We rely on it to pass in essential information about the spending that cannot be retrieved otherwise.

This comment has been minimized.

Copy link
@sipa

sipa Jan 28, 2020

Author Member

Oh, you're talking about null vs uninitialized. I'll investigate tomorrow. Dereferencing or invoking a member function of a null pointer is UB; if that can happen we should indeed catch it earlier. I can't check the code right now.

This comment has been minimized.

Copy link
@kallewoof

kallewoof Jan 28, 2020

Member

Hm.. for v0 this is actually accomodated for

        const bool cacheready = cache && cache->ready;

whereas for v1 this is mandatory. I still think it's unnecessarily unhelpful, and a bit scary to have non-ref null values in code paths (even if as you say this is a hint of a bigger problem).

This comment has been minimized.

Copy link
@sipa

sipa Jan 28, 2020

Author Member

What do "non-ref null values" mean?

A reference to nullptr is illegal, always. Only pointers can be nullptr.

This comment has been minimized.

Copy link
@kallewoof

kallewoof Jan 28, 2020

Member

In SignatureHashSchnorr, the last parameter, defined as const PrecomputedTransactionData& cache, is 0x0000000000000000 for the case where *this->txdata == nullptr above. This may be using the code incorrectly, but it seems worth catching before that call above.

{
m_spent_outputs = std::move(spent_outputs);

if (ready) return;

This comment has been minimized.

Copy link
@kallewoof

kallewoof Jan 28, 2020

Member

If someone does txdata = PrecomputedTransactionData(tx); and then later tries to do txdata.Init(tx, spent_outputs); this line will terminate, despite m_amounts_spent_ready not being updated based on the new spent_outputs. Perhaps

if (ready && spent_outputs.empty()) return;

Edit: maybe even split into if !ready, make hashes and if !m_amounts_spend_ready, do spent stuff. (also fixed description; m_spent_outputs is updated, but the bool is not)


template <class T>
uint256 SignatureHash(const CScript& scriptCode, const T& txTo, unsigned int nIn, int nHashType, const CAmount& amount, SigVersion sigversion, const PrecomputedTransactionData* cache = nullptr);

template <class T>
bool SignatureHashTap(uint256& hash_out, const ScriptExecutionData& execdata, const T& tx_to, unsigned int in_pos, uint8_t hash_type, SigVersion sigversion, const PrecomputedTransactionData& cache);

This comment has been minimized.

Copy link
@kallewoof

kallewoof Jan 30, 2020

Member

I believe this should be called SignatureHashSchnorr and be <typename T> to match interpreter.cpp v.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.