txscript: Zero alloc optimization refactor. #1656

This function is only useful for internal consensus purposes within the script engine and as such should not be exported.

This function is only useful for internal consensus purposes within the script engine and as such should not be exported. While here, also add a comment to specify the script version semantics.

This deprecates the GetMultisigMandN function which should never have been added since the CalcMultiSigStats function already existed for this purpose. While here, redefine the function in terms of CalcMultiSigStats.

This implements an efficient and zero-allocation script tokenizer that is exported to both provide a new capability to tokenize scripts to external consumers of the API as well as to serve as a base for refactoring the existing highly inefficient internal code. It is important to note that this tokenizer is intended to be used in consensus critical code in the future, so it must exactly follow the existing semantics. The current script parsing mechanism used throughout the txscript module is to fully tokenize the scripts into an array of internal parsed opcodes which are then examined and passed around in order to implement virtually everything related to scripts. While that approach does simplify the analysis of certain scripts and thus provide some nice properties in that regard, it is both extremely inefficient in many cases, and makes it impossible for external consumers of the API to implement any form of custom script analysis without manually implementing a bunch of error prone tokenizing code or, alternatively, the script engine exposing internal structures. For example, as shown by profiling the total memory allocations of an initial sync, the existing script parsing code allocates a total of around 295.12GB, which equates to around 50% of all allocations performed. The zero-alloc tokenizer this introduces will allow that to be reduced to virtually zero. 
The following is a before and after comparison of tokenizing a large script with a high opcode count using the existing code versus the tokenizer this introduces for both speed and memory allocations:

benchmark                 old ns/op     new ns/op     delta
--------------------------------------------------------------
BenchmarkScriptParsing    153099        961           -99.37%

benchmark                 old allocs    new allocs    delta
--------------------------------------------------------------
BenchmarkScriptParsing    1             0             -100.00%

benchmark                 old bytes     new bytes     delta
--------------------------------------------------------------
BenchmarkScriptParsing    466945        0             -100.00%

The following is an overview of the changes:

- Introduce new error code ErrUnsupportedScriptVersion
- Implement zero-allocation script tokenizer
- Add a full suite of tests to ensure the tokenizer works as intended and follows the required consensus semantics
- Add an example of using the new tokenizer to count the number of opcodes in a script
- Update README.md to include the new example
- Update script parsing benchmark to use the new tokenizer
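To illustrate the idea behind tokenizing in place, the following is a minimal, self-contained sketch that counts the opcodes in a script. It is not the tokenizer this change adds: the `tokenizer` type, its `next` method, and `countOpcodes` are hypothetical names, and the sketch ignores script versions and the consensus limits the real implementation has to honor. It only shows how walking the raw bytes and returning subslices can avoid building an array of parsed opcodes.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	opData75    = 0x4b // last opcode that pushes as many bytes as its own value
	opPushData1 = 0x4c // the next 1 byte holds the push length
	opPushData2 = 0x4d // the next 2 bytes (little endian) hold the push length
	opPushData4 = 0x4e // the next 4 bytes (little endian) hold the push length
)

var errShortScript = errors.New("script ends before the pushed data does")

// tokenizer is a hypothetical, simplified stand-in for a script tokenizer.
// It keeps only an offset into the caller's slice, so advancing it should not
// need any heap allocations.
type tokenizer struct {
	script []byte
	offset int
	op     byte
	data   []byte
	err    error
}

// next advances to the next opcode and reports whether one was parsed.
func (t *tokenizer) next() bool {
	if t.err != nil || t.offset >= len(t.script) {
		return false
	}
	op := t.script[t.offset]
	rest := t.script[t.offset+1:]

	// Work out how many bytes encode the push length, if any.
	var lenBytes int
	var dataLen uint64
	switch {
	case op >= 0x01 && op <= opData75:
		dataLen = uint64(op)
	case op == opPushData1:
		lenBytes = 1
	case op == opPushData2:
		lenBytes = 2
	case op == opPushData4:
		lenBytes = 4
	}
	if len(rest) < lenBytes {
		t.err = errShortScript
		return false
	}
	switch lenBytes {
	case 1:
		dataLen = uint64(rest[0])
	case 2:
		dataLen = uint64(binary.LittleEndian.Uint16(rest))
	case 4:
		dataLen = uint64(binary.LittleEndian.Uint32(rest))
	}
	rest = rest[lenBytes:]
	if uint64(len(rest)) < dataLen {
		t.err = errShortScript
		return false
	}

	// The data is a subslice of the original script rather than a copy.
	t.op, t.data = op, nil
	if lenBytes > 0 || dataLen > 0 {
		t.data = rest[:dataLen]
	}
	t.offset += 1 + lenBytes + int(dataLen)
	return true
}

// countOpcodes returns the number of opcodes parsed before the end of the
// script or the first parse error, along with that error.
func countOpcodes(script []byte) (int, error) {
	t := tokenizer{script: script}
	n := 0
	for t.next() {
		n++
	}
	return n, t.err
}

func main() {
	// OP_DUP OP_HASH160 OP_DATA_2 0xaa 0xbb OP_EQUALVERIFY OP_CHECKSIG
	script := []byte{0x76, 0xa9, 0x02, 0xaa, 0xbb, 0x88, 0xac}
	n, err := countOpcodes(script)
	fmt.Println(n, err) // 5 <nil>
}
```

Because the data for each opcode is a view into the original script, callers that need to keep it beyond the lifetime of the script would have to copy it themselves.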

This converts the DisasmString function to make use of the new zero-allocation script tokenizer instead of the far less efficient parseScript, thereby significantly optimizing the function. In order to facilitate this, the opcode disassembly functionality is split into a separate function called disasmOpcode that accepts the opcode struct and data independently as opposed to requiring a parsed opcode. The new function also accepts a pointer to a string builder so the disassembly can be built more efficiently. While here, the comment is modified to explicitly call out the script version semantics.

The following is a before and after comparison of a large script:

benchmark                old ns/op     new ns/op     delta
------------------------------------------------------------
BenchmarkDisasmString    288729        94157         -67.39%

benchmark                old bytes     new bytes     delta
------------------------------------------------------------
BenchmarkDisasmString    584611        177528        -69.63%

This introduces a new function named calcSignatureHashRaw which accepts the raw script bytes to calculate the signature hash, as opposed to requiring parsed opcodes that are only unparsed again later, in order to make it more flexible for working with raw scripts. Since there are several places in the rest of the code that currently only have access to the parsed opcodes, this modifies the existing calcSignatureHash to first unparse the script before calling the new function. Note that the code in the signature hash calculation to remove all instances of OP_CODESEPARATOR from the script is removed because that is a holdover from BTC code which does not apply to v0 Decred scripts, since OP_CODESEPARATOR is completely disabled in Decred and thus there can never actually be one in the script. Finally, it removes the removeOpcode function and related tests since it is no longer used.

This modifies the CalcSignatureHash function to make use of the new signature hash calculation function that accepts raw scripts without needing to first parse them. Consequently, it also doubles as a slight optimization to the execution time and a significant reduction in the number of allocations. In order to convert the CalcSignatureHash function and keep the same semantics, a new function named checkScriptParses is introduced which will quickly determine if a script can be fully parsed without failure and return the parse failure in the case it can't.

The following is a before and after comparison of analyzing a large multiple input transaction:

benchmark               old ns/op     new ns/op     delta
-----------------------------------------------------------
BenchmarkCalcSigHash    2792057       2760042       -1.15%

benchmark               old allocs    new allocs    delta
-----------------------------------------------------------
BenchmarkCalcSigHash    1691          1068          -36.84%

benchmark               old bytes     new bytes     delta
-----------------------------------------------------------
BenchmarkCalcSigHash    521673        438604        -15.92%

This converts the tests for calculating signature hashes to use the exported function which handles the raw script versus the now deprecated variant requiring parsed opcodes.

This converts the isSmallInt function to accept an opcode as a byte instead of the internal opcode data struct in order to make it more flexible for raw script analysis. The comment is modified to explicitly call out the script version semantics. Finally, it updates all callers accordingly.

This converts the asSmallInt function to accept an opcode as a byte instead of the internal opcode data struct in order to make it more flexible for raw script analysis. It also updates all callers accordingly.
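As a rough illustration of what working directly with the opcode byte looks like, the sketch below reimplements the two small integer helpers in a standalone form. The function names come from the commit messages above, but the bodies and the lowercase constant names are assumptions written for this example rather than the code in the change itself.

```go
package main

import "fmt"

const (
	op0  = 0x00 // OP_0
	op1  = 0x51 // OP_1
	op16 = 0x60 // OP_16
)

// isSmallInt reports whether the opcode is one of the small integer opcodes,
// which are OP_0 and OP_1 through OP_16.
func isSmallInt(op byte) bool {
	return op == op0 || (op >= op1 && op <= op16)
}

// asSmallInt returns the integer a small integer opcode represents. The
// caller is expected to have checked the opcode with isSmallInt first.
func asSmallInt(op byte) int {
	if op == op0 {
		return 0
	}
	return int(op - (op1 - 1))
}

func main() {
	for _, op := range []byte{0x00, 0x51, 0x60, 0x4f} {
		if isSmallInt(op) {
			fmt.Printf("0x%02x is the small int %d\n", op, asSmallInt(op))
			continue
		}
		fmt.Printf("0x%02x is not a small int\n", op)
	}
}
```

Taking a plain byte means callers that are scanning a raw script can use these helpers without first wrapping each opcode in an internal struct.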

This converts the isStakeOpcode function to accept an opcode as a byte instead of the internal opcode data struct in order to make it more flexible for raw script analysis. It also updates all callers accordingly.

This converts the IsPayToScriptHash function to analyze the raw script instead of using the far less efficient parseScript thereby significantly optimizing the function. In order to accomplish this, it introduces two new functions. The first one is named extractScriptHash and works with the raw script bytes to simultaneously determine if the script is a p2sh script, and in the case it is, extract and return the hash. The second new function is named isScriptHashScript and is defined in terms of the former.

The extract function approach was chosen because it is common for callers to want to only extract relevant details from a script if the script is of the specific type. Extracting those details requires performing the exact same checks to ensure the script is of the correct type, so it is more efficient to combine the two into one and define the type determination in terms of the result so long as the extraction does not require allocations.

Finally, this also deprecates the isScriptHash function that requires opcodes in favor of the new functions and modifies the comment on IsPayToScriptHash to explicitly call out the script version semantics.

The following is a before and after comparison of analyzing a large script that is not a p2sh script:

benchmark                     old ns/op     new ns/op     delta
------------------------------------------------------------------
BenchmarkIsPayToScriptHash    139961        0.66          -100.00%

benchmark                     old allocs    new allocs    delta
------------------------------------------------------------------
BenchmarkIsPayToScriptHash    1             0             -100.00%

benchmark                     old bytes     new bytes     delta
------------------------------------------------------------------
BenchmarkIsPayToScriptHash    466944        0             -100.00%
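The extract-then-test pattern described above can be sketched as follows. The function names match the commit message, but the bodies are an assumption written for illustration, based on the standard pay-to-script-hash layout of OP_HASH160, a 20-byte data push, and OP_EQUAL, and they leave out the script version handling.

```go
package main

import "fmt"

const (
	opHash160 = 0xa9 // OP_HASH160
	opData20  = 0x14 // OP_DATA_20
	opEqual   = 0x87 // OP_EQUAL
)

// extractScriptHash returns the 20-byte script hash when the script has the
// exact form OP_HASH160 OP_DATA_20 <hash> OP_EQUAL and nil otherwise. The
// result is a subslice of the input, so no allocation is needed.
func extractScriptHash(script []byte) []byte {
	if len(script) == 23 &&
		script[0] == opHash160 &&
		script[1] == opData20 &&
		script[22] == opEqual {

		return script[2:22]
	}
	return nil
}

// isScriptHashScript is defined in terms of the extraction result.
func isScriptHashScript(script []byte) bool {
	return extractScriptHash(script) != nil
}

func main() {
	// Build a p2sh script around an all-zero placeholder hash.
	script := make([]byte, 23)
	script[0], script[1], script[22] = opHash160, opData20, opEqual

	fmt.Println(isScriptHashScript(script))      // true
	fmt.Println(len(extractScriptHash(script)))  // 20
	fmt.Println(isScriptHashScript(script[:22])) // false
}
```

Since a script that is not exactly 23 bytes is rejected by the length check alone, a large non-p2sh script is dismissed without examining its contents, which is consistent with the sub-nanosecond timing reported in the benchmark.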

This converts the IsMultisigScript function to make use of the new tokenizer instead of the far less efficient parseScript, thereby significantly optimizing the function. In order to accomplish this, it introduces two new functions. The first one is named extractMultisigScriptDetails and works with the raw script bytes to simultaneously determine if the script is a multisignature script, and in the case it is, extract and return the relevant details. The second new function is named isMultisigScript and is defined in terms of the former.

The extract function accepts the script version, raw script bytes, and a flag to determine whether or not the public keys should also be extracted. The flag is provided because extracting pubkeys results in an allocation that the caller might wish to avoid.

The extract function approach was chosen because it is common for callers to want to only extract relevant details from a script if the script is of the specific type. Extracting those details requires performing the exact same checks to ensure the script is of the correct type, so it is more efficient to combine the two into one and define the type determination in terms of the result so long as the extraction does not require allocations.

It is important to note that this new implementation intentionally has a semantic difference from the existing implementation in that it will now correctly identify a multisig script with zero pubkeys whereas previously it incorrectly required at least one pubkey. This change is acceptable because the function only deals with standardness rather than consensus rules.

Finally, this also deprecates the isMultiSig function that requires opcodes in favor of the new functions and deprecates the error return on the exported IsMultisigScript function since it really does not make sense given the purpose of the function.

The following is a before and after comparison of analyzing both a large script that is not a multisig script and a 1-of-2 multisig public key script:

benchmark                         old ns/op     new ns/op     delta
----------------------------------------------------------------------
BenchmarkIsMultisigScriptLarge    121599        8.63          -99.99%
BenchmarkIsMultisigScript         797           72.8          -90.87%

benchmark                         old allocs    new allocs    delta
----------------------------------------------------------------------
BenchmarkIsMultisigScriptLarge    1             0             -100.00%
BenchmarkIsMultisigScript         1             0             -100.00%

benchmark                         old bytes     new bytes     delta
----------------------------------------------------------------------
BenchmarkIsMultisigScriptLarge    466944        0             -100.00%
BenchmarkIsMultisigScript         2304          0             -100.00%

This converts the IsMultisigSigScript function to analyze the raw script and make use of the new tokenizer instead of the far less efficient parseScript, thereby significantly optimizing the function. In order to accomplish this, it first rejects scripts that can't possibly fit the bill due to the final byte of what would be the redeem script not being the appropriate opcode or the overall script not having enough bytes. Then, it uses a new function that is introduced named finalOpcodeData that uses the tokenizer to return any data associated with the final opcode in the signature script (which will be nil for non-push opcodes or if the script fails to parse) and analyzes it as if it were a redeem script when it is non-nil.

It is also worth noting that this new implementation intentionally has the same semantic difference from the existing implementation as the updated IsMultisigScript function with regard to allowing zero pubkeys whereas previously it incorrectly required at least one pubkey.

Finally, the comment is modified to explicitly call out the script version semantics.

The following is a before and after comparison of analyzing a large script that is not a multisig script and both a 1-of-2 multisig public key script (which should be false) and a signature script comprised of a pay-to-script-hash 1-of-2 multisig redeem script (which should be true):

benchmark                            old ns/op     new ns/op     delta
-------------------------------------------------------------------------
BenchmarkIsMultisigSigScriptLarge    158149        4             -100.00%
BenchmarkIsMultisigSigScript         3445          202           -94.14%

benchmark                            old allocs    new allocs    delta
-------------------------------------------------------------------------
BenchmarkIsMultisigSigScriptLarge    9             0             -100.00%
BenchmarkIsMultisigSigScript         3             0             -100.00%

benchmark                            old bytes     new bytes     delta
-------------------------------------------------------------------------
BenchmarkIsMultisigSigScriptLarge    533189        0             -100.00%
BenchmarkIsMultisigSigScript         9472          0             -100.00%

This converts the GetSigOpCount function to make use of the new tokenizer instead of the far less efficient parseScript thereby significantly optimizing the function. A new function named countSigOpsV0 which accepts the raw script is introduced to perform the bulk of the work so it can be reused for precise signature operation counting as well in a later commit. It retains the same semantics in terms of counting the number of signature operations either up to the first parse error or the end of the script in the case it parses successfully as required by consensus.

Finally, this also deprecates the getSigOpCount function that requires opcodes in favor of the new function and modifies the comment on GetSigOpCount to explicitly call out the script version semantics.

The following is a before and after comparison of analyzing a large script:

benchmark                 old ns/op     new ns/op     delta
--------------------------------------------------------------
BenchmarkGetSigOpCount    163896        1048          -99.36%

benchmark                 old allocs    new allocs    delta
--------------------------------------------------------------
BenchmarkGetSigOpCount    1             0             -100.00%

benchmark                 old bytes     new bytes     delta
--------------------------------------------------------------
BenchmarkGetSigOpCount    466945        0             -100.00%
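The "count up to the first parse error" semantics can be sketched with a small standalone loop. This is a simplified illustration and not countSigOpsV0 itself: the `nextOp` helper and `countSigOps` are hypothetical, the alternate signature checking opcodes are left out, and the per-multisig count of 20 is assumed to match the maximum number of pubkeys allowed in a multisig.

```go
package main

import "fmt"

const (
	opPushData1           = 0x4c
	opPushData2           = 0x4d
	opPushData4           = 0x4e
	opCheckSig            = 0xac
	opCheckSigVerify      = 0xad
	opCheckMultiSig       = 0xae
	opCheckMultiSigVerify = 0xaf

	// Assumed worst-case count charged for each multisig opcode when the
	// counting is not precise.
	maxPubKeysPerMultiSig = 20
)

// nextOp decodes the opcode at offset and returns it along with the offset of
// the opcode that follows. ok is false when the script is truncated.
func nextOp(script []byte, offset int) (op byte, next int, ok bool) {
	op = script[offset]
	offset++

	// Number of bytes that encode the push length for OP_PUSHDATA{1,2,4}.
	lenBytes := 0
	switch op {
	case opPushData1:
		lenBytes = 1
	case opPushData2:
		lenBytes = 2
	case opPushData4:
		lenBytes = 4
	}
	if len(script)-offset < lenBytes {
		return op, offset, false
	}

	// Direct pushes carry their length in the opcode value itself, while the
	// OP_PUSHDATA variants read a little-endian length that follows.
	var dataLen uint64
	if op >= 0x01 && op < opPushData1 {
		dataLen = uint64(op)
	}
	for i := 0; i < lenBytes; i++ {
		dataLen |= uint64(script[offset+i]) << (8 * uint(i))
	}
	offset += lenBytes
	if uint64(len(script)-offset) < dataLen {
		return op, offset, false
	}
	return op, offset + int(dataLen), true
}

// countSigOps counts signature operations until the end of the script or the
// first parse error, whichever comes first.
func countSigOps(script []byte) int {
	n := 0
	for offset := 0; offset < len(script); {
		op, next, ok := nextOp(script, offset)
		if !ok {
			break
		}
		switch op {
		case opCheckSig, opCheckSigVerify:
			n++
		case opCheckMultiSig, opCheckMultiSigVerify:
			n += maxPubKeysPerMultiSig
		}
		offset = next
	}
	return n
}

func main() {
	// Two OP_CHECKSIG opcodes followed by a truncated 5-byte push.
	fmt.Println(countSigOps([]byte{0xac, 0xac, 0x05, 0x01})) // 2
	// The 0xac bytes here are pushed data, not opcodes.
	fmt.Println(countSigOps([]byte{0x02, 0xac, 0xac})) // 0
}
```

The second call in main shows why the script has to be tokenized rather than scanned byte by byte: bytes inside a data push must not be mistaken for signature checking opcodes.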

This adds tests to ensure the isAnyKindOfScriptHash function properly identifies the four stake-tagged pay-to-script-hash possibilities in addition to ensuring they are not misidentified as standard pay-to-script-hash scripts.

This converts the isAnyKindOfScriptHash function to analyze the raw script instead of requiring far less efficient parsed opcodes thereby significantly optimizing the function. Since the function relies on isStakeScriptHash to identify a stake tagged pay-to-script-hash, and is the only consumer of it, this also converts that function to analyze the raw script and renames it to isStakeScriptHashScript for more consistent naming. Finally, the tests are updated accordingly.

The following is a before and after comparison of analyzing a large script:

benchmark                         old ns/op     new ns/op     delta
----------------------------------------------------------------------
BenchmarkIsAnyKindOfScriptHash    101249        3.83          -100.00%

benchmark                         old allocs    new allocs    delta
----------------------------------------------------------------------
BenchmarkIsAnyKindOfScriptHash    1             0             -100.00%

benchmark                         old bytes     new bytes     delta
----------------------------------------------------------------------
BenchmarkIsAnyKindOfScriptHash    466944        0             -100.00%

This converts the IsPushOnlyScript function to make use of the new tokenizer instead of the far less efficient parseScript thereby significantly optimizing the function. It also deprecates the isPushOnly function that requires opcodes in favor of the new function and modifies the comment on IsPushOnlyScript to explicitly call out the script version semantics.

The following is a before and after comparison of analyzing a large script:

benchmark                     old ns/op     new ns/op     delta
------------------------------------------------------------------
BenchmarkIsPayToScriptHash    139961        0.66          -100.00%

benchmark                     old allocs    new allocs    delta
------------------------------------------------------------------
BenchmarkIsPayToScriptHash    1             0             -100.00%

benchmark                     old bytes     new bytes     delta
------------------------------------------------------------------
BenchmarkIsPayToScriptHash    466944        0             -100.00%

This modifies the check for whether or not a pay-to-script-hash signature script is a push only script to make use of the new and more efficient raw script function. Also, since the script will have already been checked further above when the ScriptVerifySigPushOnly flag is set, avoid checking it again in that case.

This moves the check for non push-only pay-to-script-hash signature scripts before the script parsing logic when creating a new engine instance to avoid the extra overhead in the error case.

This converts the GetPreciseSigOpCount function to use a combination of raw script analysis and the new tokenizer instead of the far less efficient parseScript, thereby significantly optimizing the function. In particular, it uses the recently converted isScriptHashScript, IsPushOnlyScript, and countSigOpsV0 functions along with the recently added finalOpcodeData function. It also modifies the comment to explicitly call out the script version semantics.

The following is a before and after comparison of analyzing a large script:

benchmark                        old ns/op     new ns/op     delta
---------------------------------------------------------------------
BenchmarkGetPreciseSigOpCount    287939        1077          -99.63%

benchmark                        old allocs    new allocs    delta
---------------------------------------------------------------------
BenchmarkGetPreciseSigOpCount    3             0             -100.00%

benchmark                        old bytes     new bytes     delta
---------------------------------------------------------------------
BenchmarkGetPreciseSigOpCount    934657        0             -100.00%

This converts the typeOfScript function to accept a script version and raw script instead of an array of internal parsed opcodes in order to make it more flexible for raw script analysis. Also, this adds a comment to CalcScriptInfo to call out the specific version semantics and deprecates the function since nothing currently uses it, and the relevant information can now be obtained by callers more directly through the use of the new script tokenizer. All other callers are updated accordingly.

This begins the process of converting the typeOfScript function to use a combination of raw script analysis and the new tokenizer instead of the far less efficient parsed opcodes with the intent of significantly optimizing the function. In order to ease the review process, each script type will be converted in a separate commit and the typeOfScript function will be updated such that the script is only parsed as a fallback for the cases that are not already converted to more efficient raw script variants. In particular, for this commit, since the ability to detect pay-to-script-hash via raw script analysis is now available, the function is simply updated to make use of it.

This continues the process of converting the typeOfScript function to use a combination of raw script analysis and the new tokenizer instead of the far less efficient parsed opcodes. In particular, for this commit, since the ability to detect multisig scripts via the new tokenizer is now available, the function is simply updated to make use of it.

This continues the process of converting the typeOfScript function to use a combination of raw script analysis and the new tokenizer instead of the far less efficient parsed opcodes. In particular, it converts the detection of pay-to-pubkey scripts to use raw script analysis. In order to accomplish this, it introduces four new functions: extractCompressedPubKey, extractUncompressedPubKey, extractPubKey, and isPubKeyScript. The extractPubKey function makes use of extractCompressedPubKey and extractUncompressedPubKey to combine their functionality as a convenience and isPubKeyScript is defined in terms of extractPubKey.

The extractCompressedPubKey works with the raw script bytes to simultaneously determine if the script is a pay-to-compressed-pubkey script, and in the case it is, extract and return the raw compressed pubkey bytes. Similarly, the extractUncompressedPubKey works in the same way except it determines if the script is a pay-to-uncompressed-pubkey script and returns the raw uncompressed pubkey bytes in the case it is.

The extract function approach was chosen because it is common for callers to want to only extract relevant details from a script if the script is of the specific type. Extracting those details requires performing the exact same checks to ensure the script is of the correct type, so it is more efficient to combine the two into one and define the type determination in terms of the result so long as the extraction does not require allocations.

The following is a before and after comparison of analyzing a large script:

benchmark                  old ns/op     new ns/op     delta
---------------------------------------------------------------
BenchmarkIsPubKeyScript    124749        4.01          -100.00%

benchmark                  old allocs    new allocs    delta
---------------------------------------------------------------
BenchmarkIsPubKeyScript    1             0             -100.00%

benchmark                  old bytes     new bytes     delta
---------------------------------------------------------------
BenchmarkIsPubKeyScript    466944        0             -100.00%
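The relationship between the four functions can be sketched as shown below. The names are taken from the commit message, while the bodies are assumptions for illustration only. In particular, this sketch checks just the script layout and the leading format byte of the key (0x02 or 0x03 for compressed, 0x04 for uncompressed) and makes no attempt to validate that the bytes encode a point on the curve.

```go
package main

import "fmt"

const (
	opData33   = 0x21 // OP_DATA_33
	opData65   = 0x41 // OP_DATA_65
	opCheckSig = 0xac // OP_CHECKSIG
)

// extractCompressedPubKey returns the 33-byte key when the script has the
// form OP_DATA_33 <compressed pubkey> OP_CHECKSIG and nil otherwise.
func extractCompressedPubKey(script []byte) []byte {
	if len(script) == 35 &&
		script[0] == opData33 &&
		(script[1] == 0x02 || script[1] == 0x03) &&
		script[34] == opCheckSig {

		return script[1:34]
	}
	return nil
}

// extractUncompressedPubKey returns the 65-byte key when the script has the
// form OP_DATA_65 <uncompressed pubkey> OP_CHECKSIG and nil otherwise.
func extractUncompressedPubKey(script []byte) []byte {
	if len(script) == 67 &&
		script[0] == opData65 &&
		script[1] == 0x04 &&
		script[66] == opCheckSig {

		return script[1:66]
	}
	return nil
}

// extractPubKey tries the compressed form first and then falls back to the
// uncompressed form.
func extractPubKey(script []byte) []byte {
	if pubKey := extractCompressedPubKey(script); pubKey != nil {
		return pubKey
	}
	return extractUncompressedPubKey(script)
}

// isPubKeyScript is defined in terms of the extraction result.
func isPubKeyScript(script []byte) bool {
	return extractPubKey(script) != nil
}

func main() {
	// A pay-to-compressed-pubkey script around a placeholder key.
	script := make([]byte, 35)
	script[0], script[1], script[34] = opData33, 0x02, opCheckSig

	fmt.Println(isPubKeyScript(script))     // true
	fmt.Println(len(extractPubKey(script))) // 33
}
```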

This continues the process of converting the typeOfScript function to use a combination of raw script analysis and the new tokenizer instead of the far less efficient parsed opcodes. In particular, it converts the detection of pay-to-alt-pubkey scripts to use raw script analysis. In order to accomplish this, it introduces two new functions. The first one is named extractPubKeyAltDetails and works with the raw script bytes to simultaneously determine if the script is a pay-to-alt-pubkey script, and in the case it is, extract and return the relevant details. The second new function is named isPubKeyAltScript and is defined in terms of the former. The extract function approach was chosen because it is common for callers to want to only extract relevant details from a script if the script is of the specific type. Extracting those details requires performing the exact same checks to ensure the script is of the correct type, so it is more efficient to combine the two into one and define the type determination in terms of the result so long as the extraction does not require allocations. 
It is important to note that this new implementation intentionally tightens the following semantics as compared to the existing implementation:

- The signature type must now be one of the two supported types versus allowing any single byte data push
- The public key must now be of the correct length for the given signature type versus allowing any size up to 512 bytes
- The public key for schnorr secp256k1 pubkeys must now be a compressed public key and adhere to the strict encoding requirements for them

The following is a before and after comparison of analyzing a large script:

benchmark                     old ns/op     new ns/op     delta
------------------------------------------------------------------
BenchmarkIsAltPubKeyScript    143449        2.99          -100.00%

benchmark                     old allocs    new allocs    delta
------------------------------------------------------------------
BenchmarkIsAltPubKeyScript    1             0             -100.00%

benchmark                     old bytes     new bytes     delta
------------------------------------------------------------------
BenchmarkIsAltPubKeyScript    466944        0             -100.00%

This continues the process of converting the typeOfScript function to use a combination of raw script analysis and the new tokenizer instead of the far less efficient parsed opcodes. In particular, it converts the detection of pay-to-pubkey-hash scripts to use raw script analysis. In order to accomplish this, it introduces two new functions. The first one is named extractPubKeyHash and works with the raw script bytes to simultaneously determine if the script is a pay-to-pubkey-hash script, and in the case it is, extract and return the hash. The second new function is named isPubKeyHashScript and is defined in terms of the former.

The extract function approach was chosen because it is common for callers to want to only extract relevant details from a script if the script is of the specific type. Extracting those details requires performing the exact same checks to ensure the script is of the correct type, so it is more efficient to combine the two into one and define the type determination in terms of the result so long as the extraction does not require allocations.

The following is a before and after comparison of analyzing a large script:

benchmark                      old ns/op     new ns/op     delta
-------------------------------------------------------------------
BenchmarkIsPubKeyHashScript    165903        0.64          -100.00%

benchmark                      old allocs    new allocs    delta
-------------------------------------------------------------------
BenchmarkIsPubKeyHashScript    1             0             -100.00%

benchmark                      old bytes     new bytes     delta
-------------------------------------------------------------------
BenchmarkIsPubKeyHashScript    466945        0             -100.00%

This continues the process of converting the typeOfScript function to use a combination of raw script analysis and the new tokenizer instead of the far less efficient parsed opcodes. In particular, it converts the detection of pay-to-alt-pubkey-hash scripts to use raw script analysis. In order to accomplish this, it introduces two new functions. The first one is named extractPubKeyHashAltDetails and works with the raw script bytes to simultaneously determine if the script is a pay-to-alt-pubkey-hash script, and in the case it is, extract and return the hash and signature type. The second new function is named isPubKeyHashAltScript and is defined in terms of the former.

The extract function approach was chosen because it is common for callers to want to only extract relevant details from a script if the script is of the specific type. Extracting those details requires performing the exact same checks to ensure the script is of the correct type, so it is more efficient to combine the two into one and define the type determination in terms of the result so long as the extraction does not require allocations.

It is important to note that this new implementation intentionally has a semantic difference from the existing implementation in that it will now only pass when one of two signature types currently supported by consensus are specified whereas previously it would allow any single byte data push.

The following is a before and after comparison of analyzing a large script:

benchmark                         old ns/op     new ns/op     delta
----------------------------------------------------------------------
BenchmarkIsAltPubKeyHashScript    107100        2.63          -100.00%

benchmark                         old allocs    new allocs    delta
----------------------------------------------------------------------
BenchmarkIsAltPubKeyHashScript    1             0             -100.00%

benchmark                         old bytes     new bytes     delta
----------------------------------------------------------------------
BenchmarkIsAltPubKeyHashScript    466944        0             -100.00%

This continues the process of converting the typeOfScript function to use a combination of raw script analysis and the new tokenizer instead of the far less efficient parsed opcodes. In particular, it converts the detection of nulldata scripts to use both raw script analysis and the new tokenizer.

The following is a before and after comparison of analyzing a large script:

benchmark                    old ns/op     new ns/op     delta
-----------------------------------------------------------------
BenchmarkIsNullDataScript    120800        3.81          -100.00%

benchmark                    old allocs    new allocs    delta
-----------------------------------------------------------------
BenchmarkIsNullDataScript    1             0             -100.00%

benchmark                    old bytes     new bytes     delta
-----------------------------------------------------------------
BenchmarkIsNullDataScript    466944        0             -100.00%

This continues the process of converting the typeOfScript function to use a combination of raw script analysis and the new tokenizer instead of the far less efficient parsed opcodes. In particular, it converts the detection of stake submission scripts to use raw script analysis. In order to accomplish this, it introduces three new functions. The first one is named extractStakePubKeyHash and works with the raw script bytes to simultaneously determine if the script is a stake-tagged pay-to-pubkey-hash script tagged with a specified stake opcode, and in the case it is, extract and return the hash. The second new function, named extractStakeScriptHash, is similar except it detects a stake-tagged pay-to-script-hash script tagged with a specified stake opcode. Finally, the third function is named isStakeSubmissionScript and is defined in terms of the former two functions.

The extract function approach was chosen because it is common for callers to want to only extract relevant details from a script if the script is of the specific type. Extracting those details requires performing the exact same checks to ensure the script is of the correct type, so it is more efficient to combine the two into one and define the type determination in terms of the result so long as the extraction does not require allocations.

The following is a before and after comparison of analyzing a large script:

benchmark                           old ns/op     new ns/op     delta
------------------------------------------------------------------------
BenchmarkIsStakeSubmissionScript    140308        4.20          -100.00%

benchmark                           old allocs    new allocs    delta
------------------------------------------------------------------------
BenchmarkIsStakeSubmissionScript    1             0             -100.00%

benchmark                           old bytes     new bytes     delta
------------------------------------------------------------------------
BenchmarkIsStakeSubmissionScript    466944        0             -100.00%
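A sketch of how the stake-tagged helpers might compose is shown below. The function names follow the commit messages, but the bodies are written for this example under two assumptions: that a stake-tagged script is a single stake opcode followed by an otherwise standard pay-to-pubkey-hash or pay-to-script-hash script, and that the stake submission opcode OP_SSTX has the value 0xba.

```go
package main

import "fmt"

const (
	opDup         = 0x76 // OP_DUP
	opHash160     = 0xa9 // OP_HASH160
	opData20      = 0x14 // OP_DATA_20
	opEqual       = 0x87 // OP_EQUAL
	opEqualVerify = 0x88 // OP_EQUALVERIFY
	opCheckSig    = 0xac // OP_CHECKSIG
	opSSTx        = 0xba // OP_SSTX (value assumed for this sketch)
)

// extractPubKeyHash returns the hash from a script of the form
// OP_DUP OP_HASH160 OP_DATA_20 <hash> OP_EQUALVERIFY OP_CHECKSIG.
func extractPubKeyHash(script []byte) []byte {
	if len(script) == 25 &&
		script[0] == opDup && script[1] == opHash160 &&
		script[2] == opData20 && script[23] == opEqualVerify &&
		script[24] == opCheckSig {

		return script[3:23]
	}
	return nil
}

// extractScriptHash returns the hash from a script of the form
// OP_HASH160 OP_DATA_20 <hash> OP_EQUAL.
func extractScriptHash(script []byte) []byte {
	if len(script) == 23 && script[0] == opHash160 &&
		script[1] == opData20 && script[22] == opEqual {

		return script[2:22]
	}
	return nil
}

// extractStakePubKeyHash requires the given stake opcode as the first byte
// and treats the remainder as a pay-to-pubkey-hash script.
func extractStakePubKeyHash(script []byte, stakeOpcode byte) []byte {
	if len(script) == 0 || script[0] != stakeOpcode {
		return nil
	}
	return extractPubKeyHash(script[1:])
}

// extractStakeScriptHash requires the given stake opcode as the first byte
// and treats the remainder as a pay-to-script-hash script.
func extractStakeScriptHash(script []byte, stakeOpcode byte) []byte {
	if len(script) == 0 || script[0] != stakeOpcode {
		return nil
	}
	return extractScriptHash(script[1:])
}

// isStakeSubmissionScript is defined in terms of the two extract functions.
func isStakeSubmissionScript(script []byte) bool {
	return extractStakePubKeyHash(script, opSSTx) != nil ||
		extractStakeScriptHash(script, opSSTx) != nil
}

func main() {
	// OP_SSTX followed by a pay-to-pubkey-hash script with a placeholder hash.
	script := make([]byte, 26)
	copy(script, []byte{opSSTx, opDup, opHash160, opData20})
	script[24], script[25] = opEqualVerify, opCheckSig

	fmt.Println(isStakeSubmissionScript(script))     // true
	fmt.Println(isStakeSubmissionScript(script[1:])) // false
}
```

Passing the stake opcode as a parameter is what would let the same two extract functions serve the other stake-tagged script types with a different tag value.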

This continues the process of converting the typeOfScript function to use a combination of raw script analysis and the new tokenizer instead of the far less efficient parsed opcodes. In particular, it converts the detection of stake generation scripts to use raw script analysis. In order to accomplish this, it introduces a new function named isStakeGenScript which makes use of the recently added extractStakePubKeyHash and extractStakeScriptHash functions.

The following is a before and after comparison of analyzing a large script:

benchmark                           old ns/op     new ns/op     delta
------------------------------------------------------------------------
BenchmarkIsStakeGenerationScript    121043        4.26          -100.00%

benchmark                           old allocs    new allocs    delta
------------------------------------------------------------------------
BenchmarkIsStakeGenerationScript    1             0             -100.00%

benchmark                           old bytes     new bytes     delta
------------------------------------------------------------------------
BenchmarkIsStakeGenerationScript    466944        0             -100.00%

This continues the process of converting the typeOfScript function to use a combination of raw script analysis and the new tokenizer instead of the far less efficient parsed opcodes. In particular, it converts the detection of stake revocation scripts to use raw script analysis. In order to accomplish this, it introduces a new function named isStakeRevocationScript which makes use of the recently added extractStakePubKeyHash and extractStakeScriptHash functions.

The following is a before and after comparison of analyzing a large script:

benchmark                           old ns/op     new ns/op     delta
------------------------------------------------------------------------
BenchmarkIsStakeRevocationScript    117699        4.58          -100.00%

benchmark                           old allocs    new allocs    delta
------------------------------------------------------------------------
BenchmarkIsStakeRevocationScript    1             0             -100.00%

benchmark                           old bytes     new bytes     delta
------------------------------------------------------------------------
BenchmarkIsStakeRevocationScript    466944        0             -100.00%

This completes the process of converting the typeOfScript function to use a combination of raw script analysis and the new tokenizer instead of the far less efficient parsed opcodes. In particular, it converts the detection of stake change scripts to use raw script analysis by introducing a new function named isStakeChangeScript which makes use of the recently added extractStakePubKeyHash and extractStakeScriptHash functions and removes the script parsing fallback from the typeOfScript function since this is the final case.

The following is a before and after comparison of analyzing a large script for both the stake change script detection and the overall GetScriptClass function which relies on the now fully converted typeOfScript function:

benchmark                       old ns/op     new ns/op     delta
--------------------------------------------------------------------
BenchmarkIsStakeChangeScript    133810        4.39          -100.00%
BenchmarkGetScriptClass         145001        62.9          -99.96%

benchmark                       old allocs    new allocs    delta
--------------------------------------------------------------------
BenchmarkIsStakeChangeScript    1             0             -100.00%
BenchmarkGetScriptClass         1             0             -100.00%

benchmark                       old bytes     new bytes     delta
--------------------------------------------------------------------
BenchmarkIsStakeChangeScript    466944        0             -100.00%
BenchmarkGetScriptClass         466944        0             -100.00%

This converts the ContainsStakeOpCodes function to make use of the new tokenizer instead of the far less efficient parseScript thereby significantly optimizing the function.

The following is a before and after comparison of analyzing a large script:

benchmark                       old ns/op    new ns/op    delta
------------------------------------------------------------------
BenchmarkContainsStakeOpCodes   134599       968          -99.28%

benchmark                       old allocs   new allocs   delta
------------------------------------------------------------------
BenchmarkContainsStakeOpCodes   1            0            -100.00%

benchmark                       old bytes    new bytes    delta
------------------------------------------------------------------
BenchmarkContainsStakeOpCodes   466944       0            -100.00%
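To make the tokenizer-based approach more concrete, here is a minimal sketch of walking a script one opcode at a time and stopping at the first stake opcode. The miniTokenizer type is a simplified, hypothetical stand-in written for this example and is not the tokenizer the module exports; the stake opcode range (0xba through 0xbd) is likewise stated here as an assumption.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Opcode values used by this sketch.
const (
	opData75     = 0x4b // last "push the next N bytes" opcode
	opPushData1  = 0x4c // 1-byte length prefix
	opPushData4  = 0x4e // 4-byte length prefix
	opSSTx       = 0xba // first stake opcode (assumed range start)
	opSSTxChange = 0xbd // last stake opcode (assumed range end)
)

var errTruncated = errors.New("script ends before the pushed data")

// miniTokenizer is a hypothetical, simplified tokenizer. It only keeps
// offsets and slices into the caller's script rather than building a list
// of parsed opcodes.
type miniTokenizer struct {
	script []byte
	offset int
	op     byte
	data   []byte
	err    error
}

// Next advances to the next opcode and reports whether one was parsed.
func (t *miniTokenizer) Next() bool {
	if t.err != nil || t.offset >= len(t.script) {
		return false
	}
	op := t.script[t.offset]
	rest := t.script[t.offset+1:]

	var lenBytes int
	var dataLen uint32
	switch {
	case op >= 0x01 && op <= opData75:
		dataLen = uint32(op) // the opcode itself is the data length
	case op >= opPushData1 && op <= opPushData4:
		lenBytes = 1 << (op - opPushData1) // 1, 2, or 4 length bytes
		if len(rest) < lenBytes {
			t.err = errTruncated
			return false
		}
		var buf [4]byte
		copy(buf[:], rest[:lenBytes])
		dataLen = binary.LittleEndian.Uint32(buf[:])
	}
	rest = rest[lenBytes:]
	if uint64(dataLen) > uint64(len(rest)) {
		t.err = errTruncated
		return false
	}
	t.op, t.data = op, rest[:dataLen]
	t.offset += 1 + lenBytes + int(dataLen)
	return true
}

// containsStakeOpcodes reports whether any opcode in the script falls in the
// stake opcode range. Bytes inside data pushes are skipped, not inspected.
func containsStakeOpcodes(script []byte) (bool, error) {
	t := miniTokenizer{script: script}
	for t.Next() {
		if t.op >= opSSTx && t.op <= opSSTxChange {
			return true, nil
		}
	}
	return false, t.err
}

func main() {
	found, err := containsStakeOpcodes([]byte{0xbb, 0x76, 0xa9})
	fmt.Println(found, err)

	// The same byte values hidden inside a 2-byte data push are not opcodes.
	found, err = containsStakeOpcodes([]byte{0x02, 0xba, 0xbb})
	fmt.Println(found, err)
}
```

The early return on the first match is what distinguishes this style from fully parsing the script up front: no intermediate slice of opcodes is ever built.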

This converts the ExtractCoinbaseNullData function to make use of the new tokenizer instead of the far less efficient parseScript thereby significantly optimizing the function.

The following is a before and after comparison of analyzing a typical coinbase script:

benchmark                        old ns/op    new ns/op    delta
-------------------------------------------------------------------
BenchmarkExactCoinbaseNullData   227          31.0         -86.34%

benchmark                        old allocs   new allocs   delta
-------------------------------------------------------------------
BenchmarkExactCoinbaseNullData   1            0            -100.00%

benchmark                        old bytes    new bytes    delta
-------------------------------------------------------------------
BenchmarkExactCoinbaseNullData   448          0            -100.00%

This converts CalcScriptInfo and dependent expectedInputs to make use of the new script tokenizer as well as several of the other recently added raw script analysis functions in order to remove the reliance on parsed opcodes as a step towards ultimately removing them altogether. It is worth noting that this has the side effect of significantly optimizing the function as well. However, since it is deprecated, no benchmarks are provided.

This converts the CalcMultiSigStats function to make use of the new extractMultisigScriptDetails function instead of the far less efficient parseScript thereby significantly optimizing the function. The tests are also updated accordingly.

The following is a before and after comparison of analyzing a standard multisig script:

benchmark                    old ns/op    new ns/op    delta
---------------------------------------------------------------
BenchmarkCalcMultiSigStats   972          79.5         -91.82%

benchmark                    old allocs   new allocs   delta
---------------------------------------------------------------
BenchmarkCalcMultiSigStats   1            0            -100.00%

benchmark                    old bytes    new bytes    delta
---------------------------------------------------------------
BenchmarkCalcMultiSigStats   2304         0            -100.00%

This converts the MultisigRedeemScriptFromScriptSig function to make use of the new finalOpcodeData function instead of the far less efficient parseScript thereby significantly optimizing the function. It also deprecates the error return since it really does not make sense given the preconditions of the function. Finally, the comment is modified to explicitly call out the script version semantics.

The following is a before and after comparison of analyzing a very large script:

benchmark                       old ns/op    new ns/op    delta
------------------------------------------------------------------
BenchmarkMultisigRedeemScript   153623       1830         -98.81%

benchmark                       old allocs   new allocs   delta
------------------------------------------------------------------
BenchmarkMultisigRedeemScript   1            0            -100.00%

benchmark                       old bytes    new bytes    delta
------------------------------------------------------------------
BenchmarkMultisigRedeemScript   466944       0            -100.00%

This converts GetScriptHashFromP2SHScript to make use of the new script tokenizer in order to remove the reliance on parsed opcodes as a step towards ultimately removing them altogether. It also deprecates the function since the current semantics are not really ideal: they simply return the data push just after the first HASH160 opcode, which is only valid when the script is already known to be of the correct form, and the task can be done more efficiently via raw script analysis such as how it is done in the recently added extractScriptHash function. Finally, it modifies the comment to explicitly call out the script version semantics as well as the aforementioned precondition. It is worth noting that this has the side effect of significantly optimizing the function as well. However, since it is deprecated, no benchmarks are provided.

This converts the PushedData function to make use of the new tokenizer instead of the far less efficient parseScript thereby significantly optimizing the function. Also, the comment is modified to explicitly call out the script version semantics.

The following is a before and after comparison of extracting the data from a very large script:

benchmark             old ns/op    new ns/op    delta
-------------------------------------------------------
BenchmarkPushedData   132400       1619         -98.78%

benchmark             old allocs   new allocs   delta
-------------------------------------------------------
BenchmarkPushedData   5            4            -20.00%

benchmark             old bytes    new bytes    delta
-------------------------------------------------------
BenchmarkPushedData   467320       368          -99.92%

This converts the IsUnspendable function to make use of a combination of raw script analysis and the new tokenizer instead of the far less efficient parseScript thereby significantly optimizing the function. It is important to note that this new implementation intentionally has a semantic difference from the existing implementation in that it will now report scripts that are larger than the max allowed script size as unspendable as well. Finally, the comment is modified to explicitly call out the script version semantics.

The following is a before and after comparison of analyzing a large script:

benchmark                old ns/op    new ns/op    delta
-----------------------------------------------------------
BenchmarkIsUnspendable   149899       860          -99.43%

benchmark                old allocs   new allocs   delta
-----------------------------------------------------------
BenchmarkIsUnspendable   1            0            -100.00%

benchmark                old bytes    new bytes    delta
-----------------------------------------------------------
BenchmarkIsUnspendable   466945       0            -100.00%

This renames the canonicalPush function to isCanonicalPush and converts it to accept an opcode as a byte and the associated data as a byte slice instead of the internal parsed opcode struct in order to make it more flexible for raw script analysis. It also updates all callers and tests accordingly.
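A function taking an opcode byte and a data slice can be called directly on tokenizer output with no intermediate struct. The following is a sketch of what such a check might look like, assuming the usual rule that a push is canonical when it uses the smallest encoding able to represent the data; the exact rules in the module may differ, and the constants are defined locally for the example.

```go
package main

import "fmt"

// Opcode values used by this sketch.
const (
	op0         = 0x00
	opPushData1 = 0x4c
	opPushData2 = 0x4d
	opPushData4 = 0x4e
	op16        = 0x60
)

// isCanonicalPush reports whether the opcode is either not a push at all or
// is the smallest push encoding that can carry the provided data. This is an
// illustrative version of the rules, not a copy of the module's code.
func isCanonicalPush(opcode byte, data []byte) bool {
	dataLen := len(data)
	if opcode > op16 {
		return true // not a push opcode, so there is nothing to check
	}
	// Single bytes 0 through 16 have dedicated small-integer opcodes.
	if opcode < opPushData1 && opcode > op0 && dataLen == 1 && data[0] <= 16 {
		return false
	}
	// Each length-prefixed push must be needed for the data length.
	if opcode == opPushData1 && dataLen < opPushData1 {
		return false
	}
	if opcode == opPushData2 && dataLen <= 0xff {
		return false
	}
	if opcode == opPushData4 && dataLen <= 0xffff {
		return false
	}
	return true
}

func main() {
	fmt.Println(isCanonicalPush(0x01, []byte{0x05}))       // a small-integer opcode would do
	fmt.Println(isCanonicalPush(0x01, []byte{0x11}))       // 17 needs a real data push
	fmt.Println(isCanonicalPush(0x4c, make([]byte, 10)))   // a direct push would do
	fmt.Println(isCanonicalPush(0x4d, make([]byte, 0x100))) // too long for a 1-byte length
}
```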

This adds a fairly comprehensive set of tests to ensure the standard atomic swap script detection and extraction function works as intended.

This converts the ExtractAtomicSwapDataPushes function to make use of the new tokenizer instead of the far less efficient parseScript thereby significantly optimizing the function. The new implementation is designed such that it should be fairly easy to move the function into the atomic swap tools where it more naturally belongs now that the tokenizer makes it possible to analyze scripts outside of the txscript module. Consequently, this also deprecates the function.

The following is a before and after comparison of attempting to extract from both a typical atomic swap script and a very large non-atomic swap script:

benchmark                                   old ns/op    new ns/op    delta
------------------------------------------------------------------------------
BenchmarkExtractAtomicSwapDataPushes        1330         410          -69.17%
BenchmarkExtractAtomicSwapDataPushesLarge   136819       69.3         -99.95%

benchmark                                   old allocs   new allocs   delta
------------------------------------------------------------------------------
BenchmarkExtractAtomicSwapDataPushes        2            1            -50.00%
BenchmarkExtractAtomicSwapDataPushesLarge   1            0            -100.00%

benchmark                                   old bytes    new bytes    delta
------------------------------------------------------------------------------
BenchmarkExtractAtomicSwapDataPushes        3168         96           -96.97%
BenchmarkExtractAtomicSwapDataPushesLarge   466944       0            -100.00%

This begins the process of converting the ExtractPkScriptAddrs function to use the optimized extraction functions recently introduced as part of the typeOfScript conversion. In order to ease the review process, the detection of each script type will be converted in a separate commit such that the script is only parsed as a fallback for the cases that are not already converted to more efficient variants. In particular, this converts the detection for pay-to-script-hash scripts.

This continues the process of converting the ExtractPkScriptAddrs function to use the optimized extraction functions recently introduced as part of the typeOfScript conversion. In particular, this converts the detection for pay-to-pubkey-hash scripts.

This continues the process of converting the ExtractPkScriptAddrs function to use the optimized extraction functions recently introduced as part of the typeOfScript conversion. In particular, this converts the detection for pay-to-alt-pubkey-hash scripts.

This continues the process of converting the ExtractPkScriptAddrs function to use the optimized extraction functions recently introduced as part of the typeOfScript conversion. In particular, this converts the detection for pay-to-pubkey scripts.

This continues the process of converting the ExtractPkScriptAddrs function to use the optimized extraction functions recently introduced as part of the typeOfScript conversion. In particular, this converts the detection for pay-to-alt-pubkey scripts.

This continues the process of converting the ExtractPkScriptAddrs function to use the optimized extraction functions recently introduced as part of the typeOfScript conversion. In particular, this converts the detection for multisig scripts. Also, since the remaining slow path cases are all recursive calls, the parsed opcodes are no longer used, so parsing is removed.

This continues the process of converting the ExtractPkScriptAddrs function to use the optimized extraction functions recently introduced as part of the typeOfScript conversion. In particular, this converts the detection for stake-submission-tagged pay-to-pubkey-hash and pay-to-script-hash scripts.

This continues the process of converting the ExtractPkScriptAddrs function to use the optimized extraction functions recently introduced as part of the typeOfScript conversion. In particular, this converts the detection for stake-generation-tagged pay-to-pubkey-hash and pay-to-script-hash scripts.

This continues the process of converting the ExtractPkScriptAddrs function to use the optimized extraction functions recently introduced as part of the typeOfScript conversion. In particular, this converts the detection for stake-revocation-tagged pay-to-pubkey-hash and pay-to-script-hash scripts.

This continues the process of converting the ExtractPkScriptAddrs function to use the optimized extraction functions recently introduced as part of the typeOfScript conversion. In particular, this converts the detection for stake-change-tagged pay-to-pubkey-hash and pay-to-script-hash scripts.

This completes the process of converting the ExtractPkScriptAddrs function to use the optimized extraction functions recently introduced as part of the typeOfScript conversion. In particular, this converts the detection for nulldata scripts, removes the slow path fallback code since it is the final case, and modifies the comment to call out the script version semantics.

The following is a before and after comparison of analyzing both a typical standard script and a very large non-standard script:

benchmark                            old ns/op    new ns/op    delta
-----------------------------------------------------------------------
BenchmarkExtractPkScriptAddrsLarge   132400       44.4         -99.97%
BenchmarkExtractPkScriptAddrs        1265         231          -81.74%

benchmark                            old allocs   new allocs   delta
-----------------------------------------------------------------------
BenchmarkExtractPkScriptAddrsLarge   1            0            -100.00%
BenchmarkExtractPkScriptAddrs        5            2            -60.00%

benchmark                            old bytes    new bytes    delta
-----------------------------------------------------------------------
BenchmarkExtractPkScriptAddrsLarge   466944       0            -100.00%
BenchmarkExtractPkScriptAddrs        1600         48           -97.00%

This converts the ExtractPkScriptAltSigType function to use the optimized extraction functions recently introduced as part of the typeOfScript conversion. It is important to note that this new implementation intentionally has the same semantic differences from the existing implementation as discussed in the relevant commits that introduced the extraction functions.

The following is a before and after comparison of analyzing a typical script:

benchmark                    old ns/op    new ns/op    delta
---------------------------------------------------------------
BenchmarkExtractAltSigType   497          12.8         -97.42%

benchmark                    old allocs   new allocs   delta
---------------------------------------------------------------
BenchmarkExtractAltSigType   1            0            -100.00%

benchmark                    old bytes    new bytes    delta
---------------------------------------------------------------
BenchmarkExtractAltSigType   896          0            -100.00%

This moves the function definition for mergeMultiSig so it is more consistent with the preferred order used throughout the codebase. In particular, the functions are defined before they're first used and generally as close as possible to the first use when they're defined in the same file.

This converts RawTxInSignature to make use of the recently converted CalcSignatureHash function that works with raw scripts in order to remove the reliance on parsed opcodes as a step towards ultimately removing them altogether and updates the comment to explicitly call out the script version semantics. It is worth noting that this has the side effect of optimizing the function as well. However, since this change is not focused on the optimization aspects, no benchmarks are provided.

This converts RawTxInSignatureAlt to make use of the recently converted CalcSignatureHash function that works with raw scripts in order to remove the reliance on parsed opcodes as a step towards ultimately removing them altogether and updates the comment to explicitly call out the script version semantics. It is worth noting that this has the side effect of optimizing the function as well. However, since this change is not focused on the optimization aspects, no benchmarks are provided.

This converts SignTxOutput and supporting funcs, namely sign, mergeScripts and mergeMultiSig, to make use of the new tokenizer as well as some recently added funcs that deal with raw scripts in order to remove the reliance on parsed opcodes as a step towards ultimately removing them altogether and updates the comments to explicitly call out the script version semantics. It is worth noting that this has the side effect of optimizing the function as well. However, since this change is not focused on the optimization aspects, no benchmarks are provided.

This introduces a new function named removeOpcodeByDataRaw which accepts the raw scripts and data to remove versus requiring the parsed opcodes to both significantly optimize it as well as make it more flexible for working with raw scripts. There are several places in the rest of the code that currently only have access to the parsed opcodes, so this only introduces the function for use in the future and deprecates the existing one. Note that, in practice, the script will never actually contain the data that is intended to be removed since the function is only used during signature verification to remove the signature itself which would require some incredibly non-standard code to create. Thus, as an optimization, it avoids allocating a new script unless there is actually a match that needs to be removed. Finally, it updates the tests to use the new function.
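The "allocate only on a match" optimization described above can be sketched as follows. This is a deliberately reduced illustration: the hypothetical removePushesContaining function understands only single-byte opcodes and direct data pushes (0x01 through 0x4b), and it omits any canonical-push check, so it should not be read as the module's actual removal logic.

```go
package main

import (
	"bytes"
	"fmt"
)

// opLen returns the encoded length of the opcode at script[0]. Only
// single-byte opcodes and direct pushes (0x01-0x4b) are understood by this
// sketch. It returns 0 when a push runs past the end of the script.
func opLen(script []byte) int {
	n := 1
	if op := script[0]; op >= 0x01 && op <= 0x4b {
		n += int(op)
	}
	if n > len(script) {
		return 0
	}
	return n
}

// removePushesContaining returns the script with every data push containing
// needle removed. When nothing matches, the original slice is returned
// unchanged and no new script is allocated.
func removePushesContaining(script, needle []byte) []byte {
	if len(script) == 0 || len(needle) == 0 {
		return script
	}
	var result []byte // stays nil until the first match
	for off := 0; off < len(script); {
		n, match := opLen(script[off:]), false
		if n == 0 {
			n = len(script) - off // truncated push: keep the remainder as-is
		} else if n > 1 {
			match = bytes.Contains(script[off+1:off+n], needle)
		}
		switch {
		case match && result == nil:
			// First match: allocate once and copy everything before it.
			result = make([]byte, 0, len(script)-n)
			result = append(result, script[:off]...)
		case !match && result != nil:
			result = append(result, script[off:off+n]...)
		}
		off += n
	}
	if result == nil {
		return script
	}
	return result
}

func main() {
	script := []byte{0x76, 0x02, 0xaa, 0xbb, 0xac}
	fmt.Printf("%x\n", removePushesContaining(script, []byte{0xaa, 0xbb}))
	fmt.Printf("%x\n", removePushesContaining(script, []byte{0xcc}))
}
```

Since the common case is that nothing matches, returning the input slice untouched keeps the hot path free of allocations while still producing a fresh script in the rare case where data must be removed.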

This converts the isDisabled function defined on a parsed opcode to a standalone function which accepts an opcode as a byte instead in order to make it more flexible for raw script analysis. It also updates all callers accordingly.

This converts the alwaysIllegal function defined on a parsed opcode to a standalone function named isOpcodeAlwaysIllegal which accepts an opcode as a byte instead in order to make it more flexible for raw script analysis. It also updates all callers accordingly.

This converts the isConditional function defined on a parsed opcode to a standalone function named isOpcodeConditional which accepts an opcode as a byte instead in order to make it more flexible for raw script analysis. It also updates all callers accordingly.
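Standalone predicates of this kind reduce to a switch over the opcode byte. The sketch below shows the shape of isOpcodeConditional together with the similarly converted isOpcodeAlwaysIllegal; the opcode sets used here (IF, NOTIF, ELSE, ENDIF as conditionals, and VERIF, VERNOTIF as always illegal) are assumptions based on the conventional script opcode table rather than something this commit message states.

```go
package main

import "fmt"

// Opcode values used by this sketch.
const (
	opIf       = 0x63
	opNotIf    = 0x64
	opVerIf    = 0x65
	opVerNotIf = 0x66
	opElse     = 0x67
	opEndIf    = 0x68
)

// isOpcodeConditional reports whether the opcode affects conditional
// execution. It needs nothing but the opcode byte.
func isOpcodeConditional(opcode byte) bool {
	switch opcode {
	case opIf, opNotIf, opElse, opEndIf:
		return true
	}
	return false
}

// isOpcodeAlwaysIllegal reports whether the opcode is illegal even when it
// appears in a branch that is not executed (assumed set for this sketch).
func isOpcodeAlwaysIllegal(opcode byte) bool {
	switch opcode {
	case opVerIf, opVerNotIf:
		return true
	}
	return false
}

func main() {
	for _, op := range []byte{opIf, opVerIf, opElse, 0x76} {
		fmt.Printf("0x%02x conditional=%v alwaysIllegal=%v\n",
			op, isOpcodeConditional(op), isOpcodeAlwaysIllegal(op))
	}
}
```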

This converts the checkMinimalDataPush function defined on a parsed opcode to a standalone function which accepts an opcode and data slice instead in order to make it more flexible for raw script analysis. It also updates all callers accordingly.

This converts the engine's current program counter disassembly to make use of the standalone disassembly function to remove the dependency on the parsed opcode struct. It also updates the tests accordingly.

This refactors the script engine to store and step through raw scripts by making use of the new zero-allocation script tokenizer as opposed to the less efficient method of storing and stepping through parsed opcodes. It also improves several aspects while refactoring such as optimizing the disassembly trace, showing all scripts in the trace in the case of execution failure, and providing additional comments describing the purpose of each field in the engine.

It should be noted that this is a step towards removing the parsed opcode struct and associated supporting code altogether. However, in order to ease the review process, this retains the struct and all function signatures for opcode execution which make use of an individual parsed opcode. Those will be updated in future commits.

The following is an overview of the changes:

- Modify internal engine scripts slice to use raw scripts instead of parsed opcodes
- Introduce a tokenizer to the engine to track the current script
- Remove no longer needed script offset parameter from the engine since that is tracked by the tokenizer
- Add an opcode index counter for disassembly purposes to the engine
- Update check for valid program counter to only consider the script index
  - Update tests for bad program counter accordingly
- Rework the NewEngine function
  - Store the raw scripts
  - Set up the initial tokenizer
  - Explicitly check against version 0 instead of DefaultScriptVersion which would break consensus if changed
  - Check the scripts parse according to version 0 semantics to retain current consensus rules
  - Improve comments throughout
- Rework the Step function
  - Use the tokenizer and raw scripts
  - Create a parsed opcode on the fly for now to retain existing opcode execution function signatures
  - Improve comments throughout
- Update the Execute function
  - Explicitly check against version 0 instead of DefaultScriptVersion which would break consensus if changed
  - Improve the disassembly tracing in the case of error
- Update the CheckErrorCondition function
  - Modify clean stack error message to make sense in all cases
  - Improve the comments
- Update the DisasmPC and DisasmScript functions on the engine
  - Use the tokenizer
  - Optimize construction via the use of strings.Builder
- Modify the subScript function to return the raw script bytes since the parsed opcodes are no longer stored
- Update the various signature checking opcodes to use the raw opcode data removal and signature hash calculation functions since the subscript is now a raw script
  - opcodeCheckSig
  - opcodeCheckMultiSig
  - opcodeCheckSigAlt

This renames the removeOpcodeByDataRaw to removeOpcodeByData now that the old version has been removed.

This renames the calcSignatureHashRaw to calcSignatureHash now that the old version has been removed.

Also remove tests associated with unparsing opcodes accordingly.

Also remove tests associated with the func accordingly.

This converts the executeOpcode function defined on the engine to accept an opcode and data slice instead of a parsed opcode as a step towards removing the parsed opcode struct and associated supporting code altogether. It also updates all callers accordingly.

This converts the callback function defined on the internal opcode struct to accept the opcode and data slice instead of a parsed opcode as the final step towards removing the parsed opcode struct and associated supporting code altogether. It also updates all of the callbacks and tests accordingly and finally removes the now unused parsedOpcode struct.


Commits on Mar 26, 2019