New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
O(1) OP_IF/NOTIF/ELSE/ENDIF script implementation #16902
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
It might be easier to review if the operations on class FalseCounter {
private:
std::vector<bool> flags;
public:
bool all_true() { return !count(flags.begin(), flags.end(), false); }
void push_back(bool f) { flags.push_back(f); }
void pop_back() { flags.pop_back(); }
bool empty() { return flags.empty(); }
void toggle_top() { flags.back() = !flags.back(); }
}; to class FalseCounter {
private:
int depth = 0;
int first_false = 0;
public:
bool all_true() { return first_false == 0; }
void push_back(bool f) {
++depth;
if (first_false == 0 && !f) first_false = depth;
}
void pop_back() {
if (first_false == depth) first_false = 0;
--depth;
}
bool empty() { return depth == 0; }
void toggle_top() {
if (first_false == 0) {
first_false = depth;
} else if (first_false == depth) {
first_false = 0;
} else {
// no action needed as change is unobservable
}
}
}; without changing the usage of the class. I think it'd be plausible to do a coq proof that those implementations are logically equivalent / implement the same API (at least given cf https://github.com/ajtowns/bitcoin/commits/201909-vfexec-optimise |
@ajtowns If there's interest in including this patch at some point we should pick up that approach. |
Somewhat related issue: sipa/miniscript#7 ("Policies with too high nesting depth are not rejected: It is possible to create small inputs that cause extreme memory usage (in practice: OOM kill or std::bad_alloc)") Context: Nested |
I've modified the code to follow @ajtowns's approach more closely, though with fewer commits, and extra comments. |
132225a
to
d4721cd
Compare
3d09358
to
59c0b4e
Compare
Includes comments added by Pieter Wuille.
This optimization was first suggested by Sergio Demian Lerner in https://bitslog.wordpress.com/2017/04/17/new-quadratic-delays-in-bitcoin-scripts/. The implementation follows the suggested approach there, but with a slightly simpler representation.
…n#16902 because Ben Carman pointed it out
* Redid conditional interpreting without binary trees * Fixed ControlOperationsInterpreterTest * Implemented O(1) conditional handling as proposed here bitcoin/bitcoin#16902 because Ben Carman pointed it out * Added some docs * Responded to code review
public: | ||
bool empty() { return m_stack_size == 0; } | ||
bool all_true() { return m_first_false_pos == NO_FALSE; } | ||
void push_back(bool f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an assert(m_stack_size != NO_FALSE - 1)
is needed.
Also, does not seems caller are checking this. Unsure if this can realistically taken advantage of, but I think it does not hurt the scriptevaluator to check for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd need a script with 4 billion OP_IF's to hit that (so a tx with 4GB of data or over 1G weight), so taking advantage of it shouldn't be possible... Adding the assert seems plausible though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular Script length is limited to MAX_SIZE (2^25) bytes.
Code review ACK e6e622e I've run the bench and this change cuts time by ~75% before:
after:
I think the asserts in the final implementations of |
ACK e6e622e code review, build, benches, fuzzing
bench before
(HEAD detached at 89fb241c54)$ src/bench/bench_bitcoin -filter=VerifyNestedIfScript -evals=50
WARNING: This is a debug build - may result in slower benchmarks.
# Benchmark, evals, iterations, total, min, max, median
VerifyNestedIfScript, 50, 100, 24.0408, 0.00434014, 0.00559945, 0.00475711
VerifyNestedIfScript, 50, 100, 23.5444, 0.00428717, 0.00569794, 0.00467757
VerifyNestedIfScript, 50, 100, 24.6077, 0.00431298, 0.00633760, 0.00485635
bench after
(pr/16902)$ src/bench/bench_bitcoin -filter=VerifyNestedIfScript -evals=50
WARNING: This is a debug build - may result in slower benchmarks.
# Benchmark, evals, iterations, total, min, max, median
VerifyNestedIfScript, 50, 100, 5.49356, 0.000954118, 0.00140947, 0.00106701
VerifyNestedIfScript, 50, 100, 6.64376, 0.000977855, 0.00202046, 0.00132529
VerifyNestedIfScript, 50, 100, 5.88235, 0.000968067, 0.00177160, 0.00110281
fuzzing results (#18127)
(pr/18127)$ src/test/fuzz/script_condition_stack
INFO: Seed: 3449044402
INFO: Loaded 1 modules (510932 inline 8-bit counters): 510932 [0x5642d9dc1678, 0x5642d9e3e24c),
INFO: Loaded 1 PC tables (510932 PCs): 510932 [0x5642d9e3e250,0x5642da609f90),
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2 INITED cov: 177 ft: 178 corp: 1/1b exec/s: 0 rss: 117Mb
#3 NEW cov: 179 ft: 242 corp: 2/59b exec/s: 0 rss: 117Mb L: 58/58 MS: 1 InsertRepeatedBytes-
#4 NEW cov: 179 ft: 290 corp: 3/90b exec/s: 0 rss: 117Mb L: 31/58 MS: 1 EraseBytes-
.../...
#5136 REDUCE cov: 438 ft: 2238 corp: 116/218Kb exec/s: 7 rss: 152Mb L: 269/4096 MS: 1 EraseBytes-
I'll leave the fuzzer running for a while longer.
if (m_first_false_pos == NO_FALSE) { | ||
// The current stack is all true values; the first false will be the top. | ||
m_first_false_pos = m_stack_size - 1; | ||
} else if (m_first_false_pos == m_stack_size - 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pico-nit: perhaps execute m_stack_size - 1
between the assert line 330 and the start of the conditional line 331 and use the result in lines 333 and 334 in the conditional
I like the changes in @jnewbery's branch mentioned in #16902 (comment)) esp. this code comment. |
ACK e6e622e Reviewed code, compiled, ran tests and benchmarks I received better results from the benchmarks after the change, although improvements are not nearly as significant as the others reported. Not sure why that is, I ran them multiple times with different configs, without Before (HEAD detached at 89fb241):
After:
|
Can anyone explain me why everyone is benchmarking with optimizations disabled? I hope all nodes on mainnet are running -O2, at least for applications where performance matters. |
I did both (with optimizations first). Then I tested it without because my results were different than the others here and ended up posting those because of the differences to make them more comparable. Here are some results with optimizations: Before:
After:
|
Fair enough. Compiled with before
after
|
ACK, I was wondering about the assert I suggested on #16902 (comment) I don't really know if it is good idea or not. While theorically possible, even if it was practically possible, I guess it would be more desirable to get a |
ACK e6e622e I don't understand fjahr's results and investigating further didn't reveal anything; seems like any quadratic behaviour is getting completely lost in the noise there somehow. Might be worth adding additional benchmarks once tapscript removes the opcode limit to report the performance of very large scripts; say 10k nested IFs and 20k OP_NOPs vs 40k OP_NOPs etc? |
concept and code review ACK e6e622e |
e6e622e Implement O(1) OP_IF/NOTIF/ELSE/ENDIF logic (Pieter Wuille) d0e8f4d [refactor] interpreter: define interface for vfExec (Anthony Towns) 89fb241 Benchmark script verification with 100 nested IFs (Pieter Wuille) Pull request description: While investigating what mechanisms are possible to maximize the per-opcode verification cost of scripts, I noticed that the logic for determining whether a particular opcode is to be executed is O(n) in the nesting depth. This issue was also pointed out by Sergio Demian Lerner in https://bitslog.wordpress.com/2017/04/17/new-quadratic-delays-in-bitcoin-scripts/, and this PR implements a variant of the O(1) algorithm suggested there. This is not a problem currently, because even with a nesting depth of 100 (the maximum possible right now due to the 201 ops limit), the slowdown caused by this on my machine is around 70 ns per opcode (or 0.25 s per block) at worst, far lower than what is possible with other opcodes. This PR mostly serves as a proof of concept that it's possible to avoid it, which may be relevant in discussions around increasing the opcode limits in future script versions. Without it, the execution time of scripts can grow quadratically with the nesting depth, which very quickly becomes unreasonable. This improves upon bitcoin#14245 by completely removing the `vfExec` vector. ACKs for top commit: jnewbery: Code review ACK e6e622e MarcoFalke: ACK e6e622e🐴 fjahr: ACK e6e622e ajtowns: ACK e6e622e laanwj: concept and code review ACK e6e622e jonatack: ACK e6e622e code review, build, benches, fuzzing Tree-SHA512: 1dcfac3411ff04773de461959298a177f951cb5f706caa2734073bcec62224d7cd103767cfeef85cd129813e70c14c74fa8f1e38e4da70ec38a0f615aab1f7f7
e6e622e Implement O(1) OP_IF/NOTIF/ELSE/ENDIF logic (Pieter Wuille) d0e8f4d [refactor] interpreter: define interface for vfExec (Anthony Towns) 89fb241 Benchmark script verification with 100 nested IFs (Pieter Wuille) Pull request description: While investigating what mechanisms are possible to maximize the per-opcode verification cost of scripts, I noticed that the logic for determining whether a particular opcode is to be executed is O(n) in the nesting depth. This issue was also pointed out by Sergio Demian Lerner in https://bitslog.wordpress.com/2017/04/17/new-quadratic-delays-in-bitcoin-scripts/, and this PR implements a variant of the O(1) algorithm suggested there. This is not a problem currently, because even with a nesting depth of 100 (the maximum possible right now due to the 201 ops limit), the slowdown caused by this on my machine is around 70 ns per opcode (or 0.25 s per block) at worst, far lower than what is possible with other opcodes. This PR mostly serves as a proof of concept that it's possible to avoid it, which may be relevant in discussions around increasing the opcode limits in future script versions. Without it, the execution time of scripts can grow quadratically with the nesting depth, which very quickly becomes unreasonable. This improves upon bitcoin#14245 by completely removing the `vfExec` vector. ACKs for top commit: jnewbery: Code review ACK e6e622e MarcoFalke: ACK e6e622e🐴 fjahr: ACK e6e622e ajtowns: ACK e6e622e laanwj: concept and code review ACK e6e622e jonatack: ACK e6e622e code review, build, benches, fuzzing Tree-SHA512: 1dcfac3411ff04773de461959298a177f951cb5f706caa2734073bcec62224d7cd103767cfeef85cd129813e70c14c74fa8f1e38e4da70ec38a0f615aab1f7f7
Summary: This is a backport of Core [[bitcoin/bitcoin#16902 | PR16902]] [1/3] bitcoin/bitcoin@89fb241 `src/bench/verify_script.cpp` was removed from BitcoinABC in https://reviews.bitcoinabc.org/rSTAGING1ff7b561eb7c6f30f1ec788e9503c30bc6485752 Test Plan: ninja all && src/bench/bitcoin-bench -filter=VerifyNestedIfScript Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Differential Revision: https://reviews.bitcoinabc.org/D8835
Summary: Includes comments added by Pieter Wuille. This is a backport of Core [[bitcoin/bitcoin#16902 | PR16902]] [2/3] bitcoin/bitcoin@d0e8f4d Depends on D8835 Test Plan: ``` ninja all check-all src/bench/bitcoin-bench -filter=VerifyNestedIfScript ``` Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Subscribers: majcosta Differential Revision: https://reviews.bitcoinabc.org/D8836
Summary: Commit message: > This optimization was first suggested by Sergio Demian Lerner in > https://bitslog.wordpress.com/2017/04/17/new-quadratic-delays-in-bitcoin-scripts/. > The implementation follows the suggested approach there, but with a slightly > simpler representation. PR description: > While investigating what mechanisms are possible to maximize the per-opcode verification cost of scripts, I noticed that the logic for determining whether a particular opcode is to be executed is O(n) in the nesting depth. This issue was also pointed out by Sergio Demian Lerner in https://bitslog.wordpress.com/2017/04/17/new-quadratic-delays-in-bitcoin-scripts/, and this PR implements a variant of the O(1) algorithm suggested there. > > This is not a problem currently, because even with a nesting depth of 100 (the maximum possible right now due to the 201 ops limit), the slowdown caused by this on my machine is around 70 ns per opcode (or 0.25 s per block) at worst, far lower than what is possible with other opcodes. > > This PR mostly serves as a proof of concept that it's possible to avoid it, which may be relevant in discussions around increasing the opcode limits in future script versions. Without it, the execution time of scripts can grow quadratically with the nesting depth, which very quickly becomes unreasonable. This is a backport of Core [[bitcoin/bitcoin#16902 | PR16902]] [3/3] bitcoin/bitcoin@e6e622e Depends on D8836 Test Plan: `ninja all check-all` Before this commit: ``` $ src/bench/bitcoin-bench -filter=VerifyNestedIfScript # Benchmark, evals, iterations, total, min, max, median VerifyNestedIfScript, 5, 100, 0.0473822, 9.28665e-05, 9.67629e-05, 9.47128e-05 ``` After: ``` $ src/bench/bitcoin-bench -filter=VerifyNestedIfScript # Benchmark, evals, iterations, total, min, max, median VerifyNestedIfScript, 5, 100, 0.0138046, 2.68691e-05, 3.00776e-05, 2.70254e-05 ``` Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Differential Revision: https://reviews.bitcoinabc.org/D8837
Summary: This is a backport of Core [[bitcoin/bitcoin#16902 | PR16902]] [1/3] bitcoin/bitcoin@89fb241 `src/bench/verify_script.cpp` was removed from BitcoinABC in https://reviews.bitcoinabc.org/rSTAGING1ff7b561eb7c6f30f1ec788e9503c30bc6485752 Test Plan: ninja all && src/bench/bitcoin-bench -filter=VerifyNestedIfScript Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Differential Revision: https://reviews.bitcoinabc.org/D8835
Summary: Includes comments added by Pieter Wuille. This is a backport of Core [[bitcoin/bitcoin#16902 | PR16902]] [2/3] bitcoin/bitcoin@d0e8f4d Depends on D8835 Test Plan: ``` ninja all check-all src/bench/bitcoin-bench -filter=VerifyNestedIfScript ``` Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Subscribers: majcosta Differential Revision: https://reviews.bitcoinabc.org/D8836
Summary: Commit message: > This optimization was first suggested by Sergio Demian Lerner in > https://bitslog.wordpress.com/2017/04/17/new-quadratic-delays-in-bitcoin-scripts/. > The implementation follows the suggested approach there, but with a slightly > simpler representation. PR description: > While investigating what mechanisms are possible to maximize the per-opcode verification cost of scripts, I noticed that the logic for determining whether a particular opcode is to be executed is O(n) in the nesting depth. This issue was also pointed out by Sergio Demian Lerner in https://bitslog.wordpress.com/2017/04/17/new-quadratic-delays-in-bitcoin-scripts/, and this PR implements a variant of the O(1) algorithm suggested there. > > This is not a problem currently, because even with a nesting depth of 100 (the maximum possible right now due to the 201 ops limit), the slowdown caused by this on my machine is around 70 ns per opcode (or 0.25 s per block) at worst, far lower than what is possible with other opcodes. > > This PR mostly serves as a proof of concept that it's possible to avoid it, which may be relevant in discussions around increasing the opcode limits in future script versions. Without it, the execution time of scripts can grow quadratically with the nesting depth, which very quickly becomes unreasonable. This is a backport of Core [[bitcoin/bitcoin#16902 | PR16902]] [3/3] bitcoin/bitcoin@e6e622e Depends on D8836 Test Plan: `ninja all check-all` Before this commit: ``` $ src/bench/bitcoin-bench -filter=VerifyNestedIfScript VerifyNestedIfScript, 5, 100, 0.0473822, 9.28665e-05, 9.67629e-05, 9.47128e-05 ``` After: ``` $ src/bench/bitcoin-bench -filter=VerifyNestedIfScript VerifyNestedIfScript, 5, 100, 0.0138046, 2.68691e-05, 3.00776e-05, 2.70254e-05 ``` Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Differential Revision: https://reviews.bitcoinabc.org/D8837
* Redid conditional interpreting without binary trees * Fixed ControlOperationsInterpreterTest * Implemented O(1) conditional handling as proposed here bitcoin/bitcoin#16902 because Ben Carman pointed it out * Added some docs * Responded to code review
While investigating what mechanisms are possible to maximize the per-opcode verification cost of scripts, I noticed that the logic for determining whether a particular opcode is to be executed is O(n) in the nesting depth. This issue was also pointed out by Sergio Demian Lerner in https://bitslog.wordpress.com/2017/04/17/new-quadratic-delays-in-bitcoin-scripts/, and this PR implements a variant of the O(1) algorithm suggested there.
This is not a problem currently, because even with a nesting depth of 100 (the maximum possible right now due to the 201 ops limit), the slowdown caused by this on my machine is around 70 ns per opcode (or 0.25 s per block) at worst, far lower than what is possible with other opcodes.
This PR mostly serves as a proof of concept that it's possible to avoid it, which may be relevant in discussions around increasing the opcode limits in future script versions. Without it, the execution time of scripts can grow quadratically with the nesting depth, which very quickly becomes unreasonable.
This improves upon #14245 by completely removing the
vfExec
vector.