Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize CScript.FindAndDelete #7884

Closed

Conversation

Projects
None yet
6 participants
@gavinandresen
Copy link
Contributor

commented Apr 15, 2016

The FindAndDelete function could move the same bytes multiple times if the signature being deleted appeared in the script multiple times.

These two commits add a unit test for FindAndDelete and optimize it so that bytes are moved at most once.

@kazcw

View changes

src/script/script.h Outdated
opcodetype opcode;
do
{
while (end() - pc >= (long)b.size() && memcmp(&pc[0], &b[0], b.size()) == 0)
{
pc = erase(pc, pc + b.size());
// Lazy copy-in-place if there is a match:
while (copied != pc-nShift) {

This comment has been minimized.

Copy link
@kazcw

kazcw Apr 15, 2016

Contributor

A test for !nShift here would prevent copying everything upto the first match to itself

@kazcw

View changes

src/script/script.h Outdated

if (nShift > 0)
{
while (copied != pc-nShift) {

This comment has been minimized.

Copy link
@kazcw

kazcw Apr 15, 2016

Contributor

I think this changes the result of FindAndDelete for invalid CScripts that end in a pushdata without enough data. Replacing "pc" with "end()" would keep the original behavior in this case.

@sipa

This comment has been minimized.

Copy link
Member

commented Apr 16, 2016

Concept ACK on the improvement, but there are maybe a few more edge cases to test:

  • Removing a byte sequence that spans multiple opcodes
  • Trying to remove a sequence that only occurs in the middle of opcodes (which doesn't work)
  • Trying to remove "OP_0 OP_1" from "OP_0 OP_0 OP_1 OP_1" leaves "OP_0 OP_1" (because we don't go backward after a replacement)

@gavinandresen gavinandresen force-pushed the gavinandresen:optimize_FindAndDelete branch 2 times, most recently Apr 18, 2016

gavinandresen added some commits Feb 3, 2016

Optimize CScript::FindAndDelete
CScript::FindAndDelete, called by the CHECKSIG Script opcodes, could
end up moving the bytes of the Script multiple times if there were
multiple matches for the Find.

This commit tweaks the algorithm so it never moves bytes in the Script
more than once per call to FindAndDelete.

@gavinandresen gavinandresen force-pushed the gavinandresen:optimize_FindAndDelete branch to aae53f2 Apr 18, 2016

@gavinandresen

This comment has been minimized.

Copy link
Contributor Author

commented Apr 18, 2016

Added unit tests as suggested by @sipa.
Fixed to be compatible with invalid CScripts that end with partial PUSHDATA's (and added unit tests) as suggested by @kazcw

And saved a couple of lines of code (and probably made faster, but copying is never done in normal operation with reasonable transactions) by switching to using std::copy instead of an explicit loop.

Thanks for the code reviews!

// This is an odd edge case: strip of the push-three-bytes
// prefix, leaving 02ff03 which is push-two-bytes:
s = ScriptFromHex("0302ff030302ff03");
d = ScriptFromHex("03");

This comment has been minimized.

Copy link
@TheBlueMatt

TheBlueMatt Apr 19, 2016

Contributor

I'd prefer to not have tests that check that FindAndDelete do exactly what we currently do when called against invalid scripts when we dont require that in any callers.

This comment has been minimized.

Copy link
@gavinandresen

gavinandresen Apr 25, 2016

Author Contributor

RE: behavior with invalid Scripts: I don't care one way or the other; seems safer to behave exactly the same, since this is consensus-critical code.

opcodetype opcode;
do
{
while (end() - pc >= (long)b.size() && memcmp(&pc[0], &b[0], b.size()) == 0)
{
pc = erase(pc, pc + b.size());
// Lazy copy-in-place if there is a match:
if (nShift == 0) copied = pc;

This comment has been minimized.

Copy link
@dcousens

dcousens Apr 21, 2016

Contributor

Isn't nShift always == 0 at the beginning? Why bother initializing copied above?

This comment has been minimized.

Copy link
@gavinandresen

gavinandresen Apr 21, 2016

Author Contributor

Why initialize copied instead of just eliminating the if/else and always copying? Because std::copy behavior is undefined if the first and last arguments point to the same element.

This comment has been minimized.

Copy link
@kazcw

kazcw Apr 21, 2016

Contributor

Why would that be? The Standard's definition of std::copy doesn't seem to say anything that would prohibit an empty range.

This comment has been minimized.

Copy link
@gavinandresen

gavinandresen via email Apr 21, 2016

Author Contributor

This comment has been minimized.

Copy link
@kazcw

kazcw Apr 21, 2016

Contributor

last is not in the range [first,last)

This comment has been minimized.

Copy link
@gavinandresen

gavinandresen Apr 25, 2016

Author Contributor

@kazcw : The first time a match is found, copied will be begin(). nShift will be zero, and pc will be pointing into the array somewhere.

If we removed copied = pc and just did:
copied = std::copy(copied+nShift, pc, copied);

... then first is copied, last is pc, and result is copied.
copied IS in the range [copied, pc)

@sipa

This comment has been minimized.

Copy link
Member

commented May 17, 2016

Superseded by #7907.

@sipa sipa closed this May 17, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.