Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster duplicate input check in CheckTransaction (alternative to #14387) #14397

Closed
wants to merge 2 commits into from

Conversation

Projects
None yet
10 participants
@sipa
Copy link
Member

commented Oct 5, 2018

This is a simple improvement to the performance of the duplicate input check in CheckTransaction.

It includes the benchmark from #14387, for which it shows about a 2x speedup compared to master (while #14387 shows a 4.5x speedup, but with a lot more complexity).

@sipa sipa force-pushed the sipa:201810_fastduplicate branch Oct 5, 2018

@JeremyRubin

This comment has been minimized.

Copy link
Contributor

commented Oct 5, 2018

cr-ack 6d1779d -- one of my earlier versions looked a lot like that.

I'd also suggest another optimization to make the vector a prevector with say 20 outputs so we can avoid the allocation on most txns.

May I suggest using adjacent_find with std::equal instead of your own loop?
https://en.cppreference.com/w/cpp/algorithm/adjacent_find

@jnewbery

This comment has been minimized.

Copy link
Member

commented Oct 5, 2018

I prefer this to #14387, but I have the same question for @sipa as I asked Jeremy in that PR:

This makes block propagation faster, but do we have an understanding of how much these milliseconds matter? Is there a way we can determine whether the increased complexity introduced is a reasonable price to pay for the performance win?

Granted, the complexity added here is minimal compared to 14387, but I'd still like to understand the cost/benefit analysis.

@sipa

This comment has been minimized.

Copy link
Member Author

commented Oct 5, 2018

@jnewbery Yeah, that's fair. I just wanted to show an alternative which is a smaller improvement but at much lower review complexity. I'm not convinced we need either approach.

@jamesob

This comment has been minimized.

Copy link
Member

commented Oct 5, 2018

For what it's worth, bitcoinperf didn't show any significant performance regression for IBD or reindex after the merge of #14247, so it's not clear that either changeset is needed.

@MarcoFalke

This comment has been minimized.

Copy link
Member

commented Oct 5, 2018

@jamesob I think this is not so much about ibd but rather propagation

@JeremyRubin

This comment has been minimized.

Copy link
Contributor

commented Oct 5, 2018

@MarcoFalke correct.

It's something like 13ms slower.

@DrahtBot

This comment has been minimized.

Copy link
Contributor

commented Oct 5, 2018

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #14400 (Add Benchmark to test input de-duplication worst case by JeremyRubin)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@gmaxwell

This comment has been minimized.

Copy link
Member

commented Oct 5, 2018

Even block propagation is no longer critical for this: When the prior optimization went in, we didn't have relay-before-validate. I'm not opposed to this change-- as it's pretty straight forward to review-- but since this particular validity criteria is pure (a function of the tx itself with no dependency on external state) effort might be better spent reorging things so that this test ends up covered by the wtxid validity caching which, although complementary, would likely give a bigger improvement.

@ryanofsky

This comment has been minimized.

Copy link
Contributor

commented Oct 6, 2018

Is there an explanation of what purpose the duplicate input check will serve if the connect block code is fixed to check for valid spends?

@gmaxwell

This comment has been minimized.

Copy link
Member

commented Oct 7, 2018

@ryanofsky Making block connection alone check for duplicate inputs isn't sufficient: The whole reason this code was initially added in PR #443 was to keep duplicate inputs out of the mempool-- block connection did originally prevent the dupes. If both block-connection and mempool processing prevented duplicate inputs, then, indeed, there should be no reason for this code.

@jl2012

This comment has been minimized.

Copy link
Contributor

commented Oct 10, 2018

I think both set+insert (the existing one) and sort (this PR) are O(NlogN). So sort is faster as it is more optimized?

@sipa sipa force-pushed the sipa:201810_fastduplicate branch Oct 12, 2018

@sipa

This comment has been minimized.

Copy link
Member Author

commented Oct 12, 2018

@jl2012 You get better memory locality and lower allocation overhead by representing things as a vector rather than a set. The set permits much faster updating though, but that's not something we need here.

@sipa sipa force-pushed the sipa:201810_fastduplicate branch to 7a9ac18 Oct 12, 2018

@@ -182,11 +182,14 @@ bool CheckTransaction(const CTransaction& tx, CValidationState &state, bool fChe

// Check for duplicate inputs - note that this check is slow so we skip it in CheckBlock
if (fCheckDuplicateInputs) {

This comment has been minimized.

Copy link
@jl2012

jl2012 Oct 13, 2018

Contributor

Tx with 1 input is quite common. Would it be faster if we skip those?

MarcoFalke added a commit to MarcoFalke/bitcoin that referenced this pull request Nov 25, 2018

Merge bitcoin#14400: Add Benchmark to test input de-duplication worst…
… case

e4eee7d Add Benchmark to test input de-duplication worst case (Jeremy Rubin)

Pull request description:

  Because there are now 2PRs referencing this benchmark commit, we may as well add it independently as it is worth landing the benchmark even if neither patch is accepted.

  bitcoin#14397
  bitcoin#14387

Tree-SHA512: 4d947323c02297b0d8f5871f9e7cc42488c0e1792a8b10dc174a25f4dd53da8146fd276949a5dbacf4083f0c6a7235cb6f21a8bc35caa499bc2508f8a048b987
@DrahtBot

This comment has been minimized.

Copy link
Contributor

commented Nov 25, 2018

Needs rebase
@MarcoFalke

This comment has been minimized.

Copy link
Member

commented Apr 19, 2019

There hasn't been much activity lately and the patch still needs rebase, so I am closing this for now. Please let me know when you want to continue working on this, so the pull request can be re-opened.

@MarcoFalke MarcoFalke closed this Apr 19, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.