Add pre-allocated vector type and use it for CScript #6914

Merged
merged 1 commit into bitcoin:master on Dec 1, 2015

@sipa
Member

sipa commented Oct 30, 2015

This is likely a controversial change, and will need very careful review and testing.

It adds a new basic data type (prevector<N, T>), an API-compatible drop-in replacement for std::vector<T> (except that it does not support custom allocators, lacks at(), and is missing a few comparison operators). It will allocate up to N elements inside the parent container directly, only switching over to heap-allocated storage if more is needed. The storage for the N direct elements and the pointer/size metadata for the heap-allocated case are shared, making it very efficient for small N.

CScript is switched to use this new type, reducing the memory consumption of mempool and chainstate.

A benchmark (reindex with script validation disabled):

2015-10-30 01:10:49   - Load block from disk: 0.00ms [0.07s]
2015-10-30 01:10:49       - Connect 595 transactions: 2.76ms (0.005ms/tx, 0.003ms/txin) [82.40s]
2015-10-30 01:10:49     - Verify 1096 txins: 2.80ms (0.003ms/txin) [88.73s]
2015-10-30 01:10:49     - Index writing: 1.29ms [61.31s]
2015-10-30 01:10:49     - Callbacks: 0.04ms [6.78s]
2015-10-30 01:10:49   - Connect total: 4.78ms [182.16s]
2015-10-30 01:10:49   - Flush: 0.77ms [38.90s]
2015-10-30 01:10:49   - Writing chainstate: 0.03ms [5.57s]
2015-10-30 01:10:49 UpdateTip: new best=000000000000046b1fdf00440a86e34886b2b0bd56472b22edb2805c92bbe4ca  height=223133  log2_work=69.460921  tx=13407260  date=2013-02-25 23:16:10 progress=0.064773  cache=666.5MiB(1658277tx)
2015-10-30 01:10:49   - Connect postprocess: 1.46ms [46.22s]
2015-10-30 01:10:49 - Connect block: 7.03ms [272.92s]

2015-10-30 01:22:03       - Connect 595 transactions: 2.86ms (0.005ms/tx, 0.003ms/txin) [69.53s]
2015-10-30 01:22:03     - Verify 1096 txins: 2.90ms (0.003ms/txin) [75.80s]
2015-10-30 01:22:03     - Index writing: 1.26ms [56.37s]
2015-10-30 01:22:03     - Callbacks: 0.04ms [6.77s]
2015-10-30 01:22:03   - Connect total: 4.87ms [164.33s]
2015-10-30 01:22:03   - Flush: 1.14ms [24.95s]
2015-10-30 01:22:03   - Writing chainstate: 0.03ms [5.12s]
2015-10-30 01:22:03 UpdateTip: new best=000000000000046b1fdf00440a86e34886b2b0bd56472b22edb2805c92bbe4ca  height=223133  log2_work=69.460921  tx=13407260  date=2013-02-25 23:16:10 progress=0.064773  cache=508.6MiB(1658277tx)
2015-10-30 01:22:03   - Connect postprocess: 1.36ms [43.18s]
2015-10-30 01:22:03 - Connect block: 7.41ms [237.66s]

So for this reindex up to 223133, the chainstate needs 23% less memory, and is 13% faster.

There are likely several other places where this datatype can be used.

Contributor

dcousens commented Oct 30, 2015

tentative concept ACK, once-over utACK (not in depth)

As discussed on IRC, the trade-offs [currently] are API compatibility (how invasive the change is) and the complexity of the change needed to achieve the desired memory/performance characteristics.

Member

jonasschnelli commented Oct 30, 2015

Concept ACK.
23% less memory and 13% faster seems like a good reason for such a change. How much is the performance boost biased by the disabled verification, i.e. what performance benefit would be expected "in normal operation"?

Member

btcdrak commented Oct 30, 2015

@sipa Not sure why you'd consider this controversial. Seems like a big win to me.

Member

gmaxwell commented Oct 30, 2015

@jonasschnelli well, disabled should (eventually) be representative of initial sync impact... runtime impact is a bit harder to measure, I expect, because it will depend greatly on the signature cache. When the hit rate is high I expect it to be closer to these figures, and when it's low... the signature validation would dwarf this improvement.

@btcdrak because it's hard to review, if it weren't such a big improvement in speed/memory it wouldn't be worth considering.

Contributor

jgarzik commented Oct 30, 2015

This is a standard technique for vectors. I don't find it controversial. concept ACK

Member

btcdrak commented Oct 30, 2015

@gmaxwell Understood, but I was meaning, if there is clearly such a benefit, it makes it uncontroversial to me.

Anyway, Concept ACK, will try to review this weekend.

Member

sipa commented Oct 30, 2015

@laanwj laanwj added the Validation label Oct 30, 2015

Member

laanwj commented Nov 2, 2015

I like the speedup. I do think it introduces a lot of somewhat hard-to-review code, and by virtue of being part of CScript it becomes part of the consensus code.

At some point we should separate how scripts are allocated/stored, which is not consensus critical, from the code used to evaluate them, which is consensus critical, but could act on any slice of memory.

Member

sipa commented Nov 2, 2015

Member

theuni commented Nov 4, 2015

@sipa Not sure if you were aware and/or based your work on this, but the "dynarray" was proposed for c++14, but didn't make it.

http://en.cppreference.com/w/cpp/container/dynarray

I only mention because using a reference implementation of that may ease the minds of reviewers.

Member

sipa commented Nov 4, 2015

@cfields: dynarray is very different

  • It still allocates everything on the heap
  • Its size is fixed, so it doesn't behave like a vector, but like an array
Member

theuni commented Nov 4, 2015

@sipa ah sorry, I see. That's what i get for skimming.

Member

gmaxwell commented Nov 6, 2015

This is a huge speedup, and tests clean in valgrind. For reindex w/ libsecp256k1 this takes the time down from 3 hours 16 minutes to 2 hours 7 minutes. (With signature tests disabled completely this gets it down to 1h 17m.). Size of the UTXO set in memory is reduced from 5486MB to 4012MB.

I intend to test this further, but I support this, and it looks generally okay to me. I'm surprised we got this much improvement just for CScript; I was expecting we'd have to make the entire cached transaction a single allocation to see this kind of benefit.

Contributor

dcousens commented Nov 7, 2015

I was expecting we'd have to make the entire cached transaction a single allocation to see this kind of benefit.

Why isn't that the case OOI? I feel like CScript could just be 2 pointers, begin and end?

edit: Is CScript really mutated that often?

Member

gmaxwell commented Nov 7, 2015

Inside the coins cache the whole entry gets mutated, e.g. to delete outputs when they're spent... but never in a way which needs to increase their size. The cscripts themselves in these entries are never mutated at all.

The mutation that decreases the entry size could be addressed by flagging entries for deletion, plus a separate operation which compacts the coins cache (e.g. takes txout vectors with many deleted entries, packs them, and reallocs them to a smaller size). It could be run on flush (which scans for modified entries anyway), as well as periodically when flushing is infrequent (e.g. db cache set to infinity), perhaps triggered by an overhead counter (which would be cheap to maintain). This would eliminate quite a few more malloc operations and further reduce fragmentation; but I suspect it would be a much bigger change than just changing the type of CScripts.

Contributor

dcousens commented Nov 7, 2015

This would eliminate quite a few more malloc operations and further reduce fragmentation; but I suspect it would be a much bigger change than just changing the type of cscripts.

Indeed, as discussed with @sipa originally, the biggest benefit of this change is how isolated it is: it just hot-swaps an existing interface while netting a huge performance increase.

Member

sipa commented Nov 7, 2015

Added some comments.

Member

sipa commented Nov 8, 2015

@dcousens CScript is hardly ever mutated. But CCoins, which contains a vector of CTxOuts, each of which contains a CScript, is mutated often. The proposal is to make CCoins allocate its entire memory (including several CScripts) at once.

@sipa sipa changed the title from [WIP] Add pre-allocated vector type and use it for CScript to Add pre-allocated vector type and use it for CScript Nov 13, 2015

Member

sipa commented Nov 13, 2015

Added a few more iterator tests (which caught (compile-time) errors), and removed the [WIP] marker.

Member

sipa commented Nov 28, 2015

Do we want this for 0.12?

@gmaxwell gmaxwell added this to the 0.12.0 milestone Nov 28, 2015

Member

gmaxwell commented Nov 28, 2015

I think so, it's a large performance improvement and memory usage reduction.

Contributor

dcousens commented Nov 30, 2015

ACK for 0.12

Member

laanwj commented Nov 30, 2015

I'd prefer to merge a large, reasonably risky change like this after the 0.12 branch, but if you are confident enough about this, I'm ok with it.

Member

morcos commented Nov 30, 2015

I've done a light code review and have used this pull extensively.
I'm in favor of merging and will continue to do a more extensive code review.

Member

gmaxwell commented Nov 30, 2015

I have tested this extensively, including operation in valgrind, perhaps more than I've tested the current tip without it. I have read the patch and think it looks fine, but I feel anything doing this exceeds my knowledge of the subtle behaviors of C++, so I do not consider my code review to have much value on this pull.

I think it's okay to merge, and considering our interactions, if it causes trouble during the 0.12 pre-release cycle it's also not hard to back out. Absent it, we're likely to ignore some performance problems in 0.12, chalking them up to "well, 6914 is a big fix". It will also see a lot more testing if it's merged for 0.12 (as I will have multiple people working full time on testing 0.12 and the next Elements update, which will be 0.12-based).

Contributor

dcousens commented Dec 1, 2015

I'll pull this on to my own nodes for extended testing, and will look over it once more.

@laanwj laanwj merged commit 114b581 into bitcoin:master Dec 1, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

laanwj added a commit that referenced this pull request Dec 1, 2015

Merge pull request #6914
114b581 Prevector type (Pieter Wuille)

@UdjinM6 UdjinM6 referenced this pull request in dashpay/dash Sep 18, 2016

Merged

Implement a way to sync only needed winners #1028

@str4d str4d referenced this pull request in zcash/zcash Apr 17, 2018

Merged

Upstream serialization improvements #3180

zkbot added a commit to zcash/zcash that referenced this pull request Apr 17, 2018

Auto merge of #3180 - str4d:transaction-serialization, r=<try>
Upstream serialization improvements

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#5264
- bitcoin/bitcoin#6914
- bitcoin/bitcoin#6215
- bitcoin/bitcoin#8068
  - Only the `COMPACTSIZE` wrapper commit
- bitcoin/bitcoin#8658
- bitcoin/bitcoin#8708
  - Only the serializer variadics commit
- bitcoin/bitcoin#9039
- bitcoin/bitcoin#9125
  - Only the first two commits (the last two block on other upstream PRs)

Part of #2074.

zkbot added a commit to zcash/zcash that referenced this pull request Apr 18, 2018

Auto merge of #3180 - str4d:transaction-serialization, r=<try>
Upstream serialization improvements (same cherry-pick list as above)

zkbot added a commit to zcash/zcash that referenced this pull request Apr 19, 2018

Auto merge of #3180 - str4d:transaction-serialization, r=ebfull
Upstream serialization improvements (same cherry-pick list as above)