Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pre-allocated vector type and use it for CScript #6914

Merged
merged 1 commit into from
Dec 1, 2015

Conversation

sipa
Copy link
Member

@sipa sipa commented Oct 30, 2015

This is likely a controversial change, and will need very careful review and testing.

It adds a new basic data type (prevector<N, T>) which is a fully API compatible drop-in replacement for std::vector<T> (except it does not support custom allocators, does not have at(), and misses a few comparison operators). It will allocate up to N elements inside the parent container directly, only switching over to heap-allocated storage if more is needed. The data structures for the N elements and the pointer/size metadata for the heap-allocated storage are shared, making it very efficient for small N.

CScript is switched to use this new type, reducing the memory consumption of mempool and chainstate.

A benchmark (reindex with script validation disabled):

2015-10-30 01:10:49   - Load block from disk: 0.00ms [0.07s]
2015-10-30 01:10:49       - Connect 595 transactions: 2.76ms (0.005ms/tx, 0.003ms/txin) [82.40s]
2015-10-30 01:10:49     - Verify 1096 txins: 2.80ms (0.003ms/txin) [88.73s]
2015-10-30 01:10:49     - Index writing: 1.29ms [61.31s]
2015-10-30 01:10:49     - Callbacks: 0.04ms [6.78s]
2015-10-30 01:10:49   - Connect total: 4.78ms [182.16s]
2015-10-30 01:10:49   - Flush: 0.77ms [38.90s]
2015-10-30 01:10:49   - Writing chainstate: 0.03ms [5.57s]
2015-10-30 01:10:49 UpdateTip: new best=000000000000046b1fdf00440a86e34886b2b0bd56472b22edb2805c92bbe4ca  height=223133  log2_work=69.460921  tx=13407260  date=2013-02-25 23:16:10 progress=0
.064773  cache=666.5MiB(1658277tx)
2015-10-30 01:10:49   - Connect postprocess: 1.46ms [46.22s]
2015-10-30 01:10:49 - Connect block: 7.03ms [272.92s]

2015-10-30 01:22:03       - Connect 595 transactions: 2.86ms (0.005ms/tx, 0.003ms/txin) [69.53s]
2015-10-30 01:22:03     - Verify 1096 txins: 2.90ms (0.003ms/txin) [75.80s]
2015-10-30 01:22:03     - Index writing: 1.26ms [56.37s]
2015-10-30 01:22:03     - Callbacks: 0.04ms [6.77s]
2015-10-30 01:22:03   - Connect total: 4.87ms [164.33s]
2015-10-30 01:22:03   - Flush: 1.14ms [24.95s]
2015-10-30 01:22:03   - Writing chainstate: 0.03ms [5.12s]
2015-10-30 01:22:03 UpdateTip: new best=000000000000046b1fdf00440a86e34886b2b0bd56472b22edb2805c92bbe4ca  height=223133  log2_work=69.460921  tx=13407260  date=2013-02-25 23:16:10 progress=0
.064773  cache=508.6MiB(1658277tx)
2015-10-30 01:22:03   - Connect postprocess: 1.36ms [43.18s]
2015-10-30 01:22:03 - Connect block: 7.41ms [237.66s]

So for this reindex up to 223133, the chainstate needs 23% less memory, and is 13% faster.

There are likely several other places where this datatype can be used.

@dcousens
Copy link
Contributor

tentative concept ACK, once-over utACK (not in depth)

As discussed on IRC, the trade offs [currently] are API compatibility (how invasive the change is) and the complexity of the change to achieve the desired memory/performance characteristics.

@jonasschnelli
Copy link
Contributor

Concept ACK.
23% less memory and 13% faster seems after a reason for doing a such change. Hows the performance boost biased through disabled verification, what performance benefit would be expected "in normal operation"?

@btcdrak
Copy link
Contributor

btcdrak commented Oct 30, 2015

@sipa Not sure why you'd consider this controversial. Seems like a big win to me.

@gmaxwell
Copy link
Contributor

@jonasschnelli well, disabled should (eventually) be representative of initial sync impact... runtime impact is a bit harder to measure, I expect, because it will depend greatly on the signature cache. When the hitrate is high I expect it to be closer to these figures, and when it's low... the signature validation would dwarf this improvement.

@btcdrak because it's hard to review, if it weren't such a big improvement in speed/memory it wouldn't be worth considering.

@jgarzik
Copy link
Contributor

jgarzik commented Oct 30, 2015

This is a standard technique for vectors. I don't find it controversial. concept ACK

@btcdrak
Copy link
Contributor

btcdrak commented Oct 30, 2015

@gmaxwell Understood, but I was meaning, if there is clearly such a benefit, it makes it uncontroversial to me.

Anyway, Concept ACK, will try to review this weekend.

@sipa
Copy link
Member Author

sipa commented Oct 30, 2015 via email

@laanwj
Copy link
Member

laanwj commented Nov 2, 2015

I like the speedup. I do think it introduces a lot of somewhat hard to review code, and by sake of being part of CScript it becomes part of the consensus code.

At some point we should separate how scripts are allocated/stored, which is not consensus critical, from the code used to evaluate them, which is consensus critical, but could act on any slice of memory.

@sipa
Copy link
Member Author

sipa commented Nov 2, 2015 via email

@theuni
Copy link
Member

theuni commented Nov 4, 2015

@sipa Not sure if you were aware and/or based your work on this, but the "dynarray" was proposed for c++14, but didn't make it.

http://en.cppreference.com/w/cpp/container/dynarray

I only mention because using a reference implementation of that may ease the minds of reviewers.

@sipa
Copy link
Member Author

sipa commented Nov 4, 2015

@cfields: dynarray is very different

  • It still allocates everything on the heap
  • Its size is fixed, so it doesn't behave like a vector, but like an array

@theuni
Copy link
Member

theuni commented Nov 4, 2015

@sipa ah sorry, I see. That's what i get for skimming.

@gmaxwell
Copy link
Contributor

gmaxwell commented Nov 6, 2015

This is a huge speedup, and tests clean in valgrind. For reindex w/ libsecp256k1 this takes the time down from 3 hours 16 minutes to 2 hours 7 minutes. (With signature tests disabled completely this gets it down to 1h 17m.). Size of the UTXO set in memory is reduced from 5486MB to 4012MB.

I intend to test this further, but I support this, and it looks generally okay to me. I'm surprised we got this much improvement just for cscript, I was expecting we'd have to make the entire cached transaction a single allocation to see this kind of benefit.

@dcousens
Copy link
Contributor

dcousens commented Nov 7, 2015

I was expecting we'd have to make the entire cached transaction a single allocation to see this kind of benefit.

Why isn't that the case OOI? I feel like CScript could just be 2 pointers, begin and end?

edit: Is CScript really mutated that often?

@gmaxwell
Copy link
Contributor

gmaxwell commented Nov 7, 2015

Inside the coins cache the whole entry gets mutated, e.g. to delete outputs when they're spent... but never in a way which needs to increase their size. The cscripts themselves in these entries are never mutated at all.

The mutation that decreases the entry size could be addressed by flagging their deletion, and a separate operation which compacts the coins cache (e.g. takes txouts which many deleted entries and packs then reallocs them to a smaller size)-- e.g. it could be run on flush (which scans for modified entries) as well as periodically when flusing is infrequent (e.g. db cache set to infinity), perhaps triggered by an overhead counter (which would be cheap to maintain). This would eliminate quite a few more malloc operations and further reduce fragmentation; but I suspect it would be a much bigger change than just changing the type of cscripts.

@dcousens
Copy link
Contributor

dcousens commented Nov 7, 2015

This would eliminate quite a few more malloc operations and further reduce fragmentation; but I suspect it would be a much bigger change than just changing the type of cscripts.

Indeed, as discussed with @sipa originally, the biggest benefit about this change is how isolated it is, it just hot swaps an existing interface, while netting a huge performance increase.

@sipa sipa force-pushed the prevector branch 2 times, most recently from 15ece3f to 710442e Compare November 7, 2015 02:39
@sipa
Copy link
Member Author

sipa commented Nov 7, 2015

Added some comments.

@sipa
Copy link
Member Author

sipa commented Nov 8, 2015

@dcousens CScript is hardly ever mutated. But CCoins which contains a vector of CTxouts, which contain a CScript is mutated often. The proposal is to make CCoins allocate its entire memory (including several CScripts) at once.

@sipa sipa changed the title [WIP] Add pre-allocated vector type and use it for CScript Add pre-allocated vector type and use it for CScript Nov 13, 2015
@sipa
Copy link
Member Author

sipa commented Nov 13, 2015

Added a few more iterator tests (which caught (compile-time) errors), and removed the [WIP] marker.

@sipa
Copy link
Member Author

sipa commented Nov 28, 2015

Do we want this for 0.12?

@gmaxwell gmaxwell added this to the 0.12.0 milestone Nov 28, 2015
@gmaxwell
Copy link
Contributor

I think so, it's a large performance improvement and memory usage reduction.

@dcousens
Copy link
Contributor

ACK for 0.12

@laanwj
Copy link
Member

laanwj commented Nov 30, 2015

I'd prefer to merge a large, reasonably risky change like this after 0.12 branch, but if you are confident enough about this, I'm ok with it.

@morcos
Copy link
Member

morcos commented Nov 30, 2015

i've done a light code review and have used this pull extensively.
i'm in favor of merging and will continue to do a more extensive code review.

@gmaxwell
Copy link
Contributor

I have tested this extensively-- including operation in valgrind, perhaps more than I've tested the current tip without it. I have read the patch and think it looks fine but I feel anything doing this exceeds my knowledge of the subtle behaviors of C++ so I do not consider my code review to have much value on this pull.

I think it's okay to merge, and considering our interactions if it causes trouble during the 0.12 pre-release cycle its also not hard to back out: absent it; we're likely to ignore some performance problems in 0.12 chalking it up to "well 6914 is a big fix". It will also see a lot more testing if its merged for 0.12 (as I will have multiple people working full time on testing the 0.12 and the next elements update that will be 0.12 based).

@dcousens
Copy link
Contributor

dcousens commented Dec 1, 2015

I'll pull this on to my own nodes for extended testing, and will look over it once more.

@laanwj laanwj merged commit 114b581 into bitcoin:master Dec 1, 2015
laanwj added a commit that referenced this pull request Dec 1, 2015
114b581 Prevector type (Pieter Wuille)
zkbot added a commit to zcash/zcash that referenced this pull request Apr 18, 2018
Upstream serialization improvements

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#5264
- bitcoin/bitcoin#6914
- bitcoin/bitcoin#6215
- bitcoin/bitcoin#8068
  - Only the `COMPACTSIZE` wrapper commit
- bitcoin/bitcoin#8658
- bitcoin/bitcoin#8708
  - Only the serializer variadics commit
- bitcoin/bitcoin#9039
- bitcoin/bitcoin#9125
  - Only the first two commits (the last two block on other upstream PRs)

Part of #2074.
zkbot added a commit to zcash/zcash that referenced this pull request Apr 19, 2018
Upstream serialization improvements

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#5264
- bitcoin/bitcoin#6914
- bitcoin/bitcoin#6215
- bitcoin/bitcoin#8068
  - Only the `COMPACTSIZE` wrapper commit
- bitcoin/bitcoin#8658
- bitcoin/bitcoin#8708
  - Only the serializer variadics commit
- bitcoin/bitcoin#9039
- bitcoin/bitcoin#9125
  - Only the first two commits (the last two block on other upstream PRs)

Part of #2074.
Fuzzbawls added a commit to PIVX-Project/PIVX that referenced this pull request May 19, 2020
a8b965e Explicitly initialize prevector _union (random-zebra)
bdd98e8 [Trivial] Add a comment on the use of prevector in script. (random-zebra)
bbaa6e3 Clarify prevector::erase and avoid swap-to-clear (random-zebra)
28a8435 prevector: assert successful allocation (random-zebra)
de41cb5 Only call clear on prevector if it isn't trivially destructible (random-zebra)
8784df3 Make CScript (and prevector) c++11 movable. (random-zebra)
83f9ac6 serialize: Deprecate `begin_ptr` / `end_ptr` (random-zebra)
c3ecc12 prevector: add C++11-like data() method (random-zebra)
5490aa0 test prevector::swap (random-zebra)
035760e prevector::swap: fix (unreached) data corruption (random-zebra)
0e71400 prevector: destroy elements only via erase() (random-zebra)
9811a68 [Core] Prevector type (random-zebra)

Pull request description:

  Based on:
  - [x] #1554

  This introduces `prevector<N, T>`, a new basic data type which is a fully API compatible drop-in replacement for `std::vector<T>` and uses it for the script, to reduce the considerable memory overhead of vectors in cases where they normally contain a small number of small elements.

  Original tests in Bitcoin showed a reduction in dbcache memory usage by **23%**, and made an initial sync **13%** faster.

  Backported from the following upstream's PRs, with only additional edits in the 2nd layer scripts (`masternode-payments`, `masternode-budget`) and in the unit tests (to address the differences in `insecure_rand`):

  - bitcoin#6914
  - bitcoin#7888
  - bitcoin#8850
  - bitcoin#9349
  - bitcoin#9505
  - bitcoin#9856
  - bitcoin#10534
  - bitcoin#11011
  - bitcoin#14028

  ~~NOTE: Updates to `memusage.h` needed as well, when #1531 is merged.~~

ACKs for top commit:
  furszy:
    Tested ACK a8b965e .
  Fuzzbawls:
    ACK a8b965e

Tree-SHA512: 203b660dd8d9f98780be660dae0ac951766ba438836bd6f0ec921e7749c1a84143f3b920fe49f2cc399f4aa7e763098f78746d3b6ff845bcf913bb13de984bc1
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants