Possible savings in `dumptxoutset` serialization format (~20%) #25675

RCasatta · 2022-07-22T11:43:39Z

Is your feature request related to a problem? Please describe.

Size of the serialized UTXO set

Describe the solution you'd like

The serialization format of the serialized UTXO set from dumptxoutset is a list of (COutPoint, Coin).

However, many outpoints refer to the same transaction, so we can group them by Txid:
list(Txid, list((vout, Coin)))

Since the cursor already iterates over the Txids in sorted order, this doesn't add complexity to the serialization code.
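To illustrate the grouping, here is a minimal Python sketch. It is not the Bitcoin Core code: the `(txid, vout, coin)` tuples, the function name `group_by_txid`, and the toy data are all hypothetical. It only relies on the input already being sorted by Txid, as the cursor provides.

```python
from itertools import groupby
from operator import itemgetter

def group_by_txid(utxos):
    """Yield (txid, [(vout, coin), ...]) from (txid, vout, coin) tuples.

    Assumes the input is already sorted by txid, so only one group
    is buffered at a time and the output can still be streamed.
    """
    for txid, entries in groupby(utxos, key=itemgetter(0)):
        yield txid, [(vout, coin) for _, vout, coin in entries]

# Hypothetical toy data standing in for the (COutPoint, Coin) stream.
flat = [
    ("aa" * 32, 0, "coin-a0"),
    ("aa" * 32, 1, "coin-a1"),
    ("bb" * 32, 3, "coin-b3"),
]
grouped = list(group_by_txid(flat))
```

Each Txid appears once in `grouped`, followed by its `(vout, coin)` pairs.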

Considering the UTXO set at height 745995:

serialized_size: ~5.3 GB
total_elements: 83_082_178
unique_txids: 49_517_483
bytes_lost: total_elements // due to the additional byte for the length of the inner list
bytes_savings: (total_elements-unique_txids)*32 - bytes_lost ~= 1 GB
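The estimate can be reproduced with a few lines of Python. The figures are the ones quoted above; the variable names and the 5.3e9 approximation of the serialized size are only for illustration.

```python
total_elements = 83_082_178
unique_txids = 49_517_483
TXID_SIZE = 32  # bytes per Txid

# One extra byte per element, as assumed in the estimate above.
bytes_lost = total_elements
bytes_savings = (total_elements - unique_txids) * TXID_SIZE - bytes_lost

serialized_size = 5.3e9  # approximate size quoted above
fraction_saved = bytes_savings / serialized_size

print(bytes_savings)            # 990988062, i.e. roughly 1 GB
print(round(fraction_saved, 3)) # 0.187, in line with the ~20% in the title
```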

Describe alternatives you've considered

Additional bytes could be saved by exploiting duplicated scripts (address reuse). However, this is not considered worthwhile, because the format would lose its streaming property and because we don't want to optimize for a practice that is not recommended.

The byte lost to expressing the length of the inner list could be recovered with a special byte containing both the length of the list and the first vout. Both values are usually very small, so they should fit in a single byte most of the time, with a fallback for edge cases.
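One possible shape for such a combined byte is sketched below. The layout is purely an assumption for illustration (high nibble for the list length, low nibble for the first vout, and `0x00` as an escape followed by two 32-bit little-endian integers); the issue does not specify an encoding, and `pack_header` / `unpack_header` are hypothetical names.

```python
import struct

def pack_header(length, first_vout):
    """Pack the list length and first vout, using one byte when both are small."""
    if 1 <= length <= 15 and 0 <= first_vout <= 15:
        # High nibble: length (never 0), low nibble: first vout.
        return bytes([(length << 4) | first_vout])
    # Fallback for edge cases: escape byte 0x00, then two uint32 values.
    return b"\x00" + struct.pack("<II", length, first_vout)

def unpack_header(buf):
    """Return (length, first_vout, bytes_consumed)."""
    first = buf[0]
    if first >> 4:
        return first >> 4, first & 0x0F, 1
    if first != 0:
        raise ValueError("invalid header byte")
    length, first_vout = struct.unpack_from("<II", buf, 1)
    return length, first_vout, 9

small = pack_header(2, 0)         # fits in a single byte
large = pack_header(300, 70_000)  # needs the 9-byte fallback
```

Because a list is never empty, the high nibble of the compact form is never zero, which is what frees `0x00` to act as the escape in this sketch.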

Additional context

Tagging @jamesob author of #16899
assumeutxo summary #15606


luke-jr · 2024-03-14T20:27:10Z

Outputs are also created sequentially, so it seems likely list(Txid, list(Coin, ...)) might improve things too?

RCasatta · 2024-03-14T20:56:45Z

Shouldn't it be list(Txid, list(Option<Coin>, ...)) so that the vout can be recomputed?
In that case I don't think it would help, because it needs a byte for every None.
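A rough way to compare the two layouts for a single transaction is sketched below. It assumes, for illustration only, that an explicit vout costs one byte, that a None placeholder costs one byte, and that a present Coin in the positional layout costs nothing extra; the function names are hypothetical.

```python
def explicit_overhead(unspent_vouts):
    """Bytes spent on vout indexes when each unspent output stores its vout."""
    return len(unspent_vouts)

def positional_overhead(unspent_vouts):
    """Bytes spent on None placeholders when the vout is implied by position."""
    slots = max(unspent_vouts) + 1
    return slots - len(unspent_vouts)

dense = [0, 1, 2]  # all outputs still unspent
sparse = [0, 5]    # outputs 1 to 4 already spent

dense_costs = (explicit_overhead(dense), positional_overhead(dense))
sparse_costs = (explicit_overhead(sparse), positional_overhead(sparse))
```

Under these assumptions the positional layout wins only when most of a transaction's outputs are still unspent, and loses as soon as the gaps outnumber the remaining coins.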

RCasatta added the Feature label Jul 22, 2022

aureleoules mentioned this issue Sep 8, 2022

rpc: Optimize serialization disk space of dumptxoutset #26045

Closed

This was referenced Mar 10, 2024

rpc: Optimize serialization and enhance metadata of dumptxoutset output #29612

Open

AssumeUTXO Mainnet Readiness Tracking #29616

Open
