Is your feature request related to a problem? Please describe.
Size of the serialized UTXO set
Describe the solution you'd like
The serialization format of the UTXO set produced by `dumptxoutset` is a list of `(COutPoint, Coin)` pairs.
However, many outpoints refer to the same transaction, so the entries can be grouped by txid: `list((Txid, list((vout, Coin))))`.
Since the cursor already iterates over the UTXO set in txid order, this doesn't add complexity to the serialization code.
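To illustrate the grouping, here is a minimal Python sketch (not the Bitcoin Core code). It assumes the input is an iterator of `((txid, vout), coin)` pairs already sorted by txid, as the cursor provides; `group_by_txid` and the sample data are hypothetical names made up for this example.

```python
from itertools import groupby


def group_by_txid(entries):
    """Turn a txid-sorted stream of ((txid, vout), coin) pairs
    into a stream of (txid, [(vout, coin), ...]) records."""
    # groupby only merges adjacent items, so it relies on the input
    # being sorted by txid (assumed here, as stated in the issue).
    for txid, group in groupby(entries, key=lambda entry: entry[0][0]):
        yield txid, [(outpoint[1], coin) for outpoint, coin in group]


# Hypothetical sample data: two outputs of one transaction, one of another.
flat = [
    ((b"\xaa" * 32, 0), "coin_a0"),
    ((b"\xaa" * 32, 3), "coin_a3"),
    ((b"\xbb" * 32, 1), "coin_b1"),
]
grouped = list(group_by_txid(flat))
```

Only the outputs of one transaction are buffered at a time, so the format can still be written and read as a stream.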
Considering the UTXO set at height 745995:
serialized_size: ~5.3 GB
total_elements: 83_082_178
unique_txids: 49_517_483
bytes_lost: total_elements // due to the additional byte for the length of the inner list
bytes_savings: (total_elements - unique_txids) * 32 - bytes_lost ~= 1 GB
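The estimate can be reproduced with a few lines of Python. This simply evaluates the formula above with the figures quoted in the issue; the constant name `TXID_SIZE` is introduced here for readability.

```python
total_elements = 83_082_178   # UTXOs at height 745995 (from the issue)
unique_txids = 49_517_483     # distinct txids among them (from the issue)
TXID_SIZE = 32                # bytes per txid

# One extra byte per element, as estimated in the issue.
bytes_lost = total_elements
# Every element beyond the first of each txid no longer repeats the txid.
txid_bytes_saved = (total_elements - unique_txids) * TXID_SIZE
bytes_savings = txid_bytes_saved - bytes_lost

print(bytes_savings)  # 990988062
```

That is roughly 0.99 GB, or a bit under a fifth of the ~5.3 GB serialized size.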
Describe alternatives you've considered
Additional bytes could be saved by exploiting duplicated scripts (address reuse). However, this is not considered worthwhile, because the format would lose its streaming property and because we don't want to optimize for a practice that is not recommended.
The byte spent on the length of the inner list could be avoided with a special byte containing both the length of the list and the first vout. Since both values are usually very small, they should fit in a single byte most of the time, with a fallback for edge cases.
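One possible shape for such a combined byte, purely as an illustration: the bit layout below (high nibble for the list length, low nibble for the first vout, `0x00` as the fallback marker followed by two 4-byte little-endian integers) is an assumption made for this sketch and is not specified anywhere in the issue.

```python
import struct

# Hypothetical layout: a group never has length 0, so the byte 0x00
# cannot be produced by the compact form and is free to mark the fallback.
ESCAPE = 0x00


def encode_header(length, first_vout):
    """Pack (length, first_vout) into one byte when both are small."""
    if 1 <= length <= 15 and 0 <= first_vout <= 15:
        return bytes([(length << 4) | first_vout])
    # Fallback for edge cases: escape byte plus two explicit uint32 values.
    return bytes([ESCAPE]) + struct.pack("<II", length, first_vout)


def decode_header(buf):
    """Return (length, first_vout, bytes_consumed)."""
    if buf[0] != ESCAPE:
        return buf[0] >> 4, buf[0] & 0x0F, 1
    length, first_vout = struct.unpack_from("<II", buf, 1)
    return length, first_vout, 9
```

With this layout the common case costs the same single byte as a plain length prefix while also carrying the first vout, and only unusually large groups or vout values pay for the 9-byte fallback.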
Additional context
Tagging @jamesob author of #16899
assumeutxo summary #15606