Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multi: Rework utxoset/view to use outpoints. #2540

Merged
merged 1 commit into from
Jan 14, 2021

Conversation

rstaudt2
Copy link
Member

@rstaudt2 rstaudt2 commented Jan 1, 2021

NOTE: This contains a database migration, so if you plan to test it, please make a copy of your data directory first.


This modifies the utxoset in the database and related UtxoViewpoint to store and work with unspent transaction outputs on a per-output basis instead of at a transaction level. This is a port of btcsuite/btcd#1045 with some significant differences due to the stake tree and transactions in Decred.

The primary motivation is to simplify the code, pave the way for a utxo cache, and generally focus on optimizing runtime performance.

The tradeoff is that this approach does somewhat increase the size of the serialized utxoset since it means that the transaction hash is duplicated for each output as a part of the key and some additional details are duplicated in each output. The details duplicated in each output include flags encoded into a single byte that specify whether the containing transaction is a coinbase, whether the containing transaction has an expiry, and the transaction type. Additionally, the containing block height and index are stored in each output.

However, in practice, the size difference isn't all that large, disk space is relatively cheap, certainly cheaper than memory, and it is much more important to provide more efficient runtime operation since that is the ultimate purpose of the daemon.

While performing this conversion, it also simplifies the code to remove the transaction version information from the utxoset as well as the spend journal. The logic for only serializing it under certain circumstances is complicated, and it was only used for the gettxout RPC, where it has already been removed.

The utxo set and spend journal in the database are automatically migrated to the new format with this commit and it is possible to interrupt and resume the migration process.

Finally, it also updates all references and tests that previously dealt with transaction hashes to use outpoints instead.

An overview of the changes are as follows:

  • Remove transaction version from both spent and unspent output entries
    • Update utxo serialization format to exclude the version
    • Update spend journal serialization format to exclude the version
  • Convert UtxoEntry to represent a specific utxo instead of a transaction with all remaining utxos
    • Optimize for memory usage with an eye toward a utxo cache
      • Combine fields such as whether the containing transaction is a coinbase, whether the containing transaction has an expiry, and the transaction type, into a single byte
      • Align entry fields to eliminate extra padding since ultimately there will be a lot of these in memory
      • Introduce a free list for serializing an outpoint to the database key format to significantly reduce pressure on the GC
    • Update entries to be keyed by a <hash><tree><output index> outpoint rather than just a tx hash
    • Update all related functions that previously dealt with transaction hashes to accept outpoints instead
    • Update all callers accordingly
    • Only add individually requested outputs from the mempool when constructing a mempool view
  • Modify the spend journal to always store the encoded flags with every spent txout
    • Combine fields such as whether the containing transaction is a coinbase, whether the containing transaction has an expiry, and the transaction type into a single byte
      • Use 4 bits instead of 3 for the transaction type to be consistent with utxos. The extra bit was already unused so this doesn't take any additional space
    • Remove the fully spent flag
  • Introduce ticketMinOuts in place of stakeExtra
    • Renamed stakeExtra as ticketMinOuts and updated all comments to make the purpose of the field clearer
    • Only store ticketMinOuts for ticket submission outputs
    • Add TicketMinimalOutputs function on UtxoEntry in place of ConvertUtxosToMinimalOutputs
  • Always decompress data loaded from the database now that a utxo entry only consists of a specific output
  • Introduce upgrade code to migrate the utxo set and spend journal to the new format
    • Update current database version to 9
    • Update current utxo set version to 3
    • Update current spend journal version to 3
    • Introduce the ability to run upgrades after the block index has been loaded
  • Update all tests to expect the correct encodings, remove tests that no longer apply, and add new ones for the new expected behavior
    • Convert old tests for the legacy utxo format deserialization code to test the new function that is used during upgrade
  • Introduce a few new functions on UtxoViewpoint
    • AddTxOut for adding an individual txout versus all of them
    • addTxOut to handle the common code between the new AddTxOut and existing AddTxOuts
    • RemoveEntry for removing an individual txout
  • Remove the ErrDiscordantTxTree error
    • Since utxos are now retrieved using an outpoint, which includes the tree, it is no longer possible to hit this error path

A couple other notes on the implementation:

  • Unfortunately, it wasn't really possible to break this into smaller working commits, so these changes are all in one large commit.
  • The spend journal migration does take awhile to run (~20 mins, due to the fact that it needs to read all block entries).

Part of #1145.

@davecgh davecgh added this to the 1.7.0 milestone Jan 1, 2021
@davecgh davecgh added database upgrade Issues and/or pull requests that involve a new database version. optimization labels Jan 1, 2021
@davecgh davecgh mentioned this pull request Jan 1, 2021
28 tasks
@davecgh davecgh added the non-forking consensus Changes that involve modifying consensus code without causing any forking changes. label Jan 1, 2021
Copy link
Member

@davecgh davecgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 1.

I still have a lot of review and testing to do, but I gave this a very quick first pass to find anything that stood out before really digging into the details and doing in-depth testing that I wanted to go ahead and provide feedback for first.

This is fantastic overall and great to see it ported over. I really appreciate the time put into creating the PR description with all of the relevant details and references, important notes, testing implications, etc. That really helps not only the initial review, but for historical purposes as well. Also, from having done the PR this one is porting over, I know how much effort it takes to make this type of changes that have far reaching implications and thus also make a big impact, so I thank you for taking this on!

blockchain/fullblocktests/generate.go Show resolved Hide resolved
blockchain/scriptval.go Outdated Show resolved Hide resolved
blockchain/go.mod Outdated Show resolved Hide resolved
internal/mempool/mempool.go Show resolved Hide resolved
internal/mempool/mempool_test.go Outdated Show resolved Hide resolved
blockchain/utxoviewpoint.go Show resolved Hide resolved
blockchain/upgrade.go Outdated Show resolved Hide resolved
blockchain/upgrade.go Show resolved Hide resolved
blockchain/upgrade.go Show resolved Hide resolved
blockchain/upgrade.go Outdated Show resolved Hide resolved
@rstaudt2 rstaudt2 force-pushed the utxoentry-outpoints branch 4 times, most recently from cb72606 to 2f4d524 Compare January 5, 2021 14:46
@rstaudt2
Copy link
Member Author

rstaudt2 commented Jan 5, 2021

I made an update to store ticketMinOuts in all ticket outputs rather than only in ticket submission outputs. This was required because the legacy spend journal stored that data for every ticket output, so to maintain backward compatibility, it needs to remain the same (otherwise, there will be an issue in the unlikely case of deep reorgs).

I am going to start taking a look at implementing a migration that recreates the spend journal data in the new format from scratch (in a separate PR) since I'd prefer not to have to keep the legacy handling code around for long.

Edit: If feasible, I may include that spend journal migration in this PR. Having this migration would allow us to only keep ticket minimal outputs in the submission output, but that would need to modify the utxo migration, and ideally we don’t want two utxo migrations back to back.

internal/netsync/manager.go Outdated Show resolved Hide resolved
rpcadaptors.go Outdated Show resolved Hide resolved
@rstaudt2 rstaudt2 force-pushed the utxoentry-outpoints branch 3 times, most recently from 87a1d67 to 335e895 Compare January 7, 2021 14:57
@rstaudt2
Copy link
Member Author

rstaudt2 commented Jan 7, 2021

Found one more issue related to processing deep reorgs with the legacy stxo handling (diff). This is actually a bug in the btcd code as well - it is fetching a utxo using the tx hash rather than the hash of the input transaction (though it doesn't really matter, since that code path would never be hit at this point since its only run when encountering legacy stxo entires).

I should be able to get the stxo migration in place relatively quickly, which will avoid this case altogether (though unfortunately, since it needs to read all historical blocks/txns, it will likely take in the range of 10-20 mins to complete).

blockchain/upgrade.go Outdated Show resolved Hide resolved
blockchain/upgrade.go Outdated Show resolved Hide resolved
blockchain/upgrade.go Outdated Show resolved Hide resolved
blockchain/upgrade.go Outdated Show resolved Hide resolved
Copy link
Member

@sefbkn sefbkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on this PR. Only have a few minor points right now.

blockchain/chainio.go Outdated Show resolved Hide resolved
blockchain/utxoviewpoint.go Outdated Show resolved Hide resolved
blockchain/utxoviewpoint.go Outdated Show resolved Hide resolved
blockchain/utxoviewpoint.go Outdated Show resolved Hide resolved
blockchain/utxoviewpoint.go Outdated Show resolved Hide resolved
@rstaudt2
Copy link
Member Author

rstaudt2 commented Jan 8, 2021

Rebased to master and also made the comment updates that @sefbkn pointed out.

Also, as a heads up, I have the stxo migration ready and will be pushing it to this PR shortly. Sorry in advance for the churn, but I think it is worth bringing into this PR directly because it avoids having another utxo migration and completely removes the legacy handling cases. I'll call out the changes/diff when I push that up.

@davecgh
Copy link
Member

davecgh commented Jan 8, 2021

Also, as a heads up, I have the stxo migration ready and will be pushing it to this PR shortly. Sorry in advance for the churn, but I think it is worth bringing into this PR directly because it avoids having another utxo migration and completely removes the legacy handling cases. I'll call out the changes/diff when I push that up.

Agreed. I was going to suggest this myself if it was done prior to this one being finalized, so glad to see you're of the same mindset.

@rstaudt2 rstaudt2 force-pushed the utxoentry-outpoints branch 2 times, most recently from d138150 to 4da0eda Compare January 8, 2021 16:16
@rstaudt2
Copy link
Member Author

rstaudt2 commented Jan 8, 2021

The spend journal migration has now been added (diff). The updates include the following:

  • Added a upgradeDBPostBlockIndexLoad function to run migrations after the block index has been loaded.
  • Added the migrateSpendJournalVersion2To3 migration. It takes about ~20 mins to run since it has to read all block entries.
  • Updated ticket minimal outputs to only be stored in ticket submission outputs (since that could now be handled properly with the spend journal migration in place).
  • Removed the reserved field from the spend journal entries since it is no longer needed for backward compatibility.
  • Removed the spent flag from the spend journal entries since it is no longer needed for backward compatibility.
  • Removed the code that was handling legacy v2 spend journal entries since they are now migrated.

@rstaudt2
Copy link
Member Author

rstaudt2 commented Jan 8, 2021

The spend journal migration has now been added (diff). The updates include the following:

* Added a `upgradeDBPostBlockIndexLoad` function to run migrations after the block index has been loaded.

* Added the `migrateSpendJournalVersion2To3` migration.  It takes about ~20 mins to run since it has to read all block entries.

* Updated ticket minimal outputs to only be stored in ticket _submission_ outputs (since that could now be handled properly with the spend journal migration in place).

* Removed the reserved field from the spend journal entries since it is no longer needed for backward compatibility.

* Removed the spent flag from the spend journal entries since it is no longer needed for backward compatibility.

* Removed the code that was handling legacy v2 spend journal entries since they are now migrated.

I also tested a deep reorg (invalidated/reconsidered a block thousands of blocks deep into the migrated data) to ensure that the migrated spent journal entries could be resurrected properly.

Copy link
Member

@davecgh davecgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 2.

I've done quite a bit of testing at this point including very large reorgs and the functionality in terms of correctness is looking great thus far.

I still need to dig into one or two more areas for a final round of review, but here is some more feedback in the mean time.

blockchain/compress.go Outdated Show resolved Hide resolved
blockchain/chainio.go Show resolved Hide resolved
blockchain/chainio.go Outdated Show resolved Hide resolved
blockchain/chainio.go Outdated Show resolved Hide resolved
blockchain/chainio.go Outdated Show resolved Hide resolved
blockchain/chainio.go Outdated Show resolved Hide resolved
internal/mempool/mempool_test.go Outdated Show resolved Hide resolved
}

// Upgrade the database post block index load as needed.
return upgradeDBPostBlockIndexLoad(ctx, b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this a bit offline, but I'll comment here for future reference as well.

This will end up causing issues with future multi-version upgrades because it means that the upgrade process will not be able to complete uninterrupted and the need for it to run the normal code which relies on upgrades having happened.

For example, consider an upgrade from say version 7 to a future version 10. It will complete the version 7 to 8 upgrade, but then not be able to get to version 9 because that can only happen once the normal upgrade process completes and the block index is fully loaded. However, the code will otherwise depend on the database version having been upgraded to version 10 at that point, but it won't be.

We decided to go ahead and leave this in for now as the plan is to address it later by a more robust general reindexing solution.

@rstaudt2 rstaudt2 force-pushed the utxoentry-outpoints branch 2 times, most recently from 9f0470d to 376805d Compare January 9, 2021 02:31
blockchain/chainio_test.go Outdated Show resolved Hide resolved
blockchain/chainio_test.go Show resolved Hide resolved
blockchain/chainio_test.go Show resolved Hide resolved
blockchain/utxoviewpoint.go Outdated Show resolved Hide resolved
blockchain/utxoviewpoint.go Outdated Show resolved Hide resolved
blockchain/utxoviewpoint.go Outdated Show resolved Hide resolved
blockchain/utxoviewpoint.go Outdated Show resolved Hide resolved
blockchain/utxoviewpoint.go Outdated Show resolved Hide resolved
blockchain/utxoviewpoint.go Outdated Show resolved Hide resolved
blockchain/chainio_test.go Outdated Show resolved Hide resolved
},
{
name: "outputs 0 and 2 not coinbase",
serialized: hexToBytes("df3982a7310132000496b538e853519c726a2c91e61ec11" +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have manually verified all of these serializations in TestUtxoSerialization are accurate.

blockchain/chainio_test.go Outdated Show resolved Hide resolved
@rstaudt2 rstaudt2 force-pushed the utxoentry-outpoints branch 2 times, most recently from 5b1fc34 to c547579 Compare January 9, 2021 21:17
blockchain/upgrade.go Outdated Show resolved Hide resolved
blockchain/upgrade.go Outdated Show resolved Hide resolved
blockchain/upgrade.go Show resolved Hide resolved
@rstaudt2 rstaudt2 force-pushed the utxoentry-outpoints branch 3 times, most recently from 9cb1e3c to b606b4c Compare January 10, 2021 19:38
Copy link
Member

@davecgh davecgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only outstanding issue as far as I can tell is the migration code issue surrounding side chain blocks. Once that is in, I'll give it another round of review and testing and assuming that goes well, this will be ready.

blockchain/chainio.go Outdated Show resolved Hide resolved
@rstaudt2
Copy link
Member Author

The only outstanding issue as far as I can tell is the migration code issue surrounding side chain blocks. Once that is in, I'll give it another round of review and testing and assuming that goes well, this will be ready.

This should be resolved now, did further testing around side chain blocks for the migration as well as invalidating/reconsidering the main chain block at the same height as a side chain block.

Copy link
Member

@davecgh davecgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All updates reviewed and test.

Migration stats:

SSD:

2021-01-10 15:26:08.780 [INF] CHAN: Done migrating utxoset.  Total entries: 660256 in 14.265s
2021-01-10 15:26:08.780 [INF] CHAN: Removing old utxoset entries...
2021-01-10 15:26:11.109 [INF] CHAN: Deleted 660256 keys (660256 total) from old utxoset
2021-01-10 15:26:11.109 [INF] CHAN: Done removing old utxoset entries in 2.328s
...
2021-01-10 15:41:54.722 [INF] CHAN: Done migrating spend journal.  Total entries: 496670 in 15m37.916s
2021-01-10 15:41:54.722 [INF] CHAN: Removing temp data...
...
2021-01-10 15:42:39.690 [INF] CHAN: Done removing temp data in 44.968s
...
2021-01-10 15:42:43.155 [INF] CHAN: Done removing old spend journal in 3.464s
2021-01-10 15:42:43.155 [INF] CHAN: Done upgrading database in 16m48.644s.
2021-01-10 15:42:43.158 [INF] CHAN: Blockchain database version info: chain: 9, compression: 1, block index: 3
...

HDD:

2021-01-10 16:16:48.859 [INF] CHAN: Done migrating utxoset.  Total entries: 659062 in 14.846s
2021-01-10 16:16:48.885 [INF] CHAN: Removing old utxoset entries...
2021-01-10 16:16:51.157 [INF] CHAN: Deleted 659062 keys (659062 total) from old utxoset
2021-01-10 16:16:51.180 [INF] CHAN: Done removing old utxoset entries in 2.295s
...
2021-01-10 16:57:41.320 [INF] CHAN: Done migrating spend journal.  Total entries: 518326 in 40m29.163s
2021-01-10 16:57:41.320 [INF] CHAN: Removing temp data...
...
2021-01-10 16:59:50.088 [INF] CHAN: Done removing temp data in 2m8.767s
...
2021-01-10 17:00:43.481 [INF] CHAN: Done removing old spend journal in 53.393s
2021-01-10 17:00:47.981 [INF] CHAN: Done upgrading database in 44m13.969s.
2021-01-10 17:00:48.520 [INF] CHAN: Blockchain database version info: chain: 9, compression: 1, block index: 3

Memory usage peaked around 800MB.

Copy link
Member

@matheusd matheusd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good other than a typo.

blockchain/upgrade.go Outdated Show resolved Hide resolved
This modifies the utxoset in the database and related UtxoViewpoint to
store and work with unspent transaction outputs on a per-output basis
instead of at a transaction level.

The primary motivation is to simplify the code, pave the way for a
utxo cache, and generally focus on optimizing runtime performance.

The tradeoff is that this approach does somewhat increase the size of
the serialized utxoset since it means that the transaction hash is
duplicated for each output as a part of the key and some additional
details are duplicated in each output.  The details duplicated in each
output include flags encoded into a single byte that specify whether the
containing transaction is a coinbase, whether the containing transaction
has an expiry, and the transaction type.  Additionally, the containing
block height and index are stored in each output.

However, in practice, the size difference isn't all that large, disk
space is relatively cheap, certainly cheaper than memory, and it is much
more important to provide more efficient runtime operation since that is
the ultimate purpose of the daemon.

While performing this conversion, it also simplifies the code to remove
the transaction version information from the utxoset as well as the
spend journal.  The logic for only serializing it under certain
circumstances is complicated, and it was only used for the gettxout RPC,
where it has already been removed.

The utxo set and spend journal in the database are automatically
migrated to the new format with this commit and it is possible to
interrupt and resume the migration process.

Finally, it also updates all references and tests that previously dealt
with transaction hashes to use outpoints instead.

An overview of the changes are as follows:

- Remove transaction version from both spent and unspent output entries
  - Update utxo serialization format to exclude the version
  - Update spend journal serialization format to exclude the version
- Convert UtxoEntry to represent a specific utxo instead of a
  transaction with all remaining utxos
  - Optimize for memory usage with an eye toward a utxo cache
    - Combine fields such as whether the containing transaction is a
      coinbase, whether the containing transaction has an expiry, and
      the transaction type into a single byte
    - Align entry fields to eliminate extra padding since ultimately
      there will be a lot of these in memory
    - Introduce a free list for serializing an outpoint to the database
      key format to significantly reduce pressure on the GC
  - Update entries to be keyed by a <hash><tree><output index> outpoint
    rather than just a tx hash
  - Update all related functions that previously dealt with transaction
    hashes to accept outpoints instead
  - Update all callers accordingly
  - Only add individually requested outputs from the mempool when
    constructing a mempool view
- Modify the spend journal to always store the encoded flags with every
  spent txout
  - Combine fields such as whether the containing transaction is a
    coinbase, whether the containing transaction has an expiry, and the
    transaction type into a single byte
    - Use 4 bits instead of 3 for the transaction type to be consistent
      with utxos. The extra bit was already unused so this doesn't take
      any additional space
  - Remove the fully spent flag
- Introduce ticketMinOuts in place of stakeExtra
  - Renamed stakeExtra as ticketMinOuts and updated all comments to make
    the purpose of the field clearer
  - Only store ticketMinOuts for ticket submission outputs
  - Add TicketMinimalOutputs function on UtxoEntry in place of
    ConvertUtxosToMinimalOutputs
- Always decompress data loaded from the database now that a utxo entry
  only consists of a specific output
- Introduce upgrade code to migrate the utxo set and spend journal to
  the new format
  - Update current database version to 9
  - Update current utxo set version to 3
  - Update current spend journal version to 3
  - Introduce the ability to run upgrades after the block index has been
    loaded
- Update all tests to expect the correct encodings, remove tests that no
  longer apply, and add new ones for the new expected behavior
  - Convert old tests for the legacy utxo format deserialization code to
    test the new function that is used during upgrade
- Introduce a few new functions on UtxoViewpoint
  - AddTxOut for adding an individual txout versus all of them
  - addTxOut to handle the common code between the new AddTxOut and
    existing AddTxOuts
  - RemoveEntry for removing an individual txout
- Remove the ErrDiscordantTxTree error
  - Since utxos are now retrieved using an outpoint, which includes the
    tree, it is no longer possible to hit this error path
@davecgh davecgh merged commit e9460eb into decred:master Jan 14, 2021
@rstaudt2 rstaudt2 deleted the utxoentry-outpoints branch January 15, 2021 12:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database upgrade Issues and/or pull requests that involve a new database version. non-forking consensus Changes that involve modifying consensus code without causing any forking changes. optimization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants