Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build txindex in parallel with validation #13033

Merged
merged 12 commits into from Apr 26, 2018

Conversation

jimpo
Copy link
Contributor

@jimpo jimpo commented Apr 20, 2018

I'm re-opening #11857 as a new pull request because the last one stopped loading for people


This refactors the tx index code to be in it's own class and get built concurrently with validation code. The main benefit is decoupling and moving the txindex into a separate DB. The primary motivation is to lay the groundwork for other indexers that might be desired (such as the compact filters). The basic idea is that the TxIndex spins up its own thread, which first syncs the txindex to the current block index, then once in sync the BlockConnected ValidationInterface hook writes new blocks.

DB changes

At the suggestion of some other developers, the txindex has been split out into a separate database. A data migration runs at startup on any nodes with a legacy txindex. Currently the migration blocks node initialization until complete.

Open questions

  • Should the migration of txindex data from the old DB to the new DB block in init or should it happen in a background thread? The downside to backgrounding it is that getrawtransaction would return an error message saying the txindex is syncing while the migration is running.

Impact

In a sample size n=1 test where I synced nodes from scratch, the average time Index writing was 3.36ms in master and 1.72ms in this branch. The average time between UpdateTip log lines for sequential blocks between 400,000 and IBD end on mainnet was 0.297204s in master and 0.286134s in this branch. Most likely this is just variance in IBD times, but I can try with some more trials if people want.

@Sjors
Copy link
Member

Sjors commented Apr 20, 2018

If anyone wants to read the old PR and has difficulty loading the page, it helps to use another browser session that's logged out.

Only comment of mine I'd like to export to this PR is the suggestion to backport "the migration code that removes txindex without requiring a reindex would be useful for folks who regret having set txindex=1 on a node with slow hardware."

Tested 3272b17637c5, including kill -9ing the migration from legacy to the new db.

@jamesob
Copy link
Member

jamesob commented Apr 23, 2018

Reverse chron. summary of ACKs from #11857:

Copy link
Member

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested ACK 3272b17 (round 1).

@@ -0,0 +1,11 @@
Transaction index changes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename file? PR changed.

@@ -0,0 +1,308 @@
// Copyright (c) 2017 The Bitcoin Core developers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, 2018? (and 2 more files)

@@ -47,6 +48,8 @@ void TxToJSON(const CTransaction& tx, const uint256 hashBlock, UniValue& entry)
TxToUniv(tx, uint256(), entry, true, RPCSerializationFlags());

if (!hashBlock.IsNull()) {
LOCK(cs_main);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to avoid locking and lookup here, how about adding the argument CBlockIndex* pindex to TxToJSON? (I know this is only called from getrawtransaction).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty against adding more to this PR. Seems like a good follow-on cleanup item.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair.

/// @param[out] block_hash The hash of the block the transaction is found in.
/// @param[out] tx The raw transaction itself.
/// @return true if transaction is found, false otherwise
bool FindTx(const uint256& tx_hash, uint256& block_hash, CTransactionRef& tx) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, LookupTransaction.

/// up from far behind, this method does not block and immediately returns false.
bool BlockUntilSyncedToCurrentChain();

/// Look up a raw transaction by hash.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop raw?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your wish is my command.

///
/// @param[in] tx_hash The hash of the transaction to be returned.
/// @param[out] block_hash The hash of the block the transaction is found in.
/// @param[out] tx The raw transaction itself.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop raw?

// Need to register this ValidationInterface before running Init(), so that
// callbacks are not missed if Init sets m_synced to true.
RegisterValidationInterface(this);
if (!Init()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this case, should exit instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is that logging and running without the TxIndex is better than FatalError, but I could see it the other way too. Probably a good idea to add a log line at the very least.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say running without the desired functionality should exit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @promag. If the daemon can't provide the service that is requested, it should fail early. Otherwise you may just complicate debugging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to FatalError.

if (!coin.IsSpent()) {
pblockindex = chainActive[coin.nHeight];
break;
LOCK(cs_main);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of nesting, repeat the lock:

if (!request.params[1].isNull()) {
    hashBlock = uint256S(request.params[1].get_str());
    LOCK(cs_main);
    ...
} else {
    LOCK(cs_main);
    ...
}

}

if (last_locator_write_time + SYNC_LOCATOR_WRITE_INTERVAL < current_time) {
WriteBestBlock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is called in the middle of ThreadSync(), shouldn't this be writing a locator for pindex rather than chainActive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, wow, oops. Good catch.

int64_t last_locator_write_time = 0;
while (true) {
if (m_interrupt) {
WriteBestBlock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function takes an argument now.

Jim Posen added 12 commits April 25, 2018 11:25
The new TxIndexDB class will be used by a future commit in this
change set.
The TxIndex will be responsible for building the transaction index
concurrently with the main validation thread by implementing
ValidationInterface. This does not process blocks concurrently yet.
TxIndex starts up a background thread to get in sync with the block
index before blocks are processed through the ValidationInterface.
In order to preserve getrawtransaction RPC behavior, there needs to be
a way for a thread to ensure the transaction index is in sync with the
current state of the blockchain.
Now that the transaction index is updated asynchronously, in order to
preserve the current behavior of public interfaces, the code blocks
until the transaction index is caught up with the current state of the
blockchain.
@sipa
Copy link
Member

sipa commented Apr 26, 2018

utACK 9b27047

Also compared with a re-merge of master with 806b2f1 (which had utACK from @ryanofsky and @Sjors), with a re-merge of master with 523dd76 (which had utACK from @TheBlueMatt), and with a re-merge of master with ea8be45 (which had Tested ACK from @jamesob) to find the only differences are:

  • Adding an abstracted-out function TxIndex::WriteBestBlock, which is invoked additionally after interrupting the background sync and periodically.
  • Checks in TxIndex::SetBestChain to see if callbacks are received after BlockConnected.
  • Push down cs_main locks in gettxoutproof.
  • Simplification of TxIndexDB::ReadBestBlock.
  • Changes in comments and logging

@sipa sipa merged commit 9b27047 into bitcoin:master Apr 26, 2018
sipa added a commit that referenced this pull request Apr 26, 2018
9b27047 [doc] Include txindex changes in the release notes. (Jim Posen)
ed77dd6 [test] Simple unit test for TxIndex. (Jim Posen)
6d772a3 [rpc] Public interfaces to GetTransaction block until synced. (Jim Posen)
a03f804 [index] Move disk IO logic from GetTransaction to TxIndex::FindTx. (Jim Posen)
e0a3b80 [validation] Replace tx index code in validation code with TxIndex. (Jim Posen)
8181db8 [init] Initialize and start TxIndex in init code. (Jim Posen)
f90c3a6 [index] TxIndex method to wait until caught up. (Jim Posen)
70d510d [index] Allow TxIndex sync thread to be interrupted. (Jim Posen)
94b4f8b [index] TxIndex initial sync thread. (Jim Posen)
34d68bf [index] Create new TxIndex class. (Jim Posen)
c88bcec [db] Migration for txindex data to new, separate database. (Jim Posen)
0cb8303 [db] Create separate database for txindex. (Jim Posen)

Pull request description:

  I'm re-opening #11857 as a new pull request because the last one stopped loading for people

  -------------------------------

  This refactors the tx index code to be in it's own class and get built concurrently with validation code. The main benefit is decoupling and moving the txindex into a separate DB. The primary motivation is to lay the groundwork for other indexers that might be desired (such as the [compact filters](bitcoin/bips#636)). The basic idea is that the TxIndex spins up its own thread, which first syncs the txindex to the current block index, then once in sync the BlockConnected ValidationInterface hook writes new blocks.

  ### DB changes

  At the suggestion of some other developers, the txindex has been split out into a separate database. A data migration runs at startup on any nodes with a legacy txindex. Currently the migration blocks node initialization until complete.

  ### Open questions

  - Should the migration of txindex data from the old DB to the new DB block in init or should it happen in a background thread? The downside to backgrounding it is that `getrawtransaction` would return an error message saying the txindex is syncing while the migration is running.

  ### Impact

  In a sample size n=1 test where I synced nodes from scratch, the average time [Index writing](https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L1903) was 3.36ms in master and 1.72ms in this branch. The average time between `UpdateTip` log lines for sequential blocks between 400,000 and IBD end on mainnet was 0.297204s in master and 0.286134s in this branch. Most likely this is just variance in IBD times, but I can try with some more trials if people want.

Tree-SHA512: 451fd7d95df89dfafceaa723cdf0f7b137615b531cf5c5035cfb54e9ccc2026cec5ac85edbcf71b7f4e2f102e36e9202b8b3a667e1504a9e1a9976ab1f0079c4
@jamesob
Copy link
Member

jamesob commented Apr 26, 2018

🎉 nice work, @jimpo.

@fanquake fanquake removed this from Blockers in High-priority for review Apr 26, 2018
@jimpo jimpo deleted the txindex-refactor-take2 branch May 1, 2018 23:56
@mgrychow mgrychow mentioned this pull request Aug 23, 2018
@Sjors Sjors mentioned this pull request Sep 8, 2018
2 tasks
UdjinM6 pushed a commit to UdjinM6/dash that referenced this pull request May 21, 2021
9b27047 [doc] Include txindex changes in the release notes. (Jim Posen)
ed77dd6 [test] Simple unit test for TxIndex. (Jim Posen)
6d772a3 [rpc] Public interfaces to GetTransaction block until synced. (Jim Posen)
a03f804 [index] Move disk IO logic from GetTransaction to TxIndex::FindTx. (Jim Posen)
e0a3b80 [validation] Replace tx index code in validation code with TxIndex. (Jim Posen)
8181db8 [init] Initialize and start TxIndex in init code. (Jim Posen)
f90c3a6 [index] TxIndex method to wait until caught up. (Jim Posen)
70d510d [index] Allow TxIndex sync thread to be interrupted. (Jim Posen)
94b4f8b [index] TxIndex initial sync thread. (Jim Posen)
34d68bf [index] Create new TxIndex class. (Jim Posen)
c88bcec [db] Migration for txindex data to new, separate database. (Jim Posen)
0cb8303 [db] Create separate database for txindex. (Jim Posen)

Pull request description:

  I'm re-opening bitcoin#11857 as a new pull request because the last one stopped loading for people

  -------------------------------

  This refactors the tx index code to be in it's own class and get built concurrently with validation code. The main benefit is decoupling and moving the txindex into a separate DB. The primary motivation is to lay the groundwork for other indexers that might be desired (such as the [compact filters](bitcoin/bips#636)). The basic idea is that the TxIndex spins up its own thread, which first syncs the txindex to the current block index, then once in sync the BlockConnected ValidationInterface hook writes new blocks.

  ### DB changes

  At the suggestion of some other developers, the txindex has been split out into a separate database. A data migration runs at startup on any nodes with a legacy txindex. Currently the migration blocks node initialization until complete.

  ### Open questions

  - Should the migration of txindex data from the old DB to the new DB block in init or should it happen in a background thread? The downside to backgrounding it is that `getrawtransaction` would return an error message saying the txindex is syncing while the migration is running.

  ### Impact

  In a sample size n=1 test where I synced nodes from scratch, the average time [Index writing](https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L1903) was 3.36ms in master and 1.72ms in this branch. The average time between `UpdateTip` log lines for sequential blocks between 400,000 and IBD end on mainnet was 0.297204s in master and 0.286134s in this branch. Most likely this is just variance in IBD times, but I can try with some more trials if people want.

Tree-SHA512: 451fd7d95df89dfafceaa723cdf0f7b137615b531cf5c5035cfb54e9ccc2026cec5ac85edbcf71b7f4e2f102e36e9202b8b3a667e1504a9e1a9976ab1f0079c4
UdjinM6 pushed a commit to UdjinM6/dash that referenced this pull request May 25, 2021
9b27047 [doc] Include txindex changes in the release notes. (Jim Posen)
ed77dd6 [test] Simple unit test for TxIndex. (Jim Posen)
6d772a3 [rpc] Public interfaces to GetTransaction block until synced. (Jim Posen)
a03f804 [index] Move disk IO logic from GetTransaction to TxIndex::FindTx. (Jim Posen)
e0a3b80 [validation] Replace tx index code in validation code with TxIndex. (Jim Posen)
8181db8 [init] Initialize and start TxIndex in init code. (Jim Posen)
f90c3a6 [index] TxIndex method to wait until caught up. (Jim Posen)
70d510d [index] Allow TxIndex sync thread to be interrupted. (Jim Posen)
94b4f8b [index] TxIndex initial sync thread. (Jim Posen)
34d68bf [index] Create new TxIndex class. (Jim Posen)
c88bcec [db] Migration for txindex data to new, separate database. (Jim Posen)
0cb8303 [db] Create separate database for txindex. (Jim Posen)

Pull request description:

  I'm re-opening bitcoin#11857 as a new pull request because the last one stopped loading for people

  -------------------------------

  This refactors the tx index code to be in it's own class and get built concurrently with validation code. The main benefit is decoupling and moving the txindex into a separate DB. The primary motivation is to lay the groundwork for other indexers that might be desired (such as the [compact filters](bitcoin/bips#636)). The basic idea is that the TxIndex spins up its own thread, which first syncs the txindex to the current block index, then once in sync the BlockConnected ValidationInterface hook writes new blocks.

  ### DB changes

  At the suggestion of some other developers, the txindex has been split out into a separate database. A data migration runs at startup on any nodes with a legacy txindex. Currently the migration blocks node initialization until complete.

  ### Open questions

  - Should the migration of txindex data from the old DB to the new DB block in init or should it happen in a background thread? The downside to backgrounding it is that `getrawtransaction` would return an error message saying the txindex is syncing while the migration is running.

  ### Impact

  In a sample size n=1 test where I synced nodes from scratch, the average time [Index writing](https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L1903) was 3.36ms in master and 1.72ms in this branch. The average time between `UpdateTip` log lines for sequential blocks between 400,000 and IBD end on mainnet was 0.297204s in master and 0.286134s in this branch. Most likely this is just variance in IBD times, but I can try with some more trials if people want.

Tree-SHA512: 451fd7d95df89dfafceaa723cdf0f7b137615b531cf5c5035cfb54e9ccc2026cec5ac85edbcf71b7f4e2f102e36e9202b8b3a667e1504a9e1a9976ab1f0079c4
UdjinM6 pushed a commit to UdjinM6/dash that referenced this pull request May 25, 2021
9b27047 [doc] Include txindex changes in the release notes. (Jim Posen)
ed77dd6 [test] Simple unit test for TxIndex. (Jim Posen)
6d772a3 [rpc] Public interfaces to GetTransaction block until synced. (Jim Posen)
a03f804 [index] Move disk IO logic from GetTransaction to TxIndex::FindTx. (Jim Posen)
e0a3b80 [validation] Replace tx index code in validation code with TxIndex. (Jim Posen)
8181db8 [init] Initialize and start TxIndex in init code. (Jim Posen)
f90c3a6 [index] TxIndex method to wait until caught up. (Jim Posen)
70d510d [index] Allow TxIndex sync thread to be interrupted. (Jim Posen)
94b4f8b [index] TxIndex initial sync thread. (Jim Posen)
34d68bf [index] Create new TxIndex class. (Jim Posen)
c88bcec [db] Migration for txindex data to new, separate database. (Jim Posen)
0cb8303 [db] Create separate database for txindex. (Jim Posen)

Pull request description:

  I'm re-opening bitcoin#11857 as a new pull request because the last one stopped loading for people

  -------------------------------

  This refactors the tx index code to be in it's own class and get built concurrently with validation code. The main benefit is decoupling and moving the txindex into a separate DB. The primary motivation is to lay the groundwork for other indexers that might be desired (such as the [compact filters](bitcoin/bips#636)). The basic idea is that the TxIndex spins up its own thread, which first syncs the txindex to the current block index, then once in sync the BlockConnected ValidationInterface hook writes new blocks.

  ### DB changes

  At the suggestion of some other developers, the txindex has been split out into a separate database. A data migration runs at startup on any nodes with a legacy txindex. Currently the migration blocks node initialization until complete.

  ### Open questions

  - Should the migration of txindex data from the old DB to the new DB block in init or should it happen in a background thread? The downside to backgrounding it is that `getrawtransaction` would return an error message saying the txindex is syncing while the migration is running.

  ### Impact

  In a sample size n=1 test where I synced nodes from scratch, the average time [Index writing](https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L1903) was 3.36ms in master and 1.72ms in this branch. The average time between `UpdateTip` log lines for sequential blocks between 400,000 and IBD end on mainnet was 0.297204s in master and 0.286134s in this branch. Most likely this is just variance in IBD times, but I can try with some more trials if people want.

Tree-SHA512: 451fd7d95df89dfafceaa723cdf0f7b137615b531cf5c5035cfb54e9ccc2026cec5ac85edbcf71b7f4e2f102e36e9202b8b3a667e1504a9e1a9976ab1f0079c4
UdjinM6 pushed a commit to UdjinM6/dash that referenced this pull request Jun 5, 2021
9b27047 [doc] Include txindex changes in the release notes. (Jim Posen)
ed77dd6 [test] Simple unit test for TxIndex. (Jim Posen)
6d772a3 [rpc] Public interfaces to GetTransaction block until synced. (Jim Posen)
a03f804 [index] Move disk IO logic from GetTransaction to TxIndex::FindTx. (Jim Posen)
e0a3b80 [validation] Replace tx index code in validation code with TxIndex. (Jim Posen)
8181db8 [init] Initialize and start TxIndex in init code. (Jim Posen)
f90c3a6 [index] TxIndex method to wait until caught up. (Jim Posen)
70d510d [index] Allow TxIndex sync thread to be interrupted. (Jim Posen)
94b4f8b [index] TxIndex initial sync thread. (Jim Posen)
34d68bf [index] Create new TxIndex class. (Jim Posen)
c88bcec [db] Migration for txindex data to new, separate database. (Jim Posen)
0cb8303 [db] Create separate database for txindex. (Jim Posen)

Pull request description:

  I'm re-opening bitcoin#11857 as a new pull request because the last one stopped loading for people

  -------------------------------

  This refactors the tx index code to be in it's own class and get built concurrently with validation code. The main benefit is decoupling and moving the txindex into a separate DB. The primary motivation is to lay the groundwork for other indexers that might be desired (such as the [compact filters](bitcoin/bips#636)). The basic idea is that the TxIndex spins up its own thread, which first syncs the txindex to the current block index, then once in sync the BlockConnected ValidationInterface hook writes new blocks.

  ### DB changes

  At the suggestion of some other developers, the txindex has been split out into a separate database. A data migration runs at startup on any nodes with a legacy txindex. Currently the migration blocks node initialization until complete.

  ### Open questions

  - Should the migration of txindex data from the old DB to the new DB block in init or should it happen in a background thread? The downside to backgrounding it is that `getrawtransaction` would return an error message saying the txindex is syncing while the migration is running.

  ### Impact

  In a sample size n=1 test where I synced nodes from scratch, the average time [Index writing](https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L1903) was 3.36ms in master and 1.72ms in this branch. The average time between `UpdateTip` log lines for sequential blocks between 400,000 and IBD end on mainnet was 0.297204s in master and 0.286134s in this branch. Most likely this is just variance in IBD times, but I can try with some more trials if people want.

Tree-SHA512: 451fd7d95df89dfafceaa723cdf0f7b137615b531cf5c5035cfb54e9ccc2026cec5ac85edbcf71b7f4e2f102e36e9202b8b3a667e1504a9e1a9976ab1f0079c4
UdjinM6 added a commit to dashpay/dash that referenced this pull request Jun 5, 2021
Merge bitcoin#13033: Build txindex in parallel with validation
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Dec 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants