Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

indexers: asynchronous index updates. #2219

Merged
merged 12 commits into from Oct 12, 2021
Merged

Conversation

dnldd
Copy link
Member

@dnldd dnldd commented Jun 3, 2020

This reworks index updates by introducing an index subscriber which allows optional indexers to be asynchronously updated. The Index manager which synchronously updated optional indexes has been removed.

The changes made are as follows:

  • indexers: remove index manager.

    • This removes index manager and moves helper functions defined with
      the manager to common.go.
  • indexers: add index subscriber.

    • This adds the IndexSubscriber which allows subscribing for index
      notifications as well specifying an index notification dependency or
      prerequisite for a subscriber. The IndexSubscriber relays received
      IndexNtfns to its IndexSubscriptions for processing.

    • An Index subscription can either be prerequisite or a dependency to another
      index subscription, however only independent subscriptions directly receive
      index updates from the subscriber. Dependent subscriptions process index
      notifications synchronously with their prerequisite subscription, they
      receive index subscriptions after their prerequisites are done processing.

  • indexers: refactor interfaces.

    • This removes the IndexManager interface and reworks the
      ChainQueryer and Indexer interfaces to facilitate index
      notification subscriptions and asynchronous index notifications.
  • blockchain/indexers: async transaction index.

    • This reworks the TransactionIndex to implement the updated Indexer
      interface for asynchronous index updates. This index subscribes
      with no prerequisites.

    • Associated tests for the transaction index have also been added.

  • blockchain/indexers: update address index.

    • This updates the AddressIndex to implement the updated Indexer
      interface for index updates. The index however subscribes with the
      TransactionIndex as a prerequisite.

      • This also adds the spend consumer for use by indexers
        that need to lookup spend journal entries.
    • Associated tests for the address index have also been added.

  • blockchain/indexers: async transaction index.

    • This reworks the ExistsAddressIndex to implement the updated Indexer
      interface for asynchoronous index updates. This index subscribes
      with no prerequisites.

    • Associated tests for the exists addresss index have also been added.

  • multi: integrate index subscriber.

    • This removes last set of references to IndexManager and integrates
      the IndexSubscriber into the chain. All indexes have now been updated to
      subscribe to the index subscriber for updates.
  • internal/rpcserver: wait for sync on rpc request.

    • This updates rpc handlers to wait for their associated indexes to
      sync if needed before responding to a request.

    • Associated handler tests have been updated to cover the added scenarios.

  • cmd/addblock: update block importer.

    • This updates the blockImporter to use an IndexSubscriber in order to
      leverage async indexes.

Resolves #1470.

Copy link
Member

@davecgh davecgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you expanding a bit on how this fits into the bigger picture? It really doesn't make sense to me, because an index will need to track the current tip, but that needs to be done by hash, not height. A height is not unique and thus does not provide enough information to tell you how to recover at startup.

@dnldd
Copy link
Member Author

dnldd commented Jun 5, 2020

I didn't factor in recover on startup via the tip height, that's why I'm tracking the tip height here and not the hash, will correct. My plan is to update the other optional indexes to track the tip and update the index manager to asynchronously receive updates for processing by enabled indexes.

@dnldd dnldd changed the title indexers: track txindex tip height. indexers: aynchronous index updates. Jun 10, 2020
@dnldd dnldd changed the title indexers: aynchronous index updates. indexers: asynchronous index updates. Jun 10, 2020
@dnldd dnldd marked this pull request as draft June 10, 2020 17:11
@davecgh
Copy link
Member

davecgh commented Jun 14, 2020

I'll wait a bit until you get further before making a final judgement, but, my gut feeling is that the approach of trying to make the manager asynchronous with a message channel for connect/disconnect is not going to work out super well. Without going all of the various reasons why, I think it's sufficient to consider that the primary issue is likely to be that it is still relying on being able to send connect/disconnect messages from within the database transaction and, even though it does the indexing of that particular block asynchronous, it will just end up blocking on the channel send when the next block is processed if the previous index operation is ongoing.

More generally, the existing infrastructure is designed specifically for synchronous indexing and under the assumption that it all happens under an atomic database transaction. The entire goal #1470 is to make that no longer required, which means just making the existing infrastructure run in a goroutine isn't likely to work well, since it will almost positively lead to needing a bunch of less than ideal hacks and can kicking to make it work instead of actually having a solid asynchronous design.

Further, there are indexes which are required to be in sync for proper consensus operation, such as cfilterv2, so you can't really just make them all async.

@dnldd
Copy link
Member Author

dnldd commented Jun 15, 2020

Noted, thanks for taking a look. Will rethink my approach and tinker some more.

@dnldd dnldd force-pushed the track_tip_txindex branch 4 times, most recently from 617e5b1 to 107811f Compare June 26, 2020 17:45
@dnldd dnldd marked this pull request as ready for review June 26, 2020 17:48
@davecgh davecgh added this to the 1.7.0 milestone Jun 27, 2020
Copy link
Member

@davecgh davecgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started to review this and noticed something that I figured I'd push a review for right away so you can fix it.

blockchain/chain.go Outdated Show resolved Hide resolved
@dnldd
Copy link
Member Author

dnldd commented Nov 19, 2020

Coverages for updated rpc tests.

❯ go tool cover -func=cov.out | grep -E "handleExistsAddress"
github.com/decred/dcrd/internal/rpcserver/rpcserver.go:1479:    handleExistsAddress              100.0%


❯ go tool cover -func=cov.out | grep -E "handleExistsAddresses" 
github.com/decred/dcrd/internal/rpcserver/rpcserver.go:1531:    handleExistsAddresses            100.0%


❯ go tool cover -func=cov.out | grep -E "handleGetCFilter" 
github.com/decred/dcrd/internal/rpcserver/rpcserver.go:2318:    handleGetCFilter         100.0%

❯ go tool cover -func=cov.out | grep -E "handleGetCFilterHeader"github.com/decred/dcrd/internal/rpcserver/rpcserver.go:2387:    handleGetCFilterHeader           100.0%

❯ go tool cover -func=cov.out | grep -E "handleSearchRawTransactions"
github.com/decred/dcrd/internal/rpcserver/rpcserver.go:4382:    handleSearchRawTransactions      99.3%

@dnldd
Copy link
Member Author

dnldd commented Nov 19, 2020

handleGetRawTransaction does not have an rpc test yet, will get that done and make the needed modifications.

Copy link
Member

@davecgh davecgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed all updates and tested several cases such as dropping the transaction index and ensuring it also drops the dependent address index, performing a full reindex, and ensuring the blocking behavior works as intended.

The final review commit needs to be squashed into the appropriate commits to which the changes apply and then I'll get this merged.

@davecgh
Copy link
Member

davecgh commented Sep 15, 2021

Well, actually I see @rstaudt2 still has some changes requested that probably have been addressed, but I'd like to give him an opportunity to confirm before merging.

Copy link
Member

@rstaudt2 rstaudt2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is missing calling RemoveSpendConsumerDependency from the indexers (so spend journal entries are never removed for disconnected blocks when indexes are enabled).

Outside of that, this looks good overall to me. I also confirmed that the panic that can be reproduced on master with these steps is now fixed.

blockchain/indexers/common.go Outdated Show resolved Hide resolved
blockchain/indexers/common.go Outdated Show resolved Hide resolved
blockchain/indexers/addrindex_test.go Outdated Show resolved Hide resolved
blockchain/indexers/addrindex_test.go Outdated Show resolved Hide resolved
blockchain/indexers/txindex_test.go Outdated Show resolved Hide resolved
blockchain/chain.go Outdated Show resolved Hide resolved
@davecgh
Copy link
Member

davecgh commented Sep 15, 2021

I was doing some profiling on this with a full index and it appears it is holding onto memory that it shouldn't be. See:

image

@rstaudt2
Copy link
Member

rstaudt2 commented Sep 15, 2021

I was doing some profiling on this with a full index and it appears it is holding onto memory that it shouldn't be. See:

image

Probably also due to RemoveSpendConsumerDependency not being called since that is what removes the dependencies being tracked in memory.

Edit: Actually, it’s probably unrelated if you were just doing a full sync without disconnecting blocks.

@davecgh
Copy link
Member

davecgh commented Sep 15, 2021

The relevant source from the profile:

github.com/decred/dcrd/blockchain/v4/indexers.(*IndexSubscriber).CatchUp

src/github.com/decred/dcrd/blockchain/indexers/indexsubscriber.go

Total:           0   171.37MB (flat, cum) 27.24%
275            .          .           			} 
276            .          .           		} else { 
277            .          .           			parent = cachedParent 
278            .          .           		} 
279            .          .            
280            .   169.36MB           		child, err := queryer.BlockByHash(hash) 
281            .          .           		if err != nil { 
282            .          .           			return err 
283            .          .           		} 
284            .          .            
285            .          .           		// Construct and send the index notification. 
286            .          .           		var prevScripts PrevScripter 
287            .     2.01MB           		err = db.View(func(dbTx database.Tx) error { 

In other words it's holding onto ~170M worth of block data. I haven't dug into it at all, but my first guess would be that the indexing is creating and holding on to references to the scripts within the blocks and because they're interior pointers, that naturally means the blocks can't be garbage collected.

@dnldd
Copy link
Member Author

dnldd commented Oct 5, 2021

CatchUp should not be holding on too long to pkScript references with the latest commit.

blockchain/chain.go Outdated Show resolved Hide resolved
Copy link
Member

@rstaudt2 rstaudt2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I retested with the latest and did not see any issues or significant difference in memory usage. Looks good to me.

Copy link
Member

@davecgh davecgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed the latest changes as well as tested everything and can confirm the memory issue is resolved. With all indexes enabled as well as both cpu and mem profiling, the peak usage reported by the OS on a full initial chain sync is 1.19GiB.

So, code and functionality wise, everything looks good to go.

The last outstanding thing is the commit messages. I see some of them aren't properly wrapped per the code contribution guidelines, and others look to be wrapped too short. Please amend them to make them properly conform and I'll get this merged.

Nice work!

@dnldd
Copy link
Member Author

dnldd commented Oct 12, 2021

Thanks, will get the outstanding issues sorted this evening.

This removes index manager and moves helper functions defined with the
manager to common.go.
This adds the IndexSubscriber which allows subscribing for index
notifications as well as specifying an index notification dependency
or prerequisite for a subscriber.
This removes the IndexManager interface and reworks the ChainQueryer
and Indexer interfaces to facilitate index notification subscriptions
and asynchronous index notifications.
This reworks the TransactionIndex to implement the updated Indexer
interface for asynchoronous index updates. This index subscribes with
no prequisites.
This updates the AddressIndex to implement the updated Indexer
interface for index updates. The index however subscribes with the
transaction index as a prerequisite.

This also adds the spend consumer for use by indexers that need to
lookup spend journal entries.
This reworks the ExistsAddressIndex to implement the updated Indexer
interface for asynchoronous index updates. This index subscribes with
no prerequisites.
This removes last set of references to IndexManager and integrates the
IndexSubscriber into the chain. All indexes have now been updated to
subscribe to the index subscriber for updates.
This updates rpc handlers to wait for their associated indexes to sync
if needed before responding to a request.
This updates the blockImporter to use an IndexSubscriber in order to
leverage async indexes.
This updates the index subscriber catchup function to update
subscriptions directly with notification instead of going through the
Run lifecycle process which avoids a case of receiving out of sync
index notifications while catch up is in progress.

Multiple typos in error type tests and review issues have also been
fixed in this commit.
This integrates the spend pruner's RemoveSpendConsumerDependency method
in the chaing query adapter to allow consumers access to the
functionality. The address index has been updated as such to remove its
spend journal dependencies when it processes block disconnections.
This refactors finding the lowest index tip height out of CatchUp into
findLowestIndexTipHeight. The prevscripter snapshot now makes copies of
the pkScript to avoid holding on to references too long during the
catch up process.
@davecgh davecgh merged commit 6f41582 into decred:master Oct 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

blockchain: Make indexing asynchronous.
3 participants