Skip to content

Conversation

@rmulhol
Copy link

@rmulhol rmulhol commented Sep 11, 2019

Looking for early feedback on this very early stage WIP PR.

The idea is to have a minimal rpc pubsub interface for emitting account diffs, so that we could try to incrementally upstream things like storage diffs, proofs, etc.

I structured the approach based on how I understand the logs pubsub rpc interface to be working - basically the BlockChain collects all logs created during block processing and posts them to the EventSystem, which is responsible for filtering down to relevant logs and notifying subscribers.

Still need to figure out how the eth/filters/api.go functions are called, which should enable testing this out locally. After that I'd like to add in some metadata (block number/hash) and enable clients to filter down to specific watched accounts.

@rmulhol rmulhol force-pushed the pubsub-rpc-account-diffing branch from 252655f to 9b0042f Compare September 11, 2019 21:13
@i-norden
Copy link
Collaborator

Haven't had a chance to review in detail but some initial questions: I assume this means we are scrapping the statediff service? How does this fit with the seed node which needs headers, transactions, receipts, state and storage diffs and to maintain relation between all the data?

@rmulhol
Copy link
Author

rmulhol commented Sep 12, 2019

Haven't had a chance to review in detail but some initial questions: I assume this means we are scrapping the statediff service? How does this fit with the seed node which needs headers, transactions, receipts, state and storage diffs and to maintain relation between all the data?

Hey @i-norden - not at all! If anything, I'd say we're more likely to scrap this 🙂

My main motivation with this PR is just to see if we might be able to reduce the diff and stage the PRs incrementally, with the ultimate goal of merging in all of your work. The approach here is a little bit different than the statediffing service only because this seemed like the absolute minimum needed if all we care about are account diffs.

That said I haven't had a chance to run this code or write tests, so I was hoping to pick your and Elizabeth's brain about whether it looked like this approach could plausibly bear fruit.

Definitely no worries if you don't have the time to give this a thorough review, but I'd appreciate any thoughts you can spare!

Copy link

@elizabethengelman elizabethengelman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this seems like it could be a cool way to implement this. A couple of questions I have:

  • Instead of subscribing to chain events in the state diff loop, is the idea that we could instead subscribe to new state diffs, and use that as an indication that we should compute storage diffs?
  • I like the idea of breaking the state diff service up into smaller PRs to open upstream - hopefully that'll help the geth maintainers review them a bit easer. But, the one thing that I'm thinking is I wonder if this functionality (subscribing to state diffs) can/should stand on it's own, or if we should also include the storage diff computation in this PR as well. And now that I've typed that out, I think I've convinced myself that the storage diff piece could be a separate PR - being able to subscribe to changed accounts seems really valuable in and of itself. I guess I'd love to open a state diff PR and a storage diff PR in quick succession to add some weight to their utility. Also we need them both for mcd work. :)

"root", block.Root())

coalescedLogs = append(coalescedLogs, logs...)
stateDiffs = processedStateDiffs

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I'm reading this method correctly, but it seems like everything above this point would end up returning an empty collection of stateDiffs, is that correct?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's accurate - I'm trying to follow the pattern that's used for logs, where the code returns the empty coalescedLogs until it reaches the line directly above this one where logs are appended to the result set.

A bit of context, basically this entire approach is designed to mimic my understanding of how the logs subscription works. Walking through it, my understanding is that:

So, big picture, my goal was to do the same thing - hook into the state processor to return all modified accounts, and enable the rpc subscription to apply arbitrary filters.

s.refund = 0
}

func (s *StateDB) GetDirtyAccounts() map[common.Address]Account {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this way of determining state diffs is really cool, and seems like it could be more efficient than using the trie iterator. Though at the moment I'm a bit weary about using this over the trie iterator, because I'm worried we may missing something. This worry is almost entirely because I don't fully understand it yet, but I wonder if there's a way we could compare the dirty state diffs this returns, vs what the trie iterator returns to make sure the results are the same.

Also, is GetDirtyAccounts called before or after the block is finalized? I wonder if that could have any implications on what is considered a dirty account.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely down for comparing results to make sure we're not missing anything 👍

I think that this approach should be at least as reliable as a pubsub rpc subscription to filtered logs, but am not sure whether that meets our requirements (have heard anecdotes about that subscription failing to return desired data).

In terms of efficiency, I do think there's some additional cost to the sync process in iterating through all the modified accounts again (something that also happens when state changes are committed after block processing), so maybe it's a good idea to hook into that call and derive modified accounts from there? 🤔

Right now, GetDirtyAccounts is called after the block is finalized but before the state is committed (since committing clears the cache). I think all accounts involved in a transaction should be marked as dirty as transactions are applied, but that's a little bit hard to trace so I'll see if I can dig in and document that thoroughly.

return fb.bc.SubscribeLogsEvent(ch)
}

func (fb *filterBackend) SubscribeStateDiffs(ch chan<- map[common.Address]state.Account) event.Subscription {
Copy link

@elizabethengelman elizabethengelman Sep 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the reason that we're adding SubscribeStateDiffs to a bunch of places is because they all implement the Backend interface?

It's confusing to know which backend we'll actually want to use for VDB. 🤔

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you got it. I think we want to (and do) use the EthApiBackend with a subscription to a full (non-fast-syncing) node, where all transactions are applied/verified.

logs: logs,
hashes: make(chan []common.Hash),
headers: make(chan *types.Header),
stateDiffs: make(chan map[common.Address]state.Account),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come we are adding stateDiffs to the subscribeLogs? Is it because it's in the subscription struct, but it can't be nil so we're adding an empty map?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah just following the pattern here, seems like every subscription creates empty channels for all the things it's not subscribing to.

"time"

ethereum "github.com/ethereum/go-ethereum"
"github.com/ethereum/go-ethereum/common"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what the filters package is for - is it just another way to subscribe to new events in the chain? I think I've previously only interacted with new event subscriptions directly from the blockchain package.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's designed to provide a public API for the blockchain's internal subscriptions, but one big question I've had is how functions in this namespace (like SubscribeNewHeads) are actually called - will definitely need to sort that out before we can upstream

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the filters api is being loaded under the "eth" namespace here. So I think you would call it like:

cli, _ := rpc.Dial("ipcPathOrWsURL")
stateDiffChan := make(chan map[common.Address]state.Account, 20000)
rpcSub, err := cli.Subscribe(context.Background(), "eth", stateDiffChan, "NewStateDiffs"})

or use the EthSubscribe method since it is under that namespace.

@rmulhol
Copy link
Author

rmulhol commented Sep 12, 2019

Thanks for taking a look @elizabethengelman - and sorry for my novella-level responses! 😂

The basic idea is that we could indeed support a subscription in the API without needing to configure via special CLI flags, and I'd like to have a storage diffs PR ready to roll in short order after this submitted. Hoping that will just be adding a function similar to GetDirtyAccounts (or returning more data from statedb.Commit if we end up deriving the data from there - an idea I'm inclined to pursue).

@i-norden
Copy link
Collaborator

Hey @rmulhol thanks for the quick response! And I really like this approach, it is much cleaner than the separate service and I think much more likely to be accepted into the main branch. My only concern is that in order to recapitulate the statediff service in full we end up adding in more complexity with this approach than with the service, that concern is based on the assumption that we create a separate subscription for storage diffs and then would also need to create a separate subscription for the rest of the block data (I believe there is already a subscription for headers and transactions but there is not one for receipts), which would mean upwards of 5 subscriptions to recapitulate the single subscription. And then splitting the data up across subscriptions also means additional checks on the seed node's end to match the data back together again.

@rmulhol
Copy link
Author

rmulhol commented Sep 12, 2019

@i-norden that makes sense. I think for the interim we should probably setup the seed node with your fork, and then see what we can get upstreamed to determine whether we eventually want to make the switch. If we were going to get there incrementally, I would think it would entail having just one or a small number of subscriptions that included all the things. So like next up I was thinking would be storage diffs, and I'm hoping it'll be possible to make one subscription that's state and storage diffs filtered by address

- avoids looping through dirty state objects twice
i-norden pushed a commit that referenced this pull request Aug 31, 2020
* Adding CD to geth
@i-norden i-norden closed this Sep 8, 2020
@AFDudley AFDudley deleted the pubsub-rpc-account-diffing branch September 30, 2020 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants