WIP: add rpc pubsub interface for account diffs #11
Conversation
Haven't had a chance to review in detail but some initial questions: I assume this means we are scrapping the statediff service? How does this fit with the seed node, which needs headers, transactions, receipts, state and storage diffs, and to maintain relations between all the data?
Hey @i-norden - not at all! If anything, I'd say we're more likely to scrap this 🙂 My main motivation with this PR is just to see if we might be able to reduce the diff and stage the PRs incrementally, with the ultimate goal of merging in all of your work. The approach here is a little bit different than the statediffing service only because this seemed like the absolute minimum needed if all we care about are account diffs. That said, I haven't had a chance to run this code or write tests, so I was hoping to pick your and Elizabeth's brain about whether it looked like this approach could plausibly bear fruit. Definitely no worries if you don't have the time to give this a thorough review, but I'd appreciate any thoughts you can spare!
elizabethengelman
left a comment
Overall this seems like it could be a cool way to implement this. A couple of questions I have:
- Instead of subscribing to chain events in the state diff loop, is the idea that we could instead subscribe to new state diffs, and use that as an indication that we should compute storage diffs?
- I like the idea of breaking the state diff service up into smaller PRs to open upstream - hopefully that'll help the geth maintainers review them a bit easier. But the one thing I'm wondering is whether this functionality (subscribing to state diffs) can/should stand on its own, or if we should also include the storage diff computation in this PR. And now that I've typed that out, I think I've convinced myself that the storage diff piece could be a separate PR - being able to subscribe to changed accounts seems really valuable in and of itself. I guess I'd love to open a state diff PR and a storage diff PR in quick succession to add some weight to their utility. Also, we need them both for mcd work. :)
core/blockchain.go (outdated)

    "root", block.Root())

    coalescedLogs = append(coalescedLogs, logs...)
    stateDiffs = processedStateDiffs
Not sure if I'm reading this method correctly, but it seems like everything above this point would end up returning an empty collection of stateDiffs. Is that correct?
yeah that's accurate - I'm trying to follow the pattern that's used for logs, where the code returns the empty coalescedLogs until it reaches the line directly above this one where logs are appended to the result set.
A bit of context, basically this entire approach is designed to mimic my understanding of how the logs subscription works. Walking through it, my understanding is that:
- the event system in eth/filters/filter_system.go subscribes to log events
- the eth api backend that receives that call creates a subscription on the blockchain
- the blockchain adds the subscription to its internal logs feed
- that logs feed is sent data when chain events are posted
- chain events are posted after data is inserted to the chain
- inserting data into the chain and returning logs involves executing the blockchain's state processor
- the state processor that yields the logs is responsible for applying all transactions and finalizing the block
- after the filter system receives all logs, it is responsible for filtering down to logs matching the subscription
So, big picture, my goal was to do the same thing - hook into the state processor to return all modified accounts, and enable the rpc subscription to apply arbitrary filters.
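The flow described above can be reduced to a toy sketch. The types here (feed, accountDiff, filterSubscription) are hypothetical stand-ins for geth's event.Feed, state.Account map, and EventSystem filtering, not the actual geth code - the point is just to show the shape of the producer/filter/subscriber pattern being mimicked:

```go
package main

import "fmt"

// accountDiff stands in for map[common.Address]state.Account;
// plain string keys stand in for addresses.
type accountDiff map[string]string

// feed is a toy stand-in for geth's event.Feed: the blockchain posts
// to it, and every subscriber receives each posted value on its own channel.
type feed struct {
	subs []chan accountDiff
}

func (f *feed) subscribe() chan accountDiff {
	ch := make(chan accountDiff, 1)
	f.subs = append(f.subs, ch)
	return ch
}

func (f *feed) post(d accountDiff) {
	for _, ch := range f.subs {
		ch <- d
	}
}

// filterSubscription mimics the EventSystem's job: receive every diff,
// then narrow it down to the accounts the subscriber asked about.
func filterSubscription(in chan accountDiff, watched map[string]bool) accountDiff {
	all := <-in
	out := accountDiff{}
	for addr, acct := range all {
		if watched[addr] {
			out[addr] = acct
		}
	}
	return out
}

func main() {
	f := &feed{}
	sub := f.subscribe()

	// The state processor reports all modified accounts for a block...
	f.post(accountDiff{"0xaaa": "nonce=1", "0xbbb": "nonce=7"})

	// ...and the subscription filters down to the accounts we watch.
	got := filterSubscription(sub, map[string]bool{"0xbbb": true})
	fmt.Println(len(got), got["0xbbb"])
}
```

In geth the posting happens from BlockChain after insert, and the filtering in eth/filters/filter_system.go, but the data flow is the same.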
core/state/statedb.go (outdated)

    s.refund = 0
    }

    func (s *StateDB) GetDirtyAccounts() map[common.Address]Account {
I think that this way of determining state diffs is really cool, and it seems like it could be more efficient than using the trie iterator. Though at the moment I'm a bit wary about using this over the trie iterator, because I'm worried we may be missing something. That worry is almost entirely because I don't fully understand it yet, but I wonder if there's a way we could compare the dirty state diffs this returns versus what the trie iterator returns, to make sure the results are the same.
Also, is GetDirtyAccounts called before or after the block is finalized? I wonder if that could have any implications on what is considered a dirty account.
Definitely down for comparing results to make sure we're not missing anything 👍
I think that this approach should be at least as reliable as a pubsub rpc subscription to filtered logs, but am not sure whether that meets our requirements (have heard anecdotes about that subscription failing to return desired data).
In terms of efficiency, I do think there's some additional cost to the sync process in iterating through all the modified accounts again (something that also happens when state changes are committed after block processing), so maybe it's a good idea to hook into that call and derive modified accounts from there? 🤔
Right now, GetDirtyAccounts is called after the block is finalized but before the state is committed (since committing clears the cache). I think all accounts involved in a transaction should be marked as dirty as transactions are applied, but that's a little bit hard to trace so I'll see if I can dig in and document that thoroughly.
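The comparison suggested above could be as simple as diffing the two result sets for the same block. A rough, self-contained sketch (plain string keys stand in for common.Address, and both inputs are assumed to be address-to-account maps; compareDiffSources is a hypothetical helper, not existing geth code):

```go
package main

import "fmt"

// compareDiffSources reports addresses present in one result set but not
// the other - e.g. dirty-object tracking vs. a trie-iterator walk of the
// same block. Both empty return slices would mean the sources agree.
func compareDiffSources(dirty, iterated map[string]string) (onlyDirty, onlyIterated []string) {
	for addr := range dirty {
		if _, ok := iterated[addr]; !ok {
			onlyDirty = append(onlyDirty, addr)
		}
	}
	for addr := range iterated {
		if _, ok := dirty[addr]; !ok {
			onlyIterated = append(onlyIterated, addr)
		}
	}
	return
}

func main() {
	dirty := map[string]string{"0xaaa": "nonce=1", "0xbbb": "nonce=2"}
	iterated := map[string]string{"0xaaa": "nonce=1", "0xccc": "nonce=3"}
	d, i := compareDiffSources(dirty, iterated)
	fmt.Println(d, i) // one address missed by each source
}
```

Running both sources side by side over a range of blocks and alerting on any disagreement would be a cheap way to build confidence in GetDirtyAccounts before dropping the trie iterator.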
    return fb.bc.SubscribeLogsEvent(ch)
    }

    func (fb *filterBackend) SubscribeStateDiffs(ch chan<- map[common.Address]state.Account) event.Subscription {
Is the reason we're adding SubscribeStateDiffs to a bunch of places because they all implement the Backend interface?
It's confusing to know which backend we'll actually want to use for VDB. 🤔
Yeah, you got it. I think we want to (and do) use the EthApiBackend with a subscription to a full (non-fast-syncing) node, where all transactions are applied/verified.
    logs:       logs,
    hashes:     make(chan []common.Hash),
    headers:    make(chan *types.Header),
    stateDiffs: make(chan map[common.Address]state.Account),
How come we are adding stateDiffs to the subscribeLogs? Is it because it's in the subscription struct, but it can't be nil so we're adding an empty map?
Yeah just following the pattern here, seems like every subscription creates empty channels for all the things it's not subscribing to.
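Reduced to a toy illustration, the pattern looks like this (a hypothetical subscription struct standing in for the real one in eth/filters/filter_system.go). One plausible reason for allocating every channel is so the event loop can send and select over all of them without nil checks:

```go
package main

import "fmt"

// subscription mirrors the shape of the filter system's subscription
// struct: every kind of event gets a channel, even the kinds this
// particular subscriber ignores.
type subscription struct {
	logs       chan []string
	headers    chan string
	stateDiffs chan map[string]string
}

// subscribeLogs only cares about logs, but still allocates the other
// channels so downstream code can use them unconditionally.
func subscribeLogs(logs chan []string) *subscription {
	return &subscription{
		logs:       logs,
		headers:    make(chan string),
		stateDiffs: make(chan map[string]string),
	}
}

func main() {
	sub := subscribeLogs(make(chan []string, 1))
	fmt.Println(sub.headers != nil, sub.stateDiffs != nil)
}
```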
| "time" | ||
|
|
||
| ethereum "github.com/ethereum/go-ethereum" | ||
| "github.com/ethereum/go-ethereum/common" |
I'm not sure I understand what the filters package is for - is it just another way to subscribe to new events in the chain? I think I've previously only interacted with new event subscriptions directly from the blockchain package.
I think it's designed to provide a public API for the blockchain's internal subscriptions, but one big question I've had is how functions in this namespace (like SubscribeNewHeads) are actually called - will definitely need to sort that out before we can upstream
It looks like the filters api is being loaded under the "eth" namespace here. So I think you would call it like:
    cli, _ := rpc.Dial("ipcPathOrWsURL")
    stateDiffChan := make(chan map[common.Address]state.Account, 20000)
    rpcSub, err := cli.Subscribe(context.Background(), "eth", stateDiffChan, "NewStateDiffs")

Or use the EthSubscribe method, since it is under that namespace.
Thanks for taking a look @elizabethengelman - and sorry for my novella-level responses! 😂 The basic idea is that we could indeed support a subscription in the API without needing to configure via special CLI flags, and I'd like to have a storage diffs PR ready to roll in short order after this is submitted. Hoping that will just be adding a function similar to
Hey @rmulhol, thanks for the quick response! And I really like this approach - it is much cleaner than the separate service, and I think much more likely to be accepted into the main branch. My only concern is that, in order to recapitulate the statediff service in full, we may end up adding more complexity with this approach than with the service. That concern is based on the assumption that we create a separate subscription for storage diffs and would then also need a separate subscription for the rest of the block data (I believe there are already subscriptions for headers and transactions, but not one for receipts), which would mean upwards of five subscriptions to recapitulate the single subscription. And splitting the data up across subscriptions also means additional checks on the seed node's end to match the data back together again.
@i-norden that makes sense. I think for the interim we should probably set up the seed node with your fork, and then see what we can get upstreamed to determine whether we eventually want to make the switch. If we were going to get there incrementally, I would think it would entail having just one or a small number of subscriptions that included all the things. So next up I was thinking would be storage diffs, and I'm hoping it'll be possible to make one subscription for state and storage diffs, filtered by address.
- avoids looping through dirty state objects twice
Looking for early feedback on this very early stage WIP PR.
The idea is to have a minimal rpc pubsub interface for emitting account diffs, so that we could try to incrementally upstream things like storage diffs, proofs, etc.
I structured the approach based on how I understand the logs pubsub rpc interface to be working - basically, the BlockChain collects all logs created during block processing and posts them to the EventSystem, which is responsible for filtering down to relevant logs and notifying subscribers.

Still need to figure out how the eth/filters/api.go functions are called, which should enable testing this out locally. After that I'd like to add in some metadata (block number/hash) and enable clients to filter down to specific watched accounts.
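The metadata and watched-account filtering described above might look roughly like this. The diffPayload type and filterWatched helper are hypothetical - nothing in the PR defines them yet - but they sketch one way to carry block number/hash alongside the account diffs and narrow each batch per subscriber:

```go
package main

import "fmt"

// diffPayload is a hypothetical shape for what the subscription could emit
// once block metadata is attached to each batch of account diffs.
type diffPayload struct {
	blockNumber uint64
	blockHash   string
	accounts    map[string]string // address -> account summary
}

// filterWatched narrows a payload down to the subscriber's watched
// accounts while keeping the block metadata intact.
func filterWatched(p diffPayload, watched map[string]bool) diffPayload {
	out := diffPayload{
		blockNumber: p.blockNumber,
		blockHash:   p.blockHash,
		accounts:    map[string]string{},
	}
	for addr, acct := range p.accounts {
		if watched[addr] {
			out.accounts[addr] = acct
		}
	}
	return out
}

func main() {
	p := diffPayload{
		blockNumber: 42,
		blockHash:   "0xabc",
		accounts:    map[string]string{"0xaaa": "balance=1", "0xbbb": "balance=2"},
	}
	got := filterWatched(p, map[string]bool{"0xaaa": true})
	fmt.Println(got.blockNumber, len(got.accounts))
}
```

This mirrors how the log filter system applies each subscription's filter criteria to the full set of logs, just with account diffs and addresses in place of logs and topics.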