WIP: add rpc pubsub interface for account diffs #11

rmulhol · 2019-09-11T21:10:17Z

Looking for early feedback on this very early stage WIP PR.

The idea is to have a minimal rpc pubsub interface for emitting account diffs, so that we could try to incrementally upstream things like storage diffs, proofs, etc.

I structured the approach based on how I understand the logs pubsub rpc interface to be working - basically the BlockChain collects all logs created during block processing and posts them to the EventSystem, which is responsible for filtering down to relevant logs and notifying subscribers.

Still need to figure out how the eth/filters/api.go functions are called, which should enable testing this out locally. After that I'd like to add in some metadata (block number/hash) and enable clients to filter down to specific watched accounts.

i-norden · 2019-09-12T00:06:56Z

Haven't had a chance to review in detail but some initial questions: I assume this means we are scrapping the statediff service? How does this fit with the seed node which needs headers, transactions, receipts, state and storage diffs and to maintain relation between all the data?

rmulhol · 2019-09-12T01:11:40Z

Haven't had a chance to review in detail but some initial questions: I assume this means we are scrapping the statediff service? How does this fit with the seed node which needs headers, transactions, receipts, state and storage diffs and to maintain relation between all the data?

Hey @i-norden - not at all! If anything, I'd say we're more likely to scrap this 🙂

My main motivation with this PR is just to see if we might be able to reduce the diff and stage the PRs incrementally, with the ultimate goal of merging in all of your work. The approach here is a little bit different than the statediffing service only because this seemed like the absolute minimum needed if all we care about are account diffs.

That said I haven't had a chance to run this code or write tests, so I was hoping to pick your and Elizabeth's brain about whether it looked like this approach could plausibly bear fruit.

Definitely no worries if you don't have the time to give this a thorough review, but I'd appreciate any thoughts you can spare!

elizabethengelman

Overall this seems like it could be a cool way to implement this. A couple of questions I have:

Instead of subscribing to chain events in the state diff loop, is the idea that we could instead subscribe to new state diffs, and use that as an indication that we should compute storage diffs?
I like the idea of breaking the state diff service up into smaller PRs to open upstream - hopefully that'll help the geth maintainers review them a bit easer. But, the one thing that I'm thinking is I wonder if this functionality (subscribing to state diffs) can/should stand on it's own, or if we should also include the storage diff computation in this PR as well. And now that I've typed that out, I think I've convinced myself that the storage diff piece could be a separate PR - being able to subscribe to changed accounts seems really valuable in and of itself. I guess I'd love to open a state diff PR and a storage diff PR in quick succession to add some weight to their utility. Also we need them both for mcd work. :)

elizabethengelman · 2019-09-12T13:44:02Z

core/blockchain.go

 				"root", block.Root())

 			coalescedLogs = append(coalescedLogs, logs...)
+			stateDiffs = processedStateDiffs


Not sure if I'm reading this method correctly, but it seems like everything above this point would end up returning an empty collection of stateDiffs, is that correct?

yeah that's accurate - I'm trying to follow the pattern that's used for logs, where the code returns the empty coalescedLogs until it reaches the line directly above this one where logs are appended to the result set.

A bit of context, basically this entire approach is designed to mimic my understanding of how the logs subscription works. Walking through it, my understanding is that:

the event system in eth/filters/filter_system.go subscribes to log events

the eth api backend in that receives that call creates a subscription on the blockchain

the blockchain adds the subscription to its internal logs feed

that logs feed is sent data when chain events are posted

chain events are posted after data is inserted to the chain

inserting data into the chain and returning logs involves executing the blockchain's state processor

the state processor that yields the logs is responsible for applying all transactions and finalizing the block

after the filter system receives all logs, it is responsible for filtering down to logs matching the subscription

So, big picture, my goal was to do the same thing - hook into the state processor to return all modified accounts, and enable the rpc subscription to apply arbitrary filters.

elizabethengelman · 2019-09-12T14:08:43Z

core/state/statedb.go

 	s.refund = 0
 }

+func (s *StateDB) GetDirtyAccounts() map[common.Address]Account {


I think that this way of determining state diffs is really cool, and seems like it could be more efficient than using the trie iterator. Though at the moment I'm a bit weary about using this over the trie iterator, because I'm worried we may missing something. This worry is almost entirely because I don't fully understand it yet, but I wonder if there's a way we could compare the dirty state diffs this returns, vs what the trie iterator returns to make sure the results are the same.

Also, is GetDirtyAccounts called before or after the block is finalized? I wonder if that could have any implications on what is considered a dirty account.

Definitely down for comparing results to make sure we're not missing anything 👍

I think that this approach should be at least as reliable as a pubsub rpc subscription to filtered logs, but am not sure whether that meets our requirements (have heard anecdotes about that subscription failing to return desired data).

In terms of efficiency, I do think there's some additional cost to the sync process in iterating through all the modified accounts again (something that also happens when state changes are committed after block processing), so maybe it's a good idea to hook into that call and derive modified accounts from there? 🤔

Right now, GetDirtyAccounts is called after the block is finalized but before the state is committed (since committing clears the cache). I think all accounts involved in a transaction should be marked as dirty as transactions are applied, but that's a little bit hard to trace so I'll see if I can dig in and document that thoroughly.

elizabethengelman · 2019-09-12T14:26:02Z

accounts/abi/bind/backends/simulated.go

 	return fb.bc.SubscribeLogsEvent(ch)
 }

+func (fb *filterBackend) SubscribeStateDiffs(ch chan<- map[common.Address]state.Account) event.Subscription {


Is the reason that we're adding SubscribeStateDiffs to a bunch of places is because they all implement the Backend interface?

It's confusing to know which backend we'll actually want to use for VDB. 🤔

Yeah, you got it. I think we want to (and do) use the EthApiBackend with a subscription to a full (non-fast-syncing) node, where all transactions are applied/verified.

elizabethengelman · 2019-09-12T14:43:08Z

eth/filters/filter_system.go

+		logs:       logs,
+		hashes:     make(chan []common.Hash),
+		headers:    make(chan *types.Header),
+		stateDiffs: make(chan map[common.Address]state.Account),


How come we are adding stateDiffs to the subscribeLogs? Is it because it's in the subscription struct, but it can't be nil so we're adding an empty map?

Yeah just following the pattern here, seems like every subscription creates empty channels for all the things it's not subscribing to.

elizabethengelman · 2019-09-12T15:17:15Z

eth/filters/filter_system.go

 	"time"

 	ethereum "github.com/ethereum/go-ethereum"
 	"github.com/ethereum/go-ethereum/common"


I'm not sure I understand what the filters package is for - is it just another way to subscribe to new events in the chain? I think I've previously only interacted with new event subscriptions directly from the blockchain package.

I think it's designed to provide a public API for the blockchain's internal subscriptions, but one big question I've had is how functions in this namespace (like SubscribeNewHeads) are actually called - will definitely need to sort that out before we can upstream

It looks like the filters api is being loaded under the "eth" namespace here. So I think you would call it like:

cli, _ := rpc.Dial("ipcPathOrWsURL") stateDiffChan := make(chan map[common.Address]state.Account, 20000) rpcSub, err := cli.Subscribe(context.Background(), "eth", stateDiffChan, "NewStateDiffs"})

or use the EthSubscribe method since it is under that namespace.

rmulhol · 2019-09-12T16:26:30Z

Thanks for taking a look @elizabethengelman - and sorry for my novella-level responses! 😂

The basic idea is that we could indeed support a subscription in the API without needing to configure via special CLI flags, and I'd like to have a storage diffs PR ready to roll in short order after this submitted. Hoping that will just be adding a function similar to GetDirtyAccounts (or returning more data from statedb.Commit if we end up deriving the data from there - an idea I'm inclined to pursue).

i-norden · 2019-09-12T17:59:09Z

Hey @rmulhol thanks for the quick response! And I really like this approach, it is much cleaner than the separate service and I think much more likely to be accepted into the main branch. My only concern is that in order to recapitulate the statediff service in full we end up adding in more complexity with this approach than with the service, that concern is based on the assumption that we create a separate subscription for storage diffs and then would also need to create a separate subscription for the rest of the block data (I believe there is already a subscription for headers and transactions but there is not one for receipts), which would mean upwards of 5 subscriptions to recapitulate the single subscription. And then splitting the data up across subscriptions also means additional checks on the seed node's end to match the data back together again.

rmulhol · 2019-09-12T18:19:10Z

@i-norden that makes sense. I think for the interim we should probably setup the seed node with your fork, and then see what we can get upstreamed to determine whether we eventually want to make the switch. If we were going to get there incrementally, I would think it would entail having just one or a small number of subscriptions that included all the things. So like next up I was thinking would be storage diffs, and I'm hoping it'll be possible to make one subscription that's state and storage diffs filtered by address

- avoids looping through dirty state objects twice

* Adding CD to geth

rmulhol requested review from elizabethengelman and i-norden September 11, 2019 21:10

WIP: add rpc pubsub interface for account diffs

9b0042f

rmulhol force-pushed the pubsub-rpc-account-diffing branch from 252655f to 9b0042f Compare September 11, 2019 21:13

elizabethengelman reviewed Sep 12, 2019

View reviewed changes

Derive state diffs from Commit rather than Process

9691ef8

- avoids looping through dirty state objects twice

rmulhol mentioned this pull request Oct 1, 2019

Emit storage diffs during full sync to rpc subscription #13

Closed

i-norden pushed a commit that referenced this pull request Aug 31, 2020

Adding CD to geth (#11)

fa42b78

* Adding CD to geth

i-norden closed this Sep 8, 2020

AFDudley deleted the pubsub-rpc-account-diffing branch September 30, 2020 13:24

WIP: add rpc pubsub interface for account diffs #11

WIP: add rpc pubsub interface for account diffs #11

Uh oh!

Conversation

rmulhol commented Sep 11, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

i-norden commented Sep 12, 2019

Uh oh!

rmulhol commented Sep 12, 2019

Uh oh!

elizabethengelman left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

elizabethengelman Sep 12, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rmulhol commented Sep 12, 2019

Uh oh!

i-norden commented Sep 12, 2019

Uh oh!

rmulhol commented Sep 12, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

rmulhol commented Sep 11, 2019 •

edited

Loading

elizabethengelman Sep 12, 2019 •

edited

Loading