Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R4R]implement diff sync #376

Merged
merged 22 commits into from
Sep 28, 2021
Merged

[R4R]implement diff sync #376

merged 22 commits into from
Sep 28, 2021

Conversation

unclezoro
Copy link
Collaborator

@unclezoro unclezoro commented Aug 22, 2021

Description

Diff Sync of BSC

Abstract

The increasing adoption of BSC leads to a more active network. Blocks on BSC start hitting the gasceil daily and we are planning to increase the capacity of BSC further. On the other hand, the node maintainer had a hard time keeping their node catching up with the chain. We need a light sync mode to lower the hardware requirement for running a bsc fullnode.

Currently bsc has three kinds of sync mode: 1.Snap sync. 2. Fast sync. 3. Full sync.

1 and 2 are used for the initial synchronization, once the client has the entire state and all historical block data, it will switch to full sync automatically.

It takes several steps to process a block when doing full sync:

  1. Block fetcher or downloader get blocks from other peers.
  2. Verify header and block body.
  3. Execute transactions within EVM, including read data from cache/disk.
  4. Calculate the root hash of MPT.
  5. Commit MPT to memory db, persist snapshot to disk.

According to our profile, step 3 occupied 70+% of the block processing time.

This design wants to propose a diff sync protocol without executing transactions, in exchange, the security of a fullnode will degrade to light client.

Spec

WorkFlow

diffsync

Cache Difflayer and Persist Difflayer

The size of one difflayer is around 100K bytes, it is totally fine to cache 10K diff layers, which means a client can still diff sync if it is down for 8 hours.

We have to persist difflayer to a new database so that a node can still get difflayer even after a long downtime.

Fallback to Full Sync && Switch to Diff Sync

When fallback to full sync:

  1. Can not receive difflayer in time.
  2. No peers have that diff layer.
  3. Randomly switch to full sync(every 21 blocks) to ensure security.

When switch diff sync:

  1. Periodically detect the network, whether any peer has the latest difflayer.

Delay of Diff Sync

The verification of difflayer takes time, we adopt an optimistic strategy to broadcast the difflayer. A node will cache two kinds of difflayers: 1. Trusted difflayer which have been verified; 2. Untrusted difflayer received from other peers. A node always tries to respond with trusted difflayer first, and only responds with untrusted difflayer if it is missed in cache and database. The client can disconnect the peer once receive an invalid diff layer. In this way, we dismiss the delay of diff sync.

Security

The diff sync guarantee that:

  1. Light client security
  2. State consistency.

It sustains:
validator collusion
Fork the chain with an invalid state

So do not enable diff sync when it requires high security.

For a validator, it can only use diff sync when it is the inturn validator for the next block. It guarantee that at most one validator is doing diff sync at the same time, the whole network is still secure.

Preflight checks

  • build passed (make build)
  • tests passed (make test)
  • manual transaction test passed

Already reviewed by

...

Related issues

... reference related issue #'s here ...

@unclezoro unclezoro force-pushed the light_sync branch 9 times, most recently from 14e48ff to 0f56e8c Compare August 26, 2021 12:22
@unclezoro unclezoro force-pushed the light_sync branch 2 times, most recently from b05e738 to b8d6373 Compare September 1, 2021 08:56
@unclezoro unclezoro changed the title [WIP]implement block process part of light sync [R4R]implement block process part of light sync Sep 2, 2021
@unclezoro unclezoro changed the title [R4R]implement block process part of light sync [R4R]implement diff sync Sep 2, 2021
@unclezoro unclezoro force-pushed the light_sync branch 5 times, most recently from f8ed94b to a8891d9 Compare September 6, 2021 11:55
unclezoro and others added 10 commits September 7, 2021 08:35
* ligth sync: download difflayer

Signed-off-by: kyrie-yl <lei.y@binance.com>

* download diff layer: fix according to the comments

Signed-off-by: kyrie-yl <lei.y@binance.com>

* download diff layer: update

Signed-off-by: kyrie-yl <lei.y@binance.com>

* download diff layer: fix accroding comments

Signed-off-by: kyrie-yl <lei.y@binance.com>

Co-authored-by: kyrie-yl <lei.y@binance.com>
@unclezoro unclezoro force-pushed the light_sync branch 8 times, most recently from 80be4ed to f89fe52 Compare September 10, 2021 08:15
@unclezoro unclezoro force-pushed the light_sync branch 2 times, most recently from 09515cf to f1f4615 Compare September 10, 2021 15:43
@@ -139,7 +139,7 @@ func main() {
log.Crit("Length of bep2eContracts, bep2eSymbols, bep2eAmounts mismatch")
}

bep2eInfos := make(map[string]bep2eInfo, 0)
bep2eInfos := make(map[string]bep2eInfo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not make a map with capacity?
make(map[string]bep2eInfo,len(symbols))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -799,6 +799,10 @@ func (p *Parlia) Delay(chain consensus.ChainReader, header *types.Header) *time.
return nil
}
delay := p.delayForRamanujanFork(snap, header)
// The blocking time should be no more than half of period
if delay > time.Duration(p.config.Period)*time.Second/2 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if halfPeriod := time.Duration(p.config.Period) * time.Second / 2; delay > halfPeriod {
    delay = halfPeriod
}

This way don't need to calculate twice

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

idx := snap.indexOfVal(p.val)
// validator is not allowed to diff sync
return idx < 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty line

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@@ -142,6 +156,8 @@ var defaultCacheConfig = &CacheConfig{
SnapshotWait: true,
}

type BlockChainOption func(*BlockChain) *BlockChain
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there is no need to return *BlockChain

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is true there is no need to return in this case. But in some other cases, we may wrapper the Object which is an interface, and return the wrapper. It impact little to performance, just personal code style. If you insist, I can prune the return objects.

@@ -375,6 +419,10 @@ func NewBlockChain(db ethdb.Database, cacheConfig *CacheConfig, chainConfig *par
}
bc.snaps, _ = snapshot.New(bc.db, bc.stateCache.TrieDB(), bc.cacheConfig.SnapshotLimit, int(bc.cacheConfig.TriesInMemory), head.Root(), !bc.cacheConfig.SnapshotWait, true, recover)
}
// do options before start any routine
for _, option := range options {
bc = option(bc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only need option(bc)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check above

@@ -220,8 +220,43 @@ type BlockChain interface {
Snapshots() *snapshot.Tree
}

type DownloadOption func(downloader *Downloader) *Downloader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returning *Downloader is unnecessary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check above


// PeerInfo retrieves all known `diff` information about a peer.
func (h *diffHandler) PeerInfo(id enode.ID) interface{} {
if p := h.peers.peer(id.String()); p != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be combined into one line

if p := h.peers.peer(id.String()); p != nil && p.diffExt != nil {
		return p.diffExt.info()
	}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

// Otherwise wait for `diff` to connect concurrently
wait := make(chan *diff.Peer)
ps.diffWait[id] = wait
ps.lock.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this line to :184 defer ps.lock.Unlock

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just follow what waitSnapExtension did, it has been proven quite safe. So I think we don't have motivation to modify this.

}
if fulfilled := requestTracker.Fulfil(peer.id, peer.version, FullDiffLayerMsg, res.RequestId); fulfilled {
return backend.Handle(peer, res)
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this else is unnecessary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

node/node.go Outdated Show resolved Hide resolved
make diff block configable

wait code write

fix testcase

resolve comments

resolve comment
@unclezoro unclezoro changed the base branch from master to develop September 27, 2021 02:06
threads = runtime.NumCPU()
} else if threads == 0 {
threads = 1
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw that this logic had been written in two places, and the only parameter is the difference.
If it is just a strategy for determining the number of threads, we can make a unique function.
It will be easier to maintain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@unclezoro unclezoro merged commit 1ded097 into bnb-chain:develop Sep 28, 2021
This was referenced Oct 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants