
ChainSync refactor #693

Merged: austinabell merged 14 commits into main from austin/sync/refactor on Sep 14, 2020

Conversation

@austinabell (Contributor)

Summary of changes
Changes introduced in this pull request:

  • Handles syncing asynchronously through sync workers on separate tasks
    • All syncing logic now lives in the SyncWorker struct; the ChainSyncer just handles network events and the scheduling/handling of sync workers
    • Making some of the fields thread-safe across tasks required some auxiliary updates
  • Updates peer selection for blocksync requests to be random (should be improved with Implement peer manager stats for each peer #692); see the sketch after this list
  • Fixes some bugs with tipset scheduling and other functions
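
A minimal sketch of the random selection, assuming the rand crate's IteratorRandom; the peer list and the String-typed IDs are hypothetical stand-ins for the peer manager's state:

```rust
use rand::seq::IteratorRandom;

/// Pick one peer uniformly at random from the available set.
/// `peers` stands in for the peer manager's tracked peers; String is
/// used here only as a placeholder for the real peer ID type.
fn select_random_peer(peers: &[String]) -> Option<&String> {
    peers.iter().choose(&mut rand::thread_rng())
}
```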

So the loading/fetching of tipsets from the hello protocol is done on separate tasks so as not to block the ChainSyncer polling, since otherwise the network event channel fills up and stalls (which is what was happening before). This can be handled better, but I want to keep the scope of this PR to a minimum as it's already changing a lot.
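
A minimal sketch of that pattern, assuming async-std-style tasks; SyncWorker, TipsetKeys, and the channel wiring are hypothetical stand-ins, not the actual Forest signatures:

```rust
use async_std::task;
use futures::{channel::mpsc, StreamExt};

// Hypothetical stand-ins for the real types.
struct TipsetKeys(Vec<u8>);
struct SyncWorker {
    id: usize,
}

impl SyncWorker {
    async fn sync(&self, _target: TipsetKeys) {
        // ... reverse header sync, fetching, validation ...
    }
}

// The event loop only schedules work and immediately returns to polling,
// so the network event channel never fills up behind a long sync.
async fn event_loop(mut network_events: mpsc::UnboundedReceiver<TipsetKeys>) {
    let mut next_id = 0;
    while let Some(target) = network_events.next().await {
        let worker = SyncWorker { id: next_id };
        next_id += 1;
        // Spawn the (potentially long-running) sync on its own task so
        // this loop keeps draining the channel instead of blocking on it.
        task::spawn(async move {
            worker.sync(target).await;
            println!("sync worker {} finished", worker.id);
        });
    }
}
```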

Reference issue to close (if applicable)

Closes #690

Other information and links

@ec2 (Member) left a comment

It seems like, while in the bootstrap phase, for every Hello we get we'll be doing a sync from that tipset? So if there are 3 Hello messages, they all get scheduled? Or am I misreading?

Resolved review threads: blockchain/chain/src/store/chain_store.rs; node/forest_libp2p/src/service.rs (outdated)
@austinabell (Contributor, Author)

> It seems like, while in the bootstrap phase, for every Hello we get we'll be doing a sync from that tipset? So if there are 3 Hello messages, they all get scheduled? Or am I misreading?

No, only tipsets that aren't connected get scheduled, which is the point of having multiple workers (so one doesn't get sidetracked with an invalid chain and have to start from scratch again)

@ec2 (Member) commented Sep 10, 2020

> No, only tipsets that aren't connected get scheduled, which is the point of having multiple workers (so one doesn't get sidetracked with an invalid chain and have to start from scratch again)

What do you mean by connected here? If I understand correctly, connected means having a common ancestor?

Let's say we receive a Hello from PeerA and start doing a reverse header sync. Halfway through, we receive another Hello from PeerB, which will probably have a newer head than PeerA. If we start a reverse header sync on PeerB as well, then in most cases (assuming no attacks or weird behaviour, which I think is by far the most common scenario) we will end up with a large number of duplicate fetches. A reverse header sync can take quite a bit of time if the chain is long, and the blocks don't get persisted into the store until ALL the blocks from head -> 0 are fetched, so WorkerB fetching from PeerB will not know that WorkerA fetching from PeerA is already fetching the same blocks.
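
A rough sketch of the reverse header sync being described, with hypothetical Tipset and fetch_parent stand-ins; the key point is that nothing is persisted until the whole walk completes:

```rust
// Hypothetical minimal stand-ins for illustration.
struct Tipset {
    epoch: u64,
}

// Stand-in for a blocksync request asking a peer for the parent tipset.
async fn fetch_parent(ts: &Tipset) -> Tipset {
    Tipset { epoch: ts.epoch - 1 }
}

// Walk parents from the advertised head back toward genesis, buffering
// headers only in memory; because nothing hits the store until the walk
// finishes, two workers syncing overlapping chains duplicate fetches.
async fn reverse_header_sync(head: Tipset) -> Vec<Tipset> {
    let mut chain = vec![head];
    while chain.last().unwrap().epoch > 0 {
        let parent = fetch_parent(chain.last().unwrap()).await;
        chain.push(parent);
    }
    chain
}
```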

@austinabell
Copy link
Contributor Author

> No, only tipsets that aren't connected get scheduled, which is the point of having multiple workers (so one doesn't get sidetracked with an invalid chain and have to start from scratch again)

> What do you mean by connected here? If I understand correctly, connected means having a common ancestor?

> Let's say we receive a Hello from PeerA and start doing a reverse header sync. Halfway through, we receive another Hello from PeerB, which will probably have a newer head than PeerA. If we start a reverse header sync on PeerB as well, then in most cases (assuming no attacks or weird behaviour, which I think is by far the most common scenario) we will end up with a large number of duplicate fetches. A reverse header sync can take quite a bit of time if the chain is long, and the blocks don't get persisted into the store until ALL the blocks from head -> 0 are fetched, so WorkerB fetching from PeerB will not know that WorkerA fetching from PeerA is already fetching the same blocks.

It's all based on the hello messages received: they get grouped as they come in, so as long as there isn't a break between two epochs (probably rare) they will be considered on the same chain. This is determined by checking whether one tipset's parent is the other, or whether they are the same.

This logic existed before, but it is now used more correctly than it was (it can still be improved later). This is the point of having those tipset buckets: to track which tipsets are being synced and what to assign another worker to. A sketch of the grouping check follows.
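
A sketch of that grouping check, with hypothetical TipsetKey/Tipset stand-ins: two tipsets land in the same bucket when they are equal or one's parents point at the other:

```rust
// Hypothetical stand-ins; the real types live elsewhere in the codebase.
#[derive(PartialEq)]
struct TipsetKey(Vec<u8>);

struct Tipset {
    key: TipsetKey,
    parents: TipsetKey,
}

// Same bucket if the tipsets are equal or one directly extends the other.
fn same_chain(a: &Tipset, b: &Tipset) -> bool {
    a.key == b.key || a.parents == b.key || b.parents == a.key
}
```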

@ec2 (Member) left a comment

It L(Real)GTM. I noticed that currently we don't verify the winning PoSt proof; can't remember why that is, but it can be addressed in another PR. Also, nice one getting rid of the network_handler. I always got mixed up between that and the context when I was heads deep.

Resolved review thread: blockchain/chain_sync/src/sync.rs
@dutterbutter (Contributor) left a comment

Awesome job! Great work separating the sync_worker logic out of sync.rs. Just a few nits, and more questions than anything.

```rust
let ts: Tipset = Tipset::new(vec![header.clone()])?;
// TODO this is really broken, should not be setting the tipset metadata to a tipset with just
// the one header.
pub async fn set_tipset_tracker(&self, header: &BlockHeader) -> Result<(), Error> {
```
Contributor:

I don't really see the need for the tip_index.rs file anymore. I recall implementing it really early, thinking it would be good for tipset data retrievals, but it doesn't look like we are using it anywhere for that purpose? We set the tracker but never make use of it.

Contributor (author):

Yeah, I was planning on leaving it for now since it's out of scope for this PR and the changes are already a lot.

```rust
use vm::TokenAmount;

// TODO revisit this type, necessary for two sets of Arc<Mutex<>> because each state is
// on seperate thread and needs to be mutated independently, but the vec needs to be read
```
Contributor:

Nit: seperate -> separate

```rust
.chain_store
    .heaviest_tipset()
    .await
    .map(|heaviest| ts.weight() >= heaviest.weight())
```
Contributor:

Shouldn't this just be ts.weight() > heaviest.weight()? Do we want to validate and set the peer head when the weights are equal?

Contributor (author):

Nah, it's intentional that if the weight is the same it will also be considered. The logic also matches Lotus (although they set up the function differently).

```diff
 }

 /// informs the syncer about a new potential tipset
 /// This should be called when connecting to new peers, and additionally
 /// when receiving new blocks from the network
-pub async fn inform_new_head(&mut self, peer: PeerId, fts: &FullTipset) -> Result<(), Error> {
+pub async fn inform_new_head(&mut self, peer: PeerId, ts: &FullTipset) -> Result<(), Error> {
```
Contributor:

I noticed in the Lotus implementation they are adding the PeerId to their ChainExchange pool; should there be a TODO to add this peer to the available peers for blocksync?

Contributor (author):

We're doing it when the hello message is handled by the syncer; I don't see why it would need to be handled again after that.

@austinabell austinabell merged commit 3411459 into main Sep 14, 2020
@austinabell austinabell deleted the austin/sync/refactor branch September 14, 2020 19:14
timvermeulen added a commit that referenced this pull request Sep 17, 2020
commit 8ef5ae5
Author: Austin Abell <austinabell8@gmail.com>
Date:   Wed Sep 16 12:30:01 2020 -0400

    Peer stats tracking and selection (#701)

    * Added peer manager logic and started to move blocksync handling into network context

    * Migrate to using handled blocksync requests and remove a duplicate code

    * Fix test use

    * Fix duration subtract logic

    * Update peer manager updates from hello requests

    * Docs and cleanup

    * more cleanup

commit dc0ff4c
Author: Eric Tu <6364934+ec2@users.noreply.github.com>
Date:   Tue Sep 15 14:43:18 2020 -0400

    Semantic Validation for Messages (#703)

    * semantic validation

    * suggestions

    * suggestions

commit 3411459
Author: Austin Abell <austinabell8@gmail.com>
Date:   Mon Sep 14 15:14:48 2020 -0400

    ChainSync refactor (#693)

    * Switch chain store to threadsafe

    * Update genesis to arc reference

    * wip refactoring chain sync to workers in async tasks

    * Update network event handling and remove NetworkHandler

    * Update tipset scheduling logic

    * Update peer retrieval to take a random sample of available peers

    * Cleanup and enabling all existing tests

    * fix worker task spawn

    * Add TODO for emit event ignoring and change to error log

    * oops

    * Update comment

    * Fix typo

commit 66ca99e
Author: Eric Tu <6364934+ec2@users.noreply.github.com>
Date:   Mon Sep 14 13:48:22 2020 -0400

    Fix StateManager use in different components (#694)

    * wrap sm in arc

    * lint

    * fix tests

    * suggestions

    * clippy

commit 96b64cb
Author: nannick <rajarupans@gmail.com>
Date:   Mon Sep 14 10:33:27 2020 -0400

    Drand ignore env variable (#697)

    * First commit

    * Using enviromental vairable instead of cli flag

    * FIxing dumb mistakes

    * Renaming drand flag

    * UPdate latest_beacon_entry

    * Adding drand check

    * Adding doc

commit 548a464
Author: Austin Abell <austinabell8@gmail.com>
Date:   Thu Sep 10 13:22:22 2020 -0400

    Print out conformance results and add log for skips (#695)

commit 0d7b16c
Author: Stepan <s.s.naumov@gmail.com>
Date:   Tue Sep 8 09:01:05 2020 -0400

    Add CLI command to add Genesis Miner to Genesis Template (#644)

    * AddMinerGenesis cmd command

    * fixing tests

    * addressing comments

    * removing jsons
timvermeulen added a commit that referenced this pull request Sep 26, 2020
* WIP

* More changes

* Even more changes

* Amt docs

* Clippy & copyright

* Squashed commit of the following: (identical to the Sep 17 commit message above, commits 8ef5ae5 through 0d7b16c)

* Delete old deadline tests

* Fix verify_block_signature

* Fix state_api

* Silent unused variable warnings

* clippy

* clippy

* clippy

* Fix miner worker addr function

* fix load_sectors_from_set

* Fix get_miner_info

* miner faults update

* Setup faults and recoveries logic

* Add todo for checking verification function

* Update to specs-actors v0.9.3

* Fix test

* Remove Cid.Undef check

* PR fixes

* Use the actor_error! macro in the miner actor

* Fix type mismatch

* PR changes

* Use return value

* Rename file

* More PR fixes

* Add assertions and error message

* Downcast errors originating from Deadlines::{load_deadline, update_deadline} and State::save_deadlines

* Rename Sectors::{new, load} to Sectors::{load, load_sector}

* Fix inverted logic

* Fix state manager

* Remove unused parameters

* clippy

* Replace function by const, remove duplication, remove comment

* Rename field, fix macro formatting, remove unnecessary vec

* Downcast errors

Co-authored-by: austinabell <austinabell8@gmail.com>
Linked issue: Refactor ChainSyncer to handle initial sync asynchronously (#690)