Long Running Network Joining node in catchup/sync leap for long #3563

Closed
cspramit opened this issue Jan 11, 2023 · 1 comment · Fixed by #3594
Labels: bug, release blocker

cspramit commented Jan 11, 2023

Observation: when the network is under continuous moderate load, joining nodes do not reach KeepUp. When the load is cut off so that no deploys are being executed AND the node is restarted, the joiner node reaches KeepUp sooner.
Currently, after 22 hrs, the joiners have not caught up.

Name: rti-dev-small
Network: 5 + 2
Branch: dev
Chain: public
Commit: 01c2af5
Config Override: None
Mixed Deploy and Long Running Network Stability Testing
Test:
25 x multi delegations periodically
1 x multi NFT contract and minting periodically
5 x multi bloated wasm periodically
12k x transfers and wasm transfers periodically
4 x add joining node at certain intervals
random node restarts
Observations:
network running stable
network running for 48+ hrs
4 x joiner nodes @ network with 100k deploys successful
4 x joiner nodes @ network with 500k deploys - in progress - been 22 hrs so far
total network deploys currently at 1+ million

Dumps: http://genesis.casperlabs.io/rti-dev-small/casper-node-dumps/rti-dev-small/10012023_1219/dump_download_list.txt
Nodes of Interest: 172.44.57.234, 172.44.68.115 (joiners)
Grafana: https://grafana.casperlabs.io/goto/1JLwb0hVz?orgId=1

cspramit changed the title from "Long Running Network Joining node in catchup/sync leap for 22hrs" to "Long Running Network Joining node in catchup/sync leap for long" on Jan 11, 2023
@Fraser999

Looks like this is related to the global state synchronizer not handling peers in the same way as the block synchronizer. Once it's tasked with fetching the state under a given root hash, it never updates the set of peers. If the initial set happens to be other joiners who don't yet have the global state, the process can't proceed.
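
To make the failure mode concrete, here is a minimal sketch of a sync task whose peer set can be extended while it is in flight. It is not the casper-node implementation: `StateSyncTask`, `PeerId`, `register_peers`, and `pick_peer_with_state` are hypothetical names chosen for illustration, and the `has_state` predicate stands in for whatever mechanism tells the node that a peer actually holds the requested global state.

```rust
use std::collections::HashSet;

/// Hypothetical peer identifier; not the real casper-node `NodeId`.
type PeerId = u32;

/// Hypothetical task that fetches the global state under one root hash.
struct StateSyncTask {
    root_hash: [u8; 32],
    peers: HashSet<PeerId>,
}

impl StateSyncTask {
    /// Start a task with whatever peers are known at that moment.
    fn new(root_hash: [u8; 32], initial_peers: impl IntoIterator<Item = PeerId>) -> Self {
        StateSyncTask {
            root_hash,
            peers: initial_peers.into_iter().collect(),
        }
    }

    /// Root hash this task is syncing.
    fn root_hash(&self) -> &[u8; 32] {
        &self.root_hash
    }

    /// Merge newly discovered peers into the in-flight task.
    /// Returns how many of them were not already known.
    fn register_peers(&mut self, peers: impl IntoIterator<Item = PeerId>) -> usize {
        peers
            .into_iter()
            .filter(|peer| self.peers.insert(*peer))
            .count()
    }

    /// Pick a known peer that holds the state, if any.
    /// The lowest id is chosen only to keep the sketch deterministic.
    fn pick_peer_with_state(&self, has_state: impl Fn(PeerId) -> bool) -> Option<PeerId> {
        self.peers
            .iter()
            .copied()
            .filter(|peer| has_state(*peer))
            .min()
    }
}
```

If the task is created with only other joiners as peers, `pick_peer_with_state` keeps returning `None` until `register_peers` is called with at least one peer that has the state. A task that never calls it stays stuck, which matches the behaviour described above.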
