Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[master] Initiate rejoin of nodes for out of sync scenarios #2571

Merged
merged 1 commit into from
Jun 11, 2021

Conversation

chetan-zilliqa
Copy link
Contributor

@chetan-zilliqa chetan-zilliqa commented May 24, 2021

Description

This PR targets seed nodes and seedpub nodes rejoin during the following scenarios.

Scenario Occurance Fix
State root hash failed During comparing state root hash after getting Tx block range RejoinNetwork
DeserializeDelta failed During deserializing state deltas after receiving Tx block range RejoinNetwork
Storage root mismatch.ProtobufToAccountDelta failed for account at address During RetrieveTxBlocks() No Change.Already handled when we add GenesisInfo
Node still in new DS block mining and sync's at the same time. [INFO][30415][21-05-24T06:26:18.336][e/Messenger.cpp:6411][GetLookupSetTxBlockF] END[INFO][30415][21-05-24T06:26:18.336][okup/Lookup.cpp:2950][ProcessSetTxBlockFro] [Epoch 0] ProcessSetTxBlockFromSeed sent by <34.217.10.88:3268> for blocks 1178600 to 1178601[WARN][30415][21-05-24T06:26:18.336][okup/Lookup.cpp:2993][ProcessSetTxBlockFro] I am lagging behind in older ds epoch. Will rejoin again! RejoinNetwork

Backward Compatibility

  • This is not a breaking change
  • This is a breaking change

Testing

Below is the testsheet link.
https://docs.google.com/spreadsheets/d/1ZtXREAGIpTyUJlzXYtatpY9ZA8k5yE7FiwUaJ7Rgaow/edit#gid=1075062091

Review Suggestion

Status

Implementation

  • ready for review

Integration Test (Core Team)

  • local machine test
  • small-scale cloud test

@chetan-zilliqa chetan-zilliqa marked this pull request as draft May 24, 2021 12:00
@chetan-zilliqa chetan-zilliqa self-assigned this May 24, 2021
@github-actions github-actions bot added this to PRs in development in Core May 24, 2021
@github-actions github-actions bot changed the title Initiate rejoin of nodes for out of sync scenarios [master] Initiate rejoin of nodes for out of sync scenarios May 24, 2021
@chetan-zilliqa chetan-zilliqa requested review from KaustubhShamshery and bb111189 and removed request for bb111189 and KaustubhShamshery June 3, 2021 12:35
@chetan-zilliqa chetan-zilliqa marked this pull request as ready for review June 4, 2021 03:13
@@ -95,6 +95,7 @@ Lookup::~Lookup() {}
void Lookup::InitSync() {
LOG_MARKER();
auto func = [this]() -> void {
LOG_GENERAL(INFO, "###### NodeRejoin: InitSync started thread ######");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be remove NodeRejoin keyword since InitSync could be called or
During Rejoin but at startup as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"Seems we have received all or few of txblocks "
"previously. "
"so ignoring these txblocks!");
// cv_setRejoinRecovery.notify_all(); //TODO : Check with @Sandip if we
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to notify right? otherwise cv_setRejoinRecovery will wait for 180 sec in InitSync

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/libLookup/Lookup.cpp Outdated Show resolved Hide resolved
src/libLookup/Lookup.cpp Show resolved Hide resolved
Core automation moved this from PRs in development to PRs needing review Jun 8, 2021
LOG_MARKER();

auto func = [this]() -> void {
while (GetSyncType() != SyncType::NO_SYNC) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code seems identical to above, maybe make a function?:>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -2981,6 +2992,8 @@ bool Lookup::ProcessSetTxBlockFromSeed(
WARNING,
"The lowBlockNum is higher than highblocknum, maybe DS epoch ongoing");
cv_setTxBlockFromSeed.notify_all();
// cv_setRejoinRecovery.notify_all(); // No need to notify as we need to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what constant denotes 180 seconds? maybe mention that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the line as it was initially kept for my understanding

Copy link
Contributor

@KaustubhShamshery KaustubhShamshery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz go through comments

Core automation moved this from PRs needing review to PRs approved - ready to merge! Jun 11, 2021
@sandipbhoir sandipbhoir merged commit 57f6905 into master Jun 11, 2021
Core automation moved this from PRs approved - ready to merge! to PRs done (merged) Jun 11, 2021
@sandipbhoir sandipbhoir deleted the zil-2960-seed-recovery branch June 11, 2021 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Core
  
PRs done (merged)
Development

Successfully merging this pull request may close these issues.

None yet

3 participants