
Tweak config params #2206

Merged
merged 14 commits into from
Aug 13, 2020

Conversation


@iulianpascalau iulianpascalau commented Aug 6, 2020

  • tweaked config.toml values for a better sync process
  • removed a bootStorer bug that prevented the node from starting correctly from storage when it had incomplete information
  • added histogram prints in p2p showing how the connected peers are split across shards
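The histogram from the last bullet can be sketched as follows. This is a minimal illustration only: the peer IDs, shard IDs, and function names below are assumptions for the sketch, not the actual elrond-go p2p API.

```go
package main

import (
	"fmt"
	"sort"
)

// shardHistogram counts connected peers per shard ID.
// peerShards maps a peer ID to the shard it belongs to (both illustrative).
func shardHistogram(peerShards map[string]uint32) map[uint32]int {
	hist := make(map[uint32]int)
	for _, shard := range peerShards {
		hist[shard]++
	}
	return hist
}

// printHistogram renders the histogram in ascending shard order so the
// output is stable across runs.
func printHistogram(hist map[uint32]int) {
	shards := make([]uint32, 0, len(hist))
	for s := range hist {
		shards = append(shards, s)
	}
	sort.Slice(shards, func(i, j int) bool { return shards[i] < shards[j] })
	for _, s := range shards {
		fmt.Printf("shard %d: %d peers\n", s, hist[s])
	}
}

func main() {
	// hypothetical peer-to-shard assignment
	peers := map[string]uint32{"peerA": 0, "peerB": 0, "peerC": 1, "peerD": 4294967295}
	printHistogram(shardHistogram(peers))
}
```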

Manual testing procedures:

  • since the new config values accentuate the bootStorer bug, the normal manual testing of this implementation is all that is needed. The problem appeared on random nodes in low epochs (after at most 4 epochs we had stuck nodes), so testing on the order of tens of epochs should suffice.
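As a rough illustration of the kind of guard the bootStorer fix implies, the sketch below rejects an incomplete stored record so the node would fall back to syncing from the network instead of starting from broken storage. The struct fields and names are assumptions for illustration, not the real implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// BootstrapData loosely mirrors the kind of record a boot storer persists
// at every commit; the field set here is illustrative only.
type BootstrapData struct {
	LastHeaderHash []byte
	LastRound      int64
}

var errIncompleteBootstrapData = errors.New("incomplete bootstrap data")

// checkBootstrapData rejects a record that is missing required fields,
// so a node does not try to start from incomplete storage information.
func checkBootstrapData(bd *BootstrapData) error {
	if bd == nil || len(bd.LastHeaderHash) == 0 || bd.LastRound < 0 {
		return errIncompleteBootstrapData
	}
	return nil
}

func main() {
	// record with a missing header hash -> rejected
	err := checkBootstrapData(&BootstrapData{LastRound: 10})
	fmt.Println(err)
}
```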

- updated MaxNumberOfNodesForStake to the mainnet value in order to better test the new branches
@@ -8,8 +8,8 @@
NumRoundsWithoutBleed = 100
MaximumPercentageToBleed = 0.5
BleedPercentagePerRound = 0.00001
MaxNumberOfNodesForStake = 28
NodesToSelectInAuction = 28
Contributor
Moved it back to 28

Contributor Author

@LucianMincu needs to test this in "as close as it can get to mainnet" scenario

AdoAdoAdo previously approved these changes Aug 6, 2020
sasurobert
sasurobert previously approved these changes Aug 6, 2020
@iulianpascalau iulianpascalau self-assigned this Aug 6, 2020
SebastianMarian previously approved these changes Aug 6, 2020
Contributor

@SebastianMarian SebastianMarian left a comment

We should test this PR in two different edge cases for which these values had been set to 1 (the most important thing is to keep at least BatchDelaySeconds = 2). The two edge cases are:

First: all the nodes (from one shard or from all shards) are stopped at almost the same time, before they had the chance to put the batch in storage, so the cacher is not flushed to the persister. The nodes could then start behind final and, if they had a majority (which in this case they certainly would), they would propose another block that is already finalized and broadcast it to meta/shards.

Second: the MetaBlockStorage used by the shards. A shard puts a metablock in storage only when it has finalized all the miniblocks from that metablock. On rollback or restart from storage, it loads the last stored info and, based on which metablocks are found in storage, correctly marks which metablocks and miniblocks are finalized and which are not yet. If this info exists only in the batch, and either reading from the batch does not work (as far as I remember, this has been fixed) or the node was stopped unexpectedly before the cacher was flushed into the persister, the shard node will fail to start and sync, because it will not correctly mark the miniblocks already processed from finalized metablocks.

@iulianpascalau
Contributor Author

> We should test this PR in two different edge cases for which these values had been set to 1 (the most important thing is to keep at least BatchDelaySeconds = 2). The two edge cases are:
>
> First: all the nodes (from one shard or from all shards) are stopped at almost the same time, before they had the chance to put the batch in storage, so the cacher is not flushed to the persister. The nodes could then start behind final and, if they had a majority (which in this case they certainly would), they would propose another block that is already finalized and broadcast it to meta/shards.
>
> Second: the MetaBlockStorage used by the shards. A shard puts a metablock in storage only when it has finalized all the miniblocks from that metablock. On rollback or restart from storage, it loads the last stored info and, based on which metablocks are found in storage, correctly marks which metablocks and miniblocks are finalized and which are not yet. If this info exists only in the batch, and either reading from the batch does not work (as far as I remember, this has been fixed) or the node was stopped unexpectedly before the cacher was flushed into the persister, the shard node will fail to start and sync, because it will not correctly mark the miniblocks already processed from finalized metablocks.

Totally agree. BatchDelaySeconds will be kept at 2 seconds for the reasons mentioned above.

@@ -794,6 +794,15 @@ func (bp *baseProcessor) prepareDataForBootStorer(args bootStorerDataArgs) {
EpochStartTriggerConfigKey: args.epochStartTriggerConfigKey,
}

log.Debug("baseProcessor.prepareDataForBootStorer",
Contributor

this is called on every commit block - isn't this too much data for log.Debug?

Contributor Author

yeah, was used in debugging

Contributor Author

removed

for _, h := range hdrs {
str += bp.displayBoostrapHeaderInfo(h)
}

Contributor
delete empty line

Contributor Author

removed function -> no longer necessary

for _, h := range hdrs {
str += st.displayBoostrapHeaderInfo(h)
}

Contributor
delete empty line

Contributor Author

removed function -> no longer necessary

AdoAdoAdo previously approved these changes Aug 13, 2020
Contributor

@LucianMincu LucianMincu left a comment

System tests passed.

@LucianMincu LucianMincu merged commit d331516 into development Aug 13, 2020
@LucianMincu LucianMincu deleted the tweak-config-params branch August 13, 2020 14:20

6 participants