
make the analyse startup check more understandable when build the cluster #2432

Merged
mychaow merged 2 commits into apache:master from neuyilan:apache_master_0106_fix_check_start_bug
Jan 7, 2021

Conversation

Member

@neuyilan neuyilan commented Jan 6, 2021

No description provided.

@neuyilan neuyilan added the "bug (Something isn't working)" and "Module - Cluster (PRs for the cluster module)" labels Jan 6, 2021
Contributor

OneSizeFitsQuorum commented Jan 6, 2021

Current logic:
if one node finds that every other node's config is consistent, it starts the server;
else if this node has received some, but not all, responses indicating consistent configs, and has never seen an inconsistent one, it retries the check in the next round;
otherwise at least one node's config must be inconsistent with the cluster, so it throws an exception and stops the server.

In my opinion, we could start the server once a quorum of nodes returns a consistent check response, so that the cluster can still start even if one node never comes back after hanging. Of course, we must ensure that an inconsistent node cannot join the cluster; we may need some manual tests to see whether an inconsistent node can join the cluster if we change totalNum from allnode.size() to allnode.size()/2 + 1.

In this commit, it seems that the startupCheck will run another round even after it finds an inconsistent node, but the cluster will never be established.
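The three branches described above can be sketched as follows. This is only a minimal illustration of the decision logic under discussion, not IoTDB's actual code; the class, method, and parameter names (StartupCheck, decide, totalNodes) are hypothetical. The quorum variant suggested above would compare against totalNodes / 2 + 1 instead of totalNodes.

```java
import java.util.List;

public class StartupCheck {
    public enum Decision { START, RETRY, ABORT }

    /**
     * responses holds the consistency answers received so far:
     * true  = that node reported a consistent config,
     * false = that node reported an inconsistent config.
     * Nodes that have not answered yet are simply absent from the list.
     */
    public static Decision decide(List<Boolean> responses, int totalNodes) {
        long consistent = responses.stream().filter(Boolean::booleanValue).count();
        long inconsistent = responses.size() - consistent;

        if (inconsistent > 0) {
            return Decision.ABORT;  // some node's config conflicts: throw and stop
        }
        if (consistent == totalNodes) {
            return Decision.START;  // all nodes agree: start the server
        }
        return Decision.RETRY;      // partial agreement, no conflicts: check next round
    }
}
```

With the order of the checks written this way, a conflicting response aborts immediately and can never be carried into another round, which is the behavior the commit aims to make explicit.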

Member Author

neuyilan commented Jan 6, 2021

> Current logic:
> if one node finds that every other node's config is consistent, it starts the server;
> else if this node has received some, but not all, responses indicating consistent configs, and has never seen an inconsistent one, it retries the check in the next round;
> otherwise at least one node's config must be inconsistent with the cluster, so it throws an exception and stops the server.
>
> In my opinion, we could start the server once a quorum of nodes returns a consistent check response, so that the cluster can still start even if one node never comes back after hanging. Of course, we must ensure that an inconsistent node cannot join the cluster; we may need some manual tests to see whether an inconsistent node can join the cluster if we change totalNum from allnode.size() to allnode.size()/2 + 1.
>
> In this commit, it seems that the startupCheck will run another round even after it finds an inconsistent node, but the cluster will never be established.

Thanks for your reply. In the beginning, I thought we could build the cluster as long as a quorum of nodes started successfully. In the end, I think that for the initial building of the cluster it is fine to require that all seed nodes start successfully before the cluster can be built; if some seed nodes do hang, we can remove them from the seed nodes. So I think the initial design logic is OK.

As for the commit, you're right, I misunderstood the code. I'd prefer to change it to be more comprehensible.

@neuyilan neuyilan changed the title from "fix the analyse startup check bug when building cluster" to "make the analyse startup check more understandable when build the cluster" Jan 6, 2021
@neuyilan neuyilan removed the "bug (Something isn't working)" label Jan 6, 2021

sonarqubecloud bot commented Jan 6, 2021

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)
Coverage: no coverage information
Duplication: 0.0%

Member

@mychaow mychaow left a comment


LGTM

@mychaow mychaow merged commit cf9c31a into apache:master Jan 7, 2021