
"No node has been set up yet" during concurrent cluster join #2145

Closed
aphyr opened this Issue Feb 19, 2018 · 5 comments

@aphyr

aphyr commented Feb 19, 2018

On the current nightly (2018/02/19), concurrently launching a five-node cluster can occasionally deadlock with errors like...

2018/02/19 15:32:30 node.go:343: Error while sending message to node with addr: n3:7080, err: rpc error: code = Unknown desc = No node has been set up yet

... when n3, the node n1 is trying to join, is in some sort of timeout loop on its join RPC:

2018/02/19 15:32:25 draft.go:658: Error while calling fn: Error while joining cluster: rpc error: code = DeadlineExceeded desc = context deadline exceeded
. Retrying...

See full logs, attached:
20180219T173203.000-0600.zip

@pawanrawal

pawanrawal (Contributor) commented Mar 19, 2018
This one is hard to reproduce (I ran the upsert test 100 times but couldn't reproduce it), though I can see how it would happen.

It happens because JoinCluster has a timeout of time.Second. Now imagine that a node (n3) was added to the cluster, but the timeout fired and n3 got back an error. n1 now thinks n3 has already been added to the cluster and tries to send it a message, which fails because n3 is still stuck in its JoinCluster retry loop.

n1/alpha.log

2018/02/19 15:32:26 node.go:127: Setting conf state to nodes:1 nodes:3 
2018/02/19 15:32:26 node.go:343: Error while sending message to node with addr: n3:7080, err: rpc error: code = Unknown desc = No node has been set up yet

n3/alpha.log

2018/02/19 15:32:24 draft.go:640: Calling JoinCluster
2018/02/19 15:32:25 draft.go:658: Error while calling fn: Error while joining cluster: rpc error: code = DeadlineExceeded desc = context deadline exceeded
. Retrying...

All subsequent EntryConfChange proposals are ignored, since the last proposal (adding n3) hasn't been applied across the cluster, which now contains n1 and n3. This is visible in the following logs on n1.

2018/02/19 15:32:28 raft.go:815: INFO: propose conf Type:EntryConfChange Data:"\010\243\360\311\357\376\214\366\210o\020\000\030\003\"\024\t\003\000\000\000\000\000\000\000\020\001\032\007n3:7080"  ignored since pending unapplied configuration
2018/02/19 15:32:28 raft.go:815: INFO: propose conf Type:EntryConfChange Data:"\010\361\234\311\361\224\333\372\214\324\001\020\000\030\005\"\024\t\005\000\000\000\000\000\000\000\020\001\032\007n5:7080"  ignored since pending unapplied configuration

Removing the timeout fixes this, though it doesn't cover the case where n3 crashes after being added to the cluster but before receiving the response. @janardhan1993 any idea why we have a timeout for JoinCluster?


@janardhan1993

janardhan1993 (Contributor) commented Mar 19, 2018

Sometimes EntryConfChange proposals were dropped silently by raft, and without a timeout JoinCluster was stuck waiting for a response.
I suspect that when you start three nodes and two followers call JoinCluster concurrently, one of the EntryConfChange proposals gets dropped, because etcd/raft by design allows only one pending EntryConfChange at a time. So one of the nodes was stuck waiting for JoinCluster to finish.
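A toy model of that one-pending-ConfChange rule (this is not the etcd/raft API; real raft drops the proposal silently, which is made visible here with an error so the effect can be observed):

```go
package main

import (
	"errors"
	"fmt"
)

var errDropped = errors.New("conf change dropped: one already pending")

// raftGroup models a group that tolerates only one in-flight ConfChange.
type raftGroup struct {
	pendingConfChange bool
}

// ProposeConfChange accepts a proposal only if none is pending;
// a second concurrent join loses its proposal.
func (g *raftGroup) ProposeConfChange(node string) error {
	if g.pendingConfChange {
		return errDropped
	}
	g.pendingConfChange = true
	return nil
}

// applyConfChange runs when the cluster commits the pending change,
// clearing the way for the next proposal.
func (g *raftGroup) applyConfChange() {
	g.pendingConfChange = false
}

func main() {
	g := &raftGroup{}
	fmt.Println("n2 join:", g.ProposeConfChange("n2")) // accepted
	fmt.Println("n3 join:", g.ProposeConfChange("n3")) // dropped
	g.applyConfChange()
	fmt.Println("n3 retry:", g.ProposeConfChange("n3")) // accepted
}
```

With no timeout on JoinCluster, the node whose proposal was dropped would block forever; with the timeout, it retries but can hit the race described in the previous comment.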


@manishrjain

manishrjain (Member) commented Mar 19, 2018

We could still have the timeout, but potentially put the JoinCluster in a loop?


@pawanrawal

pawanrawal (Contributor) commented Mar 19, 2018

JoinCluster is already in a loop. The problem is that after the first context deadline exceeded, all subsequent requests fail, because there is already a ConfChange entry (adding n3) that hasn't been applied to the group (which now consists of n1 and n3), and Raft ignores any new config-change proposals while one is pending.

I think the solution here is simple: have the leader check whether the node being added is already part of the confState. If it is, JoinCluster should return without an error.
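A sketch of that check (illustrative names, not Dgraph's actual types): the leader consults its conf state before proposing, so a retried JoinCluster from an already-added node succeeds immediately instead of proposing a redundant ConfChange.

```go
package main

import "fmt"

// confState mirrors the shape of a Raft conf state: the IDs of
// the group's current members.
type confState struct {
	Nodes []uint64
}

// handleJoin makes JoinCluster idempotent: a node that is already a
// member gets an immediate success; only genuinely new nodes trigger
// a ConfChange proposal (propose stands in for that step).
func handleJoin(cs *confState, id uint64, propose func(uint64) error) error {
	for _, n := range cs.Nodes {
		if n == id {
			return nil // already a member: retried join is a no-op
		}
	}
	return propose(id)
}

func main() {
	cs := &confState{Nodes: []uint64{1, 3}} // n3 was added before its timeout fired
	proposed := false
	propose := func(id uint64) error {
		proposed = true
		cs.Nodes = append(cs.Nodes, id)
		return nil
	}

	// n3 retries after its first call timed out: no new proposal needed.
	err := handleJoin(cs, 3, propose)
	fmt.Println("n3 retry:", err, "proposed:", proposed)

	// n5 is genuinely new: a proposal goes out.
	err = handleJoin(cs, 5, propose)
	fmt.Println("n5 join:", err, "proposed:", proposed)
}
```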


@manishrjain manishrjain added the bug label Mar 21, 2018

@pawanrawal

pawanrawal (Contributor) commented Mar 22, 2018

This should be fixed with 807976c. I took the approach of removing the timeout as JoinCluster is only called once while booting up a new node.


@pawanrawal pawanrawal closed this Mar 22, 2018

