Ring state will be inconsistent between memory and consul after a CAS error #3154

Open
bboreham opened this issue Sep 10, 2020 · 2 comments

@bboreham
Contributor

The change in memory state is made before updating Consul, and no attempt is made to revert the former if the latter fails:

i.setState(state)
return i.updateConsul(ctx)

I noticed this because I got this log message:

level=warn ts=2020-09-09T19:59:32.324235593Z caller=grpc_logging.go:55 duration=15.010918473s method=/cortex.Ingester/TransferChunks err="Transfer: ChangeState: failed to CAS collectors/ring" msg="gRPC\n"

That's coming from here:

if err := i.lifecycler.ChangeState(ctx, ring.ACTIVE); err != nil {

The defer in that function should then log "TransferChunks failed" and go back to PENDING state, but I don't see that log, which is explained by this line checking the in-memory state:

if i.lifecycler.GetState() == ring.ACTIVE {

(Also odd: metrics show it did go to ACTIVE state)

@bboreham
Contributor Author

@pstibrany explained the last point: the next heartbeat will save the in-memory state to Consul.

So maybe all we need is a better check in Ingester.transfer()?

@pstibrany
Contributor

What do you think should happen? How do you suggest modifying the check in Ingester.transfer()?

Changing the state doesn't seem appropriate. As of now, the only possible transition from the ACTIVE state is to LEAVING, and I don't think that's the correct answer.

The other possibilities (going back to PENDING/JOINING) seem even worse, because the transfer has already finished.
