storage: node gossip loop panic when own node descriptor deleted #32942

Closed
tbg opened this issue Dec 7, 2018 · 3 comments
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report. S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting.

Comments

@tbg
Member

tbg commented Dec 7, 2018

https://sentry.io/cockroach-labs/cockroachdb/issues/798675882/?referrer=webhooks_plugin

stopper.go:182: gossip.go:969

github.com/cockroachdb/cockroach/pkg/server.(*Node).startGossip.func1

stacktrace (frames in the order reported):
    github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1
        /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:199 (in_app)
    github.com/cockroachdb/cockroach/pkg/server.(*Node).startGossip.func1
        /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:740 (in_app)
    runtime.gopanic
        /usr/local/go/src/runtime/panic.go:502
    runtime.call32
        /usr/local/go/src/runtime/asm_amd64.s:573
type: *log.safeError
value: stopper.go:182: gossip.go:969

@tbg tbg added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report. labels Dec 7, 2018
@tbg
Member Author

tbg commented Dec 11, 2018

    // Verify we've already gossiped our node descriptor.
    if _, err := n.storeCfg.Gossip.GetNodeDescriptor(n.Descriptor.NodeID); err != nil {
        panic(err)
    }

The error is produced by this check:

    // Don't return node descriptors that are empty, because that's meant to
    // indicate that the node has been removed from the cluster.
    if nodeDescriptor.NodeID == 0 || nodeDescriptor.Address.IsEmpty() {
        return nil, errors.Errorf("n%d has been removed from the cluster", nodeID)
    }
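
To make the interaction between the two snippets concrete, here is a minimal, self-contained sketch of the failure mode. It is not CockroachDB code: the `nodeDescriptor` type, the map-backed `getNodeDescriptor`, and the `panicValue` helper are all hypothetical stand-ins, and the address is modeled as a plain string. It only illustrates that once a node's own descriptor has been emptied out, the lookup returns an error and the verification step turns that error into a panic.

```go
package main

import "fmt"

// nodeDescriptor is a hypothetical, simplified stand-in for the real descriptor.
type nodeDescriptor struct {
	NodeID  int32
	Address string
}

// getNodeDescriptor mimics the quoted check: an empty descriptor is treated
// as "this node has been removed from the cluster" and reported as an error.
func getNodeDescriptor(descs map[int32]nodeDescriptor, nodeID int32) (*nodeDescriptor, error) {
	d, ok := descs[nodeID]
	if !ok {
		return nil, fmt.Errorf("unable to look up descriptor for n%d", nodeID)
	}
	if d.NodeID == 0 || d.Address == "" {
		return nil, fmt.Errorf("n%d has been removed from the cluster", nodeID)
	}
	return &d, nil
}

// verifyGossiped mirrors the first snippet: any lookup error becomes a panic.
func verifyGossiped(descs map[int32]nodeDescriptor, self int32) {
	if _, err := getNodeDescriptor(descs, self); err != nil {
		panic(err)
	}
}

// panicValue runs f and returns the value it panicked with, or nil.
func panicValue(f func()) (v interface{}) {
	defer func() { v = recover() }()
	f()
	return nil
}

func main() {
	descs := map[int32]nodeDescriptor{
		1: {NodeID: 1, Address: "10.0.0.1:26257"},
		2: {}, // descriptor emptied out, as if the node had been removed
	}
	// Node 1 still has a populated descriptor, so verification passes.
	fmt.Println("n1:", panicValue(func() { verifyGossiped(descs, 1) }))
	// Node 2 looks up its own emptied descriptor, so verification panics.
	fmt.Println("n2:", panicValue(func() { verifyGossiped(descs, 2) }))
}
```

In this model the second call is the situation from the report: the node running the check is the one whose descriptor was emptied.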

@tbg tbg changed the title from "sentry: stopper.go:182: gossip.go:969" to "storage: node gossip loop panic when own node descriptor deleted" Dec 11, 2018
@tbg tbg added the S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting. label Dec 11, 2018
@tbg tbg added this to Incoming in KV via automation Dec 11, 2018
@tbg tbg added this to the 2.2 milestone Dec 11, 2018
@tbg
Member Author

tbg commented Feb 27, 2019

@petermattis is this fixed by #34155?

@petermattis
Collaborator

#34155 will certainly avoid this problem as our own node descriptor can no longer be removed by another node. This symptom of the underlying bug is relatively innocuous. The process will crash and hopefully something will restart it, but it isn't as bad as the other symptom where the process would remain running, but receive no traffic.

I think this can reasonably be closed as only a single event occurred months ago.
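
The property described in the comment above, that a node's own descriptor cannot be removed by another node, can be illustrated with a toy model. This sketch is not taken from #34155 and makes no claim about how that change is implemented; the `store` type, its `apply` method, and the notion of an update "origin" are assumptions introduced purely for illustration.

```go
package main

import "fmt"

// descriptor is a hypothetical, simplified node descriptor.
type descriptor struct {
	NodeID  int32
	Address string
}

// empty reports whether the descriptor looks like a removal marker.
func (d descriptor) empty() bool { return d.NodeID == 0 || d.Address == "" }

// store is a toy descriptor table owned by the node with ID self.
type store struct {
	self  int32
	descs map[int32]descriptor
}

// apply records an update for nodeID that was produced by origin.
// It refuses updates from other nodes that would empty out this node's
// own descriptor, and reports whether the update was applied.
func (s *store) apply(origin, nodeID int32, d descriptor) bool {
	if nodeID == s.self && origin != s.self && d.empty() {
		return false // another node tried to remove our own descriptor
	}
	s.descs[nodeID] = d
	return true
}

func main() {
	s := &store{self: 1, descs: map[int32]descriptor{
		1: {NodeID: 1, Address: "10.0.0.1:26257"},
		2: {NodeID: 2, Address: "10.0.0.2:26257"},
	}}
	// n2 attempts to remove n1's descriptor from n1's own table: rejected.
	fmt.Println("remove own, from n2:", s.apply(2, 1, descriptor{}))
	// Removing some other node's descriptor is still allowed in this model.
	fmt.Println("remove n2, from n1:", s.apply(1, 2, descriptor{}))
}
```

With a guard of this kind, the lookup of the node's own descriptor never observes an empty entry written by a peer, so the panic path discussed in this issue is not reached in the model.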

KV automation moved this from Incoming to Closed Feb 27, 2019