panic: default zone config missing unexpectedly [now has a reliable repro] #43951
Comments
ping @andreimatei
I am also able to hit this when stressing those random syntax tests.
Recent repro on the 22.2.17 branch: #114879. Perhaps there is a race between the default zone config being set and the reports running?
No repro on 9efa484 over 10mins.
Happened on 23.2 here: #117081. I found a reliable way to repro this:
Thanks for the reproduction @rafiss, it's working for me as well. In it, we see that the

diff --git a/pkg/kv/kvserver/reports/reporter.go b/pkg/kv/kvserver/reports/reporter.go
index 40346701b3a..5853cd4516e 100644
--- a/pkg/kv/kvserver/reports/reporter.go
+++ b/pkg/kv/kvserver/reports/reporter.go
@@ -196,6 +196,13 @@ func (stats *Reporter) update(
 	if stats.latestConfig == nil {
 		return nil
 	}
+	zone, err := getZoneByID(keys.RootNamespaceID, stats.latestConfig)
+	if err != nil {
+		log.Fatalf(ctx, "failed to get default zone config: %s", err)
+	}
+	if zone == nil {
+		log.Fatalf(ctx, "default zone config missing unexpectedly (%T): %+v", stats.cfgs, stats.latestConfig)
+	}
 	allStores := stats.storePool.GetStores()
 	var storesFromGossip StoreResolver = func(

Using that diff, we also see the Instrumenting
I'll then hand the remainder of this investigation back to SQL Foundations. Specifically, we still need to determine why the row in
Informs cockroachdb#43951.

In cockroachdb#43951, we saw that some random sequence of queries led to the default zone configuration in the system.zones table (key `/Table/5/1/0/2/1`) being deleted. This led to a fatal log in the replication report generation code. Deleting this key should not be allowed, but it also shouldn't crash cockroach.

This commit replaces the fatal log with an error, which is logged a few stack frames up with a gentle "some reports have not been generated" message.

I've confirmed that with this fix, the reproduction steps in cockroachdb#43951 no longer lead to a fatal log and a crashed process.

Release note: None
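To illustrate the shape of the change described in the commit message above, here is a minimal, self-contained Go sketch of the "return an error instead of crashing" pattern. It is not the actual CockroachDB code: `zoneConfig`, `errDefaultZoneMissing`, `updateReports`, and `runReportLoopOnce` are hypothetical names invented for this example, and the zone lookup is reduced to a map access. The only details taken from the thread are that the default zone lives under ID 0 and that the caller logs a "some reports have not been generated" message.

```go
package main

import (
	"errors"
	"fmt"
)

// zoneConfig is a hypothetical stand-in for a zone configuration.
type zoneConfig struct{ numReplicas int }

// errDefaultZoneMissing is a hypothetical sentinel error; the real code
// may construct its error differently.
var errDefaultZoneMissing = errors.New("default zone config missing unexpectedly")

// defaultZoneID mirrors the ID 0 seen in the key /Table/5/1/0/2/1.
const defaultZoneID = 0

// updateReports sketches the fixed behavior: a missing default zone is
// reported to the caller as an error rather than terminating the process.
func updateReports(zones map[int]*zoneConfig) error {
	if zones[defaultZoneID] == nil {
		return errDefaultZoneMissing
	}
	// Report generation would happen here.
	return nil
}

// runReportLoopOnce plays the role of the caller "a few stack frames up":
// it turns the error into a gentle message and keeps running.
func runReportLoopOnce(zones map[int]*zoneConfig) string {
	if err := updateReports(zones); err != nil {
		return fmt.Sprintf("some reports have not been generated: %v", err)
	}
	return "reports generated"
}

func main() {
	// Default zone missing: a message is produced, the process survives.
	fmt.Println(runReportLoopOnce(map[int]*zoneConfig{}))
	// Default zone present: reports are generated normally.
	fmt.Println(runReportLoopOnce(map[int]*zoneConfig{defaultZoneID: {numReplicas: 3}}))
}
```

The design point is that a corrupted or missing row in a system table only degrades one background job (report generation) instead of taking down the whole node; why the row went missing is still a separate question.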
117662: kv/reports: don't panic on missing default zone config r=nvanbenschoten a=nvanbenschoten

Informs #43951.

In #43951, we saw that some random sequence of queries led to the default zone configuration in the system.zones table (key `/Table/5/1/0/2/1`) being deleted. This led to a fatal log in the replication report generation code. Deleting this key should not be allowed, but it also shouldn't crash cockroach.

This commit replaces the fatal log with an error, which is logged a few stack frames up with a gentle "some reports have not been generated" message.

I've confirmed that with this fix, the reproduction steps in #43951 no longer lead to a fatal log and a crashed process.

Release note: None

117663: ci: migrate CI docker images to GCP r=jlinder a=rail

Previously, we used Docker Hub to host our CI docker images. To improve service reliability, this PR moves the used docker images to GAR.

Part of: DEVINF-915
Epic: RE-539
Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Rail Aliiev <rail@iqchoice.com>
We have marked this test failure issue as stale because it has been
The RSG tests appear to have done a bad thing: https://teamcity.cockroachdb.com/viewLog.html?buildId=1688020&buildTypeId=Cockroach_Nightlies_RandomSyntaxTests
Full stack trace and logs at https://teamcity.cockroachdb.com/downloadBuildLog.html?buildId=1688020&plain=true
Jira issue: CRDB-5266