panic: default zone config missing unexpectedly [now has a reliable repro] #43951

Open
maddyblue opened this issue Jan 14, 2020 · 8 comments
Labels
branch-release-23.2: Used to mark GA and release blockers and technical advisories for 23.2
C-test-failure: Broken test (automatically or manually discovered).
O-rsg: Random Syntax Generator
P-3: Issues/test failures with no fix SLA
T-sql-foundations: SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@maddyblue
Contributor

maddyblue commented Jan 14, 2020

The nightly RSG (random syntax generator) tests hit a fatal error in the replication reporter: https://teamcity.cockroachdb.com/viewLog.html?buildId=1688020&buildTypeId=Cockroach_Nightlies_RandomSyntaxTests

F200114 06:12:11.657426 168 storage/reports/reporter.go:494  [n1,replication-reporter] default zone config missing unexpectedly
goroutine 168 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0x6d57001, 0xed5af52bb, 0x0, 0xc00c1a39e0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xb8
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0x6d53d80, 0xc000000004, 0x664e8f0, 0x1b, 0x1ee, 0xc00cb8c690, 0x42)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:211 +0xa0c
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x4867720, 0xc000a8a030, 0x4, 0x2, 0x0, 0x0, 0xc00c1a3438, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x2c9
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x4867720, 0xc000a8a030, 0x1, 0x4, 0x0, 0x0, 0xc00c1a3438, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:44 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Fatal(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:164
github.com/cockroachdb/cockroach/pkg/storage/reports.visitDefaultZone(0x4867720, 0xc000a8a030, 0xc0059bbb80, 0xc00c1a36f0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/reports/reporter.go:494 +0x18f
github.com/cockroachdb/cockroach/pkg/storage/reports.visitAncestors(0x4867720, 0xc000a8a030, 0x10, 0xc0059bbb80, 0xc00c1a36f0, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/reports/reporter.go:457 +0x2ff
github.com/cockroachdb/cockroach/pkg/storage/reports.visitZones(0x4867720, 0xc000a8a030, 0xc009ca3730, 0xc0059bbb80, 0x1, 0xc00c1a36f0, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/reports/reporter.go:440 +0x100
github.com/cockroachdb/cockroach/pkg/storage/reports.(*zoneResolver).updateZone(0xc00c1a38c0, 0x4867720, 0xc000a8a030, 0xc009ca3730, 0xc0059bbb80, 0x70, 0xc00aa3af7a, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/reports/reporter.go:344 +0xfc
github.com/cockroachdb/cockroach/pkg/storage/reports.(*zoneResolver).resolveRange(0xc00c1a38c0, 0x4867720, 0xc000a8a030, 0xc009ca3730, 0xc0059bbb80, 0x0, 0x0, 0xc00aa3af80)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/reports/reporter.go:318 +0xb2
github.com/cockroachdb/cockroach/pkg/storage/reports.visitRanges(0x4867720, 0xc000a8a030, 0x48265e0, 0xc00f5228c0, 0xc0059bbb80, 0xc00c1a3b08, 0x3, 0x3, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/reports/reporter.go:647 +0x3a2
github.com/cockroachdb/cockroach/pkg/storage/reports.(*Reporter).update(0xc000070880, 0x4867720, 0xc000a8a030, 0xc000a8a0c0, 0xc000a8a060, 0xc000a8a120, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/reports/reporter.go:214 +0x7e4
github.com/cockroachdb/cockroach/pkg/storage/reports.(*Reporter).Start.func2(0x4867720, 0xc000a8a000)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/reports/reporter.go:136 +0x533
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc000318620, 0xc000604480, 0xc0003cccc0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:197 +0x13e
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:190 +0xa8

Full stack trace and logs at https://teamcity.cockroachdb.com/downloadBuildLog.html?buildId=1688020&plain=true

Jira issue: CRDB-5266

@maddyblue maddyblue added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-rsg Random Syntax Generator labels Jan 14, 2020
@maddyblue maddyblue added this to Incoming in KV via automation Jan 14, 2020
@lunevalex lunevalex moved this from Incoming to Broken Tests in KV Jul 29, 2020
@tbg
Member

tbg commented Aug 26, 2020

ping @andreimatei

@ajwerner
Contributor

ajwerner commented Apr 5, 2021

I am also able to hit this when stressing the random syntax tests.

@yuzefovich
Member

Recent repro on the 22.2.17 branch: #114879. Perhaps there is a race between the default zone config being set and the reports running?

@kvoli kvoli added the branch-release-22.2 Used to mark release blockers and technical advisories for 22.2 label Dec 5, 2023
@kvoli
Collaborator

kvoli commented Dec 5, 2023

No repro on 9efa484 after 10 minutes of stress:

dev test pkg/sql/tests -f TestRandomSyntaxGeneration -v --stress
...
15162 runs so far, 0 failures, over 10m0s

@rafiss rafiss added branch-release-23.2 Used to mark GA and release blockers and technical advisories for 23.2 P-2 Issues/test failures with a fix SLA of 3 months and removed branch-release-22.2 Used to mark release blockers and technical advisories for 22.2 labels Dec 26, 2023
@rafiss
Collaborator

rafiss commented Dec 26, 2023

Happened on 23.2 here: #117081

I found a reliable way to repro this:

git checkout b1b23cd8e5df4c94cad31352183b50f6d351

./dev test pkg/sql/tests -f=TestRandomSyntaxGeneration -- --test_env='COCKROACH_RANDOM_SEED=-7353888918832731527' --test_arg -rsg=5m --test_arg -rsg-routines=8 --test_arg -rsg-exec-timeout=1m --test_arg -rsg-exec-column-change-timeout=90s
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1  default zone config missing unexpectedly
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !goroutine 358 [running]:
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !runtime/debug.Stack()
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !	GOROOT/src/runtime/debug/stack.go:24 +0x5e
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(_, {{{0xc001e01590, 0x24}, {0x62b9580, 0x1}, {0x62b9580, 0x1}, {0x61ea1a7, 0x6}, {0x62b9580, ...}}, ...})
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !	github.com/cockroachdb/cockroach/pkg/util/log/clog.go:280 +0xe7
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !github.com/cockroachdb/cockroach/pkg/util/log.logfDepthInternal({0x78e0018, 0xc0018006e0}, 0x2, 0x4, 0x0, 0x0?, {0x62e6995, 0x28}, {0x0, 0x0, ...})
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !	github.com/cockroachdb/cockroach/pkg/util/log/channels.go:109 +0x585
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !github.com/cockroachdb/cockroach/pkg/util/log.logfDepth(...)
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !	github.com/cockroachdb/cockroach/pkg/util/log/channels.go:39
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !github.com/cockroachdb/cockroach/pkg/util/log.Fatal(...)
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !	github.com/cockroachdb/cockroach/bazel-out/k8-fastbuild/bin/pkg/util/log/log_channels_generated.go:896
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/reports.visitDefaultZone({0x78e0018, 0xc0018006e0}, 0xc03525c0c0?, 0xc02f9c52c8)
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !	github.com/cockroachdb/cockroach/pkg/kv/kvserver/reports/reporter.go:497 +0xe5
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/reports.visitAncestors({0x78e0018, 0xc0018006e0}, 0x16179398?, 0x0?, 0xc02f9c52c8)
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !	github.com/cockroachdb/cockroach/pkg/kv/kvserver/reports/reporter.go:459 +0x1b1
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/reports.visitZones({0x78e0018, 0xc0018006e0}, 0x0?, 0x0?, 0x0, 0xc02f9c52c8)
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !	github.com/cockroachdb/cockroach/pkg/kv/kvserver/reports/reporter.go:439 +0x213
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/reports.(*constraintConformanceVisitor).visitNewZone(0xc034c34380, {0x78e0018, 0xc0018006e0}, 0x0?)
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !	github.com/cockroachdb/cockroach/pkg/kv/kvserver/reports/constraint_stats_report.go:476 +0x152
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/reports.visitRanges({0x78e0018, 0xc0018006e0}, {0x78b1890, 0xc03476df80}, 0xc0018006e0?, {0xc0061db808, 0x3, 0x3})
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !	github.com/cockroachdb/cockroach/pkg/kv/kvserver/reports/reporter.go:606 +0x50e
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/reports.(*Reporter).update(0xc001216f50, {0x78e0018?, 0xc0018006e0}, 0x0?, 0x0?, 0x1?)
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !	github.com/cockroachdb/cockroach/pkg/kv/kvserver/reports/reporter.go:235 +0x8b3
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/reports.(*Reporter).Start.func2({0x78e0050?, 0xc0033ed0e0?})
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !	github.com/cockroachdb/cockroach/pkg/kv/kvserver/reports/reporter.go:157 +0x1d6
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484 +0x13a
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx in goroutine 8
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:475 +0x415
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !For more context, check log files in: /artifacts/tmp/_tmp/d437d2c847dfedbc4972f231c3331c8e/logTestRandomSyntaxGeneration4077009151
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !****************************************************************************
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !This node experienced a fatal error (printed above), and as a result the
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !process is terminating.
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !Fatal errors can occur due to faulty hardware (disks, memory, clocks) or a
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !problem in CockroachDB. With your help, the support team at Cockroach Labs
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !will try to determine the root cause, recommend next steps, and we can
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !improve CockroachDB based on your report.
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !Please submit a crash report by following the instructions here:
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !    https://github.com/cockroachdb/cockroach/issues/new/choose
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !If you would rather not post publicly, please contact us directly at:
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !    support@cockroachlabs.com
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !
F231226 06:17:42.287653 358 kv/kvserver/reports/reporter.go:497  [T1,Vsystem,n1,replication-reporter] 1 !The Cockroach Labs team appreciates your feedback.
I231226 06:17:42.390959 1 (gostd) testmain.go:191  [-] 1  Test //pkg/sql/tests:tests_test exited with error code 7

@rafiss rafiss changed the title storage: panic: default zone config missing unexpectedly storage: panic: default zone config missing unexpectedly [now has a reliable repro] Dec 26, 2023
@rafiss rafiss moved this from On Hold to Incoming in KV Dec 26, 2023
@nvanbenschoten
Member

Thanks for the reproduction @rafiss, it's working for me as well.

In it, kvserver/reports.Reporter.latestConfig ends up with no value for config.MakeZoneKey(keys.SystemSQLCodec, keys.RootNamespaceID), which should be the default zone config. I see the same thing when hoisting the check up to Reporter.update:

diff --git a/pkg/kv/kvserver/reports/reporter.go b/pkg/kv/kvserver/reports/reporter.go
index 40346701b3a..5853cd4516e 100644
--- a/pkg/kv/kvserver/reports/reporter.go
+++ b/pkg/kv/kvserver/reports/reporter.go
@@ -196,6 +196,13 @@ func (stats *Reporter) update(
        if stats.latestConfig == nil {
                return nil
        }
+       zone, err := getZoneByID(keys.RootNamespaceID, stats.latestConfig)
+       if err != nil {
+               log.Fatalf(ctx, "failed to get default zone config: %s", err)
+       }
+       if zone == nil {
+               log.Fatalf(ctx, "default zone config missing unexpectedly (%T): %+v", stats.cfgs, stats.latestConfig)
+       }

        allStores := stats.storePool.GetStores()
        var storesFromGossip StoreResolver = func(

Using that diff, we also see that stats.cfgs (a config.SystemConfigProvider) has a concrete type of *systemconfigwatcher.Cache. So systemconfigwatcher.Cache.GetSystemConfig() is returning a system config without a default zone configuration.
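For readers unfamiliar with the `%T` verb used in the diff: it prints the dynamic type held by an interface value, which is how the concrete provider type shows up in the fatal message. A minimal sketch, assuming hypothetical stand-in types (`systemConfigProvider`, `cache`) rather than the real CockroachDB ones:

```go
package main

import "fmt"

// systemConfigProvider is a hypothetical stand-in for the interface
// mentioned in the thread; it is not the real CockroachDB definition.
type systemConfigProvider interface {
	GetSystemConfig() map[uint32]string
}

// cache is a hypothetical concrete implementation of the interface.
type cache struct {
	cfg map[uint32]string
}

func (c *cache) GetSystemConfig() map[uint32]string { return c.cfg }

// describe returns the dynamic (concrete) type stored in the interface.
func describe(p systemConfigProvider) string {
	return fmt.Sprintf("%T", p)
}

func main() {
	var p systemConfigProvider = &cache{}
	// Prints the pointer type of the concrete implementation,
	// not the name of the interface.
	fmt.Println(describe(p))
}
```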

Instrumenting systemconfigwatcher.Cache.handleUpdate, we see that the default zone config row in the zones table ({/Table/5/1/0/0 {[] 1704949171.845526000,0}} and {/Table/5/1/0/2/1 {[] 1704949171.845526000,0}}, two column families) is getting deleted.
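To make the two keys above easier to read, here is a small sketch that splits a pretty-printed key into its parts. The segment meanings (table ID, index ID, zone ID, column family ID, with any trailing segment ignored) are my assumption about the pretty-printed layout; the thread itself only states that the two keys are the two column families of the default zone config row. `zoneRowKey` and `parseZoneRowKey` are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// zoneRowKey holds the assumed parts of a pretty-printed zones-table key.
type zoneRowKey struct {
	TableID  uint64
	IndexID  uint64
	ZoneID   uint64
	FamilyID uint64
}

// parseZoneRowKey parses keys shaped like /Table/<t>/<i>/<id>/<family>[/...].
// Any segments after the family ID are ignored.
func parseZoneRowKey(pretty string) (zoneRowKey, error) {
	parts := strings.Split(strings.TrimPrefix(pretty, "/"), "/")
	if len(parts) < 5 || parts[0] != "Table" {
		return zoneRowKey{}, fmt.Errorf("unexpected key shape: %q", pretty)
	}
	var nums [4]uint64
	for i := range nums {
		n, err := strconv.ParseUint(parts[i+1], 10, 64)
		if err != nil {
			return zoneRowKey{}, fmt.Errorf("segment %q: %w", parts[i+1], err)
		}
		nums[i] = n
	}
	return zoneRowKey{
		TableID:  nums[0],
		IndexID:  nums[1],
		ZoneID:   nums[2],
		FamilyID: nums[3],
	}, nil
}

func main() {
	// The two keys reported as deleted in the comment above.
	for _, k := range []string{"/Table/5/1/0/0", "/Table/5/1/0/2/1"} {
		parsed, err := parseZoneRowKey(k)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %+v\n", k, parsed)
	}
}
```

Under that reading, both keys share zone ID 0 (keys.RootNamespaceID in the diff above) and differ only in the column family.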

TestRandomSyntaxGeneration must be generating some query that is causing this row to get deleted. That shouldn't be allowed, but it also shouldn't be leading to a panic deep down in the replication report generation code. I'll send a patch to error instead of panic.

I'll then hand the remainder of this investigation back to SQL Foundations. Specifically, we still need to determine why the row in system.zones is being deleted and whether we should put any additional safeguards in place to prevent this from being allowed.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jan 11, 2024
Informs cockroachdb#43951.

In cockroachdb#43951, we saw that some random sequence of queries led to the default zone
configuration in the system.zones table (key `/Table/5/1/0/2/1`) being deleted.
This led to a fatal log in the replication report generation code.

Deleting this key should not be allowed, but it also shouldn't crash cockroach.
This commit replaces the fatal log with an error, which is logged a few stack
frames up with a gentle "some reports have not been generated" message.

I've confirmed that with this fix, the reproduction steps in cockroachdb#43951 no longer
lead to a fatal log and a crashed process.

Release note: None
@nvanbenschoten nvanbenschoten added this to Triage in SQL Foundations via automation Jan 11, 2024
@nvanbenschoten nvanbenschoten removed this from Incoming in KV Jan 11, 2024
@blathers-crl blathers-crl bot added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label Jan 11, 2024
@nvanbenschoten nvanbenschoten removed T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) T-kv KV Team labels Jan 11, 2024
craig bot pushed a commit that referenced this issue Jan 11, 2024
117662: kv/reports: don't panic on missing default zone config r=nvanbenschoten a=nvanbenschoten

Informs #43951.

In #43951, we saw that some random sequence of queries led to the default zone configuration in the system.zones table (key `/Table/5/1/0/2/1`) being deleted. This led to a fatal log in the replication report generation code.

Deleting this key should not be allowed, but it also shouldn't crash cockroach. This commit replaces the fatal log with an error, which is logged a few stack frames up with a gentle "some reports have not been generated" message.

I've confirmed that with this fix, the reproduction steps in #43951 no longer lead to a fatal log and a crashed process.

Release note: None

117663: ci: migrate CI docker images to GCP r=jlinder a=rail

Previously, we used Docker Hub to host our CI docker images.

To improve service reliability, this PR moves the used docker images to GAR.

Part of: DEVINF-915
Epic: RE-539
Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Rail Aliiev <rail@iqchoice.com>
@exalate-issue-sync exalate-issue-sync bot added the T-storage Storage Team label Jan 23, 2024
@blathers-crl blathers-crl bot added this to Incoming in Storage Jan 23, 2024
@blathers-crl blathers-crl bot added the A-storage Relating to our storage engine (Pebble) on-disk storage. label Jan 23, 2024
@lunevalex lunevalex added T-kv KV Team and removed T-storage Storage Team labels Jan 23, 2024
@blathers-crl blathers-crl bot added this to Incoming in KV Jan 23, 2024
@nicktrav nicktrav added sync-me and removed A-storage Relating to our storage engine (Pebble) on-disk storage. sync-me labels Jan 23, 2024
@nicktrav nicktrav removed this from Incoming in Storage Jan 23, 2024
@nvanbenschoten nvanbenschoten changed the title storage: panic: default zone config missing unexpectedly [now has a reliable repro] panic: default zone config missing unexpectedly [now has a reliable repro] Jan 24, 2024
@nvanbenschoten nvanbenschoten removed this from Incoming in KV Jan 24, 2024
@nvanbenschoten nvanbenschoten removed the T-kv KV Team label Jan 24, 2024
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) P-3 Issues/test failures with no fix SLA and removed P-2 Issues/test failures with a fix SLA of 3 months labels Jan 24, 2024

We have marked this test failure issue as stale because it has been
inactive for 1 month. If this failure is still relevant, removing the
stale label or adding a comment will keep it active. Otherwise,
we'll close it in 5 days to keep the test failure queue tidy.

andrewbaptist pushed a commit to andrewbaptist/cockroach that referenced this issue Mar 6, 2024
