Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util/tracing: segfault in (*Span).Finish somewhere in sql/TestTrace #57875

Closed
knz opened this issue Dec 13, 2020 · 3 comments · Fixed by #57942
Closed

util/tracing: segfault in (*Span).Finish somewhere in sql/TestTrace #57875

knz opened this issue Dec 13, 2020 · 3 comments · Fixed by #57942
Labels
A-tracing Relating to tracing in CockroachDB. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting.

Comments

@knz
Copy link
Contributor

knz commented Dec 13, 2020

Found in CI in an unrelated PR: https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_UnitTests_Test/2516403

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xb4bca2]

goroutine 624262 [running]:
golang.org/x/net/trace.(*traceSet).Remove(0x0, 0xc004ac5860)
        /go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/net/trace/trace.go:478 +0x22
golang.org/x/net/trace.(*trace).Finish(0xc004ac5860)
        /go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/net/trace/trace.go:412 +0x13e
github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).Finish(0xc00b0bc6c0)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:392 +0x112
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*TxnCoordSender).Send(0xc000d44d80, 0x500ebc0, 0xc003bff4d0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_coord_sender.go:528 +0x545
github.com/cockroachdb/cockroach/pkg/kv.(*DB).sendUsingSender(0xc0050eae70, 0x500ebc0, 0xc00df30e10, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:744 +0x13c
github.com/cockroachdb/cockroach/pkg/kv.(*Txn).Send(0xc0062f3dd0, 0x500ebc0, 0xc00df30e10, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:919 +0x12e
github.com/cockroachdb/cockroach/pkg/kv.sendAndFill(0x500ebc0, 0xc00df30e10, 0xc0024d3fd8, 0xc005be1900, 0x0, 0x5)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:654 +0x107
github.com/cockroachdb/cockroach/pkg/kv.(*Txn).Run(0xc0062f3dd0, 0x500ebc0, 0xc00df30e10, 0xc005be1900, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:590 +0xe7
github.com/cockroachdb/cockroach/pkg/kv.(*Txn).Get(0xc0062f3dd0, 0x500ebc0, 0xc00df30e10, 0x408b8e0, 0xc0095b8e60, 0x8, 0xc0024d41c8, 0x14eeef3, 0xc0075ac460, 0xc0024d41c8, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:369 +0xbf
github.com/cockroachdb/cockroach/pkg/kv.(*Txn).GetProtoTs(0xc0062f3dd0, 0x500ebc0, 0xc00df30e10, 0x408b8e0, 0xc0095b8e60, 0x5051b60, 0xc0041d56d0, 0x800000001, 0x220000c0024d4258, 0x1341378, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:391 +0x66
github.com/cockroachdb/cockroach/pkg/sql/catalog/catalogkv.GetDescriptorByID(0x500ebc0, 0xc00df30e10, 0xc0062f3dd0, 0xc007f36e80, 0xc007f36e80, 0x0, 0xc, 0x4, 0x0, 0xc, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/catalogkv/catalogkv.go:151 +0x191
github.com/cockroachdb/cockroach/pkg/sql/catalog/catalogkv.GetAnyDescriptorByID(...)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/catalogkv/catalogkv.go:125
github.com/cockroachdb/cockroach/pkg/sql/catalog/descs.(*Collection).getDescriptorFromStore(0xc00350f210, 0x500ebc0, 0xc00df30e10, 0xc0062f3dd0, 0xc007f36e80, 0xc007f36e80, 0x0, 0x1d00000001, 0x44c8f7f, 0x8, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/descs/collection.go:283 +0x125
github.com/cockroachdb/cockroach/pkg/sql/catalog/descs.(*Collection).getObjectByName(0xc00350f210, 0x500ebc0, 0xc00df30e10, 0xc0062f3dd0, 0x44c8f78, 0x6, 0x436fb4f, 0x6, 0x44c8f7f, 0x8, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/descs/collection.go:432 +0x39c
github.com/cockroachdb/cockroach/pkg/sql/catalog/descs.(*Collection).GetTableByName(0xc00350f210, 0x500ebc0, 0xc00df30e10, 0xc0062f3dd0, 0x5053ae0, 0xc00845e680, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/descs/collection.go:453 +0x14b
github.com/cockroachdb/cockroach/pkg/sql/catalog/accessors.(*LogicalSchemaAccessor).GetObjectDesc(0xc0137b2480, 0x500ebc0, 0xc00df30e10, 0xc0062f3dd0, 0xc002fc7500, 0xc007f36e80, 0xc007f36e80, 0x0, 0x44c8f78, 0x6, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/accessors/logical_schema_accessors.go:152 +0x59b
github.com/cockroachdb/cockroach/pkg/sql.(*planner).LookupObject(0xc00350f3c0, 0x500ebc0, 0xc00df30e10, 0x0, 0x0, 0x0, 0x44c8f78, 0x6, 0x436fb4f, 0x6, ...)
...

It also repros readily with make stress PKG=./pkg/sql TESTS=TestTrace

cc @tbg for triage.

@blathers-crl
Copy link

blathers-crl bot commented Dec 13, 2020

Hi @knz, please add a C-ategory label to your issue. Check out the label system docs.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@knz knz added A-tracing Relating to tracing in CockroachDB. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting. labels Dec 13, 2020
@knz knz added this to Incoming in KV via automation Dec 13, 2020
@knz knz changed the title util/tracing: segfault in (*trace).Finish() somewhere in sql/TestTrace util/tracing: segfault in (*Span).Finish somewhere in sql/TestTrace Dec 13, 2020
@yuzefovich
Copy link
Member

I've also seen a couple of panics similar to it.

E201213 00:03:14.802484 624482 util/log/logcrash/crash_reporting.go:140  [n1,client=‹127.0.0.1:47674›,hostssl,user=root] a panic has occurred!
runtime error: index out of range [0] with length 0
(1) attached stack trace
  -- stack trace:
  | github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn.func1
  | 	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:479
  | [...repeated from below...]
Wraps: (2) while executing: SELECT * FROM _._
Wraps: (3) attached stack trace
  -- stack trace:
  | github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn.func1
  | 	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:479
  | runtime.gopanic
  | 	/usr/local/go/src/runtime/panic.go:969
  | runtime.goPanicIndex
  | 	/usr/local/go/src/runtime/panic.go:88
  | golang.org/x/net/trace.(*trace).addEvent
  | 	/go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/net/trace/trace.go:805
  | golang.org/x/net/trace.(*trace).LazyPrintf
  | 	/go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/net/trace/trace.go:834
  | github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).setTagInner
  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:480
  | github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).SetSpanStats
  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:371
  | github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBase).moveToTrailingMeta
  | 	/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:707
  | github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBase).MoveToDraining
  | 	/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:611
  | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*tableReader).Next
  | 	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/tablereader.go:245
  | github.com/cockroachdb/cockroach/pkg/sql/execinfra.Run
  | 	/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/base.go:171
  | github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBase).Run
  | 	/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:765
  | github.com/cockroachdb/cockroach/pkg/sql/flowinfra.(*FlowBase).Run
  | 	/go/src/github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:392
  | github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).Run
  | 	/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:384
  | github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).PlanAndRun
...

@tbg
Copy link
Member

tbg commented Dec 15, 2020

Looking.

craig bot pushed a commit that referenced this issue Dec 15, 2020
57879: sql: make writes to `system.eventlog` conditional r=tbg a=knz

This patch is meant to help recovering partial availability in
clusters where the `system.eventlog` table / range are unsavailable.

Prior to this patch, when any SQL action was causing a notable event,
that event would be written transactionally (in the same transaction)
to the table `system.eventlog`. If that table happened to be
unavailable, the action would not complete. This was true of even
basic operations like changing a cluster setting, changing privileges
on unrelated tables, etc.

This patch changes that by introducing a new cluster setting
`server.eventlog.enabled` to make these writes conditional.

Release note (general change): The new cluster setting
`server.eventlog.enabled` controls whether notable events are also
written to the table `system.eventlog`. Its default value is
`true`. Changing this cluster setting can help recovering partial
cluster availability when the `system.eventlog` table becomes
unavailable. Note that even when `false`, notable events are still
propagated to the logging system, where they can be e.g. redirected to
files.

57913: build: move `.pb.go` contents munging from `make` to `protoc-gen-gogoroach` r=rickystewart a=rickystewart

This worked fine, but it complicated #56067 because [the standard Go
protobuf Bazel support](https://github.com/bazelbuild/rules_go/blob/master/proto/core.rst#go_proto_library)
doesn't easily allow us to just inject a random `sed` command into the
middle of the (`.proto` -> `.pb.go` -> compiled Go library) pipeline.

With this logic in the actual `protoc` plugin proper, we can then safely
use Bazel's built-in stuff without a lot of monkey-patching or code
duplication.

Release note: None

57942: tracing: fix child span optimization r=knz a=tbg

When given a context with a non-recording (but real) Span, we would
return the incoming context. This would lead to an extra premature
call to `Finish()` and subsequent use-after-Finish of the Span, which
can blow up and/or hang, most of the time within net/trace code.

Prior to this commit, the crash reproduced readily (within seconds) via

```
make stress PKG=./pkg/sql TESTS=TestTrace
```

and I had no issues for ~10 minutes with this commit.

Fixes #57875.

Release note: None


Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
@craig craig bot closed this as completed in 947bd9c Dec 15, 2020
KV automation moved this from Incoming to Closed Dec 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tracing Relating to tracing in CockroachDB. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants