[BUG] 1.50.0: "fatal error: concurrent map read and map write" #1944

Closed
drichelson opened this issue Apr 26, 2023 · 6 comments · Fixed by #1948
Labels
ack · bug (unintended behavior that has to be fixed)

Comments

@drichelson

drichelson commented Apr 26, 2023

Version of dd-trace-go
1.50.0 (upgrade from 1.48.0 is correlated with this issue)

Describe what happened:
Fatal error. Relevant goroutine dump below (with internal stack-trace frames elided):

"2023-04-26T22:05:11.509Z","fatal error: concurrent map read and map write"
"2023-04-26T22:05:11.509Z","goroutine 101135 [running]:"
"2023-04-26T22:05:11.509Z","gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*propagatorW3c).injectTextMap(0xc03a4f7ad8, {0x45fcad0?, 0xc03ecfc2a0?}, {0x7f8aade2fa60, 0xc03c27d650})"
"2023-04-26T22:05:11.509Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/textmap.go:674 +0x312"
"2023-04-26T22:05:11.509Z","gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*propagatorW3c).Inject(0xc03a4f7ac8?, {0x45fcad0, 0xc03ecfc2a0}, {0x38f1b40?, 0xc03c27d650?})"
"2023-04-26T22:05:11.510Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/textmap.go:630 +0x89"
"2023-04-26T22:05:11.510Z","gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*chainedPropagator).Inject(0xc00007ec00?, {0x45fcad0, 0xc03ecfc2a0}, {0x38f1b40, 0xc03c27d650})"
"2023-04-26T22:05:11.510Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/textmap.go:246 +0x83"
"2023-04-26T22:05:11.510Z","gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*tracer).Inject(0xc035b934a0?, {0x45fcad0?, 0xc03ecfc2a0?}, {0x38f1b40?, 0xc03c27d650?})"
"2023-04-26T22:05:11.510Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/tracer.go:595 +0x3f"
"2023-04-26T22:05:11.510Z","gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.Inject({0x45fcad0?, 0xc03ecfc2a0?}, {0x38f1b40?, 0xc03c27d650?})"
"2023-04-26T22:05:11.511Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/tracer.go:180 +0x96"

"2023-04-26T22:05:11.646Z","goroutine 2990 [select]:"
"2023-04-26T22:05:11.646Z","gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*concentrator).runFlusher(...)"
"2023-04-26T22:05:11.646Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/stats.go:106"
"2023-04-26T22:05:11.646Z","gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*concentrator).Start.func1()"
"2023-04-26T22:05:11.646Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/stats.go:94 +0x136"
"2023-04-26T22:05:11.646Z","created by gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*concentrator).Start"
"2023-04-26T22:05:11.646Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/stats.go:90 +0xb8"

"2023-04-26T22:05:11.695Z","goroutine 2989 [select]:"
"2023-04-26T22:05:11.696Z","gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*tracer).reportHealthMetrics(0xc014a33900, 0x0?)"
"2023-04-26T22:05:11.696Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/metrics.go:101 +0x1bf"
"2023-04-26T22:05:11.696Z","gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.newTracer.func3()"
"2023-04-26T22:05:11.696Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/tracer.go:277 +0x65"
"2023-04-26T22:05:11.696Z","created by gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.newTracer"
"2023-04-26T22:05:11.696Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/tracer.go:275 +0x196"

"2023-04-26T22:05:11.753Z","goroutine 2991 [select, 8 minutes]:"
"2023-04-26T22:05:11.753Z","gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*concentrator).runIngester(0xc00db600c0)"
"2023-04-26T22:05:11.753Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/stats.go:127 +0xae"
"2023-04-26T22:05:11.753Z","gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*concentrator).Start.func2()"
"2023-04-26T22:05:11.753Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/stats.go:99 +0x5a"
"2023-04-26T22:05:11.754Z","created by gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*concentrator).Start"
"2023-04-26T22:05:11.754Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/stats.go:97 +0x105"

"2023-04-26T22:05:11.829Z","goroutine 2988 [select]:"
"2023-04-26T22:05:11.829Z","gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*tracer).worker(0xc014a33900, 0xc00dbf4660)"
"2023-04-26T22:05:11.830Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/tracer.go:310 +0xdc"
"2023-04-26T22:05:11.830Z","gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.newTracer.func2()"
"2023-04-26T22:05:11.830Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/tracer.go:272 +0xba"
"2023-04-26T22:05:11.830Z","created by gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.newTracer"
"2023-04-26T22:05:11.830Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/tracer.go:264 +0x145"

"2023-04-26T22:05:11.891Z","gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*tracer).reportRuntimeMetrics(0xc014a33900, 0x2540be400?)"
"2023-04-26T22:05:11.891Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/metrics.go:44 +0x1b3"
"2023-04-26T22:05:11.891Z","gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.newTracer.func1()"
"2023-04-26T22:05:11.891Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/tracer.go:260 +0x68"
"2023-04-26T22:05:11.891Z","created by gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.newTracer"
"2023-04-26T22:05:11.891Z",".../vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/tracer.go:258 +0xe5"

Describe what you expected:
no fatal errors

Steps to reproduce the issue:
I'm still working on a reproduction, but I figured the goroutine dump and general awareness might be helpful to start. A rough sketch of a suspected trigger is below.
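
In case it is useful while I work on that, here is a minimal sketch of the kind of workload I suspect could trigger it, assuming the W3C propagator is active (as the stack above suggests). The span names, tag, and loop counts are made up and I have not confirmed this reproduces the crash; run it with the race detector.

// repro_sketch_test.go -- a hedged guess at a trigger, not a confirmed repro.
// Run with: go test -race -run TestInjectRaceSketch
package main

import (
    "net/http"
    "sync"
    "testing"

    "gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

func TestInjectRaceSketch(t *testing.T) {
    tracer.Start()
    defer tracer.Stop()

    parent := tracer.StartSpan("parent.operation") // name is illustrative
    defer parent.Finish()

    var wg sync.WaitGroup
    for i := 0; i < 8; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < 1000; j++ {
                // Read side seen in the dump: the W3C propagator marshals the
                // trace's propagating tags during Inject.
                h := http.Header{}
                _ = tracer.Inject(parent.Context(), tracer.HTTPHeadersCarrier(h))

                // Guessed write side (not confirmed by the dump): child spans on
                // the same trace updating trace-level / propagating state.
                child := tracer.StartSpan("child.operation", tracer.ChildOf(parent.Context()))
                child.SetTag("manual.keep", true)
                child.Finish()
            }
        }()
    }
    wg.Wait()
}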

Additional environment details (Version of Go, Operating System, etc.):
Running Go 1.20.3.

drichelson added the bug label on Apr 26, 2023
@knusbaum
Contributor

Thanks for this report. We're looking into it.

Is there any dd-trace-go code in any of your other stacks?
Except for goroutine 101135 here, the rest are sitting in selects, so while I have a good idea of where the read is occurring, I don't see the write in this set of stacks.
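
If you can afford to run a canary built with the race detector, it should report both conflicting stacks, since the runtime's concurrent-map check only tells us where the read tripped, not where the write happened. The build path below is illustrative.

go build -race -o myservice ./cmd/myservice
GORACE="halt_on_error=1" ./myservice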

knusbaum added the ack label on Apr 27, 2023
knusbaum added a commit that referenced this issue Apr 27, 2023
Spancontext marshaling was accessing tracer internal structures without a
lock, resulting in a data race and panic.

This commit adds a few methods to trace to allow safe access to the tags
and propagatingTags members of trace to the marshaling code.

Fixes #1944
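
For readers following along, here is a hedged illustration of the locking pattern that commit message describes. It is not the actual #1948 diff; the field and method names are assumed.

package tracer

import "sync"

// Illustrative only: a trace-like holder whose maps may be read by the
// propagator while other goroutines write to them.
type trace struct {
    mu              sync.RWMutex // guards the two maps below
    tags            map[string]string
    propagatingTags map[string]string
}

// iteratePropagatingTags lets marshaling code walk the map without ever
// holding an unguarded reference to it.
func (t *trace) iteratePropagatingTags(f func(k, v string) bool) {
    t.mu.RLock()
    defer t.mu.RUnlock()
    for k, v := range t.propagatingTags {
        if !f(k, v) {
            return
        }
    }
}

// setPropagatingTag takes the write lock so readers never race with a read.
func (t *trace) setPropagatingTag(k, v string) {
    t.mu.Lock()
    defer t.mu.Unlock()
    if t.propagatingTags == nil {
        t.propagatingTags = make(map[string]string)
    }
    t.propagatingTags[k] = v
}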
@knusbaum
Contributor

@drichelson
I've reassessed access to that map, and I believe the PR (#1948) should fix the issue.

If you have more info on where exactly the race is occurring, that will help us confirm the fix and create a regression test.

knusbaum added a commit that referenced this issue Apr 28, 2023
Spancontext marshaling was accessing tracer internal structures without a
lock, resulting in a data race and panic.

This commit adds a few methods to trace to allow safe access to the tags
and propagatingTags members of trace to the marshaling code.

Fixes #1944
@phoenix2x
Contributor

Facing the same issue. Is there an ETA for when a release with the fix will be available?

@katiehockman
Contributor

katiehockman commented May 1, 2023

We are preparing the release now and I expect it to be published within the next 24-48 hours. Thanks for the message.
I'll comment on this issue again once the release is done, or if any unforeseen delays pop up.

katiehockman pushed a commit that referenced this issue May 1, 2023
Spancontext marshaling was accessing tracer internal structures without a
lock, resulting in a data race and panic.

This commit adds a few methods to trace to allow safe access to the tags
and propagatingTags members of trace to the marshaling code.

Fixes #1944
@katiehockman
Contributor

v1.50.1 has been released, which includes the fix for this bug. It would be helpful if you could confirm that upgrading fixes the issue for you.
Thanks again for reporting the issue and for the details you provided.
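
For anyone confirming on their side, the upgrade is the usual module bump, and a run under the race detector is a good extra check if your test suite exercises the tracer:

go get gopkg.in/DataDog/dd-trace-go.v1@v1.50.1
go mod tidy
go test -race ./...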

zARODz11z added a commit that referenced this issue May 8, 2023
…ch as queuename tags

contrib: upgrade labstack/echo/v4 from v4.2.0 to v4.9.0 (#1891)

ci: fix flaky lint job (#1892)

contrib/elasticsearch: use naming schema (#1897)

ci: introduce golangci (#1898)

appsec: suspicious request blocking (#1797)

Co-authored-by: Julio Guerra <julio@datadog.com>

ci/golangci-lint: skip google.golang.org/grpc.v12 (#1899)

.github/workflows: run ASM and RC system-tests scenarios (#1900)

contrib/hashicorp/vault: use naming schema (#1868)

contrib/database/sql: add WithIgnoreQueryTypes option (#1823)

Co-authored-by: Zarir Hamza <zarir.hamza@datadoghq.com>
Co-authored-by: Rodrigo Argüello <rodrigo.arguello@datadoghq.com>

contrib/database/sql: use naming schema (#1895)

internal/appsec: add server.request.method address (#1893)

Signed-off-by: Eliott Bouhana <eliott.bouhana@datadoghq.com>
Co-authored-by: François Mazeau <francois.mazeau@datadoghq.com>

internal/appsec/dyngo: atomic instrumentation swapping (#1873)

Co-authored-by: François Mazeau <francois.mazeau@datadoghq.com>

go.mod: datadog-agent/pkg/remoteconfig/state 7.45.0-rc.1 (#1902)

internal/version: bump to v1.51.0 (#1912)

ddtrace/tracer: don't set empty tracestate propagation tag (#1910)

go.mod: github.com/DataDog/datadog-agent/pkg/obfuscate 7.45.0-rc.1 (#1916)

appsec: add blocking SDK body operation (#1901)

* Modifying the appsec API: adding an error return value to appsec.MonitorParsedHTTPBody
* Adding a call to the WAF to check for security event synchronously with a call to appsec.MonitorParsedHTTPBody on the body passed as parameter
* Removing the call to the WAF done on the body at the end of a request because we moved it.
* Refactoring the waf addresses storage and access

Signed-off-by: Eliott Bouhana <eliott.bouhana@datadoghq.com>

ddtrace/{opentelemetry,opentracer}: add telemetry (#1909)

internal/appsec: fix user ID event detection (#1918)

internal/telemetry: track tracer init time metric (#1896)

Co-authored-by: Andrew Glaude <andrew.glaude@datadoghq.com>

internal/appsec/remoteconfig: fix rules overrides (#1921)

contrib/mongodb: use naming schema (#1908)

contrib/syndtr/goleveldb/leveldb: use naming schema (#1914)

contrib/tidwall/buntdb: use naming schema (#1913)

internal/appsec: do not ignore the appsec events rate limiter (#1927)

remoteconfig: remove empty products and don't override appsec rules data (#1925)

contrib/kafka: refactor tests (#1907)

contrib/google.golang.org/grpc: use naming schema (#1919)

contrib/twitchtv/twirp: use naming schema (#1920)

contrib/http: use naming schema (#1929)

ddtrace/tracer: reset decision maker during fallback behavior of w3c header extraction (#1933)

contrib/cassandra: use naming schema (#1911)

Co-authored-by: Diana Shevchenko <40775148+dianashevchenko@users.noreply.github.com>

contrib/redis: use naming schema (#1906)

Co-authored-by: Andrew Glaude <andrew.glaude@datadoghq.com>

ci/system-tests: more scenarios with parallel jobs (#1938)

ci: update linter job and add bodyclose (#1942)

contrib/redis/go-redis.v9: support v9 (#1730)

Add support for new go-redis version v9.

It does 2 things:
Copy existing version 8 files to a new path, /redis/go-redis.v9.
Make changes to support version 9.

Fixes #1710

format and rerun go tidy

get rid of prints

add topLevelRegion assertions

remove confusing named return values and todo comment

ddtrace/tracer: ensure access to trace tags is concurrency-safe (#1948)

Spancontext marshaling was accessing tracer internal structures without a
lock, resulting in a data race and panic.

This commit adds a few methods to trace to allow safe access to the tags
and propagatingTags members of trace to the marshaling code.

Fixes #1944

ddtrace/tracer: mark context updated when SetUser is called (#1949)

Fixes a minor logic mistake when setting a user on a span

lint and add default switch case

refactor resourceNameKey and value assignments

restructure functions to be left aligned

use internal logger, be less verbose with function names

go back to normal switch type and format

Set keyTraceID128 on first span in the chunk only (#1946)

go.mod: upgrade go-libddwaf to v1.2.0 (#1953)

Co-authored-by: Julio Guerra <julio@datadog.com>

contrib/database/sql: fix bug where options were always overwritten by register options (#1904)

Co-authored-by: Diana Shevchenko <40775148+dianashevchenko@users.noreply.github.com>

ci/smoke-tests: update the go.sum file after go get -u (#1957)

contrib/net/http: don't set empty string values as span tags (#1956)

Do not set span fields when they are not configured so the tracer can put the defaults in.

use normal string then dereference

revert go.mod and go.sum changes

contrib/internal/httptrace: remove naming schema from init (#1960)

contrib/graphql: use naming schema (#1926)

internal/telemetry: trim the dependencies version prefix v (#1963)

contrib/aws: use naming schema (#1931)

contrib/cloud.google.com/go/pubsub.v1: use naming schema (#1937)

go mod tidy

lint and fix test
@phoenix2x
Contributor

Confirming that the new version fixes our issues. Thank you for the quick turnaround.
