Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vmselect panic caused by query tracer data race #5319

Closed
luckyxiaoqiang opened this issue Nov 14, 2023 · 5 comments
Closed

vmselect panic caused by query tracer data race #5319

luckyxiaoqiang opened this issue Nov 14, 2023 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@luckyxiaoqiang
Copy link
Contributor

luckyxiaoqiang commented Nov 14, 2023

Describe the bug

There is a vm cluster of version v1.79.6 in our product environment. The vmselect component panics every several days.

Here are the panic logs

panic: runtime error: invalid memory address or nil pointer dereference

goroutine 1647633 [running]:
github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.handlerWrapper.func1()
        github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver/httpserver.go:226 +0x65
panic({0x920a00, 0xfa09f0})
        runtime/panic.go:1047 +0x266
github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).toSpanInternal(0xc0003f96e0, {0x39a1ebd06816, 0xffc880, 0xffc880})
        github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer/tracer.go:206 +0x2b6
github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).toSpanInternal(0xc0003f9140, {0xc000114cd8, 0xffc880, 0xffc880})
        github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer/tracer.go:207 +0x2c8
github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).toSpanInternal(0xc0003f8f60, {0x0, 0xffc880, 0xffc880})
        github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer/tracer.go:207 +0x2c8
github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).toSpanInternal(0xc0003f8e40, {0x39a1ebeb4198, 0xffc880, 0xffc880})
        github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer/tracer.go:207 +0x2c8
github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).toSpanInternal(0xc0003f8de0, {0x39a1ebeb43bc, 0xffc880, 0xffc880})
        github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer/tracer.go:207 +0x2c8
github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).toSpanInternal(0xc0003f8d80, {0x0, 0xffc880, 0xffc880})
        github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer/tracer.go:207 +0x2c8
github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).toSpanInternal(0xc00111e900, {0x39a1ebeb4a53, 0xffc880, 0xffc880})
        github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer/tracer.go:207 +0x2c8
github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).toSpanInternal(0xc00111e8a0, {0x0, 0xffc880, 0xffc880})
        github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer/tracer.go:207 +0x2c8
github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).toSpanInternal(0xc000e083c0, {0xc00097b000, 0x8, 0xffc880})
        github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer/tracer.go:207 +0x2c8
github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).toSpan(0xc000e083c0)
        github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer/tracer.go:178 +0x31
github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).ToJSON(0x90be60)
        github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer/tracer.go:169 +0x30
github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/prometheus.streamdumpQueryTrace(0xc0000f93b0, 0x98ffd9)
        app/vmselect/prometheus/util.qtpl:50 +0x25
github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/prometheus.StreamQueryResponse(0xc0000f93b0, 0x0, {0xc0000f9340, 0x1, 0x1}, 0xffbb60, 0xc0008c7258)
        app/vmselect/prometheus/query_response.qtpl:37 +0x245
github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/prometheus.WriteQueryResponse({0xb30460, 0xc000118fc0}, 0x80, {0xc0000f9340, 0x1, 0x1}, 0x18, 0xc0011218a8)
        app/vmselect/prometheus/query_response.qtpl:39 +0x7e
github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/prometheus.QueryHandler(0xc000e083c0, {0xc000a1ca80, 0xffffffffffffffff, 0xffc880}, 0xc001111f80, {0xb36380, 0xc000a1ca80}, 0xc000b33800)
        github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/prometheus/prometheus.go:831 +0x12c6
main.selectHandler(0xc001335c8d, {0x4, 0x978784, 0xffc880}, {0xb36380, 0xc000a1ca80}, 0xc000b33800, 0xc000fcec90, 0xc001111f80)
        github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/main.go:366 +0x1ec5
main.requestHandler({0xb36380, 0xc000a1ca80}, 0xc000b33800)
        github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/main.go:252 +0x9e5
github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.handlerWrapper(0xc0001170f0, {0xb36380, 0xc000a1ca80}, 0xc000b33800, 0xac2ff0)
        github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver/httpserver.go:334 +0xddd
github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.gzipHandler.func1({0xb36380, 0xc000a1ca80}, 0xc000e4de80)
        github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver/httpserver.go:194 +0x76
net/http.HandlerFunc.ServeHTTP(0x0, {0xb36380, 0xc000a1ca80}, 0xc0008c7b28)
        net/http/server.go:2047 +0x2f
net/http.serverHandler.ServeHTTP({0xb35a20}, {0xb36380, 0xc000a1ca80}, 0xc000b33800)
        net/http/server.go:2879 +0x43b
net/http.(*conn).serve(0xc002a365a0, {0xb39c00, 0xc000d92db0})
        net/http/server.go:1930 +0xb08
created by net/http.(*Server).Serve
        net/http/server.go:3034 +0x4e8

And, below are the args

-replicationFactor=2
-search.latencyOffset=30s
-search.maxConcurrentRequests=8
-search.maxPointsPerTimeseries=30000
-search.maxQueryDuration=30s
-search.maxQueueDuration=30s
-search.maxSamplesPerQuery=1000000000
-search.maxSamplesPerSeries=30000000
-search.maxUniqueTimeseries=300000
-search.minStalenessInterval=5m
-dedup.minScrapeInterval=1ms
-loggerLevel=INFO
-storageNode=storage-0:8401
-storageNode=storage-1:8401
-storageNode=storage-2:8401
-storageNode=storage-3:8401
-storageNode=storage-4:8401
-storageNode=storage-5:8401

To Reproduce

It is very hard to reproduce the same problem totally. But we have reproduced similar problem offline with go race detector.

We run vmselect with args

-replicationFactor=2
-search.latencyOffset=30s
-search.maxConcurrentRequests=8
-search.maxPointsPerTimeseries=30000
-search.maxQueryDuration=30s
-search.maxQueueDuration=30s
-search.maxSamplesPerQuery=1000000000
-search.maxSamplesPerSeries=30000000
-search.maxUniqueTimeseries=300000
-search.minStalenessInterval=5m
-dedup.minScrapeInterval=1ms
-loggerLevel=INFO
-storageNode=storage-0:8401
-storageNode=storage-1:8401

and turn on the go race flag.

Here are the data race logs

==================
WARNING: DATA RACE
Read at 0x00c000206070 by goroutine 188:
  github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).getLastChildDoneTime()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/lib/querytracer/tracer.go:222 +0xbd
  github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).getLastChildDoneTime()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/lib/querytracer/tracer.go:223 +0xf2
  github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).toSpanInternal()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/lib/querytracer/tracer.go:200 +0x2bc
  github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).toSpanInternal()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/lib/querytracer/tracer.go:207 +0x6e8
  github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).toSpanInternal()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/lib/querytracer/tracer.go:207 +0x6e8
  github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).toSpanInternal()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/lib/querytracer/tracer.go:207 +0x6e8
  github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).toSpanInternal()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/lib/querytracer/tracer.go:207 +0x6e8
  github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).toSpan()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/lib/querytracer/tracer.go:178 +0x3c
  github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).ToJSON()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/lib/querytracer/tracer.go:169 +0x48
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/prometheus.streamdumpQueryTrace()
      app/vmselect/prometheus/util.qtpl:50 +0x35
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/prometheus.StreamQueryResponse()
      app/vmselect/prometheus/query_response.qtpl:37 +0x2ba
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/prometheus.WriteQueryResponse()
      app/vmselect/prometheus/query_response.qtpl:39 +0x95
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/prometheus.QueryHandler()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/prometheus/prometheus.go:831 +0x12af
  main.selectHandler()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/main.go:366 +0x2a29
  main.requestHandler()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/main.go:252 +0x956
  github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.handlerWrapper()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/lib/httpserver/httpserver.go:334 +0x12ec
  github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.gzipHandler.func1()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/lib/httpserver/httpserver.go:194 +0x97
  net/http.HandlerFunc.ServeHTTP()
      /usr/local/Cellar/go/1.20.3/libexec/src/net/http/server.go:2122 +0x4d
  net/http.serverHandler.ServeHTTP()
      /usr/local/Cellar/go/1.20.3/libexec/src/net/http/server.go:2936 +0x682
  net/http.(*conn).serve()
      /usr/local/Cellar/go/1.20.3/libexec/src/net/http/server.go:1995 +0xbd4
  net/http.(*Server).Serve.func3()
      /usr/local/Cellar/go/1.20.3/libexec/src/net/http/server.go:3089 +0x58

Previous write at 0x00c000206070 by goroutine 191:
  github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer.(*Tracer).AddJSON()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/lib/querytracer/tracer.go:145 +0x2e6
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage.readTrace()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/netstorage/netstorage.go:1754 +0x204
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage.(*storageNode).execOnConn()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/netstorage/netstorage.go:1736 +0xc9e
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage.(*storageNode).execOnConnWithPossibleRetry()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/netstorage/netstorage.go:1652 +0x115
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage.(*storageNode).processSearchQuery()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/netstorage/netstorage.go:1647 +0x107
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage.ProcessBlocks.func2()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/netstorage/netstorage.go:1338 +0x134
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage.startStorageNodesRequest.func1()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/netstorage/netstorage.go:1370 +0x62
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage.startStorageNodesRequest.func2()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/netstorage/netstorage.go:1373 +0x58

Goroutine 188 (running) created at:
  net/http.(*Server).Serve()
      /usr/local/Cellar/go/1.20.3/libexec/src/net/http/server.go:3089 +0x817
  github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.serveWithListener()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/lib/httpserver/httpserver.go:139 +0x3fe
  github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.Serve()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/lib/httpserver/httpserver.go:107 +0x5e8
  main.main.func1()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/main.go:112 +0x54

Goroutine 191 (running) created at:
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage.startStorageNodesRequest()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/netstorage/netstorage.go:1369 +0x9d
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage.ProcessBlocks()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/netstorage/netstorage.go:1336 +0x35c
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage.ProcessSearchQuery()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/netstorage/netstorage.go:1292 +0x45a
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/promql.evalRollupFuncWithMetricExpr()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/promql/eval.go:940 +0x10a5
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/promql.evalRollupFuncWithoutAt()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/promql/eval.go:710 +0xf3a
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/promql.evalRollupFunc()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/promql/eval.go:651 +0x9b7
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/promql.evalExprInternal()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/promql/eval.go:247 +0xc19
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/promql.evalExpr()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/promql/eval.go:226 +0x364
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/promql.Exec()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/promql/exec.go:45 +0x2d2
  github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/prometheus.QueryHandler()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/prometheus/prometheus.go:812 +0xc5d
  main.selectHandler()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/main.go:366 +0x2a29
  main.requestHandler()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/app/vmselect/main.go:252 +0x956
  github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.handlerWrapper()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/lib/httpserver/httpserver.go:334 +0x12ec
  github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver.gzipHandler.func1()
      /Users/bytedance/code.byted.org/infcp/VictoriaMetrics/lib/httpserver/httpserver.go:194 +0x97
  net/http.HandlerFunc.ServeHTTP()
      /usr/local/Cellar/go/1.20.3/libexec/src/net/http/server.go:2122 +0x4d
  net/http.serverHandler.ServeHTTP()
      /usr/local/Cellar/go/1.20.3/libexec/src/net/http/server.go:2936 +0x682
  net/http.(*conn).serve()
      /usr/local/Cellar/go/1.20.3/libexec/src/net/http/server.go:1995 +0xbd4
  net/http.(*Server).Serve.func3()
      /usr/local/Cellar/go/1.20.3/libexec/src/net/http/server.go:3089 +0x58
==================

Version

v1.79.6-cluster.
We run a newer version v1.93.7-cluster with -skipSlowReplicas=true, and met similar problem.

Logs

No response

Screenshots

No response

Used command-line flags

No response

Additional information

Most likely related code

@luckyxiaoqiang luckyxiaoqiang added the bug Something isn't working label Nov 14, 2023
@luckyxiaoqiang luckyxiaoqiang changed the title Panic caused by query tracer data race vmselect panic caused by query tracer data race Nov 14, 2023
f41gh7 added a commit that referenced this issue Nov 14, 2023
it was incorrecly captured by goroutine function during for range loop
#5319
@f41gh7
Copy link
Contributor

f41gh7 commented Nov 14, 2023

Thanks for reporting this issue. Currently tracing package is design for single thread usage. But actually, it maybe called concurrently.

It causes various data races and those specific panic.

I think, it's better to make tracing package thread safe instead of fixing unsafe usage of it at concurrent code. Such as collect results.

@valyala
Copy link
Collaborator

valyala commented Nov 14, 2023

The panic at query tracer should be fixed in the commit 0730c25 . This commit will be included in the next release.

@valyala
Copy link
Collaborator

valyala commented Nov 15, 2023

FYI, the commit has been included in v1.87.11 LTS release. The commit will be included in the new release as well.

@valyala
Copy link
Collaborator

valyala commented Nov 15, 2023

FYI, the bugfix has been included in v1.93.8 LTS release.

@valyala
Copy link
Collaborator

valyala commented Nov 15, 2023

The panic related to enabled query tracing at vmselect has been fixed starting from v1.95.0 release. Closing the issue as fixed then.

@valyala valyala closed this as completed Nov 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants