
app/vmselect: fixes data race caused by incorrect query tracer usage #5320

Closed
wants to merge 1 commit into cluster from gh-5319

Conversation

@f41gh7 (Contributor) commented Nov 14, 2023

The query tracer child (qtChild) was incorrectly captured by the goroutine closure inside the for-range loop. Fixes #5319.

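For context, this is the classic Go loop-variable capture pitfall: until Go 1.22, a for loop reused a single variable across iterations, so a closure started with go shares it with every other iteration. Below is a minimal sketch of the bug and of the fix pattern applied in this PR (hypothetical names, except workerID, which appears in the hunk further down):

package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		// Buggy (pre-Go 1.22): a closure capturing i shares the single
		// loop variable, so goroutines may all observe its final value:
		//   go func() { fmt.Println(i); wg.Done() }()

		// Fix: pass the value as an argument so each goroutine gets its
		// own copy -- the same pattern as `}(uint(i), qtChild)` here.
		go func(workerID uint) {
			defer wg.Done()
			fmt.Println("worker", workerID)
		}(uint(i))
	}
	wg.Wait()
}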

codecov bot commented Nov 14, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (72a4053) 58.92% compared to head (e2f434f) 58.89%.

Files                                   Patch %   Lines
app/vmselect/netstorage/netstorage.go   0.00%     2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           cluster    #5320      +/-   ##
===========================================
- Coverage    58.92%   58.89%   -0.03%     
===========================================
  Files          405      405              
  Lines        76903    76903              
===========================================
- Hits         45312    45289      -23     
- Misses       29081    29097      +16     
- Partials      2510     2517       +7     


The hunk under review, in app/vmselect/netstorage/netstorage.go:

 		timeseriesWorker(qtChild, workChs, workerID)
 		qtChild.Done()
 		wg.Done()
-	}(uint(i))
+	}(uint(i), qtChild)
A contributor commented:

According to my tests, this does not seem to be the root cause. The problem more likely lies in the following code path:

With skipSlowReplicas set, collectResults does not wait for results from all storage nodes. While the query tracer is being marshaled, results from the remaining (slower) storage nodes can still arrive and update the tracer concurrently, which causes the data race and possibly a panic.
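To make that failure mode concrete, here is a minimal, self-contained sketch (a hypothetical tracer type, not the actual querytracer API): one goroutine stands in for a slow storage node appending to the trace while the caller serializes it. Running it with `go run -race` flags the unsynchronized read in marshal against the concurrent write:

package main

import (
	"strings"
	"sync"
	"time"
)

// tracer is a hypothetical stand-in for the query tracer: events are
// appended concurrently by per-storage-node goroutines.
type tracer struct {
	mu     sync.Mutex
	events []string
}

func (t *tracer) addEvent(s string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.events = append(t.events, s)
}

// marshal reads events without taking the lock, mirroring the suspected
// bug: the trace is serialized while slow replicas may still report.
func (t *tracer) marshal() string {
	return strings.Join(t.events, "\n")
}

func main() {
	t := &tracer{}
	done := make(chan struct{})
	go func() { // a slow storage node reporting after collectResults returned
		time.Sleep(time.Millisecond)
		t.addEvent("slow replica result")
		close(done)
	}()
	_ = t.marshal() // skipSlowReplicas path: serialize before every node finishes
	<-done
}

A proper fix would synchronize the read as well, or guarantee that every child tracer is finished before the trace is serialized; the project's eventual fix is not shown in this thread.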

@f41gh7 (author) replied:

Yes, most likely the marshaling causes this data race. It needs a proper fix.

@f41gh7 closed this Nov 14, 2023
@f41gh7 deleted the gh-5319 branch Nov 14, 2023 13:14