Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

app/vmselect: fixes datarace caused by incorrect query tracer usage #5320

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
4 changes: 2 additions & 2 deletions app/vmselect/netstorage/netstorage.go
Original file line number Diff line number Diff line change
Expand Up @@ -353,11 +353,11 @@
for i := range workChs {
wg.Add(1)
qtChild := qt.NewChild("worker #%d", i)
go func(workerID uint) {
go func(workerID uint, qtChild *querytracer.Tracer) {

Check warning on line 356 in app/vmselect/netstorage/netstorage.go

View check run for this annotation

Codecov / codecov/patch

app/vmselect/netstorage/netstorage.go#L356

Added line #L356 was not covered by tests
timeseriesWorker(qtChild, workChs, workerID)
qtChild.Done()
wg.Done()
}(uint(i))
}(uint(i), qtChild)

Check warning on line 360 in app/vmselect/netstorage/netstorage.go

View check run for this annotation

Codecov / codecov/patch

app/vmselect/netstorage/netstorage.go#L360

Added line #L360 was not covered by tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to my tests, it seems that not the root cause.
Most likely related code maybe:

With skipSlowReplicas set, collectResults will not wait all storage node result. When marshaling the query tracer, the later storage result will be received and update query tracer concurrently, which caused data race, and maybe panic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, marshaling causes this data race most likely. Need to properly fix it.

}
wg.Wait()

Expand Down
1 change: 1 addition & 0 deletions docs/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,7 @@ The sandbox cluster installation is running under the constant load generated by
* BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): properly decode Snappy-encoded data blocks received via [VictoriaMetrics remote_write protocol](https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5301).
* BUGFIX: [vmstorage](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html): prevent deleted series to be searchable via `/api/v1/series` API if they were re-ingested with staleness markers. This situation could happen if user deletes the series from the target and from VM, and then vmagent sends stale markers for absent series. Thanks to @ilyatrefilov for the [issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5069) and [pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5174).
* BUGFIX: [vmstorage](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html): log warning about switching to ReadOnly mode only on state change. Before, vmstorage would log this warning every 1s. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5159) for details.
* BUGFIX: [vmselect](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html): fixes panic during requests with enabled tracing. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5319) for details.
* BUGFIX: [vmauth](https://docs.victoriametrics.com/vmauth.html): show browser authorization window for unauthorized requests to unsupported paths if the `unauthorized_user` section is specified. This allows properly authorizing the user. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5236) for details.
* BUGFIX: [vmauth](https://docs.victoriametrics.com/vmauth.html): properly proxy requests to HTTP/2.0 backends and properly pass `Host` header to backends.
* BUGFIX: [vmui](https://docs.victoriametrics.com/#vmui): fix the `Disable cache` toggle at `JSON` and `Table` views. Previously response caching was always enabled and couldn't be disabled at these views.
Expand Down