
app/vmselect: fixes data race caused by incorrect query tracer usage #5320

Closed
wants to merge 1 commit into cluster from gh-5319

Conversation

@f41gh7 (Contributor) commented Nov 14, 2023

The query tracer child (qtChild) was incorrectly captured by the goroutine closure inside the for-range loop. Fixes #5319.

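For context, this is the classic Go loop-variable capture pitfall: until Go 1.22, a for loop reused a single variable across iterations, so a closure started with go shares it with every other iteration. Below is a minimal sketch of the bug and of the fix pattern applied in this PR (hypothetical names, except workerID, which appears in the hunk further down):

package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		// Buggy (pre-Go 1.22): a closure capturing i shares the single
		// loop variable, so goroutines may all observe its final value:
		//   go func() { fmt.Println(i); wg.Done() }()

		// Fix: pass the value as an argument so each goroutine gets its
		// own copy -- the same pattern as `}(uint(i), qtChild)` here.
		go func(workerID uint) {
			defer wg.Done()
			fmt.Println("worker", workerID)
		}(uint(i))
	}
	wg.Wait()
}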

codecov bot commented Nov 14, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (72a4053) 58.92% compared to head (e2f434f) 58.89%.

Files                                   Patch %   Lines
app/vmselect/netstorage/netstorage.go   0.00%     2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           cluster    #5320      +/-   ##
===========================================
- Coverage    58.92%   58.89%   -0.03%     
===========================================
  Files          405      405              
  Lines        76903    76903              
===========================================
- Hits         45312    45289      -23     
- Misses       29081    29097      +16     
- Partials      2510     2517       +7     


The hunk under review, in app/vmselect/netstorage/netstorage.go:

 		timeseriesWorker(qtChild, workChs, workerID)
 		qtChild.Done()
 		wg.Done()
-	}(uint(i))
+	}(uint(i), qtChild)
A contributor commented:

According to my tests, this does not seem to be the root cause. The problem more likely lies in the following code path:

With skipSlowReplicas set, collectResults does not wait for results from all storage nodes. While the query tracer is being marshaled, results from the remaining (slower) storage nodes can still arrive and update the tracer concurrently, which causes the data race and possibly a panic.
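To make that failure mode concrete, here is a minimal, self-contained sketch (a hypothetical tracer type, not the actual querytracer API): one goroutine stands in for a slow storage node appending to the trace while the caller serializes it. Running it with `go run -race` flags the unsynchronized read in marshal against the concurrent write:

package main

import (
	"strings"
	"sync"
	"time"
)

// tracer is a hypothetical stand-in for the query tracer: events are
// appended concurrently by per-storage-node goroutines.
type tracer struct {
	mu     sync.Mutex
	events []string
}

func (t *tracer) addEvent(s string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.events = append(t.events, s)
}

// marshal reads events without taking the lock, mirroring the suspected
// bug: the trace is serialized while slow replicas may still report.
func (t *tracer) marshal() string {
	return strings.Join(t.events, "\n")
}

func main() {
	t := &tracer{}
	done := make(chan struct{})
	go func() { // a slow storage node reporting after collectResults returned
		time.Sleep(time.Millisecond)
		t.addEvent("slow replica result")
		close(done)
	}()
	_ = t.marshal() // skipSlowReplicas path: serialize before every node finishes
	<-done
}

A proper fix would synchronize the read as well, or guarantee that every child tracer is finished before the trace is serialized; the project's eventual fix is not shown in this thread.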

@f41gh7 (author) replied:

Yes, most likely the marshaling causes this data race. It needs a proper fix.

@f41gh7 closed this Nov 14, 2023
@f41gh7 deleted the gh-5319 branch Nov 14, 2023 13:14