Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(otel): use BatchQueue to avoid worker-level table data race #9504

Merged
merged 4 commits into from
Oct 28, 2022

Conversation

mayocream
Copy link
Contributor

@mayocream mayocream commented Oct 8, 2022

Summary

Fix an issue where OpenTelemetry spans are randomly stripped and not sent to the backend.

@mayocream mayocream requested a review from a team as a code owner October 8, 2022 06:07
@mayocream mayocream added this to the 3.0.1 milestone Oct 8, 2022
@sumimakito sumimakito changed the title fix(otel) spans are ramdomly stripped and not sent to the backend fix(otel) spans are randomly stripped and not sent to the backend Oct 8, 2022
@mayocream
Copy link
Contributor Author

Oops, thanks for fixing the title typo @sumimakito (*´∇`*)

@chronolaw chronolaw changed the title fix(otel) spans are randomly stripped and not sent to the backend fix(otel): spans are randomly stripped and not sent to the backend Oct 10, 2022

spans_queue[len] = pb_span
spans_queue.n = len
insert(spans_queue, pb_span)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be slow as table length is actually relatively expensive to calculate in Lua. It is common to store the length of the array in spans_queue[0] and then do the previous logic instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noticing, used BatchQueue to replace the worker-level cache table. I have an idea that only the privileged worker handles the queue and sends requests; the other workers insert data into the queue through the new events library. That could reduce the performance impact on normal workers.

@fffonion
Copy link
Contributor

Discussed with @mayocream we will need more work to properly fix this issue, thus moving out of 3.01 scope

@fffonion fffonion removed this from the 3.0.1 milestone Oct 19, 2022
@mayocream mayocream changed the title fix(otel): spans are randomly stripped and not sent to the backend fix(otel): use BatchQueue to avoid worker-level table data race Oct 26, 2022
@hbagdi hbagdi requested review from dndx and fffonion October 26, 2022 17:12
@fffonion fffonion merged commit 8298788 into master Oct 28, 2022
@fffonion fffonion deleted the fix/otel-data-race branch October 28, 2022 06:39
mayocream added a commit that referenced this pull request Nov 2, 2022
fffonion pushed a commit that referenced this pull request Nov 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants