Skip to content

control: forward traces in a non-blocking goroutine #5912

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jsternberg
Copy link
Collaborator

Modify the trace forwarder in the controller to process the traces in
another goroutine to prevent the original client from blocking when the
downstream exporter isn't available.

If buildkit was configured to export traces, it would proxy the traces
from a client. This export was blocking and caused the upstream client
to stall until buildkit had successfully sent the trace to its external
collector.

This changes the forwarding to be non-blocking from the client's
perspective.

Fixes #4616.

Modify the trace forwarder in the controller to process the traces in
another goroutine to prevent the original client from blocking when the
downstream exporter isn't available.

If buildkit was configured to export traces, it would proxy the traces
from a client. This export was blocking and caused the upstream client
to stall until buildkit had successfully sent the trace to its external
collector.

This changes the forwarding to be non-blocking from the client's
perspective.

Signed-off-by: Jonathan A. Sternberg <jonathan.sternberg@docker.com>
@jsternberg jsternberg force-pushed the trace-forwarder-buildx-stall branch from 601dcdc to ff4cde9 Compare April 14, 2025 18:49

func NewExporter(exp sdktrace.SpanExporter) *Exporter {
return &Exporter{
bsp: sdktrace.NewBatchSpanProcessor(exp),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: This seems to drop spans if the processor's queue becomes full. Should we initialize it to block instead in that scenario?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that the dropping will only occur if the export fails which would be the same behavior as a span originating in buildkit itself.

Buildkit is acting as a forwarder for the span from buildx and we don't necessarily want to keep these spans if the upstream collector is down.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the buffer size is 2048 spans, while buildx creates ~1 span per target. We should be safe even if exporting the spans upstream from BuildKit is a bit slow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't think slow is an issue with the processor. I think it's mostly if the export fails where it becomes an issue. If the connection is slow I think it should just wait.

@thompson-shaun thompson-shaun added this to the v0.21.2 milestone May 1, 2025
@thompson-shaun thompson-shaun modified the milestones: v0.21.2, v0.23.0 May 22, 2025
@jsternberg jsternberg modified the milestones: v0.23.0, v0.24.0 Jun 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Buildx will stall when buildkit is configured with OTel but the OTel endpoint isn't available
4 participants