"context canceled" is Added as a Span Event on cortex.ingester/QueryStream Trace #5702

Open
kennytrytek-wf opened this issue Dec 7, 2023 · 2 comments

kennytrytek-wf commented Dec 7, 2023

Describe the bug
When a QueryStream operation ends because its context was canceled, the "context canceled" error is recorded as a span event on the trace.

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex with tracing enabled.
  2. Run a query that the distributor sends to more than one ingester.

Expected behavior
If an ingester context is canceled during the query (which I understand to be normal operation in Cortex), the operation should result in an OK span status with no span event attached.

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: Helm

Additional Context

The span in question:

spanlog, ctx := spanlogger.New(stream.Context(), "QueryStream")
defer spanlog.Finish()

Similar issue from the past:
#1279

How it was fixed in WeaveWorks:
https://github.com/weaveworks/common/pull/148/files

An example of a failing trace. In this example, there were five parallel query streams, and the one that was canceled was the slowest.
[Image: Screenshot 2023-12-07 at 2 42 26 PM]

And its span event:
[Image: Screenshot 2023-12-07 at 2 42 43 PM]

yeya24 (Contributor) commented Dec 8, 2023

I think the span /cortex.Ingester/QueryStream is instrumented automatically by the gRPC tracing middleware. It does not come from this code path:

spanlog, ctx := spanlogger.New(stream.Context(), "QueryStream")
defer spanlog.Finish()

We would need to change the gRPC middleware library so that it ignores the context canceled error. I am not sure whether that is easy to do.

kennytrytek-wf (Author) commented

As a workaround, I added a transform processor to our Mimir OpenTelemetry collector that watches for this case and sets the span status to OK.

processors:
  transform/cortexquerycontextcanceledspanevent:
    error_mode: ignore
    trace_statements:
      - context: spanevent
        statements:
          - set(span.status.code, 1) where (span.name == "/cortex.Ingester/QueryStream" and attributes["message"] == "context canceled")
