-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hubble: add support for filtering by trace ID #21551
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch @rolinh! One small non-blocking question, and more importantly I believe TraceIDFilter
should be added to the DefaultFilters slice.
20e0217
to
55792ed
Compare
@@ -331,6 +331,7 @@ multiple fields are set, then all fields must match for the filter to match. | |||
| tcp_flags | [TCPFlags](#flow-TCPFlags) | repeated | tcp_flags filters flows based on TCP header flags | | |||
| node_name | [string](#string) | repeated | node_name is a list of patterns to filter on the node name, e.g. "k8s*", "test-cluster/*.domain.com", "cluster-name/" etc. | | |||
| ip_version | [IPVersion](#flow-IPVersion) | repeated | filter based on IP version (ipv4 or ipv6) | | |||
| trace_id | [string](#string) | repeated | trace_id filters flows by trace ID | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know much about the trace id semantics. The only available trace id is the one of the parent. Is it realistic to assume that we eventually support other trace IDs other than the parent? In which case, does this API still make sense if we have multiple trace IDs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good question. The trace ID semantics we support are the ones defined by the W3C Trace Context specification. The parent represents the incoming request in a tracing system and allows to identify the incoming request. The trace ID within the parent is an identifier that is used to uniquely identify a distributed trace. In contrast, the parent ID, which is more commonly known as the span ID, tracks a unit of work or operation. So a trace is made of one or more spans (or parents) that each contain a reference to the same trace ID to identify the trace.
In this model, I don't think there is room for multiple trace IDs (as in, spans/parents that belong to multiple traces). At least, this is my understanding of it. Let's maybe pull in someone more knowledgeable on the matter though 🙂 cc @chancez @lambdanis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting question. If context propagation is done only via traceparent
header following the W3C spec - as @rolinh explained - each span has a single parent and all spans in a trace share the same trace ID. In other words, a trace is a tree of spans with the same trace ID.
That being said, trace context is not the only propagation method. There are a few "legacy" or vendor-specific ones, and there is a (fairly new) W3C Baggage spec, supported by OpenTelemetry. Baggage can be used to pass links to multiple spans instead of just a parent (see e.g. OpenTelemetry overview). A classic use case is batching, where multiple traces converge into a single operation.
So yeah, if we want Hubble to support different propagation methods than trace context, then it's possible to have multiple trace IDs. I'm not sure how widely baggage is used today, but I think it has a future, so I'd consider it realistic we eventually support it.
Regarding semantics of trace ID in flows - it's simply a trace ID that was propagated in the request. Currently we support only a single parent scenario. We may support many parents scenario (batching) or some other relationships, but I think flows filtering would be similar: check if the flow "belongs" to any of the traces, what's defined by the propagated trace IDs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed reply @lambdanis ! I think that the TLDR is that the current filtering implementation is probably good but we may need to revisit it down the road if/when we add support for the baggage spec.
a6b584b
to
6d9efa7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more nit, other than that LGTM
6d9efa7
to
7aa835d
Compare
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
7aa835d
to
095600a
Compare
/test |
Marking as ready to merge as CI is green and all reviews are in except no one could be found to review the API changes apparently (Am I the only one assigned to review API changes?). |
PR #21456 added a trace context to Hubble flows, notably a trace ID field. This PR adds a new flow filter such that clients may filter flows by trace ID.