Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can we clarify why external services spans are not Measured ? #1006

Closed
hexa00 opened this issue Sep 13, 2021 · 9 comments
Closed

Can we clarify why external services spans are not Measured ? #1006

hexa00 opened this issue Sep 13, 2021 · 9 comments

Comments

@hexa00
Copy link
Contributor

hexa00 commented Sep 13, 2021

See this commit : 8fb554f

It mentions "We also don't measure spans that are for integrations for external service clients. This is a very generic class of integrations, and includes any integration for a package that is primarily an API to an external service. This includes SQL integrations, the aws integration, the client portion of the grpc integration, the http client integration, etc. This is actually most of our integrations. The reason for this is that there can be conflicts when measuring client spans that may cause inaccurate metrics."

This is a problem as un-measured spans can't be used in Monitors, thus for example if you want to setup a monitor on the error rate of your ElastiSearch client, well you can't, despite the data showing up on the dashboard.

See:

opts := []ddtrace.StartSpanOption{

@knusbaum Could you please elaborate on what these conflict issues are? As this is quite frustrating :(

I'm open to workaround suggestions too.

Thanks

@knusbaum
Copy link
Contributor

knusbaum commented Sep 14, 2021

Hi, @hexa00.

Sorry for the frustration. The basic reasoning is that metrics today are calculated based on service name and operation name. This can cause clashes when a client and server both use the same operation name.

e.g. net/http server tracing:

span, ctx := tracer.StartSpanFromContext(cfg.Request.Context(), "http.request", opts...)

and net/http client tracing:

span, ctx := tracer.StartSpanFromContext(req.Context(), "http.request", opts...)

The decision was then made at a product level (across all languages) to have client integration spans not be measured.

There is ongoing work to improve this situation, but in the short term we can look at solutions for allowing users to change the default behavior. We already have the Measured StartSpanOption. Any integration which does not already have it can likely be given a WithSpanOptions Option like gorilla/mux and several others, which will allow configuring the integration with the Measured Option.

What do you think?

@hexa00
Copy link
Contributor Author

hexa00 commented Sep 14, 2021

@knusbaum Thanks for the clarification about the conflicts, that makes more sense to me now!

I would be in favor of adding a WithSpanOptions attribute for this use case and likely others.

However I guess this doesn't help in the conflicting situation you described, since we can't override the operation name with an option :(

I think the only way out of this would be to prefix/namespace the operation names so that they reflect their client/server nature... but would likely break a lot of things for people ?

Also another idea I had as a workaround for us is to add the Measured() tag at the agent level so all the spans have it but we're unclear if the agent will assign tags to the spans or only to metrics... would you know about that ?

@knusbaum
Copy link
Contributor

@hexa00
Thanks for the feedback. Let's start with adding the WithSpanOption Option and potentially a span name override if necessary, in case someone is using both client and server and wants to change the operation name of one.

Prefixing the operation names is not a bad idea, but the issue is that they are currently standard across the product (i.e. http.request is the same across Go, Python, PHP, etc.) so we can't unilaterally change them. Also it will likely cause issues for existing users (metrics suddenly have different names than they used to).

Adding the measured tag in the agent would likely result in an explosion of metrics, which may have an effect on billing, IIRC.

More likely, we would do something like the processing pipeline (#360) that would allow users to modify spans/traces as they're completed before they get flushed to the agent.

Please reach out to our support team as well and reference this issue. There is ongoing internal conversation about this issue and getting user input will be very helpful.

@randallt
Copy link

In the Java tracer, incoming HTTP spans typically have the operation name "http.request" and outgoing HTTP calls are typically "http.client.request". Why is it different in different languages?

@knusbaum
Copy link
Contributor

@randallt This is likely a decision that was made a long time ago. The various tracers used to be a lot less cohesive in their implementations and tag choices. We've made a concerted effort to unify them over time, but there are still numerous deviations here and there. I think Python might also use http.client.request in some cases.

http.client.request wouldn't necessarily be a bad choice, but it's not used consistently.

@knusbaum
Copy link
Contributor

CC: @andrewsouthard1
There's interest from multiple customers in this issue, but it requires cross-team planning.

@randallt
Copy link

Yeah, that's regrettable since the Datadog APM UI doesn't let you choose span.kind=client. You only get the operation name and the resource name. So it's important to distinguish incoming vs outgoing using either operation name or resource name.

@knusbaum
Copy link
Contributor

@randallt
Expanding the metrics aggregation key to include a span.kind as well as operation and service is one option being discussed. I think that would largely solve the issue.

@knusbaum knusbaum added the ack label Dec 16, 2021
@knusbaum
Copy link
Contributor

There is a note added to the FAQ about this: https://github.com/DataDog/dd-trace-go/blob/main/FAQ.md#why-do-client-integration-spans-not-use-the-global-service-name

Please re-open this ticket if there is more work we can do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants