Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental OpenTelemetry Tracing Support #1632

Merged
merged 3 commits into from
May 18, 2022
Merged

Experimental OpenTelemetry Tracing Support #1632

merged 3 commits into from
May 18, 2022

Conversation

moskyb
Copy link
Contributor

@moskyb moskyb commented May 10, 2022

🤔 Problem:

The times, they are a-changing, and so are distributed tracing standards.

The CNCF, in their infinite wisdom, have archived the OpenTracing project, upon which our implementation of tracing in the agent was implemented. We'll continue to support OpenTracing tracing through datadog for the forseeable future, but people are rightly migrating to the newer standard of OpenTelemetry.

OpenTelemetry has a tracing provider that's well-defined, supports multiple backends (Datadog, Jaeger, etc), and is all-around a good and useful piece of software. We should provide functionality within the agent to trace CI runs using OpenTelemetry tracing.

There's a bit of an issue in that while OpenTracing and OpenTelemetry are similar, they're just different enough that it's a bit of a hassle cognitively to have them both in the same place. In this PR you'll see a bunch of stuff where we're doing things with OpenTelemetry spans that are just slightly different to what we'd do with OpenTracing spans. It doesn't hurt (or more accurately, it does hurt) that they share an acronym, and have similar names.

💬 Cool, so what does this PR do?

Building on the work done in #1631, we add a new tracing backend, opentelemetry, and associated utilities in the tracetools and bootstrap packages in order to be able to use them. This new tracing backend is behind an experiment called opentelemetry-tracing, and is not considered stable - we may change the implementation without warning.

I've confirmed that this implementation can egest tracing data into a local Jaeger instance, and that those traces look relatively normal (to my admittedly untrained eye, anyway).

Screen Shot 2022-05-17 at 5 09 18 PM

You can find a JSON dump of the pictured trace here

🙅‍♂️ What doesn't it do?

Notably missing from the OpenTelemetry implementation are equivalents of the methods in tracetools/propagate.go. A large part of this is that after looking into this for quite a while, i'm still not exactly sure of:

  • What these functions actually do in the context of opentracing
  • What the best way to implement them in an OpenTelemetry context is
  • If this functionality gets used

With this in mind, I'm open to bundling this lack of functionality into "well, it's experimental" and only implementing it only if it gets asked for, but this is also kind of a cop out, so if you have more information on this please let me know!

🏴‍☠️ Acknowledgements:

Huge thanks to @rajatvig and his awesome work on #1548 - this PR would be in much, much rougher shape without all their code that i stole inspired me

@KevinGreen
Copy link
Contributor

Hey Ben, this probably isn't the best place to make this comment but I've noticed that rebuilds of the same job will get grouped under the same root span in datadog. This ends up showing the one span as taking as long as the first build + the delay before I click the rebuild button + the second build. It would be nice if there was a way to have a root span per build instead of per pipeline+sha

@moskyb
Copy link
Contributor Author

moskyb commented May 15, 2022

@KevinGreen this is definitely an issue that i'm interested in fixing - could you open an issue on the repo and include a screenshot or two so that i know what i'm looking for?

@moskyb moskyb force-pushed the modular-tracing branch 2 times, most recently from 7c879d1 to fec0ca9 Compare May 16, 2022 23:23
Base automatically changed from modular-tracing to main May 17, 2022 03:04
@moskyb moskyb changed the title WIP AGAIN: Opentelemetry Experimental OpenTelemetry Tracing Support May 17, 2022
@moskyb moskyb requested review from tessereth and pda May 17, 2022 05:20
@moskyb moskyb marked this pull request as ready for review May 17, 2022 05:20
Copy link
Contributor

@tessereth tessereth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I'm so excited this is happening.

I don't suppose there's anything in tracing.go that's worth unit testing? I guess there aren't that many branches so if it works at all it's probably all working. /shrug

bootstrap/tracing.go Outdated Show resolved Hide resolved
bootstrap/tracing.go Outdated Show resolved Hide resolved
tracetools/span.go Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants