
[PROF-9263] Add experimental support for profiling code hotspots when used with opentelemetry ruby gem #3510

Draft · wants to merge 7 commits into master
Conversation

@ivoanjo (Member) commented on Mar 6, 2024

What does this PR do?

This PR adds experimental support for getting profiling code hotspots data (including endpoint profiling) when profiling processes being traced using the opentelemetry ruby gem directly.

Note that this differs from the recommended way of using opentelemetry with the ddtrace library, which is to follow the instructions from https://docs.datadoghq.com/tracing/trace_collection/custom_instrumentation/otel_instrumentation/ruby/ .

The key difference is that this PR makes code hotspots work even for setups that opt not to use require 'datadog/opentelemetry' (which is the recommended and easier way); that recommended setup is sketched below for reference.
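The recommended path boils down to loading ddtrace's OpenTelemetry interop, which routes opentelemetry API calls through the Datadog tracer (see the docs linked above for the full instructions):

# Recommended setup: with this require in place, code hotspots work out of the
# box. This PR targets apps that skip it and use the opentelemetry gem directly.
require 'datadog/opentelemetry'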

The approach taken here is similar to #2342 and #3466: we peek inside the implementation of the opentelemetry gem to extract the information we need (namely the span id, local root span id, trace type, and trace endpoint). This approach is potentially brittle, which is why the code is written very defensively, with the aim of never breaking the application (or profiling) if something is off -- it just won't collect code hotspots.
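To make the "defensive" style concrete, here's a minimal Ruby sketch of the idea (illustrative only; the method name is made up, and the PR itself reads these values from the profiler's native sampling code rather than from Ruby):

# Illustrative sketch of defensively peeking at opentelemetry span data.
def otel_span_identifiers
  span = OpenTelemetry::Trace.current_span
  span_context = span && span.context
  # With no active span, current_span returns an invalid placeholder; bail out quietly.
  return nil unless span_context && span_context.valid?

  { span_id: span_context.hex_span_id, trace_id: span_context.hex_trace_id }
rescue StandardError
  nil # if the gem's internals change, degrade to "no code hotspots", never crash
end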

Motivation:

We have a customer interested in running this setup, so hopefully they'll be able to test this PR and validate whether it works for them.

Furthermore, I'm hoping to see if the opentelemetry Ruby folks would be open to tweaking their APIs to be friendlier to tools such as the profiler, but for now I opted for getting our hands dirty.

Additional Notes:

I'm opening this PR as draft until we can get feedback from the customer and see if this works for them.

How to test the change?

On top of the added test coverage, I was able to see code hotspots working for the following sinatra example app:

require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'

  gem 'rackup'
  gem 'dogstatsd-ruby'
  gem 'ddtrace', git: 'https://github.com/datadog/dd-trace-rb', branch: 'ivoanjo/prof-9263-otlp-ruby-code-hotspots'
  gem 'sinatra'
  gem 'opentelemetry-api'
  gem 'opentelemetry-sdk'
  gem 'opentelemetry-instrumentation-sinatra'
  gem 'opentelemetry-exporter-otlp'
  gem 'pry'
end

require 'sinatra/base'
require 'opentelemetry/sdk'
require 'pry'

Datadog.configure do |c|
  c.service = 'ivoanjo-testing-opentelemetry-test'
  c.profiling.enabled = true
end

# Configure OpenTelemetry
OpenTelemetry::SDK.configure do |c|
  c.service_name = 'ivoanjo-testing-opentelemetry-test'
  c.use 'OpenTelemetry::Instrumentation::Sinatra'
end

class MyApp < Sinatra::Base
  get '/' do
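    # Tagging the span with this process's runtime id lets the backend
    # associate this trace with the profiles uploaded by this process.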
    OpenTelemetry::Trace.current_span.add_attributes({'runtime-id' => Datadog::Core::Environment::Identity.id})
    sleep 1
    'Hello, OpenTelemetry!'
  end
end

MyApp.run!

After doing a few requests, here's how this looks:

[screenshots: code hotspots data for the example app shown in the profiler UI]

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

…tel gem

Things missing:

* Specs conflict with ddtrace otel specs (need to poke at appraisals)
* Missing endpoint support: while we don't need the actual span object to read the span ids, we will need it to read the endpoint names.

… specs

I'm... unhappy about this, but couldn't think of anything better that wouldn't involve refactoring the ddtrace tracing otel support, and that seems even worse.
@ivoanjo requested review from a team as code owners on March 6, 2024 15:06
@ivoanjo marked this pull request as draft on March 6, 2024 15:06
@github-actions bot added the profiling (Involves Datadog profiling) label on Mar 6, 2024
@AlexJF (Contributor) left a comment:

LGTM! And so safe you should be designing baby furniture 😄

@@ -734,6 +765,11 @@ static void trigger_sample_for_thread(
struct trace_identifiers trace_identifiers_result = {.valid = false, .trace_endpoint = Qnil};
trace_identifiers_for(state, thread, &trace_identifiers_result);

if (!trace_identifiers_result.valid) {
@AlexJF (Contributor) commented on Mar 6, 2024:

Worth/possible doing a bit of extra work at the start to arrive at a sticky decision here, and short-circuit the constantly-failing trace_identifiers_for with all its rb_ivar_get calls, if ddtrace is not used for tracing at all?

Or is the thinking that we want to support situations where there's a mix of ddtrace and pure-ot traces and/or the ability to change between one and the other dynamically (e.g. via a feature flag)?

@ivoanjo (Member, Author) replied:

I think your suggestion makes sense.

My intent here in checking both is that the profiler may start quite early in the app lifecycle, so we may not know which one is going to be used yet.

> Or is the thinking that we want to support situations where there's a mix of ddtrace and pure-ot traces and/or the ability to change between one and the other dynamically (e.g. via a feature flag)?

I'm not sure mixing is even possible at this point, since the ddtrace otel support monkey patches itself pretty deep into opentelemetry (which is why I needed to contort a bit to be able to test both).

For that reason, and after our last discussion, I think it makes sense to stop checking opentelemetry once we see data coming from ddtrace traces.

The reverse is harder to figure out, actually. It would be weird, but not impossible, for an app that started with opentelemetry to then switch over to ddtrace.


TL;DR: I'll wait for feedback from our customer on how this is working before acting on this comment, just in case we end up going in a completely different direction BUT I'll definitely come back to it before marking the PR as non-draft.
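For illustration, here's a hypothetical sketch of the sticky-decision idea from this thread (the class and helper names are made up, and the real check would live in the profiler's native sampling code):

# Hypothetical sketch: once ddtrace trace data is observed, stop probing
# opentelemetry internals on every sample.
class TraceSourceDetector
  def initialize
    @seen_ddtrace = false
  end

  # ddtrace_identifiers_for and otel_identifiers_for are stand-ins for the
  # per-tracer lookups discussed above.
  def identifiers_for(thread)
    result = ddtrace_identifiers_for(thread)
    if result
      @seen_ddtrace = true # sticky: ddtrace monkey patches otel, so mixing is unlikely
      return result
    end
    return nil if @seen_ddtrace # short-circuit the otel introspection

    otel_identifiers_for(thread)
  end
end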

Labels: profiling (Involves Datadog profiling)
2 participants