Automatically send traces to agent Unix socket if present #1700

marcotc · 2021-09-27T23:00:48Z

When ddtrace transport options are not provided, this PR now automatically connects ddtrace to an existing agent unix socket if one is present at /var/run/datadog/apm.socket, the default path for the Datadog agent Kubernetes Helm chart deployment. If such file doesn't exist, it falls back to the default host:port connection method.

This change has no impact on users that have any transport option configured, regardless of their source (env var, c.hostname = ..., c.tracer.transport_options = proc { ...), neither it affects clients where the file /var/run/datadog/apm.socket is not present in the local file system.

This change should ease onboarding for Kubernetes workflows, and bring ddtrace in line with tracers from other APM languages.

marcotc · 2021-09-27T23:04:38Z

docker-compose.yml

  ddagent:
    image: datadog/agent
    environment:
      - DD_APM_ENABLED=true
      - DD_BIND_HOST=0.0.0.0
-      - DD_API_KEY=00000000000000000000000000000000
+      - "DD_API_KEY=${DD_API_KEY}"


I've parametrized DD_API_KEY to make it easier to submit real data to our backend if convenient for testing (like it was for me).

marcotc · 2021-09-27T23:05:50Z

docker-compose.yml

  ddagent:
    image: datadog/agent
    environment:
      - DD_APM_ENABLED=true
      - DD_BIND_HOST=0.0.0.0
-      - DD_API_KEY=00000000000000000000000000000000
+      - "DD_API_KEY=${DD_API_KEY}"
+      - DD_APM_RECEIVER_SOCKET=/var/run/datadog/apm.socket


DD_APM_RECEIVER_SOCKET is the environment variable used by our Kubernetes Heml chart to expose the agent Unix socket: https://github.com/DataDog/helm-charts/blob/4ea229d38cdf27e6c43bce012f776c6e4deeba10/charts/datadog/templates/_container-trace-agent.yaml#L41-L42

marcotc · 2021-09-27T23:11:51Z

lib/ddtrace/ext/transport.rb

@@ -3,6 +3,7 @@ module Datadog
  module Ext
    module Transport
      module HTTP
+        ADAPTER = :net_http


Because I had to add a few new constants here, I ended up performing a small refactor around transport constants, as you'll see in the files to follow.

I'm not entirely sure it's in our interest to expand the number of places where we use "net_http" explicitly. Should we just call this "http", to leave some space when in the future we want to move away from "net_http"?

It seems a bit too low-level for most customers to be able to specify this; for instance for "test" and "unix" we only specify the method the connection will use, not really the implementation.

I think it makes sense to have it as only :http, but we can't really change it at this time because :net_http is publicly documented: https://github.com/DataDog/dd-trace-rb/blob/master/docs/GettingStarted.md#using-the-nethttp-adapter

We could have alternative http implementations (we had an Ethon implementation, for example), thus they might have more implementation specific names, but none today.

I'll document the desire to change :net_http to just :net here.

Right! I think I was leaving out a bit too much detail on my description; let me share a bit more of my thinking, as I think it factors in some of my other feedback that you replied to as well.

I think the transport options proc really needs to go. Because it's effectively a bit of code that can do anything at all, it makes it really hard for us to know what's up and evolve the configuration. I think the direction we want to evolve in is one where those options are provided in the settings object, and the AgentSettingsResolver is able to convert everything into the resulting AgentSettings object.

Of course, until we actually do such a step, we'll need to still support the current public APIs and ways that people use to configure the gem.

BUT, and here is where some of my suggestions come in, nothing really forces us to use the "deprecated way" of doing things as the primary internal representation for them. E.g.

We may support people using t.adapter :net_http in a proc, but just alias to :http

We may change some of the APIs to use AgentSettings objects directly, as long as they still support the "old" way of doing it as well

The advantage of doing this is that we push the "deprecated way" further and further to what we do and how we think about it, rather than considering it as set in stone and building around it. Conceptually think of it as reimplementing stuff from scratch to use the new way only AND THEN adding support for the deprecated way on the side VS leaving the deprecated way as the main way AND THEN adding the new way on the side :)

Now, of course we all have dreams and aspirations for the codebase, so feel free to draw a line and declare "this is how far I want to get in this PR, and we'll see about the rest in the future". I'm just sharing this to contextualize how I'm thinking about these things.

We could have alternative http implementations (we had an Ethon implementation, for example), thus they might have more implementation specific names, but none today.

On this specific bit, I would represent ethon to customers as an extra option to the http adapter, not as an http adapter as itself. E.g. I envision that customers ask for http, and we pick what's the best http option, and that at some point may be ethon.

Otherwise we create this situation where customers need to both upgrade AND THEN review their configuration regularly in order to take advantage of improvements; whereas I think a model where they upgrade and automatically get the latest and greatest ™️ provides a better experience.

On this specific bit, I would represent ethon to customers as an extra option to the http adapter, not as an http adapter as itself. E.g. I envision that customers ask for http, and we pick what's the best http option, and that at some point may be ethon.

Otherwise we create this situation where customers need to both upgrade AND THEN review their configuration regularly in order to take advantage of improvements; whereas I think a model where they upgrade and automatically get the latest and greatest ™️ provides a better experience.

Totally agree, this is what I was thinking: using simply :http instead of :net_http, regardless of the underlying implementation.

We may support people using t.adapter :net_http in a proc, but just alias to :http

That seems reasonable. I'm not sure it has to be done here. I'm thinking we can rip off the band-aid and rename :net_http to :http in 1.0 and that's it. It's a small API change, and it's not very common to have a custom proc block for the HTTP transport configuration, it mostly happens for unix or testing adapters, so I think the affected users will be small.

We may change some of the APIs to use AgentSettings objects directly, as long as they still support the "old" way of doing it as well

I did some of this refactor, take a look!

marcotc · 2021-09-27T23:13:56Z

lib/ddtrace/profiling/transport/http.rb

+            hostname: agent_settings.hostname,
+            port: agent_settings.port,


All adapter options became keyword arguments (instead of positional) to allow for different adapters to be invoked by the same agent_settings object.

If we're going to change the adapters to receive these arguments in a different way, should we just pass in the whole agent_settings object?

That was somewhat my intention when I started moving in this direction, although I decided to take a smaller step, as the introduction of the agent_settings was already quite gnarly by itself.

I think it would also allow us to simplify the tests, for instance the net_http ones where we needed to add uds_path and filepath references because the class needs to receive, and then ignore then.

should we just pass in the whole agent_settings object?

The adapters' initializers are indirectly public API: https://github.com/DataDog/dd-trace-rb/blob/master/docs/GettingStarted.md#using-the-nethttp-adapter

For example, whatever is passed after :net_http in t.adapter :net_http, '127.0.0.1', 8126, { timeout: 1 } is passed to the initializer as is.
I don't see value in wrapping it in an AgentSettings object in such scenario, unless you think otherwise.

for instance the net_http ones where we needed to add uds_path and filepath references because the class needs to receive, and then ignore then.

I can't find a case like this in the PR, what part exactly are you referring to?

The adapters' initializers are indirectly public API

Yup, I'm aware. See my TL;DR above where I "share a bit more of my thinking". My suggestion here would be kind of the next step in the agent settings, which is to make them go all the way to the transport.

I'm aware that we would still need to support for a time whatever arguments are allowed to be called from the transport proc.

I can't find a case like this in the PR, what part exactly are you referring to?

I mean that in the profiling/transport/http_spec.rb and transport/http_spec.rb files, we needed to add uds_path to specs (or parts of specs) that otherwise are http-specific.

Done! Take a look at see if you like the changes. AgentSettings now goes all the way to the adapters.

marcotc · 2021-09-27T23:14:33Z

lib/ddtrace/transport/http/builder.rb

@@ -33,12 +33,12 @@ def initialize
          yield(self) if block_given?
        end

-        def adapter(type, *args)
+        def adapter(type, *args, **kwargs)


Ruby 3 compatibility.

codecov-commenter · 2021-09-27T23:49:07Z

Codecov Report

Merging #1700 (3742fd7) into master (b1bd0c0) will decrease coverage by 0.01%.
The diff coverage is 94.31%.

@@            Coverage Diff             @@
##           master    #1700      +/-   ##
==========================================
- Coverage   98.31%   98.29%   -0.02%     
==========================================
  Files         928      928              
  Lines       44803    44896      +93     
==========================================
+ Hits        44049    44132      +83     
- Misses        754      764      +10

Impacted Files	Coverage Δ
...pec/ddtrace/diagnostics/environment_logger_spec.rb	`100.00% <ø> (ø)`
spec/ddtrace/integration_spec.rb	`97.16% <70.00%> (-1.51%)`	⬇️
spec/ddtrace/transport/http_spec.rb	`98.02% <76.92%> (-1.98%)`	⬇️
lib/ddtrace/transport/http/adapters/unix_socket.rb	`97.29% <91.66%> (-2.71%)`	⬇️
...b/ddtrace/configuration/agent_settings_resolver.rb	`100.00% <100.00%> (ø)`
lib/ddtrace/ext/transport.rb	`100.00% <100.00%> (ø)`
lib/ddtrace/profiling/transport/http.rb	`100.00% <100.00%> (ø)`
lib/ddtrace/transport/http.rb	`100.00% <100.00%> (ø)`
lib/ddtrace/transport/http/adapters/net.rb	`97.14% <100.00%> (+0.08%)`	⬆️
lib/ddtrace/transport/http/adapters/test.rb	`97.56% <100.00%> (ø)`
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b1bd0c0...3742fd7. Read the comment docs.

ivoanjo

Thanks for this! Turn-key onboarding experience is THE BEST :)

I left some comments, especially around how we represent this internally in the AgentSettingsResolver, but I really like how you were able to slice in this extra useful behavior with minimal impact outside of that class.

ivoanjo · 2021-09-28T08:05:36Z

lib/ddtrace/configuration/agent_settings_resolver.rb

+        # If no agent settings have been provided, we try to connect using a local unix socket.
+        # We only do so if the socket is present when `ddtrace` runs.
+        #
+        # Note we still haven't exposed a way to to configure unix socket connectivity using
+        # environment variables, only using the custom configuration proc.
+        adapter_ = if default_settings && (uds_path_ = uds_path)
+                     Ext::Transport::UnixSocket::ADAPTER
+                   else
+                     # We can't invoke private methods with `self.method` in Ruby < 2.7, thus
+                     # the local variable has to be renamed to avoid conflict.
+                     adapter
+                   end


Minor: Have you considered moving this inside #adapter? Since #uds_path takes care of memoizing its result, I don't think there's a problem that #adapter will need to call #uds_path inside itself, and then we need to call it here again.

I read AgentSettingsResolver as the authoritative entity for deciding how to connect to the agent, regardless of product. I was thinking if Security wants to submit data through the agent, it would need this same information and would need to add code around if uds_path this, else that, and I thought this decision felt like it was leaking out of AgentSettingsResolver.

In other words, I see the AgentSettingsResolver as the encapsulation of all the complexity around figuring out how to connect to the agent, thus I see choosing the adapter as part of it.
The main reason for that is that whether you'll have to use uds_path or hostname can be decided by the AgentSettingsResolver.

I understand that the Adapter is actually tied today to implementation (:net_http actually means Net::HTTP-backed today), which I don't think is the right level of abstraction, but returning a value that lets the caller know "you should use TCP+HTTP for this configuration" seems reasonable to me.

I could change it from adapter to protocol, would that make sense to you? Then there's additional logic in each AgentSettingsResolver to figure out what adapter to use for the required protocol.

Ah, wait, I think my suggestion was misunderstood. I fully agree with all that you mentioned :)

I was mainly referring to code organization inside the agent_settings_resolver.rb file: I was saying that inside the #call method we could just call to an #adapter method, and have this logic there, so that the #call method can be 100% focused on just creating the AgentSettings, and leaves all of the logic to its helper methods.

I think that otherwise calling it adapter is fine.

Refactor done!

lib/ddtrace/configuration/agent_settings_resolver.rb

ivoanjo · 2021-09-28T08:36:20Z

lib/ddtrace/ext/transport.rb

@@ -3,6 +3,7 @@ module Datadog
  module Ext
    module Transport
      module HTTP
+        ADAPTER = :net_http


I'm not entirely sure it's in our interest to expand the number of places where we use "net_http" explicitly. Should we just call this "http", to leave some space when in the future we want to move away from "net_http"?

It seems a bit too low-level for most customers to be able to specify this; for instance for "test" and "unix" we only specify the method the connection will use, not really the implementation.

ivoanjo · 2021-09-28T08:40:54Z

lib/ddtrace/profiling/transport/http.rb

+            hostname: agent_settings.hostname,
+            port: agent_settings.port,


If we're going to change the adapters to receive these arguments in a different way, should we just pass in the whole agent_settings object?

That was somewhat my intention when I started moving in this direction, although I decided to take a smaller step, as the introduction of the agent_settings was already quite gnarly by itself.

I think it would also allow us to simplify the tests, for instance the net_http ones where we needed to add uds_path and filepath references because the class needs to receive, and then ignore then.

lib/ddtrace/configuration/agent_settings_resolver.rb

ivoanjo · 2021-09-28T08:46:13Z

lib/ddtrace/profiling/transport/http.rb

        private_class_method def self.default_adapter
-          :net_http
+          Datadog::Ext::Transport::HTTP::ADAPTER
        end


I suggest removing this, now that the agent settings carry the adapter inside. (We may not be able to remove this for the tracer http transport, I wonder if that API is expected to be public, but the profiler one is fully NOT public so remove ahoy :D)

You asked me to possibly remove the adapter form the agent settings, but this suggestion contradicts that. I'll wait for a response on the original adapter question before following up on this.

Hopefully I clarified above what I meant and why I believe this can be removed :)

Now that I refactor all adapters to use agent settings, I'm actually confused in a different way on how to remove this method.

This default_adapter is only used in configure_for_agentless, and configure_for_agentless does not build its settings using a AgentSettings, likely because AgentSettings currently is only built from a Datadog::Configuration::Settings-like object and not a site_uri.

Are you suggesting I add a new data source for AgentSettings, based on the profiler's site_uri?

Are you suggesting I add a new data source for AgentSettings, based on the profiler's site_uri?

Ah, no. For the configure_for_agent case, we can use the AgentSettings as you pointed out. For the configure_for_agentless, it doesn't make sense for agentless to be anything other than "talk to the backend directly via http", otherwise, and thus we can effectively hardcode it to use the http adapter (e.g. inline the constant in that method).

The agentless mode is always going to be a bit weird; and I don't think it's worth trying to fit it into the AgentSettings object model since it's a profiler-specific thing.

ivoanjo · 2021-09-28T08:52:25Z

lib/ddtrace/transport/http/adapters/unix_socket.rb

+          # @deprecated Positional parameters are deprecated. Use named parameters instead.
+          def initialize(filepath = nil, **options)
+            @filepath = filepath || options[:filepath]
+            @timeout = options.fetch(:timeout, Ext::Transport::UnixSocket::DEFAULT_TIMEOUT_SECONDS)


For compatibility, we'll need to keep the default here, but I think we should also modify AgentSettings#timeout to use the correct default timeout (http vs uds). Right now, both are set at 1, BUT if we do change the default uds timeout, it won't be applied when the "default unix socket" is used (the http one will be used instead).

I'm thinking of removing the default value for time timeout and return nil instead, unless specified by the user. This will allow the adapter to decide its own default.

I could figure out the timeout on a per-protocol basis in the agent settings resolver, but I think that will leak implementation data into the resolver class.

What do you think of letting the adapter decide the timeout when not specified?

What do you think of letting the adapter decide the timeout when not specified?

Works for me, I don't have strong feelings either way, happy with either option.

Done! The correct default timeout will now be filled out by the adapter implementation.

ivoanjo

I think we're almost there! I think the two remaining items to settle on are:

Use of filepath as an argument name
Pass agent_settings all the way to the adapters or not

...since these are the two items that are most visible in the API.

Let me know your thoughts/decision (which can just be "they're fine as-is right now") and I'm ready to press approve ;)

docs/GettingStarted.md

ivoanjo · 2021-09-29T08:26:56Z

lib/ddtrace/profiling/transport/http.rb

        private_class_method def self.default_adapter
-          :net_http
+          Datadog::Ext::Transport::HTTP::ADAPTER
        end


Hopefully I clarified above what I meant and why I believe this can be removed :)

ivoanjo · 2021-09-29T08:29:12Z

lib/ddtrace/transport/http/adapters/net.rb

+            @hostname = hostname || options[:hostname]
+            @port = port || options[:port]


Minor: Should we use options.fetch(:hostname) and options.fetch(:port) here?

(same note for similar changes in other adapters; this will not apply if we end up just passing the agent settings object directly)

lib/ddtrace/transport/http/adapters/unix_socket.rb

ivoanjo · 2021-09-29T08:39:59Z

spec/ddtrace/configuration/agent_settings_resolver_spec.rb

-  let(:default_settings) do
+  let(:settings) do
    {
+      adapter: adapter,
      ssl: false,
-      hostname: '127.0.0.1',
-      port: 8126,
+      hostname: hostname,
+      port: port,
+      uds_path: uds_path,
      timeout_seconds: 1,
      deprecated_for_removal_transport_configuration_proc: nil,
-      deprecated_for_removal_transport_configuration_options: nil
+      deprecated_for_removal_transport_configuration_options: nil,
    }
  end


I can live with, but I don't quite like this change. The specs below all had the pattern of "when you do this, then the outcome is default settings + your change". With this change, it means that the settings may have been changed at another level, which I think is a bit less readable.

My alternative suggestion would be to have a uds_settings let and use that in the uds tests.

ivoanjo · 2021-09-29T08:42:32Z

spec/ddtrace/configuration/agent_settings_resolver_spec.rb

        it 'falls back to the defaults' do
-          expect(resolver).to have_attributes default_settings
+          expect(resolver).to have_attributes settings
        end
      end


Minor: we could add a test here that checked that the uds socket was still used if it exists, since in this branch the custom config was invalid

The unix socket won't be used here, as DD_TRACE_AGENT_PORT was explicitly provided. We don't want to override use configuration, even if invalid: we want to have the tracer behave as instructed given the provided settings. Of course in this case an InvalidArgument will be raised, but that will give the user a clear message on what's the issue in their configuration.

spec/ddtrace/configuration/agent_settings_resolver_spec.rb

spec/ddtrace/transport/http_spec.rb

Co-authored-by: Ivo Anjo <ivo.anjo@datadoghq.com>

lib/ddtrace/configuration/agent_settings_resolver.rb

spec/ddtrace/transport/http_spec.rb

ivoanjo

LGTM 👍

Thanks for the great improvement + clean up :)

spec/ddtrace/transport/http/builder_spec.rb

Automatically send traces to agent Unix socket if present

b94e347

marcotc added the core Involves Datadog core libraries label Sep 27, 2021

marcotc self-assigned this Sep 27, 2021

marcotc requested a review from a team September 27, 2021 23:00

marcotc commented Sep 27, 2021

View reviewed changes

Fix Ruby 3 test arguments

33c08a0

ivoanjo reviewed Sep 28, 2021

View reviewed changes

marcotc added 3 commits September 28, 2021 14:33

Refator UDS in agent resolver

ecab2ea

Removed file by mistake

1f5d664

Document :net_http being implementation detail

babe821

marcotc requested a review from ivoanjo September 28, 2021 22:00

ivoanjo reviewed Sep 29, 2021

View reviewed changes

marcotc and others added 7 commits September 29, 2021 15:04

Propagate AgentSettings to adapters

5893709

Update docs/GettingStarted.md

2aad93c

Co-authored-by: Ivo Anjo <ivo.anjo@datadoghq.com>

Refactor adapter in agent settings

a7707d2

Rename filename to uds_path

fe17f58

Let adapter decide on best default timeout

b62ed16

Better argument handling in adapters

8e82d7d

Fix tests

6adafa9

ivoanjo reviewed Oct 4, 2021

View reviewed changes

lib/ddtrace/configuration/agent_settings_resolver.rb Outdated Show resolved Hide resolved

spec/ddtrace/transport/http_spec.rb Outdated Show resolved Hide resolved

ivoanjo approved these changes Oct 4, 2021

View reviewed changes

spec/ddtrace/transport/http/builder_spec.rb Outdated Show resolved Hide resolved

marcotc added 4 commits October 4, 2021 13:27

Merge branch 'master' into uds-auto-detect

a938046

Inline default_adapters method for agentless

c4a6ab7

Freeze struct merge return value

87d8e9c

Document global agent settings deprecation

dcfb881

marcotc added 2 commits October 4, 2021 13:36

Use real agent settings instead of mock

29e48fc

Remove uds combination on each spec

3742fd7

marcotc merged commit de6dc4c into master Oct 4, 2021

marcotc deleted the uds-auto-detect branch October 4, 2021 21:05

github-actions bot added this to the 0.53.0 milestone Oct 4, 2021

djmitche mentioned this pull request Nov 8, 2021

Improve automatic configuration DataDog/dogstatsd-ruby#226

Open

		hostname: agent_settings.hostname,
		port: agent_settings.port,

		@hostname = hostname \|\| options[:hostname]
		@port = port \|\| options[:port]

Automatically send traces to agent Unix socket if present #1700

Automatically send traces to agent Unix socket if present #1700

Conversation

marcotc commented Sep 27, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

marcotc Sep 29, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-commenter commented Sep 27, 2021 • edited

Codecov Report

ivoanjo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

marcotc Sep 28, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

marcotc Sep 28, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ivoanjo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ivoanjo left a comment

Choose a reason for hiding this comment

marcotc Sep 29, 2021 •

edited

codecov-commenter commented Sep 27, 2021 •

edited

marcotc Sep 28, 2021 •

edited

marcotc Sep 28, 2021 •

edited