Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically send traces to agent Unix socket if present #1700

Merged
merged 18 commits into from
Oct 4, 2021

Conversation

marcotc
Copy link
Member

@marcotc marcotc commented Sep 27, 2021

When ddtrace transport options are not provided, this PR now automatically connects ddtrace to an existing agent unix socket if one is present at /var/run/datadog/apm.socket, the default path for the Datadog agent Kubernetes Helm chart deployment. If such file doesn't exist, it falls back to the default host:port connection method.

This change has no impact on users that have any transport option configured, regardless of their source (env var, c.hostname = ..., c.tracer.transport_options = proc { ...), neither it affects clients where the file /var/run/datadog/apm.socket is not present in the local file system.

This change should ease onboarding for Kubernetes workflows, and bring ddtrace in line with tracers from other APM languages.

@marcotc marcotc added the core Involves Datadog core libraries label Sep 27, 2021
@marcotc marcotc self-assigned this Sep 27, 2021
@marcotc marcotc requested a review from a team September 27, 2021 23:00
ddagent:
image: datadog/agent
environment:
- DD_APM_ENABLED=true
- DD_BIND_HOST=0.0.0.0
- DD_API_KEY=00000000000000000000000000000000
- "DD_API_KEY=${DD_API_KEY}"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've parametrized DD_API_KEY to make it easier to submit real data to our backend if convenient for testing (like it was for me).

ddagent:
image: datadog/agent
environment:
- DD_APM_ENABLED=true
- DD_BIND_HOST=0.0.0.0
- DD_API_KEY=00000000000000000000000000000000
- "DD_API_KEY=${DD_API_KEY}"
- DD_APM_RECEIVER_SOCKET=/var/run/datadog/apm.socket
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DD_APM_RECEIVER_SOCKET is the environment variable used by our Kubernetes Heml chart to expose the agent Unix socket: https://github.com/DataDog/helm-charts/blob/4ea229d38cdf27e6c43bce012f776c6e4deeba10/charts/datadog/templates/_container-trace-agent.yaml#L41-L42

@@ -3,6 +3,7 @@ module Datadog
module Ext
module Transport
module HTTP
ADAPTER = :net_http
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I had to add a few new constants here, I ended up performing a small refactor around transport constants, as you'll see in the files to follow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure it's in our interest to expand the number of places where we use "net_http" explicitly. Should we just call this "http", to leave some space when in the future we want to move away from "net_http"?

It seems a bit too low-level for most customers to be able to specify this; for instance for "test" and "unix" we only specify the method the connection will use, not really the implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to have it as only :http, but we can't really change it at this time because :net_http is publicly documented: https://github.com/DataDog/dd-trace-rb/blob/master/docs/GettingStarted.md#using-the-nethttp-adapter

We could have alternative http implementations (we had an Ethon implementation, for example), thus they might have more implementation specific names, but none today.

I'll document the desire to change :net_http to just :net here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! I think I was leaving out a bit too much detail on my description; let me share a bit more of my thinking, as I think it factors in some of my other feedback that you replied to as well.

I think the transport options proc really needs to go. Because it's effectively a bit of code that can do anything at all, it makes it really hard for us to know what's up and evolve the configuration. I think the direction we want to evolve in is one where those options are provided in the settings object, and the AgentSettingsResolver is able to convert everything into the resulting AgentSettings object.

Of course, until we actually do such a step, we'll need to still support the current public APIs and ways that people use to configure the gem.

BUT, and here is where some of my suggestions come in, nothing really forces us to use the "deprecated way" of doing things as the primary internal representation for them. E.g.

  • We may support people using t.adapter :net_http in a proc, but just alias to :http
  • We may change some of the APIs to use AgentSettings objects directly, as long as they still support the "old" way of doing it as well

The advantage of doing this is that we push the "deprecated way" further and further to what we do and how we think about it, rather than considering it as set in stone and building around it. Conceptually think of it as reimplementing stuff from scratch to use the new way only AND THEN adding support for the deprecated way on the side VS leaving the deprecated way as the main way AND THEN adding the new way on the side :)

Now, of course we all have dreams and aspirations for the codebase, so feel free to draw a line and declare "this is how far I want to get in this PR, and we'll see about the rest in the future". I'm just sharing this to contextualize how I'm thinking about these things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have alternative http implementations (we had an Ethon implementation, for example), thus they might have more implementation specific names, but none today.

On this specific bit, I would represent ethon to customers as an extra option to the http adapter, not as an http adapter as itself. E.g. I envision that customers ask for http, and we pick what's the best http option, and that at some point may be ethon.

Otherwise we create this situation where customers need to both upgrade AND THEN review their configuration regularly in order to take advantage of improvements; whereas I think a model where they upgrade and automatically get the latest and greatest ™️ provides a better experience.

Copy link
Member Author

@marcotc marcotc Sep 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On this specific bit, I would represent ethon to customers as an extra option to the http adapter, not as an http adapter as itself. E.g. I envision that customers ask for http, and we pick what's the best http option, and that at some point may be ethon.

Otherwise we create this situation where customers need to both upgrade AND THEN review their configuration regularly in order to take advantage of improvements; whereas I think a model where they upgrade and automatically get the latest and greatest ™️ provides a better experience.

Totally agree, this is what I was thinking: using simply :http instead of :net_http, regardless of the underlying implementation.

  • We may support people using t.adapter :net_http in a proc, but just alias to :http

That seems reasonable. I'm not sure it has to be done here. I'm thinking we can rip off the band-aid and rename :net_http to :http in 1.0 and that's it. It's a small API change, and it's not very common to have a custom proc block for the HTTP transport configuration, it mostly happens for unix or testing adapters, so I think the affected users will be small.

  • We may change some of the APIs to use AgentSettings objects directly, as long as they still support the "old" way of doing it as well

I did some of this refactor, take a look!

Comment on lines 76 to 77
hostname: agent_settings.hostname,
port: agent_settings.port,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All adapter options became keyword arguments (instead of positional) to allow for different adapters to be invoked by the same agent_settings object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to change the adapters to receive these arguments in a different way, should we just pass in the whole agent_settings object?

That was somewhat my intention when I started moving in this direction, although I decided to take a smaller step, as the introduction of the agent_settings was already quite gnarly by itself.

I think it would also allow us to simplify the tests, for instance the net_http ones where we needed to add uds_path and filepath references because the class needs to receive, and then ignore then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just pass in the whole agent_settings object?

The adapters' initializers are indirectly public API: https://github.com/DataDog/dd-trace-rb/blob/master/docs/GettingStarted.md#using-the-nethttp-adapter

For example, whatever is passed after :net_http in t.adapter :net_http, '127.0.0.1', 8126, { timeout: 1 } is passed to the initializer as is.
I don't see value in wrapping it in an AgentSettings object in such scenario, unless you think otherwise.

for instance the net_http ones where we needed to add uds_path and filepath references because the class needs to receive, and then ignore then.

I can't find a case like this in the PR, what part exactly are you referring to?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The adapters' initializers are indirectly public API

Yup, I'm aware. See my TL;DR above where I "share a bit more of my thinking". My suggestion here would be kind of the next step in the agent settings, which is to make them go all the way to the transport.

I'm aware that we would still need to support for a time whatever arguments are allowed to be called from the transport proc.

I can't find a case like this in the PR, what part exactly are you referring to?

I mean that in the profiling/transport/http_spec.rb and transport/http_spec.rb files, we needed to add uds_path to specs (or parts of specs) that otherwise are http-specific.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Take a look at see if you like the changes. AgentSettings now goes all the way to the adapters.

@@ -33,12 +33,12 @@ def initialize
yield(self) if block_given?
end

def adapter(type, *args)
def adapter(type, *args, **kwargs)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ruby 3 compatibility.

@codecov-commenter
Copy link

codecov-commenter commented Sep 27, 2021

Codecov Report

Merging #1700 (3742fd7) into master (b1bd0c0) will decrease coverage by 0.01%.
The diff coverage is 94.31%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1700      +/-   ##
==========================================
- Coverage   98.31%   98.29%   -0.02%     
==========================================
  Files         928      928              
  Lines       44803    44896      +93     
==========================================
+ Hits        44049    44132      +83     
- Misses        754      764      +10     
Impacted Files Coverage Δ
...pec/ddtrace/diagnostics/environment_logger_spec.rb 100.00% <ø> (ø)
spec/ddtrace/integration_spec.rb 97.16% <70.00%> (-1.51%) ⬇️
spec/ddtrace/transport/http_spec.rb 98.02% <76.92%> (-1.98%) ⬇️
lib/ddtrace/transport/http/adapters/unix_socket.rb 97.29% <91.66%> (-2.71%) ⬇️
...b/ddtrace/configuration/agent_settings_resolver.rb 100.00% <100.00%> (ø)
lib/ddtrace/ext/transport.rb 100.00% <100.00%> (ø)
lib/ddtrace/profiling/transport/http.rb 100.00% <100.00%> (ø)
lib/ddtrace/transport/http.rb 100.00% <100.00%> (ø)
lib/ddtrace/transport/http/adapters/net.rb 97.14% <100.00%> (+0.08%) ⬆️
lib/ddtrace/transport/http/adapters/test.rb 97.56% <100.00%> (ø)
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b1bd0c0...3742fd7. Read the comment docs.

Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! Turn-key onboarding experience is THE BEST :)

I left some comments, especially around how we represent this internally in the AgentSettingsResolver, but I really like how you were able to slice in this extra useful behavior with minimal impact outside of that class.

Comment on lines 78 to 89
# If no agent settings have been provided, we try to connect using a local unix socket.
# We only do so if the socket is present when `ddtrace` runs.
#
# Note we still haven't exposed a way to to configure unix socket connectivity using
# environment variables, only using the custom configuration proc.
adapter_ = if default_settings && (uds_path_ = uds_path)
Ext::Transport::UnixSocket::ADAPTER
else
# We can't invoke private methods with `self.method` in Ruby < 2.7, thus
# the local variable has to be renamed to avoid conflict.
adapter
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Have you considered moving this inside #adapter? Since #uds_path takes care of memoizing its result, I don't think there's a problem that #adapter will need to call #uds_path inside itself, and then we need to call it here again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read AgentSettingsResolver as the authoritative entity for deciding how to connect to the agent, regardless of product. I was thinking if Security wants to submit data through the agent, it would need this same information and would need to add code around if uds_path this, else that, and I thought this decision felt like it was leaking out of AgentSettingsResolver.

Copy link
Member Author

@marcotc marcotc Sep 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, I see the AgentSettingsResolver as the encapsulation of all the complexity around figuring out how to connect to the agent, thus I see choosing the adapter as part of it.
The main reason for that is that whether you'll have to use uds_path or hostname can be decided by the AgentSettingsResolver.

I understand that the Adapter is actually tied today to implementation (:net_http actually means Net::HTTP-backed today), which I don't think is the right level of abstraction, but returning a value that lets the caller know "you should use TCP+HTTP for this configuration" seems reasonable to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could change it from adapter to protocol, would that make sense to you? Then there's additional logic in each AgentSettingsResolver to figure out what adapter to use for the required protocol.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, wait, I think my suggestion was misunderstood. I fully agree with all that you mentioned :)

I was mainly referring to code organization inside the agent_settings_resolver.rb file: I was saying that inside the #call method we could just call to an #adapter method, and have this logic there, so that the #call method can be 100% focused on just creating the AgentSettings, and leaves all of the logic to its helper methods.

I think that otherwise calling it adapter is fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor done!

lib/ddtrace/configuration/agent_settings_resolver.rb Outdated Show resolved Hide resolved
@@ -3,6 +3,7 @@ module Datadog
module Ext
module Transport
module HTTP
ADAPTER = :net_http
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure it's in our interest to expand the number of places where we use "net_http" explicitly. Should we just call this "http", to leave some space when in the future we want to move away from "net_http"?

It seems a bit too low-level for most customers to be able to specify this; for instance for "test" and "unix" we only specify the method the connection will use, not really the implementation.

Comment on lines 76 to 77
hostname: agent_settings.hostname,
port: agent_settings.port,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to change the adapters to receive these arguments in a different way, should we just pass in the whole agent_settings object?

That was somewhat my intention when I started moving in this direction, although I decided to take a smaller step, as the introduction of the agent_settings was already quite gnarly by itself.

I think it would also allow us to simplify the tests, for instance the net_http ones where we needed to add uds_path and filepath references because the class needs to receive, and then ignore then.

lib/ddtrace/configuration/agent_settings_resolver.rb Outdated Show resolved Hide resolved
Comment on lines 67 to 69
private_class_method def self.default_adapter
:net_http
Datadog::Ext::Transport::HTTP::ADAPTER
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest removing this, now that the agent settings carry the adapter inside. (We may not be able to remove this for the tracer http transport, I wonder if that API is expected to be public, but the profiler one is fully NOT public so remove ahoy :D)

Copy link
Member Author

@marcotc marcotc Sep 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You asked me to possibly remove the adapter form the agent settings, but this suggestion contradicts that. I'll wait for a response on the original adapter question before following up on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully I clarified above what I meant and why I believe this can be removed :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I refactor all adapters to use agent settings, I'm actually confused in a different way on how to remove this method.

This default_adapter is only used in configure_for_agentless, and configure_for_agentless does not build its settings using a AgentSettings, likely because AgentSettings currently is only built from a Datadog::Configuration::Settings-like object and not a site_uri.

Are you suggesting I add a new data source for AgentSettings, based on the profiler's site_uri?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting I add a new data source for AgentSettings, based on the profiler's site_uri?

Ah, no. For the configure_for_agent case, we can use the AgentSettings as you pointed out. For the configure_for_agentless, it doesn't make sense for agentless to be anything other than "talk to the backend directly via http", otherwise, and thus we can effectively hardcode it to use the http adapter (e.g. inline the constant in that method).

The agentless mode is always going to be a bit weird; and I don't think it's worth trying to fit it into the AgentSettings object model since it's a profiler-specific thing.

# @deprecated Positional parameters are deprecated. Use named parameters instead.
def initialize(filepath = nil, **options)
@filepath = filepath || options[:filepath]
@timeout = options.fetch(:timeout, Ext::Transport::UnixSocket::DEFAULT_TIMEOUT_SECONDS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For compatibility, we'll need to keep the default here, but I think we should also modify AgentSettings#timeout to use the correct default timeout (http vs uds). Right now, both are set at 1, BUT if we do change the default uds timeout, it won't be applied when the "default unix socket" is used (the http one will be used instead).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of removing the default value for time timeout and return nil instead, unless specified by the user. This will allow the adapter to decide its own default.

I could figure out the timeout on a per-protocol basis in the agent settings resolver, but I think that will leak implementation data into the resolver class.

What do you think of letting the adapter decide the timeout when not specified?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of letting the adapter decide the timeout when not specified?

Works for me, I don't have strong feelings either way, happy with either option.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! The correct default timeout will now be filled out by the adapter implementation.

Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're almost there! I think the two remaining items to settle on are:

  1. Use of filepath as an argument name
  2. Pass agent_settings all the way to the adapters or not

...since these are the two items that are most visible in the API.

Let me know your thoughts/decision (which can just be "they're fine as-is right now") and I'm ready to press approve ;)

docs/GettingStarted.md Outdated Show resolved Hide resolved
Comment on lines 67 to 69
private_class_method def self.default_adapter
:net_http
Datadog::Ext::Transport::HTTP::ADAPTER
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully I clarified above what I meant and why I believe this can be removed :)

Comment on lines 21 to 22
@hostname = hostname || options[:hostname]
@port = port || options[:port]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Should we use options.fetch(:hostname) and options.fetch(:port) here?

(same note for similar changes in other adapters; this will not apply if we end up just passing the agent settings object directly)

lib/ddtrace/transport/http/adapters/unix_socket.rb Outdated Show resolved Hide resolved
Comment on lines 19 to 30
let(:default_settings) do
let(:settings) do
{
adapter: adapter,
ssl: false,
hostname: '127.0.0.1',
port: 8126,
hostname: hostname,
port: port,
uds_path: uds_path,
timeout_seconds: 1,
deprecated_for_removal_transport_configuration_proc: nil,
deprecated_for_removal_transport_configuration_options: nil
deprecated_for_removal_transport_configuration_options: nil,
}
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can live with, but I don't quite like this change. The specs below all had the pattern of "when you do this, then the outcome is default settings + your change". With this change, it means that the settings may have been changed at another level, which I think is a bit less readable.

My alternative suggestion would be to have a uds_settings let and use that in the uds tests.

Comment on lines 170 to 173
it 'falls back to the defaults' do
expect(resolver).to have_attributes default_settings
expect(resolver).to have_attributes settings
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: we could add a test here that checked that the uds socket was still used if it exists, since in this branch the custom config was invalid

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unix socket won't be used here, as DD_TRACE_AGENT_PORT was explicitly provided. We don't want to override use configuration, even if invalid: we want to have the tracer behave as instructed given the provided settings. Of course in this case an InvalidArgument will be raised, but that will give the user a clear message on what's the issue in their configuration.

spec/ddtrace/configuration/agent_settings_resolver_spec.rb Outdated Show resolved Hide resolved
spec/ddtrace/transport/http_spec.rb Outdated Show resolved Hide resolved
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Thanks for the great improvement + clean up :)

spec/ddtrace/transport/http/builder_spec.rb Outdated Show resolved Hide resolved
@marcotc marcotc merged commit de6dc4c into master Oct 4, 2021
@marcotc marcotc deleted the uds-auto-detect branch October 4, 2021 21:05
@github-actions github-actions bot added this to the 0.53.0 milestone Oct 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Involves Datadog core libraries
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants