-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatically send traces to agent Unix socket if present #1700
Conversation
ddagent: | ||
image: datadog/agent | ||
environment: | ||
- DD_APM_ENABLED=true | ||
- DD_BIND_HOST=0.0.0.0 | ||
- DD_API_KEY=00000000000000000000000000000000 | ||
- "DD_API_KEY=${DD_API_KEY}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've parametrized DD_API_KEY
to make it easier to submit real data to our backend if convenient for testing (like it was for me).
ddagent: | ||
image: datadog/agent | ||
environment: | ||
- DD_APM_ENABLED=true | ||
- DD_BIND_HOST=0.0.0.0 | ||
- DD_API_KEY=00000000000000000000000000000000 | ||
- "DD_API_KEY=${DD_API_KEY}" | ||
- DD_APM_RECEIVER_SOCKET=/var/run/datadog/apm.socket |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DD_APM_RECEIVER_SOCKET
is the environment variable used by our Kubernetes Heml chart to expose the agent Unix socket: https://github.com/DataDog/helm-charts/blob/4ea229d38cdf27e6c43bce012f776c6e4deeba10/charts/datadog/templates/_container-trace-agent.yaml#L41-L42
lib/ddtrace/ext/transport.rb
Outdated
@@ -3,6 +3,7 @@ module Datadog | |||
module Ext | |||
module Transport | |||
module HTTP | |||
ADAPTER = :net_http |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I had to add a few new constants here, I ended up performing a small refactor around transport constants, as you'll see in the files to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure it's in our interest to expand the number of places where we use "net_http" explicitly. Should we just call this "http", to leave some space when in the future we want to move away from "net_http"?
It seems a bit too low-level for most customers to be able to specify this; for instance for "test" and "unix" we only specify the method the connection will use, not really the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to have it as only :http
, but we can't really change it at this time because :net_http
is publicly documented: https://github.com/DataDog/dd-trace-rb/blob/master/docs/GettingStarted.md#using-the-nethttp-adapter
We could have alternative http implementations (we had an Ethon implementation, for example), thus they might have more implementation specific names, but none today.
I'll document the desire to change :net_http
to just :net
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right! I think I was leaving out a bit too much detail on my description; let me share a bit more of my thinking, as I think it factors in some of my other feedback that you replied to as well.
I think the transport options proc really needs to go. Because it's effectively a bit of code that can do anything at all, it makes it really hard for us to know what's up and evolve the configuration. I think the direction we want to evolve in is one where those options are provided in the settings object, and the AgentSettingsResolver
is able to convert everything into the resulting AgentSettings
object.
Of course, until we actually do such a step, we'll need to still support the current public APIs and ways that people use to configure the gem.
BUT, and here is where some of my suggestions come in, nothing really forces us to use the "deprecated way" of doing things as the primary internal representation for them. E.g.
- We may support people using
t.adapter :net_http
in a proc, but just alias to:http
- We may change some of the APIs to use
AgentSettings
objects directly, as long as they still support the "old" way of doing it as well
The advantage of doing this is that we push the "deprecated way" further and further to what we do and how we think about it, rather than considering it as set in stone and building around it. Conceptually think of it as reimplementing stuff from scratch to use the new way only AND THEN adding support for the deprecated way on the side VS leaving the deprecated way as the main way AND THEN adding the new way on the side :)
Now, of course we all have dreams and aspirations for the codebase, so feel free to draw a line and declare "this is how far I want to get in this PR, and we'll see about the rest in the future". I'm just sharing this to contextualize how I'm thinking about these things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have alternative http implementations (we had an Ethon implementation, for example), thus they might have more implementation specific names, but none today.
On this specific bit, I would represent ethon to customers as an extra option to the http adapter, not as an http adapter as itself. E.g. I envision that customers ask for http, and we pick what's the best http option, and that at some point may be ethon.
Otherwise we create this situation where customers need to both upgrade AND THEN review their configuration regularly in order to take advantage of improvements; whereas I think a model where they upgrade and automatically get the latest and greatest ™️ provides a better experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On this specific bit, I would represent ethon to customers as an extra option to the http adapter, not as an http adapter as itself. E.g. I envision that customers ask for http, and we pick what's the best http option, and that at some point may be ethon.
Otherwise we create this situation where customers need to both upgrade AND THEN review their configuration regularly in order to take advantage of improvements; whereas I think a model where they upgrade and automatically get the latest and greatest ™️ provides a better experience.
Totally agree, this is what I was thinking: using simply :http
instead of :net_http
, regardless of the underlying implementation.
- We may support people using t.adapter :net_http in a proc, but just alias to :http
That seems reasonable. I'm not sure it has to be done here. I'm thinking we can rip off the band-aid and rename :net_http
to :http
in 1.0 and that's it. It's a small API change, and it's not very common to have a custom proc block for the HTTP transport configuration, it mostly happens for unix or testing adapters, so I think the affected users will be small.
- We may change some of the APIs to use AgentSettings objects directly, as long as they still support the "old" way of doing it as well
I did some of this refactor, take a look!
hostname: agent_settings.hostname, | ||
port: agent_settings.port, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All adapter options became keyword arguments (instead of positional) to allow for different adapters to be invoked by the same agent_settings
object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to change the adapters to receive these arguments in a different way, should we just pass in the whole agent_settings
object?
That was somewhat my intention when I started moving in this direction, although I decided to take a smaller step, as the introduction of the agent_settings
was already quite gnarly by itself.
I think it would also allow us to simplify the tests, for instance the net_http ones where we needed to add uds_path
and filepath
references because the class needs to receive, and then ignore then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we just pass in the whole agent_settings object?
The adapters' initializers are indirectly public API: https://github.com/DataDog/dd-trace-rb/blob/master/docs/GettingStarted.md#using-the-nethttp-adapter
For example, whatever is passed after :net_http
in t.adapter :net_http, '127.0.0.1', 8126, { timeout: 1 }
is passed to the initializer as is.
I don't see value in wrapping it in an AgentSettings object in such scenario, unless you think otherwise.
for instance the net_http ones where we needed to add uds_path and filepath references because the class needs to receive, and then ignore then.
I can't find a case like this in the PR, what part exactly are you referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The adapters' initializers are indirectly public API
Yup, I'm aware. See my TL;DR above where I "share a bit more of my thinking". My suggestion here would be kind of the next step in the agent settings, which is to make them go all the way to the transport.
I'm aware that we would still need to support for a time whatever arguments are allowed to be called from the transport proc.
I can't find a case like this in the PR, what part exactly are you referring to?
I mean that in the profiling/transport/http_spec.rb
and transport/http_spec.rb
files, we needed to add uds_path
to specs (or parts of specs) that otherwise are http-specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Take a look at see if you like the changes. AgentSettings
now goes all the way to the adapters.
@@ -33,12 +33,12 @@ def initialize | |||
yield(self) if block_given? | |||
end | |||
|
|||
def adapter(type, *args) | |||
def adapter(type, *args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ruby 3 compatibility.
Codecov Report
@@ Coverage Diff @@
## master #1700 +/- ##
==========================================
- Coverage 98.31% 98.29% -0.02%
==========================================
Files 928 928
Lines 44803 44896 +93
==========================================
+ Hits 44049 44132 +83
- Misses 754 764 +10
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! Turn-key onboarding experience is THE BEST :)
I left some comments, especially around how we represent this internally in the AgentSettingsResolver
, but I really like how you were able to slice in this extra useful behavior with minimal impact outside of that class.
# If no agent settings have been provided, we try to connect using a local unix socket. | ||
# We only do so if the socket is present when `ddtrace` runs. | ||
# | ||
# Note we still haven't exposed a way to to configure unix socket connectivity using | ||
# environment variables, only using the custom configuration proc. | ||
adapter_ = if default_settings && (uds_path_ = uds_path) | ||
Ext::Transport::UnixSocket::ADAPTER | ||
else | ||
# We can't invoke private methods with `self.method` in Ruby < 2.7, thus | ||
# the local variable has to be renamed to avoid conflict. | ||
adapter | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Have you considered moving this inside #adapter
? Since #uds_path
takes care of memoizing its result, I don't think there's a problem that #adapter
will need to call #uds_path
inside itself, and then we need to call it here again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read AgentSettingsResolver
as the authoritative entity for deciding how to connect to the agent, regardless of product. I was thinking if Security wants to submit data through the agent, it would need this same information and would need to add code around if uds_path this, else that
, and I thought this decision felt like it was leaking out of AgentSettingsResolver
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words, I see the AgentSettingsResolver
as the encapsulation of all the complexity around figuring out how to connect to the agent, thus I see choosing the adapter as part of it.
The main reason for that is that whether you'll have to use uds_path
or hostname
can be decided by the AgentSettingsResolver
.
I understand that the Adapter is actually tied today to implementation (:net_http
actually means Net::HTTP
-backed today), which I don't think is the right level of abstraction, but returning a value that lets the caller know "you should use TCP+HTTP for this configuration" seems reasonable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could change it from adapter
to protocol
, would that make sense to you? Then there's additional logic in each AgentSettingsResolver
to figure out what adapter to use for the required protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, wait, I think my suggestion was misunderstood. I fully agree with all that you mentioned :)
I was mainly referring to code organization inside the agent_settings_resolver.rb
file: I was saying that inside the #call
method we could just call to an #adapter
method, and have this logic there, so that the #call
method can be 100% focused on just creating the AgentSettings
, and leaves all of the logic to its helper methods.
I think that otherwise calling it adapter is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactor done!
lib/ddtrace/ext/transport.rb
Outdated
@@ -3,6 +3,7 @@ module Datadog | |||
module Ext | |||
module Transport | |||
module HTTP | |||
ADAPTER = :net_http |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure it's in our interest to expand the number of places where we use "net_http" explicitly. Should we just call this "http", to leave some space when in the future we want to move away from "net_http"?
It seems a bit too low-level for most customers to be able to specify this; for instance for "test" and "unix" we only specify the method the connection will use, not really the implementation.
hostname: agent_settings.hostname, | ||
port: agent_settings.port, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to change the adapters to receive these arguments in a different way, should we just pass in the whole agent_settings
object?
That was somewhat my intention when I started moving in this direction, although I decided to take a smaller step, as the introduction of the agent_settings
was already quite gnarly by itself.
I think it would also allow us to simplify the tests, for instance the net_http ones where we needed to add uds_path
and filepath
references because the class needs to receive, and then ignore then.
private_class_method def self.default_adapter | ||
:net_http | ||
Datadog::Ext::Transport::HTTP::ADAPTER | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest removing this, now that the agent settings carry the adapter inside. (We may not be able to remove this for the tracer http transport, I wonder if that API is expected to be public, but the profiler one is fully NOT public so remove ahoy :D)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You asked me to possibly remove the adapter
form the agent settings, but this suggestion contradicts that. I'll wait for a response on the original adapter
question before following up on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully I clarified above what I meant and why I believe this can be removed :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I refactor all adapters to use agent settings, I'm actually confused in a different way on how to remove this method.
This default_adapter
is only used in configure_for_agentless
, and configure_for_agentless
does not build its settings using a AgentSettings
, likely because AgentSettings
currently is only built from a Datadog::Configuration::Settings
-like object and not a site_uri
.
Are you suggesting I add a new data source for AgentSettings
, based on the profiler's site_uri
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting I add a new data source for AgentSettings, based on the profiler's site_uri?
Ah, no. For the configure_for_agent
case, we can use the AgentSettings
as you pointed out. For the configure_for_agentless
, it doesn't make sense for agentless to be anything other than "talk to the backend directly via http", otherwise, and thus we can effectively hardcode it to use the http adapter (e.g. inline the constant in that method).
The agentless mode is always going to be a bit weird; and I don't think it's worth trying to fit it into the AgentSettings
object model since it's a profiler-specific thing.
# @deprecated Positional parameters are deprecated. Use named parameters instead. | ||
def initialize(filepath = nil, **options) | ||
@filepath = filepath || options[:filepath] | ||
@timeout = options.fetch(:timeout, Ext::Transport::UnixSocket::DEFAULT_TIMEOUT_SECONDS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For compatibility, we'll need to keep the default here, but I think we should also modify AgentSettings#timeout
to use the correct default timeout (http vs uds). Right now, both are set at 1, BUT if we do change the default uds timeout, it won't be applied when the "default unix socket" is used (the http one will be used instead).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking of removing the default value for time timeout
and return nil
instead, unless specified by the user. This will allow the adapter to decide its own default.
I could figure out the timeout on a per-protocol basis in the agent settings resolver, but I think that will leak implementation data into the resolver class.
What do you think of letting the adapter decide the timeout when not specified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of letting the adapter decide the timeout when not specified?
Works for me, I don't have strong feelings either way, happy with either option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! The correct default timeout will now be filled out by the adapter implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're almost there! I think the two remaining items to settle on are:
- Use of
filepath
as an argument name - Pass
agent_settings
all the way to the adapters or not
...since these are the two items that are most visible in the API.
Let me know your thoughts/decision (which can just be "they're fine as-is right now") and I'm ready to press approve ;)
private_class_method def self.default_adapter | ||
:net_http | ||
Datadog::Ext::Transport::HTTP::ADAPTER | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully I clarified above what I meant and why I believe this can be removed :)
@hostname = hostname || options[:hostname] | ||
@port = port || options[:port] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Should we use options.fetch(:hostname)
and options.fetch(:port)
here?
(same note for similar changes in other adapters; this will not apply if we end up just passing the agent settings object directly)
let(:default_settings) do | ||
let(:settings) do | ||
{ | ||
adapter: adapter, | ||
ssl: false, | ||
hostname: '127.0.0.1', | ||
port: 8126, | ||
hostname: hostname, | ||
port: port, | ||
uds_path: uds_path, | ||
timeout_seconds: 1, | ||
deprecated_for_removal_transport_configuration_proc: nil, | ||
deprecated_for_removal_transport_configuration_options: nil | ||
deprecated_for_removal_transport_configuration_options: nil, | ||
} | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can live with, but I don't quite like this change. The specs below all had the pattern of "when you do this, then the outcome is default settings + your change". With this change, it means that the settings may have been changed at another level, which I think is a bit less readable.
My alternative suggestion would be to have a uds_settings
let and use that in the uds tests.
it 'falls back to the defaults' do | ||
expect(resolver).to have_attributes default_settings | ||
expect(resolver).to have_attributes settings | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: we could add a test here that checked that the uds socket was still used if it exists, since in this branch the custom config was invalid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unix socket won't be used here, as DD_TRACE_AGENT_PORT
was explicitly provided. We don't want to override use configuration, even if invalid: we want to have the tracer behave as instructed given the provided settings. Of course in this case an InvalidArgument
will be raised, but that will give the user a clear message on what's the issue in their configuration.
Co-authored-by: Ivo Anjo <ivo.anjo@datadoghq.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
Thanks for the great improvement + clean up :)
When
ddtrace
transport options are not provided, this PR now automatically connectsddtrace
to an existing agent unix socket if one is present at/var/run/datadog/apm.socket
, the default path for the Datadog agent Kubernetes Helm chart deployment. If such file doesn't exist, it falls back to the default host:port connection method.This change has no impact on users that have any transport option configured, regardless of their source (env var,
c.hostname = ...
,c.tracer.transport_options = proc { ...
), neither it affects clients where the file/var/run/datadog/apm.socket
is not present in the local file system.This change should ease onboarding for Kubernetes workflows, and bring
ddtrace
in line with tracers from other APM languages.