Trace correlation seems to be broken with v1.11.x #5017

Open
ithinkicancode opened this issue Apr 3, 2023 · 8 comments
Labels
comp: context propagation Trace context propagation

Comments

@ithinkicancode

ithinkicancode commented Apr 3, 2023

Trace correlation for Datadog (across services) works with v1.10.1. I've tried all the 1.11.x releases, and trace correlation seems to be broken in all of them. Nothing has changed in my code: I'm still passing x-datadog-parent-id and x-datadog-trace-id as HTTP headers.

@bantonsson
Contributor

Hi @ithinkicancode, thanks for reporting this. Would it be possible for you to provide the output of the DATADOG TRACER CONFIGURATION log statement, and some debug logs for both a working and non working tracer version?

@bantonsson bantonsson added the comp: context propagation Trace context propagation label Apr 4, 2023
@ithinkicancode
Author

Hi @bantonsson, here's the DATADOG TRACER CONFIGURATION when using v1.10.1 (working version):

DATADOG TRACER CONFIGURATION {"version":"1.10.1~453981e186","os_name":"Linux","os_version":"4.14.232-176.381.amzn2.x86_64","architecture":"amd64","lang":"jvm","lang_version":"17","jvm_vendor":"Azul Systems, Inc.","jvm_version":"17+35-LTS","java_class_version":"61.0","http_nonProxyHosts":"null","http_proxyHost":"null","enabled":true,"service":"<redacted>","agent_url":"<redacted>","agent_error":false,"debug":false,"trace_propagation_style_extract":["datadog"],"trace_propagation_style_inject":["datadog"],"analytics_enabled":false,"sampling_rules":[{},{}],"priority_sampling_enabled":true,"logs_correlation_enabled":true,"profiling_enabled":false,"remote_config_enabled":true,"debugger_enabled":false,"appsec_enabled":"ENABLED_INACTIVE","telemetry_enabled":true,"dd_version":"","health_checks_enabled":true,"configuration_file":"no config file present","runtime_id":"ba7a6111-123e-40c9-87c1-82bf528866ef","logging_settings":{"levelInBrackets":false,"dateTimeFormat":"'[dd.trace 'yyyy-MM-dd HH:mm:ss:SSS Z']'","logFile":"System.err","configurationFile":"simplelogger.properties","showShortLogName":false,"showDateTime":true,"showLogName":true,"showThreadName":true,"defaultLogLevel":"INFO","warnLevelString":"WARN","embedException":false},"cws_enabled":false,"cws_tls_refresh":5000,"datadog_profiler_enabled":false,"datadog_profiler_safe":false}

I can deploy a debug version with v1.11.2 and get you the output of DATADOG TRACER CONFIGURATION too if needed.

As for:

some debug logs for both a working and non working tracer version

Did you mean logs from the collector/agent, or debug logs from our app? I don't quite see how app logs relate to tracing, since we don't write logs for tracing itself. We do have logs correlated to traces, but this issue is about trace correlation across services: on their own, traces show up in Datadog, but traces in other services no longer show up when using v1.11.x. Can you clarify? Or maybe you meant the data I could capture by using nc to watch the port (if I run it locally)?

@bantonsson
Contributor

Hey @ithinkicancode, sorry for not being more specific.

It would be great if you could deploy a version with v1.11.2 for the DATADOG TRACER CONFIGURATION for reference.

When I said debug logs, I meant that, if possible, you would enable DD_TRACE_DEBUG=true to get debug logs from the Java tracer with the non-working version. This will dump the logs to the console. Enabling this has a performance impact and should not be done in a production setting.

@ithinkicancode
Author

Sorry for the delay (holiday weekend). Here is the Tracer Config from v1.11.2:
DATADOG TRACER CONFIGURATION {"version":"1.11.2~4e957fc01e","os_name":"Linux","os_version":"4.14.232-176.381.amzn2.x86_64","architecture":"amd64","lang":"jvm","lang_version":"17","jvm_vendor":"Azul Systems, Inc.","jvm_version":"17+35-LTS","java_class_version":"61.0","http_nonProxyHosts":"null","http_proxyHost":"null","enabled":true,"service":"<redacted>","agent_url":"<redacted>","agent_error":false,"debug":false,"trace_propagation_style_extract":["datadog"],"trace_propagation_style_inject":["datadog"],"analytics_enabled":false,"sampling_rules":[{},{}],"priority_sampling_enabled":true,"logs_correlation_enabled":true,"profiling_enabled":false,"remote_config_enabled":true,"debugger_enabled":false,"appsec_enabled":"ENABLED_INACTIVE","telemetry_enabled":true,"dd_version":"","health_checks_enabled":true,"configuration_file":"no config file present","runtime_id":"dd7b834e-063a-4813-b6a8-5a52af760ef9","logging_settings":{"levelInBrackets":false,"dateTimeFormat":"'[dd.trace 'yyyy-MM-dd HH:mm:ss:SSS Z']'","logFile":"System.err","configurationFile":"simplelogger.properties","showShortLogName":false,"showDateTime":true,"showLogName":true,"showThreadName":true,"defaultLogLevel":"INFO","warnLevelString":"WARN","embedException":false},"cws_enabled":false,"cws_tls_refresh":5000,"datadog_profiler_enabled":false,"datadog_profiler_safe":false}
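
For anyone comparing two DATADOG TRACER CONFIGURATION lines by eye, a small script can make the differences explicit. This is only a sketch: the helper names `parse_config` and `diff_configs` are made up for illustration, and the two sample lines are abbreviated stand-ins for the full lines quoted in this thread.

```python
import json

PREFIX = "DATADOG TRACER CONFIGURATION "


def parse_config(log_line: str) -> dict:
    """Return the JSON payload that follows the status-log prefix."""
    _, found, payload = log_line.partition(PREFIX)
    if not found:
        raise ValueError("line does not contain the configuration prefix")
    return json.loads(payload)


def diff_configs(line_a: str, line_b: str) -> dict:
    """Map each differing top-level key to its (a, b) pair of values.

    A key missing on one side is reported as None on that side.
    """
    a, b = parse_config(line_a), parse_config(line_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


# Abbreviated stand-ins for the v1.10.1 and v1.11.2 lines (most keys omitted).
old = PREFIX + (
    '{"version":"1.10.1~453981e186",'
    '"trace_propagation_style_extract":["datadog"],'
    '"trace_propagation_style_inject":["datadog"]}'
)
new = PREFIX + (
    '{"version":"1.11.2~4e957fc01e",'
    '"trace_propagation_style_extract":["datadog"],'
    '"trace_propagation_style_inject":["datadog"]}'
)

print(diff_configs(old, new))
```

Because `parse_config` looks for the prefix anywhere in the line, a full log line with the timestamp and logger name in front can be pasted in unchanged.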

@bantonsson
Contributor

No worries @ithinkicancode. The settings for v1.11.2 still show that the trace propagation styles for extract and inject are datadog. It is very strange that you have issues with the header propagation when upgrading to v1.11.x. We have not received any other complaints about this breaking. Since not all settings are shown in the DATADOG TRACER CONFIGURATION line, could you list any DD_* environment variables and -Ddd.* settings that you are using?
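
One possible way to gather that list is sketched here. It assumes the JVM flags are available as a single shell-style string (for example the value of a JAVA_OPTS variable), which may not match how a given service is launched. The function names and the sample values are hypothetical.

```python
import os
import shlex


def dd_env_vars(environ=None) -> dict:
    """Return the environment variables whose names start with DD_."""
    env = os.environ if environ is None else environ
    return {name: value for name, value in sorted(env.items())
            if name.startswith("DD_")}


def dd_system_properties(java_opts: str) -> dict:
    """Return the -Ddd.* properties found in a shell-style option string."""
    props = {}
    for token in shlex.split(java_opts):  # shlex removes surrounding quotes
        if token.startswith("-Ddd."):
            name, _, value = token[2:].partition("=")
            props[name] = value
    return props


# Hypothetical option string; substitute the real launch options.
sample_opts = '-javaagent:dd-java-agent.jar -Ddd.service=my-service -Ddd.trace.debug=true'

print(dd_env_vars())
print(dd_system_properties(sample_opts))
```

The environment part has to run in the same environment as the service (same container or host) to be meaningful.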

@ithinkicancode
Author

Thanks. I'll try, but I'll need to talk to SRE to get an exhaustive list. One thing I haven't pointed out is that the API my service calls uses an older version of dd-trace (possibly v9). I'm not sure if that matters. At least v10 in my service works with v9 in the dependent API service, in terms of trace correlation.

@bantonsson
Contributor

Hi @ithinkicancode, from what you are describing in your last comment, it seems like you have multiple services with multiple versions of the Java Tracer installed. Could you please explain the actual observed issue in more detail, e.g. "a v1.10.0 tracer sending to a v1.9.0 tracer creates broken traces"?

@vekaputra-sqe

Hi, since I have a similar issue, I decided to ask here instead of creating a new issue.

I'm trying to set up APM for a Go service and Keycloak. The Go service is a wrapper for Keycloak and calls Keycloak via its HTTP API for authentication. The problem is that in Datadog the Go service never seems to be connected to Keycloak in APM traces: it just lists Keycloak as http.request, and Keycloak does not register the Go service as the caller either.

I checked the request sent by the Go service and the one received by Keycloak. Both have the x-datadog-parent-id, x-datadog-trace-id, x-datadog-tags and x-datadog-sampling-priority HTTP headers. Still, in Datadog the two services are not connected, and Keycloak, as the server, seems to create its own trace ID instead of using the one sent via the HTTP headers.

The library and service versions that I used are:

go 1.19
gopkg.in/DataDog/dd-trace-go.v1 v1.47.0 -> datadog go trace library
github.com/go-resty/resty/v2 v2.7.0 -> http request library that send request to keycloak

keycloak v19.0.3
dd-trace-java v1.12.1
quarkus v2.7.6.Final -> web server

The other dd-trace-java versions that I tried, with the same result, are 1.15.3, 1.11.2, 1.10.1 and 0.98.1.

I added some options to dd-trace-java: -Ddd.integration.vertx.enabled=false -Ddd.trace.executors="org.jboss.threads.EnhancedQueueExecutor". Somehow, with vertx.enabled = true there is no trace at all from the Keycloak side.

Sample request that I sent from the Go service:

~~~ REQUEST ~~~
POST  /realms/:realmId/protocol/openid-connect/token  HTTP/1.1
HOST   : kc.dev.example.com
HEADERS:
	Authorization: Basic <basic_token>
	Content-Type: application/x-www-form-urlencoded
	Traceparent: 00-000000000000000066510249fb002c13-66510249fb002c13-01
	Tracestate: dd=s:1;t.dm:-1
	User-Agent: go-resty/2.7.0 (https://github.com/go-resty/resty)
	X-Datadog-Parent-Id: 7372676581749173267
	X-Datadog-Sampling-Priority: 1
	X-Datadog-Tags: _dd.p.dm=-1
	X-Datadog-Trace-Id: 7372676581749173267
BODY   :
client_id=account&grant_type=password&password=StrongPassword&response_type=token&scope=openid&username=%2B6281100010001

And the request received by Keycloak (via Quarkus):

2023-06-13 18:59:45,925 INFO  [io.quarkus.http.access-log] (executor-thread-3) POST /realms/:realmId/protocol/openid-connect/token HTTP/2
tracestate: dd=s:1;t.dm:-1
x-datadog-tags: _dd.p.dm=-1
authorization: Basic <basic_token>
x-datadog-trace-id: 7372676581749173267
x-datadog-sampling-priority: 1
content-type: application/x-www-form-urlencoded
traceparent: 00-000000000000000066510249fb002c13-4bb5a6e2240c78b2-01
x-datadog-parent-id: 5455450013826840754
user-agent: go-resty/2.7.0 (https://github.com/go-resty/resty)
accept-encoding: gzip
content-length: 123
host: kc.dev.example.com
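
To compare the two dumps above without converting IDs by hand, something like the following can help. It is a sketch under two assumptions: traceparent follows the W3C layout version-traceid-parentid-flags, and the x-datadog-* IDs are the decimal form of the low 64 bits of the trace ID and of the parent ID, which matches the sample values here. The function names are made up for illustration.

```python
def parse_traceparent(value: str) -> dict:
    """Split a W3C traceparent header: version-traceid-parentid-flags."""
    _version, trace_id, parent_id, flags = value.strip().split("-")
    if len(trace_id) != 32 or len(parent_id) != 16:
        raise ValueError(f"unexpected traceparent field lengths: {value!r}")
    return {
        # Assumption: x-datadog-trace-id carries the low 64 bits in decimal.
        "trace_id_low64": int(trace_id[16:], 16),
        "parent_id": int(parent_id, 16),
        "sampled": bool(int(flags, 16) & 0x01),
    }


def headers_agree(headers: dict) -> bool:
    """True when traceparent and the x-datadog-* headers carry the same IDs."""
    h = {name.lower(): value for name, value in headers.items()}
    tp = parse_traceparent(h["traceparent"])
    return (
        tp["trace_id_low64"] == int(h["x-datadog-trace-id"])
        and tp["parent_id"] == int(h["x-datadog-parent-id"])
    )


# Header values copied from the two dumps above.
sent = {
    "Traceparent": "00-000000000000000066510249fb002c13-66510249fb002c13-01",
    "X-Datadog-Trace-Id": "7372676581749173267",
    "X-Datadog-Parent-Id": "7372676581749173267",
}
received = {
    "traceparent": "00-000000000000000066510249fb002c13-4bb5a6e2240c78b2-01",
    "x-datadog-trace-id": "7372676581749173267",
    "x-datadog-parent-id": "5455450013826840754",
}

print(headers_agree(sent), headers_agree(received))
print(sent["X-Datadog-Trace-Id"] == received["x-datadog-trace-id"])
print(sent["X-Datadog-Parent-Id"] == received["x-datadog-parent-id"])
```

With these values, each dump is internally consistent and both carry the same trace ID, but the parent ID differs between the client-side dump and the access log. That difference does not by itself explain the missing correlation; the two dumps may have been captured at different points or for different spans.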

Here are the startup logs, including the Datadog tracer config:

Appending additional Java properties to JAVA_OPTS: -Djgroups.dns.query=keycloak-headless.default.svc.cluster.local -javaagent:dd-java-agent.jar -Ddd.integration.vertx.enabled=false -Ddd.trace.executors="org.jboss.threads.EnhancedQueueExecutor"
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
[dd.trace 2023-06-13 19:04:34:177 +0000] [main] INFO com.datadog.appsec.AppSecSystem - AppSec is ENABLED_INACTIVE with powerwaf(libddwaf: 1.10.0) no rules loaded
[dd.trace 2023-06-13 19:04:34:269 +0000] [dd-task-scheduler] INFO datadog.trace.agent.core.StatusLogger - DATADOG TRACER CONFIGURATION {"version":"1.15.3~6c73dffd68","os_name":"Linux","os_version":"5.10.162+","architecture":"amd64","lang":"jvm","lang_version":"11.0.17","jvm_vendor":"Red Hat, Inc.","jvm_version":"11.0.17+8-LTS","java_class_version":"55.0","http_nonProxyHosts":"null","http_proxyHost":"null","enabled":true,"service":"keycloak","agent_url":"http://dd-staging.example.com:8126","agent_error":false,"debug":false,"trace_propagation_style_extract":["datadog"],"trace_propagation_style_inject":["datadog"],"analytics_enabled":false,"sampling_rules":[{},{}],"priority_sampling_enabled":true,"logs_correlation_enabled":true,"profiling_enabled":true,"remote_config_enabled":true,"debugger_enabled":false,"appsec_enabled":"ENABLED_INACTIVE","telemetry_enabled":true,"dd_version":"","health_checks_enabled":true,"configuration_file":"no config file present","runtime_id":"37c6eb63-db44-4f8a-86db-2b92a2ab3ca6","logging_settings":{"levelInBrackets":false,"dateTimeFormat":"'[dd.trace 'yyyy-MM-dd HH:mm:ss:SSS Z']'","logFile":"System.err","configurationFile":"simplelogger.properties","showShortLogName":false,"showDateTime":true,"showLogName":true,"showThreadName":true,"defaultLogLevel":"INFO","warnLevelString":"WARN","embedException":false},"cws_enabled":false,"cws_tls_refresh":5000,"datadog_profiler_enabled":true,"datadog_profiler_safe":true}

Is there any other config that I need to change to make the tracing work?

3 participants