-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Propagate trace context from SQS events #142
Conversation
696512a
to
e2fe7e3
Compare
parentId := traceId | ||
if pid := response.Header.Get(string(DdParentId)); pid != "" { | ||
parentId = pid | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the parent id is not set, later on we will throw out the found trace context completely.
@@ -126,21 +127,36 @@ func createDummySubsegmentForXrayConverter(ctx context.Context, traceCtx TraceCo | |||
return nil | |||
} | |||
|
|||
func getTraceContext(context map[string]string) (TraceContext, bool) { | |||
func getTraceContext(ctx context.Context, headers map[string]string) (TraceContext, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the extension is in charge of parsing for trace context, it is included in the response from calling start invocation. It is then added to the context object. If present, it therefore should be read in case there is no trace context found in the event payload headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by event payload headers, you mean the actual event that triggered the go lambda right?
Just wanna make sure I understand right, so we get trace context by either pinging the extension here which populates ctx
and headers
could have trace context via some event such as api gw like this?
So to be sure we extract and attach context we do check both ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is exactly correct. Reading the "event payload headers" is hold over from before we added universal instrumentation. We could potentially get rid of it completely and just rely on the extension to do this work for us.
@@ -64,6 +64,10 @@ func (l *Listener) HandlerStarted(ctx context.Context, msg json.RawMessage) cont | |||
return ctx | |||
} | |||
|
|||
if l.universalInstrumentation && l.extensionManager.IsExtensionRunning() { | |||
ctx = l.extensionManager.SendStartInvocationRequest(ctx, msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling SendStartInvocationRequest
gives the extension the event payload. It then parses it for trace context, creates any inferred spans, and returns the trace context. Therefore, we need to get this trace context from the extension before starting the execution span here in the function.
@@ -149,7 +149,9 @@ func startFunctionExecutionSpan(ctx context.Context, mergeXrayTraces bool, isDdS | |||
span.SetTag("_dd.parent_source", "xray") | |||
} | |||
|
|||
return span | |||
ctx = context.WithValue(ctx, extension.DdSpanId, fmt.Sprint(span.Context().SpanID())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the span id to the context for use later when calling end invocation on the extension. (see
req.Header.Set(string(DdSpanId), spanId) |
The extension uses this span id to replace the span id of the aws.lambda
span that it creates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a sanity check, that SpanID gets replaced here https://github.com/DataDog/datadog-agent/blob/570ed026d8dc3afc4324610db8f922027f8786ca/pkg/serverless/invocationlifecycle/trace.go#L110-L111?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly right. You'll notice that the tracer and the extension are creating aws.lambda
spans. However, the extension knows to throw out the one made in the tracer. The extension will ensure that both of these aws.lambda
spans have the same trace context (span id, trace, id, parent id).
We do it this way because we want to make sure the tracer has access to the aws.lambda
span as the parent of any spans it creates itself.
Codecov Report
@@ Coverage Diff @@
## main #142 +/- ##
==========================================
+ Coverage 83.13% 83.60% +0.46%
==========================================
Files 13 13
Lines 848 860 +12
==========================================
+ Hits 705 719 +14
+ Misses 114 113 -1
+ Partials 29 28 -1
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
It seems your description sc shows the aws.lambda consumer span. Assuming the description is just out of date? if so np, just want to make sure I understand
|
Yes, great catch! I did get to the bottom of why I wasn't seeing the |
np doing a full review now :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions, but everything looks good
* Default parent id to be trace id if not found elsewhere. * Look for trace context in context object as well as headers. * Apply trace context before starting the function execution span. * Update signature in tests. * Add spanid of execution span to context. * Do not ignore priority "-128". * Test that default parent id set to trace id. * Test span id added to context. * Test uses trace context from context object.
* Create codeql-analysis.yml (DataDog#100) * Create codeql-analysis.yml * Update codeql-analysis.yml * Update run_integration_tests.sh * Do not show error messages even if neither DD_API_KEY nor DD_KMS_API_KEY is set when Lambda Extension is running (DataDog#102) * Bump version to 1.4.0 * Bump go + fasthttp + lint (DataDog#104) * Consolidate serverless configurations into one place (DataDog#105) * Update README.md * Update README.md * Bump dd-trace-go to latest version to address some vulnerabilities (DataDog#109) * Bump dd-trace-go to latest version to address some vulnaribilities * update go.sum with `go mod tidy` * Bump version to 1.6.0 * bump codeql (DataDog#112) * Bump dd-trace-go to v1.41 (DataDog#115) * Bump version to 1.7.0 * [SLS-2330] Add support for universal instrumentation with the extension (DataDog#116) add option to use universal instrumentation * [EEP-444] include error in failed metric send log (DataDog#118) Co-authored-by: Corey Griffin <CoreyGriffin@users.noreply.github.com> * [SLS-2492] Upgrade aws sdk v2 (DataDog#113) upgrade sdk * Bump version to 1.8.0 * Use new account in integration tests (DataDog#119) * set the architecture explicitely (DataDog#122) * mask init runtime logs (DataDog#123) * Update libs (DataDog#121) * bump go 1.18 (DataDog#125) * Retry sending trace payloads on failure. (DataDog#128) * Bump version to 1.9.0 * Update DD Trace to v1.51.0(DataDog#133) * Bump go version to 1.20 (DataDog#140) Bump go version to 1.20 * Upgrade version of dd-trace-go to v1.54.1 (DataDog#141) * Bump version to 1.10.0 * Propagate trace context from SQS events (DataDog#142) * Default parent id to be trace id if not found elsewhere. * Look for trace context in context object as well as headers. * Apply trace context before starting the function execution span. * Update signature in tests. * Add spanid of execution span to context. * Do not ignore priority "-128". * Test that default parent id set to trace id. * Test span id added to context. * Test uses trace context from context object. * Bump version to 1.11.0 * feat: automate AppSec enablement setup (e.g: `AWS_LAMBDA_RUNTIME_API`) (DataDog#143) * feat: honor AWS_LAMBDA_EXEC_WRAPPER when AWS Lambda does not In order to simplify onboarding & make it more uniform across languages, inspect the value of the `AWS_LAMBDA_EXEC_WRAPPER` environment variable and apply select environment variable changes it perofrms upon decorating a handler. This is necessary/useful because that environment variable is not honored by custom runtimes (`provided`, `provided.al2`) as well as the `go1.x` runtime (which is a glorified provided runtime). The datadog Lambda wrapper starts a proxy to inject ASM functionality directly on the Lambda runtime API instead of having to manually instrument each and every lambda handler/application, and modifies `AWS_LAMBDA_RUNTIME_API` to instruct Lambda language runtime client libraries to go through it instead of directly interacting with the Lambda control plane. APPSEC-11534 * pivot to a different, cheaper strategy * typo fix * PR feedback * minor fixups * add warning in go1.x runtime if lambda.norpc build tag was not enabled * Bump version to 1.12.0 * Re-add configs after upstream rebase * Bump packages * Remove deprecated `io/ioutil` calls --------- Co-authored-by: Tian Chu <tian.chu@datadoghq.com> Co-authored-by: Soshi Katsuta <skatsuta@users.noreply.github.com> Co-authored-by: Maxime David <maxime.david@datadoghq.com> Co-authored-by: kimi <47579703+kimi-p@users.noreply.github.com> Co-authored-by: Kimi Wu <kimi.wu@datadoghq.com> Co-authored-by: Dylan Yang <dylan.yang@datadoghq.com> Co-authored-by: Corey Griffin <15809365+CoreyGriffin@users.noreply.github.com> Co-authored-by: Corey Griffin <CoreyGriffin@users.noreply.github.com> Co-authored-by: Marcin Rabenda <xrn.design@gmail.com> Co-authored-by: Rey Abolofia <purple4reina@gmail.com> Co-authored-by: Rey Abolofia <rey.abolofia@datadoghq.com> Co-authored-by: Andrew Rodriguez <49878080+zARODz11z@users.noreply.github.com> Co-authored-by: Ivan Topolcic <IvanTopolcic@users.noreply.github.com> Co-authored-by: Romain Marcadier <romain.muller@telecomnancy.net>
What does this PR do?
When this lambda function is a consumer for a SQS queue, collect the trace context passed back from the extension and apply it.
Motivation
Traces with a go SQS consumer were broken.
Testing Guidelines
Sample trace using these updates
Note that any child span created inside the lambda function currently will not appear in the trace. This is because the extension isn't reading sampling priority and therefore the tracer defaults to priority 0 (throw it away). See https://datadoghq.atlassian.net/browse/SVLS-3537.
Additional Notes
For some reason I am not seeing the
aws.lambda
span created for my sqs consumer. This should be created in the extension when universal instrumentation is turned on. I haven't been able to get to the bottom of this. However, I have been able to show that any child spans created in the SQS comsumer lambda will have the proper trace ID and show up properly in the trace.Types of changes
Checklist