Retry sending trace payloads on failure. #128
👍🏼
@@ -71,6 +71,7 @@ func (l *Listener) HandlerStarted(ctx context.Context, msg json.RawMessage) cont
 tracer.WithService("aws.lambda"),
 tracer.WithLambdaMode(!l.extensionManager.IsExtensionRunning()),
 tracer.WithGlobalTag("_dd.origin", "lambda"),
+tracer.WithSendRetries(2),
What do you think about making this configurable?
That or even exposing the ability to pass in our own http.Client
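One possible shape for the configurability suggested above is to read the retry count from an environment variable and fall back to the value hard-coded in this PR. This is only a sketch: the variable name `EXAMPLE_TRACE_SEND_RETRIES` and the helper `parseSendRetries` are hypothetical and are not part of this repository or of dd-trace-go.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultSendRetries mirrors the value passed to tracer.WithSendRetries in this PR.
const defaultSendRetries = 2

// parseSendRetries (hypothetical helper) converts raw into a retry count.
// It returns def when raw is empty, not an integer, or negative.
func parseSendRetries(raw string, def int) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n < 0 {
		return def
	}
	return n
}

func main() {
	// EXAMPLE_TRACE_SEND_RETRIES is an assumed name, used for illustration only.
	retries := parseSendRetries(os.Getenv("EXAMPLE_TRACE_SEND_RETRIES"), defaultSendRetries)
	fmt.Println("send retries:", retries)
}
```

The resulting value could then be passed to `tracer.WithSendRetries` in place of the literal `2`.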
What does this PR do?
When sending traces to the extension fails, retry up to 2 times.
Motivation
In a very small percentage of cases for high-throughput apps, sending traces to the extension fails. We're seeing errors like
While increasing the timeout helps, some failures happen before the timeout is hit. This is because the Datadog Lambda extension was paused in the middle of the request. When that happens, the connection is abruptly closed.
Therefore, this pull request allows the tracer to retry sending the trace at the next opportunity.
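To illustrate the behavior described here (one initial attempt plus up to 2 retries), here is a minimal, self-contained sketch of a bounded retry loop. It is not the implementation in dd-trace-go; `sendWithRetries` and `flakySender` are hypothetical names, and the real tracer may differ in timing and error handling.

```go
package main

import (
	"errors"
	"fmt"
)

// sendWithRetries (hypothetical) calls send once and then up to `retries`
// more times, stopping at the first success. It returns nil on success,
// or an error wrapping the last failure.
func sendWithRetries(send func() error, retries int) error {
	if retries < 0 {
		retries = 0 // treat negative values as "no retries"
	}
	var err error
	for attempt := 0; attempt <= retries; attempt++ {
		if err = send(); err == nil {
			return nil
		}
	}
	return fmt.Errorf("send failed after %d attempts: %w", retries+1, err)
}

// flakySender (hypothetical) simulates a connection that is closed
// abruptly: the returned function fails its first `failures` calls.
// The returned pointer counts how many times it was called.
func flakySender(failures int) (func() error, *int) {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return errors.New("connection closed")
		}
		return nil
	}, &calls
}

func main() {
	send, calls := flakySender(2)
	err := sendWithRetries(send, 2)
	fmt.Println("calls:", *calls, "err:", err)
}
```

With 2 retries, a payload that fails twice still gets through on the third attempt, while a payload that keeps failing is given up on after 3 attempts in total.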
Additional Notes
See DataDog/dd-trace-go#1636 for corresponding change in the go tracer.