
Retry sending trace payloads on failure. #128

Merged: 1 commit, merged on Feb 28, 2023


@purple4reina (Contributor) commented on Feb 21, 2023

What does this PR do?

When sending traces to the extension fails, retry up to 2 times.

Motivation

In a very small percentage of cases in high-throughput apps, trace payloads fail to reach the extension. We're seeing errors like:

2022/12/19 21:09:41 Datadog Tracer v1.45.1 ERROR: lost 1 traces: Post "http://localhost:8126/v0.4/traces": read tcp 127.0.0.1:44108->127.0.0.1:8126: read: connection reset by peer ([send duration: 0.327196ms]) (occurred: 19 Dec 22 21:07 UTC)

2022/12/19 21:15:44 Datadog Tracer v1.45.1 ERROR: lost 1 traces: Post "http://localhost:8126/v0.4/traces": write tcp 127.0.0.1:45932->127.0.0.1:8126: write: broken pipe ([send duration: 0.225527ms]) (occurred: 19 Dec 22 21:14 UTC)

2022/12/19 21:17:54 Datadog Tracer v1.45.1 ERROR: lost 1 traces: Post "http://localhost:8126/v0.4/traces": context deadline exceeded (Client.Timeout exceeded while awaiting headers) ([send duration: 19.249s]) (occurred: 19 Dec 22 21:14 UTC)

Increasing the timeout helps, but as the logs above show, some requests fail well before the timeout is reached (within a fraction of a millisecond). This happens because the Datadog Lambda extension is paused in the middle of the request, which abruptly closes the connection.

Therefore, this pull request allows the tracer to retry sending a failed trace payload at the next opportunity instead of dropping it.

Testing Guidelines

Additional Notes

See DataDog/dd-trace-go#1636 for the corresponding change in the Go tracer.

Types of changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Checklist

  • This PR's description is comprehensive
  • This PR contains breaking changes that are documented in the description
  • This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
  • This PR impacts documentation, and it has been updated (or a ticket has been logged)
  • This PR's changes are covered by the automated tests
  • This PR collects user input/sensitive content into Datadog

@purple4reina purple4reina marked this pull request as ready for review February 28, 2023 16:49
@purple4reina purple4reina requested a review from a team as a code owner February 28, 2023 16:49
@hghotra (Contributor) left a comment


👍🏼

@purple4reina purple4reina merged commit 169262a into main Feb 28, 2023
@purple4reina purple4reina deleted the rey.abolofia/send-retries branch February 28, 2023 16:54
@@ -71,6 +71,7 @@ func (l *Listener) HandlerStarted(ctx context.Context, msg json.RawMessage) cont
tracer.WithService("aws.lambda"),
tracer.WithLambdaMode(!l.extensionManager.IsExtensionRunning()),
tracer.WithGlobalTag("_dd.origin", "lambda"),
tracer.WithSendRetries(2),
Contributor


What do you think about making this configurable?


That, or even exposing the ability to pass in our own http.Client.

peterdeme pushed a commit to spacelift-io/datadog-lambda-go that referenced this pull request Nov 15, 2023
peterdeme added a commit to spacelift-io/datadog-lambda-go that referenced this pull request Dec 4, 2023
* Create codeql-analysis.yml (DataDog#100)

* Create codeql-analysis.yml

* Update codeql-analysis.yml

* Update run_integration_tests.sh

* Do not show error messages even if neither DD_API_KEY nor DD_KMS_API_KEY is set when Lambda Extension is running (DataDog#102)

* Bump version to 1.4.0

* Bump go + fasthttp + lint (DataDog#104)

* Consolidate serverless configurations into one place (DataDog#105)

* Update README.md

* Update README.md

* Bump dd-trace-go to latest version to address some vulnerabilities (DataDog#109)

* Bump dd-trace-go to latest version to address some vulnerabilities
* update go.sum with `go mod tidy`

* Bump version to 1.6.0

* bump codeql (DataDog#112)

* Bump dd-trace-go to v1.41 (DataDog#115)

* Bump version to 1.7.0

* [SLS-2330] Add support for universal instrumentation with the extension (DataDog#116)

add option to use universal instrumentation

* [EEP-444] include error in failed metric send log (DataDog#118)

Co-authored-by: Corey Griffin <CoreyGriffin@users.noreply.github.com>

* [SLS-2492] Upgrade aws sdk v2 (DataDog#113)

upgrade sdk

* Bump version to 1.8.0

* Use new account in integration tests (DataDog#119)

* set the architecture explicitly (DataDog#122)

* mask init runtime logs (DataDog#123)

* Update libs (DataDog#121)

* bump go 1.18 (DataDog#125)

* Retry sending trace payloads on failure. (DataDog#128)

* Bump version to 1.9.0

* Update DD Trace to v1.51.0 (DataDog#133)

* Bump go version to 1.20 (DataDog#140)

Bump go version to 1.20

* Upgrade version of dd-trace-go to v1.54.1 (DataDog#141)

* Bump version to 1.10.0

* Propagate trace context from SQS events (DataDog#142)

* Default parent id to be trace id if not found elsewhere.

* Look for trace context in context object as well as headers.

* Apply trace context before starting the function execution span.

* Update signature in tests.

* Add spanid of execution span to context.

* Do not ignore priority "-128".

* Test that default parent id set to trace id.

* Test span id added to context.

* Test uses trace context from context object.

* Bump version to 1.11.0

* feat: automate AppSec enablement setup (e.g: `AWS_LAMBDA_RUNTIME_API`) (DataDog#143)

* feat: honor AWS_LAMBDA_EXEC_WRAPPER when AWS Lambda does not

In order to simplify onboarding & make it more uniform across languages,
inspect the value of the `AWS_LAMBDA_EXEC_WRAPPER` environment variable
and apply select environment variable changes it performs upon
decorating a handler.

This is necessary/useful because that environment variable is not
honored by custom runtimes (`provided`, `provided.al2`) as well as the
`go1.x` runtime (which is a glorified provided runtime). The Datadog
Lambda wrapper starts a proxy to inject ASM functionality directly on
the Lambda runtime API instead of having to manually instrument each and
every lambda handler/application, and modifies `AWS_LAMBDA_RUNTIME_API`
to instruct Lambda language runtime client libraries to go through it
instead of directly interacting with the Lambda control plane.

APPSEC-11534

* pivot to a different, cheaper strategy

* typo fix

* PR feedback

* minor fixups

* add warning in go1.x runtime if lambda.norpc build tag was not enabled

* Bump version to 1.12.0

* Re-add configs after upstream rebase

* Bump packages

* Remove deprecated `io/ioutil` calls

---------

Co-authored-by: Tian Chu <tian.chu@datadoghq.com>
Co-authored-by: Soshi Katsuta <skatsuta@users.noreply.github.com>
Co-authored-by: Maxime David <maxime.david@datadoghq.com>
Co-authored-by: kimi <47579703+kimi-p@users.noreply.github.com>
Co-authored-by: Kimi Wu <kimi.wu@datadoghq.com>
Co-authored-by: Dylan Yang <dylan.yang@datadoghq.com>
Co-authored-by: Corey Griffin <15809365+CoreyGriffin@users.noreply.github.com>
Co-authored-by: Corey Griffin <CoreyGriffin@users.noreply.github.com>
Co-authored-by: Marcin Rabenda <xrn.design@gmail.com>
Co-authored-by: Rey Abolofia <purple4reina@gmail.com>
Co-authored-by: Rey Abolofia <rey.abolofia@datadoghq.com>
Co-authored-by: Andrew Rodriguez <49878080+zARODz11z@users.noreply.github.com>
Co-authored-by: Ivan Topolcic <IvanTopolcic@users.noreply.github.com>
Co-authored-by: Romain Marcadier <romain.muller@telecomnancy.net>
3 participants