28 changes: 26 additions & 2 deletions aws/logs_monitoring/README.md
@@ -92,10 +92,14 @@ If you can't install the Forwarder using the provided CloudFormation template, y
5. Some AWS accounts are configured such that triggers will not automatically create resource-based policies allowing Cloudwatch log groups to invoke the forwarder. Reference the [CloudWatchLogPermissions][103] to see which permissions are required for the forwarder to be invoked by Cloudwatch Log Events.
6. [Configure triggers][104].
7. Create an S3 bucket, and set environment variable `DD_S3_BUCKET_NAME` to the bucket name. Also provide `s3:GetObject`, `s3:PutObject`, `s3:ListBucket`, and `s3:DeleteObject` permissions on this bucket to the Lambda execution role. This bucket is used to store the different tags cache i.e. Lambda, S3, Step Function and Log Group. Additionally, this bucket will be used to store unforwarded events incase of forwarding exceptions.
8. Set environment variable `DD_STORE_FAILED_EVENTS` to `true` to enable the forwarder to also store event data in the S3 bucket. In case of exceptions when sending logs, metrics or traces to intake, the forwarder will store relevant data in the S3 bucket. On custom invocations i.e. on receiving an event with the `retry` keyword set to a non empty string (which can be manually triggered - see below), the forwarder will retry sending the stored events. When successful it will clear up the storage in the bucket.
8. Set the environment variable `DD_STORE_FAILED_EVENTS` to `true` to enable the forwarder to also store event data in the S3 bucket. If an exception occurs when sending logs, metrics, or traces to intake, the forwarder stores the relevant data in the S3 bucket. On custom invocations, such as receiving an event with the `retry` keyword explicitly set to `true` (which can be triggered manually, see below), the forwarder retries sending the stored events. After a successful forwarding, the forwarder cleans up the stored events.

```bash
aws lambda invoke --function-name <function-name> --payload '{"retry":"true"}' --cli-binary-format raw-in-base64-out --log-type Tail /dev/stdout
aws lambda invoke --function-name <function-name> \
--payload '{"retry":true}' \
--cli-binary-format raw-in-base64-out \
--log-type Tail /dev/stdout |
jq -r 'select(.LogResult) | .LogResult' | base64 -d | xargs -0 printf "%s"
```
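For reference, the retry-only detection boils down to a payload that contains nothing but the `retry` key. A standalone sketch of that check (not the Forwarder's actual module; the constant name `DD_RETRY_KEYWORD` and its `"retry"` value are assumed to mirror the implementation):

```python
DD_RETRY_KEYWORD = "retry"  # assumed; mirrors the keyword the Forwarder looks for


def is_retry_only(event: dict) -> bool:
    """True only for a payload of exactly {"retry": true} (or "true" as a string)."""
    return (
        len(event) == 1
        and str(event.get(DD_RETRY_KEYWORD, "false")).lower() == "true"
    )


# A retry-only invocation skips normal processing; a regular event that happens
# to also carry a "retry" key is processed first and then retried.
assert is_retry_only({"retry": True})
assert is_retry_only({"retry": "true"})
assert not is_retry_only({"retry": True, "Records": []})
```

Coercing through `str(...).lower()` accepts both the JSON boolean `true` and the string `"true"`, which keeps the CLI invocation above forgiving about payload quoting.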

<div class="alert alert-warning">
@@ -312,6 +316,14 @@ Otherwise, if you are using Web Proxy:
7. Set `DdNoSsl` to `true` if connecting to the proxy using `http`.
8. Set `DdSkipSslValidation` to `true` if connecting to the proxy using `https` with a self-signed certificate.

### Scheduled retry

When you enable `DdStoreFailedEvents`, the Lambda forwarder stores any events that could not be sent to Datadog in an S3 bucket. These events can be logs, metrics, or traces. They are not automatically re-processed on each Lambda invocation; instead, you must trigger a [manual Lambda run](https://docs.datadoghq.com/logs/guide/forwarder/?tab=manual) to process them again.

You can automate this re-processing by enabling the `DdScheduleRetryFailedEvents` parameter, which creates a scheduled Lambda invocation through [AWS EventBridge](https://docs.aws.amazon.com/lambda/latest/dg/with-eventbridge-scheduler.html). By default, the forwarder attempts re-processing every six hours.

Keep in mind that log events can only be submitted with [timestamps up to 18 hours in the past](https://docs.datadoghq.com/logs/log_collection/?tab=host#custom-log-forwarding); events with older timestamps are discarded.
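The 18-hour intake limit is what bounds a sensible retry interval: with the default six-hour schedule, a stored event survives at most a couple of retry cycles. An illustrative sketch of the cutoff (plain Python, not Forwarder code):

```python
from datetime import datetime, timedelta, timezone

# Datadog's logs intake accepts timestamps up to 18 hours in the past;
# anything older is discarded, so retries past that window are wasted.
MAX_LOG_AGE = timedelta(hours=18)


def is_submittable(event_ts: datetime, now: datetime) -> bool:
    """True while the event timestamp is still inside the 18-hour window."""
    return now - event_ts <= MAX_LOG_AGE


now = datetime(2025, 11, 27, 12, 0, tzinfo=timezone.utc)
assert is_submittable(now - timedelta(hours=6), now)       # one retry cycle old
assert not is_submittable(now - timedelta(hours=20), now)  # past the window
```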

### Code signing

The Datadog Forwarder is signed by Datadog. To verify the integrity of the Forwarder, use the manual installation method. [Create a Code Signing Configuration][19] that includes Datadog’s Signing Profile ARN (`arn:aws:signer:us-east-1:464622532012:/signing-profiles/DatadogLambdaSigningProfile/9vMI9ZAGLc`) and associate it with the Forwarder Lambda function before uploading the Forwarder ZIP file.
@@ -456,6 +468,15 @@ To test different patterns against your logs, turn on [debug logs](#troubleshoot
`AdditionalTargetLambdaArns`
: Comma separated list of Lambda ARNs that will get called asynchronously with the same `event` the Datadog Forwarder receives.

`DdStoreFailedEvents`
: Set to true to enable the forwarder to store events that failed to send to Datadog.

`DdScheduleRetryFailedEvents`
: Set to true to enable a scheduled forwarder invocation (via AWS EventBridge) to process stored failed events.

`DdScheduleRetryInterval`
: Interval, in hours, between scheduled forwarder invocations (via AWS EventBridge). Defaults to 6.

`InstallAsLayer`
: Whether to use the layer-based installation flow. Set to false to use the legacy installation flow, which installs a second function that copies the forwarder code from GitHub to an S3 bucket. Defaults to true.

@@ -622,6 +643,9 @@ To test different patterns against your logs, turn on [debug logs](#troubleshoot
`ADDITIONAL_TARGET_LAMBDA_ARNS`
: Comma separated list of Lambda ARNs that will get called asynchronously with the same `event` the Datadog Forwarder receives.

`DD_STORE_FAILED_EVENTS`
: Set to true to enable the forwarder to store events that failed to send to Datadog.

`INSTALL_AS_LAYER`
: Whether to use the layer-based installation flow. Set to false to use the legacy installation flow, which installs a second function that copies the forwarder code from GitHub to an S3 bucket. Defaults to true.

14 changes: 12 additions & 2 deletions aws/logs_monitoring/lambda_function.py
@@ -62,6 +62,17 @@ def datadog_forwarder(event, context):
init_cache_layer(function_prefix)
init_forwarder(function_prefix)

if len(event) == 1 and str(event.get(DD_RETRY_KEYWORD, "false")).lower() == "true":
> **Contributor:** In which cases would we receive a `len(event) > 1`? AFAIK the forwarder is always triggered by a single event, which could include several logs.

> **@ViBiOh (Author), Nov 27, 2025:** I don't know, but we want to launch the retry-only mode on the very specific `{"retry": true}` event.
>
> For example, if SQS or an async invocation of the Lambda adds this kind of entry to the received event, we'll do both the retry of failed events and the processing of the event.
>
> To avoid forwarding this specific event (the `{"retry": true}`), we exit the function when that's the case, so I want to clearly identify this event.

logger.info("Retry-only invocation")

try:
forwarder.retry()
except Exception as e:
if logger.isEnabledFor(logging.DEBUG):
logger.debug(f"Failed to retry forwarding {e}")

return

parsed = parse(event, context, cache_layer)
enriched = enrich(parsed, cache_layer)
transformed = transform(enriched)
@@ -71,12 +82,11 @@ def datadog_forwarder(event, context):
parse_and_submit_enhanced_metrics(logs, cache_layer)

try:
if bool(event.get(DD_RETRY_KEYWORD, False)) is True:
if str(event.get(DD_RETRY_KEYWORD, "false")).lower() == "true":
forwarder.retry()
except Exception as e:
if logger.isEnabledFor(logging.DEBUG):
logger.debug(f"Failed to retry forwarding {e}")
pass


def init_cache_layer(function_prefix):
70 changes: 66 additions & 4 deletions aws/logs_monitoring/template.yaml
@@ -261,6 +261,17 @@ Parameters:
- true
- false
Description: Set to true to enable the forwarder to store events that failed to send to Datadog.
DdScheduleRetryFailedEvents:
Type: String
Default: false
AllowedValues:
- true
- false
Description: Set to true to enable a scheduled forwarder invocation (via AWS EventBridge) to process stored failed events.
DdScheduleRetryInterval:
Type: Number
Default: 6
Description: Interval in hours for scheduled forwarder invocation (via AWS EventBridge).
DdForwarderExistingBucketName:
Type: String
Default: ""
@@ -292,7 +303,7 @@ Parameters:
KmsKeyList:
Type: CommaDelimitedList
Default: ""
Description: List of KMS Key ARNs the Lambda forwarder function can use to decrypt, separated by comma
Description: List of KMS Key ARNs the Lambda forwarder function can use to decrypt, separated by comma
Conditions:
IsAWSChina: !Equals [!Ref "AWS::Partition", aws-cn]
IsGovCloud: !Equals [!Ref "AWS::Partition", aws-us-gov]
@@ -348,7 +359,8 @@ Conditions:
SetLayerARN: !Not
- !Equals [!Ref LayerARN, ""]
SetDdForwardLog: !Equals [!Ref DdForwardLog, false]
SetDdStepFunctionsTraceEnabled: !Equals [!Ref DdStepFunctionsTraceEnabled, true]
SetDdStepFunctionsTraceEnabled:
!Equals [!Ref DdStepFunctionsTraceEnabled, true]
SetDdUseCompression: !Equals [!Ref DdUseCompression, false]
SetDdCompressionLevel: !Not
- !Equals [!Ref DdCompressionLevel, 6]
@@ -384,6 +396,9 @@ Conditions:
- !Equals [!Ref DdLogLevel, ""]
SetDdForwarderDecryptKeys: !Not
- !Equals [!Join ["", !Ref KmsKeyList], ""]
CreateRetryScheduler: !And
- !Equals [!Ref DdStoreFailedEvents, true]
- !Equals [!Ref DdScheduleRetryFailedEvents, true]
Rules:
MustSetDdApiKey:
Assertions:
@@ -431,7 +446,10 @@ Resources:
- !Ref DdForwarderExistingBucketName
S3Key: !Sub
- "aws-dd-forwarder-${DdForwarderVersion}.zip"
- {DdForwarderVersion: !FindInMap [Constants, DdForwarder, Version]}
- {
DdForwarderVersion:
!FindInMap [Constants, DdForwarder, Version],
}
- ZipFile: " "
MemorySize: !Ref MemorySize
Runtime: python3.13
@@ -831,7 +849,7 @@ Resources:
- !Ref SourceZipUrl
- !Sub
- "https://github.com/DataDog/datadog-serverless-functions/releases/download/aws-dd-forwarder-${DdForwarderVersion}/aws-dd-forwarder-${DdForwarderVersion}.zip"
- {DdForwarderVersion: !FindInMap [Constants, DdForwarder, Version]}
- { DdForwarderVersion: !FindInMap [Constants, DdForwarder, Version] }
# The Forwarder's source code is too big to fit the inline code size limit for CloudFormation. In most of AWS
# partitions and regions, the Forwarder is able to load its source code from a Lambda layer attached to it.
# In places where Datadog can't/doesn't yet publish Lambda layers, use another Lambda to copy the source code
@@ -970,6 +988,50 @@ Resources:
- - "arn:*:s3:::"
- !Select [1, !Split ["s3://", !Ref SourceZipUrl]]
- !Ref AWS::NoValue
SchedulerRole:
Type: AWS::IAM::Role
Condition: CreateRetryScheduler
Properties:
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
- Action:
- sts:AssumeRole
Effect: Allow
Principal:
Service: !If
- IsAWSChina
- "scheduler.amazonaws.com.cn"
- "scheduler.amazonaws.com"
PermissionsBoundary: !If
- SetPermissionsBoundary
- !Ref PermissionsBoundaryArn
- !Ref AWS::NoValue
Policies:
- PolicyName: SchedulerRolePolicy0
PolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Action:
- lambda:InvokeFunction
Resource:
- !GetAtt
- Forwarder
- Arn
Scheduler:
Type: AWS::Scheduler::Schedule
Condition: CreateRetryScheduler
Properties:
Name: !Sub "${AWS::StackName}-retry"
Description: Retry the failed events from the Datadog Lambda Forwarder
ScheduleExpression: !Sub "rate(${DdScheduleRetryInterval} hours)"
FlexibleTimeWindow:
Mode: "OFF"
Target:
Arn: !GetAtt "Forwarder.Arn"
RoleArn: !GetAtt "SchedulerRole.Arn"
Input: '{"retry": true}'
Outputs:
DatadogForwarderArn:
Description: Datadog Forwarder Lambda Function ARN