Merged
33 changes: 22 additions & 11 deletions README.md
@@ -10,6 +10,7 @@ This Terraform module creates the Datadog Log Lambda Forwarder infrastructure in
- **Lambda Permissions**: For invocation by CloudWatch Logs, S3, SNS, and EventBridge
- **Secrets Management**: Support for storing Datadog API key in Secrets Manager or SSM Parameter Store
- **VPC Support**: Deploy forwarder in VPC with proxy
- **Scheduler**: For scheduled retry of stored failed events

## Usage

@@ -120,17 +121,19 @@ For complete usage examples demonstrating different configuration scenarios, see

### Advanced Configuration

| Name | Description | Type | Default |
| --------------------------------- | -------------------------------- | -------- | ------- |
| dd_compression_level | Compression level (0-9) | `string` | `null` |
| dd_max_workers | Max concurrent workers | `string` | `null` |
| dd_log_level | Log level | `string` | `null` |
| dd_store_failed_events | Store failed events in S3 | `bool` | `null` |
| dd_forwarder_bucket_name | Custom S3 bucket name | `string` | `null` |
| dd_forwarder_existing_bucket_name | Existing S3 bucket name | `string` | `null` |
| dd_api_url | Custom API URL | `string` | `null` |
| dd_trace_intake_url | Custom trace intake URL | `string` | `null` |
| additional_target_lambda_arns | Additional Lambda ARNs to invoke | `string` | `null` |
| Name | Description | Type | Default |
| --------------------------------- | ------------------------------------------------------ | -------- | ------- |
| dd_compression_level | Compression level (0-9) | `string` | `null` |
| dd_max_workers | Max concurrent workers | `string` | `null` |
| dd_log_level | Log level | `string` | `null` |
| dd_store_failed_events | Store failed events in S3 | `bool` | `null` |
| dd_schedule_retry_failed_events | Periodically retry failed events (via AWS EventBridge) | `bool` | `null` |
| dd_schedule_retry_interval | Retry interval in hours for failed events | `number` | `6` |
| dd_forwarder_bucket_name | Custom S3 bucket name | `string` | `null` |
| dd_forwarder_existing_bucket_name | Existing S3 bucket name | `string` | `null` |
| dd_api_url | Custom API URL | `string` | `null` |
| dd_trace_intake_url | Custom trace intake URL | `string` | `null` |
| additional_target_lambda_arns | Additional Lambda ARNs to invoke | `string` | `null` |
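As an illustrative sketch, the advanced options above could be set together in a module block like this (the `source` path, function name, and bucket name are placeholders, not values documented by this module):

```hcl
module "datadog_forwarder" {
  source = "../.." # placeholder path; use the module source shown in the Usage section

  function_name = "datadog-forwarder" # placeholder

  # Advanced options from the table above; values here are examples only.
  dd_compression_level     = "6"                        # 0-9, higher compresses more
  dd_max_workers           = "20"                       # max concurrent workers
  dd_log_level             = "debug"
  dd_forwarder_bucket_name = "my-dd-forwarder-bucket"   # placeholder bucket name
  dd_api_url               = "https://api.datadoghq.eu" # e.g. when targeting the EU site
}
```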

### IAM Configuration

@@ -273,6 +276,14 @@ module "datadog_forwarder_us_west_2" {
- Your IAM role must have appropriate permissions for resources in each target region
- Secrets/parameters containing the Datadog API key should exist in each target region

## Scheduled retry

When you enable `dd_store_failed_events`, the Lambda forwarder stores any events that could not be sent to Datadog (logs, metrics, or traces) in an S3 bucket. These events are not automatically reprocessed on each Lambda invocation; instead, you must trigger a [manual Lambda run](https://docs.datadoghq.com/logs/guide/forwarder/?tab=manual) to process them again.

You can automate this reprocessing by enabling the `dd_schedule_retry_failed_events` parameter, which creates a scheduled Lambda invocation through [AWS EventBridge](https://docs.aws.amazon.com/lambda/latest/dg/with-eventbridge-scheduler.html). By default, the forwarder retries every six hours; adjust the interval with `dd_schedule_retry_interval`.

Keep in mind that log events can only be submitted with [timestamps up to 18 hours in the past](https://docs.datadoghq.com/logs/log_collection/?tab=host#custom-log-forwarding); older timestamps will cause the events to be discarded.
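A minimal sketch of enabling both failed-event storage and the scheduled retry (the `source` path and function name are placeholders):

```hcl
module "datadog_forwarder" {
  source = "../.." # placeholder path

  function_name = "datadog-forwarder" # placeholder

  dd_store_failed_events          = true # required, so retries have events to process
  dd_schedule_retry_failed_events = true # creates the EventBridge schedule and its IAM role
  dd_schedule_retry_interval      = 6    # hours; keep well under the 18-hour timestamp limit
}
```

Per `main.tf` in this change, the schedule invokes the forwarder with the payload `{"retry": true}`.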

## Troubleshooting

### Common Issues
61 changes: 61 additions & 0 deletions main.tf
@@ -290,3 +290,64 @@ resource "aws_cloudwatch_log_group" "forwarder_log_group" {

  tags = var.tags
}

# Scheduled retry

resource "aws_iam_role" "scheduled_retry" {
  count = var.dd_store_failed_events && var.dd_schedule_retry_failed_events ? 1 : 0

  name = "${var.function_name}-${local.region}-retry"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = data.aws_partition.current.partition == "aws-cn" ? "scheduler.amazonaws.com.cn" : "scheduler.amazonaws.com"
        }
      }
    ]
  })

  permissions_boundary = var.permissions_boundary_arn

  tags = var.tags
}

resource "aws_iam_role_policy" "scheduled_retry" {
  count = var.dd_store_failed_events && var.dd_schedule_retry_failed_events ? 1 : 0

  name = "${var.function_name}-${local.region}-retry-policy"
  role = aws_iam_role.scheduled_retry[0].id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "lambda:InvokeFunction",
        ]
        Effect   = "Allow"
        Resource = aws_lambda_function.forwarder.arn
      },
    ]
  })
}

resource "aws_scheduler_schedule" "scheduled_retry" {
  count = var.dd_store_failed_events && var.dd_schedule_retry_failed_events ? 1 : 0

  name                = "${var.function_name}-${local.region}-retry"
  description         = "Retry the failed events from the Datadog Lambda Forwarder ${var.function_name}"
  schedule_expression = "rate(${var.dd_schedule_retry_interval} hours)"

  flexible_time_window {
    mode = "OFF"
  }

  target {
    arn      = aws_lambda_function.forwarder.arn
    role_arn = aws_iam_role.scheduled_retry[0].arn
    input    = jsonencode({ retry = true })
  }
}
12 changes: 12 additions & 0 deletions variables.tf
@@ -370,6 +370,18 @@ variable "dd_store_failed_events" {
  description = "Set to true to enable the forwarder to store events that failed to send to Datadog."
}

variable "dd_schedule_retry_failed_events" {
  type        = bool
  default     = null
  description = "Set to true to enable a scheduled forwarder invocation (via AWS EventBridge) to process stored failed events."
}

variable "dd_schedule_retry_interval" {
  type        = number
  default     = 6
  description = "Interval in hours for scheduled forwarder invocation (via AWS EventBridge)."
}

variable "dd_forwarder_existing_bucket_name" {
  type        = string
  default     = null