Monitor the combined size of concurrent AWS Backup jobs for DynamoDB using EventBridge, Lambda, Step Functions and CloudWatch, with configurable alerting before the service limit is reached.
AWS Backup Events → EventBridge Rule → Lambda Function → Step Functions → CloudWatch Metrics → Alarms → SNS
- Real-time monitoring of AWS Backup RUNNING events
- Continuous metric emission via Step Functions for long-running backup jobs
- Automatic table size lookup from DynamoDB API
- Configurable service limits in bytes (default: 50TB)
- Automatic threshold calculation (80% warning, 90% critical)
- Self-contained - entire solution in one CloudFormation template
- Deploy the stack:

      aws cloudformation create-stack \
        --stack-name backup-monitoring \
        --template-body file://backup-monitoring.json \
        --capabilities CAPABILITY_IAM

- Subscribe to notifications:

      aws sns subscribe \
        --topic-arn $(aws cloudformation describe-stacks \
          --stack-name backup-monitoring \
          --query 'Stacks[0].Outputs[?OutputKey==`SNSTopicArn`].OutputValue' \
          --output text) \
        --protocol email \
        --notification-endpoint your-email@example.com

- Monitor your backups: alerts trigger at 80% (warning) and 90% (critical) of your configured limit.
| Parameter | Default | Description |
|---|---|---|
| ServiceLimitBytes | 54975581388800 | Concurrent backup limit in bytes (50 TB, computed as 50 × 2^40) |
| WarningThresholdBytes | 43980465111040 | Warning threshold in bytes (80% of limit) |
| CriticalThresholdBytes | 49477772799360 | Critical threshold in bytes (approximately 90% of limit) |
| LambdaConcurrencyLimit | 50 | Reserved concurrent executions for Lambda function |
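
The threshold defaults can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the template; the `thresholds` helper and its percentage arguments are hypothetical names used only for illustration.

```python
TIB = 2 ** 40  # the default limit equals 50 * 2**40 bytes


def thresholds(limit_bytes: int, warning_pct: int = 80, critical_pct: int = 90) -> dict:
    """Return warning and critical thresholds in whole bytes.

    Integer arithmetic avoids floating-point rounding on large byte counts.
    """
    return {
        "warning": limit_bytes * warning_pct // 100,
        "critical": limit_bytes * critical_pct // 100,
    }


limit = 50 * TIB
print(limit)              # matches the ServiceLimitBytes default
print(thresholds(limit))  # exact 80% and 90% values for that limit
```

With the default limit, the exact 80% value matches `WarningThresholdBytes`. The exact 90% value is 49478023249920, slightly above the listed `CriticalThresholdBytes` default, so recompute both thresholds if you change `ServiceLimitBytes`.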
- EventBridge Rule: Filters AWS Backup RUNNING events
- Main Lambda Function: Processes events, gets DynamoDB table sizes, starts Step Functions
- Step Functions State Machine: Continuously monitors backup jobs and emits periodic metrics
- Metric Emitter Lambda: Emits CloudWatch metrics every 60 seconds during backup execution
- CloudWatch Metrics: `BackupJobSize` in `AWS/BackupMonitoring` namespace
- CloudWatch Alarms: Warning and critical thresholds with SNS notifications
- SNS Topic: Notification delivery
- IAM Roles: Scoped permissions for Lambda, Step Functions, DynamoDB and AWS Backup
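
To illustrate what the EventBridge rule filters on, here is a hedged sketch of an event pattern and a simplified matcher. The pattern is an assumption based on the general shape of AWS Backup "Backup Job State Change" events; the actual rule in `backup-monitoring.json` may differ, and `matches` is a hypothetical helper that covers only exact-value matching, not the full EventBridge pattern syntax.

```python
# Assumed pattern: field names follow the AWS Backup job state change event
# shape, not necessarily the rule defined in this template.
EVENT_PATTERN = {
    "source": ["aws.backup"],
    "detail-type": ["Backup Job State Change"],
    "detail": {"state": ["RUNNING"], "resourceType": ["DynamoDB"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge-style match.

    Every key in the pattern must exist in the event. Nested dicts are
    matched recursively; leaf values must be in the pattern's allowed list.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


running = {
    "source": "aws.backup",
    "detail-type": "Backup Job State Change",
    "detail": {"state": "RUNNING", "resourceType": "DynamoDB", "backupJobId": "example-id"},
}
completed = {**running, "detail": {**running["detail"], "state": "COMPLETED"}}

print(matches(EVENT_PATTERN, running))    # the RUNNING event passes the filter
print(matches(EVENT_PATTERN, completed))  # other states are ignored
```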
- Event Capture: EventBridge rule captures AWS Backup "RUNNING" events
- Size Lookup: Lambda function calls DynamoDB DescribeTable API to get actual table size
- Continuous Monitoring: Step Function execution started for each backup job
- Periodic Metrics: Step Function emits metrics every 60 seconds while backup is RUNNING
- Graceful Completion: Step Function exits when backup job state ≠ RUNNING
- Aggregation: CloudWatch SUM statistic calculates total concurrent backup size
- Alerting: Alarms trigger when concurrent size exceeds thresholds
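
The size lookup and the periodic emission loop described above can be sketched in plain Python. This is an illustration under stated assumptions, not the template's code: in the deployed solution the waiting is done by Step Functions rather than a Python loop, and `FakeDynamoDB`, `monitor_job`, and the injected callables are hypothetical stand-ins. The fake client mirrors the response shape of the DynamoDB `DescribeTable` API (`Table.TableSizeBytes`).

```python
import time
from typing import Callable


def table_name_from_arn(resource_arn: str) -> str:
    """Extract NAME from arn:aws:dynamodb:REGION:ACCOUNT:table/NAME."""
    return resource_arn.split(":table/", 1)[1].split("/", 1)[0]


def lookup_table_size(dynamodb_client, resource_arn: str) -> int:
    """Size Lookup step: read the current table size via DescribeTable."""
    response = dynamodb_client.describe_table(TableName=table_name_from_arn(resource_arn))
    return response["Table"]["TableSizeBytes"]


def monitor_job(
    get_state: Callable[[], str],
    emit_metric: Callable[[int], None],
    size_bytes: int,
    interval_seconds: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Emit the job size once per interval while the job is RUNNING.

    Returns the number of emissions; exits as soon as the state changes.
    """
    emitted = 0
    while get_state() == "RUNNING":
        emit_metric(size_bytes)
        emitted += 1
        sleep(interval_seconds)
    return emitted


class FakeDynamoDB:
    """Stand-in for a DynamoDB client, for local experimentation only."""

    def __init__(self, sizes: dict):
        self.sizes = sizes

    def describe_table(self, TableName: str) -> dict:
        return {"Table": {"TableName": TableName, "TableSizeBytes": self.sizes[TableName]}}


arn = "arn:aws:dynamodb:us-east-1:123456789012:table/orders"
size = lookup_table_size(FakeDynamoDB({"orders": 5_000_000}), arn)

states = iter(["RUNNING", "RUNNING", "COMPLETED"])  # simulated job state polls
emitted = []
count = monitor_job(lambda: next(states), emitted.append, size, sleep=lambda _: None)
print(size, count, emitted)
```

Injecting the client and the state and metric callables keeps the sketch runnable without AWS credentials; with real clients, the state would come from the AWS Backup `DescribeBackupJob` API.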
- Namespace: `AWS/BackupMonitoring`
- Metric: `BackupJobSize` (Bytes)
- Dimensions: Region, BackupType, VaultName
- Aggregation: SUM for total concurrent backup size
- Frequency: Every 60 seconds during backup execution
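
As a rough illustration of the metric shape and the SUM aggregation, the sketch below builds one datum per running job and totals them for a single 60-second period. The helper names (`build_metric_datum`, `concurrent_total`, `classify`) are hypothetical, and the strict "greater than" comparison is an assumption about how the alarms are configured. The datum layout follows the CloudWatch `PutMetricData` request format.

```python
TIB = 2 ** 40
NAMESPACE = "AWS/BackupMonitoring"  # namespace documented above


def build_metric_datum(size_bytes: int, region: str, backup_type: str, vault_name: str) -> dict:
    """Build one MetricData entry in the PutMetricData request shape."""
    return {
        "MetricName": "BackupJobSize",
        "Dimensions": [
            {"Name": "Region", "Value": region},
            {"Name": "BackupType", "Value": backup_type},
            {"Name": "VaultName", "Value": vault_name},
        ],
        "Value": float(size_bytes),
        "Unit": "Bytes",
    }


def concurrent_total(datapoints: list) -> float:
    """SUM over one period: each running job contributes one datapoint."""
    return sum(d["Value"] for d in datapoints)


def classify(total_bytes: float, warning_bytes: int, critical_bytes: int) -> str:
    """Assumed alarm logic: a threshold is breached when the total exceeds it."""
    if total_bytes > critical_bytes:
        return "CRITICAL"
    if total_bytes > warning_bytes:
        return "WARNING"
    return "OK"


# Two concurrent jobs in the same vault, emitted within the same minute.
period = [
    build_metric_datum(20 * TIB, "us-east-1", "DynamoDB", "Default"),
    build_metric_datum(22 * TIB, "us-east-1", "DynamoDB", "Default"),
]
total = concurrent_total(period)
print(total, classify(total, 43980465111040, 49477772799360))
```

With a real client, each datum would be sent with `put_metric_data(Namespace=NAMESPACE, MetricData=[datum])`. Because every job emits once per minute with the same dimensions, a SUM over a 60-second period approximates the total size of the jobs running at that moment.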
- `backup-monitoring.json`: CloudFormation template
Deploying this solution will incur AWS charges. Costs depend on the number and frequency of your backup operations. Primary cost drivers include:
- Lambda function executions
- Step Functions state transitions
- CloudWatch metrics and alarms
- SNS notifications
- Currently supports DynamoDB backup monitoring only
- Requires backup jobs to generate AWS Backup events
- Table size lookup uses current table size (not backup size)
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.