
CCM-14782: Lambda Alarms #205

Merged: jamesthompson26-nhs merged 4 commits into main from feature/CCM-14782_Lambda_Alarms on May 21, 2026

jamesthompson26-nhs commented May 21, 2026

Description

Implemented a fuller CloudWatch alarm baseline for the shared Lambda Terraform module, including the original error-rate alarm plus additional reliability alarms.

What changed

  • Added Lambda error-rate alarm:

    • infrastructure/terraform/modules/lambda/cloudwatch_metric_alarm_lambda_error_rate.tf
  • Added Lambda throttles alarm:

    • infrastructure/terraform/modules/lambda/cloudwatch_metric_alarm_lambda_throttles.tf
  • Added Lambda duration percentile alarm (p95 default):

    • infrastructure/terraform/modules/lambda/cloudwatch_metric_alarm_lambda_duration.tf
  • Added Lambda DLQ visible messages alarm:

    • infrastructure/terraform/modules/lambda/cloudwatch_metric_alarm_lambda_dlq_messages.tf
  • Extended module inputs in:

    • infrastructure/terraform/modules/lambda/variables.tf

    New flags/configs:

    • enable_error_rate_alarm
    • lambda_error_rate_alarm_config
    • enable_throttles_alarm
    • lambda_throttles_alarm_config
    • enable_duration_alarm
    • lambda_duration_alarm_config
    • enable_dlq_messages_alarm
    • lambda_dlq_messages_alarm_config
  • Extended module outputs in:

    • infrastructure/terraform/modules/lambda/outputs.tf

    New outputs:

    • lambda_error_rate_alarm_name
    • lambda_error_rate_alarm_arn
    • lambda_throttles_alarm_name
    • lambda_throttles_alarm_arn
    • lambda_duration_alarm_name
    • lambda_duration_alarm_arn
    • lambda_dlq_messages_alarm_name
    • lambda_dlq_messages_alarm_arn
  • Updated module documentation in:

    • infrastructure/terraform/modules/lambda/README.md
    • Regenerated TF docs to include all new resources/inputs/outputs
    • Included pass-through output example for consumers

Alarm behavior summary

  • Error-rate alarm:
    • metric math expression: IF(m2>0,(m1/m2)*100,0) where:
      • m1 = Errors
      • m2 = Invocations
    • defaults: threshold 1 (a percentage, as the expression scales by 100), period 300, evaluation periods 1
  • Throttles alarm:
    • metric: Throttles
    • defaults: threshold 0, statistic Sum, period 300, evaluation periods 1
  • Duration alarm:
    • metric: Duration
    • default percentile: p95
    • threshold default: computed when unset as timeout * 800 ms (80% of the function timeout, with timeout in seconds)
  • DLQ messages alarm:
    • metric: ApproximateNumberOfMessagesVisible
    • defaults: threshold 0, statistic Sum
    • created only when both enable_dlq_and_notifications and enable_dlq_messages_alarm are true
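
The arithmetic behind these defaults can be sketched as follows. This is an illustration only: the module itself is Terraform, the function names below are hypothetical, and the sketch assumes the Lambda timeout is expressed in seconds while the Duration metric is in milliseconds. It does not model the comparison operator each alarm uses, since that is not stated above.

```python
from typing import Optional


def error_rate_percent(errors: float, invocations: float) -> float:
    """Mirror of the metric math expression IF(m2>0,(m1/m2)*100,0)."""
    # m1 = Errors, m2 = Invocations. A period with no invocations
    # evaluates to 0 instead of dividing by zero.
    if invocations > 0:
        return (errors / invocations) * 100
    return 0.0


def duration_threshold_ms(timeout_seconds: int,
                          override_ms: Optional[float] = None) -> float:
    """Duration alarm threshold in milliseconds.

    When no threshold is supplied, fall back to timeout * 800 ms,
    which is 80% of the timeout (assumed to be in seconds).
    """
    if override_ms is not None:
        return override_ms
    return timeout_seconds * 800


def dlq_alarm_created(enable_dlq_and_notifications: bool,
                      enable_dlq_messages_alarm: bool) -> bool:
    """The DLQ alarm exists only when both flags are true."""
    return enable_dlq_and_notifications and enable_dlq_messages_alarm


if __name__ == "__main__":
    print(error_rate_percent(1, 4))        # 25.0
    print(error_rate_percent(0, 0))        # 0.0 (no invocations)
    print(duration_threshold_ms(30))       # 24000
    print(dlq_alarm_created(True, False))  # False
```

For example, a function with a 30-second timeout and no explicit threshold would alarm on a p95 duration threshold of 24000 ms under these assumptions.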

Context

We recently added SQS alarms and needed equivalent, reusable monitoring in the shared Lambda module.
This PR introduces a consistent alarm baseline for Lambda reliability across consumers:

  • error-rate monitoring for invocation quality
  • throttling detection for concurrency pressure
  • duration percentile monitoring for latency regression
  • DLQ backlog detection for async failure handling

This reduces duplicated alarm logic in downstream repositories and helps enforce standard observability patterns.

Type of changes

  • Refactoring (non-breaking change)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would change existing functionality)
  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • I am familiar with the contributing guidelines
  • I have followed the code style of the project
  • I have added tests to cover my changes
  • I have updated the documentation accordingly
  • This PR is a result of pair or mob programming

Sensitive Information Declaration

To ensure the utmost confidentiality and protect your and others' privacy, we kindly ask you NOT to include PII (Personal Identifiable Information) / PID (Personal Identifiable Data) or any other sensitive data in this PR (Pull Request) and the codebase changes. We will remove any PR that contains sensitive information. We really appreciate your cooperation in this matter.

  • I confirm that neither PII/PID nor sensitive data are included in this PR and the codebase changes.

jamesthompson26-nhs requested a review from a team as a code owner May 21, 2026 11:33
jamesthompson26-nhs changed the title from "Feature/ccm 14782 lambda alarms" to "CCM-14782: Lambda Alarms" May 21, 2026
jamesthompson26-nhs merged commit 100d865 into main May 21, 2026
30 checks passed
jamesthompson26-nhs deleted the feature/CCM-14782_Lambda_Alarms branch May 21, 2026 14:34