1 change: 1 addition & 0 deletions .gitleaksignore
Review comment (Contributor):
Not related to the phase 2 work, but I think we can delete lines 3-6: those secrets have never been in our repo at any point.

@@ -4,3 +4,4 @@ cd9c0efec38c5d63053dd865e5d4e207c0760d91:docs/guides/Perform_static_analysis.md:
cd9c0efec38c5d63053dd865e5d4e207c0760d91:docs/guides/Perform_static_analysis.md:sonar-api-token:37
96096685ab3d6876671e2bc9a6ff4d48fc56e521:src/helloworld/helloworld.sln:ipv4:4
4f4e8c15629b2cb09356a7fed4d72953590227ce:docs/Gemfile.lock:ipv4:4
231b9cb259d92c3defc27de00a4196682d11c231:lambdas/https-client-lambda/src/__tests__/tls-agent-factory.test.ts:private-key:49
3 changes: 2 additions & 1 deletion eslint.config.mjs
@@ -28,6 +28,7 @@ export default defineConfig([
"**/test-results",
"**/playwright-report*",
"eslint.config.mjs",
"**/lua-transform.js",
]),

//imports
@@ -200,7 +201,7 @@ export default defineConfig([
},
},
{
-files: ["**/utils/**", "tests/test-team/**", "tests/performance/helpers/**", "lambdas/**/src/**"],
+files: ["**/utils/**", "tests/test-team/**", "tests/performance/helpers/**", "lambdas/**/src/**", "src/**/src/**"],
rules: {
"import-x/prefer-default-export": 0,
},
10 changes: 8 additions & 2 deletions infrastructure/terraform/components/callbacks/README.md
@@ -19,7 +19,7 @@
| <a name="input_default_tags"></a> [default\_tags](#input\_default\_tags) | A map of default tags to apply to all taggable resources within the component | `map(string)` | `{}` | no |
| <a name="input_deploy_mock_clients"></a> [deploy\_mock\_clients](#input\_deploy\_mock\_clients) | Flag to deploy mock webhook lambda for integration testing (test/dev environments only) | `bool` | `false` | no |
| <a name="input_enable_event_anomaly_detection"></a> [enable\_event\_anomaly\_detection](#input\_enable\_event\_anomaly\_detection) | Enable CloudWatch anomaly detection alarm for inbound event queue message reception | `bool` | `true` | no |
-| <a name="input_enable_xray_tracing"></a> [enable\_xray\_tracing](#input\_enable\_xray\_tracing) | Enable AWS X-Ray active tracing for Lambda functions | `bool` | `false` | no |
+| <a name="input_enable_xray_tracing"></a> [enable\_xray\_tracing](#input\_enable\_xray\_tracing) | Enable AWS X-Ray active tracing for Lambda functions | `bool` | `true` | no |
| <a name="input_environment"></a> [environment](#input\_environment) | The name of the tfscaffold environment | `string` | n/a | yes |
| <a name="input_event_anomaly_band_width"></a> [event\_anomaly\_band\_width](#input\_event\_anomaly\_band\_width) | The width of the anomaly detection band. Higher values (e.g. 4-6) reduce sensitivity and noise, lower values (e.g. 2-3) increase sensitivity. Recommended: 2-4. | `number` | `3` | no |
| <a name="input_event_anomaly_evaluation_periods"></a> [event\_anomaly\_evaluation\_periods](#input\_event\_anomaly\_evaluation\_periods) | Number of evaluation periods for the anomaly alarm. Each period is defined by event\_anomaly\_period. | `number` | `2` | no |
@@ -30,6 +30,12 @@
| <a name="input_log_level"></a> [log\_level](#input\_log\_level) | The log level to be used in lambda functions within the component. Any log with a lower severity than the configured value will not be logged: https://docs.python.org/3/library/logging.html#levels | `string` | `"INFO"` | no |
| <a name="input_log_retention_in_days"></a> [log\_retention\_in\_days](#input\_log\_retention\_in\_days) | The retention period in days for the Cloudwatch Logs events to be retained, default of 0 is indefinite | `number` | `0` | no |
| <a name="input_message_root_uri"></a> [message\_root\_uri](#input\_message\_root\_uri) | The root URI used for constructing message links in callback payloads | `string` | n/a | yes |
| <a name="input_mtls_cert_secret_arn"></a> [mtls\_cert\_secret\_arn](#input\_mtls\_cert\_secret\_arn) | Secrets Manager ARN for the shared mTLS client certificate (production) | `string` | `""` | no |
| <a name="input_mtls_mock_server_cert_s3_key"></a> [mtls\_mock\_server\_cert\_s3\_key](#input\_mtls\_mock\_server\_cert\_s3\_key) | S3 key for the mock webhook server certificate PEM (signed by the test CA) | `string` | `""` | no |
| <a name="input_mtls_mock_server_key_s3_key"></a> [mtls\_mock\_server\_key\_s3\_key](#input\_mtls\_mock\_server\_key\_s3\_key) | S3 key for the mock webhook server private key PEM | `string` | `""` | no |
| <a name="input_mtls_test_ca_s3_key"></a> [mtls\_test\_ca\_s3\_key](#input\_mtls\_test\_ca\_s3\_key) | S3 key for the test CA certificate PEM bundle used for server verification and the mock webhook server cert chain | `string` | `""` | no |
| <a name="input_mtls_test_cert_s3_key"></a> [mtls\_test\_cert\_s3\_key](#input\_mtls\_test\_cert\_s3\_key) | S3 key for the test mTLS client certificate bundle (non-production) | `string` | `""` | no |
| <a name="input_mtls_test_certs_s3_bucket"></a> [mtls\_test\_certs\_s3\_bucket](#input\_mtls\_test\_certs\_s3\_bucket) | S3 bucket containing test mTLS certificate material (non-production) | `string` | `""` | no |
| <a name="input_parent_acct_environment"></a> [parent\_acct\_environment](#input\_parent\_acct\_environment) | Name of the environment responsible for the acct resources used, affects things like DNS zone. Useful for named dev environments | `string` | `"main"` | no |
| <a name="input_pipe_event_patterns"></a> [pipe\_event\_patterns](#input\_pipe\_event\_patterns) | value | `list(string)` | `[]` | no |
| <a name="input_pipe_log_level"></a> [pipe\_log\_level](#input\_pipe\_log\_level) | Log level for the EventBridge Pipe. | `string` | `"ERROR"` | no |
@@ -45,7 +51,7 @@
| Name | Source | Version |
|------|--------|---------|
| <a name="module_client_config_bucket"></a> [client\_config\_bucket](#module\_client\_config\_bucket) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/3.0.7/terraform-s3bucket.zip | n/a |
-| <a name="module_client_destination"></a> [client\_destination](#module\_client\_destination) | ../../modules/client-destination | n/a |
+| <a name="module_client_delivery"></a> [client\_delivery](#module\_client\_delivery) | ../../modules/client-delivery | n/a |
| <a name="module_client_transform_filter_lambda"></a> [client\_transform\_filter\_lambda](#module\_client\_transform\_filter\_lambda) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/3.0.7/terraform-lambda.zip | n/a |
| <a name="module_kms"></a> [kms](#module\_kms) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/3.0.7/terraform-kms.zip | n/a |
| <a name="module_mock_webhook_lambda"></a> [mock\_webhook\_lambda](#module\_mock\_webhook\_lambda) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/3.0.7/terraform-lambda.zip | n/a |
@@ -2,3 +2,9 @@ resource "aws_cloudwatch_event_bus" "main" {
name = local.csi
kms_key_identifier = module.kms.key_arn
}

resource "aws_cloudwatch_event_archive" "main" {
name = "${local.csi}-archive"
event_source_arn = aws_cloudwatch_event_bus.main.arn
retention_days = 7
}

This file was deleted.

Review comment (Contributor):
Do we want an alarm on storage, e.g. at 80% used?
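
One possible shape for the storage alarm raised above, as a rough sketch rather than part of this PR. It reuses `local.csi`, `local.default_tags`, and the `delivery_state` cache from the diff; the resource name `elasticache_storage_used`, the alarm wording, and the threshold are assumptions. `BytesUsedForCache` is the CloudWatch storage metric for ElastiCache, and the threshold assumes the 1 GB `data_storage` maximum set in this file.

```hcl
# Hypothetical alarm, not in the PR: fires when cached data exceeds roughly
# 80% of the configured data_storage maximum.
resource "aws_cloudwatch_metric_alarm" "elasticache_storage_used" {
  alarm_name = "${local.csi}-elasticache-storage-used"
  alarm_description = join(" ", [
    "CAPACITY: ElastiCache data storage is above 80% of the configured maximum.",
    "Review key TTLs or raise the data_storage limit.",
  ])

  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "BytesUsedForCache"
  namespace           = "AWS/ElastiCache"
  period              = 300
  statistic           = "Maximum"

  # Assumption: 80% of a 1 GB limit, taking 1 GB as 1,000,000,000 bytes.
  # Recalculate if the data_storage maximum changes.
  threshold          = 800000000
  actions_enabled    = true
  treat_missing_data = "notBreaching"

  dimensions = {
    # Assumption: mirrors the dimension used by the other alarms in this file.
    # Check the dimension names CloudWatch actually publishes for serverless
    # caches before relying on it.
    CacheClusterId = aws_elasticache_serverless_cache.delivery_state.name
  }

  tags = merge(
    local.default_tags,
    {
      Name = "${local.csi}-elasticache-storage-used"
    },
  )
}
```

Deriving the threshold from a shared local instead of a literal would keep the alarm in step with the `cache_usage_limits` block if the limit is later lowered for dev/test environments.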

@@ -0,0 +1,178 @@
resource "aws_elasticache_serverless_cache" "delivery_state" {
name = "${local.csi}-delivery-state"
engine = "redis"
Review comment (Contributor):
For cost saving I think we should switch to Valkey.
Serverless caches are billed in gigabyte-hours (GB-hrs), and the minimum for Redis is 1 GB versus 100 MB for Valkey.
I'm not sure we'll go above 100 MB even in prod.
https://aws.amazon.com/elasticache/pricing/

Review comment (Contributor):
Just waiting on agreement.

Review comment (Contributor):
Valkey is Redis-compatible, half the cost, and faster. Can we use that?

Review comment (Contributor):
Just waiting on agreement.
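
If the Valkey switch is agreed, the change to this resource could be as small as the sketch below. This is an illustration only: it assumes the AWS provider version in use accepts `"valkey"` for `engine` on `aws_elasticache_serverless_cache`, and that major version `"7"` is the intended Valkey line. The remaining arguments are copied from the diff, with `tags` omitted for brevity.

```hcl
resource "aws_elasticache_serverless_cache" "delivery_state" {
  name   = "${local.csi}-delivery-state"
  engine = "valkey" # was "redis"; assumes provider support for Valkey

  # Assumption: keep the same major version; confirm which Valkey versions
  # are offered for serverless caches before applying.
  major_engine_version = "7"
  description          = "Per-target rate limiting and circuit breaker state for callback delivery"

  snapshot_retention_limit = 0

  security_group_ids = [aws_security_group.elasticache_delivery_state.id]
  subnet_ids         = local.acct.private_subnet_ids

  kms_key_id = module.kms.key_arn

  cache_usage_limits {
    data_storage {
      # The usage limit is still expressed in GB here; the 100 MB figure in
      # the review thread refers to the billing minimum, not this ceiling.
      maximum = 1
      unit    = "GB"
    }

    ecpu_per_second {
      maximum = 1000
    }
  }
}
```

Changing `engine` on an existing cache may be applied in place or may force a replacement depending on the provider version, so reviewing the plan output in a dev environment first would be prudent.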

major_engine_version = "7"
description = "Per-target rate limiting and circuit breaker state for callback delivery"

snapshot_retention_limit = 0

security_group_ids = [aws_security_group.elasticache_delivery_state.id]
subnet_ids = local.acct.private_subnet_ids

kms_key_id = module.kms.key_arn

cache_usage_limits {
data_storage {
maximum = 1
Review comment (Contributor):
1 GB is the minimum with Redis, but this can go down to 100 MB if we make the Valkey switch.
Keeping this low for dev/test environments is good for cost saving.
We should measure how much storage each client will take.

unit = "GB"
}

ecpu_per_second {
maximum = 1000
}
}

tags = merge(
local.default_tags,
{
Name = "${local.csi}-delivery-state"
Description = "Callback delivery rate limiter and circuit breaker state"
},
)
}

resource "aws_security_group" "elasticache_delivery_state" {
name = "${local.csi}-elasticache-delivery-state"
description = "Security group for ElastiCache delivery state cluster"
vpc_id = local.acct.vpc_id

tags = merge(
local.default_tags,
{
Name = "${local.csi}-elasticache-delivery-state"
},
)
}

resource "aws_vpc_security_group_ingress_rule" "elasticache_from_lambda" {
security_group_id = aws_security_group.elasticache_delivery_state.id
referenced_security_group_id = aws_security_group.https_client_lambda.id
from_port = 6379
to_port = 6379
ip_protocol = "tcp"
description = "Allow HTTPS Client Lambda to connect to ElastiCache"

tags = local.default_tags
}

resource "aws_security_group" "https_client_lambda" {
name = "${local.csi}-https-client-lambda"
description = "Security group for per-client HTTPS Client Lambda functions"
vpc_id = local.acct.vpc_id

tags = merge(
local.default_tags,
{
Name = "${local.csi}-https-client-lambda"
},
)
}

resource "aws_vpc_security_group_egress_rule" "lambda_to_elasticache" {
security_group_id = aws_security_group.https_client_lambda.id
referenced_security_group_id = aws_security_group.elasticache_delivery_state.id
from_port = 6379
to_port = 6379
ip_protocol = "tcp"
description = "Allow Lambda to connect to ElastiCache"

tags = local.default_tags
}

resource "aws_vpc_security_group_egress_rule" "lambda_to_https" {
security_group_id = aws_security_group.https_client_lambda.id
cidr_ipv4 = "0.0.0.0/0"
from_port = 443
to_port = 443
ip_protocol = "tcp"
description = "Allow Lambda outbound HTTPS for webhook delivery"

tags = local.default_tags
}

resource "aws_cloudwatch_metric_alarm" "elasticache_ecpu_utilisation" {
alarm_name = "${local.csi}-elasticache-ecpu-utilisation"
alarm_description = join(" ", [
"PERFORMANCE: ElastiCache processing units utilisation is high.",
"Consider scaling up or optimising Redis commands.",
])

comparison_operator = "GreaterThanThreshold"
evaluation_periods = 3
metric_name = "ElastiCacheProcessingUnits"
namespace = "AWS/ElastiCache"
period = 300
statistic = "Average"
threshold = 80
actions_enabled = true
treat_missing_data = "notBreaching"

dimensions = {
CacheClusterId = aws_elasticache_serverless_cache.delivery_state.name
}

tags = merge(
local.default_tags,
{
Name = "${local.csi}-elasticache-ecpu-utilisation"
},
)
}

resource "aws_cloudwatch_metric_alarm" "elasticache_connections" {
alarm_name = "${local.csi}-elasticache-connections"
alarm_description = join(" ", [
"RELIABILITY: ElastiCache connection count is high.",
"Review per-client Lambda connection pool sizing.",
])

comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "CurrConnections"
namespace = "AWS/ElastiCache"
period = 300
statistic = "Maximum"
threshold = 500
actions_enabled = true
treat_missing_data = "notBreaching"

dimensions = {
CacheClusterId = aws_elasticache_serverless_cache.delivery_state.name
}

tags = merge(
local.default_tags,
{
Name = "${local.csi}-elasticache-connections"
},
)
}

resource "aws_cloudwatch_metric_alarm" "elasticache_throttled_ops" {
alarm_name = "${local.csi}-elasticache-throttled-ops"
alarm_description = join(" ", [
"PERFORMANCE: ElastiCache throttled operations detected.",
"Increase ECPU limit or reduce request rate.",
])

comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "ThrottledCmds"
namespace = "AWS/ElastiCache"
period = 300
statistic = "Sum"
threshold = 0
actions_enabled = true
treat_missing_data = "notBreaching"

dimensions = {
CacheClusterId = aws_elasticache_serverless_cache.delivery_state.name
}

tags = merge(
local.default_tags,
{
Name = "${local.csi}-elasticache-throttled-ops"
},
)
}
48 changes: 19 additions & 29 deletions infrastructure/terraform/components/callbacks/locals.tf
@@ -20,47 +20,37 @@ locals {
targets = [
for target in try(client.targets, []) :
merge(target, {
-invocationEndpoint = "${aws_lambda_function_url.mock_webhook[0].function_url}${target.targetId}"
+invocationEndpoint = try(target.mtls.enabled, false) ? "https://${aws_lb.mock_webhook_mtls[0].dns_name}/${target.targetId}" : "${aws_lambda_function_url.mock_webhook[0].function_url}${target.targetId}"
apiKey = merge(target.apiKey, { headerValue = random_password.mock_webhook_api_key[0].result })
})
]
})
} : local.config_clients


config_targets = merge([
for client_id, data in local.config_clients : {
for target in try(data.targets, []) : target.targetId => {
client_id = client_id
target_id = target.targetId
invocation_endpoint = var.deploy_mock_clients ? "${aws_lambda_function_url.mock_webhook[0].function_url}${target.targetId}" : target.invocationEndpoint
invocation_rate_limit_per_second = target.invocationRateLimit
http_method = target.invocationMethod
header_name = target.apiKey.headerName
header_value = var.deploy_mock_clients ? random_password.mock_webhook_api_key[0].result : target.apiKey.headerValue
}
}
]...)

config_subscriptions = merge([
for client_id, data in local.config_clients : {
for subscription in try(data.subscriptions, []) : subscription.subscriptionId => {
client_id = client_id
client_subscriptions = {
for client_id, data in local.config_clients :
client_id => {
for subscription in try(data.subscriptions, []) :
subscription.subscriptionId => {
subscription_id = subscription.subscriptionId
target_ids = try(subscription.targetIds, [])
}
}
]...)

subscription_targets = merge([
for subscription_id, subscription in local.config_subscriptions : {
for target_id in subscription.target_ids :
"${subscription_id}-${target_id}" => {
subscription_id = subscription_id
target_id = target_id
}

client_subscription_targets = {
for client_id, data in local.config_clients :
client_id => merge([
for subscription in try(data.subscriptions, []) : {
for target_id in try(subscription.targetIds, []) :
"${subscription.subscriptionId}-${target_id}" => {
subscription_id = subscription.subscriptionId
target_id = target_id
}
}
}
]...)
]...)
}

applications_map_parameter_name = coalesce(var.applications_map_parameter_name, "/${var.project}/${var.environment}/${var.component}/applications-map")
}
@@ -0,0 +1,47 @@
module "client_delivery" {
source = "../../modules/client-delivery"
for_each = local.config_clients

project = var.project
aws_account_id = var.aws_account_id
region = var.region
component = var.component
environment = var.environment
group = var.group

client_id = each.key
client_bus_name = aws_cloudwatch_event_bus.main.name
kms_key_arn = module.kms.key_arn

subscriptions = local.client_subscriptions[each.key]
subscription_targets = local.client_subscription_targets[each.key]

client_config_bucket = module.client_config_bucket.bucket
client_config_bucket_arn = module.client_config_bucket.arn

applications_map_parameter_name = local.applications_map_parameter_name

lambda_s3_bucket = local.acct.s3_buckets["lambda_function_artefacts"]["id"]
lambda_code_base_path = local.aws_lambda_functions_dir_path

force_lambda_code_deploy = var.force_lambda_code_deploy
log_level = var.log_level
log_retention_in_days = var.log_retention_in_days
enable_xray_tracing = var.enable_xray_tracing

log_destination_arn = local.log_destination_arn
log_subscription_role_arn = local.acct.log_subscription_role_arn

elasticache_endpoint = aws_elasticache_serverless_cache.delivery_state.endpoint[0].address
elasticache_cache_name = aws_elasticache_serverless_cache.delivery_state.name
elasticache_iam_username = "${var.project}-${var.environment}-${var.component}-elasticache-user"

mtls_cert_secret_arn = var.mtls_cert_secret_arn
mtls_test_cert_s3_bucket = var.mtls_test_certs_s3_bucket
mtls_test_cert_s3_key = var.mtls_test_cert_s3_key

vpc_subnet_ids = local.acct.private_subnet_ids
lambda_security_group_id = aws_security_group.https_client_lambda.id

deploy_mock_clients = var.deploy_mock_clients
}