Skip to content

Add Cloudwatch monitoring#214

Merged
liam-lloyd merged 3 commits intomainfrom
add_monitoring
Dec 17, 2025
Merged

Add Cloudwatch monitoring#214
liam-lloyd merged 3 commits intomainfrom
add_monitoring

Conversation

@liam-lloyd
Copy link
Member

We need to know when our Archivematica instances go down, so this commit adds Cloudwatch alarms in case they drop below the expected number of running pods.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds CloudWatch monitoring for Archivematica instances across dev, staging, and production environments by installing the CloudWatch Observability add-on and creating alarms to detect when pod counts drop below expected thresholds. Additionally, it adds explicit namespace declarations to all Kubernetes resources and fixes a naming bug in the production ingress configuration.

  • Installs amazon-cloudwatch-observability EKS addon with proper IAM roles for Container Insights metrics
  • Creates CloudWatch alarms with SNS email notifications to alert when running pods fall below threshold of 7
  • Adds explicit kubernetes_namespace resources and references them throughout all deployments, services, and volume claims

Reviewed changes

Copilot reviewed 21 out of 23 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
archivematica/test_cluster/variables.tf Adds alert_email variable with default email address; applies formatting alignment
archivematica/test_cluster/staging_redis_deployment.tf Adds namespace references to deployment, service, and PVC resources
archivematica/test_cluster/staging_ingress.tf Adds namespace references to all ingress resources
archivematica/test_cluster/staging_gearman_deployment.tf Adds namespace references to deployment and service resources
archivematica/test_cluster/staging_archivematica_deployment.tf Creates namespace resource, adds namespace references throughout, applies formatting alignment
archivematica/test_cluster/staging_alerts.tf Creates SNS topic, subscription, and CloudWatch alarm for staging environment
archivematica/test_cluster/secrets.tf Adds namespace references to secret resources
archivematica/test_cluster/load_balancer.tf Applies formatting alignment
archivematica/test_cluster/eks-cluster.tf Installs amazon-cloudwatch-observability addon, creates IAM role for CloudWatch access, applies formatting alignment
archivematica/test_cluster/dev_redis_deployment.tf Adds namespace references to deployment, service, and PVC resources
archivematica/test_cluster/dev_ingress.tf Adds namespace references to all ingress resources
archivematica/test_cluster/dev_gearman_deployment.tf Adds namespace references to deployment and service resources
archivematica/test_cluster/dev_archivematica_deployment.tf Creates namespace resource, adds namespace references throughout, applies formatting alignment
archivematica/test_cluster/dev_alerts.tf Creates SNS topic, subscription, and CloudWatch alarm for dev environment
archivematica/prod_cluster/variables.tf Adds alert_email variable with default email address
archivematica/prod_cluster/secrets.tf Adds namespace reference to secret resource; applies formatting alignment
archivematica/prod_cluster/redis_deployment.tf Adds namespace references to deployment, service, and PVC resources
archivematica/prod_cluster/ingress.tf Fixes naming bug (dev→prod), adds namespace references to all ingress resources
archivematica/prod_cluster/gearman_deployment.tf Adds namespace references to deployment and service resources
archivematica/prod_cluster/eks-cluster.tf Installs amazon-cloudwatch-observability addon, creates IAM role for CloudWatch access, applies formatting alignment
archivematica/prod_cluster/archivematica_deployment.tf Creates namespace resource (with ordering issue), adds namespace references throughout
archivematica/prod_cluster/alerting.tf Creates SNS topic, subscription, and CloudWatch alarm for prod environment
archivematica/prod_cluster/--addon-name Appears to be an accidentally committed empty file that should be removed

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 1 to 14
data "kubernetes_resource" "archivematica_prod" {
kind = "Deployment"
api_version = "apps/v1"
metadata { name = "archivematica-prod" }
metadata {
name = "archivematica-prod"
namespace = kubernetes_namespace.archivematica_prod.metadata[0].name
}
}

resource "kubernetes_namespace" "archivematica_prod" {
metadata {
name = "archivematica-prod"
}
}
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data.kubernetes_resource.archivematica_prod data source is defined before the kubernetes_namespace.archivematica_prod resource it depends on. This creates incorrect resource ordering in Terraform. The namespace resource should be defined first (lines 10-14), followed by the data source (lines 1-8). This matches the correct pattern used in dev_archivematica_deployment.tf and staging_archivematica_deployment.tf where the namespace is created before being referenced.

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a requirement functionally; the terraform runs fine as is, but I'll make the change for consistency across the environments

Copy link
Member

@cecilia-donnelly cecilia-donnelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me.

metadata {
name = "prod-storage-share"
name = "prod-storage-share"
namespace = kubernetes_namespace.archivematica_prod.metadata[0].name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does namespace give us, out of curiosity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The monitor needs a namespace to exist so it can count the pods therein. Technically we could have told the prod monitor to look at the default namespace, but we need explicit ones in the test envs so I went with that in prod too.

Several terraform files aren't formatted correctly; this commit fixes
that.
Currently, deploys will fail if existing copies of our deployments can't
be found, even if values for all the images have been specified in
image_overrides. This commit updates the config to only require existing
image values when they're actually needed.
We need to know when our Archivematica instances go down, so this commit
adds Cloudwatch alarms in case they drop below the expected number of
running pods.
@liam-lloyd
Copy link
Member Author

Waiting to deploy this until #216 is merged, because I rebased this atop that to make troubleshooting the monkeypatching issues easier.

@liam-lloyd liam-lloyd merged commit ee27638 into main Dec 17, 2025
3 checks passed
@liam-lloyd liam-lloyd deleted the add_monitoring branch December 17, 2025 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants