
feat: Smooth cross-track upgrades #305

Draft
MichaelThamm wants to merge 10 commits into main from test/lifecycle-input

MichaelThamm commented Apr 24, 2026

Blocked by:

Workaround for:

Blocks:

Relates to:

Issue

We want a smooth UX for users upgrading across tracks for our product modules: COS and COS Lite. Ideally, a user should be able to take their track/2 state and plan an upgrade to track/3.0 without any manual intervention.

Solution

  1. Use the juju_charm data source to look up the latest revision of every charm in its respective track.
  2. Use lifecycle { replace_triggered_by = [...] } to replace the Grafana ingress juju_integration when the interface changes.
  3. Use the Grafana module's replace_triggers input (replace_triggers = [terraform_data.grafana_litestream_resource.id]) to replace the Grafana juju_application when the litestream-image resource has been removed.
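A minimal sketch of the lifecycle pattern from item 2, assuming juju provider ~> 1.0 (where resources take model_uuid). The resource names mirror the plan output in the testing comment; `local.traefik_ingress_endpoint` and `var.model_uuid` are hypothetical names, and the PR's actual trigger value may differ from the endpoint name.

```hcl
locals {
  # Hypothetical: the Traefik endpoint Grafana's ingress relates to in this track
  # ("traefik-route" on track/2, "ingress" on the newer track).
  traefik_ingress_endpoint = "ingress"
}

# Changing triggers_replace replaces this terraform_data resource ...
resource "terraform_data" "grafana_ingress_interface" {
  triggers_replace = local.traefik_ingress_endpoint
}

resource "juju_integration" "grafana_ingress" {
  count      = var.ingress.grafana ? 1 : 0
  model_uuid = var.model_uuid # hypothetical variable name

  application {
    name     = "traefik"
    endpoint = local.traefik_ingress_endpoint
  }

  application {
    name     = "grafana"
    endpoint = "ingress"
  }

  lifecycle {
    # ... and replacing it forces this integration to be destroyed and re-created.
    # Other resources that must be re-created with it (e.g. an offer) can list
    # the same trigger in their own replace_triggered_by.
    replace_triggered_by = [terraform_data.grafana_ingress_interface]
  }
}
```

Routing the change through a terraform_data resource gives several resources a single, named trigger to react to, instead of each one re-deriving the interface change.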

Important

I made a design decision to centralize all the upgrade logic for the product modules in an upgrades.tf file. This works by adding a replace_triggers input to the Grafana TF module. In the future, other charm modules that require juju_application replacements will need this input as well. This design gives both the charm and product modules control over when the application gets replaced, rather than leaving that control to the charm module alone.
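A minimal sketch of how the two halves could fit together, assuming juju provider ~> 1.0. The names `replace_triggers`, `app_replace_trigger` and `grafana_litestream_resource` come from the plan output in the testing comment; `var.app_name`, `var.channel`, `var.revision`, `var.model_uuid` and `local.grafana_has_litestream_resource` are hypothetical. The input is needed because Terraform does not accept a lifecycle block on a module call, and replace_triggered_by can only reference managed resources visible in the same module, so the product module cannot target module.grafana's application directly.

```hcl
# --- Charm module (grafana-k8s-operator/terraform) ---

variable "replace_triggers" {
  description = "Values that force the Grafana juju_application to be replaced when they change."
  type        = list(string)
  default     = []
}

# Replaced whenever the list of trigger values changes.
resource "terraform_data" "app_replace_trigger" {
  triggers_replace = var.replace_triggers
}

resource "juju_application" "grafana" {
  name       = var.app_name
  model_uuid = var.model_uuid

  charm {
    name     = "grafana-k8s"
    channel  = var.channel
    revision = var.revision
  }

  lifecycle {
    replace_triggered_by = [terraform_data.app_replace_trigger]
  }
}

# --- Product module (e.g. terraform/cos-lite/upgrades.tf) ---

# Hypothetical flag: true while the deployed Grafana charm still ships the
# litestream-image resource, false once it has been removed.
resource "terraform_data" "grafana_litestream_resource" {
  triggers_replace = local.grafana_has_litestream_resource
}

module "grafana" {
  source = "git::https://github.com/canonical/grafana-k8s-operator//terraform"
  # ... other inputs unchanged ...
  replace_triggers = [terraform_data.grafana_litestream_resource.id]
}
```

When the flag flips, grafana_litestream_resource gets a new id, the trigger list passed to the charm module changes, app_replace_trigger is replaced, and that in turn replaces juju_application.grafana.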

To summarize, we will have this design:
[design diagram]

instead of this one:
[design diagram]

Warning

A cross-track upgrade that works today is not guaranteed to work tomorrow, because revisions are now refreshed within each track. Also, our users may be on any combination of revisions before they try to upgrade to track/3.0. This may be a good reason to have everyone run terraform apply with a tested set of pinned revisions before upgrading tracks.

Checklist

  • PR title makes an appropriate release note and follows conventional commits syntax.
  • Merge target is the correct branch, and relevant tandem backport PRs opened.
  • Create a juju-doctor probe with a RuleSet that defines all the expected integrations and apps for COS Lite and COS.
  • Run terraform apply (tfa) twice in the itest to ensure that everything is applied before the juju-doctor probe runs. Otherwise, we need to add lifecycle blocks to all resources.
  • Update the README
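If the itest were driven by Terraform's native test framework (`terraform test`, Terraform >= 1.6), which may not be what this repo uses, applying twice could look like the hypothetical sketch below. The file name and run-block names are assumptions, and the juju-doctor probe would still run as a separate step.

```hcl
# Hypothetical tests/upgrade.tftest.hcl. Run blocks execute in order and share
# the same state, so the second apply creates anything the first one left out.
# Variables required by the module under test are omitted here.
run "first_apply" {
  command = apply
}

run "second_apply" {
  command = apply
}
```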

Context

In the future, charms will have unique tracks that the products need to map to:

Testing Instructions

See this comment for details:

The general idea is to:

  1. Deploy COS 2/stable
  2. Update TF module source to dev/edge
  3. terraform init -upgrade; terraform apply

Documentation

Documentation will be addressed in:

Upgrade Notes

Since this re-creates the Grafana application, users will lose any custom configuration they have added to Grafana, e.g., plugins. See the release notes for upgrading from track/2 to track/3.0.

MichaelThamm changed the title from "feat: Charmhub module for upgrades" to "feat: Cross-track upgrades" on Apr 24, 2026
MichaelThamm changed the title from "feat: Cross-track upgrades" to "feat: Smooth cross-track upgrades" on Apr 24, 2026

MichaelThamm commented Apr 24, 2026

Testing COS Lite from track/2 -> dev/edge

terraform {
  required_version = ">= 1.5"
  required_providers {
    juju = {
      source  = "juju/juju"
      version = "~> 1.0"
    }
  }
}

resource "juju_model" "cos-lite" {
  name = "cos-lite"
}

module "cos-lite" {
  source       = "git::https://github.com/canonical/observability-stack//terraform/cos-lite?ref=track/2"
  channel      = "2/stable"
  model_uuid   = juju_model.cos-lite.uuid
  internal_tls = false
}
❯ tf init
❯ tf apply
Apply complete! Resources: 32 added, 0 changed, 0 destroyed.
[screenshot]

Then update the module source and apply:

module "cos-lite" {
  source       = "git::https://github.com/canonical/observability-stack//terraform/cos-lite?ref=test/lifecycle-input-track-2"
  channel      = "2/stable"
  model_uuid   = juju_model.cos-lite.uuid
  internal_tls = false
}
❯ tf init -upgrade
❯ tf apply
Terraform will perform the following actions:

  # module.cos-lite.terraform_data.grafana_ingress_interface will be created
  + resource "terraform_data" "grafana_ingress_interface" {
      + id               = (known after apply)
      + triggers_replace = "traefik_route"
    }

  # module.cos-lite.terraform_data.grafana_litestream_resource will be created
  + resource "terraform_data" "grafana_litestream_resource" {
      + id               = (known after apply)
      + triggers_replace = true
    }

  # module.cos-lite.module.grafana.terraform_data.app_replace_trigger will be created
  + resource "terraform_data" "app_replace_trigger" {
      + id               = (known after apply)
      + triggers_replace = []
    }

Plan: 3 to add, 0 to change, 0 to destroy.

Apply complete! Resources: 3 added, 0 changed, 0 destroyed.

Then update the module source and risk, and apply:

module "cos-lite" {
  source       = "git::https://github.com/canonical/observability-stack//terraform/cos-lite?ref=test/lifecycle-input"
  risk         = "edge"
  model_uuid   = juju_model.cos-lite.uuid
  internal_tls = false
}
❯ tf init -upgrade
❯ tf apply
Terraform will perform the following actions:

  # module.cos-lite.juju_integration.grafana_ingress[0] will be replaced due to changes in replace_triggered_by
  # (moved from module.cos-lite.juju_integration.grafana_ingress)
-/+ resource "juju_integration" "grafana_ingress" {
      ~ id         = "57288026-c98a-44f1-80e8-4ea2107eed93:traefik:traefik-route:grafana:ingress" -> (known after apply)
        # (1 unchanged attribute hidden)

      - application { # forces replacement
          - endpoint = "traefik-route" -> null
          - name     = "traefik" -> null
        }
      + application { # forces replacement
          + endpoint = "ingress"
          + name     = "traefik"
        }

        # (1 unchanged block hidden)
    }

  # module.cos-lite.juju_offer.grafana_dashboards will be replaced due to changes in replace_triggered_by
-/+ resource "juju_offer" "grafana_dashboards" {
      ~ id               = "admin/cos-lite.grafana-dashboards" -> (known after apply)
        name             = "grafana-dashboards"
      ~ url              = "admin/cos-lite.grafana-dashboards" -> (known after apply)
        # (3 unchanged attributes hidden)
    }

  # module.cos-lite.terraform_data.grafana_ingress_interface must be replaced
-/+ resource "terraform_data" "grafana_ingress_interface" {
      ~ id               = "d9992689-d99c-0072-04ae-3547fcf5dd6f" -> (known after apply)
      ~ triggers_replace = "traefik_route" -> "ingress"
    }

  # module.cos-lite.terraform_data.grafana_litestream_resource must be replaced
-/+ resource "terraform_data" "grafana_litestream_resource" {
      ~ id               = "1db520be-d508-1f9b-7519-84d8ccfa29a9" -> (known after apply)
      ~ triggers_replace = true -> false
    }

  # module.cos-lite.module.grafana.juju_application.grafana will be replaced due to changes in replace_triggered_by
-/+ resource "juju_application" "grafana" {
      ~ id                 = "57288026-c98a-44f1-80e8-4ea2107eed93:grafana" -> (known after apply)
      ~ machines           = [] -> (known after apply)
      ~ model_type         = "caas" -> (known after apply)
        name               = "grafana"
      ~ storage            = [
          - {
              - count = 1 -> null
              - label = "database-3" -> null
              - pool  = "kubernetes" -> null
              - size  = "1G" -> null
            },
        ] -> (known after apply)
        # (6 unchanged attributes hidden)

      ~ charm {
          ~ base     = "ubuntu@24.04" -> (known after apply)
          ~ channel  = "2/stable" -> "dev/edge"
            name     = "grafana-k8s"
          ~ revision = 180 -> 186
        }
    }

  # module.cos-lite.module.grafana.terraform_data.app_replace_trigger must be replaced
-/+ resource "terraform_data" "app_replace_trigger" {
      ~ id               = "530eb70d-5220-db2d-c247-c196d6db44a9" -> (known after apply)
      ~ triggers_replace = [
          + (known after apply),
        ]
    }

Plan: 10 to add, 5 to change, 10 to destroy.

Apply complete! Resources: 10 added, 5 changed, 10 destroyed.
[screenshot]

Warning

Although COS is now active/idle, some resources are missing (specifically, all the Grafana resources); another terraform apply creates them. We can avoid the second apply by adding lifecycle definitions to those specific resources.

terraform apply
Apply complete! Resources: 9 added, 0 changed, 0 destroyed.
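A sketch of what those extra lifecycle definitions could look like, assuming the missing resources are integrations with the replaced Grafana application: Juju presumably drops the old application's relations when it is destroyed, while Terraform only notices on the next refresh, hence the second apply. The integration name and endpoint names below are illustrative, and `var.model_uuid` is a hypothetical variable.

```hcl
# Hypothetical Grafana integration in the product module.
resource "juju_integration" "grafana_source_prometheus" {
  model_uuid = var.model_uuid

  application {
    name     = "prometheus"
    endpoint = "grafana-source"
  }

  application {
    name     = "grafana"
    endpoint = "grafana-source"
  }

  lifecycle {
    # Replace this integration in the same apply that replaces Grafana, so it
    # is re-created right away instead of only on the next apply.
    replace_triggered_by = [terraform_data.grafana_litestream_resource]
  }
}
```

The same lifecycle block would go on every integration and offer that points at Grafana.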

MichaelThamm force-pushed the test/lifecycle-input branch from 7e48e9f to 213b505 on April 25, 2026 18:49
Comment thread terraform/cos-lite/applications.tf Outdated

module "grafana" {
-source = "git::https://github.com/canonical/grafana-k8s-operator//terraform"
+source = "git::https://github.com/canonical/grafana-k8s-operator//terraform?ref=test/lifecycle-input"
Contributor Author

MichaelThamm, Apr 25, 2026


Revert before merge

Comment thread terraform/cos-lite/offers.tf Outdated
Contributor Author


Run terraform-docs on the READMEs before merging.

Comment thread terraform/cos/integrations.tf Outdated
Comment thread terraform/cos-lite/integrations.tf Outdated
Comment thread terraform/cos-lite/offers.tf Outdated
Comment thread terraform/cos-lite/upgrades.tf
Comment thread terraform/cos-lite/applications.tf
}

resource "juju_integration" "grafana_ingress" {
count = var.ingress["grafana"] ? 1 : 0
Contributor Author


See if this works

Suggested change
-count = var.ingress["grafana"] ? 1 : 0
+count = var.ingress.grafana ? 1 : 0


module "cos" {
-source = "git::https://github.com/canonical/observability-stack//terraform/cos?ref=track/2"
+source = "git::https://github.com/canonical/observability-stack//terraform/cos?ref=test/lifecycle-input-track-2"
Contributor Author


Revert this and the other itest branch refs before merge.
