ci: verify platform stack during bootstrap #22

Merged
vitramir merged 9 commits into main from noa/issue-21 on Mar 3, 2026

Conversation

@casey-brooks
Contributor

Summary

  • add concurrency limits, job timeout, kubeconfig exports, and kubectl/jq setup in bootstrap workflow
  • apply the platform stack with explicit TF_VAR values, wait for jobs/deployments/statefulsets, and verify platform pods
  • run optional Argo CD application health checks and ensure PR cleanups destroy platform -> system -> k8s
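
The "wait for jobs/deployments/statefulsets" bullet above could look roughly like the workflow step below. This is a minimal sketch, not the step from this PR: the step name and timeout values are assumptions, while the `platform` namespace and the kubeconfig path are taken from values quoted elsewhere in this thread.

```yaml
- name: Wait for platform workloads   # hypothetical step name
  shell: bash                         # explicit bash enables -eo pipefail on GitHub runners
  env:
    KUBECONFIG: ${{ github.workspace }}/stacks/k8s/.kube/agyn-local-kubeconfig.yaml
  run: |
    ns=platform
    # Block until every Deployment and StatefulSet finishes rolling out.
    for workload in $(kubectl get deployments,statefulsets -n "$ns" -o name); do
      kubectl rollout status "$workload" -n "$ns" --timeout=300s   # timeout is an assumption
    done
    # Wait for Jobs only when some exist, since kubectl wait may error on an empty match.
    if [ -n "$(kubectl get jobs -n "$ns" -o name)" ]; then
      kubectl wait --for=condition=complete job --all -n "$ns" --timeout=600s
    fi
    # Print the final pod state into the run log.
    kubectl get pods -n "$ns" -o wide
```

Looping over `-o name` output keeps the step independent of specific workload names, so it should not need updating when the platform chart adds or removes a service.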

Testing

  • terraform fmt -check -recursive
  • terraform -chdir=stacks/k8s validate
  • terraform -chdir=stacks/system validate
  • terraform -chdir=stacks/platform validate
  • terraform -chdir=stacks/k8s apply -auto-approve -input=false
  • terraform -chdir=stacks/system apply -auto-approve -input=false
  • terraform -chdir=stacks/platform apply -auto-approve -input=false
  • bash /tmp/platform-health.sh
  • bash /tmp/argocd-health.sh
  • terraform -chdir=stacks/platform destroy -auto-approve -input=false
  • terraform -chdir=stacks/system destroy -auto-approve -input=false
  • terraform -chdir=stacks/k8s destroy -auto-approve -input=false

Fixes #21
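
The contents of `/tmp/argocd-health.sh` are not shown in this thread. As an illustration only, an optional Argo CD check might be written as the step below. The step name is hypothetical, and the check assumes the standard Argo CD `Application` resource with its `status.sync.status` and `status.health.status` fields.

```yaml
- name: Verify Argo CD applications (optional)   # hypothetical step name
  shell: bash   # explicit bash gives -eo pipefail, so a kubectl failure in the pipe fails the step
  env:
    KUBECONFIG: ${{ github.workspace }}/stacks/k8s/.kube/agyn-local-kubeconfig.yaml
  run: |
    # Skip when the Application CRD is not installed; this is what makes the check optional.
    if ! kubectl get crd applications.argoproj.io >/dev/null 2>&1; then
      echo "Argo CD Application CRD not found; skipping."
      exit 0
    fi
    # Collect applications that are not both Synced and Healthy.
    unhealthy="$(kubectl get applications.argoproj.io --all-namespaces -o json \
      | jq -r '.items[]
          | select(.status.sync.status != "Synced" or .status.health.status != "Healthy")
          | "\(.metadata.namespace)/\(.metadata.name)"')"
    if [ -n "$unhealthy" ]; then
      echo "Unhealthy Argo CD applications:"
      echo "$unhealthy"
      exit 1
    fi
```

Querying all namespaces avoids hard-coding where Argo CD is installed, which this thread does not state.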

@casey-brooks casey-brooks requested a review from a team as a code owner March 3, 2026 10:36
@casey-brooks
Contributor Author

Test & Lint Summary

  • terraform fmt -check -recursive
  • terraform -chdir=stacks/k8s validate
  • terraform -chdir=stacks/system validate
  • terraform -chdir=stacks/platform validate
  • terraform -chdir=stacks/k8s apply -auto-approve -input=false
  • terraform -chdir=stacks/system apply -auto-approve -input=false
  • terraform -chdir=stacks/platform apply -auto-approve -input=false
  • bash /tmp/platform-health.sh
  • bash /tmp/argocd-health.sh
  • terraform -chdir=stacks/platform destroy -auto-approve -input=false
  • terraform -chdir=stacks/system destroy -auto-approve -input=false
  • terraform -chdir=stacks/k8s destroy -auto-approve -input=false

@casey-brooks
Contributor Author

Test & Lint Summary (update)

  • ./actionlint .github/workflows/bootstrap.yml
  • /usr/local/bin/terraform fmt -check -recursive
  • /tmp/workflow-steps.sh (runs k3d bring-up, terraform init/apply/destroy for k8s/system/platform, platform & Argo CD health checks)


@noa-lucent noa-lucent left a comment

Requested changes after full review.

Blocking items:

  • [major] Concurrency group naming does not match the agreed issue specification (bootstrap-pr-<pr_number> for PRs, bootstrap-main for main). Please align this exactly to avoid unintended overlap behavior and keep workflow policy explicit.

Everything else in this PR looks aligned with the requested scope (platform apply, health checks, optional Argo verification, and PR-only reverse destroy).
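
For reference, the naming convention in the blocking item (`bootstrap-pr-<pr_number>` for PRs, `bootstrap-main` for main) can be expressed with a single expression. This is a sketch of one way to write it, not the exact change that landed; the `cancel-in-progress` policy in particular is an assumption.

```yaml
concurrency:
  # PR runs share "bootstrap-pr-<pr_number>"; all other runs share "bootstrap-main".
  group: ${{ github.event_name == 'pull_request' && format('bootstrap-pr-{0}', github.event.pull_request.number) || 'bootstrap-main' }}
  # Assumption: cancel superseded PR runs, but let runs on main finish.
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
```

Because the PR number is part of the group name, two different PRs never cancel or queue behind each other, while repeated pushes to the same PR do.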

Comment thread .github/workflows/bootstrap.yml Outdated
noa-lucent previously approved these changes Mar 3, 2026

@noa-lucent noa-lucent left a comment

Re-review complete. The previously requested concurrency fix is now implemented correctly with explicit PR/main groups, and the workflow remains aligned with issue #21 scope. Approving.

Comment thread .github/workflows/bootstrap.yml Outdated
Comment on lines +18 to +42
TF_VAR_kubeconfig_path: ${{ github.workspace }}/stacks/k8s/.kube/agyn-local-kubeconfig.yaml
TF_VAR_argocd_server_addr: localhost:8080
TF_VAR_argocd_admin_username: admin
TF_VAR_argocd_admin_password: admin
TF_VAR_platform_repo_url: https://github.com/agynio/platform.git
TF_VAR_platform_repo_username: ""
TF_VAR_platform_repo_password: ""
TF_VAR_platform_target_revision: v0.15.2
TF_VAR_platform_namespace: platform
TF_VAR_destination_server: https://kubernetes.default.svc
TF_VAR_platform_server_image_tag: 0.15.2
TF_VAR_docker_runner_image_tag: 0.15.2
TF_VAR_docker_runner_replica_count: "1"
TF_VAR_argocd_automated_sync_enabled: "true"
TF_VAR_argocd_prune_enabled: "true"
TF_VAR_argocd_self_heal_enabled: "true"
TF_VAR_platform_db_password: agents
TF_VAR_platform_db_pvc_size: 5Gi
TF_VAR_litellm_db_password: change-me
TF_VAR_litellm_db_pvc_size: 5Gi
TF_VAR_litellm_master_key: sk-dev-master
TF_VAR_litellm_salt_key: sk-dev-salt
TF_VAR_docker_runner_shared_secret: change-me
TF_VAR_vault_pvc_size: 5Gi
TF_VAR_registry_mirror_pvc_size: 5Gi
Contributor

delete

Comment thread .github/workflows/bootstrap.yml Outdated
runs-on: ubuntu-latest
timeout-minutes: 45
env:
  KUBECONFIG: ${{ github.workspace }}/stacks/k8s/.kube/agyn-local-kubeconfig.yaml
Contributor

move to the relevant step

@casey-brooks
Contributor Author

casey-brooks commented Mar 3, 2026

Local Validation

  • terraform fmt -check -recursive
  • ./actionlint .github/workflows/bootstrap.yml
  • terraform -chdir=stacks/k8s init -input=false
  • terraform -chdir=stacks/k8s apply -auto-approve -input=false
  • terraform -chdir=stacks/system init -input=false
  • terraform -chdir=stacks/system apply -auto-approve -input=false
  • terraform -chdir=stacks/platform init -input=false
  • terraform -chdir=stacks/platform apply -auto-approve -input=false
  • ./.github/scripts/verify_platform_health.sh
  • terraform -chdir=stacks/platform destroy -auto-approve -input=false
  • terraform -chdir=stacks/system destroy -auto-approve -input=false
  • terraform -chdir=stacks/k8s destroy -auto-approve -input=false

All commands completed successfully with no failures.

noa-lucent previously approved these changes Mar 3, 2026

@noa-lucent noa-lucent left a comment

Re-review complete. The workflow now uses the strict single health verification script and remains aligned with issue #21 requirements (trigger scope, stack apply order, explicit vars, health gating, PR-only reverse destroy, and concurrency policy). Approving.
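
The "PR-only reverse destroy" requirement mentioned above can be sketched as a single cleanup step. The step name and the use of `always()` are assumptions; the destroy commands and their platform -> system -> k8s order come from the validation lists in this thread.

```yaml
- name: Destroy stacks in reverse order (PR only)   # hypothetical step name
  # always() keeps the cleanup running even when an earlier step failed (assumption).
  if: always() && github.event_name == 'pull_request'
  run: |
    # Reverse of the apply order, so dependents are removed before what they depend on.
    for stack in platform system k8s; do
      terraform -chdir="stacks/${stack}" destroy -auto-approve -input=false
    done
```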

Comment thread .github/workflows/bootstrap.yml Outdated
Comment on lines +17 to +39
TF_VAR_argocd_server_addr: localhost:8080
TF_VAR_argocd_admin_username: admin
TF_VAR_argocd_admin_password: admin
TF_VAR_platform_repo_url: https://github.com/agynio/platform.git
TF_VAR_platform_repo_username: ""
TF_VAR_platform_repo_password: ""
TF_VAR_platform_target_revision: v0.15.2
TF_VAR_platform_namespace: platform
TF_VAR_destination_server: https://kubernetes.default.svc
TF_VAR_platform_server_image_tag: 0.15.2
TF_VAR_docker_runner_image_tag: 0.15.2
TF_VAR_docker_runner_replica_count: "1"
TF_VAR_argocd_automated_sync_enabled: "true"
TF_VAR_argocd_prune_enabled: "true"
TF_VAR_argocd_self_heal_enabled: "true"
TF_VAR_platform_db_password: agents
TF_VAR_platform_db_pvc_size: 5Gi
TF_VAR_litellm_db_password: change-me
TF_VAR_litellm_db_pvc_size: 5Gi
TF_VAR_litellm_master_key: sk-dev-master
TF_VAR_litellm_salt_key: sk-dev-salt
TF_VAR_docker_runner_shared_secret: change-me
TF_VAR_vault_pvc_size: 5Gi
Contributor

delete. apply should work without any additional variables provided

Comment thread .github/workflows/bootstrap.yml Outdated
Comment on lines +70 to +71
echo "KUBECONFIG=${{ github.workspace }}/stacks/k8s/.kube/agyn-local-kubeconfig.yaml" >> "$GITHUB_ENV"
echo "TF_VAR_kubeconfig_path=${{ github.workspace }}/stacks/k8s/.kube/agyn-local-kubeconfig.yaml" >> "$GITHUB_ENV"
Contributor

delete. apply should work without additional variables

@noa-lucent noa-lucent left a comment

Re-review complete.

Requesting changes due to one blocking regression:

  • [major] Workflow no longer sets the explicit platform TF_VAR values required by issue #21. The issue/spec explicitly asked to pass these in CI even when defaults exist. Please restore them so behavior remains explicit and requirement-compliant.

Everything else reviewed in this update looks fine.

Comment thread .github/workflows/bootstrap.yml
@rowan-stein
Collaborator

Maintainer request incorporated: workflow now relies solely on Terraform defaults and removes TF_VAR overrides. This updates the acceptance criteria in Issue #21 (see linked comment) — apply should work out of the box with no additional variables.

Requesting re-review based on updated criteria.
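
Under the updated criteria, the apply phase reduces to plain `init` and `apply` calls with no `TF_VAR_*` environment. A minimal sketch follows, assuming the three stacks are applied in one step (the step name and the loop are illustrative; the commands match the local validation list above).

```yaml
- name: Apply stacks with Terraform defaults   # hypothetical step name
  run: |
    # No TF_VAR_* overrides: every variable falls back to the default declared in its stack.
    for stack in k8s system platform; do
      terraform -chdir="stacks/${stack}" init -input=false
      terraform -chdir="stacks/${stack}" apply -auto-approve -input=false
    done
```

With `-input=false`, a variable that lacks a default makes the apply fail instead of prompting, so a green run is itself evidence that the defaults are complete.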

noa-lucent previously approved these changes Mar 3, 2026

@noa-lucent noa-lucent left a comment

Re-review complete against the revised issue #21 criteria (documented in the maintainer comment). The workflow now correctly relies on Terraform defaults without TF_VAR overrides, while still meeting the required trigger scope, tool setup, apply order, health verification, PR cleanup, timeout, and concurrency behavior. Approving.

Comment thread .github/workflows/bootstrap.yml Outdated
Comment on lines +44 to +47
- name: Export kubeconfig for kubectl
  run: |
    echo "KUBECONFIG=${{ github.workspace }}/stacks/k8s/.kube/agyn-local-kubeconfig.yaml" >> "$GITHUB_ENV"

Contributor

This can be used only inside health check. The rest of the flow should work without any extra env variables!
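
One way to follow this request is to set `KUBECONFIG` in the step's own `env` block instead of writing it to `$GITHUB_ENV`, which would expose it to every later step. This is a sketch under that assumption; the step name is hypothetical, and the script path is the one listed under Local Validation.

```yaml
- name: Verify platform health   # hypothetical step name
  env:
    # Scoped to this step only; no other step sees the variable.
    KUBECONFIG: ${{ github.workspace }}/stacks/k8s/.kube/agyn-local-kubeconfig.yaml
  run: ./.github/scripts/verify_platform_health.sh
```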

@casey-brooks
Contributor Author

CI run https://github.com/agynio/bootstrap_v2/actions/runs/22628550932 finished green after removing all workflow env exports. @rowan-stein this is ready for re-review.

@vitramir vitramir merged commit f6ba5cc into main Mar 3, 2026
1 check passed
casey-brooks added a commit that referenced this pull request Mar 4, 2026
* ci: expand bootstrap pipeline

* ci: fix concurrency format

* ci: simplify concurrency group

* ci: align concurrency naming

* ci: fix platform rollout loop

* ci: enforce issue-21 concurrency

* feat(ci): harden bootstrap health checks

* fix(ci): rely on terraform defaults

* fix(ci): drop workflow env exports
rowan-stein added a commit that referenced this pull request Mar 31, 2026
Gateway: fix(ziti) managed identity resolution (#123)
Agents: feat(authz) agent org membership tuples (#30)
LLM-Proxy: fix(identity) managed identity parsing (#22)
vitramir pushed a commit that referenced this pull request Mar 31, 2026
#203)

Gateway: fix(ziti) managed identity resolution (#123)
Agents: feat(authz) agent org membership tuples (#30)
LLM-Proxy: fix(identity) managed identity parsing (#22)

Development

Successfully merging this pull request may close these issues.

Add CI to apply k8s, system, and platform stacks; verify platform services health on PR and main

4 participants