
Upgrading from v0.17.0 canary to v0.18 + helm chart scales to infinity and beyond #427

Closed
Puneeth-n opened this issue Mar 31, 2021 · 26 comments

Comments

@Puneeth-n
Contributor

2021-03-31T13:50:43.908Z	ERROR	controller-runtime.controller	Reconciler error	{"controller": "runner-controller", "request": "ci/comtravo-github-actions-deployment-w77qk-88mgw", "error": "Runner.actions.summerwind.dev \"comtravo-github-actions-deployment-w77qk-88mgw\" is invalid: [status.message: Required value, status.phase: Required value, status.reason: Required value]"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
	/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:88
2021-03-31T13:50:44.918Z	ERROR	actions-runner-controller.runner	Failed to update runner status for Registration	{"runner": "comtravo-github-actions-deployment-hx2nb-4k8dd", "error": "Runner.actions.summerwind.dev \"comtravo-github-actions-deployment-hx2nb-4k8dd\" is invalid: [status.message: Required value, status.phase: Required value, status.reason: Required value]"}

Configuration

resource "helm_release" "actions" {
  name         = local.actions.name
  repository   = "https://summerwind.github.io/actions-runner-controller"
  chart        = "actions-runner-controller"
  version      = "0.10.4"
  namespace    = kubernetes_namespace.ci.metadata[0].name
  lint         = false
  reset_values = true

  values = [
    yamlencode(
      {
        # syncPeriod = "1m"
        authSecret = {
          enabled                    = true
          github_app_id              = data.aws_ssm_parameter.app_id.value
          github_app_installation_id = data.aws_ssm_parameter.app_installation_id.value
          github_app_private_key     = data.aws_ssm_parameter.app_private_key.value
        }
        nodeSelector = {
          "${var.eks_node_labels.on_demand.key}" = var.eks_node_labels.on_demand.value
        }

        image = {
          repository                  = "harbor.infra.foo.com/cache/summerwind/actions-runner-controller"
          tag                         = "v0.18.1"
          dindSidecarRepositoryAndTag = "harbor.infra.foo.com/cache/library/docker:dind"
          pullPolicy                  = "Always"
        }
        serviceAccount = {
          create = true
          name   = local.actions.service_account_name
          annotations = {
            "eks.amazonaws.com/role-arn" = aws_iam_role.actions.arn
          }
        }

        githubWebhookServer = {
          enabled      = true
          replicaCount = 1
          syncPeriod   = "10m"
          serviceAccount = {
            create = true
            name   = "actions-webhook-server"
          }
          service = {
            port = 80
          }
          secret = {
            create                      = true
            github_webhook_secret_token = random_password.github_webhook_secret_token.result
          }
          ingress = {
            enabled = true
            annotations = {
              "kubernetes.io/ingress.class"               = "alb"
              "alb.ingress.kubernetes.io/scheme"          = "internet-facing"
              "alb.ingress.kubernetes.io/listen-ports"    = jsonencode([{ HTTPS = 443 }])
              "alb.ingress.kubernetes.io/group.name"      = "infra-external"
              "alb.ingress.kubernetes.io/certificate-arn" = aws_acm_certificate.subdomain.arn
              "alb.ingress.kubernetes.io/target-type"     = "ip"
              "alb.ingress.kubernetes.io/success-codes"   = "200-499"
            }

            hosts = [
              {
                host = local.actions.hostname
                paths = [{
                  path = "/*"
                }]
              }
            ]
          }
        }
      }
    )
  ]
}

locals {
  actions_runner_deployment_config = <<EOF
kubectl --kubeconfig=${module.eks.kubeconfig_filename} apply -f - <<MANIFEST
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: foo-github-actions-deployment
  namespace: ${kubernetes_namespace.ci.metadata[0].name}
spec:
  template:
    spec:
      nodeSelector:
        ${var.eks_node_labels.spot.key}: ${var.eks_node_labels.spot.value}
      image: harbor.infra.foo.com/cache/foo/actions-runner:v2.277.1
      imagePullPolicy: Always
      repository: ${local.actions.git_repository}
      serviceAccountName: ${local.actions.service_account_name}
      securityContext:
        fsGroup: 1447
      resources:
        limits:
          cpu: "1"
          memory: "4Gi"
        requests:
          cpu: "1m"
          memory: "256Mi"
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: foo-github-actions-deployment-autoscaler
  namespace: ${kubernetes_namespace.ci.metadata[0].name}
spec:
  scaleTargetRef:
    name: foo-github-actions-deployment
  minReplicas: 4
  maxReplicas: 500
  metrics:
  - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
    repositoryNames:
      - summerwind/actions-runner-controller
  scaleUpTriggers:
  - githubEvent:
      checkRun:
        types: ["created"]
        status: "queued"
    amount: 1
    duration: "1m"
MANIFEST
EOF
}

resource "null_resource" "actions_runner_deployment" {
  provisioner "local-exec" {
    command = local.actions_runner_deployment_config
  }

  triggers = {
    "key" = sha256(local.actions_runner_deployment_config)
  }

  depends_on = [helm_release.actions]
}
@Puneeth-n
Contributor Author

DIFF

@@ -117,7 +137,7 @@ resource "helm_release" "actions" {
   name         = local.actions.name
   repository   = "https://summerwind.github.io/actions-runner-controller"
   chart        = "actions-runner-controller"
-  version      = "0.7.0"
+  version      = "0.10.4"
   namespace    = kubernetes_namespace.ci.metadata[0].name
   lint         = false
   reset_values = true
@@ -138,7 +158,7 @@ resource "helm_release" "actions" {

         image = {
           repository                  = "harbor.infra.foo.com/cache/summerwind/actions-runner-controller"
-          tag                         = "canary"
+          tag                         = "v0.18.0"
           dindSidecarRepositoryAndTag = "harbor.infra.foo.com/cache/library/docker:dind"
           pullPolicy                  = "Always"
         }
@@ -205,7 +225,7 @@ spec:
     spec:
       nodeSelector:
         ${var.eks_node_labels.spot.key}: ${var.eks_node_labels.spot.value}
-      image: harbor.infra.foo.com/cache/foo/actions-runner:v2.276.1
+      image: harbor.infra.foo.com/cache/foo/actions-runner:v2.277.1
       imagePullPolicy: Always
       repository: ${local.actions.git_repository}
       serviceAccountName: ${local.actions.service_account_name}
@@ -228,7 +248,7 @@ spec:
   scaleTargetRef:
     name: foo-github-actions-deployment
   minReplicas: 4
-  maxReplicas: 100
+  maxReplicas: 500
   metrics:
   - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
     repositoryNames:
@@ -239,7 +259,7 @@ spec:
         types: ["created"]
         status: "queued"
     amount: 1
-    duration: "2m"
+    duration: "1m"
 MANIFEST
 EOF
 }

@Puneeth-n Puneeth-n changed the title status.message: Required value, status.phase: Required value, status.reason: Required value Upgrading from v0.17.0 canary to v0.18 + helm chart results in runners not coming up Mar 31, 2021
@Puneeth-n
Contributor Author

Something is definitely off with v0.18. I reverted everything and upgraded just the chart to 0.10.4, and everything works. When I bump the controller image to v0.18.0 or v0.18.1, something goes haywire.

Is the helm chart compatible with v0.18.0?

Screen Shot 2021-03-31 at 4 39 21 PM

@Puneeth-n Puneeth-n changed the title Upgrading from v0.17.0 canary to v0.18 + helm chart results in runners not coming up Upgrading from v0.17.0 canary to v0.18 + helm chart scales to infinity and beyond Mar 31, 2021
@Puneeth-n
Contributor Author

Screen Shot 2021-03-31 at 5 16 23 PM

@callum-tait-pbx
Contributor

I haven't got a solution, but I'm running 0.10.4 and 0.18.0 together fine. They are considered compatible, but there is probably a bug in the app code.

@Puneeth-n
Contributor Author

The old canary I had deployed was 0.17 with the GitHub webhook server and health-check route. Is it possible for me to somehow get that Docker image? My CI is kind of broken now.

@callum-tait-pbx
Contributor

The old canary I had deployed was 0.17 with the GitHub webhook server and health-check route. Is it possible for me to somehow get that Docker image? My CI is kind of broken now.

I don't think I quite get what you mean. The containers get published to Docker Hub, so you should be able to just reference the old tag and have it pulled down: https://hub.docker.com/r/summerwind/actions-runner-controller/tags. Alternatively, pull a copy locally and push it up to your private registry, if that is the source of your controllers.

@Puneeth-n
Contributor Author

I had pointed it to a canary image. Now that the canary image has moved on, I don't know which Docker tag to freeze to.

I have a bigger issue to figure out. I uninstalled the Helm chart, but I am unable to delete the runners, and I am unable to delete the CRD runners.actions.summerwind.dev.

Screen Shot 2021-03-31 at 9 45 30 PM

@Puneeth-n
Contributor Author

Finally deleted the runners CRD with the command below:

kubectl patch crd runners.actions.summerwind.dev -p '{"metadata":{"finalizers":[]}}' --type=merge

@Puneeth-n
Contributor Author

@callum-tait-pbx So I deleted everything: the Helm chart and the CRDs. When I install the Helm chart again and run k -n ci get runners, I get some 5000 runners. The pods are not running. Do you know how I could delete this "state"?

@mumoshu
Collaborator

mumoshu commented Mar 31, 2021

@Puneeth-n Probably the HRA status and other custom resources like RunnerDeployment and RunnerReplicaSet? Could you share a dump of all those custom resources except Runners?

I think the important point here is that deleting CRDs doesn't automatically delete CRs (especially when you forced the deletion by manually removing the CRD finalizer, as I remember).

// FWIW, the cause of the infinite scaling does seem to be a mismatch between the CRD definition and the controller code. I'm afraid there isn't any ultimate way to prevent that.
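
For example, something like this would be enough — a minimal sketch, assuming the ci namespace from your config:

# Dump the remaining actions-runner-controller custom resources (everything except Runners) for inspection.
kubectl -n ci get runnerdeployments,runnerreplicasets,horizontalrunnerautoscalers -o yaml > arc-custom-resources.yaml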

@Puneeth-n
Contributor Author

@mumoshu I deleted the Helm chart and the corresponding runner autoscaler and runner deployment. I just have the CRDs now. Is there a way to reset and delete everything? Please find below some of the remaining info.

➜  infra git:(feature/upgrade-harbor-gha) ✗ k get crds
NAME                                                 CREATED AT
alertmanagers.monitoring.coreos.com                  2020-10-25T00:44:17Z
certificates.cert-manager.io                         2021-01-26T10:16:32Z
challenges.acme.cert-manager.io                      2021-01-26T10:16:35Z
clusterissuers.cert-manager.io                       2021-01-26T10:16:38Z
eniconfigs.crd.k8s.amazonaws.com                     2020-10-22T09:50:24Z
horizontalrunnerautoscalers.actions.summerwind.dev   2021-03-31T19:54:20Z
issuers.cert-manager.io                              2021-01-26T10:16:43Z
orders.acme.cert-manager.io                          2021-01-26T10:16:45Z
podmonitors.monitoring.coreos.com                    2020-10-25T00:44:17Z
probes.monitoring.coreos.com                         2020-10-25T00:44:18Z
prometheuses.monitoring.coreos.com                   2020-10-25T00:44:18Z
prometheusrules.monitoring.coreos.com                2020-10-25T00:44:19Z
runnerdeployments.actions.summerwind.dev             2021-03-31T19:54:20Z
runnerreplicasets.actions.summerwind.dev             2021-03-31T19:54:21Z
runners.actions.summerwind.dev                       2021-03-31T19:54:21Z
securitygrouppolicies.vpcresources.k8s.aws           2020-10-22T09:50:27Z
servicemonitors.monitoring.coreos.com                2020-10-25T00:44:19Z
targetgroupbindings.elbv2.k8s.aws                    2020-10-27T11:49:53Z
thanosrulers.monitoring.coreos.com                   2020-10-25T00:44:20Z
➜  infra git:(feature/upgrade-harbor-gha) ✗ k -n ci get all
NAME                                            READY   STATUS    RESTARTS   AGE
pod/harbor-harbor-core-5bb55885bb-rrfnf         1/1     Running   0          81m
pod/harbor-harbor-jobservice-79dbc67dfb-f75th   1/1     Running   0          81m
pod/harbor-harbor-portal-7569cdf587-shz5c       1/1     Running   0          4h2m
pod/harbor-harbor-registry-79cbfb6f4-542hm      2/2     Running   0          81m

NAME                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/harbor-harbor-core         NodePort    172.20.164.44    <none>        80:31718/TCP        4h16m
service/harbor-harbor-jobservice   ClusterIP   172.20.99.37     <none>        80/TCP              4h16m
service/harbor-harbor-portal       NodePort    172.20.130.238   <none>        80:30081/TCP        4h16m
service/harbor-harbor-registry     ClusterIP   172.20.0.119     <none>        5000/TCP,8080/TCP   4h16m

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/harbor-harbor-core         1/1     1            1           4h16m
deployment.apps/harbor-harbor-jobservice   1/1     1            1           4h16m
deployment.apps/harbor-harbor-portal       1/1     1            1           4h16m
deployment.apps/harbor-harbor-registry     1/1     1            1           4h16m

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/harbor-harbor-core-54f685cd87         0         0         0       4h2m
replicaset.apps/harbor-harbor-core-5bb55885bb         1         1         1       81m
replicaset.apps/harbor-harbor-core-66bbd5f59c         0         0         0       134m
replicaset.apps/harbor-harbor-core-66c6d694bc         0         0         0       4h16m
replicaset.apps/harbor-harbor-core-678bd5c54f         0         0         0       109m
replicaset.apps/harbor-harbor-core-6f8b5df58          0         0         0       94m
replicaset.apps/harbor-harbor-core-7dcd6cb455         0         0         0       4h9m
replicaset.apps/harbor-harbor-jobservice-57db7bd8bc   0         0         0       134m
replicaset.apps/harbor-harbor-jobservice-588d59b4b7   0         0         0       4h9m
replicaset.apps/harbor-harbor-jobservice-6d75857d58   0         0         0       4h16m
replicaset.apps/harbor-harbor-jobservice-764d678448   0         0         0       109m
replicaset.apps/harbor-harbor-jobservice-79dbc67dfb   1         1         1       81m
replicaset.apps/harbor-harbor-jobservice-7f457766b5   0         0         0       4h2m
replicaset.apps/harbor-harbor-jobservice-b94f95dcf    0         0         0       94m
replicaset.apps/harbor-harbor-portal-5486c7ff4        0         0         0       4h16m
replicaset.apps/harbor-harbor-portal-7569cdf587       1         1         1       4h2m
replicaset.apps/harbor-harbor-registry-56b7bb8447     0         0         0       109m
replicaset.apps/harbor-harbor-registry-6897b4866d     0         0         0       134m
replicaset.apps/harbor-harbor-registry-76d5bf9689     0         0         0       4h16m
replicaset.apps/harbor-harbor-registry-79cbfb6f4      1         1         1       81m
replicaset.apps/harbor-harbor-registry-7d64c666b4     0         0         0       4h2m
replicaset.apps/harbor-harbor-registry-8546f48b45     0         0         0       4h9m
replicaset.apps/harbor-harbor-registry-86d5665d94     0         0         0       94m

NAME                                                          SERVICE-NAME           SERVICE-PORT   TARGET-TYPE   AGE
targetgroupbinding.elbv2.k8s.aws/k8s-ci-harborha-15f9795950   harbor-harbor-core     80             ip            4h16m
targetgroupbinding.elbv2.k8s.aws/k8s-ci-harborha-84853f67e1   harbor-harbor-portal   80             ip            4h16m
➜  infra git:(feature/upgrade-harbor-gha) ✗ k -n ci get runners.actions.summerwind.dev | wc -l
    4675
➜  infra git:(feature/upgrade-harbor-gha) ✗ k -n ci describe runnerreplicasets.actions.summerwind.dev
No resources found in ci namespace.
➜  infra git:(feature/upgrade-harbor-gha) ✗ k -n ci describe runnerdeployments.actions.summerwind.dev
No resources found in ci namespace.
➜  infra git:(feature/upgrade-harbor-gha) ✗ k -n ci describe horizontalrunnerautoscalers.actions.summerwind.dev
No resources found in ci namespace.
k -n ci get runners

comtravo-github-actions-deployment-nvsln-zwbwb                               comtravo/ct-backend            Pending
comtravo-github-actions-deployment-nvsln-zwjcr                               comtravo/ct-backend            Pending
comtravo-github-actions-deployment-nvsln-zwjtf                               comtravo/ct-backend            Running
comtravo-github-actions-deployment-nvsln-zwlc4                               comtravo/ct-backend            Pending
comtravo-github-actions-deployment-nvsln-zwmcn                               comtravo/ct-backend            Pending
comtravo-github-actions-deployment-nvsln-zwnr4                               comtravo/ct-backend
comtravo-github-actions-deployment-nvsln-zwrp2                               comtravo/ct-backend            Pending
comtravo-github-actions-deployment-nvsln-zxccn                               comtravo/ct-backend            Running
comtravo-github-actions-deployment-nvsln-zxktv                               comtravo/ct-backend            Pending
comtravo-github-actions-deployment-nvsln-zxq7b                               comtravo/ct-backend            Pending
comtravo-github-actions-deployment-nvsln-zxxpb                               comtravo/ct-backend            Pending
comtravo-github-actions-deployment-nvsln-zzf6f                               comtravo/ct-backend            Running
comtravo-github-actions-deployment-nvsln-zzhcp                               comtravo/ct-backend            Pending
comtravo-github-actions-deployment-nvsln-zznsx                               comtravo/ct-backend            Pending
comtravo-github-actions-deployment-nvsln-zzqcx                               comtravo/ct-backend
comtravo-github-actions-deployment-nvsln-zzqm9                               comtravo/ct-backend            Pending
comtravo-github-actions-deployment-nvsln-zzqnn                               comtravo/ct-backend
comtravo-github-actions-deployment-nvsln-zztbm                               comtravo/ct-backend            Pending
comtravo-github-actions-deployment-rpbk6-268jz                               comtravo/ct-backend
comtravo-github-actions-deployment-rpbk6-8p4k2                               comtravo/ct-backend
comtravo-github-actions-deployment-rpbk6-n2xcf                               comtravo/ct-backend
comtravo-github-actions-deployment-rpbk6-v6j8t                               comtravo/ct-backend
comtravo-github-actions-deployment-vnqf7-7lk2p                               comtravo/ct-backend
comtravo-github-actions-deployment-vnqf7-ch4nl                               comtravo/ct-backend
comtravo-github-actions-deployment-vnqf7-g5zxk                               comtravo/ct-backend
comtravo-github-actions-deployment-vnqf7-gz8p5                               comtravo/ct-backend
comtravo-github-actions-deployment-zmn2v-4hr8k                               comtravo/ct-backend            Pending
comtravo-github-actions-deployment-zmn2v-b6ckh                               comtravo/ct-backend            Running
comtravo-github-actions-deployment-zmn2v-c8h9x                               comtravo/ct-backend            Pending
comtravo-github-actions-deployment-zmn2v-f4zgf                               comtravo/ct-backend            Pending
comtravo-github-actions-deployment-zmn2v-kbl85                               comtravo/ct-backend            Pending
comtravo-github-actions-deployment-zmn2v-lhczp                               comtravo/ct-backend            Running
➜  infra git:(feature/upgrade-harbor-gha) ✗ k -n ci describe runner comtravo-github-actions-deployment-nvsln-zwbwb
Name:         comtravo-github-actions-deployment-nvsln-zwbwb
Namespace:    ci
Labels:       runner-template-hash=58bb9549db
Annotations:  <none>
API Version:  actions.summerwind.dev/v1alpha1
Kind:         Runner
Metadata:
  Creation Timestamp:             2021-03-31T14:38:05Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2021-03-31T15:26:03Z
  Finalizers:
    runner.actions.summerwind.dev
  Generate Name:  comtravo-github-actions-deployment-nvsln-
  Generation:     2
  Managed Fields:
    API Version:  actions.summerwind.dev/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
        f:generateName:
        f:labels:
          .:
          f:runner-template-hash:
        f:ownerReferences:
      f:spec:
        .:
        f:dockerdContainerResources:
        f:image:
        f:imagePullPolicy:
        f:nodeSelector:
          .:
          f:node.k8s.comtravo.com/workergroup-name:
        f:repository:
        f:resources:
          .:
          f:limits:
            .:
            f:cpu:
            f:memory:
          f:requests:
            .:
            f:cpu:
            f:memory:
        f:securityContext:
          .:
          f:fsGroup:
        f:serviceAccountName:
      f:status:
        .:
        f:message:
        f:phase:
        f:reason:
        f:registration:
          .:
          f:expiresAt:
          f:repository:
          f:token:
    Manager:    manager
    Operation:  Update
    Time:       2021-03-31T15:05:47Z
  Owner References:
    API Version:           actions.summerwind.dev/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  RunnerReplicaSet
    Name:                  comtravo-github-actions-deployment-nvsln
    UID:                   746143f7-de00-49bb-b175-f4301e92ee83
  Resource Version:        63455270
  Self Link:               /apis/actions.summerwind.dev/v1alpha1/namespaces/ci/runners/comtravo-github-actions-deployment-nvsln-zwbwb
  UID:                     81669d3a-0d25-462d-86c4-c80e222b6ae5
Spec:
  Dockerd Container Resources:
  Image:              harbor.infra.comtravo.com/cache/comtravo/actions-runner:v2.277.1
  Image Pull Policy:  Always
  Node Selector:
    node.k8s.comtravo.com/workergroup-name:  spot
  Repository:                                comtravo/ct-backend
  Resources:
    Limits:
      Cpu:     1
      Memory:  4Gi
    Requests:
      Cpu:     1m
      Memory:  256Mi
  Security Context:
    Fs Group:            1447
  Service Account Name:  actions
Status:
  Message:
  Phase:    Pending
  Reason:
  Registration:
    Expires At:  2021-03-31T15:42:07Z
    Repository:  comtravo/ct-backend
    Token:       ASS5GHPTBJ4LSRJGZ5CZJKLAMSME7AVPNFXHG5DBNRWGC5DJN5XF62LEZYANUGERWFUW443UMFWGYYLUNFXW4X3UPFYGLN2JNZ2GKZ3SMF2GS33OJFXHG5DBNRWGC5DJN5XA
Events:          <none>
➜  infra git:(feature/upgrade-harbor-gha) ✗ k -n ci describe runner comtravo-github-actions-deployment-rpbk6-n2xcf
Name:         comtravo-github-actions-deployment-rpbk6-n2xcf
Namespace:    ci
Labels:       runner-template-hash=f7f4d7b7d
Annotations:  <none>
API Version:  actions.summerwind.dev/v1alpha1
Kind:         Runner
Metadata:
  Creation Timestamp:             2021-03-31T15:41:59Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2021-03-31T19:24:21Z
  Finalizers:
    runner.actions.summerwind.dev
  Generate Name:  comtravo-github-actions-deployment-rpbk6-
  Generation:     2
  Managed Fields:
    API Version:  actions.summerwind.dev/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
        f:generateName:
        f:labels:
          .:
          f:runner-template-hash:
        f:ownerReferences:
      f:spec:
        .:
        f:dockerdContainerResources:
        f:image:
        f:imagePullPolicy:
        f:nodeSelector:
          .:
          f:node.k8s.comtravo.com/workergroup-name:
        f:repository:
        f:resources:
          .:
          f:limits:
            .:
            f:cpu:
            f:memory:
          f:requests:
            .:
            f:cpu:
            f:memory:
        f:securityContext:
          .:
          f:fsGroup:
        f:serviceAccountName:
      f:status:
        .:
        f:message:
        f:phase:
        f:reason:
        f:registration:
          .:
          f:expiresAt:
          f:token:
    Manager:    manager
    Operation:  Update
    Time:       2021-03-31T16:14:26Z
  Owner References:
    API Version:           actions.summerwind.dev/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  RunnerReplicaSet
    Name:                  comtravo-github-actions-deployment-rpbk6
    UID:                   ec333cc7-c418-4aff-8f49-f5496ce55424
  Resource Version:        63547947
  Self Link:               /apis/actions.summerwind.dev/v1alpha1/namespaces/ci/runners/comtravo-github-actions-deployment-rpbk6-n2xcf
  UID:                     1792ebe8-6503-419b-ad53-7dd6e4641bc0
Spec:
  Dockerd Container Resources:
  Image:              comtravo/actions-runner:v2.277.1
  Image Pull Policy:  Always
  Node Selector:
    node.k8s.comtravo.com/workergroup-name:  spot
  Repository:                                comtravo/ct-backend
  Resources:
    Limits:
      Cpu:     1
      Memory:  4Gi
    Requests:
      Cpu:     1m
      Memory:  256Mi
  Security Context:
    Fs Group:            1447
  Service Account Name:  actions
Events:                  <none>
➜  infra git:(feature/upgrade-harbor-gha) ✗ k -n ci describe runner comtravo-github-actions-deployment-zmn2v-lhczp
Name:         comtravo-github-actions-deployment-zmn2v-lhczp
Namespace:    ci
Labels:       runner-deployment-name=comtravo-github-actions-deployment
              runner-template-hash=567488d8cf
Annotations:  <none>
API Version:  actions.summerwind.dev/v1alpha1
Kind:         Runner
Metadata:
  Creation Timestamp:             2021-03-31T14:35:12Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2021-03-31T15:30:50Z
  Finalizers:
    runner.actions.summerwind.dev
  Generate Name:  comtravo-github-actions-deployment-zmn2v-
  Generation:     2
  Managed Fields:
    API Version:  actions.summerwind.dev/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
        f:generateName:
        f:labels:
          .:
          f:runner-deployment-name:
          f:runner-template-hash:
        f:ownerReferences:
      f:spec:
        .:
        f:dockerdContainerResources:
        f:image:
        f:imagePullPolicy:
        f:nodeSelector:
          .:
          f:node.k8s.comtravo.com/workergroup-name:
        f:repository:
        f:resources:
          .:
          f:limits:
            .:
            f:cpu:
            f:memory:
          f:requests:
            .:
            f:cpu:
            f:memory:
        f:securityContext:
          .:
          f:fsGroup:
        f:serviceAccountName:
      f:status:
        .:
        f:message:
        f:phase:
        f:reason:
        f:registration:
          .:
          f:expiresAt:
          f:repository:
          f:token:
    Manager:    manager
    Operation:  Update
    Time:       2021-03-31T14:52:34Z
  Owner References:
    API Version:           actions.summerwind.dev/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  RunnerReplicaSet
    Name:                  comtravo-github-actions-deployment-zmn2v
    UID:                   c1b5a027-6aed-470b-8052-5a4a64688946
  Resource Version:        63460917
  Self Link:               /apis/actions.summerwind.dev/v1alpha1/namespaces/ci/runners/comtravo-github-actions-deployment-zmn2v-lhczp
  UID:                     a4b54a57-14d7-4303-af50-ddc1aeaf20b7
Spec:
  Dockerd Container Resources:
  Image:              harbor.infra.comtravo.com/cache/comtravo/actions-runner:v2.277.1
  Image Pull Policy:  Always
  Node Selector:
    node.k8s.comtravo.com/workergroup-name:  spot
  Repository:                                comtravo/ct-backend
  Resources:
    Limits:
      Cpu:     1
      Memory:  4Gi
    Requests:
      Cpu:     1m
      Memory:  256Mi
  Security Context:
    Fs Group:            1447
  Service Account Name:  actions
Status:
  Message:
  Phase:    Running
  Reason:
  Registration:
    Expires At:  2021-03-31T15:42:07Z
    Repository:  comtravo/ct-backend
    Token:       ASS5GHPTBJ4LSRJGZ5CZJKLAMSME7AVPNFXHG5DBNRWGC5DJN5XF62LEZYANUGERWFUW443UMFWGYYLUNFXW4X3UPFYGLN2JNZ2GKZ3SMF2GS33OJFXHG5DBNRWGC5DJN5XA
Events:          <none>

@Puneeth-n
Contributor Author

@mumoshu k -n ci delete runners --all doesn't help. The number seems to remain the same.

@mumoshu
Collaborator

mumoshu commented Mar 31, 2021

@Puneeth-n Hey! It seems like you already "isolated" the runners, so there's no way actions-runner-controller could take back control of them. You'll see comtravo-github-actions-deployment-rpbk6 and comtravo-github-actions-deployment-zmn2v in the owner references of the Runners; those are the names of RunnerReplicaSets that no longer exist.

Probably the only way forward would be to remove runners.
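
As a quick check, something like this prints the owner RunnerReplicaSet recorded on a runner — a minimal sketch, assuming the ci namespace and one of the runner names from your output:

kubectl -n ci get runner comtravo-github-actions-deployment-rpbk6-n2xcf -o jsonpath='{.metadata.ownerReferences[*].name}'
# If that name does not show up in the list below, the runner is orphaned:
kubectl -n ci get runnerreplicasets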

@mumoshu
Collaborator

mumoshu commented Mar 31, 2021

k -n ci delete runners --all doesn't help. The number seems to remain the same.

@Puneeth-n Which number? kubectl get runner | wc -l?

@Puneeth-n
Contributor Author

Puneeth-n commented Mar 31, 2021

k -n ci delete runners --all doesn't help. The number seems to remain the same.

@Puneeth-n Which number? kubectl get runner | wc -l?

@mumoshu After running k -n ci delete runners --all and checking again after 5 minutes, the runner count is still the same.

@mumoshu
Collaborator

mumoshu commented Mar 31, 2021

@Puneeth-n That seems to be because actions-runner-controller isn't deployed. Runners have a finalizer added by actions-runner-controller, so to delete runners normally you need actions-runner-controller to be running.

I think you could try kubectl delete --force runner $RUNNER and then remove the finalizer with kubectl patch. Alternatively, start actions-runner-controller and run kubectl delete runner $RUNNER.

If you use the former method, also note that you may need to remove any remaining runners on the GitHub side with the GitHub API or web UI.
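
For example, a rough sketch of the former approach (the runner name is just an example from your output; adjust the namespace as needed):

RUNNER=comtravo-github-actions-deployment-nvsln-zwbwb
kubectl -n ci delete runner "$RUNNER" --force --grace-period=0 --wait=false
# The object lingers until its finalizer is cleared, same pattern as the CRD patch you used earlier:
kubectl -n ci patch runner "$RUNNER" -p '{"metadata":{"finalizers":[]}}' --type=merge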

@callum-tait-pbx
Contributor

callum-tait-pbx commented Mar 31, 2021

Deleting runners can get a bit weird; you'll probably need to do this:

  1. Uninstall the controller chart
  2. Delete the pods
  3. Post directly to the k8s API to tell it to drop the finalizers for the runners

Roughly this (a jq-based shortcut for these steps is sketched below):

  1. kubectl get runner %RUNNER_NAME% -o json > runner.json
  2. Find the finalizers in your JSON and remove the entries; they should end up looking something like this:
"finalizers": [
 
        ],
  3. Post your JSON directly to the k8s API:
    kubectl replace --raw "/apis/actions.summerwind.dev/v1alpha1/namespaces/%YOUR_ACTIONS_NAMESPACE%/runners/%RUNNER_NAME%" -f ./runner.json
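
If you have jq installed, steps 1–3 collapse to roughly this — a sketch using the same placeholders as above:

# Strip the finalizers from the runner's JSON and post it straight back to the API.
kubectl -n %YOUR_ACTIONS_NAMESPACE% get runner %RUNNER_NAME% -o json | jq '.metadata.finalizers = []' > runner.json
kubectl replace --raw "/apis/actions.summerwind.dev/v1alpha1/namespaces/%YOUR_ACTIONS_NAMESPACE%/runners/%RUNNER_NAME%" -f ./runner.json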

@Puneeth-n
Contributor Author

@Puneeth-n That seems to be because actions-runner-controller isn't deployed. Runners have a finalizer added by actions-runner-controller, so to delete runners normally you need actions-runner-controller to be running.

I think you could try kubectl delete --force runner $RUNNER and then remove the finalizer with kubectl patch. Alternatively, start actions-runner-controller and run kubectl delete runner $RUNNER.

If you use the former method, also note that you may need to remove any remaining runners on the GitHub side with the GitHub API or web UI.

So I uninstalled the Helm chart, deleted the CRDs, and did a fresh install of ONLY the chart 0.10.4 and tag v0.18.1. I did not apply any RunnerDeployment or HorizontalRunnerAutoscaler. Now I am deleting all the runners and it seems to work; the count of runners is decreasing.

I ran a script in a loop to fetch all the runners from GitHub and delete them, so I cleaned up on the GitHub side. I had around 200 offline runners.
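
For reference, the GitHub side can be cleaned up via the self-hosted runners REST API. This is only a rough sketch of such a loop (the token, repo path, and jq filter are placeholders, not my exact script), and it only covers the first 100 runners per call, so it may need repeating:

OWNER_REPO="comtravo/ct-backend"
# List the repo's self-hosted runners, pick the offline ones, and delete them by id.
curl -s -H "Authorization: token $GITHUB_TOKEN" \
  "https://api.github.com/repos/$OWNER_REPO/actions/runners?per_page=100" |
  jq -r '.runners[] | select(.status == "offline") | .id' |
  while read -r id; do
    curl -s -X DELETE -H "Authorization: token $GITHUB_TOKEN" \
      "https://api.github.com/repos/$OWNER_REPO/actions/runners/$id"
  done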

@Puneeth-n
Contributor Author

Excuse my lack of deep understanding of Kubernetes :)

@Puneeth-n
Contributor Author

I wanted to freeze the controller version, as I had it pointing to canary (#367), and ended up with this mess :/

@callum-tait-pbx
Contributor

callum-tait-pbx commented Mar 31, 2021

In general, don't run the canary image unless you know why you are running it (and you'll know why if so). It's an especially unstable development image, as any push to master that isn't for the runners will trigger a build and a publish to Docker Hub. Version pinning is your friend, my friend! :D

@Puneeth-n
Contributor Author

In general, don't run the canary image unless you know why you are running it (and you'll know why if so). It's an especially unstable development image, as any push to master that isn't for the runners will trigger a build and a publish to Docker Hub. Version pinning is your friend, my friend! :D

@callum-tait-pbx That was my intent behind this: #427 (comment). But I ended up with this mess.

@Puneeth-n
Contributor Author

@mumoshu @callum-tait-pbx So now, once all the runners are deleted, when I deploy the RunnerDeployment and HorizontalRunnerAutoscaler, everything should be back to normal, right?

@callum-tait-pbx
Contributor

Looks like it, based on the output I can see in the issue and your comments. Your controller logs should look happy at this point if everything is working as expected; it's worth taking a peek at them before deploying your runner setup to confirm.
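
For example, something along these lines — the deployment and container names depend on your Helm release, so treat it as a sketch:

# Tail the controller logs in the namespace the chart was installed into.
kubectl -n ci logs deployment/actions-runner-controller -c manager --tail=100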

@Puneeth-n
Contributor Author

Looks like it, based on the output I can see in the issue and your comments. Your controller logs should look happy at this point if everything is working as expected; it's worth taking a peek at them before deploying your runner setup to confirm.

@callum-tait-pbx Yep. The controller looks good, the webhook server looks good, and the runners came up and registered in GitHub. Awesome! :) Thanks, guys!

@callum-tait-pbx
Contributor

@Puneeth-n Could you close the issue if it's all working, please 🙏 ty
