
gha_job_execution_duration_seconds_sum reports wrong value in some cases #3731

@hpedrorodrigues

Description

Controller Version

0.9.3

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently introduced backward-incompatible changes

To Reproduce

1. Install `gha-runner-scale-set-controller` using the Helm chart via FluxCD
2. Install a few `gha-runner-scale-set`s using the Helm chart via FluxCD
3. Run a few workflows that use these runner sets, canceling some of them (either manually or because of `concurrency.group`)

Describe the bug

In some cases (I don't know the exact cause yet), the listener reports the metric `gha_job_execution_duration_seconds_sum` with an incorrect value.

Example:

```
gha_job_execution_duration_seconds_sum{enterprise="",event_name="repository_dispatch",job_name="create-gh-deployment",job_result="canceled",job_workflow_ref="[redacted]/.github/workflows/gh-deployment.yml@refs/heads/master",organization="[redacted]",repository="[redacted]",runner_id="0",runner_name=""} 1.27722295721e+11
```

That value is about 1.28 × 10^11 seconds, or roughly 4,000 years. Yet, looking at the repository, every run finishes in under 60 seconds, and the remaining runs are canceled before they even start because a newer commit was pushed to the branch.

(Two screenshots taken on 2024-09-05 were attached here.)

Describe the expected behavior

I'm not sure whether this is caused only by canceled runs, but I'd expect the listener to report a duration of 0 for such runs.

Additional Context

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: arc-controller
  namespace: arc
spec:
  chart:
    spec:
      chart: gha-runner-scale-set-controller
      sourceRef:
        name: arc
        kind: HelmRepository
        namespace: flux-system
      version: '>=0.9.3'
  interval: 1m
  install:
    crds: CreateReplace
  upgrade:
    crds: CreateReplace
  values:
    replicaCount: 1
    image:
      repository: [redacted]
    serviceAccount:
      create: true
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 200m
        memory: 200Mi
    metrics:
      controllerManagerAddr: ':8080'
      listenerAddr: ':8080'
      listenerEndpoint: '/metrics'
    flags:
      logFormat: 'json'
      watchSingleNamespace: 'arc'
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cp-small-runner-set
  namespace: arc
spec:
  chart:
    spec:
      chart: gha-runner-scale-set
      sourceRef:
        name: arc
        kind: HelmRepository
        namespace: flux-system
      version: '>=0.9.3'
  interval: 1m
  values:
    githubConfigUrl: [redacted]
    githubConfigSecret: gh-app-secret
    maxRunners: 10
    minRunners: 0
    runnerGroup: default
    runnerScaleSetName: cp-small
    containerMode:
      type: dind
    template:
      metadata:
        annotations:
          cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'
      spec:
        nodeSelector:
          spot: 'false'
          dedicated-for: github-actions
        tolerations:
          - effect: NoSchedule
            key: dedicated-for
            value: github-actions-2x
        containers:
          - name: runner
            image: arc-default-runner
            command: ['/home/runner/run.sh']
            resources:
              requests:
                cpu: 2
                memory: 4Gi
              limits:
                cpu: 2
                memory: 4Gi
        terminationGracePeriodSeconds: 600
```

Controller Logs

N/A

Runner Pod Logs

N/A

Metadata

  • Assignees: no one assigned
  • Labels: bug (Something isn't working), gha-runner-scale-set (Related to the gha-runner-scale-set mode), needs triage (Requires review from the maintainers)
  • Type: no type
  • Projects: no projects
  • Milestone: no milestone
  • Relationships: none yet
  • Development: no branches or pull requests
