Task run restarts processing of failed execution after completionTime is set. #7895

Open
rinckm opened this issue Apr 22, 2024 · 0 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

rinckm commented Apr 22, 2024

Expected Behavior

If a task run has status.completionTime set and its Succeeded condition has status: "False", the task run is finished and must not change anymore.
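
The expectation above can be expressed as a small predicate. This is only a sketch for discussion, not Tekton code: it assumes the TaskRun status has been loaded into a plain Python dict (for example from the YAML output shown later in this issue), and the name is_finished is hypothetical.

```python
def is_finished(status: dict) -> bool:
    """Return True if a TaskRun status looks terminal.

    Terminal here means: completionTime is set and the Succeeded
    condition is "True" or "False" (not "Unknown").
    """
    if not status.get("completionTime"):
        return False
    for cond in status.get("conditions", []):
        if cond.get("type") == "Succeeded":
            return cond.get("status") in ("True", "False")
    return False


# Status values taken from the states reported in this issue.
failed = {
    "completionTime": "2024-04-18T13:34:27Z",
    "conditions": [{"type": "Succeeded", "status": "False", "reason": "Failed"}],
}
pending = {
    "conditions": [{"type": "Succeeded", "status": "Unknown", "reason": "Pending"}],
}

print(is_finished(failed), is_finished(pending))
```

Under the expected behavior, once is_finished returns True for a task run it should return True for every later observation of the same task run.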

Actual Behavior

Sporadically, a task run updates the Succeeded condition with the message "build failed for unspecified reasons." and completionTime is set.
After some time, the message changes to Pending and completionTime is no longer set.

This occurs with an unmodified Tekton v0.57.0.

Steps to Reproduce the Problem

  1. Change the tekton-pipelines-controller deployment so that the task run pod stays in the init state.
    To do this, set the value of the -shell-image argument to an invalid image:
    containers:
       - args:
         ...
         - -shell-image
         - non-existent-image@sha256:0000000000000000000000000000000000000000000000000000000000000000
    
  2. Start a task run and wait until the 'prepare' container completes
  3. Delete the pod started by the task run
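
For scripting step 1, the argument change could be sketched as below. This is an illustration only: set_flag_value is a hypothetical helper, the sample args list is made up (the real controller args are elided in this issue), and the all-zero digest simply mirrors the invalid image from step 1. It assumes the flag and its value are separate list items, as in the snippet above.

```python
from typing import List


def set_flag_value(args: List[str], flag: str, value: str) -> List[str]:
    """Return a copy of args with the item following `flag` replaced by `value`."""
    patched = list(args)
    idx = patched.index(flag)  # raises ValueError if the flag is absent
    if idx + 1 >= len(patched):
        raise ValueError(f"{flag} has no value")
    patched[idx + 1] = value
    return patched


# Invalid image reference with an all-zero digest, as in step 1.
INVALID_IMAGE = "non-existent-image@sha256:" + "0" * 64

# Made-up args list for illustration; not the real controller arguments.
args = ["-some-other-flag", "some-value", "-shell-image", "example.invalid/shell:latest"]
patched = set_flag_value(args, "-shell-image", INVALID_IMAGE)
print(patched[3])
```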

Additional Info

  • Kubernetes version:

    Output of kubectl version:

    Client Version: v1.28.4
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: v1.28.6
    
  • Tekton Pipeline version:

    Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

v0.57.0

The following is an example of a single TaskRun.

Output from kubectl get taskrun -o yaml --watch

Initial state after creation

apiVersion: tekton.dev/v1
kind: TaskRun
metadata:
  creationTimestamp: "2024-04-18T13:32:28Z"
  ...
  uid: 6cc3e733-8766-42b8-b5cf-2fb6a9376d86
status:
  conditions:
  - lastTransitionTime: "2024-04-18T13:32:33Z"
    message: 'pod status "Initialized":"False"; message: "containers with incomplete
      status: [place-scripts]"'
    reason: Pending
    status: Unknown
    type: Succeeded
  podName: steward-jenkinsfile-runner-pod
  provenance:
    featureFlags:
      AwaitSidecarReadiness: true
      Coschedule: workspaces
      DisableAffinityAssistant: false
      DisableCredsInit: false
      EnableAPIFields: beta
      EnableCELInWhenExpression: false
      EnableKeepPodOnCancel: false
      EnableParamEnum: false
      EnableProvenanceInStatus: true
      EnableStepActions: false
      EnableTektonOCIBundles: false
      EnforceNonfalsifiability: none
      MaxResultSize: 4096
      RequireGitSSHSecretKnownHosts: false
      ResultExtractionMethod: termination-message
      RunningInEnvWithInjectedSidecars: true
      ScopeWhenExpressionsToTask: false
      SendCloudEventsForRuns: false
      SetSecurityContext: false
      VerificationNoMatchPolicy: ignore

The pod of the task run is deleted

The task run fails, completionTime is set, and the Succeeded condition has status "False".

status:
  completionTime: "2024-04-18T13:34:27Z"
  conditions:
  - lastTransitionTime: "2024-04-18T13:34:27Z"
    message: build failed for unspecified reasons.
    reason: Failed
    status: "False"
    type: Succeeded
  podName: steward-jenkinsfile-runner-pod
  provenance:
    featureFlags:
      AwaitSidecarReadiness: true
      Coschedule: workspaces
      ...

After some time, completionTime is removed and the status of the Succeeded condition is set to "Unknown"

status:
  conditions:
  - lastTransitionTime: "2024-04-18T13:34:30Z"
    message: Pending
    reason: Pending
    status: Unknown
    type: Succeeded

After some time, a new pod is started for the task run

status:
  conditions:
  - lastTransitionTime: "2024-04-18T13:34:30Z"
    message: 'pod status "Initialized":"False"; message: "containers with incomplete
      status: [prepare place-scripts]"'
    reason: Pending
    status: Unknown
    type: Succeeded

The new pod is deleted

completionTime is set to a new value

status:
  completionTime: "2024-04-18T13:35:26Z"
  conditions:
  - lastTransitionTime: "2024-04-18T13:35:26Z"
    message: build failed for unspecified reasons.
    reason: Failed
    status: "False"
    type: Succeeded

After some time, completionTime is removed and the status of the Succeeded condition is set to "Unknown"

status:
  conditions:
  - lastTransitionTime: "2024-04-18T13:35:29Z"
    message: Pending
    reason: Pending
    status: Unknown
    type: Succeeded

After some time, a new pod is started for the task run

status:
  conditions:
  - lastTransitionTime: "2024-04-18T13:35:29Z"
    message: 'pod status "Initialized":"False"; message: "containers with incomplete
      status: [prepare place-scripts]"'
    reason: Pending
    status: Unknown
    type: Succeeded
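
To spot this pattern automatically in a sequence of observed statuses, something like the following sketch could be used. It is not part of Tekton: find_regressions, succeeded_status and snap are hypothetical names, and the snapshots are reduced versions of the seven states from the watch output above (only completionTime and the Succeeded condition are kept).

```python
from typing import List, Optional


def succeeded_status(status: dict) -> Optional[str]:
    """Return the status of the Succeeded condition, or None if it is missing."""
    for cond in status.get("conditions", []):
        if cond.get("type") == "Succeeded":
            return cond.get("status")
    return None


def find_regressions(snapshots: List[dict]) -> List[int]:
    """Return indexes of snapshots where a finished task run became unfinished again."""
    regressions = []
    was_finished = False
    for index, status in enumerate(snapshots):
        finished = bool(status.get("completionTime")) and (
            succeeded_status(status) in ("True", "False")
        )
        if was_finished and not finished:
            regressions.append(index)
        was_finished = finished
    return regressions


def snap(cond_status: str, reason: str, completion_time: Optional[str] = None) -> dict:
    """Build a reduced status dict holding only the fields checked here."""
    status = {"conditions": [{"type": "Succeeded", "status": cond_status, "reason": reason}]}
    if completion_time:
        status["completionTime"] = completion_time
    return status


# The seven states from the watch output, in order.
snapshots = [
    snap("Unknown", "Pending"),
    snap("False", "Failed", "2024-04-18T13:34:27Z"),
    snap("Unknown", "Pending"),
    snap("Unknown", "Pending"),
    snap("False", "Failed", "2024-04-18T13:35:26Z"),
    snap("Unknown", "Pending"),
    snap("Unknown", "Pending"),
]

print(find_regressions(snapshots))  # prints [2, 5]
```

The two reported indexes correspond to the two points where completionTime was removed after the task run had already failed.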
@rinckm rinckm added the kind/bug Categorizes issue or PR as related to a bug. label Apr 22, 2024