
Never terminating workflow step #179

Open
@luka5

Description

@luka5

Today we debugged a workflow step that never terminated, and we ended up here. It turns out that if you configure a job to run in a container, a single step will hang forever and never terminate if it does not print anything for a while. We were able to reproduce this easily with the following workflow:

name: Debug issue
on: [workflow_dispatch]
jobs:
  debug:
    container:
      image: <some-image-path>
    runs-on: [<some-runner>]
    steps:
      - name: step1
        shell: bash
        run: |
          echo "Hello"

      - name: Wait for 5m
        shell: bash
        run: |
          sleep 5m

      - name: step3
        shell: bash
        run: |
          echo "Ciao"

If we instead change the second step to print output continuously, the workflow runs to completion without any problem:

      - name: Wait for 5m
        shell: bash
        run: |
          for ((i=0; i<300; i++)); do
            echo "."
            sleep 1
          done
          echo "done"
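The second variant suggests a generic workaround: keep the step producing output while a long, silent command runs. Below is a minimal bash sketch, assuming (as the experiment above indicates, although the root cause is unconfirmed) that output inactivity is what triggers the hang. `with_heartbeat` and `HEARTBEAT_INTERVAL` are hypothetical names, and the 30-second default is an arbitrary choice, not a known timeout threshold.

```bash
# Hypothetical helper (not part of the runner or the container hooks): runs a
# command in the background and prints a heartbeat line roughly every
# HEARTBEAT_INTERVAL seconds while it is still running.
with_heartbeat() {
  local interval="${HEARTBEAT_INTERVAL:-30}"  # arbitrary default, in seconds
  local elapsed=0
  local pid

  "$@" &
  pid=$!

  # Poll once per second until the background command exits.
  while kill -0 "$pid" 2>/dev/null; do
    sleep 1
    elapsed=$((elapsed + 1))
    if (( elapsed % interval == 0 )) && kill -0 "$pid" 2>/dev/null; then
      echo "[heartbeat] still running after ~${elapsed}s: $*"
    fi
  done

  # Return the wrapped command's exit status to the caller.
  wait "$pid"
}

# Example, inside the same run: block: with_heartbeat sleep 5m
```

Each `run:` block executes in its own shell, so the function must be defined in the same step that calls it. Because it polls once per second, the interval is approximate. It also returns the wrapped command's exit status, so a failing command still fails the step under the default `bash -e` shell.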

Re-running in debug mode does not produce any errors or useful information. We suspect the issue is in execPodStep. The version of @kubernetes/client-node in use is quite old (0.18.1 vs. 0.22.0), but to be fair, neither the release notes nor the changes look like they fix such a bug.

Have you seen similar behavior before? Do you have any recommendation other than simply being more verbose and writing progress to stdout?

Thanks!
