Panic when volumeMount does not exist #3436

Closed
2 of 4 tasks
Mastergalen opened this issue Jul 9, 2020 · 11 comments · Fixed by #3437 or #3451
Assignees
Labels
type/bug type/regression Regression from previous behavior (a specific type of bug)

Comments

@Mastergalen
Contributor

Mastergalen commented Jul 9, 2020

Checklist:

  • I've included the version.
  • I've included reproduction steps.
  • I've included the workflow YAML.
  • I've included the logs.

What happened:

When submitting a workflow that depends on a WorkflowTemplate, the workflow crashes and immediately fails (without jumping to onExit).

Turns out that the crash actually occurred due to a volumeMount not existing.

The message in argo watch is "Message: runtime error: invalid memory address or nil pointer dereference"

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

The same workflow used to work in Argo v2.8.2

Environment:

  • Argo version:
$ argo version

argo: v2.9.2
  BuildDate: 2020-07-08T23:55:01Z
  GitCommit: 65c2bd44e45c11e0a0b03adeef8d6800b72cd551
  GitTreeState: clean
  GitTag: v2.9.2
  GoVersion: go1.13.4
  Compiler: gc
  Platform: darwin/amd64
  • Kubernetes version :
$ kubectl version -o yaml

clientVersion:
  buildDate: "2020-05-21T14:51:23Z"
  compiler: gc
  gitCommit: 2e7996e3e2712684bc73f0dec0200d64eec7fe40
  gitTreeState: clean
  gitVersion: v1.18.3
  goVersion: go1.14.3
  major: "1"
  minor: "18"
  platform: darwin/amd64
serverVersion:
  buildDate: "2020-03-12T20:55:23Z"
  compiler: gc
  gitCommit: 8d8aa39598534325ad77120c120a22b3a990b5ea
  gitTreeState: clean
  gitVersion: v1.17.4
  goVersion: go1.13.8
  major: "1"
  minor: "17"
  platform: linux/amd64

Other debugging information (if applicable):

  • workflow-controller logs:
kubectl logs -n argo $(kubectl get pods -l app=workflow-controller -n argo -o name)

2020-07-09T17:27:32.816871832Z time="2020-07-09T17:27:32Z" level=info msg="Processing workflow" namespace=default workflow=test-workflow-xqv44
2020-07-09T17:27:32.816928736Z time="2020-07-09T17:27:32Z" level=info msg="Updated phase  -> Running" namespace=default workflow=test-workflow-xqv44
2020-07-09T17:27:32.825983852Z time="2020-07-09T17:27:32Z" level=info msg="Steps node test-workflow-xqv44 initialized Running" namespace=default workflow=test-workflow-xqv44
2020-07-09T17:27:32.826032384Z time="2020-07-09T17:27:32Z" level=info msg="StepGroup node test-workflow-xqv44-3078889322 initialized Running" namespace=default workflow=test-workflow-xqv44
2020-07-09T17:27:32.827444929Z time="2020-07-09T17:27:32Z" level=info msg="Steps node test-workflow-xqv44-4071128617 initialized Running" namespace=default workflow=test-workflow-xqv44
2020-07-09T17:27:32.827485903Z time="2020-07-09T17:27:32Z" level=info msg="StepGroup node test-workflow-xqv44-985411005 initialized Running" namespace=default workflow=test-workflow-xqv44
2020-07-09T17:27:32.828114123Z time="2020-07-09T17:27:32Z" level=info msg="Pod node test-workflow-xqv44-137232839 initialized Pending" namespace=default workflow=test-workflow-xqv44
2020-07-09T17:27:32.828794253Z time="2020-07-09T17:27:32Z" level=error msg="Recovered from panic" namespace=default r="runtime error: invalid memory address or nil pointer dereference" stack="goroutine 532 [running]:
runtime/debug.Stack(0xc08d9b674e, 0x173d160, 0x28710d0)
    /usr/local/go/src/runtime/debug/stack.go:24 +0x9d
github.com/argoproj/argo/workflow/controller.(*wfOperationCtx).operate.func2(0xc001ea0580)
    /go/src/github.com/argoproj/argo/workflow/controller/operator.go:161 +0xc4
panic(0x173d160, 0x28710d0)
    /usr/local/go/src/runtime/panic.go:679 +0x1b2
github.com/argoproj/argo/workflow/controller.(*wfOperationCtx).executeTemplate(0xc001ea0580, 0xc00cfd0f80, 0x77, 0x1bb9bc0, 0xc0006e4420, 0xc00ab578c0, 0xc0064cac60, 0x5, 0x6, 0x0, ...)
    /go/src/github.com/argoproj/argo/workflow/controller/operator.go:1510 +0x1866
github.com/argoproj/argo/workflow/controller.(*wfOperationCtx).executeStepGroup(0xc001ea0580, 0xc0064caf20, 0x2, 0x4, 0xc002908380, 0x68, 0xc0012563b8, 0x1bb9b00)
    /go/src/github.com/argoproj/argo/workflow/controller/steps.go:233 +0x11e1
github.com/argoproj/argo/workflow/controller.(*wfOperationCtx).executeSteps(0xc001ea0580, 0xc0029081c0, 0x65, 0xc00ab578c0, 0xc009296cf0, 0x23, 0xc00a583440, 0x1bb9bc0, 0xc00069d970, 0xc0012585b8, ...)
    /go/src/github.com/argoproj/argo/workflow/controller/steps.go:91 +0x46e
github.com/argoproj/argo/workflow/controller.(*wfOperationCtx).executeTemplate(0xc001ea0580, 0xc0029081c0, 0x65, 0x1bb9bc0, 0xc00069d970, 0xc00ab56c40, 0xc000988580, 0x6, 0x6, 0x0, ...)
    /go/src/github.com/argoproj/argo/workflow/controller/operator.go:1496 +0x19a4
github.com/argoproj/argo/workflow/controller.(*wfOperationCtx).executeStepGroup(0xc001ea0580, 0xc00069d8c0, 0x1, 0x4, 0xc009cc1880, 0x20, 0xc001259950, 0x1bb9b00)
    /go/src/github.com/argoproj/argo/workflow/controller/steps.go:233 +0x11e1
github.com/argoproj/argo/workflow/controller.(*wfOperationCtx).executeSteps(0xc001ea0580, 0xc00e5583a0, 0x1d, 0xc00ab56c40, 0xc009296870, 0x23, 0xc00a582240, 0x1bb9bc0, 0xc001ea0a50, 0xc0006bfef0, ...)
    /go/src/github.com/argoproj/argo/workflow/controller/steps.go:91 +0x46e
github.com/argoproj/argo/workflow/controller.(*wfOperationCtx).executeTemplate(0xc001ea0580, 0xc00e5583a0, 0x1d, 0x1bb9bc0, 0xc001ea0a50, 0xc00ab56ac0, 0xc001ea0790, 0x3, 0x3, 0x0, ...)
    /go/src/github.com/argoproj/argo/workflow/controller/operator.go:1496 +0x19a4
github.com/argoproj/argo/workflow/controller.(*wfOperationCtx).operate(0xc001ea0580)
    /go/src/github.com/argoproj/argo/workflow/controller/operator.go:287 +0x125d
github.com/argoproj/argo/workflow/controller.(*WorkflowController).processNextItem(0xc0004d1180, 0x0)
    /go/src/github.com/argoproj/argo/workflow/controller/controller.go:433 +0x8c2
github.com/argoproj/argo/workflow/controller.(*WorkflowController).runWorker(0xc0004d1180)
    /go/src/github.com/argoproj/argo/workflow/controller/controller.go:359 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00108c1f0)
    /go/pkg/mod/k8s.io/apimachinery@v0.16.7-beta.0/pkg/util/wait/wait.go:152 +0x5e
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00108c1f0, 0x3b9aca00, 0x0, 0x1, 0xc0006aa9c0)
    /go/pkg/mod/k8s.io/apimachinery@v0.16.7-beta.0/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc00108c1f0, 0x3b9aca00, 0xc0006aa9c0)
    /go/pkg/mod/k8s.io/apimachinery@v0.16.7-beta.0/pkg/util/wait/wait.go:88 +0x4d
created by github.com/argoproj/argo/workflow/controller.(*WorkflowController).Run
    /go/src/github.com/argoproj/argo/workflow/controller/controller.go:172 +0x975
" workflow=test-workflow-xqv44

2020-07-09T17:27:32.828856307Z time="2020-07-09T17:27:32Z" level=info msg="Updated phase Running -> Error" namespace=default workflow=test-workflow-xqv44
2020-07-09T17:27:32.828865379Z time="2020-07-09T17:27:32Z" level=info msg="Updated message  -> runtime error: invalid memory address or nil pointer dereference" namespace=default workflow=test-workflow-xqv44
2020-07-09T17:27:32.828873448Z time="2020-07-09T17:27:32Z" level=info msg="Marking workflow completed" namespace=default workflow=test-workflow-xqv44
2020-07-09T17:27:32.828881787Z time="2020-07-09T17:27:32Z" level=info msg="Checking daemoned children of " namespace=default workflow=test-workflow-xqv44
2020-07-09T17:27:32.839810882Z time="2020-07-09T17:27:32Z" level=info msg="Workflow update successful" namespace=default phase=Error resourceVersion=51728589 workflow=test-workflow-xqv44

Message from the maintainers:

If you are impacted by this bug please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

@alexec alexec added the type/regression Regression from previous behavior (a specific type of bug) label Jul 9, 2020
@alexec
Contributor

alexec commented Jul 9, 2020

if err != nil {
	node = woc.markNodeError(node.Name, err)
	// If retry policy is not set, or if it is not set to Always or OnError, we won't attempt to retry an errored container
	// and we return instead.
	if processedTmpl.RetryStrategy == nil ||
		(processedTmpl.RetryStrategy.RetryPolicy != wfv1.RetryPolicyAlways &&
			processedTmpl.RetryStrategy.RetryPolicy != wfv1.RetryPolicyOnError) {
		return node, err
	}
}

This code makes no sense to me: node can be expected to be nil at this line, so reading node.Name would panic. Yet this code has not changed for a while.
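For readers following along, here is a minimal, self-contained sketch of that failure mode. It is not the Argo controller code: Node, markNodeError, unguarded, guarded and errExample are hypothetical stand-ins invented for illustration. It only shows why reading node.Name on a nil pointer panics, and how a nil check avoids it.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical stand-in for a workflow node status (not Argo's type).
type Node struct {
	Name    string
	Message string
}

// errExample is an illustrative error value used only by this sketch.
var errExample = errors.New("example error")

// markNodeError is a hypothetical helper that records err on a node called name.
func markNodeError(name string, err error) *Node {
	return &Node{Name: name, Message: err.Error()}
}

// unguarded has the same shape as the quoted snippet: it reads node.Name
// without checking node first, so a nil node panics. The deferred recover
// turns that panic into a return value so the example keeps running.
func unguarded(node *Node, err error) (result *Node, recovered interface{}) {
	defer func() { recovered = recover() }()
	if err != nil {
		return markNodeError(node.Name, err), nil
	}
	return node, nil
}

// guarded falls back to a caller-supplied node name when node is nil.
func guarded(node *Node, nodeName string, err error) *Node {
	if err == nil {
		return node
	}
	if node != nil {
		nodeName = node.Name
	}
	return markNodeError(nodeName, err)
}

func main() {
	_, r := unguarded(nil, errExample)
	fmt.Println("unguarded recovered:", r)

	n := guarded(nil, "start", errExample)
	fmt.Println("guarded:", n.Name, "-", n.Message)
}
```

Passing the node name separately is just one way to guard the call; whether the real fix takes that approach is not stated in this thread.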

@alexec
Contributor

alexec commented Jul 9, 2020

@Mastergalen please can you attach the workflow YAML? I'm assuming it contains a DAG - but I'd like to be able to repro.

@alexec alexec self-assigned this Jul 9, 2020
alexec added a commit to alexec/argo-workflows that referenced this issue Jul 9, 2020
@Mastergalen
Contributor Author

Turns out the crash has nothing to do with WorkflowTemplates, but actually has to do with a volumeMount not existing.

Here is a workflow that produces the crash:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: repo-crash-
spec:
  entrypoint: start
  templates:
    - name: start
      container:
        image: alpine
        command: [echo, "hello"]
        volumeMounts:
          - mountPath: /mnt/doesnotexist
            name: doesnotexist

@Mastergalen Mastergalen changed the title from "invalid memory address or nil pointer dereference when submitting template with WorkflowTemplate" to "Panic when volumeMount does not exist" Jul 9, 2020
@alexec
Contributor

alexec commented Jul 9, 2020

Did your change fix it?

@JayLeeResBio

JayLeeResBio commented Jul 10, 2020

@alexec I encounter this issue even if the volume does exist, defined in the WorkflowTemplate. Invoking the wftmpl from a Workflow using workflowTemplateRef causes the panic as if the volume is missing.

argo: v2.9.2
  BuildDate: 2020-07-08T23:54:33Z
  GitCommit: 65c2bd44e45c11e0a0b03adeef8d6800b72cd551
  GitTreeState: clean
  GitTag: v2.9.2
  GoVersion: go1.13.4
  Compiler: gc
  Platform: linux/amd64

WorkflowTemplate:

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: workflow-template-whalesay
spec:
  serviceAccountName: workflow-service-account
  volumes:
    - name: data
      emptyDir: {}
  arguments:
    parameters:
      - name: message
  entrypoint: start
  templates:
    - name: start
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["{{workflow.parameters.message}}"]
        volumeMounts:
          - name: data
            path: /mnt/data

Workflow:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: try-say-
spec:
  arguments:
    parameters:
      - name: message
        value: "hello world"
  workflowTemplateRef:
    name: workflow-template-whalesay

@alexec
Contributor

alexec commented Jul 10, 2020

Your YAML has a mistake; the field is mountPath, not path:

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: workflow-template-whalesay
  namespace: argo
spec:
  serviceAccountName: workflow-service-account
  volumes:
    - name: data
      emptyDir: {}
  arguments:
    parameters:
      - name: message
        value: hello
  entrypoint: start
  templates:
    - name: start
      container:
        image: docker/whalesay
        command:
          - cowsay
        args:
          - '{{workflow.parameters.message}}'
        volumeMounts:
          - name: data
            mountPath: /mnt/data

@alexec
Contributor

alexec commented Jul 10, 2020

Panics anyway:

controller | time="2020-07-10T08:42:27-07:00" level=error msg="Recovered from panic" namespace=argo r="runtime error: invalid memory address or nil pointer dereference" stack="goroutine 237 [running]:\nruntime/debug.Stack(0xc0ac33818b, 0x239f480, 0x3523c30)\n\t/usr/local/Cellar/go/1.13.4/libexec/src/runtime/debug/stack.go:24 +0x9d\ngithub.com/argoproj/argo/workflow/controller.(*wfOperationCtx).operate.func2(0xc000c364d0)\n\t/Users/acollins8/go/src/github.com/argoproj/argo/workflow/controller/operator.go:162 +0xc4\npanic(0x239f480, 0x3523c30)\n\t/usr/local/Cellar/go/1.13.4/libexec/src/runtime/panic.go:679 +0x1b2\ngithub.com/argoproj/argo/workflow/controller.(*wfOperationCtx).executeTemplate(0xc000c364d0, 0xc00037ef10, 0xd, 0x2827860, 0xc000c36580, 0xc000b76f40, 0xc0007b08c0, 0x1, 0x1, 0x0, ...)\n\t/Users/acollins8/go/src/github.com/argoproj/argo/workflow/controller/operator.go:1502 +0x186f\ngithub.com/argoproj/argo/workflow/controller.(*wfOperationCtx).operate(0xc000c364d0)\n\t/Users/acollins8/go/src/github.com/argoproj/argo/workflow/controller/operator.go:288 +0x125d\ngithub.com/argoproj/argo/workflow/controller.(*WorkflowController).processNextItem(0xc000550e00, 0xc000687600)\n\t/Users/acollins8/go/src/github.com/argoproj/argo/workflow/controller/controller.go:438 +0x8c2\ngithub.com/argoproj/argo/workflow/controller.(*WorkflowController).runWorker(0xc000550e00)\n\t/Users/acollins8/go/src/github.com/argoproj/argo/workflow/controller/controller.go:364 +0x2b\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc000bb9cc0)\n\t/Users/acollins8/go/pkg/mod/k8s.io/apimachinery@v0.17.5/pkg/util/wait/wait.go:152 +0x5e\nk8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000bb9cc0, 0x3b9aca00, 0x0, 0x1, 0xc0007048a0)\n\t/Users/acollins8/go/pkg/mod/k8s.io/apimachinery@v0.17.5/pkg/util/wait/wait.go:153 +0xf8\nk8s.io/apimachinery/pkg/util/wait.Until(0xc000bb9cc0, 0x3b9aca00, 0xc0007048a0)\n\t/Users/acollins8/go/pkg/mod/k8s.io/apimachinery@v0.17.5/pkg/util/wait/wait.go:88 +0x4d\ncreated by github.com/argoproj/argo/workflow/controller.(*WorkflowController).Run\n\t/Users/acollins8/go/src/github.com/argoproj/argo/workflow/controller/controller.go:177 +0x975\n" workflow=try-say-rh5zs

@alexec
Contributor

alexec commented Jul 10, 2020

The bug fix reveals the true error: volume 'data' not found in workflow spec

@alexec
Contributor

alexec commented Jul 10, 2020

Original workflow: volume 'doesnotexist' not found in workflow spec
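As an illustration of the check behind that message, the sketch below verifies that every volumeMount names a declared volume. It is a simplified assumption, not the controller's actual validation: VolumeMount and checkVolumeMounts are hypothetical names, and only the error text is taken from this thread.

```go
package main

import "fmt"

// VolumeMount is a hypothetical, trimmed-down stand-in for a container volume mount.
type VolumeMount struct {
	Name      string
	MountPath string
}

// checkVolumeMounts returns an error for the first mount whose name is not
// among the declared volume names, and nil when every mount resolves.
func checkVolumeMounts(declared []string, mounts []VolumeMount) error {
	known := make(map[string]bool, len(declared))
	for _, name := range declared {
		known[name] = true
	}
	for _, m := range mounts {
		if !known[m.Name] {
			return fmt.Errorf("volume '%s' not found in workflow spec", m.Name)
		}
	}
	return nil
}

func main() {
	// Mirrors the reproduction workflow: a mount with no matching volume.
	mounts := []VolumeMount{{Name: "doesnotexist", MountPath: "/mnt/doesnotexist"}}

	fmt.Println(checkVolumeMounts(nil, mounts))
	fmt.Println(checkVolumeMounts([]string{"doesnotexist"}, mounts))
}
```

Returning an ordinary error like this lets the workflow fail with a readable message instead of a recovered panic.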

@alexec
Contributor

alexec commented Jul 10, 2020

So workflow templates are not respecting mount paths, and I suspect other things.

@simster7 @sarabala1979

@JayLeeResBio

@alexec I believe this error is specific to using workflowTemplateRef, since this mode of use prohibits volume definitions in the consuming Workflow, forcing us to define the volumes in the WorkflowTemplate, which as you've discovered does not appear to respect the volume definition.
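To make the suspected gap concrete, here is a rough sketch of a volume lookup that also consults the spec of the referenced WorkflowTemplate. Everything in it is an assumption for illustration: Volume, Spec and findVolume are hypothetical, and the thread does not say how the linked fixes actually resolve volumes.

```go
package main

import "fmt"

// Volume and Spec are hypothetical, trimmed-down stand-ins for the real API types.
type Volume struct {
	Name string
}

type Spec struct {
	Volumes []Volume
}

// findVolume looks for name in the workflow's own spec first and then, if a
// referenced template spec is supplied, in that spec as well.
func findVolume(name string, wf Spec, ref *Spec) (Volume, error) {
	for _, v := range wf.Volumes {
		if v.Name == name {
			return v, nil
		}
	}
	if ref != nil {
		for _, v := range ref.Volumes {
			if v.Name == name {
				return v, nil
			}
		}
	}
	return Volume{}, fmt.Errorf("volume '%s' not found in workflow spec", name)
}

func main() {
	wf := Spec{}                                       // the Workflow declares no volumes
	tmpl := Spec{Volumes: []Volume{{Name: "data"}}}    // the WorkflowTemplate declares "data"

	_, err := findVolume("data", wf, nil) // template spec ignored
	fmt.Println("without template spec:", err)

	v, err := findVolume("data", wf, &tmpl) // template spec consulted
	fmt.Println("with template spec:", v.Name, err)
}
```

The first call reproduces the symptom reported above; the second shows the lookup succeeding once the template's volumes are taken into account.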

Projects
None yet
3 participants