docs: Document node field selector. Closes #2860 #2882

Merged
merged 14 commits into from
May 1, 2020
2 changes: 1 addition & 1 deletion api/openapi-spec/swagger.json
@@ -7078,7 +7078,7 @@
"nodeFieldSelector": {
"type": "string"
},
"restartSuccesful": {
"restartSuccessful": {
"type": "boolean",
"format": "boolean"
}
4 changes: 2 additions & 2 deletions cmd/argo/commands/retry.go
@@ -38,7 +38,7 @@ func NewRetryCommand() *cobra.Command {
wf, err := serviceClient.RetryWorkflow(ctx, &workflowpkg.WorkflowRetryRequest{
Name: name,
Namespace: namespace,
RestartSuccesful: retryOps.restartSuccessful,
RestartSuccessful: retryOps.restartSuccessful,
NodeFieldSelector: selector.String(),
})
if err != nil {
@@ -53,7 +53,7 @@
command.Flags().StringVarP(&cliSubmitOpts.output, "output", "o", "", "Output format. One of: name|json|yaml|wide")
command.Flags().BoolVarP(&cliSubmitOpts.wait, "wait", "w", false, "wait for the workflow to complete")
command.Flags().BoolVar(&cliSubmitOpts.watch, "watch", false, "watch the workflow until it completes")
command.Flags().BoolVar(&retryOps.restartSuccessful, "restart-successful", false, "indicates to restart succesful nodes matching the --node-field-selector")
command.Flags().BoolVar(&retryOps.restartSuccessful, "restart-successful", false, "indicates to restart successful nodes matching the --node-field-selector")
command.Flags().StringVar(&retryOps.nodeFieldSelector, "node-field-selector", "", "selector of nodes to reset, eg: --node-field-selector inputs.parameters.myparam.value=abc")
return command
}
2 changes: 2 additions & 0 deletions docs/README.md
@@ -18,6 +18,7 @@ Some use-case specific documentation is available:
* [Contributing](CONTRIBUTING.md)
* [Argo Workflow Architecture](architecture.md)
* [Argo Server](argo-server.md)
* [Asynchronous Job Pattern](async-pattern.md)
* [CLI](cli.md)
* [Cluster Workflow Templates](cluster-workflow-templates.md)
* [Configuring Your Artifact Repository](configure-artifact-repository.md)
@@ -29,6 +30,7 @@ Some use-case specific documentation is available:
* [Links](links.md)
* [Managed Namespace](managed-namespace.md)
* [Prometheus Metrics](metrics.md)
* [Node Field Selectors](node-field-selector.md)
* [Offloading Large Workflows](offloading-large-workflows.md)
* [Public API](public-api.md)
* [Release Instructions](releasing.md)
96 changes: 96 additions & 0 deletions docs/async-pattern.md
@@ -0,0 +1,96 @@
# Asynchronous Job Pattern

## Introduction

If you are triggering an external job (eg an Amazon EMR job) from Argo and the job does not run to completion in a container, there are two options:
- create a container that polls the external job completion status
- combine a trigger step that starts the job with a `Suspend` step that is unsuspended by an API call to Argo when the external job is complete.

This document describes the second option in more detail.

## The pattern

The pattern involves two steps - the first step is a short-running step that triggers a long-running job outside Argo (eg an HTTP submission), and the second step is a `Suspend` step that suspends workflow execution and is ultimately either resumed or stopped (ie failed) via a call to the Argo API when the job outside Argo succeeds or fails.

When implemented as a `WorkflowTemplate` it can look something like this:

```
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: external-job-template
spec:
  templates:
    - name: run-external-job
      inputs:
        parameters:
          - name: "job-cmd"
      steps:
        - - name: trigger-job
            template: trigger-job
            arguments:
              parameters:
                - name: "job-cmd"
                  value: "{{inputs.parameters.job-cmd}}"
        - - name: wait-completion
            template: wait-completion
            arguments:
              parameters:
                - name: uuid
                  value: "{{steps.trigger-job.outputs.result}}"

    - name: trigger-job
      inputs:
        parameters:
          - name: "job-cmd"
      container:
        image: appropriate/curl:latest
        command: ["/bin/sh", "-c"]
        args: ["{{inputs.parameters.job-cmd}}"]

    - name: wait-completion
      inputs:
        parameters:
          - name: uuid
      suspend: {}
```

In this case the ```job-cmd``` parameter can be a command that makes an HTTP call via curl to an endpoint that returns a job UUID. More sophisticated submission and parsing of the submission output could be done with something like a Python script step.
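As a hypothetical sketch (the submission endpoint, request body, and response shape here are invented for illustration), the value passed as `job-cmd` could be a small shell snippet that submits the job and prints the returned UUID, since the script's stdout becomes `{{steps.trigger-job.outputs.result}}`:

```shell
# Hypothetical submission flow: submit the external job, then print the UUID
# the service returns. The curl call is stubbed with a literal response here;
# in practice it would be something like:
#   response=$(curl -s -XPOST http://jobs.example.com/api/submit -d '{"job":"nightly-batch"}')
response='{"status":"submitted","uuid":"1234-abcd"}'

# Extract the "uuid" field from the JSON response and emit it as the
# step's only output.
uuid=$(printf '%s' "$response" | sed -n 's/.*"uuid" *: *"\([^"]*\)".*/\1/p')
echo "$uuid"
```

In a real template, a step like this would replace the bare curl call so that the UUID (and only the UUID) reaches the `wait-completion` step.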

On job completion, the external system needs to call resume if the job was successful:

```
curl --request PUT \
  --url http://localhost:2746/api/v1/workflows/<NAMESPACE>/<WORKFLOWNAME>/resume \
--header 'content-type: application/json' \
--data '{
"namespace": "<NAMESPACE>",
"name": "<WORKFLOWNAME>",
"nodeFieldSelector": "inputs.parameters.uuid.value=<UUID>"
}'
```

or stop if unsuccessful:

```
curl --request PUT \
  --url http://localhost:2746/api/v1/workflows/<NAMESPACE>/<WORKFLOWNAME>/stop \
--header 'content-type: application/json' \
--data '{
"namespace": "<NAMESPACE>",
"name": "<WORKFLOWNAME>",
"nodeFieldSelector": "inputs.parameters.uuid.value=<UUID>",
"message": "<FAILURE-MESSAGE>"
}'
```
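The same calls can be made from the Argo CLI, which supports the same node field selector (a sketch; the placeholders are as in the HTTP examples above, and flag availability depends on your CLI version):

```
argo -n <NAMESPACE> resume <WORKFLOWNAME> --node-field-selector inputs.parameters.uuid.value=<UUID>

argo -n <NAMESPACE> stop <WORKFLOWNAME> --node-field-selector inputs.parameters.uuid.value=<UUID> --message "<FAILURE-MESSAGE>"
```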

## Retrying failed jobs

Using `argo retry` on failed jobs that follow this pattern will cause Argo to re-attempt the Suspend step without re-triggering the job.

Instead, you need to use the `--restart-successful` option, eg if using the template from above:

```
argo retry <WORKFLOWNAME> --restart-successful --node-field-selector templateRef.template=run-external-job,phase=Failed
```
42 changes: 42 additions & 0 deletions docs/node-field-selector.md
@@ -0,0 +1,42 @@
# Node Field Selectors

![alpha](assets/alpha.svg)

> v2.8 and after

## Introduction

The resume, stop and retry Argo CLI and API commands support a `--node-field-selector` parameter to allow the user to select a subset of nodes for the command to apply to.

In the case of the resume and stop commands these are the nodes that should be resumed or stopped.

In the case of the retry command it allows specifying nodes that should be restarted even if they were previously successful (and must be used in combination with `--restart-successful`).

The format of this when used with the CLI is:

```--node-field-selector=FIELD=VALUE```

## Possible options

The field can be any of:

| Field | Description|
|----------|------------|
| displayName | Display name of the node |
| templateName | Template name of the node |
| phase | Phase status of the node - eg Running |
| templateRef.name | The name of the WorkflowTemplate the node is referring to |
| templateRef.template | The template within the WorkflowTemplate the node is referring to |
| inputs.parameters.<NAME>.value | The value of input parameter NAME |

The operator can be `=` or `!=`. Multiple selectors can be combined with a comma, in which case they are ANDed together.

## Examples

To filter for nodes where the input parameter 'foo' is equal to 'bar':

```--node-field-selector=inputs.parameters.foo.value=bar```

To filter for nodes where the input parameter 'foo' is equal to 'bar' and phase is not running:

```--node-field-selector=inputs.parameters.foo.value=bar,phase!=Running```