feat(controller): HA Leader election support on Workflow-controller #4622

Merged
merged 21 commits into from
Dec 9, 2020
2 changes: 1 addition & 1 deletion Procfile
@@ -1,2 +1,2 @@
-controller: ALWAYS_OFFLOAD_NODE_STATUS=${ALWAYS_OFFLOAD_NODE_STATUS} OFFLOAD_NODE_STATUS_TTL=30s WORKFLOW_GC_PERIOD=30s UPPERIO_DB_DEBUG=${UPPERIO_DB_DEBUG} ARCHIVED_WORKFLOW_GC_PERIOD=30s ./dist/workflow-controller --executor-image argoproj/argoexec:${VERSION} --namespaced=${NAMESPACED} --namespace ${NAMESPACE} --loglevel ${LOG_LEVEL}
+controller: LEADER_ELECTION_IDENTITY=local ALWAYS_OFFLOAD_NODE_STATUS=${ALWAYS_OFFLOAD_NODE_STATUS} OFFLOAD_NODE_STATUS_TTL=30s WORKFLOW_GC_PERIOD=30s UPPERIO_DB_DEBUG=${UPPERIO_DB_DEBUG} ARCHIVED_WORKFLOW_GC_PERIOD=30s ./dist/workflow-controller --executor-image argoproj/argoexec:${VERSION} --namespaced=${NAMESPACED} --namespace ${NAMESPACE} --loglevel ${LOG_LEVEL}
 argo-server: UPPERIO_DB_DEBUG=${UPPERIO_DB_DEBUG} ./dist/argo --loglevel ${LOG_LEVEL} server --namespaced=${NAMESPACED} --namespace ${NAMESPACE} --auth-mode ${AUTH_MODE} --secure=$SECURE
18 changes: 18 additions & 0 deletions docs/disaster-recovery.md
@@ -0,0 +1,18 @@
# Disaster Recovery (DR)

We only store data in your Kubernetes cluster. You should consider backing this up regularly.

Exporting example:

```
kubectl get wf,cwf,cwft,wftmpl -o yaml > backup.yaml
```

Importing example:

```
kubectl apply -f backup.yaml
```

You should also regularly back up any SQL persistence you use, with whatever tool is provided with it.
18 changes: 18 additions & 0 deletions docs/high-availability.md
@@ -0,0 +1,18 @@
# High-Availability (HA)

## Workflow Controller

Only one controller can run at once. If it crashes, Kubernetes will start another pod.

> v3.0
For many users, a short loss of workflow service may be acceptable - the new controller will just continue running workflows if it restarts. However, where high service guarantees are required, a new pod may take too long to start running workflows. In that case, run two replicas; one of them will be kept on hot standby.
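
The hot-standby arrangement described above can be pictured as a lease with a time-to-live: whichever replica holds the lease runs workflows, and the other keeps retrying until the lease expires. The following is a minimal in-memory sketch of that idea only. The `Lease` type, `TryAcquire`, the replica names, and the 15-second TTL are all hypothetical; a real controller would rely on a Kubernetes Lease object and the API server's concurrency guarantees rather than a local struct.

```go
package main

import (
	"fmt"
	"time"
)

// Lease is a hypothetical in-memory stand-in for a Kubernetes Lease object.
type Lease struct {
	Holder  string
	Expires time.Time
}

// TryAcquire hands the lease to id if it is free, expired, or already held by id.
// It reports whether id is the leader after the call. It is not safe for
// concurrent use; this sketch only models the decision logic.
func (l *Lease) TryAcquire(id string, now time.Time, ttl time.Duration) bool {
	if l.Holder == "" || l.Holder == id || !now.Before(l.Expires) {
		l.Holder = id
		l.Expires = now.Add(ttl)
		return true
	}
	return false
}

// demo replays the scenario from the text: one leader, one hot standby, then failover.
func demo() [3]bool {
	lease := &Lease{}
	ttl := 15 * time.Second // assumed value, for illustration only
	start := time.Date(2020, 12, 9, 0, 0, 0, 0, time.UTC)

	// controller-a starts first and becomes the leader.
	first := lease.TryAcquire("controller-a", start, ttl)
	// controller-b tries while the lease is still valid and stays on standby.
	second := lease.TryAcquire("controller-b", start.Add(5*time.Second), ttl)
	// controller-a has crashed and stopped renewing; the lease has expired.
	third := lease.TryAcquire("controller-b", start.Add(20*time.Second), ttl)

	return [3]bool{first, second, third}
}

func main() {
	fmt.Println(demo()) // prints [true false true]
}
```

The standby takes over only once the lease has lapsed, which is why failover in this model is bounded by the TTL rather than by pod start-up time.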

## Argo Server

> v2.6
Run a minimum of two replicas, typically three; otherwise, API and webhook requests may be dropped.

!!! Tip
Consider using a [multi-AZ deployment with pod anti-affinity](https://www.verygoodsecurity.com/blog/posts/kubernetes-multi-az-deployments-using-pod-anti-affinity).
@@ -26,6 +26,12 @@ spec:
- workflow-controller-configmap
- --executor-image
- argoproj/argoexec:latest
env:
- name: LEADER_ELECTION_IDENTITY
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
securityContext:
runAsNonRoot: true
nodeSelector:
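
The hunk above injects the pod name as `LEADER_ELECTION_IDENTITY`, so each replica has a distinct identity. As a rough sketch of how a process might consume that variable, the example below reads it and reports an error when it is missing. The function name `leaderElectionIdentity` and the error text are hypothetical and are not taken from the controller's source.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// identityEnv is the variable name set by the manifest and the Procfile.
const identityEnv = "LEADER_ELECTION_IDENTITY"

// leaderElectionIdentity returns the replica identity using the supplied lookup
// function (os.LookupEnv in production, a fake in tests). Hypothetical helper.
func leaderElectionIdentity(lookup func(string) (string, bool)) (string, error) {
	id, ok := lookup(identityEnv)
	if !ok || id == "" {
		return "", errors.New(identityEnv + " must be set so that replicas can be told apart")
	}
	return id, nil
}

func main() {
	id, err := leaderElectionIdentity(os.LookupEnv)
	if err != nil {
		// A real controller would most likely refuse to start at this point.
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	fmt.Println("leader election identity:", id)
}
```

Taking the lookup function as a parameter keeps the helper testable without touching the process environment.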
@@ -3,9 +3,18 @@ kind: Role
metadata:
name: argo-role
rules:
- apiGroups:
- ""
resources:
- secrets
verbs:
- get
- apiGroups:
- coordination.k8s.io
resources:
- leases
resourceNames:
- workflow-controller
verbs:
- '*'
- apiGroups:
- ""
resources:
- secrets
verbs:
- get
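
The rule added in this hunk grants every verb on Leases, but only for the object named `workflow-controller`. As a rough mental model of how such a rule is matched, the sketch below uses a hypothetical `Rule` type that mirrors just the four fields in the manifest. It is not the Kubernetes authorizer, which also handles subresources and other cases.

```go
package main

import "fmt"

// Rule is a hypothetical, trimmed-down model of an RBAC policy rule.
type Rule struct {
	APIGroups     []string
	Resources     []string
	ResourceNames []string
	Verbs         []string
}

// matches reports whether list contains want or the "*" wildcard.
func matches(list []string, want string) bool {
	for _, v := range list {
		if v == "*" || v == want {
			return true
		}
	}
	return false
}

// Allows reports whether the rule covers the given request attributes.
// An empty ResourceNames list means "any name"; otherwise the name must match exactly.
func (r Rule) Allows(group, resource, name, verb string) bool {
	if !matches(r.APIGroups, group) || !matches(r.Resources, resource) || !matches(r.Verbs, verb) {
		return false
	}
	if len(r.ResourceNames) == 0 {
		return true
	}
	for _, n := range r.ResourceNames {
		if n == name {
			return true
		}
	}
	return false
}

// leaseRule mirrors the rule added to argo-role in this hunk.
var leaseRule = Rule{
	APIGroups:     []string{"coordination.k8s.io"},
	Resources:     []string{"leases"},
	ResourceNames: []string{"workflow-controller"},
	Verbs:         []string{"*"},
}

func main() {
	fmt.Println(leaseRule.Allows("coordination.k8s.io", "leases", "workflow-controller", "update")) // true
	fmt.Println(leaseRule.Allows("coordination.k8s.io", "leases", "some-other-lease", "update"))    // false
	fmt.Println(leaseRule.Allows("", "secrets", "workflow-controller", "get"))                      // false
	fmt.Println(leaseRule.Allows("coordination.k8s.io", "leases", "", "create"))                    // false
}
```

In this model a request that carries no object name never matches a rule that lists `resourceNames`. The Kubernetes RBAC documentation notes that `create` requests cannot be restricted by resource name, so it may be worth confirming that the controller can create the Lease on first start with this rule alone.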

14 changes: 14 additions & 0 deletions manifests/install.yaml
@@ -127,6 +127,14 @@ kind: Role
metadata:
name: argo-role
rules:
- apiGroups:
- coordination.k8s.io
resourceNames:
- workflow-controller
resources:
- leases
verbs:
- '*'
- apiGroups:
- ""
resources:
@@ -508,6 +516,12 @@ spec:
- argoproj/argoexec:latest
command:
- workflow-controller
env:
- name: LEADER_ELECTION_IDENTITY
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
image: argoproj/workflow-controller:latest
name: workflow-controller
securityContext:
14 changes: 14 additions & 0 deletions manifests/namespace-install.yaml
@@ -127,6 +127,14 @@ kind: Role
metadata:
name: argo-role
rules:
- apiGroups:
- coordination.k8s.io
resourceNames:
- workflow-controller
resources:
- leases
verbs:
- '*'
- apiGroups:
- ""
resources:
@@ -403,6 +411,12 @@ spec:
- --namespaced
command:
- workflow-controller
env:
- name: LEADER_ELECTION_IDENTITY
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
image: argoproj/workflow-controller:latest
name: workflow-controller
securityContext:
@@ -3,94 +3,102 @@ kind: Role
metadata:
name: argo-role
rules:
- apiGroups:
- ""
resources:
- pods
- pods/exec
verbs:
- create
- get
- list
- watch
- update
- patch
- delete
- apiGroups:
- ""
resources:
- configmaps
verbs:
- get
- watch
- list
- apiGroups:
- ""
resources:
- persistentvolumeclaims
verbs:
- create
- delete
- get
- apiGroups:
- argoproj.io
resources:
- workflows
- workflows/finalizers
verbs:
- get
- list
- watch
- update
- patch
- delete
- create
- apiGroups:
- argoproj.io
resources:
- workflowtemplates
- workflowtemplates/finalizers
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- serviceaccounts
verbs:
- get
- list
- apiGroups:
- ""
resources:
- secrets
verbs:
- get
- apiGroups:
- argoproj.io
resources:
- cronworkflows
- cronworkflows/finalizers
verbs:
- get
- list
- watch
- update
- patch
- delete
- apiGroups:
- ""
resources:
- events
verbs:
- create
- patch
- apiGroups:
- "policy"
resources:
- poddisruptionbudgets
verbs:
- create
- get
- delete
- apiGroups:
- coordination.k8s.io
resources:
- leases
resourceNames:
- workflow-controller
verbs:
- '*'
- apiGroups:
- ""
resources:
- pods
- pods/exec
verbs:
- create
- get
- list
- watch
- update
- patch
- delete
- apiGroups:
- ""
resources:
- configmaps
verbs:
- get
- watch
- list
- apiGroups:
- ""
resources:
- persistentvolumeclaims
verbs:
- create
- delete
- get
- apiGroups:
- argoproj.io
resources:
- workflows
- workflows/finalizers
verbs:
- get
- list
- watch
- update
- patch
- delete
- create
- apiGroups:
- argoproj.io
resources:
- workflowtemplates
- workflowtemplates/finalizers
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- serviceaccounts
verbs:
- get
- list
- apiGroups:
- ""
resources:
- secrets
verbs:
- get
- apiGroups:
- argoproj.io
resources:
- cronworkflows
- cronworkflows/finalizers
verbs:
- get
- list
- watch
- update
- patch
- delete
- apiGroups:
- ""
resources:
- events
verbs:
- create
- patch
- apiGroups:
- "policy"
resources:
- poddisruptionbudgets
verbs:
- create
- get
- delete
14 changes: 14 additions & 0 deletions manifests/quick-start-minimal.yaml
@@ -132,6 +132,14 @@ kind: Role
metadata:
name: argo-role
rules:
- apiGroups:
- coordination.k8s.io
resourceNames:
- workflow-controller
resources:
- leases
verbs:
- '*'
- apiGroups:
- ""
resources:
@@ -656,6 +664,12 @@ spec:
- --namespaced
command:
- workflow-controller
env:
- name: LEADER_ELECTION_IDENTITY
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
image: argoproj/workflow-controller:latest
name: workflow-controller
securityContext:
14 changes: 14 additions & 0 deletions manifests/quick-start-mysql.yaml
@@ -132,6 +132,14 @@ kind: Role
metadata:
name: argo-role
rules:
- apiGroups:
- coordination.k8s.io
resourceNames:
- workflow-controller
resources:
- leases
verbs:
- '*'
- apiGroups:
- ""
resources:
@@ -745,6 +753,12 @@ spec:
- --namespaced
command:
- workflow-controller
env:
- name: LEADER_ELECTION_IDENTITY
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
image: argoproj/workflow-controller:latest
name: workflow-controller
securityContext: