Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 17 additions & 7 deletions chart/templates/skyhook-crd.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -536,20 +536,30 @@ spec:
type: object
type: object
x-kubernetes-map-type: atomic
runtimeRequired:
default: false
description: This skyhook is required to have been completed before
any workloads can start
type: boolean
priority:
description: Priority determines the order in which skyhooks are applied. Lower values are applied first.
type: integer
minimum: 1
default: 200
runtimeRequired:
default: false
description: This skyhook is required to have been completed before
any workloads can start
type: boolean
sequencing:
default: node
description: |-
Sequencing controls whether priority ordering is enforced globally or per-node.
"node" (default): a node can proceed past this skyhook independently once it completes on that node.
"all": all nodes must complete this skyhook before any node starts the next priority.
enum:
- all
- node
type: string
serial:
default: false
description: Serial tells skyhook if it allowed to run in parallel or
not when applying packages
description: Serial tells skyhook if it allowed to run in packages
in parallel. If true, the operator will run one package at a time.
type: boolean
type: object
status:
Expand Down
112 changes: 86 additions & 26 deletions docs/ordering_of_skyhooks.md
Original file line number Diff line number Diff line change
@@ -1,42 +1,102 @@
# Ordering of Skyhooks
## What
Skyhooks are applied in a repeatable and specific order based on their `priority` field. Each custom resource supports a `priority` field which is a non-zero positive integer. Skyhooks will be processed in order starting from 1, any Skyhooks with the same `priority` will be processed by sorting them by their `metadata.name` field.

## Priority

Skyhooks are applied in a repeatable and specific order based on their `priority` field. Each custom resource supports a `priority` field which is a non-zero positive integer. Skyhooks will be processed in order starting from 1. Skyhooks with the same `priority` are processed by sorting on their `metadata.name` field.

**NOTE**: Any Skyhook which does NOT provide a `priority` field will be assigned a priority value of 200.

## Per-Node Ordering
---

## Sequencing

The `sequencing` field on each Skyhook controls how it gates the next priority level. This determines whether nodes progress independently or must synchronize.

### `sequencing: node` (default)

Per-node ordering. A node proceeds past this skyhook independently once it completes on that node. Other nodes do not need to finish first.

```
Node A completes Skyhook 1 → Node A immediately starts Skyhook 2
Node B still on Skyhook 1 → Node B shows "waiting" on Skyhook 2
Node A completes Skyhook 2 → Node A is fully complete
Node B completes Skyhook 1 → Node B starts Skyhook 2
```

This prevents deadlocks where stuck or bad nodes block healthy nodes from progressing.

### `sequencing: all`

Global ordering. **ALL** nodes must complete this skyhook before **ANY** node starts the next priority level. Use this when the next priority depends on every node being at the same stage (e.g., cluster-wide configuration that must be applied everywhere before proceeding).

```yaml
apiVersion: skyhook.nvidia.com/v1alpha1
kind: Skyhook
metadata:
name: cluster-config
spec:
priority: 10
sequencing: all # all nodes must finish before priority 11+ starts
...
```

```
Node A completes cluster-config → Node A waits
Node B still on cluster-config → both nodes blocked from priority 11
Node B completes cluster-config → both nodes start priority 11
```

When a skyhook with `sequencing: all` is not yet globally complete, it shows `waiting` status at the skyhook level. Individual nodes inherit this waiting state rather than being evaluated independently.

**Important**: Priority ordering is enforced **per-node**, not globally across all nodes. This means:
- Node A can proceed to Skyhook 2 as soon as Skyhook 1 completes on Node A
- Node A does NOT wait for Node B to complete Skyhook 1
- If Node B is stuck on Skyhook 1, Node A can still progress through all its skyhooks
### Mixing modes

This per-node behavior prevents deadlocks where a few stuck/bad nodes would block all other healthy nodes from progressing through their skyhook sequence.
Different skyhooks can use different sequencing modes. A skyhook's `sequencing` field determines how **it** gates the next priority:

### Example
With two nodes (A, B) and two skyhooks (priority 1 and priority 2):
- Node A completes Skyhook 1 → Node A immediately starts Skyhook 2
- Node B is still processing Skyhook 1 → Node B shows "waiting" status on Skyhook 2
- Node A completes Skyhook 2 → Node A is fully complete
- Node B eventually completes Skyhook 1 → Node B starts Skyhook 2
```
Priority 1: driver-install (sequencing: node) ← nodes progress independently
Priority 2: cluster-config (sequencing: all) ← sync point: all must finish
Priority 3: workload-setup (sequencing: node) ← resumes per-node after sync
```

In this example, fast nodes can install drivers independently, but all nodes must complete the cluster config before any node starts workload setup.

### Caution: Deadlock risks

**`sequencing: all` + `runtimeRequired: true`** — This combination can deadlock your cluster. With `runtimeRequired`, nodes are tainted until the skyhook completes, preventing workloads from scheduling. With `sequencing: all`, every node must complete before any node moves to the next priority. If a single node fails (unhealthy, can't schedule pods, bad hardware), all nodes remain tainted and blocked indefinitely. New nodes joining the cluster with the same selector will also be tainted and must complete before the gate releases — if those nodes aren't healthy, the deadlock worsens.

**`sequencing: all` with unreliable packages** — Even without `runtimeRequired`, `sequencing: all` means one stuck node blocks all nodes from progressing to the next priority. If your package has a bug or a node has an issue that prevents completion, the entire rollout stalls. Prefer `sequencing: node` (the default) unless you have a strong reason to require cluster-wide synchronization.

**`runtimeRequired: true` with untested packages** — Since `runtimeRequired` leaves nodes tainted until the skyhook completes, a broken package image or misconfigured package will leave nodes tainted and unable to run workloads. Always test packages on a small node group first before applying with `runtimeRequired` to your full cluster.

---

## Flow Control Annotations

Two flow control features can be set in the annotations of each skyhook:
* `skyhook.nvidia.com/disable`: bool. When `true` it will skip this Skyhook from processing and continue with any other ones further down the priority order.
* `skyhook.nvidia.com/pause`: bool. When `true` it will NOT process this Skyhook and it WILL NOT continue to process any Skyhook's after this one on that node. This will effectively stop all application of Skyhooks starting with this one. NOTE: This ability used to be on the Skyhook spec itself as the `pause` field and has been moved here to be consistent with `disable` and to avoid incrementing the generation of a Skyhook Custom Resource instance when changing it.

- `skyhook.nvidia.com/disable`: bool. When `true`, skips this Skyhook from processing and continues with any others further down the priority order.
- `skyhook.nvidia.com/pause`: bool. When `true`, does NOT process this Skyhook and will NOT continue to process any Skyhooks after this one on that node. This effectively stops all application of Skyhooks starting with this one.

**NOTE**: `pause` was previously on the Skyhook spec and has been moved to annotations to be consistent with `disable` and to avoid incrementing the generation when toggling it.

---

## Recommended Priority Buckets

To coordinate work without explicit communication, we recommend bucketing Skyhooks by priority range:

| Range | Purpose | Examples |
|-------|---------|----------|
| 1–99 | Initialization and infrastructure | Security tools, monitoring agents |
| 100–199 | Configuration | SSH access, network settings |
| 200+ | User-level configuration | Workload tuning, application setup |

---

## Why
This solves a few problems:

The first is to to better support debugging. Prior to this it was impossible to know the order Skyhooks would get applied to nodes as they would all run in parallel. This can, and has, lead to issues debugging a problem as it isn't deterministic. Now every node will always receive updates in the same order as every other node. Additionaly, this removes the possiblility of conflicts between Skyhooks by heaving each one run in order.
**Deterministic ordering** — Prior to priority ordering, Skyhooks ran in parallel with no deterministic order. This made debugging difficult since different nodes could receive updates in different sequences. Priority ordering ensures every node processes Skyhooks in the same order.

The second is to provide the ability for complex tasks to be sequenced. This comes up when needing to apply different sets of work to different node groups in a particular order.
**Complex sequencing** — Some workflows require applying different sets of work to different node groups in a particular order. Priority ordering with `sequencing: all` enables cluster-wide synchronization points.

The third is to provide the community a way to bucket Skyhooks according to where they might live in a stream of updates and therefore better coordinate work without explicit communication. We propose the following buckets:
* 1 - 99 for initialization and infrastucture work
* install security or monitoring tools
* 100 - 199 for configuration work
* configuring ssh access
* 200+ for final user level configuration
* applying tuning for workloads
**Community coordination** — Priority buckets provide a shared convention so different teams can coordinate Skyhook ordering without direct communication.
4 changes: 2 additions & 2 deletions k8s-tests/chainsaw/cli/deployment-policy/chainsaw-test.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,8 @@ metadata:
name: cli-deployment-policy-reset
spec:
timeouts:
assert: 120s
exec: 90s
assert: 150s
exec: 30s
steps:
# Step 0: Reset state and label worker nodes
- name: reset-state
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ kind: DeploymentPolicy
metadata:
name: cli-dp-reset-policy
spec:
resetBatchStateOnCompletion: true
resetBatchStateOnCompletion: true # Skyhook override (false) should take precedence

default:
budget:
Expand Down
2 changes: 1 addition & 1 deletion k8s-tests/chainsaw/cli/deployment-policy/skyhook.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ spec:

packages:
test-package:
version: "1.0.0"
version: "1.1.1"
image: ghcr.io/nvidia/skyhook-packages/shellscript
configMap:
apply.sh: |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ metadata:
spec:
description: Test batch state reset - auto-reset on completion, config precedence, and manual CLI reset
timeouts:
assert: 240s
assert: 270s
exec: 30s

steps:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ kind: DeploymentPolicy
metadata:
name: overlapping-selector-policy
spec:
resetBatchStateOnCompletion: false # Preserve batch state so test can assert batch progression
default:
budget:
percent: 100
Expand Down
9 changes: 6 additions & 3 deletions k8s-tests/chainsaw/skyhook/cleanup-pods/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,9 @@ Validates that the operator correctly cleans up pods when a node's state is rese

## Files

- `chainsaw-test.yaml` - Main test configuration
- `skyhook.yaml` - Skyhook resource definition
- `assert.yaml` - State assertions
- `chainsaw-test.yaml` - Main test configuration with lifecycle assertions inline (pods, nodes, skyhook status) for sequential ordering
- `setup.yaml` - Skyhook resource definition with package dependencies
- `assert-setup-complete.yaml` - Assertion for initial setup completion
- `assert-config-complete.yaml` - Assertion for config cycle completion
- `force-config.yaml` - Update to trigger a config cycle
- `muck_up.yaml` - Update to make a package error
162 changes: 0 additions & 162 deletions k8s-tests/chainsaw/skyhook/cleanup-pods/assert-cleaned-pods.yaml

This file was deleted.

Loading
Loading