Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 17 additions & 1 deletion chart/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ Settings | Description | Default |
| controllerManager.manager.agent.repository | Where to get the image from | "ghcr.io/nvidia/skyhook/agent" |
| controllerManager.manager.agent.tag | what version of the agent to run | defaults to the current latest, but is not latest example v6.1.5 |
| controllerManager.manager.agent.digest | content-addressable pin for the agent image. Same precedence rules as above: if both tag and digest are provided, the digest controls which image is pulled. | "" |
| imagePullSecret | the secret used to pull the operator controller image, agent image, and package images. | node-init-secret |
| imagePullSecret | the secret used to pull the operator controller image, agent image, and package images. | "" |
| estimatedPackageCount | estimated number of packages to be installed on the cluster, this is used to calculate the resources for the operator controller. | 1 |
| estimatedNodeCount | estimated number of nodes in the cluster, this is used to calculate the resources for the operator controller | 1 |

Expand All @@ -44,6 +44,22 @@ Settings | Description | Default |
- **CRD**: This project currently has one CRD and its not managed the ["recommended" way](https://helm.sh/docs/chart_best_practices/custom_resource_definitions/). Its part of the templates. Meaning it will be updated with the `helm upgrade`. We decided it was better do it this way for this project. Doing it either way has consequences and this route has worked well for upgrades so far our deployments.
- **Image pinning (tag vs digest)**: You can set either an image tag or a digest. If both are set, the digest is prioritized; the tag is ignored for selection and may appear as `tag@digest` only for readability. This applies to both operator and agent images.

## Upgrade Notes

### Breaking Change: imagePullSecret Default Changed

**Previous behavior:** The `imagePullSecret` value defaulted to `node-init-secret`. If this secret didn't exist, kubelet logs would show errors about the missing secret.

**New behavior:** The `imagePullSecret` value now defaults to empty (`""`). No imagePullSecrets will be added to pods unless explicitly configured.

**Migration:** If you rely on `node-init-secret` for pulling images from private registries, you must now explicitly set `imagePullSecret` in your values:

```yaml
imagePullSecret: "node-init-secret"
```

If you use public images (like the default ghcr.io images), no action is needed.

### Resource Management
Skyhook uses Kubernetes LimitRange to set default CPU/memory requests/limits for all containers in the namespace. You can override these per-package in your Skyhook CR. Strict validation is enforced. See [../docs/resource_management.md](../docs/resource_management.md) for details and examples.

Expand Down
2 changes: 2 additions & 0 deletions chart/templates/deployment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -166,7 +166,9 @@ spec:
securityContext: {{- toYaml .Values.controllerManager.kubeRbacProxy.containerSecurityContext
| nindent 10 }}
imagePullSecrets:
{{- if .Values.imagePullSecret }}
- name: {{ quote .Values.imagePullSecret }}
{{- end }}
volumes:
- name: kube-api-access
projected:
Expand Down
2 changes: 1 addition & 1 deletion chart/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -136,7 +136,7 @@ rbac:
createSkyhookViewerRole: false
createSkyhookEditorRole: false
## imagePullSecret: is the secret used to pull the operator controller image, agent image, and package images.
imagePullSecret: node-init-secret
imagePullSecret: ""
## useHostNetwork: Whether the Operator pods should use hostNetwork: true or false
useHostNetwork: false
## estimatedPackageCount: estimated number of packages to be installed on the cluster
Expand Down
6 changes: 3 additions & 3 deletions operator/config/manager/manager.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -79,8 +79,8 @@ spec:
# versions < 1.19 or on vendors versions which do NOT support this field by default (i.e. Openshift < 4.11 ).
# seccompProfile:
# type: RuntimeDefault
imagePullSecrets:
- name: node-init-secret
# imagePullSecrets:
# - name: node-init-secret
containers:
- command:
- /manager
Expand All @@ -98,7 +98,7 @@ spec:
- name: NAMESPACE
value: skyhook-operator-system
- name: IMAGE_PULL_SECRET
value: node-init-secret
value: ""
- name: COPY_DIR_ROOT
value: /var/lib/skyhook
- name: REAPPLY_ON_REBOOT
Expand Down
26 changes: 15 additions & 11 deletions operator/internal/controller/skyhook_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@ const (
type SkyhookOperatorOptions struct {
Namespace string `env:"NAMESPACE, default=skyhook"`
MaxInterval time.Duration `env:"DEFAULT_INTERVAL, default=10m"`
ImagePullSecret string `env:"IMAGE_PULL_SECRET, default=node-init-secret"` //TODO: should this be defaulted?
ImagePullSecret string `env:"IMAGE_PULL_SECRET"`
CopyDirRoot string `env:"COPY_DIR_ROOT, default=/var/lib/skyhook"`
ReapplyOnReboot bool `env:"REAPPLY_ON_REBOOT, default=false"`
RuntimeRequiredTaint string `env:"RUNTIME_REQUIRED_TAINT, default=skyhook.nvidia.com=runtime-required:NoSchedule"`
Expand Down Expand Up @@ -1608,11 +1608,6 @@ func createInterruptPodForPackage(opts SkyhookOperatorOptions, _interrupt *v1alp
},
},
},
ImagePullSecrets: []corev1.LocalObjectReference{
{
Name: opts.ImagePullSecret,
},
},
HostPID: true,
HostNetwork: true,
// If you change these go change the SelectNode toleration in cluster_state.go
Expand All @@ -1626,6 +1621,13 @@ func createInterruptPodForPackage(opts SkyhookOperatorOptions, _interrupt *v1alp
Volumes: volumes,
},
}
if opts.ImagePullSecret != "" {
pod.Spec.ImagePullSecrets = []corev1.LocalObjectReference{
{
Name: opts.ImagePullSecret,
},
}
}
return pod
}

Expand Down Expand Up @@ -1819,11 +1821,6 @@ func createPodFromPackage(opts SkyhookOperatorOptions, _package *v1alpha1.Packag
},
},
},
ImagePullSecrets: []corev1.LocalObjectReference{
{
Name: opts.ImagePullSecret,
},
},
Volumes: volumes,
HostPID: true,
HostNetwork: true,
Expand All @@ -1837,6 +1834,13 @@ func createPodFromPackage(opts SkyhookOperatorOptions, _package *v1alpha1.Packag
}, skyhook.Spec.AdditionalTolerations...),
},
}
if opts.ImagePullSecret != "" {
pod.Spec.ImagePullSecrets = []corev1.LocalObjectReference{
{
Name: opts.ImagePullSecret,
},
}
}
if _package.GracefulShutdown != nil {
pod.Spec.TerminationGracePeriodSeconds = ptr(int64(_package.GracefulShutdown.Duration.Seconds()))
}
Expand Down
71 changes: 71 additions & 0 deletions operator/internal/controller/skyhook_controller_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -945,6 +945,77 @@ var _ = Describe("skyhook controller tests", func() {
Expect(found_toleration).To(BeTrue())
})

It("Pods should not have imagePullSecrets when ImagePullSecret is empty", func() {
emptyOpts := SkyhookOperatorOptions{
Namespace: "skyhook",
MaxInterval: time.Second * 61,
ImagePullSecret: "", // Empty - no pull secret
CopyDirRoot: "/tmp",
ReapplyOnReboot: true,
RuntimeRequiredTaint: "skyhook.nvidia.com=runtime-required:NoSchedule",
AgentImage: "foo:bar",
PauseImage: "foo:bar",
}

pod := createPodFromPackage(
emptyOpts,
&v1alpha1.Package{
PackageRef: v1alpha1.PackageRef{
Name: "foo",
Version: "1.1.2",
},
Image: "foo/bar",
},
&wrapper.Skyhook{
Skyhook: &v1alpha1.Skyhook{
Spec: v1alpha1.SkyhookSpec{
RuntimeRequired: true,
},
},
},
"node1",
v1alpha1.StageApply,
)
Expect(pod.Spec.ImagePullSecrets).To(BeEmpty())
})

It("Interrupt pods should not have imagePullSecrets when ImagePullSecret is empty", func() {
emptyOpts := SkyhookOperatorOptions{
Namespace: "skyhook",
MaxInterval: time.Second * 61,
ImagePullSecret: "", // Empty - no pull secret
CopyDirRoot: "/tmp",
ReapplyOnReboot: true,
RuntimeRequiredTaint: "skyhook.nvidia.com=runtime-required:NoSchedule",
AgentImage: "foo:bar",
PauseImage: "foo:bar",
}

pod := createInterruptPodForPackage(
emptyOpts,
&v1alpha1.Interrupt{
Type: v1alpha1.REBOOT,
},
"argEncode",
&v1alpha1.Package{
PackageRef: v1alpha1.PackageRef{
Name: "foo",
Version: "1.1.2",
},
Image: "foo/bar",
},
&wrapper.Skyhook{
Skyhook: &v1alpha1.Skyhook{
Spec: v1alpha1.SkyhookSpec{
RuntimeRequired: true,
},
},
},
"node1",
)
Expect(pod.Spec.ImagePullSecrets).To(BeEmpty())
})

It("should generate deterministic pod names", func() {
// Setup basic test data
skyhook := &wrapper.Skyhook{
Expand Down
Loading