Readiness/Liveness probes do not accept integer port #778

Open
mmourafiq opened this issue Jul 25, 2023 · 4 comments

@mmourafiq

Describe the issue:

Although the cluster specification suggests that the port type is int-or-string, using integer ports in probes raises an error. Here's an example based on the documentation, where the port http-dashboard is 8786. Basically, this:

              readinessProbe:
                httpGet:
                  port: http-dashboard
                  path: /health
                initialDelaySeconds: 5
                periodSeconds: 10
              livenessProbe:
                httpGet:
                  port: http-dashboard
                  path: /health
                initialDelaySeconds: 15
                periodSeconds: 20

is replaced with this:

              readinessProbe:
                httpGet:
                  port: 8786
                  path: /health
                initialDelaySeconds: 5
                periodSeconds: 10
              livenessProbe:
                httpGet:
                  port: 8786
                  path: /health
                initialDelaySeconds: 15
                periodSeconds: 20

If you check the type definition of the probes, e.g. the Python definition at https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1HTTPGetAction.md, you will notice that port is of type object and accepts a string or an integer. See also the Kubernetes docs: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-http-request

Full example:

apiVersion: kubernetes.dask.org/v1
kind: DaskJob
metadata:
  name: simple-job
  namespace: default
spec:
  job:
    spec:
      containers:
        - name: job
          image: "ghcr.io/dask/dask:latest"
          imagePullPolicy: "IfNotPresent"
          args:
            - python
            - -c
            - "from dask.distributed import Client; client = Client(); # Do some work..."

  cluster:
    spec:
      worker:
        replicas: 2
        spec:
          containers:
            - name: worker
              image: "ghcr.io/dask/dask:latest"
              imagePullPolicy: "IfNotPresent"
              args:
                - dask-worker
                - --name
                - $(DASK_WORKER_NAME)
                - --dashboard
                - --dashboard-address
                - "8788"
              ports:
                - name: http-dashboard
                  containerPort: 8788
                  protocol: TCP
              env:
                - name: WORKER_ENV
                  value: hello-world # We dont test the value, just the name
      scheduler:
        spec:
          containers:
            - name: scheduler
              image: "ghcr.io/dask/dask:latest"
              imagePullPolicy: "IfNotPresent"
              args:
                - dask-scheduler
              ports:
                - name: tcp-comm
                  containerPort: 8786
                  protocol: TCP
                - name: http-dashboard
                  containerPort: 8787
                  protocol: TCP
              readinessProbe:
                httpGet:
                  port: 8786
                  path: /health
                initialDelaySeconds: 5
                periodSeconds: 10
              livenessProbe:
                httpGet:
                  port: 8786
                  path: /health
                initialDelaySeconds: 15
                periodSeconds: 20
              env:
                - name: SCHEDULER_ENV
                  value: hello-world
        service:
          type: ClusterIP
          selector:
            dask.org/cluster-name: simple-job
            dask.org/component: scheduler
          ports:
            - name: tcp-comm
              protocol: TCP
              port: 8786
              targetPort: "tcp-comm"
            - name: http-dashboard
              protocol: TCP
              port: 8787
              targetPort: "http-dashboard"

Anything else we need to know?:

The error during the submission:

spec.cluster.spec.scheduler.spec.containers[0].readinessProbe.httpGet.port: Invalid value: "integer": spec.cluster.spec.scheduler.spec.containers[0].readinessProbe.httpGet.port in body must be of type string: "integer"
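
The rejection above can be reproduced in miniature. The following is only a sketch of the idea, not the API server's validator: `matches` is a hypothetical helper that understands just `type` and `anyOf`, and the two schemas are assumptions standing in for a port field declared as a plain string versus one that allows an integer or a string.

```python
def matches(schema, value):
    """Check a value against a tiny subset of OpenAPI v3 (type and anyOf only)."""
    if "anyOf" in schema:
        return any(matches(sub, value) for sub in schema["anyOf"])
    expected = schema.get("type")
    if expected == "string":
        return isinstance(value, str)
    if expected == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, int) and not isinstance(value, bool)
    return True  # no constraint this sketch understands


# Assumed shape of the port field in the generated CRD (string only)
string_only = {"type": "string"}
# Assumed shape of a field that accepts either an integer or a string
int_or_string = {"anyOf": [{"type": "integer"}, {"type": "string"}]}

print(matches(string_only, "http-dashboard"))  # named port passes
print(matches(string_only, 8786))              # integer port is rejected
print(matches(int_or_string, 8786))            # integer port passes
```

With the string-only schema the integer 8786 fails the type check before Kubernetes ever looks at the probe itself, which is consistent with the "must be of type string" message.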

Environment:

  • Dask version: latest
  • Dask Kubernetes operator
mmourafiq changed the title from "Readiness/Liveness probes do not accespt integer port" to "Readiness/Liveness probes do not accept integer port" on Jul 25, 2023
@jacobtomlinson
Member

jacobtomlinson commented Jul 25, 2023

Thanks @mmourafiq. I'm not 100% sure where to look to resolve this because DaskJob.spec.cluster.spec.scheduler.spec is just an io.k8s.api.core.v1.PodSpec and should be validated exactly the same as any other Pod spec. We use k8s-crd-resolver to generate the CRDs from our templates.

I note that our CRD templates are referencing the Kubernetes 1.21.1 spec so perhaps bumping those to a more recent version would help?

$ref: 'python://k8s_crd_resolver/schemata/k8s-1.21.1.json#/definitions/io.k8s.api.core.v1.PodSpec'

@mmourafiq
Author

I see. I think the issue is in k8s-crd-resolver: the choice of intOrString from the machinery is probably not the correct one. I already tried port: "8786", but when the value is a string Kubernetes automatically tries to resolve it as a port name, i.e. with the quotes it complains that the value does not start with a character.
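
To illustrate why quoting the number does not help, here is a rough sketch of the int-versus-name distinction. It is an approximation based on the documented description of the field (number in the range 1 to 65535, name an IANA_SVC_NAME), not Kubernetes' actual validation code; both function names are hypothetical and the name rules are simplified.

```python
import re


def is_valid_port_name(name: str) -> bool:
    """Approximate IANA_SVC_NAME check (simplified, assumed rules)."""
    if not 1 <= len(name) <= 15:
        return False
    if not re.fullmatch(r"[a-z0-9-]+", name):
        return False
    if not re.search(r"[a-z]", name):
        return False  # a name needs at least one letter
    if name.startswith("-") or name.endswith("-") or "--" in name:
        return False
    return True


def is_valid_probe_port(port) -> bool:
    """Integers are treated as port numbers, strings as port names."""
    if isinstance(port, bool):
        return False
    if isinstance(port, int):
        return 1 <= port <= 65535
    if isinstance(port, str):
        return is_valid_port_name(port)
    return False


print(is_valid_probe_port(8786))              # number in range
print(is_valid_probe_port("http-dashboard"))  # valid port name
print(is_valid_probe_port("8786"))            # string without a letter, treated as a name
```

Under these assumed rules the quoted "8786" is interpreted as a name and fails because it contains no letter, so the only values that work are a real integer or a genuine port name.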

@jacobtomlinson
Member

We can patch things, and we already do in a couple of places. Maybe we should do that here? Do you know what type it should be instead of intOrString?

https://github.com/dask/dask-kubernetes/blob/main/dask_kubernetes/operator/customresources/daskcluster.patch.yaml

@mmourafiq
Author

Sorry for the late reply. I just checked the CRD generated by kubebuilder again, and intOrString is indeed the correct one. But the type needs to change from string to:

anyOf:
  - type: integer
  - type: string

Not sure if this is supported, but here's the full generated spec:

port:
  anyOf:
  - type: integer
  - type: string
  description: Name or number of the port to access
    on the container. Number must be in the range
    1 to 65535. Name must be an IANA_SVC_NAME.
  x-kubernetes-int-or-string: true

Hope this helps.
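
One way such a change could be applied mechanically is sketched below. This is not k8s-crd-resolver's API or the project's patch mechanism: `patch_int_or_string` is a hypothetical helper, and it assumes the resolved schema marks these fields as `type: string` with `format: int-or-string`, which would need to be checked against the actual generated CRD.

```python
def patch_int_or_string(node):
    """Return a copy of a schema with int-or-string fields rewritten to anyOf."""
    if isinstance(node, dict):
        # Assumed marker for int-or-string fields in the resolved schema
        if node.get("type") == "string" and node.get("format") == "int-or-string":
            patched = {k: v for k, v in node.items() if k not in ("type", "format")}
            patched["anyOf"] = [{"type": "integer"}, {"type": "string"}]
            patched["x-kubernetes-int-or-string"] = True
            return patched
        return {k: patch_int_or_string(v) for k, v in node.items()}
    if isinstance(node, list):
        return [patch_int_or_string(item) for item in node]
    return node


# Hypothetical fragment of a probe schema, for illustration only
schema = {
    "properties": {
        "httpGet": {
            "properties": {
                "path": {"type": "string"},
                "port": {
                    "type": "string",
                    "format": "int-or-string",
                    "description": "Name or number of the port to access on the container.",
                },
            }
        }
    }
}

patched = patch_int_or_string(schema)
print(patched["properties"]["httpGet"]["properties"]["port"])
```

The function builds new dictionaries rather than mutating its input, so the original schema stays available for comparison, and ordinary string fields such as `path` are left untouched.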

P.S. I reworked the converter in our application to use the port name (string) instead of the port number (int), but other users could hit the same issue, and it can easily cost about an hour of debugging :)
