v1.30: kube-scheduler crashes with: Observed a panic: "integer divide by zero" #124930

Closed
mikkeloscar opened this issue May 17, 2024 · 17 comments · Fixed by #124933
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/regression Categorizes issue or PR as related to a regression from a prior release. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@mikkeloscar
Contributor

mikkeloscar commented May 17, 2024

What happened?

On Kubernetes v1.30.0 (and v1.30.1), kube-scheduler can crash with the following panic if a pod is defined in a certain way:

W0514 09:09:41.391780       1 feature_gate.go:246] Setting GA feature gate MinDomainsInPodTopologySpread=true. It will be removed in a future release.
I0514 09:09:43.191448       1 serving.go:380] Generated self-signed cert in-memory
W0514 09:09:43.574824       1 authentication.go:446] failed to read in-cluster kubeconfig for delegated authentication: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0514 09:09:43.574862       1 authentication.go:339] No authentication-kubeconfig provided in order to lookup client-ca-file in configmap/extension-apiserver-authentication in kube-system, so client certificate authentication won't work.
W0514 09:09:43.574871       1 authentication.go:363] No authentication-kubeconfig provided in order to lookup requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
W0514 09:09:43.574885       1 authorization.go:225] failed to read in-cluster kubeconfig for delegated authorization: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0514 09:09:43.574992       1 authorization.go:193] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work.
I0514 09:09:43.586307       1 server.go:154] "Starting Kubernetes Scheduler" version="v1.30.0-zalando-master-117-dirty"
I0514 09:09:43.586327       1 server.go:156] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0514 09:09:43.587758       1 secure_serving.go:213] Serving securely on [::]:10259
I0514 09:09:43.587944       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0514 09:09:43.688733       1 leaderelection.go:250] attempting to acquire leader lease kube-system/kube-scheduler...
I0514 09:09:59.030033       1 leaderelection.go:260] successfully acquired lease kube-system/kube-scheduler
E0514 09:09:59.122043       1 runtime.go:79] Observed a panic: "integer divide by zero" (runtime error: integer divide by zero)
goroutine 330 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1b68bc0, 0x35e1c40})
	k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x7c
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x4000be8e00?})
	k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x78
panic({0x1b68bc0?, 0x35e1c40?})
	runtime/panic.go:770 +0x124
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).findNodesThatFitPod(0x40000d6900, {0x22aae08, 0x40012b9630}, {0x22d2bb8, 0x4000320fc8}, 0x4000347dc0, 0x4000cb4488)
	k8s.io/kubernetes/pkg/scheduler/schedule_one.go:502 +0x88c
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulePod(0x40000d6900, {0x22aae08, 0x40012b9630}, {0x22d2bb8, 0x4000320fc8}, 0x4000347dc0, 0x4000cb4488)
	k8s.io/kubernetes/pkg/scheduler/schedule_one.go:402 +0x25c
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulingCycle(0x40000d6900, {0x22aae08, 0x40012b9630}, 0x4000347dc0, {0x22d2bb8, 0x4000320fc8}, 0x4000c92e10, {0x2?, 0x2?, 0x36352c0?}, ...)
	k8s.io/kubernetes/pkg/scheduler/schedule_one.go:149 +0xb8
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).ScheduleOne(0x40000d6900, {0x22aae08, 0x40012b83c0})
	k8s.io/kubernetes/pkg/scheduler/schedule_one.go:111 +0x4c0
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
	k8s.io/apimachinery/pkg/util/wait/backoff.go:259 +0x2c
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x400161dec8?)
	k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x40
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x400161df68, {0x2286fe0, 0x4001398b70}, 0x1, 0x40012bab40)
	k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0x90
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x4000669f68, 0x0, 0x0, 0x1, 0x40012bab40)
	k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x80
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext({0x22aae08, 0x40012b83c0}, 0x4001385720, 0x0, 0x0, 0x1)
	k8s.io/apimachinery/pkg/util/wait/backoff.go:259 +0x80
k8s.io/apimachinery/pkg/util/wait.UntilWithContext(...)
	k8s.io/apimachinery/pkg/util/wait/backoff.go:170
created by k8s.io/kubernetes/pkg/scheduler.(*Scheduler).Run in goroutine 352
	k8s.io/kubernetes/pkg/scheduler/scheduler.go:445 +0x104
panic: runtime error: integer divide by zero [recovered]
	panic: runtime error: integer divide by zero

goroutine 330 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x4000be8e00?})
	k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xe0
panic({0x1b68bc0?, 0x35e1c40?})
	runtime/panic.go:770 +0x124
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).findNodesThatFitPod(0x40000d6900, {0x22aae08, 0x40012b9630}, {0x22d2bb8, 0x4000320fc8}, 0x4000347dc0, 0x4000cb4488)
	k8s.io/kubernetes/pkg/scheduler/schedule_one.go:502 +0x88c
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulePod(0x40000d6900, {0x22aae08, 0x40012b9630}, {0x22d2bb8, 0x4000320fc8}, 0x4000347dc0, 0x4000cb4488)
	k8s.io/kubernetes/pkg/scheduler/schedule_one.go:402 +0x25c
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulingCycle(0x40000d6900, {0x22aae08, 0x40012b9630}, 0x4000347dc0, {0x22d2bb8, 0x4000320fc8}, 0x4000c92e10, {0x2?, 0x2?, 0x36352c0?}, ...)
	k8s.io/kubernetes/pkg/scheduler/schedule_one.go:149 +0xb8
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).ScheduleOne(0x40000d6900, {0x22aae08, 0x40012b83c0})
	k8s.io/kubernetes/pkg/scheduler/schedule_one.go:111 +0x4c0
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
	k8s.io/apimachinery/pkg/util/wait/backoff.go:259 +0x2c
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x400161dec8?)
	k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x40
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x400161df68, {0x2286fe0, 0x4001398b70}, 0x1, 0x40012bab40)
	k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0x90
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x4000669f68, 0x0, 0x0, 0x1, 0x40012bab40)
	k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x80
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext({0x22aae08, 0x40012b83c0}, 0x4001385720, 0x0, 0x0, 0x1)
	k8s.io/apimachinery/pkg/util/wait/backoff.go:259 +0x80
k8s.io/apimachinery/pkg/util/wait.UntilWithContext(...)
	k8s.io/apimachinery/pkg/util/wait/backoff.go:170
created by k8s.io/kubernetes/pkg/scheduler.(*Scheduler).Run in goroutine 352
	k8s.io/kubernetes/pkg/scheduler/scheduler.go:445 +0x104

The crash happens here because `len(nodes)` is 0 in certain cases.
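The failing pattern can be sketched in isolation. This is a minimal, hypothetical simplification of the round-robin index update shown later in the thread (`sched.nextStartNodeIndex = (sched.nextStartNodeIndex + processedNodes) % len(nodes)`), not the actual scheduler code; it shows how a modulo by an empty node list panics, and the guard that avoids it:

```go
package main

import "fmt"

// nextStartIndex mimics the scheduler's round-robin cursor update
// (hypothetical, simplified). The unpatched code effectively computed
// (current + processed) % numNodes with no guard, so when every node
// was filtered out (numNodes == 0) the modulo panicked with
// "runtime error: integer divide by zero".
func nextStartIndex(current, processed, numNodes int) int {
	if numNodes == 0 { // guard the unpatched code was missing
		return 0
	}
	return (current + processed) % numNodes
}

func main() {
	fmt.Println(nextStartIndex(3, 2, 5)) // normal case
	fmt.Println(nextStartIndex(3, 2, 0)) // empty node list: returns 0 instead of panicking
}
```

In Go, both integer division and the modulo operator panic at runtime when the divisor is zero, which is why the scheduler process crashes outright rather than logging an error.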

What did you expect to happen?

kube-scheduler should not crash; on v1.29.4 this doesn't happen.

On v1.29.4 the kube-scheduler is just printing these error logs, but doesn't crash:

E0517 13:19:26.611504       1 schedule_one.go:1003] "Error scheduling pod; retrying" err="nodeinfo not found for node name \"invalid-node\"" pod="default/break-kube-scheduler"

How can we reproduce it (as minimally and precisely as possible)?

Create a pod like this:

apiVersion: v1
kind: Pod
metadata:
  name: break-kube-scheduler
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - invalid-node # a node that doesn't exist
  containers:
  - name: main
    image: alpine
    command: ["cat"]
    stdin: true

The important part is that the affinity doesn't match a real/valid node.

Once this pod is processed by kube-scheduler, the scheduler crashes and keeps crashing until the pod is deleted.

Anything else we need to know?

This issue was triggered during the rotation of control-plane nodes. In our setup we update the control plane by scaling from 1 to 2 instances. Once both are ready, the old one is terminated via the EC2 API. At this stage the kube-controller-manager sometimes manages to create a replacement DaemonSet pod while the old node is being deleted. This results in a pod targeting a no-longer-existing node via affinity, as illustrated in the example above. For some reason, while kube-scheduler is crashing, kube-controller-manager doesn't delete the extra/invalid DaemonSet pod. I'm not sure whether this is a separate issue or has always happened in our setup, but only on v1.30 does it crash kube-scheduler, which causes an actual problem.

Kubernetes version

$ kubectl version
Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.1

Cloud provider

We run a custom Kubernetes setup on AWS EC2 (not EKS).

OS version

# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ uname -a
Linux ip-10-149-91-92 6.5.0-1020-aws #20~22.04.1-Ubuntu SMP Wed May  1 16:38:06 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux


Install tools

Custom tooling: https://github.com/zalando-incubator/kubernetes-on-aws


@mikkeloscar mikkeloscar added the kind/bug Categorizes issue or PR as related to a bug. label May 17, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 17, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.


@mikkeloscar
Contributor Author

/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 17, 2024
@liggitt liggitt added the kind/regression Categorizes issue or PR as related to a regression from a prior release. label May 17, 2024
@liggitt
Member

liggitt commented May 17, 2024

cc @alculquicondor
/priority important-soon

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label May 17, 2024
@alculquicondor
Member

@chengjoey @AxeZhan can you take a look?

I believe all the latest patch releases are affected too, because of #124559

@alculquicondor
Member

cc @sanposhiho

@alculquicondor
Member

/priority critical-urgent

@k8s-ci-robot k8s-ci-robot added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label May 17, 2024
@AxeZhan
Member

AxeZhan commented May 17, 2024

This crash happens when PreFilter plugins have filtered out all nodes ...

@alculquicondor

sched.nextStartNodeIndex = (sched.nextStartNodeIndex + processedNodes) % len(nodes)

If PreFilter filtered out some nodes, `nodes` here is a subset of `allNodes`.
Why are we using this subset in the first place? I think `% len(allNodes)` should work fine (it seems logical to me).

@alculquicondor
Member

I think so. In general, we just want to try a different set of nodes.

For DaemonSets it doesn't really matter, as we will just test one node.
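The fix direction discussed above can be sketched as follows. This is a hypothetical simplification (the real change is in PR #124933, whose exact code is not shown here): advance the cursor modulo the full node list, which is non-empty whenever the cluster has nodes, instead of modulo the PreFilter-reduced subset, which can be empty:

```go
package main

import "fmt"

// advanceCursor sketches the proposed fix: use len(allNodes) as the
// modulo base rather than the length of the filtered subset. The
// filtered subset can be empty (all nodes rejected by PreFilter),
// but allNodes is only empty when the cluster has no nodes at all,
// which we still guard against.
func advanceCursor(cursor, processed int, allNodes []string) int {
	if len(allNodes) == 0 {
		return 0
	}
	return (cursor + processed) % len(allNodes)
}

func main() {
	all := []string{"node-a", "node-b", "node-c"}
	// Even if PreFilter left zero feasible nodes (processed == 0),
	// the modulo base is len(allNodes) == 3, so there is no divide by zero.
	fmt.Println(advanceCursor(2, 0, all))
	fmt.Println(advanceCursor(2, 4, all))
}
```

As noted above, which set is used barely matters for fairness: the goal is just to start each cycle from a different point in the node list.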

@AxeZhan
Member

AxeZhan commented May 17, 2024

/assign

Successfully caught this with a unit test.
I'll file a small PR with the fix from #124930 (comment) and a newly added unit test.

@shapirus

shapirus commented May 21, 2024

I am having the same issue with 1.28.10 on a clean new cluster created with kops 1.28.5:

I0521 14:15:49.659515      10 leaderelection.go:260] successfully acquired lease kube-system/kube-scheduler
I0521 14:15:49.659839      10 schedule_one.go:80] "About to try and schedule pod" pod="kube-system/aws-cloud-controller-manager-rvbs8"
I0521 14:15:49.659903      10 schedule_one.go:93] "Attempting to schedule pod" pod="kube-system/aws-cloud-controller-manager-rvbs8"
E0521 14:15:49.660114      10 runtime.go:79] Observed a panic: "integer divide by zero" (runtime error: integer divide by zero)
goroutine 426 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x19efce0?, 0x33010c0})
        k8s.io/apimachinery@v0.0.0/pkg/util/runtime/runtime.go:75 +0x7c
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
        k8s.io/apimachinery@v0.0.0/pkg/util/runtime/runtime.go:49 +0x78
panic({0x19efce0?, 0x33010c0?})
        runtime/panic.go:914 +0x218
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).findNodesThatFitPod(0x400067f2c0, {0x20ccc68, 0x40004d6e60}, {0x20f09b0, 0x400019f600}, 0x0?, 0x4000ccb200)
        k8s.io/kubernetes/pkg/scheduler/schedule_one.go:491 +0xa3c
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulePod(0x400067f2c0, {0x20ccc68, 0x40004d6e60}, {0x20f09b0, 0x400019f600}, 0x28368?, 0x4000ccb200)
        k8s.io/kubernetes/pkg/scheduler/schedule_one.go:383 +0x280
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulingCycle(0x400067f2c0, {0x20ccc68, 0x40004d6e60}, 0x1b1ff00?, {0x20f09b0, 0x400019f600}, 0x4000ccee80, {0x2?, 0x4349a76c8302d44?, 0x333cd20?}, ...)
        k8s.io/kubernetes/pkg/scheduler/schedule_one.go:145 +0xa8
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).scheduleOne(0x400067f2c0, {0x20ccc68?, 0x40004d6be0})
        k8s.io/kubernetes/pkg/scheduler/schedule_one.go:107 +0x348
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
        k8s.io/apimachinery@v0.0.0/pkg/util/wait/backoff.go:259 +0x30
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x6b68769b60b6b490?)
        k8s.io/apimachinery@v0.0.0/pkg/util/wait/backoff.go:226 +0x40
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x34c3f10eb615109?, {0x20aa080, 0x4000928120}, 0x1, 0x4000109740)
        k8s.io/apimachinery@v0.0.0/pkg/util/wait/backoff.go:227 +0x90
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xe69071c31329cd3a?, 0x0, 0x0, 0x21?, 0xd0734d7fcbfb2b8e?)
        k8s.io/apimachinery@v0.0.0/pkg/util/wait/backoff.go:204 +0x80
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext({0x20ccc68, 0x40004d6be0}, 0x40008fee80, 0xefb95d5a1b7ffc98?, 0x4840affac97ee6ca?, 0x21?)
        k8s.io/apimachinery@v0.0.0/pkg/util/wait/backoff.go:259 +0x80
k8s.io/apimachinery/pkg/util/wait.UntilWithContext({0x20ccc68?, 0x40004d6be0?}, 0x0?, 0x75d6674e3c776941?)
        k8s.io/apimachinery@v0.0.0/pkg/util/wait/backoff.go:170 +0x2c
created by k8s.io/kubernetes/pkg/scheduler.(*Scheduler).Run in goroutine 421
        k8s.io/kubernetes/pkg/scheduler/scheduler.go:406 +0xfc
panic: runtime error: integer divide by zero [recovered]
        panic: runtime error: integer divide by zero

goroutine 426 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
        k8s.io/apimachinery@v0.0.0/pkg/util/runtime/runtime.go:56 +0xe0
panic({0x19efce0?, 0x33010c0?})
        runtime/panic.go:914 +0x218
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).findNodesThatFitPod(0x400067f2c0, {0x20ccc68, 0x40004d6e60}, {0x20f09b0, 0x400019f600}, 0x0?, 0x4000ccb200)
        k8s.io/kubernetes/pkg/scheduler/schedule_one.go:491 +0xa3c
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulePod(0x400067f2c0, {0x20ccc68, 0x40004d6e60}, {0x20f09b0, 0x400019f600}, 0x28368?, 0x4000ccb200)
        k8s.io/kubernetes/pkg/scheduler/schedule_one.go:383 +0x280
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulingCycle(0x400067f2c0, {0x20ccc68, 0x40004d6e60}, 0x1b1ff00?, {0x20f09b0, 0x400019f600}, 0x4000ccee80, {0x2?, 0x4349a76c8302d44?, 0x333cd20?}, ...)
        k8s.io/kubernetes/pkg/scheduler/schedule_one.go:145 +0xa8
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).scheduleOne(0x400067f2c0, {0x20ccc68?, 0x40004d6be0})
        k8s.io/kubernetes/pkg/scheduler/schedule_one.go:107 +0x348
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
        k8s.io/apimachinery@v0.0.0/pkg/util/wait/backoff.go:259 +0x30
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x6b68769b60b6b490?)
        k8s.io/apimachinery@v0.0.0/pkg/util/wait/backoff.go:226 +0x40
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x34c3f10eb615109?, {0x20aa080, 0x4000928120}, 0x1, 0x4000109740)
        k8s.io/apimachinery@v0.0.0/pkg/util/wait/backoff.go:227 +0x90
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xe69071c31329cd3a?, 0x0, 0x0, 0x21?, 0xd0734d7fcbfb2b8e?)
        k8s.io/apimachinery@v0.0.0/pkg/util/wait/backoff.go:204 +0x80
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext({0x20ccc68, 0x40004d6be0}, 0x40008fee80, 0xefb95d5a1b7ffc98?, 0x4840affac97ee6ca?, 0x21?)
        k8s.io/apimachinery@v0.0.0/pkg/util/wait/backoff.go:259 +0x80
k8s.io/apimachinery/pkg/util/wait.UntilWithContext({0x20ccc68?, 0x40004d6be0?}, 0x0?, 0x75d6674e3c776941?)
        k8s.io/apimachinery@v0.0.0/pkg/util/wait/backoff.go:170 +0x2c
created by k8s.io/kubernetes/pkg/scheduler.(*Scheduler).Run in goroutine 421
        k8s.io/kubernetes/pkg/scheduler/scheduler.go:406 +0xfc

The pod that it's trying to create is pending with the following messages:

Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  13m (x141 over 38m)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
  Warning  FailedScheduling  9m30s                default-scheduler  0/2 nodes are available: 1 Insufficient cpu, 1 node is filtered out by the prefilter result. preemption: 0/2 nodes are available: 1 No preemption victims found for incoming pod, 1 Preemption is not helpful for scheduling..

(just for the sake of being indexed for those who will search for the same error to find this github issue)

@AxeZhan
Member

AxeZhan commented May 21, 2024

Fix is on the way: #124933

Just waiting for reviews from approver.

After this PR is merged, I'll do cherry-picks for v1.28 (1.27?) through v1.30.

@AxeZhan
Member

AxeZhan commented May 22, 2024

cherry-picks:
#125039
#125041
#125042
#125043

@sara-hann

sara-hann commented May 28, 2024

@AxeZhan @xmudrii Thanks for fixing this so fast. How do I pull the new code? I see that your PR was merged, but I don't see a new release to pull. I'm looking for the 1.30 fix.

@xmudrii
Member

xmudrii commented May 28, 2024

@sara-hann This change will be included in the upcoming patch releases, scheduled for 2024-06-11. You can check out the Patch Release page for more details.

@dimm0

dimm0 commented May 28, 2024

I'm seeing the issue after upgrading v1.26.11 -> v1.27.14.
Should I search for a broken pod? Or what else could be causing this?

@alculquicondor
Member

Yes, a broken pod could be causing this. You can also try downgrading to v1.27.13

@dimm0

dimm0 commented May 28, 2024

Phew, manually changing the scheduler version to 1.27.13 fixed the issue and showed the broken pod in logs. Thanks!
