Scheduler ignores nodes that are in a bad state #7668
Conversation
/sub
// We currently only use a conditionType of "Ready". If the kubelet doesn't
// periodically report the status of a node, the nodecontroller sets its
// ConditionStatus to "Unknown". If the kubelet thinks a node is unhealthy
// it can (in theory) set it its ConditionStatus to "False".
s/it its/its/
// We currently only use a conditionType of "Ready". If the kubelet doesn't
// periodically report the status of a node, the nodecontroller sets its
// ConditionStatus to "Unknown". If the kubelet thinks a node is unhealthy
s/kubelet/node controller/
The nodecontroller sets it to "Unknown" when the kubelet hasn't updated it in a long time; the kubelet usually sets it to something sensible.
I see, thanks.
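For readers skimming the thread: the change under review filters nodes on their "Ready" condition. Below is a minimal, self-contained sketch of that idea; the types here are simplified stand-ins invented for illustration, not the actual Kubernetes API definitions.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the API types under discussion.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

type NodeCondition struct {
	Kind   string          // only "Ready" is consulted here
	Status ConditionStatus // "True", "False", or "Unknown"
}

type Node struct {
	Name       string
	Conditions []NodeCondition
}

// schedulable reports whether a node's "Ready" condition is "True".
// "Unknown" (set by the nodecontroller when the kubelet stops reporting)
// and "False" (set by the kubelet itself) both exclude the node.
func schedulable(n Node) bool {
	for _, c := range n.Conditions {
		if c.Kind == "Ready" {
			return c.Status == ConditionTrue
		}
	}
	return false // no Ready condition recorded; this sketch skips such nodes
}

func main() {
	nodes := []Node{
		{Name: "healthy", Conditions: []NodeCondition{{Kind: "Ready", Status: ConditionTrue}}},
		{Name: "silent", Conditions: []NodeCondition{{Kind: "Ready", Status: ConditionUnknown}}},
		{Name: "sick", Conditions: []NodeCondition{{Kind: "Ready", Status: ConditionFalse}}},
	}
	for _, n := range nodes {
		fmt.Printf("%s: schedulable=%v\n", n.Name, schedulable(n))
	}
}
```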
LGTM sans nits -- can you add a test in test/integration/ somewhere?
Doing, wanted to check if this was ok.
Force-pushed from e06a218 to 76b54ef
Sorry for the delay; surfacing failures due to watch timeouts in the integration tests was a slight pain, but I didn't want to check in something that flaked without reason on Shippable, given the weird network latencies we've observed in the past. PTAL, running e2e.
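As an aside on what "surfacing failures due to watch timeout" can look like: one common pattern is a select against a deadline, so a missing event becomes an explicit error rather than a hang. A generic sketch of that pattern, with a hypothetical Event type standing in for the client's watch machinery, not the PR's actual test code:

```go
package main

import (
	"fmt"
	"time"
)

// Event is a hypothetical stand-in for one update from a watch.
type Event struct{ Type, Object string }

// waitForEvent turns a missing event into an explicit error so a flaky
// environment yields a clear failure message instead of a silent hang.
func waitForEvent(ch <-chan Event, timeout time.Duration) (Event, error) {
	select {
	case ev, ok := <-ch:
		if !ok {
			return Event{}, fmt.Errorf("watch channel closed unexpectedly")
		}
		return ev, nil
	case <-time.After(timeout):
		return Event{}, fmt.Errorf("timed out after %v waiting for watch event", timeout)
	}
}

func main() {
	ch := make(chan Event, 1)
	ch <- Event{Type: "ADDED", Object: "pod-x"}
	fmt.Println(waitForEvent(ch, 50*time.Millisecond))
}
```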
E2e passed.
// Wait till the passFunc confirms that the object it expects to see is in the store.
// Used to observe reflected events.
func waitForReflection(s cache.Store, key string, passFunc func(n interface{}) bool) error {
	return wait.Poll(time.Second, time.Second*20, func() (bool, error) {
An entire second between checks in a test? Is it really that slow?
I can bring it down; I copied the polling interval from the scheduler test, but it probably isn't really that slow. In most cases on localhost it's done on the first poll.
Changed to 10ms
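For concreteness, here is what the adjusted helper might look like with the 10ms interval. The loop body (look the key up in the store, then run passFunc) is an assumption filled in around the truncated snippet above, not the PR's exact code, and the import paths shown are the modern ones; the 2015 tree used github.com/GoogleCloudPlatform/kubernetes/pkg/... instead.

```go
package integration // hypothetical package name for this sketch

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/tools/cache"
)

// waitForReflection polls every 10ms (down from 1s) until passFunc accepts
// the object reflected into the store, or the 20s budget runs out.
func waitForReflection(s cache.Store, key string, passFunc func(n interface{}) bool) error {
	return wait.Poll(10*time.Millisecond, 20*time.Second, func() (bool, error) {
		obj, exists, err := s.GetByKey(key)
		if err != nil {
			return false, err
		}
		if !exists {
			return false, nil // not reflected yet; keep polling
		}
		return passFunc(obj), nil
	})
}
```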
Small suggestions. Basically fine.
Force-pushed from 9c224e6 to d93f74d
Addressed comments, PTAL.
LGTM
Force-pushed from d93f74d to 51365a4
Though it's unlikely, I just realized the node controller can hit the pod eviction timeout and delete the pod we're expecting to see scheduled before we see it, because we poll. Hrm. The right way to solve this would be to watch for events. @lavalamp wdyt about doing that in this PR vs a follow-up? Edit: the comment above was about the integration test, not the actual code change.
Filed #7874 for Shippable.
You're worried about the test accidentally passing because of that? You could disable the node controller for the test, or switch to events like you say.
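To make the poll-versus-watch concern concrete: a watch delivers every intermediate state, so a pod that is scheduled and then evicted between polls is still observed. A generic sketch of that idea, with a hypothetical PodEvent type standing in for the real watch stream:

```go
package main

import (
	"fmt"
	"time"
)

// PodEvent is a hypothetical stand-in for one update from a pod watch.
type PodEvent struct {
	Name     string
	NodeName string // non-empty once the pod has been scheduled
}

// sawScheduled consumes watch events until it sees podName bound to a node
// or the deadline passes. Unlike polling, it cannot miss a pod that is
// scheduled and then deleted between observations.
func sawScheduled(events <-chan PodEvent, podName string, timeout time.Duration) bool {
	deadline := time.After(timeout)
	for {
		select {
		case ev := <-events:
			if ev.Name == podName && ev.NodeName != "" {
				return true // scheduling observed, even if the pod is evicted later
			}
		case <-deadline:
			return false
		}
	}
}

func main() {
	events := make(chan PodEvent, 2)
	events <- PodEvent{Name: "test-pod", NodeName: "node-1"} // scheduled...
	// ...a later eviction/DELETE event would not matter to the observer.
	fmt.Println(sawScheduled(events, "test-pod", 100*time.Millisecond))
}
```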
Force-pushed from 51365a4 to 4b0607c
False alarm, I got confused with the other integration suite; this one doesn't start a nodecontroller to begin with. So I guess this is good to go when things pass; I only made the name of the node/pod more unique to help with debugging in my latest upload.
@lavalamp can this be merged?
Scheduler ignores nodes that are in a bad state
Yeah I was just waiting for it to turn green.
The scheduler should ignore unhealthy nodes.
ref #7222, #7561