feat(race): change heritage label for every pod launch #206
fixes #199 |
Pretty hilarious hack. Really wanted to understand why this works. Did some spelunking.

So Pod GC should trigger when the deleted pod threshold exceeds 12500 pods: https://github.com/kubernetes/kubernetes/blob/v1.1.2/cmd/kube-controller-manager/app/controllermanager.go#L126

Probably explains why we don't see this as much on lightly used, or new clusters. The "fast delete" behavior will most likely show up after you get 12k pods in the stack...

There is a pod sync process that sucks in terminated pods on a loop: https://github.com/kubernetes/kubernetes/blob/v1.1.2/pkg/controller/gc/gc_controller.go#L67-L74

It feels like pod listing + sorting is impacted by labels in a non-obvious way. And once the deletion starts happening, the GC process gets the pods in a funny order, most recent pods at the top of the list? Non-overlapping labels might just be moving our pods out of the firing line: https://github.com/kubernetes/kubernetes/blob/v1.1.2/pkg/controller/gc/gc_controller.go#L89-L91

Haven't yet teased out the interaction between |
The more I think about this, the more I'm convinced that this probably isn't desired k8s behavior. |
Original conversation upstream:
|
OMG YOU WEREN'T KIDDING. 💥 I thought that had to be unrelated. Wow. So could we leave |
This might work, and seems more descriptive to me:

diff --git a/pkg/gitreceive/k8s_util.go b/pkg/gitreceive/k8s_util.go
index bfa863b..1fac569 100644
--- a/pkg/gitreceive/k8s_util.go
+++ b/pkg/gitreceive/k8s_util.go
@@ -96,6 +96,7 @@ func buildPod(debug, withAuth bool, name, namespace string, env map[string]inter
                        Name:      name,
                        Namespace: namespace,
                        Labels: map[string]string{
+                               "gc-hack":  name,
                                "heritage": "deis",
                                "version":  "2.0.0-beta",
                        },

But I'm ok with this PR as-is given the urgency of getting a fix for builder problems. |
👍 awesome you guys got to the bottom of this! |
I've proposed a more comprehensive solution for the future, which would obviously be more involved than this. See #207 |
Using the same labels for every slugbuilder launch causes the GC to garbage-collect the pods.
This PR fixes the terminating behavior for the pod, although we haven't pinpointed the root cause of that behavior.
Listening to the event stream won't help fix this issue.