Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(race): change heritage label for every pod launch #206

Merged
merged 1 commit into from
Feb 28, 2016

Conversation

smothiki
Copy link
Contributor

Having same labels for every slugbuilder launch makes GC to garbage collect.
This PR fixes terminating behavior for pod. Also we haven't pin pointed the root cause for this behavior.
Listening for event stream wont help in fixing this issue.

@smothiki
Copy link
Contributor Author

fixes #199

@smothiki smothiki added this to the v2.0-beta1 milestone Feb 27, 2016
@smothiki smothiki self-assigned this Feb 27, 2016
@slack
Copy link
Member

slack commented Feb 27, 2016

Pretty hilarious hack. Really wanted to understand why this works. Did some spelunking.

So Pod GC should trigger when the deleted pod threshold exceeds 12500 pods: https://github.com/kubernetes/kubernetes/blob/v1.1.2/cmd/kube-controller-manager/app/controllermanager.go#L126

Probably explains why we don't see this as much on lightly used, or new clusters. The "fast delete" behavior will most likely show up after you get 12k pods in the stack...

There is a pod sync process that sucks in terminated pods on a loop: https://github.com/kubernetes/kubernetes/blob/v1.1.2/pkg/controller/gc/gc_controller.go#L67-L74

It feels like pod listing + sorting is impacted by labels in a non-obvious way. And once the deletion starts happening, the GC process gets the pods in a funny order, most recent pods at the top of the list?

Non-overlapping labels might just be moving our pods out of the firing line:

https://github.com/kubernetes/kubernetes/blob/v1.1.2/pkg/controller/gc/gc_controller.go#L89-L91

Haven't yet teased out the interaction between podStore.List and labels.Everything()...

@slack slack added the LGTM1 label Feb 27, 2016
@slack
Copy link
Member

slack commented Feb 27, 2016

The more I think about this, the more I'm convinced that this probably isn't desired k8s behavior.

@slack
Copy link
Member

slack commented Feb 27, 2016

Original conversation upstream:

@mboersma
Copy link
Member

OMG YOU WEREN'T KIDDING. 💥 I thought that had to be unrelated. Wow.

So could we leave "heritage": "deis" and instead add "gc-hack": name to get the same behavior?

@mboersma
Copy link
Member

could we leave "heritage": "deis"...

This might work, and seems more descriptive to me:

diff --git a/pkg/gitreceive/k8s_util.go b/pkg/gitreceive/k8s_util.go
index bfa863b..1fac569 100644
--- a/pkg/gitreceive/k8s_util.go
+++ b/pkg/gitreceive/k8s_util.go
@@ -96,6 +96,7 @@ func buildPod(debug, withAuth bool, name, namespace string, env map[string]inter
                        Name:      name,
                        Namespace: namespace,
                        Labels: map[string]string{
+                               "gc-hack":  name,
                                "heritage": "deis",
                                "version":  "2.0.0-beta",
                        },

But I'm ok with this PR as-is given the urgency of getting a fix for builder problems.

@mboersma mboersma added the LGTM2 label Feb 28, 2016
smothiki pushed a commit that referenced this pull request Feb 28, 2016
feat(race): change heritage label for every pod launch
@smothiki smothiki merged commit 7330a6d into deis:master Feb 28, 2016
@arschles
Copy link
Member

👍 awesome you guys got to the bottom of this!

@arschles
Copy link
Member

I've proposed a more comprehensive solution for the future, which would obviously be more involved than this. See #207

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants