kubelet: Introduce PodInfraContainerChanged(). #6608

Merged: 2 commits merged on Apr 14, 2015

Contributor

@yifan-gu yifan-gu commented Apr 9, 2015

@vmarmol @yujuhong
This new function computes ahead of time whether we need to restart the pod
infra container.

Split out from #6169 to make review smoother.

Tests are broken, fixing...

/cc @jonboulle
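
For readers following along, here is a minimal sketch of what "computing ahead of time whether to restart the pod infra container" could look like. The `infraStatus` type and `shouldRestartInfra` function are hypothetical names invented for illustration; they are not the identifiers used in this PR, and the real kubelet logic may weigh more conditions.

```go
package main

import "fmt"

// infraStatus is a hypothetical summary of what the runtime reports about a
// pod's infra container. It is an assumption for this sketch, not a type
// from the PR.
type infraStatus struct {
	Exists  bool // an infra container was found for the pod
	Running bool // that container is currently running
	Changed bool // the running container no longer matches the desired spec
}

// shouldRestartInfra decides, before any runtime call is made, whether the
// infra container has to be created or recreated.
func shouldRestartInfra(s infraStatus) bool {
	if !s.Exists || !s.Running {
		return true
	}
	return s.Changed
}

func main() {
	healthy := infraStatus{Exists: true, Running: true}
	stale := infraStatus{Exists: true, Running: true, Changed: true}
	fmt.Println(shouldRestartInfra(healthy), shouldRestartInfra(stale))
}
```

Making the decision up front keeps the sync step free of interleaved checks: the caller can first decide what to do and only then talk to the runtime.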

DockerPrefix = "docker://"
PodInfraContainerName = leaky.PodInfraContainerName
DockerPrefix = "docker://"
PodInfraContainerImage = "gcr.io/google_containers/pause:0.8.0"
Contributor

There is a flag for this; we should use it instead of this value (the old version was unused, I believe).

Contributor Author

Gotcha

Contributor Author

@vmarmol I have checked this. The flag is stored in the kubelet; I think we should move it to the DockerManager?

Contributor

SGTM, the PR was merged :)

@yifan-gu
Contributor Author

yifan-gu commented Apr 9, 2015

@vmarmol Hmm, I found that comparing the containers directly makes the fake Docker client much more complex, since I need to construct a valid docker.Container in StartContainer and CreateContainer. What are the arguments against using the hash, actually?

@vmarmol
Contributor

vmarmol commented Apr 9, 2015

@yifan-gu I'm fine going the hash route since we use that today, but it just seems out of place when we do equality checks in other places.

@yifan-gu
Contributor Author

yifan-gu commented Apr 9, 2015

@vmarmol Could you point me to some places where we do equals, please? I didn't find one; maybe it's done differently and better than what I have.

@vmarmol
Contributor

vmarmol commented Apr 9, 2015

@yifan-gu we do it for PodStatus and for MirrorPod

yifan-gu force-pushed the infra_changed branch 3 times, most recently from da74906 to e1299f6, April 10, 2015 00:22
@yifan-gu
Contributor Author

@vmarmol I see. Ideally this can be done in the future when we can reconstruct api.Container from the container runtime.

BTW, I fixed all the tests. To make review easier for you, I put those fixes in the second commit :) I am running e2e now to see whether this change works correctly.
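
To illustrate the hash-versus-equals trade-off discussed above, here is a rough sketch of hash-based change detection. Everything in it is an assumption made for illustration: `containerSpec` is a trimmed-down stand-in for api.Container, and `hashSpec` and `specChanged` are hypothetical helpers, not the functions the kubelet uses.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// containerSpec is a hypothetical, trimmed-down stand-in for api.Container.
type containerSpec struct {
	Name  string
	Image string
	Ports []int
}

// hashSpec returns a hash of the fields that define the container.
// The field encoding here is an arbitrary choice for this sketch.
func hashSpec(c containerSpec) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%q|%q|%v", c.Name, c.Image, c.Ports)
	return h.Sum64()
}

// specChanged compares the desired spec against the hash that was recorded
// when the container was created. Only the hash has to survive the round
// trip through the runtime, not the full spec.
func specChanged(desired containerSpec, recordedHash uint64) bool {
	return hashSpec(desired) != recordedHash
}

func main() {
	spec := containerSpec{
		Name:  "infra",
		Image: "gcr.io/google_containers/pause:0.8.0",
		Ports: []int{80},
	}
	recorded := hashSpec(spec) // stored alongside the container at creation

	fmt.Println(specChanged(spec, recorded))
	spec.Ports = append(spec.Ports, 443)
	fmt.Println(specChanged(spec, recorded))
}
```

A direct equality check would instead need the full spec to be reconstructed from what the runtime reports, which is the extra work in the fake Docker client mentioned above.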

"list", // Get pod status.
"create", "start", "inspect_container", // Create pod infra container.
"create", "start", // Create container.
"list", "inspect_container", "inspect_container"}) // Get pod status.
Member

This is classy newline use; make the comments justified at the same position?

Member

(and everywhere else) @yifan-gu
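
As a rough sketch of the pattern the test snippet above relies on, a fake client can record each call by name so that a test can assert the exact sequence. `fakeRuntime` and `assertCalls` are hypothetical names for this example and are not the fake Docker client from the repository; the expected list is laid out with the trailing comments aligned, as suggested in the review.

```go
package main

import (
	"fmt"
	"reflect"
)

// fakeRuntime is a hypothetical test double that records every call it
// receives, in order.
type fakeRuntime struct {
	called []string
}

func (f *fakeRuntime) List()             { f.called = append(f.called, "list") }
func (f *fakeRuntime) Create()           { f.called = append(f.called, "create") }
func (f *fakeRuntime) Start()            { f.called = append(f.called, "start") }
func (f *fakeRuntime) InspectContainer() { f.called = append(f.called, "inspect_container") }

// assertCalls returns an error when the recorded calls differ from the
// expected sequence.
func (f *fakeRuntime) assertCalls(expected []string) error {
	if !reflect.DeepEqual(expected, f.called) {
		return fmt.Errorf("expected calls %v, got %v", expected, f.called)
	}
	return nil
}

func main() {
	f := &fakeRuntime{}
	f.List()
	f.Create()
	f.Start()
	f.InspectContainer()
	f.Create()
	f.Start()
	f.List()
	f.InspectContainer()
	f.InspectContainer()

	err := f.assertCalls([]string{
		"list",                                           // Get pod status.
		"create", "start", "inspect_container",           // Create pod infra container.
		"create", "start",                                // Create container.
		"list", "inspect_container", "inspect_container", // Get pod status.
	})
	fmt.Println(err)
}
```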

@pmorie
Member

pmorie commented Apr 10, 2015

Is PodInfraContainerChanged going to become part of the Container.Runtime interface eventually?

@yifan-gu
Contributor Author

@pmorie I don't think so. It's Docker-specific IMO; rkt doesn't need a pod infra container.

@yifan-gu
Contributor Author

Oops, got many e2e failures.
Debugging...

Summarizing 14 Failures:

[Fail] Services [It] should correctly serve identically named services in different namespaces on different external IP addresses 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/service.go:373

[Fail] kubectl guestbook [It] should create and stop a working application 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/util.go:304

[Fail] Density [It] should allow starting 3 pods per node 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/density.go:126

[Fail] Networking [It] should function for intra-pod communication 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/networking.go:273

[Fail] kubectl update-demo [It] should scale a replication controller 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/util.go:304

[Fail] emptyDir [It] volume on tmpfs should have the correct mode 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/util.go:325

[Fail] Pods [It] should be submitted and removed 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/pods.go:129

[Fail] ReplicationController [It] should serve a basic image on each replica with a private image 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/rc.go:126

[Fail] Shell [It] tests that services.sh passes 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/shell.go:69

[Fail] Services [It] should be able to create a functioning external load balancer 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/service.go:282

[Fail] ReplicationController [It] should serve a basic image on each replica with a public image 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/rc.go:95

[Fail] kubectl update-demo [It] should do a rolling update of a replication controller 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/util.go:304

[Fail] Pods [It] should be restarted with a /healthz http liveness probe 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/pods.go:68

[Fail] PD [It] should schedule a pod w/ a RW PD, remove it, then schedule it on another host 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/pd.go:86

Ran 33 of 39 Specs in 1112.625 seconds
FAIL! -- 19 Passed | 14 Failed | 1 Pending | 5 Skipped
F0410 10:38:47.441049   28149 driver.go:112] At least one test failed
2015/04/10 10:38:47 Error running Ginkgo tests: exit status 255
exit status 1

@yifan-gu
Contributor Author

Got a lot of connection refused errors, such as:

INFO: After 10.313296623s failed to make proxy call to elasticsearch-logging: Get https://130.211.156.16/api/v1beta3/proxy/namespaces/default/services/elasticsearch-logging/_search?q=log%3Aes_logging_5752synthlogger&size=400: dial tcp 130.211.156.16:443: connection refused

Investigating

@yifan-gu
Contributor Author

The containers on master keep restarting... Sounds like a bug.

yifan-gu force-pushed the infra_changed branch 2 times, most recently from 0b20ff7 to 6063f3a, April 11, 2015 01:41
@yifan-gu
Contributor Author

Got 2 failures now:

Summarizing 2 Failures:

[Fail] Services [It] should be able to create a functioning external load balancer
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/service.go:282

[Fail] kubectl guestbook [It] should create and stop a working application
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/util.go:304

Ran 33 of 39 Specs in 909.800 seconds
FAIL! -- 31 Passed | 2 Failed | 1 Pending | 5 Skipped

yifan-gu force-pushed the infra_changed branch 2 times, most recently from ddcd5da to 7563bbe, April 12, 2015 05:57
@yifan-gu
Contributor Author

Fixed. It was because the target pools were not cleared.

Ran 33 of 39 Specs in 1216.063 seconds
SUCCESS! -- 33 Passed | 0 Failed | 1 Pending | 5 Skipped
Shutting down test cluster in background

@vmarmol

yifan-gu force-pushed the infra_changed branch 3 times, most recently from ba7258c to aca5d8b, April 13, 2015 17:03
@yifan-gu
Contributor Author

@vmarmol I think this is ready for another review?

@vmarmol
Contributor

vmarmol commented Apr 13, 2015

Awesome, I'll take a look today @yifan-gu

@yifan-gu
Contributor Author

@vmarmol Thank you! FYI the 1st commit is the change made to source, 2nd is updating tests. I am running e2e again :)

@yifan-gu
Contributor Author

I ran e2e twice more: one run passed all tests, and one got a failure that I think is a known issue, #6424.

Summarizing 1 Failure:

[Fail] ReplicationController [It] should serve a basic image on each replica with a public image
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/rc.go:151

Ran 33 of 39 Specs in 1219.849 seconds
FAIL! -- 32 Passed | 1 Failed | 1 Pending | 5 Skipped
F0413 10:42:57.441241    4053 driver.go:115] At least one test failed

@vmarmol
Contributor

vmarmol commented Apr 13, 2015

Outside of that one change (and the needed rebase), this looks good :)

Yifan Gu added 2 commits April 13, 2015 16:18
This function computes ahead of time whether we need to restart the pod
infra container.
@yifan-gu
Contributor Author

@vmarmol Rebased and ran e2e again. Looks good:

Ran 35 of 39 Specs in 1163.674 seconds
SUCCESS! -- 35 Passed | 0 Failed | 0 Pending | 4 Skipped
Shutting down test cluster in background.
Bringing down cluster using provider: gce

@vmarmol
Contributor

vmarmol commented Apr 14, 2015

LGTM, thanks for the diligence @yifan-gu! Will merge once the CIs go green.

@yifan-gu
Contributor Author

@vmarmol You're very welcome! I think cleaning up the sync logic has the potential to break e2e, so I am trying to be more careful. I will keep an eye on it in case anything fails in the future.

@pmorie
Member

pmorie commented Apr 14, 2015

Clean GCE e2e run? It's a miracle.


vmarmol added a commit that referenced this pull request Apr 14, 2015
kubelet: Introduce PodInfraContainerChanged().
vmarmol merged commit f59a9ca into kubernetes:master Apr 14, 2015
yifan-gu deleted the infra_changed branch May 7, 2015 17:28