
Kubelet: persist restart count of a container #6794

Merged
merged 1 commit into kubernetes:master from yujuhong:restart_counts Apr 14, 2015

Conversation

yujuhong
Contributor

Currently, restart counts are generated by examining dead docker containers, which
are subject to background garbage collection. Therefore, the restart count is
capped at 5 and can decrement if GC happens.

This change leverages the container statuses recorded in the pod status as a
reference point. If a container finished after the last observation, the restart
count is incremented on top of the last observed count. If a container is created
after the last observation but GC'd before the current observation time, the kubelet
would not be aware of the existence of such a container, and would not increase
the restart count accordingly. However, the chance of this should be low, given
that pod statuses are reported frequently. Also, the restart count would still
increase monotonically (with the exception of container inspect errors).
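
To make the mechanism concrete, here is a minimal, self-contained Go sketch of the bookkeeping described above. The types and names (containerRecord, observedStatus, newRestartCount) are invented for this example; the actual change works against api.ContainerStatus and docker's container list rather than these stand-ins.

package main

import (
	"fmt"
	"time"
)

// containerRecord and observedStatus are simplified stand-ins for docker's
// container listing and the previously reported api.ContainerStatus.
type containerRecord struct {
	Name       string
	FinishedAt time.Time
}

type observedStatus struct {
	RestartCount   int
	LastObservedAt time.Time
}

// newRestartCount starts from the last observed count and increments it once
// for every dead container that finished after the last observation. A
// container that was created and garbage-collected entirely between two
// observations is missed, so the count can under-report, but it never
// decreases and no longer depends on how many dead containers GC has kept.
func newRestartCount(last observedStatus, deadContainers []containerRecord) int {
	count := last.RestartCount
	for _, c := range deadContainers {
		if c.FinishedAt.After(last.LastObservedAt) {
			count++
		}
	}
	return count
}

func main() {
	now := time.Now()
	last := observedStatus{RestartCount: 14, LastObservedAt: now.Add(-time.Minute)}
	dead := []containerRecord{
		{Name: "taco", FinishedAt: now.Add(-30 * time.Second)}, // finished after the last observation: counted
		{Name: "taco", FinishedAt: now.Add(-2 * time.Minute)},  // already reflected in the previous count: skipped
	}
	fmt.Println(newRestartCount(last, dead)) // prints 15
}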

@vmarmol vmarmol self-assigned this Apr 14, 2015
if containerDone.Has(dockerContainerName) {
    continue
}
// We assume docker returns a list of containers sorted reversedly by
Contributor

nit: sorted in reverse?

Contributor Author

Done.

@yujuhong
Contributor Author

Uploaded a new patch to address the comments on readability and maintainability.

FYI, the previous patch passed e2e.

@erictune
Member

If a container is failing and restarting rapidly, does this result in a lot
of traffic to the apiserver, or is there some limit on the frequency of the
status updates?

@vmarmol
Contributor

vmarmol commented Apr 14, 2015

@erictune it does result in a new status update per crash (this is true without this PR). It is rate-limited by the Kubelet's sync loop, which only runs every 10s.

@erictune
Member

Okay. SGTM. Presumably that 10s will get even less frequent if the master
starts rate-limiting the kubelets.

@yujuhong
Contributor Author

Like @vmarmol mentioned, we update the status on crash with or without this PR. The sync loop frequency could be shorter than 10s if updates to other pods have triggered a new iteration. Overall, the syncing frequency is quite low for now.

@erictune, there is also a QPS limiter for the REST client, which should be our safety net.
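
As a rough illustration of that safety net, the sketch below puts a client-side token-bucket rate limiter in front of the update calls. It uses golang.org/x/time/rate purely as a stand-in for the kubelet's REST client limiter, and the 5 QPS / burst-of-10 values are invented for the example, not the kubelet's defaults.

package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// Allow roughly 5 status updates per second with a burst of 10
	// (illustrative numbers only).
	limiter := rate.NewLimiter(rate.Limit(5), 10)

	for i := 0; i < 20; i++ {
		// Wait blocks until the limiter permits another call, so a
		// crash-looping container cannot flood the apiserver with updates.
		if err := limiter.Wait(context.Background()); err != nil {
			fmt.Println("limiter error:", err)
			return
		}
		fmt.Printf("%s sending status update %d\n", time.Now().Format("15:04:05.000"), i)
	}
}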

@yujuhong
Contributor Author

Tested this with a local setup and saw the restart count incremented beyond 5:

{
    "kind": "Pod",
    "apiVersion": "v1beta3",
    "metadata": {
        ...
    },
    "spec": {
        "volumes": null,
        "containers": [
            {
                "name": "taco",
                "image": "ubuntu:14.04",
                ...
            }
        ],
        "restartPolicy": "Always",
        "dnsPolicy": "ClusterFirst",
        "host": "127.0.0.1"
    },
    "status": {
        "phase": "Running",
        "Condition": [
            {
                "type": "Ready",
                "status": "False"
            }
        ],
        "hostIP": "127.0.0.1",
        "podIP": "172.17.2.144",
        "containerStatuses": [
            {
                "name": "taco",
                "state": {
                    "running": {
                        "startedAt": "2015-04-14T18:58:27Z"
                    }
                },
                "lastState": {
                    "termination": {
                        "exitCode": 0,
                        "startedAt": "2015-04-14T18:58:17Z",
                        "finishedAt": "2015-04-14T18:58:17Z",
                        "containerID": "docker://e073bedc4eb2b62709d7c9d116c744ad60b63aebbb995216ba6f527bf1209cbe"
                    }
                },
                "ready": false,
                "restartCount": 14,
                "image": "ubuntu:14.04",
                "imageID": "docker://5ba9dab47459d81c0037ca3836a368a4f8ce5050505ce89720e1fb8839ea048a",
                "containerID": "docker://569dbeb2b0efbbf2b657383f8615fca4fe81fccd5991d7588a83269b219ead4f"
            }
        ]
    }
}

@@ -245,6 +254,9 @@ func (self *DockerManager) GetPodStatus(pod *api.Pod) (*api.PodStatus, error) {
return nil, err
}

containerDone := util.NewStringSet()
// Loop through the docker container list to construct the container
Contributor

Maybe add a note that these are all the running and exited containers?

Contributor Author

Done.
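
For readers following the diff, here is a rough, self-contained sketch of the loop's shape, with made-up stand-ins (dockerContainer, and a plain map in place of util.NewStringSet()); it assumes the docker list covers both running and exited containers, sorted in reverse by creation time (newest first).

package main

import "fmt"

// dockerContainer is a made-up stand-in for an entry in docker's container
// list; the list is assumed to include both running and exited containers.
type dockerContainer struct {
	Name    string
	Running bool
}

func main() {
	containers := []dockerContainer{
		{Name: "taco", Running: true},  // current instance
		{Name: "taco", Running: false}, // most recent dead instance
		{Name: "taco", Running: false}, // older dead instance, skipped below
	}

	containerDone := map[string]bool{} // stand-in for util.NewStringSet()
	for _, c := range containers {
		if containerDone[c.Name] {
			// Status for this container name has already been filled in from
			// a newer instance; older dead copies are ignored.
			continue
		}
		fmt.Printf("recording status for %q (running=%v)\n", c.Name, c.Running)
		if !c.Running {
			containerDone[c.Name] = true
		}
	}
}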

@vmarmol
Contributor

vmarmol commented Apr 14, 2015

Otherwise looks good, thanks @yujuhong!

@yujuhong
Contributor Author

Addressed the comments.

@vmarmol
Contributor

vmarmol commented Apr 14, 2015

LGTM, thanks for the patience @yujuhong!

vmarmol added a commit that referenced this pull request Apr 14, 2015
Kubelet: persist restart count of a container
@vmarmol vmarmol merged commit 54406a5 into kubernetes:master Apr 14, 2015
@yujuhong yujuhong deleted the restart_counts branch May 8, 2015 17:36
SergeyKanzhelev added a commit to SergeyKanzhelev/kubernetes that referenced this pull request Jul 19, 2021