
Kubelet: persist restart count of a container #6794

Merged
merged 1 commit into kubernetes:master from yujuhong:restart_counts Apr 14, 2015

Conversation

yujuhong
Contributor

Currently, restart counts are generated by examining dead docker containers, which
are subject to background garbage collection. Therefore, the restart count is
capped at 5 and can decrement if GC happens.

This change leverages the container statuses recorded in the pod status as a
reference point. If a container finished after the last observation, the restart
count is incremented on top of the last observed count. If a container is created
after the last observation but GC'd before the current observation time, the kubelet
would not be aware of the existence of such a container, and would not increase
the restart count accordingly. However, the chance of this should be low, given
that pod statuses are reported frequently. Also, the restart count would still
increase monotonically (with the exception of container inspect errors).
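
To make the mechanism concrete, here is a minimal, self-contained Go sketch of the bookkeeping described above. The types and names (containerRecord, observedStatus, newRestartCount) are invented for this example; the actual change works against api.ContainerStatus and docker's container list rather than these stand-ins.

package main

import (
	"fmt"
	"time"
)

// containerRecord and observedStatus are simplified stand-ins for docker's
// container listing and the previously reported api.ContainerStatus.
type containerRecord struct {
	Name       string
	FinishedAt time.Time
}

type observedStatus struct {
	RestartCount   int
	LastObservedAt time.Time
}

// newRestartCount starts from the last observed count and increments it once
// for every dead container that finished after the last observation. A
// container that was created and garbage-collected entirely between two
// observations is missed, so the count can under-report, but it never
// decreases and no longer depends on how many dead containers GC has kept.
func newRestartCount(last observedStatus, deadContainers []containerRecord) int {
	count := last.RestartCount
	for _, c := range deadContainers {
		if c.FinishedAt.After(last.LastObservedAt) {
			count++
		}
	}
	return count
}

func main() {
	now := time.Now()
	last := observedStatus{RestartCount: 14, LastObservedAt: now.Add(-time.Minute)}
	dead := []containerRecord{
		{Name: "taco", FinishedAt: now.Add(-30 * time.Second)}, // finished after the last observation: counted
		{Name: "taco", FinishedAt: now.Add(-2 * time.Minute)},  // already reflected in the previous count: skipped
	}
	fmt.Println(newRestartCount(last, dead)) // prints 15
}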

@vmarmol vmarmol self-assigned this Apr 14, 2015
if containerDone.Has(dockerContainerName) {
    continue
}
// We assume docker returns a list of containers sorted reversedly by
Contributor

nit: sorted in reverse?

Contributor Author

Done.

@yujuhong
Contributor Author

Uploaded a new patch to address the comments on readability and maintainability.

FYI, the previous patch passed e2e.

@erictune
Member

If a container is failing and restarting rapidly, does this result in a lot
of traffic to the apiserver, or is there some limit on the frequency of the
status updates?

@vmarmol
Contributor

vmarmol commented Apr 14, 2015

@erictune it does result in a new status update per crash (this is true without this PR). It is rate-limited by the Kubelet's sync loop, which only runs every 10s.

@erictune
Member

Okay. SGTM. Presumably that 10s will get even less frequent if the master
starts rate-limiting the kubelets.

@yujuhong
Contributor Author

Like @vmarmol mentioned, we update the status on crash with or without this PR. The sync loop frequency could be shorter than 10s if updates to other pods have triggered a new iteration. Overall, the syncing frequency is quite low for now.

@erictune, there is also a QPS limiter for the REST client, which should be our safety net.
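
As a rough illustration of that safety net, the sketch below puts a client-side token-bucket rate limiter in front of the update calls. It uses golang.org/x/time/rate purely as a stand-in for the kubelet's REST client limiter, and the 5 QPS / burst-of-10 values are invented for the example, not the kubelet's defaults.

package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// Allow roughly 5 status updates per second with a burst of 10
	// (illustrative numbers only).
	limiter := rate.NewLimiter(rate.Limit(5), 10)

	for i := 0; i < 20; i++ {
		// Wait blocks until the limiter permits another call, so a
		// crash-looping container cannot flood the apiserver with updates.
		if err := limiter.Wait(context.Background()); err != nil {
			fmt.Println("limiter error:", err)
			return
		}
		fmt.Printf("%s sending status update %d\n", time.Now().Format("15:04:05.000"), i)
	}
}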

@yujuhong
Contributor Author

Tested this with a local setup and saw the restart count incremented beyond 5:

{
    "kind": "Pod",
    "apiVersion": "v1beta3",
    "metadata": {
        ...
    },
    "spec": {
        "volumes": null,
        "containers": [
            {
                "name": "taco",
                "image": "ubuntu:14.04",
                ...
            }
        ],
        "restartPolicy": "Always",
        "dnsPolicy": "ClusterFirst",
        "host": "127.0.0.1"
    },
    "status": {
        "phase": "Running",
        "Condition": [
            {
                "type": "Ready",
                "status": "False"
            }
        ],
        "hostIP": "127.0.0.1",
        "podIP": "172.17.2.144",
        "containerStatuses": [
            {
                "name": "taco",
                "state": {
                    "running": {
                        "startedAt": "2015-04-14T18:58:27Z"
                    }
                },
                "lastState": {
                    "termination": {
                        "exitCode": 0,
                        "startedAt": "2015-04-14T18:58:17Z",
                        "finishedAt": "2015-04-14T18:58:17Z",
                        "containerID": "docker://e073bedc4eb2b62709d7c9d116c744ad60b63aebbb995216ba6f527bf1209cbe"
                    }
                },
                "ready": false,
                "restartCount": 14,
                "image": "ubuntu:14.04",
                "imageID": "docker://5ba9dab47459d81c0037ca3836a368a4f8ce5050505ce89720e1fb8839ea048a",
                "containerID": "docker://569dbeb2b0efbbf2b657383f8615fca4fe81fccd5991d7588a83269b219ead4f"
            }
        ]
    }
}

@@ -245,6 +254,9 @@ func (self *DockerManager) GetPodStatus(pod *api.Pod) (*api.PodStatus, error) {
return nil, err
}

containerDone := util.NewStringSet()
// Loop through the docker container list to construct the container
Contributor

Maybe add a note that these are all the running and exited containers?

Contributor Author

Done.
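
For readers following the diff, here is a rough, self-contained sketch of the loop's shape, with made-up stand-ins (dockerContainer, and a plain map in place of util.NewStringSet()); it assumes the docker list covers both running and exited containers, sorted in reverse by creation time (newest first).

package main

import "fmt"

// dockerContainer is a made-up stand-in for an entry in docker's container
// list; the list is assumed to include both running and exited containers.
type dockerContainer struct {
	Name    string
	Running bool
}

func main() {
	containers := []dockerContainer{
		{Name: "taco", Running: true},  // current instance
		{Name: "taco", Running: false}, // most recent dead instance
		{Name: "taco", Running: false}, // older dead instance, skipped below
	}

	containerDone := map[string]bool{} // stand-in for util.NewStringSet()
	for _, c := range containers {
		if containerDone[c.Name] {
			// Status for this container name has already been filled in from
			// a newer instance; older dead copies are ignored.
			continue
		}
		fmt.Printf("recording status for %q (running=%v)\n", c.Name, c.Running)
		if !c.Running {
			containerDone[c.Name] = true
		}
	}
}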

@vmarmol
Contributor

vmarmol commented Apr 14, 2015

Otherwise looks good, thanks @yujuhong!

@yujuhong
Contributor Author

Addressed the comments.

@vmarmol
Contributor

vmarmol commented Apr 14, 2015

LGTM, thanks for the patience @yujuhong!

vmarmol added a commit that referenced this pull request Apr 14, 2015
Kubelet: persist restart count of a container
@vmarmol vmarmol merged commit 54406a5 into kubernetes:master Apr 14, 2015
@yujuhong yujuhong deleted the restart_counts branch May 8, 2015 17:36
SergeyKanzhelev added a commit to SergeyKanzhelev/kubernetes that referenced this pull request Jul 19, 2021