This repository has been archived by the owner on May 3, 2022. It is now read-only.
Diagnosing rollout progress: fleet summary in the Capacity Target object #21
Labels: enhancement (New feature or request)
(ported from the internal repo design docs)
This document describes part of our plan for helping users diagnose how their rollout is going. This has two components: the low-level, high-detail view in the status of the `CapacityTarget` object, and the high-level, low-detail view in the status of the `Release` object. This document discusses the low-level, high-detail `CapacityTarget` view: we think it'll be easier to start with a domain where we don't need to invent summarization/prioritization schemes.

Reporting Progress
Previously, we introduced the concept of sad pods, which allowed the user to see the pods that were not ready. There were a few problems with this approach:

- Writing an entry into the capacity target for every single pod that was not working meant that different problems wouldn't necessarily be surfaced.
- It'd be hard to see if the release was progressing, or for tooling to show the status of the whole release across multiple clusters.
So, we decided to summarize the status of all the pods per cluster.
Criteria For Summarizing
Owner
The first level of the summary is the owner of the pod.
Multiple Kubernetes objects can lead to one or more pods being
created. DaemonSets, Deployments, Jobs, ReplicaSets, and StatefulSets
can create new pods, which means that later down the hierarchy, we
might have container names that clash. To prevent that, we are using
the owner of a pod to categorize the summary report at the top level.
Pod Condition: Type, Reason and Status
Under each owner, there is a pod status breakdown. This breakdown is
grouped by the following fields, in order:

- Type (e.g. `Ready`)
- Reason (e.g. `ContainersNotReady`)
- Status (`True`, `False`, or `Unknown`)

Apart from categorizing pods by their conditions, we also sort the
results with the same criteria to keep the ordering consistent across
multiple updates.
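The grouping and sorting described above can be sketched roughly as follows. This is an illustrative sketch, not the actual controller code; the dict keys (`owner`, `conditions`, etc.) are assumptions standing in for the pod's `ownerReferences` and `status.conditions`:

```python
from collections import Counter

def summarize_pods(pods):
    """Group pods by owner, then by condition (type, reason, status),
    keeping a count of pods per combination. Results are sorted by the
    same (type, reason, status) criteria so the ordering stays stable
    across updates."""
    report = {}
    for pod in pods:
        breakdown = report.setdefault(pod["owner"], Counter())
        for cond in pod["conditions"]:
            key = (cond["type"], cond.get("reason", ""), cond["status"])
            breakdown[key] += 1
    return {
        owner: sorted(key + (count,) for key, count in breakdown.items())
        for owner, breakdown in report.items()
    }
```

Sorting before emitting the summary is what keeps consecutive status updates diffable rather than randomly reordered.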
To aid humans in deciding which problem to look into, we also maintain
a count for the number of pods with this type, reason, and status.

Container Name
In the `containers` field of each type + reason + status combination,
there is another level of grouping, this time by container name.
This means that we have a report per container name, and that report
follows a structure very similar to the pod report.
Container State: Type and Reason
Just as the pod status breakdown works with conditions, the container
state breakdown works with container states. The only difference is
that, unlike pod conditions, container states are not as transparent, and we need
to infer type and reason through logic of our own.
Type
Each container state has three nullable fields, called
Waiting, Running, and Terminated.
We use these to derive the container state Type: the type of a
container state is whichever field is not null.
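As a minimal sketch of that rule, assuming a dict that mirrors a Kubernetes `ContainerState` (the lowercase keys are an assumption of this sketch):

```python
def container_state_type(state):
    """Derive the state Type: whichever of the three nullable fields
    (Waiting, Running, Terminated) is set."""
    for field in ("waiting", "running", "terminated"):
        if state.get(field) is not None:
            return field.capitalize()
    return "Unknown"  # a valid ContainerState should never reach this
```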
Reason
Containers keep two states, not one. The state called State is their
current state, and the one called LastTerminationState contains the
last state that happened. In other words, this is the state of the
container before it got restarted.
Reason is tricky mostly because it is not always informative. Based on
our experience so far, what users usually want to see is the Reason
of the current state, if the current state is Waiting.
To come up with the Reason for a container state, we look at the
current state: if it is Waiting, we use its reason.
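A sketch of that rule, assuming only the Waiting-state behavior described above (the dict keys, and the empty-string fallback for non-Waiting states, are assumptions of this sketch):

```python
def container_state_reason(state):
    """Surface the Reason of the current state when it is Waiting;
    otherwise there may be no informative reason to show."""
    waiting = state.get("waiting")
    if waiting is not None and waiting.get("reason"):
        return waiting["reason"]
    return ""
```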
Constructing Examples
In each pod status breakdown, we have an example that contains a pod
name and a message. At best, this message helps the user know what is
wrong without having to switch to the target cluster. At worst, the
user can use the pod name to look through logs or events after
switching to the application cluster.
This example is picked from the list of pods which fall into that
breakdown. However, to keep this example pod consistent, pods are
sorted, and then the first pod is picked as the example.
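The selection step is simple enough to sketch directly (the `name` key is an assumption standing in for the pod's metadata name):

```python
def pick_example(pods):
    """Pick a deterministic example pod for a breakdown: sort by name
    and take the first, so the example stays stable across updates."""
    return sorted(pods, key=lambda pod: pod["name"])[0]
```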
The example contains only two fields, the pod name and a message.
Pod Name
This is copied, verbatim, from the name of the example pod.
Message
We are trying to show some useful information to the user through the
Message of the example. Here is where we get the message from:

- If `LastTerminationState.Terminated.Message` is set, meaning that
  the user has written to the termination message path, we choose it.
- Otherwise, the proposal is to go with a string like
  `Terminated with exit code <exitcode>`, or
  `Terminated with signal <signal>` if there is a
  signal instead of an exit code.
Example
To bring it all together, here is an example of what a capacity target would
look like with a replica set maintaining 20 pods with 2 containers (`app` and `envoy`):

Caveats
Memory impact of pod informer for each cluster
This scheme is predicated on maintaining a pod informer for each cluster. For very large clusters with hundreds of thousands of pods, this may add up to a significant memory impact. Taking an extreme case, consider a management cluster orchestrating 10 Kubernetes clusters, each with 5,000 nodes and 100 pods per node: that's 5 million pods, or about 50 GB of heap if each pod is ~10 KB in memory.
Update rate for informers subscribing to a very large number of pod changes
We're not sure how client-go will handle a very high churn subscription on big clusters.
CPU impact of doing crunchy summarization work
The summarization scheme we're proposing involves a lot of aggregation over the set of pods and their containers. This might end up being a lot of CPU load across multiple very large clusters.
API call throttling updating capacity target objects for a high-churn pod fleet
We're likely to run into the client-go throttling limits when attempting to keep a CapacityTarget object up-to-date with a very large pod fleet. In this case, it should be safe to drop updates and re-process at the next resync period, or retry after a certain amount of time. None of the state depends on catching each update.