
Labels

As mentioned earlier, this system relies on labels to determine which metrics to collect and which rules to apply. These labels must be written under spec.template.metadata.labels in the deployment YAML file.
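As a rough illustration of where these labels live, the sketch below models a manifest as a plain Python dictionary and reads the pod-template labels from it. The helper name `template_labels` and the sample manifest values are assumptions made for this example; they are not part of Overscaler.

```python
def template_labels(manifest):
    """Return the labels at spec.template.metadata.labels (empty dict if absent)."""
    node = manifest
    for key in ("spec", "template", "metadata", "labels"):
        node = node.get(key) or {}
    return dict(node)


# Hypothetical manifest, already loaded from YAML into plain dictionaries.
manifest = {
    "kind": "StatefulSet",
    "metadata": {"name": "db"},
    "spec": {
        "template": {
            "metadata": {
                "labels": {"app": "db", "overscaler": "true"},
            }
        }
    },
}

labels = template_labels(manifest)
```

Labels placed anywhere else in the manifest, for example under the top-level metadata, would not be returned by this helper.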

Overscaler labels
~~~~~~~~~~~~~~~~~

In addition to metrics and rules it is also necessary to add some extra labels for the correct operation of the system.

app: StatefulSet name.

overscaler: "true" or "false"; activates or deactivates Overscaler for this StatefulSet.

current-count: Rescaling counter. During monitoring, this value is decremented until it reaches 0; only then can the StatefulSet be rescaled.

autoscaler-count: Value assigned to current-count after each rescaling.

min-replicas: Minimum number of replicas for this StatefulSet.

max-replicas: Maximum number of replicas for this StatefulSet.

rescaling: Flag indicating that the StatefulSet is currently being rescaled.

The current-count and autoscaler-count labels play a key role. Each type of service needs some time after startup to configure itself and begin working in parallel with the other replicas. These labels guarantee that time by blocking further rescaling until the counter reaches 0.
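The interplay between the counter and the replica bounds can be sketched as follows. This is a minimal model under stated assumptions, not Overscaler's actual implementation: the function name `tick`, the one-decrement-per-cycle behavior, and the `step` parameter (+1 to scale, -1 to reduce, 0 for no change) are all hypothetical.

```python
def tick(labels, replicas, step):
    """Advance one monitoring cycle and return (new_labels, new_replicas).

    Assumed behavior: the counter drops by one per cycle, and a rescale
    is only allowed once it has reached 0.
    """
    labels = dict(labels)  # work on a copy; label values are strings
    count = int(labels["current-count"])
    if count > 0:
        # Still warming up: count down and leave the replica count alone.
        labels["current-count"] = str(count - 1)
        return labels, replicas
    low = int(labels["min-replicas"])
    high = int(labels["max-replicas"])
    target = max(low, min(high, replicas + step))
    if target != replicas:
        # A rescale happened: restart the countdown from autoscaler-count.
        labels["current-count"] = labels["autoscaler-count"]
    return labels, target


state = {
    "current-count": "2",
    "autoscaler-count": "2",
    "min-replicas": "1",
    "max-replicas": "3",
}
state, replicas = tick(state, 1, +1)        # counter 2 -> 1, no rescale yet
state, replicas = tick(state, replicas, +1)  # counter 1 -> 0, no rescale yet
state, replicas = tick(state, replicas, +1)  # counter is 0, rescale allowed
```

After the third cycle the replica count has moved from 1 to 2 and the counter is back at the autoscaler-count value, so the new replica gets its warm-up period before any further rescaling.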

Metrics
~~~~~~~

Overscaler is designed for customizable monitoring through labels: add one label for each metric to monitor. Separate sets of metrics are available for nodes and for pods.

Label format:

metric-n: "metric-name"

Example:

metric-1: "cpu-usage-percent"

Alternatively, every available metric for a node or pod can be monitored by setting the label all-metrics: "true".
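One way to read this convention from a label dictionary is sketched below. The function name `selected_metrics`, the `available` parameter, and the choice to order results by the number n are assumptions made for illustration; Overscaler's own handling may differ.

```python
import re

# Matches label keys of the form metric-1, metric-2, ...
METRIC_LABEL = re.compile(r"^metric-(\d+)$")


def selected_metrics(labels, available):
    """Return the metric names requested through labels.

    If all-metrics is "true", every name in `available` is returned;
    otherwise the values of the metric-n labels, ordered by n.
    """
    if labels.get("all-metrics") == "true":
        return list(available)
    found = []
    for key, value in labels.items():
        match = METRIC_LABEL.match(key)
        if match:
            found.append((int(match.group(1)), value))
    return [name for _, name in sorted(found)]


# Hypothetical label set; non-metric labels are ignored.
labels = {
    "app": "db",
    "metric-2": "memory-usage-percent",
    "metric-1": "cpu-usage-percent",
}
chosen = selected_metrics(labels, ["cpu-usage-percent", "memory-usage-percent", "uptime"])
```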

Node metrics

These metrics describe the status of each node and are assigned through node labels in Google Kubernetes Engine.

=============================  ============================================================
Metric Name                    Description
=============================  ============================================================
cpu-limit                      CPU hard limit in millicores.
cpu-node-capacity              CPU capacity of a node.
cpu-node-allocatable           CPU allocatable of a node.
cpu-node-reservation           Share of CPU that is reserved on the node allocatable.
cpu-node-utilization           CPU utilization as a share of node allocatable.
cpu-request                    CPU request (the guaranteed amount of resources) in millicores.
cpu-usage                      Cumulative CPU usage on all cores.
cpu-usage-rate                 CPU usage on all cores in millicores.
cpu-usage-percent              CPU usage as a percentage of the node's total CPU.
memory-limit                   Memory hard limit in bytes.
memory-major-page-faults       Number of major page faults.
memory-major-page-faults-rate  Number of major page faults per second.
memory-node-capacity           Memory capacity of a node.
memory-node-allocatable        Memory allocatable of a node.
memory-node-reservation        Share of memory that is reserved on the node allocatable.
memory-node-utilization        Memory utilization as a share of memory allocatable.
memory-page-faults             Number of page faults.
memory-page-faults-rate        Number of page faults per second.
memory-request                 Memory request (the guaranteed amount of resources) in bytes.
memory-usage                   Total memory usage.
memory-rss                     RSS memory usage.
memory-working-set             Total working set usage. Working set is the memory being used and not easily dropped by the kernel.
memory-usage-percent           Memory usage as a percentage of the node's total memory.
network-rx                     Cumulative number of bytes received over the network.
network-rx-errors              Cumulative number of errors while receiving over the network.
network-rx-errors-rate         Number of errors while receiving over the network per second.
network-rx-rate                Number of bytes received over the network per second.
network-tx                     Cumulative number of bytes sent over the network.
network-tx-errors              Cumulative number of errors while sending over the network.
network-tx-errors-rate         Number of errors while sending over the network per second.
network-tx-rate                Number of bytes sent over the network per second.
uptime                         Number of milliseconds since the container was started.
=============================  ============================================================

Pod metrics

These metrics describe the status of each pod and are assigned through labels in the corresponding StatefulSet.

=============================  ============================================================
Metric Name                    Description
=============================  ============================================================
cpu-limit                      CPU hard limit in millicores.
cpu-request                    CPU request (the guaranteed amount of resources) in millicores.
cpu-usage-rate                 CPU usage on all cores in millicores.
cpu-usage-percent              CPU usage as a percentage of the node's total CPU.
memory-limit                   Memory hard limit in bytes.
memory-major-page-faults-rate  Number of major page faults per second.
memory-page-faults-rate        Number of page faults per second.
memory-request                 Memory request (the guaranteed amount of resources) in bytes.
memory-usage                   Total memory usage.
memory-rss                     RSS memory usage.
memory-working-set             Total working set usage. Working set is the memory being used and not easily dropped by the kernel.
memory-usage-percent           Memory usage as a percentage of the node's total memory.
network-rx                     Cumulative number of bytes received over the network.
network-rx-errors              Cumulative number of errors while receiving over the network.
network-rx-errors-rate         Number of errors while receiving over the network per second.
network-rx-rate                Number of bytes received over the network per second.
network-tx                     Cumulative number of bytes sent over the network.
network-tx-errors              Cumulative number of errors while sending over the network.
network-tx-errors-rate         Number of errors while sending over the network per second.
network-tx-rate                Number of bytes sent over the network per second.
uptime                         Number of milliseconds since the container was started.
=============================  ============================================================

Rules

The rules for scaling are also assigned by labels and must have a specific syntax:

Label format:

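rule-n: "<metric>_<greater or lower>_<limit>_<scale or reduce>"

metric: Name of a previously established metric.

greater or lower: Whether the rule fires when the metric is above ("greater") or below ("lower") the limit.

limit: Numeric threshold that the metric is compared against.

scale or reduce: Action to take when the condition is met.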

Example:

rule-1: "cpu-usage-percent_greater_90_scale"
rule-2: "memory-usage-percent_greater_90_scale"
rule-3: "cpu-usage-percent_lower_10_reduce"
rule-4: "memory-usage-percent_lower_10_reduce"
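The rule syntax above can be parsed with a few lines of Python. The sketch below is an illustration under assumptions, not Overscaler's own parser: the names `parse_rule` and `triggered_action` and the `readings` dictionary of current metric values are hypothetical. Because metric names contain hyphens but no underscores, splitting from the right on the last three underscores recovers the four fields.

```python
import operator

# Comparison keywords allowed in a rule, mapped to their operators.
COMPARISONS = {"greater": operator.gt, "lower": operator.lt}
ACTIONS = ("scale", "reduce")


def parse_rule(value):
    """Split a rule string into (metric, comparison, limit, action)."""
    metric, comparison, limit, action = value.rsplit("_", 3)
    if comparison not in COMPARISONS or action not in ACTIONS:
        raise ValueError(f"malformed rule: {value!r}")
    return metric, comparison, float(limit), action


def triggered_action(rule, readings):
    """Return the rule's action if its condition holds for `readings`, else None."""
    metric, comparison, limit, action = parse_rule(rule)
    if metric in readings and COMPARISONS[comparison](readings[metric], limit):
        return action
    return None


# Hypothetical current readings for one pod.
readings = {"cpu-usage-percent": 95, "memory-usage-percent": 50}
first = triggered_action("cpu-usage-percent_greater_90_scale", readings)
second = triggered_action("memory-usage-percent_lower_10_reduce", readings)
```

With these readings, the first rule yields "scale" because 95 exceeds 90, while the second yields None because 50 is not below 10.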
