As already mentioned, this system relies on labels to know which metrics to collect and which rules to apply. These labels must be written under spec.template.metadata.labels in the deployment YAML file.
Overscaler labels
~~~~~~~~~~~~~~~~~
In addition to metrics and rules, some extra labels are required for the system to work correctly:
- app: StatefulSet name.
- overscaler: "true" or "false"; enables or disables Overscaler for this StatefulSet.
- current-count: Rescaling counter. During monitoring this value is decremented until it reaches 0; only then can the StatefulSet be rescaled.
- autoscaler-count: Value assigned to "current-count" after a rescale.
- min-replicas: Minimum number of replicas for this StatefulSet.
- max-replicas: Maximum number of replicas for this StatefulSet.
- rescaling: Flag that indicates whether the StatefulSet is currently being rescaled.
The current-count and autoscaler-count labels play a key role. Each type of service needs some time after startup to configure itself and begin working in parallel with the other replicas. These labels guarantee that time by blocking a new rescale until the counter has run down to 0.
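The countdown described above can be sketched as follows. This is a minimal illustration, not Overscaler's actual code: the function names `tick`, `can_rescale` and `after_rescale` are hypothetical, and label values are handled as strings because Kubernetes label values are strings.

```python
def tick(labels: dict) -> dict:
    """Return a copy of the labels after one monitoring cycle.

    current-count is decremented by one and never goes below 0.
    """
    updated = dict(labels)
    remaining = max(int(labels["current-count"]) - 1, 0)
    updated["current-count"] = str(remaining)
    return updated


def can_rescale(labels: dict) -> bool:
    """A rescale is allowed only when Overscaler is enabled and the counter is 0."""
    return labels.get("overscaler") == "true" and int(labels["current-count"]) == 0


def after_rescale(labels: dict) -> dict:
    """Return a copy of the labels with current-count reset to autoscaler-count."""
    updated = dict(labels)
    updated["current-count"] = labels["autoscaler-count"]
    return updated
```

With autoscaler-count set to "4", a StatefulSet that has just been rescaled would sit out four monitoring cycles before it can be rescaled again.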
Metrics
~~~~~~~
Overscaler is designed for customizable monitoring through labels: add one label for each metric to monitor. Nodes and pods each have their own set of available metrics.
Label format:
metric-n: "metric-name"
Example:
metric-1: "cpu-usage-percent"
It is also possible to monitor every metric of a node or pod with the label all-metrics: "true".
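As an illustration of how these labels could be read, the sketch below collects the requested metrics from a label dictionary. It is an assumption-laden example rather than Overscaler's implementation: the helper name `requested_metrics` is hypothetical, and silently dropping names that are not in the available set is a choice made here for simplicity.

```python
import re

# Matches label keys of the form "metric-1", "metric-2", ...
METRIC_LABEL = re.compile(r"^metric-(\d+)$")


def requested_metrics(labels: dict, available: list) -> list:
    """Return the metric names selected by the labels, in metric-n order."""
    # all-metrics: "true" selects every available metric.
    if labels.get("all-metrics") == "true":
        return list(available)
    numbered = []
    for key, value in labels.items():
        match = METRIC_LABEL.match(key)
        # Assumption: names outside the available set are ignored.
        if match and value in available:
            numbered.append((int(match.group(1)), value))
    return [name for _, name in sorted(numbered)]
```

Labels that do not follow the metric-n pattern, such as app or overscaler, are skipped by the regular expression.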
These metrics describe the status of the nodes and are assigned through labels in Google Kubernetes Engine.
Node metrics

| Metric Name | Description |
|---|---|
| cpu-limit | Cpu hard limit in millicores. |
| cpu-node-capacity | Cpu capacity of a node. |
| cpu-node-allocatable | Cpu allocatable of a node. |
| cpu-node-reservation | Share of cpu that is reserved on the node allocatable. |
| cpu-node-utilization | Cpu utilization as a share of node allocatable. |
| cpu-request | Cpu request. |
| cpu-usage | Cumulative cpu usage on all cores. |
| cpu-usage-rate | Cpu usage on all cores in millicores. |
| cpu-usage-percent | Cpu usage percent of total cpu Node. |
| memory-limit | Memory hard limit. |
| memory-major-page-faults | Number of major page faults. |
| memory-major-page-faults-rate | Number of major page faults per second. |
| memory-node-capacity | Memory capacity of a node. |
| memory-node-allocatable | Memory allocatable of a node. |
| memory-node-reservation | Share of memory that is reserved on the node allocatable. |
| memory-node-utilization | Memory utilization as a share of memory allocatable. |
| memory-page-faults | Number of page faults. |
| memory-page-faults-rate | Number of page faults per second. |
| memory-request | Memory request (the guaranteed amount of resources) in bytes. |
| memory-usage | Total memory usage. |
| memory-rss | RSS memory usage. |
| memory-working-set | Total working set usage. Working set is the memory being used and not easily dropped by the kernel. |
| memory-usage-percent | Memory usage percent of total memory Node. |
| network-rx | Cumulative number of bytes received over the network. |
| network-rx-errors | Cumulative number of errors while receiving over the network. |
| network-rx-errors-rate | Number of errors while receiving over the network per second. |
| network-rx-rate | Number of bytes received over the network per second. |
| network-tx | Cumulative number of bytes sent over the network. |
| network-tx-errors | Cumulative number of errors while sending over the network. |
| network-tx-errors-rate | Number of errors while sending over the network. |
| network-tx-rate | Number of bytes sent over the network per second. |
| uptime | Number of milliseconds since the container was started. |
These metrics describe the status of the pods and are assigned through labels in the individual StatefulSets.
Pod metrics

| Metric Name | Description |
|---|---|
| cpu-limit | Cpu hard limit in millicores. |
| cpu-request | Cpu request. |
| cpu-usage-rate | Cpu usage on all cores in millicores. |
| cpu-usage-percent | Cpu usage percent of total node cpu. |
| memory-limit | Memory hard limit. |
| memory-major-page-faults-rate | Number of major page faults per second. |
| memory-page-faults-rate | Number of page faults per second. |
| memory-request | Memory request (the guaranteed amount of resources) in bytes. |
| memory-usage | Total memory usage. |
| memory-rss | RSS memory usage. |
| memory-working-set | Total working set usage. Working set is the memory being used and not easily dropped by the kernel. |
| memory-usage-percent | Memory usage percent of total node memory. |
| network-rx | Cumulative number of bytes received over the network. |
| network-rx-errors | Cumulative number of errors while receiving over the network. |
| network-rx-errors-rate | Number of errors while receiving over the network per second. |
| network-rx-rate | Number of bytes received over the network per second. |
| network-tx | Cumulative number of bytes sent over the network. |
| network-tx-errors | Cumulative number of errors while sending over the network. |
| network-tx-errors-rate | Number of errors while sending over the network. |
| network-tx-rate | Number of bytes sent over the network per second. |
| uptime | Number of milliseconds since the container was started. |
The rules for scaling are also assigned by labels and must have a specific syntax:
Label format:
rule-n: "metric_greater|lower_limit_scale|reduce"
- metric: One of the previously established metrics.
- greater or lower: Whether the metric must be above (">") or below ("<") the limit.
- limit: Number that sets the threshold.
- scale or reduce: Action to perform when the limit is crossed.
Example:
rule-1: "cpu-usage-percent_greater_90_scale"
rule-2: "memory-usage-percent_greater_90_scale"
rule-3: "cpu-usage-percent_lower_10_reduce"
rule-4: "memory-usage-percent_lower_10_reduce"
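The rule syntax above can be parsed mechanically. The following sketch shows one way to do it and is not taken from Overscaler's source: `Rule`, `parse_rule`, `is_triggered` and `next_replicas` are hypothetical names, and changing the replica count by one per triggered rule is an assumption made only for this example.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    metric: str      # e.g. "cpu-usage-percent"
    comparison: str  # "greater" or "lower"
    limit: float     # threshold value
    action: str      # "scale" or "reduce"


def parse_rule(value: str) -> Rule:
    """Parse a rule label value such as "cpu-usage-percent_greater_90_scale"."""
    # Split from the right: metric names use hyphens, fields use underscores.
    parts = value.rsplit("_", 3)
    if len(parts) != 4:
        raise ValueError(f"malformed rule: {value!r}")
    metric, comparison, limit, action = parts
    if comparison not in ("greater", "lower"):
        raise ValueError(f"unknown comparison: {comparison!r}")
    if action not in ("scale", "reduce"):
        raise ValueError(f"unknown action: {action!r}")
    return Rule(metric, comparison, float(limit), action)


def is_triggered(rule: Rule, observed: float) -> bool:
    """Check the observed metric value against the rule's limit."""
    if rule.comparison == "greater":
        return observed > rule.limit
    return observed < rule.limit


def next_replicas(current: int, rule: Rule, observed: float, labels: dict) -> int:
    """Return the replica count after applying the rule.

    Assumption: one replica is added or removed per triggered rule, and the
    result is kept within the min-replicas and max-replicas labels.
    """
    if not is_triggered(rule, observed):
        return current
    step = 1 if rule.action == "scale" else -1
    low = int(labels["min-replicas"])
    high = int(labels["max-replicas"])
    return min(max(current + step, low), high)
```

For rule-1 from the example, a CPU usage of 95 percent on a StatefulSet with two replicas and max-replicas set to "3" would yield three replicas under these assumptions; further triggers would leave the count at three.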