87 changes: 87 additions & 0 deletions docs/en/solutions/Backend_Performance_Requirements_for_etcd.md
@@ -0,0 +1,87 @@
---
kind:
- Troubleshooting
products:
- Alauda Container Platform
ProductsVersion:
- 4.1.0,4.2.x
---
## Issue

etcd performance degrades due to insufficient storage or network backend capabilities, producing log messages similar to the following:

```
etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for xxx ms)
etcdserver: server is likely overloaded
etcdserver: read-only range request "key:\"xxxx\"" count_only:true with result "xxxx" took too long (xxx s) to execute
wal: sync duration of xxxx s, expected less than 1s
```
Comment on lines +13 to +18

⚠️ Potential issue | 🟡 Minor

Add a language identifier to the fenced log block.

The code fence is unlabeled; this triggers MD040 (fenced-code-language, reported by markdownlint-cli2 0.22.0) and reduces readability in rendered docs. Suggested fix: label the opening fence of the etcdserver/wal log block as `text` so it becomes a plain-text code fence.

These warnings indicate the storage subsystem or network cannot keep up with etcd's latency requirements.

## Root Cause

etcd is highly sensitive to storage and network performance. Any bottleneck in the backend infrastructure — slow disk I/O, high network latency, packet drops, or CPU saturation — directly impacts the ability of the etcd cluster to process writes and maintain leader-heartbeat deadlines. A request should normally complete in under 50 ms; durations exceeding 200 ms trigger warnings in the logs.

## Resolution

### Identify the Bottleneck

Three common causes of etcd slowness:

1. **Slow storage** — Disk I/O latency exceeds acceptable thresholds
2. **CPU overload** — Control-plane nodes are overcommitted
3. **Database size growth** — The etcd data file has grown beyond optimal size

### Check Storage Performance with fio

Run an I/O benchmark on each control-plane node to validate disk performance:

```bash
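# Run from a directory on the same disk as the etcd data directory;
# fio creates its test file in the current working directory.
fio --name=etcd-io-test --ioengine=sync --bs=4k --numjobs=1 --size=512M \
  --rw=write --iodepth=1 --fdatasync=1 --runtime=30 --time_based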
```

The 99th percentile fdatasync latency must be under **10 ms**.
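
The comparison against the 10 ms limit can be scripted once the p99 value has been read from the fio report. The sketch below assumes a hypothetical helper name, `fsync_p99_ok`, and assumes the value is copied by hand from the `99.00th=[...]` entry under the `sync percentiles` heading; fio reports that heading in nsec, usec, or msec depending on magnitude, so the unit is passed explicitly.

```bash
# Hypothetical helper (not part of fio or etcd): compare a p99 sync latency
# against the 10 ms limit stated above.
#   $1: p99 value copied from the fio "sync percentiles" section
#   $2: unit shown in that section's heading (nsec, usec, or msec)
fsync_p99_ok() {
    awk -v v="$1" -v u="$2" 'BEGIN {
        if (u == "nsec")      ms = v / 1000000
        else if (u == "usec") ms = v / 1000
        else if (u == "msec") ms = v
        else { print "UNKNOWN"; exit 2 }
        if (ms < 10) print "PASS"; else print "FAIL"
    }'
}

# Example: "99.00th=[ 8500]" under "sync percentiles (usec)" is 8.5 ms.
fsync_p99_ok 8500 usec
```

A `FAIL` result suggests the disk is unlikely to sustain etcd's WAL sync pattern and is worth investigating before looking at other causes.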

### Monitor Key etcd Metrics

Use Prometheus to track the following metrics:

| Metric | Threshold | Meaning |
|---|---|---|
| `etcd_disk_wal_fsync_duration_seconds_bucket` (p99) | < 10 ms | WAL write latency |
| `etcd_disk_backend_commit_duration_seconds_bucket` (p99) | < 25 ms | Backend commit latency |
| `etcd_network_peer_round_trip_time_seconds_bucket` (p99) | < 50 ms | Peer-to-peer network RTT |
| `etcd_mvcc_db_total_size_in_bytes` | < 2 GB (default quota) | Database size |
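
The three `_bucket` metrics in the table are histograms, so the p99 value has to be derived with `histogram_quantile`. The sketch below only builds the PromQL expression; the helper name `p99_query`, the 5-minute default window, and grouping by the `instance` label are assumptions to adapt to the local monitoring setup.

```bash
# Hypothetical helper: print a p99 PromQL expression for a histogram metric.
#   $1: metric name ending in _bucket
#   $2: rate window (optional, defaults to 5m)
p99_query() {
    metric=$1
    window=${2:-5m}
    printf 'histogram_quantile(0.99, sum by (instance, le) (rate(%s[%s])))\n' \
        "$metric" "$window"
}

p99_query etcd_disk_wal_fsync_duration_seconds_bucket

# To evaluate the expression, pass it to the Prometheus HTTP API, where
# PROM_URL is a placeholder for the Prometheus base URL:
#   curl -s --get "$PROM_URL/api/v1/query" \
#     --data-urlencode "query=$(p99_query etcd_disk_wal_fsync_duration_seconds_bucket)"
```

The result is expressed in seconds, so the 10 ms threshold corresponds to `0.010`.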

### Network Health

High network latency or packet drops between etcd members destabilize the cluster. Monitor network RTT and investigate any persistent packet loss on the control-plane network interface.
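
A quick way to spot packet loss between two control-plane nodes is to run `ping` against a peer and read the summary line. The sketch below assumes the iputils `ping` summary format (`N% packet loss`); the helper name `packet_loss_percent` and the peer address are placeholders.

```bash
# Hypothetical helper: extract the packet-loss percentage from ping's
# summary line, read from stdin. Assumes iputils-style output.
packet_loss_percent() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Example with a captured summary line:
printf '%s\n' '5 packets transmitted, 4 received, 20% packet loss, time 4005ms' \
    | packet_loss_percent

# On a node, replace <peer-ip> with another etcd member's address:
#   ping -c 20 <peer-ip> | packet_loss_percent
```

Any persistent non-zero value on the control-plane network is worth investigating alongside the peer RTT metric.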

### Database Defragmentation

If the database size approaches the quota, perform manual defragmentation:

```bash
kubectl exec -n kube-system etcd-<node-name> -- etcdctl defrag \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
```

Comment on lines +67 to +71

⚠️ Potential issue | 🟠 Major

Use peer certificates for etcdctl defrag, not the etcd server certificate.

The reviewer states that on kubeadm-based clusters `server.crt`/`server.key` are server-side certificates for etcd listening on the client port and cannot be used for client authentication, and recommends the peer pair from `/etc/kubernetes/pki/etcd/` instead: `--cert=/etc/kubernetes/pki/etcd/peer.crt` and `--key=/etc/kubernetes/pki/etcd/peer.key`, keeping the same CA file and endpoint. This claim is worth verifying before changing the command: kubeadm normally issues the etcd server certificate with both server and client authentication usages, in which case the original command also authenticates successfully.
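
Whether a given certificate can authenticate `etcdctl` as a client depends on its Extended Key Usage, which can be checked directly on the node. The sketch below assumes `openssl` is available on the control-plane host; the helper name `cert_allows_client_auth` is hypothetical and simply searches the text dump for the client-authentication usage.

```bash
# Hypothetical helper: read `openssl x509 -noout -text` output on stdin and
# report whether the certificate lists the client-authentication usage.
cert_allows_client_auth() {
    if grep -q 'TLS Web Client Authentication'; then
        echo yes
    else
        echo no
    fi
}

# Example with the relevant lines of a certificate text dump:
printf '%s\n' 'X509v3 Extended Key Usage:' \
    '    TLS Web Server Authentication, TLS Web Client Authentication' \
    | cert_allows_client_auth

# On a control-plane node, using the path from the defrag command:
#   openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -text \
#     | cert_allows_client_auth
```

If the answer is `no` for the certificate passed to `--cert`, switch to a pair that does carry the client-authentication usage.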

## Diagnostic Steps

Check etcd logs for latency warnings:

```bash
kubectl logs -n kube-system etcd-<node-name> --tail=100 | grep -E "took too long|heartbeat|overloaded"
```

Query etcd metrics directly via the Prometheus endpoint:

```bash
kubectl exec -n kube-system etcd-<node-name> -- wget -qO- http://127.0.0.1:2381/metrics 2>/dev/null \
| grep -E "etcd_disk_wal_fsync|etcd_disk_backend_commit|etcd_mvcc_db_total_size"
```
111 changes: 111 additions & 0 deletions docs/en/solutions/Configure_Kubelet_Log_Level_Verbosity.md
@@ -0,0 +1,111 @@
---
kind:
- Troubleshooting
products:
- Alauda Container Platform
ProductsVersion:
- 4.1.0,4.2.x
---
## Issue

When troubleshooting node-level problems, increasing the kubelet log verbosity helps identify the root cause. The default log level (`2`) may not provide enough detail for complex issues such as pod scheduling failures, volume mount errors, or container runtime communication problems.

## Root Cause

The kubelet supports configurable log verbosity levels ranging from `0` (least verbose) to `10` (most verbose). The default level is `2`, which provides basic operational information. Higher levels expose progressively more diagnostic data, but consume additional CPU, disk I/O, and memory on the node.

## Resolution

### Log Level Reference

| Level Range | Purpose |
|---|---|
| 0 | Critical errors only |
| 1–2 | Default operational output |
| 3–4 | Debug-level information, suitable for most troubleshooting |
| 5–8 | Trace-level output, verbose internal state dumps |
| 9–10 | Maximum verbosity, rarely needed |

### Persistent Configuration (Mutable Host OS)

On mutable host OSes (standard Linux distributions with a writable `/etc`), set the kubelet log level persistently by adding or modifying the `--v` flag via a systemd drop-in file:

```bash
sudo mkdir -p /etc/systemd/system/kubelet.service.d/
sudo tee /etc/systemd/system/kubelet.service.d/10-log-level.conf <<EOF
[Service]
# Pass the verbosity as a complete flag; a bare number is not a valid kubelet argument.
Environment="KUBELET_EXTRA_ARGS=--v=4"
ExecStart=
ExecStart=/usr/bin/kubelet \$KUBELET_KUBECONFIG_ARGS \$KUBELET_CONFIG_ARGS \$KUBELET_EXTRA_ARGS
EOF
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```

### Persistent Configuration (Immutable OS Nodes)

On immutable-OS nodes — MicroOS, or any setup where `/etc` is backed by a read-mostly overlay that is reset on node upgrades or rollbacks — direct file edits under `/etc/systemd/system/kubelet.service.d/` **will not survive the next node update**. You may see the desired verbosity right after the change, then lose it silently when the node image is replaced.

Persist the change through ACP's Immutable Infrastructure mechanism instead:

- Define the drop-in file as part of the node configuration managed by ACP (under `configure/clusters/nodes`). The platform renders and re-applies it every time a node boots, so the override survives OS upgrades and rollbacks.
- Trigger a rolling apply on the target node pool. ACP will cordon/drain, restart the kubelet with the new verbosity, and resume scheduling.
- Revert the same way — update the node configuration to remove the override; do not `rm` the file directly on the node, because the mutation will be lost at the next reconcile.

If the cluster spans both mutable and immutable nodes, scope the change to a node group / pool so that only the intended nodes carry the higher verbosity.

### One-Time Change (Single Node)

For temporary debugging on a single mutable-OS node without touching the persistent configuration, override the kubelet arguments directly on that node:

```bash
sudo systemctl edit kubelet
```

Add the following to raise verbosity to level 4:

```ini
[Service]
Environment="KUBELET_EXTRA_ARGS=--v=4"
```

Then reload and restart:

```bash
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```

On immutable-OS nodes, prefer the Immutable Infrastructure flow above even for short investigations: running `systemctl edit` on a single node works until that node is re-imaged, at which point the change is gone without warning.

> **Important:** Revert the log level back to the default (`2`) after collecting the necessary logs. Extended operation at high verbosity places significant load on node resources.

## Diagnostic Steps

Verify the current kubelet log level by inspecting the running process:

```bash
ps aux | grep kubelet | grep -o '\-\-v=[0-9]*'
```
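
The same extraction can be wrapped so that a missing flag is reported explicitly instead of producing empty output. This is a sketch with a hypothetical helper name, `kubelet_verbosity`; it only recognizes the `--v=N` form. When the flag is absent, the level may still be set through `logging.verbosity` in the kubelet configuration file, so `unset` does not necessarily mean the default.

```bash
# Hypothetical helper: print the --v value found in a kubelet command line,
# or "unset" when the flag is not present.
#   $1: the kubelet command line as a single string
kubelet_verbosity() {
    level=$(printf '%s\n' "$1" | sed -n 's/.*--v=\([0-9][0-9]*\).*/\1/p')
    echo "${level:-unset}"
}

# Example with a captured command line:
kubelet_verbosity '/usr/bin/kubelet --config=/var/lib/kubelet/config.yaml --v=4'

# On a node:
#   kubelet_verbosity "$(ps -o args= -C kubelet)"
```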

Gather kubelet logs from a specific node:

```bash
kubectl get nodes
kubectl debug node/<node-name> --image=busybox -- cat /host/var/log/kubelet.log
```
Comment on lines +95 to +96

⚠️ Potential issue | 🟠 Major

Use `journalctl -u kubelet` to collect kubelet logs instead of relying on `/host/var/log/kubelet.log`.

On systemd-based nodes, the kubelet writes its logs to journald, not to a file in `/var/log`, so `/var/log/kubelet.log` is not guaranteed to exist. The documentation should recommend the journalctl approach as the primary method and mention common flags such as `-f` and `--since` for live or time-bounded logs. Suggested replacement for the `kubectl debug` line: `kubectl debug node/<node-name> --image=busybox -- chroot /host journalctl -u kubelet.service --since "1 hour ago"`.

Alternatively, SSH into the node and use journalctl:

```bash
ssh <node-address>
sudo journalctl -b -f -u kubelet.service
```

To collect logs from all nodes at once:

```bash
for n in $(kubectl get nodes --no-headers | awk '{print $1}'); do
  ssh "$n" "sudo journalctl -u kubelet.service --since '1 hour ago'" > "${n}.kubelet.log"
done
```
118 changes: 118 additions & 0 deletions docs/en/solutions/Create_PrometheusRule_Alerts_for_etcd_Defragmentation.md
@@ -0,0 +1,118 @@
---
kind:
- Information
products:
- Alauda Container Platform
ProductsVersion:
- 4.1.0,4.2.x
---
## Issue

When etcd auto-defragmentation is disabled, the database accumulates unused space over time. Without proactive monitoring, the etcd data file can grow to the point where cluster performance degrades. A mechanism is needed to alert operators when manual defragmentation becomes necessary.

## Resolution

Create a custom `PrometheusRule` resource that triggers alerts based on the ratio of unused space within the etcd database.

### Prerequisites

Ensure the Prometheus Operator is deployed and the `PrometheusRule` CRD is available in the cluster:

```bash
kubectl get crd prometheusrules.monitoring.coreos.com
```

### Create the Alert Rules

Apply the following `PrometheusRule` manifest. Adjust the namespace to match the monitoring stack configuration (commonly `monitoring` or `kube-system`):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: etcd-defragmentation-alerts
  namespace: monitoring
spec:
  groups:
    - name: etcd-defragmentation.rules
      rules:
        - alert: EtcdDefragIsAdvised
          annotations:
            summary: >-
              Etcd database unused space exceeds 35%.
              Consider running defragmentation.
            description: >-
              The etcd database has more than 35% unused space
              and the total size exceeds 400 MB. Schedule a
              defragmentation during a maintenance window.
          expr: >-
            avg(etcd_db_total_size_in_bytes) > 419430400
            and
            (
              (avg(etcd_mvcc_db_total_size_in_bytes)
               - avg(etcd_mvcc_db_total_size_in_use_in_bytes))
              * 100
              / avg(etcd_mvcc_db_total_size_in_bytes)
            ) > 35
          labels:
            severity: warning

        - alert: EtcdDefragIsNeeded
          annotations:
            summary: >-
              Etcd database unused space exceeds 40%.
              Defragmentation is strongly recommended.
            description: >-
              The etcd database has more than 40% unused space
              and the total size exceeds 600 MB. Perform
              defragmentation as soon as possible to avoid
              performance degradation.
          expr: >-
            avg(etcd_db_total_size_in_bytes) > 629145600
            and
            (
              (avg(etcd_mvcc_db_total_size_in_bytes)
               - avg(etcd_mvcc_db_total_size_in_use_in_bytes))
              * 100
              / avg(etcd_mvcc_db_total_size_in_bytes)
            ) > 40
          labels:
            severity: critical
```

Comment on lines +49 to +56

⚠️ Potential issue | 🟠 Major

Remove global `avg()` aggregation from alert expressions; it masks fragmentation on individual members.

Evaluate the fragmentation metrics per instance rather than across the whole cluster. Using `avg(...)` across all members can keep the alert below threshold even when a single etcd member is critically fragmented. Suggested adjustment: drop the `avg()` wrappers (or use `avg by (instance) (...)` if smoothing is needed) so that each rule compares `etcd_db_total_size_in_bytes` with the size threshold and `(etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes) * 100 / etcd_mvcc_db_total_size_in_bytes` with the percentage threshold for each member. Also applies to lines 71–78.

### Verify the Rules Are Loaded

```bash
kubectl get prometheusrule -n monitoring
kubectl describe prometheusrule etcd-defragmentation-alerts -n monitoring
```

### Perform Defragmentation When Alerted

When the alert fires, run defragmentation on each etcd member:

```bash
kubectl exec -n kube-system etcd-<node-name> -- etcdctl defrag \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
```

Process one member at a time to maintain quorum throughout the operation.

## Diagnostic Steps

Check current etcd database size and usage:

```bash
kubectl exec -n kube-system etcd-<node-name> -- wget -qO- http://127.0.0.1:2381/metrics 2>/dev/null \
| grep -E "etcd_mvcc_db_total_size_in_bytes|etcd_mvcc_db_total_size_in_use_in_bytes|etcd_db_total_size_in_bytes"
```
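
The values returned by this query can be checked by hand against the two alert tiers. The sketch below mirrors the thresholds from the rules (400 MiB with more than 35% unused, 600 MiB with more than 40% unused) for a single member. The helper name `defrag_advice` is hypothetical, and it uses one total-size value for both the size test and the ratio, which is a simplification of the rule expressions.

```bash
# Hypothetical helper: classify one etcd member against the alert tiers.
#   $1: total database size in bytes
#   $2: in-use database size in bytes
# Prints the state (ok, advised, needed) and the unused percentage.
defrag_advice() {
    awk -v total="$1" -v used="$2" 'BEGIN {
        if (total <= 0) { print "unknown"; exit }
        unused = (total - used) * 100 / total
        if (total > 629145600 && unused > 40)      state = "needed"
        else if (total > 419430400 && unused > 35) state = "advised"
        else                                       state = "ok"
        printf "%s %.1f\n", state, unused
    }'
}

# Example: 700 MiB on disk, 350 MiB in use.
defrag_advice 734003200 367001600
```

Running this per member also shows whether a single fragmented member is hidden by a cluster-wide average.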

Verify the PrometheusRule is being evaluated:

```bash
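kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 &
sleep 2  # give the port-forward a moment to start listening
curl -s 'http://localhost:9090/api/v1/rules' | python3 -m json.tool | grep -A5 "defrag"
kill $!  # stop the background port-forward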
```