What happened: cf Azure/AKS#4148
Describe the bug
We have observed that the file ip_netns_commands.txt (example location: /tmp/tmp.4FKbTfOrn4/collect) on our AKS cluster nodes sometimes grows to many GBs. When its size reaches around 90GB, the nodes start having issues with ephemeral storage ("The node was low on resource: ephemeral-storage."), pods are evicted, and multiple other issues appear.
root@aks-apps5--vmss000014:/tmp/tmp.4FKbTfOrn4/collect# ls -lh ip_netns_commands.txt
-rw-r--r-- 1 root root 38G Mar 7 13:19 ip_netns_commands.txt
root@aks-apps5--vmss000014:/tmp/tmp.4FKbTfOrn4/collect# fuser -v ip_netns_commands.txt
USER PID ACCESS COMMAND
/tmp/tmp.4FKbTfOrn4/collect/ip_netns_commands.txt:
root 83233 F.... ip
root 109814 F.... ip
root 560481 F.... ip
root 1133085 F.... ip
root 1133086 F.... ip
root 1133087 F.... ip
root 1134797 F.... ip
root 1737066 F.... ip
root 1737172 F.... ip
root 1737210 F.... ip
root 1737451 F.... ip
root@aks-apps5--vmss000014:/tmp/tmp.4FKbTfOrn4/collect# pstree -aps 83233
systemd,1
└─aks-log-collect,61814 /opt/azure/containers/aks-log-collector.sh
└─ip,83233 -all netns exec /bin/bash -x -c...
└─ip,109814 -all netns exec /bin/bash -x -c...
└─ip,1737066 -all netns exec /bin/bash -x -c...
└─ip,1737172 -all netns exec /bin/bash -x -c...
└─ip,1737210 -all netns exec /bin/bash -x -c...
└─ip,1737451 -all netns exec /bin/bash -x -c...
└─ip,560481 -all netns exec /bin/bash -x -c...
└─ip,1133085 -all netns exec /bin/bash -x -c...
└─bash,1147264 -x -c...
└─ss,1147282 -anoempiO --cgroup
root@aks-apps5--vmss000014:/tmp/tmp.4FKbTfOrn4/collect# head --lines 20 /opt/azure/containers/aks-log-collector.sh
#! /bin/bash
# AKS Log Collector
# This script collects information and logs that are useful to AKS engineering
# for support and uploads them to the Azure host via a private API. These log
# bundles are available to engineering when customers open a support case and
# are especially useful for troubleshooting failures of networking or
# kubernetes daemons.
# This script runs via a systemd unit and slice that limits it to low CPU
# priority and 128MB RAM, to avoid impacting other system functions.
# Log bundle upload max size is limited to 100MB
MAX_SIZE=104857600
# Shell options - remove non-matching globs, don't care about case, and use
# extended pattern matching
shopt -s nullglob nocaseglob extglob
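For anyone who wants to check their own nodes for the same symptom, here is a minimal sketch of a helper that lists oversized ip_netns_commands.txt files. The function name find_oversized_collect_files and the 1 GiB threshold in the usage comment are my own assumptions for illustration; nothing here is part of aks-log-collector.sh, and it only reports files, it does not fix the underlying problem.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT part of aks-log-collector.sh): print the paths of
# collector output files that are larger than a given number of bytes.
# Usage: find_oversized_collect_files <search_root> <threshold_bytes>
find_oversized_collect_files() {
  local root="$1" threshold="$2"
  # "-size +Nc" matches files strictly larger than N bytes.
  # The path pattern mirrors the location seen above: <tmpdir>/collect/ip_netns_commands.txt
  find "$root" -type f -path '*/collect/ip_netns_commands.txt' \
    -size "+${threshold}c" 2>/dev/null
}

# Example (assumed threshold of 1 GiB): show each large file and the
# processes that still hold it open.
# find_oversized_collect_files /tmp $((1024 * 1024 * 1024)) | while read -r f; do
#   ls -lh "$f"
#   fuser -v "$f"
# done
```

Running the commented example as root on an affected node should surface the same kind of ls and fuser output as shown in the transcript above.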
AKS 1.28.5