
cilium_host/cilium_net: Caught tx_queue_len zero misconfig #17703

Closed
mukeshkwm opened this issue Oct 26, 2021 · 2 comments · Fixed by #17705 or #17709
Labels
kind/bug This is a bug in the Cilium logic.

Comments


mukeshkwm commented Oct 26, 2021

General Information

  • Cilium version (run cilium version)
    cilium-cli: v0.9.1 compiled with go1.17.1 on darwin/amd64
    cilium image (default): v1.10.4
    cilium image (stable): unknown
    cilium image (running): v1.10.5

  • Kernel version (run uname -a)
    "Linux cp-lab-amukesh4-vmss000000 5.4.0-1062-azure 65~18.04.1-Ubuntu SMP Tue Oct 12 11:26:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux"

  • Orchestration system version in use (e.g. kubectl version, ...)
    kubectl version
    Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:38:50Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.13", GitCommit:"53c7b65d4531a749cd3a7004c5212d23daa044a9", GitTreeState:"clean", BuildDate:"2021-07-15T20:53:19Z", GoVersion:"go1.15.14", Compiler:"gc", Platform:"linux/amd64"}

  • Link to relevant artifacts (policies, deployments scripts, ...)
    https://docs.cilium.io/en/stable/gettingstarted/k8s-install-kubeadm/

How to reproduce the issue

This is a Kubernetes cluster built via kubeadm on Azure VMs.

  1. Installed Cilium 1.10.5 via Helm 3 and saw the messages reported below.

  2. Will these impact cluster network connectivity, or cause any other issues in prod?

  3. The cluster is currently healthy and network connectivity is fine, but the message "cilium_net: Caught tx_queue_len zero misconfig" appears frequently.

  4. Cilium installation via Helm 3:
    helm install cilium cilium/cilium --version 1.10.5 --namespace kube-system --debug

cilium status
    /¯¯\
 /¯¯\__/¯¯\    Cilium:      OK
 \__/¯¯\__/    Operator:    OK
 /¯¯\__/¯¯\    Hubble:      disabled
 \__/¯¯\__/    ClusterMesh: disabled
    \__/

DaemonSet cilium Desired: 28, Ready: 28/28, Available: 28/28
Deployment cilium-operator Desired: 2, Ready: 2/2, Available: 2/2
Containers: cilium Running: 28
cilium-operator Running: 2
Cluster Pods: 219/250 managed by Cilium
Image versions cilium ******.com/cilium/cilium:v1.10.5: 28
cilium-operator *******.com/cilium/operator-generic:v1.10.5: 2

After Cilium initialised, all nodes in the cluster show the messages below (via dmesg -T):

[Mon Oct 25 06:01:29 2021] IPv6: ADDRCONF(NETDEV_CHANGE): cali11641feb8ae: link becomes ready
[Mon Oct 25 06:48:01 2021] IPv6: ADDRCONF(NETDEV_CHANGE): cilium_net: link becomes ready
[Mon Oct 25 06:48:01 2021] IPv6: ADDRCONF(NETDEV_CHANGE): cilium_host: link becomes ready
[Mon Oct 25 06:48:01 2021] NET: Registered protocol family 38
[Mon Oct 25 06:48:02 2021] IPv6: ADDRCONF(NETDEV_CHANGE): lxc_health: link becomes ready
[Mon Oct 25 06:48:03 2021] cilium_host: Caught tx_queue_len zero misconfig
[Mon Oct 25 06:48:03 2021] cilium_net: Caught tx_queue_len zero misconfig
[Mon Oct 25 06:48:08 2021] lxc_health: Caught tx_queue_len zero misconfig

[Mon Oct 25 07:22:27 2021] lxc667d091cb0ea: Caught tx_queue_len zero misconfig
[Mon Oct 25 07:22:27 2021] lxc742557edbdd8: Caught tx_queue_len zero misconfig
[Mon Oct 25 07:22:27 2021] lxc_health: Caught tx_queue_len zero misconfig
[Mon Oct 25 07:22:27 2021] lxcb7215aa13fc7: Caught tx_queue_len zero misconfig
[Mon Oct 25 07:22:30 2021] hv_balloon: Max. dynamic memory size: 16384 MB
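One way to see where the warning comes from is to read the device's configured queue length directly from sysfs; a value of 0 is what the kernel flags when it attaches a qdisc. A minimal sketch (cilium_net is just one of the affected devices; substitute any interface name):

```shell
# Read the configured tx queue length of a network device from sysfs.
# A value of 0 is what triggers the "Caught tx_queue_len zero misconfig"
# warning when a qdisc gets attached to the device.
cat /sys/class/net/cilium_net/tx_queue_len
```

Note that the kernel fixes the value up internally, so by the time you read it the device may already report a non-zero length even though the warning was logged.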

@mukeshkwm mukeshkwm added the kind/bug This is a bug in the Cilium logic. label Oct 26, 2021

kkourt commented Oct 26, 2021

Hi @mukeshkwm,

Thanks for the report.

As for the following question:

Are they going to impact the cluster network connectivity or any other issues in prod?

I believe these messages are harmless. After some digging in our issues, I found an old message from @borkmann also saying the same: #987 (comment), as well as a kernel commit message that indicates the same: https://patchwork.ozlabs.org/project/netdev/patch/20161103135611.28737.39840.stgit@firesoul/.

borkmann added a commit that referenced this issue Oct 26, 2021
Avoid spamming the kernel log with below messages as reported by @mukeshkwm:

  [...]
  [Mon Oct 25 06:48:03 2021] cilium_host: Caught tx_queue_len zero misconfig
  [Mon Oct 25 06:48:03 2021] cilium_net: Caught tx_queue_len zero misconfig
  [Mon Oct 25 06:48:08 2021] lxc_health: Caught tx_queue_len zero misconfig
  [...]
  [Mon Oct 25 07:22:27 2021] lxc667d091cb0ea: Caught tx_queue_len zero misconfig
  [Mon Oct 25 07:22:27 2021] lxc742557edbdd8: Caught tx_queue_len zero misconfig
  [Mon Oct 25 07:22:27 2021] lxc_health: Caught tx_queue_len zero misconfig
  [Mon Oct 25 07:22:27 2021] lxcb7215aa13fc7: Caught tx_queue_len zero misconfig
  [...]

These are harmless and the kernel will internally fix up dev->tx_queue_len
to DEFAULT_TX_QUEUE_LEN (== 1000). Avoid the messages by explicitly setting
the dev->tx_queue_len upon veth creation.

Fixes: #17703
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
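At the iproute2 level, the fix described in the commit message above amounts to passing an explicit txqueuelen when the veth pair is created, so the kernel never sees a zero value. A sketch only (device names are hypothetical, and the commands need root / CAP_NET_ADMIN):

```shell
# Create a veth pair with an explicit tx queue length of 1000
# (the kernel's DEFAULT_TX_QUEUE_LEN), mirroring what the fix does
# at veth creation time.
ip link add lxc_example txqueuelen 1000 type veth \
    peer name lxc_example_peer txqueuelen 1000

# Confirm neither end carries a zero queue length:
ip -o link show lxc_example | grep -o 'qlen 1000'
```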
@borkmann

@mukeshkwm could you check if you are still seeing them with #17705 ? Thanks!

chendotjs added a commit to chendotjs/cilium that referenced this issue Oct 26, 2021
Avoid spamming the kernel log with below messages when creating ipvlan slave:

[Tue Oct 26 14:20:38 2021] cilium: renamed from tmp_svwpi
[Tue Oct 26 14:20:38 2021] cilium: Caught tx_queue_len zero misconfig
[Tue Oct 26 14:20:46 2021] cilium_host: Caught tx_queue_len zero misconfig
[Tue Oct 26 14:32:11 2021] eth0: renamed from tmp6fed1
[Tue Oct 26 14:32:11 2021] eth0: Caught tx_queue_len zero misconfig

@mukeshkwm reported a similar issue on the veth datapath; the same applies to the ipvlan datapath.

Fixes: cilium#17703
Signed-off-by: Chen Yaqi <chendotjs@gmail.com>
borkmann added a commit that referenced this issue Oct 26, 2021
michi-covalent pushed a commit that referenced this issue Oct 27, 2021
pchaigno pushed a commit that referenced this issue Nov 5, 2021