Problem Summary
The Comet Lake Intel I219-V Ethernet interface (8086:0d4f), managed by the e1000e driver, hangs under sustained load, i.e. tens of Mbit/s for a few hours.
Environment
Clear Linux Version: 33590
Kernel: Linux vmhost 5.7.13-975.native #1 SMP Wed Aug 5 03:03:05 PDT 2020 x86_64 GNU/Linux
Hardware: Intel(R) Client Systems NUC10i5FNH/NUC10i5FNB, BIOS FNCML357.0044.2020.0715.1813 07/15/2020, with 32 GB RAM and an NVMe SSD.
Use Case: A small number of KVM virtual machines (managed by libvirtd) and Docker containers. The Ethernet interface (eno1) sits under a bridge (br0) to provide connectivity to the virtual machines; as a consequence, the host IP address is configured on the bridge (br0) rather than on the Ethernet interface (eno1).
Problem Description
After running under load for a few hours (primarily video streams and NFS traffic), the Ethernet interface hangs:
Aug 10 22:47:14 vmhost kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
TDH <b8>
TDT <dd>
next_to_use <dd>
next_to_clean <b7>
buffer_info[next_to_clean]:
time_stamp <10299afaf>
next_to_watch <b8>
jiffies <10299b8c0>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3c00>
PHY Extended Status <3000>
PCI Status <10>
This message is repeated twice more, at 2-second intervals, before the driver resets the adapter.
The impact is a loss of connectivity for around 15 seconds (plus the bridge forwarding delay, which is set to the minimum of 2 seconds), which is enough to disrupt a video stream :-(
Once the problem has occurred, it recurs more frequently until the next reboot: the initial onset takes hours under load, but subsequent occurrences happen within tens of minutes IF significant load is maintained.
I haven't tested with the LTS2019 kernel because this hardware (8086:0d4f) is not supported by the e1000e driver in mainline kernels before 5.5.
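The hang message is essentially a snapshot of the transmit ring. As a rough, non-authoritative reading, assuming the usual e1000 family meanings (TDH is the hardware head index, TDT the tail index written by the driver, next_to_use/next_to_clean the driver's own ring cursors, and jiffies minus time_stamp the age of the buffer at next_to_clean), the Python sketch below extracts the hex fields and derives ring occupancy and buffer age. The function names are hypothetical; the ring size of 256 is assumed to be the driver default (check with ethtool -g), and HZ is an assumption because the kernel's tick rate isn't stated here.

```python
import re

# Hang report copied from the journal excerpt above (kernel indentation dropped).
HANG_LOG = """\
Aug 10 22:47:14 vmhost kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
TDH <b8>
TDT <dd>
next_to_use <dd>
next_to_clean <b7>
buffer_info[next_to_clean]:
time_stamp <10299afaf>
next_to_watch <b8>
jiffies <10299b8c0>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3c00>
PHY Extended Status <3000>
PCI Status <10>
"""

# A "name <hex>" line; the name may contain spaces, dots and brackets.
FIELD_RE = re.compile(r"^\s*(.+?)\s*<([0-9a-fA-F]+)>\s*$")


def parse_hang(text):
    """Map each 'name <hex>' field in the hang message to its integer value."""
    fields = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            fields[m.group(1)] = int(m.group(2), 16)
    return fields


def summarize(fields, ring_size=256, hz=1000):
    """Derive ring occupancy and buffer age; ring_size and hz are assumptions."""
    # Descriptors between head (TDH) and tail (TDT): queued, not yet processed by hardware.
    pending_hw = (fields["TDT"] - fields["TDH"]) % ring_size
    # Descriptors the driver has used but not yet reclaimed after completion.
    unreclaimed = (fields["next_to_use"] - fields["next_to_clean"]) % ring_size
    # How long the buffer at next_to_clean has been outstanding, in jiffies.
    age_jiffies = fields["jiffies"] - fields["time_stamp"]
    return {
        "pending_hw": pending_hw,
        "unreclaimed": unreclaimed,
        "age_jiffies": age_jiffies,
        "age_seconds": age_jiffies / hz,
    }


if __name__ == "__main__":
    print(summarize(parse_hang(HANG_LOG)))
```

With the values above this gives 37 descriptors between head and tail, 38 unreclaimed descriptors, and an oldest-buffer age of 2321 jiffies; how many seconds that represents depends on the kernel's HZ setting.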
I enabled the driver's dump capability with ethtool and captured a complete event (three hangs followed by a reset).
The journalctl output for this period (including the register and Tx/Rx ring dumps) is attached to this issue: e1000e.log
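For anyone reproducing the capture: as far as I can tell from the driver source, the e1000e dump is gated on the interface's message level, with the register dump requiring the hw message type and the Tx and Rx ring dumps additionally requiring tx_done and rx_status. These can be enabled by name (ethtool -s eno1 msglvl hw on tx_done on rx_status on) or as a numeric mask. The sketch below computes that mask from the kernel's NETIF_MSG_* bit positions; the helper names are hypothetical, and the gating described here is my reading of the driver rather than documented behaviour.

```python
# Bit positions of the kernel's NETIF_MSG_* flags, keyed by ethtool's msglvl names.
NETIF_MSG_BITS = {
    "drv": 0, "probe": 1, "link": 2, "timer": 3, "ifdown": 4, "ifup": 5,
    "rx_err": 6, "tx_err": 7, "tx_queued": 8, "intr": 9, "tx_done": 10,
    "rx_status": 11, "pktdata": 12, "hw": 13, "wol": 14,
}


def msglvl_mask(*names):
    """Hypothetical helper: OR together the NETIF_MSG_* bits for the given names."""
    mask = 0
    for name in names:
        mask |= 1 << NETIF_MSG_BITS[name]
    return mask


def decode_msglvl(mask):
    """List the message types enabled in a numeric msglvl value, lowest bit first."""
    ordered = sorted(NETIF_MSG_BITS.items(), key=lambda kv: kv[1])
    return [name for name, bit in ordered if mask & (1 << bit)]


if __name__ == "__main__":
    # Types needed for the register dump plus the Tx and Rx ring dumps.
    print(hex(msglvl_mask("hw", "tx_done", "rx_status")))
```

The numeric form is handy for comparing against the current level reported by ethtool eno1 before and after the change.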
Also here is the output of lspci -vvv after an event has occurred:
And the output of ethtool -k:
Features for eno1:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
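When narrowing down a Tx hang it can help to see which of the offloads above are both enabled and user-adjustable (not marked [fixed]), since only those can be switched off with ethtool -K for testing. The Python sketch below filters ethtool -k output that way; the function names are hypothetical, and the sample is an excerpt copied from the list above.

```python
import re

# Excerpt of the ethtool -k output above.
FEATURES_SAMPLE = """\
Features for eno1:
rx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
scatter-gather: on
tcp-segmentation-offload: on
tx-tcp-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
highdma: on [fixed]
tx-nocache-copy: off
"""

# Matches "name: on", "name: off" and either form followed by "[fixed]".
FEATURE_RE = re.compile(r"^\s*([\w-]+):\s+(on|off)(\s+\[fixed\])?\s*$")


def parse_features(text):
    """Map feature name -> (enabled, fixed) for each feature line."""
    features = {}
    for line in text.splitlines():
        m = FEATURE_RE.match(line)
        if m:
            features[m.group(1)] = (m.group(2) == "on", m.group(3) is not None)
    return features


def toggleable_enabled(features):
    """Features that are currently on and not fixed by the driver, sorted by name."""
    return sorted(name for name, (on, fixed) in features.items() if on and not fixed)


if __name__ == "__main__":
    for name in toggleable_enabled(parse_features(FEATURES_SAMPLE)):
        print(name)
```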