Comet Lake Intel I219-V Ethernet (8086:0d4f) e1000e hangs on 5.7.13-975.native kernel #2091

Open
kellyjp opened this issue Aug 11, 2020 · 0 comments


Problem Summary

The Comet Lake Intel I219-V Ethernet interface (8086:0d4f), managed by the e1000e driver, hangs under load, i.e. after a few hours of sustained traffic in the tens of Mbit/s.

Environment

Clear Linux Version: 33590
Kernel: Linux vmhost 5.7.13-975.native #1 SMP Wed Aug 5 03:03:05 PDT 2020 x86_64 GNU/Linux
Hardware: Intel(R) Client Systems NUC10i5FNH/NUC10i5FNB, BIOS FNCML357.0044.2020.0715.1813 07/15/2020, with 32GB RAM, NVMe SSD.
Use Case: A small number of KVM virtual machines (managed by libvirtd) and Docker containers. The Ethernet interface (eno1) sits under a bridge (br0) to give the virtual machines connectivity; as a consequence, the host IP address is configured on the bridge (br0) rather than on the Ethernet interface (eno1).

Problem Description

After running under load for a few hours (primarily video streams and NFS traffic) the Ethernet interface hangs:

Aug 10 22:47:14 vmhost kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                                 TDH                  <b8>
                                 TDT                  <dd>
                                 next_to_use          <dd>
                                 next_to_clean        <b7>
                               buffer_info[next_to_clean]:
                                 time_stamp           <10299afaf>
                                 next_to_watch        <b8>
                                 jiffies              <10299b8c0>
                                 next_to_watch.status <0>
                               MAC Status             <40080083>
                               PHY Status             <796d>
                               PHY 1000BASE-T Status  <3c00>
                               PHY Extended Status    <3000>
                               PCI Status             <10>
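To make the dump above easier to read, here is a minimal Python sketch that converts the hex fields to integers and derives two numbers: how many TX descriptors sit between the hardware head (TDH) and tail (TDT), and how long the oldest buffer has been waiting in jiffies. The helper names are hypothetical, and `ring_size=256` and `hz=1000` are assumptions (the ring size and tick rate of this system are not stated in the report), so treat the seconds figure as a rough estimate only.

```python
import re

# Fields copied from the kernel log above (leading whitespace trimmed).
HANG_DUMP = """\
TDH                  <b8>
TDT                  <dd>
next_to_use          <dd>
next_to_clean        <b7>
time_stamp           <10299afaf>
next_to_watch        <b8>
jiffies              <10299b8c0>
next_to_watch.status <0>
"""

# Matches lines of the form "name   <hexvalue>".
FIELD_RE = re.compile(r"^\s*(.+?)\s+<([0-9a-fA-F]+)>\s*$")


def parse_hang_dump(text):
    """Return {field name: integer value} for every 'name <hex>' line."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def summarize(fields, ring_size=256, hz=1000):
    """Derive ring occupancy and stall time.

    ring_size and hz are ASSUMPTIONS, not values taken from the report.
    """
    # Descriptors queued by the driver but not yet fetched by hardware,
    # computed modulo the ring size to handle wrap-around.
    pending = (fields["TDT"] - fields["TDH"]) % ring_size
    # Ticks elapsed since the oldest uncleaned buffer was time-stamped.
    stalled_jiffies = fields["jiffies"] - fields["time_stamp"]
    return {
        "pending_descriptors": pending,
        "stalled_jiffies": stalled_jiffies,
        "stalled_seconds": stalled_jiffies / hz,
    }


if __name__ == "__main__":
    print(summarize(parse_hang_dump(HANG_DUMP)))
```

For the values in this dump, that works out to 37 pending descriptors and 2321 jiffies since the time stamp, which would be roughly 2.3 seconds if the kernel ticks at 1000 Hz.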

This is repeated twice more at 2-second intervals before the driver resets the adapter:

Aug 10 22:47:19 vmhost kernel: e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly

The impact is a loss of connectivity for around 15 seconds (plus the bridge forwarding delay, which is set to the minimum of 2 seconds), which is enough to disrupt a video stream :-(

Once the problem has occurred, it recurs more frequently until a reboot, i.e. the initial onset takes hours under load, but subsequent occurrences happen within tens of minutes if significant load is maintained.
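For anyone wanting to quantify how the interval between events shrinks, the following is a rough Python sketch that pulls hang and reset events out of journal text and computes the gap between consecutive resets. The function names are hypothetical, the sample contains only the two log lines quoted in this report, and the year is passed in explicitly because syslog-style timestamps do not carry one.

```python
from datetime import datetime

HANG = "Detected Hardware Unit Hang"
RESET = "Reset adapter unexpectedly"

# The two kernel messages quoted in this report.
SAMPLE = [
    "Aug 10 22:47:14 vmhost kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:",
    "Aug 10 22:47:19 vmhost kernel: e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly",
]


def parse_events(lines, year=2020):
    """Return a list of (timestamp, kind) for hang and reset messages."""
    events = []
    for line in lines:
        if HANG in line:
            kind = "hang"
        elif RESET in line:
            kind = "reset"
        else:
            continue
        # The first three whitespace-separated tokens are "Mon DD HH:MM:SS".
        stamp = " ".join(line.split()[:3])
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((when, kind))
    return events


def reset_gaps_seconds(events):
    """Seconds between each pair of consecutive adapter resets."""
    resets = [when for when, kind in events if kind == "reset"]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(resets, resets[1:])]


if __name__ == "__main__":
    events = parse_events(SAMPLE)
    for when, kind in events:
        print(when.isoformat(), kind)
    print("gaps between resets (s):", reset_gaps_seconds(events))
```

Fed with the full journal for a long run, the list of gaps should show whether later resets really do cluster within tens of minutes of each other.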

I haven't tested with the LTS2019 kernel, as this hardware (8086:0d4f) is not supported by the e1000e driver in mainline kernels before 5.5.

I turned on the dump capability in the driver with ethtool and captured a complete event (3 hangs followed by a reset).
The journalctl output for this period (which includes register and TX/RX ring dumps) is attached to this issue: e1000e.log

Also here is the output of lspci -vvv after an event has occurred:

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (10) I219-V
	DeviceName:  LAN
	Subsystem: Intel Corporation Device 2081
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 169
	Region 0: Memory at 56300000 (32-bit, non-prefetchable) [size=128K]
	Capabilities: [c8] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee00838  Data: 0000
	Kernel driver in use: e1000e
	Kernel modules: e1000e

And the output of ethtool -k:

Features for eno1:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp-mangleid-segmentation: off
	tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
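
The feature list above is long, so here is a small Python sketch that parses `ethtool -k` output and lists only the features that are currently on and not marked `[fixed]`, i.e. the ones that could be changed at runtime while narrowing the problem down. The function names are hypothetical and the sample is an excerpt of the output above; nothing here implies that any particular offload is the cause of the hang.

```python
# Excerpt of the `ethtool -k eno1` output above.
SAMPLE = """\
Features for eno1:
rx-checksumming: on
tx-checksumming: on
    tx-checksum-ipv4: off [fixed]
    tx-checksum-ip-generic: on
tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-mangleid-segmentation: off
highdma: on [fixed]
"""


def parse_features(text):
    """Return {feature name: (enabled, fixed)} from `ethtool -k` output."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        tokens = value.split()
        # Skip blank lines and the "Features for <iface>:" header,
        # which has nothing after the colon.
        if not sep or not tokens:
            continue
        features[name.strip()] = (tokens[0] == "on", "[fixed]" in tokens)
    return features


def toggleable_enabled(features):
    """Names of features that are on and not marked [fixed]."""
    return sorted(name for name, (enabled, fixed) in features.items()
                  if enabled and not fixed)


if __name__ == "__main__":
    for name in toggleable_enabled(parse_features(SAMPLE)):
        print(name)
```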