Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dhcpcd5 cannot cope with USB bus resets #102

Closed
l1k opened this issue Apr 9, 2018 · 15 comments
Closed

dhcpcd5 cannot cope with USB bus resets #102

l1k opened this issue Apr 9, 2018 · 15 comments

Comments

@l1k
Copy link

@l1k l1k commented Apr 9, 2018

The Revolution Pi products are used in EMI contaminated environments, which results in occasional USB bus resets. It is crucial that after a bus reset, the smsc95xx interface is brought up again and assigned an IP address so that the product remains reachable over the network.

On Raspbian jessie we ensured that with an allow-hotplug eth0 directive in /etc/network/interfaces, but on stretch the interfaces are handled by dhcpcd by default. When simulating a bus reset using usbreset on stretch, it turned out that dhcpcd fails to assign an IP address after a bus reset most of the time:

Apr 09 17:05:19 RevPi147 kernel: smsc95xx 1-1.1:1.0 eth0: unregister 'smsc95xx' usb-20980000.usb-1.1, smsc95xx USB 2.0 Ethernet
Apr 09 17:05:19 RevPi147 kernel: smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
Apr 09 17:05:19 RevPi147 dhcpcd[1698]: eth0: carrier lost
Apr 09 17:05:19 RevPi147 dhcpcd[1698]: eth0: executing `/lib/dhcpcd/dhcpcd-run-hooks' NOCARRIER
Apr 09 17:05:19 RevPi147 kernel: usb 1-1.1: reset high-speed USB device number 3 using dwc_otg
Apr 09 17:05:19 RevPi147 dhcpcd[1698]: eth0: deleting address fe80::5b64:36f6:bf05:7a28
Apr 09 17:05:19 RevPi147 kernel: smsc95xx v1.0.5
Apr 09 17:05:19 RevPi147 dhcpcd[1698]: eth0: deleting default route
Apr 09 17:05:19 RevPi147 dhcpcd[1698]: eth0: deleting route to 192.168.0.0/24
Apr 09 17:05:19 RevPi147 dhcpcd[1698]: eth0: deleting IP address 192.168.0.145/24
Apr 09 17:05:19 RevPi147 kernel: smsc95xx 1-1.1:1.0 eth0: register 'smsc95xx' at usb-20980000.usb-1.1, smsc95xx USB 2.0 Ethernet, c8:3e:a7:00:0a:07
Apr 09 17:05:19 RevPi147 dhcpcd[1698]: eth0: executing `/lib/dhcpcd/dhcpcd-run-hooks' EXPIRE
Apr 09 17:05:19 RevPi147 dhcpcd[1698]: eth0: libudev: remove
Apr 09 17:05:19 RevPi147 dhcpcd[1698]: eth0: interface departed
Apr 09 17:05:19 RevPi147 dhcpcd[1698]: eth0: removing interface
Apr 09 17:05:19 RevPi147 dhcpcd[1698]: eth0: executing `/lib/dhcpcd/dhcpcd-run-hooks' STOP
Apr 09 17:05:20 RevPi147 dhcpcd[1698]: eth0: executing `/lib/dhcpcd/dhcpcd-run-hooks' DEPARTED
Apr 09 17:05:20 RevPi147 dhcpcd[1698]: eth0: libudev: add
Apr 09 17:05:20 RevPi147 dhcpcd[1698]: eth0: interface added
Apr 09 17:05:20 RevPi147 dhcpcd[1698]: eth0: disabling kernel IPv6 RA support
Apr 09 17:05:20 RevPi147 dhcpcd[1698]: eth0: executing `/lib/dhcpcd/dhcpcd-run-hooks' PREINIT
Apr 09 17:05:20 RevPi147 dhcpcd[1698]: eth0: executing `/lib/dhcpcd/dhcpcd-run-hooks' NOCARRIER
Apr 09 17:05:20 RevPi147 kernel: smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
Apr 09 17:05:20 RevPi147 kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Apr 09 17:05:20 RevPi147 dhcpcd[1698]: eth0: waiting for carrier
Apr 09 17:05:20 RevPi147 dhcpcd[1698]: eth0: interface departed
Apr 09 17:05:20 RevPi147 dhcpcd[1698]: eth0: removing interface
Apr 09 17:05:20 RevPi147 dhcpcd[1698]: eth0: executing `/lib/dhcpcd/dhcpcd-run-hooks' DEPARTED
Apr 09 17:05:22 RevPi147 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Apr 09 17:05:22 RevPi147 kernel: smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1

So I cloned the original repository, checked out the dhcpcd-6.11.5 tag, extracted dhcpcd5_6.11.5-1+rpt4.debian.tar.xz and invoked dpkg-buildpackage -b -us -uc -a armhf. Surprisingly, bus resets are handled flawlessly with the resulting package:

Apr 09 17:18:56 RevPi147 kernel: smsc95xx 1-1.1:1.0 eth0: unregister 'smsc95xx' usb-20980000.usb-1.1, smsc95xx USB 2.0 Ethernet
Apr 09 17:18:56 RevPi147 dhcpcd[829]: eth0: carrier lost
Apr 09 17:18:56 RevPi147 kernel: smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
Apr 09 17:18:56 RevPi147 dhcpcd[829]: eth0: executing `/lib/dhcpcd/dhcpcd-run-hooks' NOCARRIER
Apr 09 17:18:56 RevPi147 kernel: usb 1-1.1: reset high-speed USB device number 3 using dwc_otg
Apr 09 17:18:56 RevPi147 kernel: smsc95xx v1.0.5
Apr 09 17:18:56 RevPi147 dhcpcd[829]: eth0: deleting default route
Apr 09 17:18:56 RevPi147 kernel: smsc95xx 1-1.1:1.0 eth0: register 'smsc95xx' at usb-20980000.usb-1.1, smsc95xx USB 2.0 Ethernet, c8:3e:a7:00:0a:07
Apr 09 17:18:56 RevPi147 dhcpcd[829]: eth0: deleting route to 192.168.0.0/24
Apr 09 17:18:56 RevPi147 dhcpcd[829]: eth0: deleting IP address 192.168.0.145/24
Apr 09 17:18:56 RevPi147 dhcpcd[829]: eth0: executing `/lib/dhcpcd/dhcpcd-run-hooks' EXPIRE
Apr 09 17:18:56 RevPi147 dhcpcd[829]: eth0: interface departed
Apr 09 17:18:56 RevPi147 dhcpcd[829]: eth0: removing interface
Apr 09 17:18:56 RevPi147 dhcpcd[829]: eth0: executing `/lib/dhcpcd/dhcpcd-run-hooks' STOP
Apr 09 17:18:57 RevPi147 dhcpcd[829]: eth0: executing `/lib/dhcpcd/dhcpcd-run-hooks' DEPARTED
Apr 09 17:18:57 RevPi147 dhcpcd[829]: eth0: interface added
Apr 09 17:18:57 RevPi147 dhcpcd[829]: eth0: disabling kernel IPv6 RA support
Apr 09 17:18:57 RevPi147 dhcpcd[829]: eth0: executing `/lib/dhcpcd/dhcpcd-run-hooks' PREINIT
Apr 09 17:18:57 RevPi147 dhcpcd[829]: eth0: executing `/lib/dhcpcd/dhcpcd-run-hooks' NOCARRIER
Apr 09 17:18:57 RevPi147 dhcpcd[829]: eth0: waiting for carrier
Apr 09 17:18:57 RevPi147 kernel: smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
Apr 09 17:18:57 RevPi147 kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Apr 09 17:18:59 RevPi147 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Apr 09 17:18:59 RevPi147 dhcpcd[829]: eth0: carrier acquired
Apr 09 17:18:59 RevPi147 kernel: smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
Apr 09 17:18:59 RevPi147 dhcpcd[829]: eth0: executing `/lib/dhcpcd/dhcpcd-run-hooks' CARRIER
Apr 09 17:18:59 RevPi147 dhcpcd[829]: eth0: IAID a7:00:0a:07
Apr 09 17:18:59 RevPi147 dhcpcd[829]: eth0: adding address fe80::5b64:36f6:bf05:7a28
Apr 09 17:18:59 RevPi147 dhcpcd[829]: eth0: pltime infinity, vltime infinity
Apr 09 17:18:59 RevPi147 dhcpcd[829]: eth0: delaying IPv6 router solicitation for 0.2 seconds
Apr 09 17:18:59 RevPi147 dhcpcd[829]: eth0: delaying IPv4 for 0.4 seconds
Apr 09 17:18:59 RevPi147 dhcpcd[829]: eth0: soliciting an IPv6 router
Apr 09 17:18:59 RevPi147 dhcpcd[829]: eth0: delaying Router Solicitation for LL address
Apr 09 17:18:59 RevPi147 dhcpcd[829]: eth0: using ClientID 01:c8:3e:a7:00:0a:07
Apr 09 17:18:59 RevPi147 dhcpcd[829]: eth0: reading lease `/var/lib/dhcpcd5/dhcpcd-eth0.lease'
Apr 09 17:18:59 RevPi147 dhcpcd[829]: eth0: rebinding lease of 192.168.0.145

I then compared the dhcpcd-6.11.5 tag with the source package but they are identical.

Something is fishy here. Either you've made modifications to the source code which are not part of dhcpcd5_6.11.5-1+rpt4.debian.tar.xz, or the change in behavior was caused by a different build environment. To allow you to verify the latter, I'm including the output of the configure build stage below. Either way, we'd appreciate if the issue could be fixed in Raspbian, because even though we can ship a working deb package to our customers, they might overwrite it with a non-working package made available later by the Foundation. Thanks!

Cc: @mduckeck

dh_auto_configure -- --prefix= --libexecdir=/lib/dhcpcd \
	--mandir=/usr/share/man --dbdir=/var/lib/dhcpcd5 \
	--with-hooks="ntp wpa_supplicant" --host=arm-linux-gnueabihf \
	--build=arm-linux-gnueabihf
	./configure --build=arm-linux-gnueabihf --prefix=/usr --includedir=\${prefix}/include --mandir=\${prefix}/share/man --infodir=\${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=\${prefix}/lib/arm-linux-gnueabihf --libexecdir=\${prefix}/lib/arm-linux-gnueabihf --disable-maintainer-mode --disable-dependency-tracking --prefix= --libexecdir=/lib/dhcpcd --mandir=/usr/share/man --dbdir=/var/lib/dhcpcd5 "--with-hooks=ntp wpa_supplicant" --host=arm-linux-gnueabihf --build=arm-linux-gnueabihf
configure args: --build=arm-linux-gnueabihf --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=${prefix}/lib/arm-linux-gnueabihf --libexecdir=${prefix}/lib/arm-linux-gnueabihf --disable-maintainer-mode --disable-dependency-tracking --prefix= --libexecdir=/lib/dhcpcd --mandir=/usr/share/man --dbdir=/var/lib/dhcpcd5 --with-hooks=ntp wpa_supplicant --host=arm-linux-gnueabihf --build=arm-linux-gnueabihf
Deriving operating system from ... arm-linux-gnueabihf
Configuring dhcpcd for ... linux
Enabling INET support
Enabling IPv4LL support
Enabling INET6 support
Enabling DHCPv6 support
Enabling Authentication
Using compiler .. cc
cc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516
dhcpcd-definitions.conf will be embedded in dhcpcd itself
Testing for nl80211 ... yes
Testing for IN6_ADDR_GEN_MODE_NONE ... yes
Testing for getifaddrs ... yes
Testing for ifaddrs.ifa_addrflags ... no
Testing for clock_gettime ... yes
Testing for inet_ntoa ... yes
Testing for arc4random ... no
Testing for arc4random_uniform ... no
Testing for getline ... no
Testing for strlcpy ... no
Testing for pidfile_lock ... no
Testing for strtoi ... no
Testing for dprintf ... yes
Testing for printf %m support ...yes
Testing for TAILQ_FOREACH_SAFE ... no
Testing for TAILQ_CONCAT ...yes
Testing for posix_spawn ... yes
Testing for reallocarray ... no
Testing for kqueue1 ... no
Testing for kqueue ... no
Testing for epoll ... yes
Testing for be64enc ... no
Testing for fls64 ... no
Testing for MD5Init ... no
Testing for sha2.h ... no
Testing for SHA256_Init ... no
Testing for SHA256Init ... no
Checking for libudev ... no
Finding hook ntp ... 50-ntp.conf
Finding hook wpa_supplicant ... 10-wpa_supplicant

   SYSCONFDIR =		/etc
   SBINDIR =		/sbin
   LIBDIR =		/lib/arm-linux-gnueabihf
   LIBEXECDIR =		/lib/dhcpcd
   DBDIR =		/var/lib/dhcpcd5
   RUNDIR =		/var/run
   MANDIR =		/usr/share/man
   DATADIR =		/usr/share
   HOOKSCRIPTS =	50-ntp.conf 10-wpa_supplicant
   EGHOOKSCRIPTS =	
   STATUSARG = 		
@XECDesign
Copy link
Contributor

@XECDesign XECDesign commented Apr 10, 2018

Comparing the output from configure, the difference is this:

Checking for libudev ... yes
Checking udev_monitor_filter_add_match_subsystem_devtype ... yes
Checking udev_device_get_is_initialized ... yes
Testing for dlopen ... yes (-ldl)

Maybe building with --without-udev would fix the problem, but I would like to understand why it happens. Might be a while before I can look into it properly.

@XECDesign
Copy link
Contributor

@XECDesign XECDesign commented Apr 10, 2018

So I don't forget:

We used to have trouble with persistent interface names failing because the the system would try to rename them while dhcpcd had hold of them. The problem seemed to go away 'on its own', but now I suspect it's because we started building with libudev available on the build machine.

From README:

Some systems have /dev management systems and some of these like to rename
interfaces. As this system would listen in the same way as dhcpcd to new
interface arrivals, dhcpcd needs to listen to the /dev management sytem
instead of the kernel. However, if the /dev management system breaks, stops
working, or changes to a new one, dhcpcd should still try and continue to work.
To facilitate this, dhcpcd allows a plugin to load to instruct dhcpcd when it
can use an interface. As of the time of writing only udev support is included.
You can disable this with --without-dev, or without-udev.
NOTE: in Gentoo at least, sys-fs/udev as provided by systemd leaks memory
sys-fs/eudev, the fork of udev does not and as such is recommended.

@XECDesign
Copy link
Contributor

@XECDesign XECDesign commented Apr 10, 2018

Quick glance at the logs you posted. It looks like in the working case:
Interface departs and is removed.
Interface is added
Wait for carrier, get carrier and carry on.

In the non-working case.
Interface departs and is removed
Interface is added
Wait for carrier
Interface departs and is removed before carrier is acquired.

I think we'd need to check what udev event dhcpcd is listening for that's making it decide that the interface has departed when there are no kernel messages to suggest that that's the case.

@XECDesign
Copy link
Contributor

@XECDesign XECDesign commented Apr 10, 2018

The output of udevadm monitor -s net may be useful.

@XECDesign
Copy link
Contributor

@XECDesign XECDesign commented Apr 10, 2018

Looks like you can also launch dhcpcd with --nodev to get around the problem without having to recompile it.

@l1k
Copy link
Author

@l1k l1k commented Apr 11, 2018

There's a race between the kernel and udev. In the working case, udevadm monitor -s net output is

KERNEL remove
UDEV remove
KERNEL add
UDEV add

whereas in the nonworking case it's

KERNEL remove
KERNEL add
UDEV remove
UDEV add
@l1k
Copy link
Author

@l1k l1k commented Apr 11, 2018

I can confirm that adding --nodev on the command line or nodev to dhcpcd.conf makes it work despite the problematic remove-add-remove-add sequence. What was your intention going forward, keep udev support or get rid of it? I'll doublecheck now if the issue is still present with the tip of the master branch and I'm also digging into the source code.

@l1k
Copy link
Author

@l1k l1k commented Apr 11, 2018

The issue is apparently gone in 7.0.3. There's 432 commits between it and 6.11.5 but none of them strikes me like an obvious candidate, so I'm bisecting to find the fix.

@XECDesign
Copy link
Contributor

@XECDesign XECDesign commented Apr 11, 2018

What was your intention going forward, keep udev support or get rid of it?

I believe we have to have udev support for persistent interface names to work reliably. Ideally, we'd backport a fix.

@l1k
Copy link
Author

@l1k l1k commented Apr 11, 2018

Sorry I was mistaken, I did manage to reproduce the issue with 7.0.3 now.

@XECDesign
Copy link
Contributor

@XECDesign XECDesign commented Apr 11, 2018

Might be worth reporting the issue on the dhcpcd mailing list.

To subscribe to the list, send an empty email to dhcpcd-discuss+subscribe@marples.name
Once subscribed to the list, you can post to it by sending an email to dhcpcd-discuss@marples.name

l1k added a commit to RevolutionPi/imagebakery that referenced this issue Apr 11, 2018
…ion)

After an EMI-induced USB bus reset, the on-board Ethernet interfaces
should be brought up again and assigned IP addresses.

On jessie this required amending /etc/network/interfaces, but on
stretch, dhcpcd is used instead.  Nevertheless add the required
directives as comments to /etc/network/interfaces in case users
prefer not to use dhcpcd.

There's an issue with dhcpcd wherein address assignment after a
bus reset fails if kernel and udevd uevents are received in a
specific order.  Work around by disabling reception of udevd
uevents for now.

Link: RPi-Distro/repo#102
Signed-off-by: Lukas Wunner <lukas@wunner.de>
@l1k
Copy link
Author

@l1k l1k commented Apr 16, 2018

@XECDesign: The issue is now fixed on the dhcpcd master branch with commit rsmarples/dhcpcd@cd3ab83. Could you cut a new release of the dhcpcd Raspbian package one of these days with the commit cherry-picked? I've double-checked that 6.11.5 plus this commit compile cleanly and fix the issue. Thanks!

@XECDesign
Copy link
Contributor

@XECDesign XECDesign commented Apr 17, 2018

Thanks. A new dhcpcd package should hit the repo shortly.

@l1k
Copy link
Author

@l1k l1k commented Apr 17, 2018

Okay I've verified that the issue is gone in dhcpcd5_6.11.5-1+rpt5. Thanks a lot!

@l1k l1k closed this Apr 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants