Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dhcpcd crashes every 5 days #179

Closed
zacknewman opened this issue Jan 28, 2023 · 33 comments
Closed

dhcpcd crashes every 5 days #179

zacknewman opened this issue Jan 28, 2023 · 33 comments
Labels

Comments

@zacknewman
Copy link

For over a year now, dhcpcd crashes every 5 days. I "solved" this by running a ksh script every 30 minutes that starts it up when it is not running.

Basic machine info:

router$ uname -a
OpenBSD router.philomathiclife.com 7.2 GENERIC.MP#6 amd64
router$ dhcpcd --version
dhcpcd 9.4.1
Copyright (c) 2006-2021 Roy Marples
Compiled in features: INET ARP ARPing IPv4LL INET6 DHCPv6 AUTH PRIVSEP
router$ cat /etc/dhcpcd.conf
noipv6rs
denyinterfaces !(em0)
allowinterfaces em0
script ""
interface em0
	duid
	iaid 0
	ia_na 0
	ia_pd 0/::/60
	ipv6rs
	leasetime -1
	nooption domain_name,domain_name_servers,domain_search,fqdn,host_name,ntp_servers,time_offset,time_servers
	option rapid_commit
	persistent
	require dhcp_server_identifier
	waitip 6
	waitip 4

Error that is logged to /var/log/daemon when it crashes:

Jan 26 03:24:19 router dhcpcd[41538]: ps_inet_dodispatch: Connection reset by peer
Jan 26 03:24:19 router dhcpcd[41538]: control_free: No such file or directory
Jan 26 03:24:19 router dhcpcd[41538]: ps_sendpsmmsg: Destination address required
Jan 26 03:24:19 router dhcpcd[41538]: ps_dostop: Destination address required

I ran sysctl kern.nosuidcoredump=2, but unfortunately no crash dump was created in /var/crash/dhcpcd.

I allow all egress traffic, ICMPv6 and ICMP ingress traffic, IPv6 UDP ingress traffic on the external interface destined to port 546, and IPv4 UDP ingress traffic on the external interface destined to port 68. I am not sure what additional information is useful. When dhcpcd is started up, the following is logged to /var/log/daemon:

Jan 26 03:43:01 router dhcpcd[11791]: dhcpcd-9.4.1 starting
Jan 26 03:43:01 router dhcpcd[56722]: DUID 00:04:1c:02:1b:03:0d:04:e7:05:08:06:5c:07:00:08:00:09
Jan 26 03:43:01 router dhcpcd[56722]: em0: IAID 00:00:00:00
Jan 26 03:43:02 router dhcpcd[56722]: em0: rebinding lease of 71.229.183.31
Jan 26 03:43:02 router dhcpcd[56722]: em0: probing address 71.229.183.31/22
Jan 26 03:43:02 router dhcpcd[56722]: em0: soliciting an IPv6 router
Jan 26 03:43:02 router dhcpcd[56722]: em0: rebinding prior DHCPv6 lease
Jan 26 03:43:04 router dhcpcd[56722]: em0: REPLY6 received from fe80::201:5cff:fe63:ee46
Jan 26 03:43:04 router dhcpcd[56722]: em0: adding address 2001:558:6040:83:a99d:b51a:6dd1:632c/128
Jan 26 03:43:04 router dhcpcd[56722]: em0: renew in 141184, rebind in 244864, expire in 313984 seconds
Jan 26 03:43:04 router dhcpcd[56722]: lo0: adding reject route to 2601:283:4403:ce80::/60 via ::1
Jan 26 03:43:04 router dhcpcd[56722]: em0: delegated prefix 2601:283:4403:ce80::/60
Jan 26 03:43:05 router dhcpcd[56722]: em0: Router Advertisement from fe80::201:5cff:fe63:ee46
Jan 26 03:43:05 router dhcpcd[56722]: em0: adding route to 2001:558:4070:83::/64 via fe80::201:5cff:fe63:ee46
Jan 26 03:43:05 router dhcpcd[56722]: em0: adding route to 2001:558:5021:32::/64 via fe80::201:5cff:fe63:ee46
Jan 26 03:43:05 router dhcpcd[56722]: em0: adding route to 2001:558:6040:83::/64 via fe80::201:5cff:fe63:ee46
Jan 26 03:43:05 router dhcpcd[56722]: em0: adding route to 2001:558:803a:47::/64 via fe80::201:5cff:fe63:ee46
Jan 26 03:43:05 router dhcpcd[56722]: em0: adding default route via fe80::201:5cff:fe63:ee46
Jan 26 03:43:07 router dhcpcd[56722]: em0: leased 71.229.183.31 for 224363 seconds
Jan 26 03:43:07 router dhcpcd[56722]: em0: adding route to 71.229.180.0/22
Jan 26 03:43:07 router dhcpcd[56722]: em0: adding default route via 71.229.180.1

Potentially relevant: when I run dhcpcd -U em0, it times out more often than not between the Neighbor Discovery info and the DHCPv6 info. Regardless if it times out, /var/log/daemon shows:

Jan 27 17:59:58 router dhcpcd[56722]: control command: dhcpcd -U em0
Jan 27 17:59:58 router dhcpcd[56722]: control_free: No such file or directory
Jan 27 18:00:03 router dhcpcd[56722]: control_free: No such file or directory

Not sure if the control_free error being one of the errors that is shown when crashing is important.

router$ dhcpcd -U em0
reason=CARRIER
interface=em0
protocol=link
if_configured=true
ifcarrier=up
ifmetric=0
ifwireless=0
ifflags=34883
ifmtu=1500

reason=RENEW
interface=em0
protocol=dhcp
ip_address=71.229.183.31
subnet_cidr=22
network_number=71.229.180.0
subnet_mask=255.255.252.0
routers=71.229.180.1
broadcast_address=255.255.255.255
dhcp_lease_time=345600
dhcp_message_type=5
dhcp_server_identifier=96.113.84.152
dhcp_renewal_time=172800
dhcp_rebinding_time=302400

reason=ROUTERADVERT
interface=em0
protocol=ra
nd1_from=fe80::201:5cff:fe63:ee46
nd1_acquired=360690
nd1_now=360690
nd1_hoplimit=64
nd1_flags=MO
nd1_lifetime=1800
nd1_prefix_information1_length=64
nd1_prefix_information1_flags=
nd1_prefix_information1_vltime=604800
nd1_prefix_information1_pltime=302400
nd1_prefix_information1_prefix=2001:558:4070:83::
nd1_prefix_information2_length=64
nd1_prefix_information2_flags=
nd1_prefix_information2_vltime=604800
nd1_prefix_information2_pltime=302400
nd1_prefix_information2_prefix=2001:558:5021:32::
nd1_prefix_information3_length=64
nd1_prefix_information3_flags=
nd1_prefix_information3_vltime=604800
nd1_prefix_information3_pltime=302400
nd1_prefix_information3_prefix=2001:558:6040:83::
nd1_prefix_information4_length=64
nd1_prefix_information4_flags=
nd1_prefix_information4_vltime=604800
nd1_prefix_information4_pltime=302400
nd1_prefix_information4_prefix=2001:558:803a:47::

reason=REBIND6
interface=em0
protocol=dhcp6
dhcp6_client_id=00041c021b030d04e70508065c0700080009
dhcp6_server_id=000100011caae03b40a8f02e8ef4
dhcp6_ia_na1_iaid=00000000
dhcp6_ia_na1_t1=141184
dhcp6_ia_na1_t2=244864
dhcp6_ia_na1_ia_addr1=2001:558:6040:83:a99d:b51a:6dd1:632c
dhcp6_ia_na1_ia_addr1_pltime=313984
dhcp6_ia_na1_ia_addr1_vltime=313984
dhcp6_ia_pd1_iaid=00000000
dhcp6_ia_pd1_t1=141184
dhcp6_ia_pd1_t2=244864
dhcp6_ia_pd1_prefix1_pltime=313984
dhcp6_ia_pd1_prefix1_vltime=313984
dhcp6_ia_pd1_prefix1_length=60
dhcp6_ia_pd1_prefix1=2601:283:4403:ce80::
@zacknewman
Copy link
Author

zacknewman commented Jan 28, 2023

I should add that during the ≈ 5 days that dhcpcd runs, it does successfully renew the IPv6 and IPv4 leases. When doing so, /var/log/daemon shows:

Jan 19 18:56:08 router dhcpcd[24241]: em0: REPLY6 received from fe80::201:5cff:fe63:ee46
Jan 19 18:56:08 router dhcpcd[24241]: em0: renew in 172800, rebind in 276480, expire in 345600 seconds

Finally, if there is one aspect about my setup that I have always felt "uneasy" about, it is the ia_pd 0/::/60 line in /etc/dhcpcd.conf. This may be an "edge case" that dhcpcd.conf(5) does not rule out. In particular according to dhcpcd.conf(5):

⋮
ia_pd [iaid [/ prefix / prefix_len] [interface [/ sla_id [/ prefix_len [/suffix]]]]]
⋮
If no interface is given then we will assign a prefix to every other interface with a sla_id equivalent to the interface index assigned by the OS.  Otherwise addresses are only assigned for each interface and sla_id.
⋮

The way that I interpret this means that dhcpcd should not assign prefixes to any interface since I don't have interfaces specified and I explicitly deny all interfaces with the exception of the requesting interface (i.e., denyinterfaces !(em0)). It seems bizarre that if this were to be an issue that dhcpcd would work at all let alone for 5 days successfully renewing the leases twice within that interval. As to the question of "why is dhcpcd not assigning addresses from the prefix it is delegated?" is concerned, the answer is that I statically assign /64s to the internal interfaces. My ISP has not changed my IPv4 or IPv6 addresses in years; and while I know they could, I am OK treating them as static for simpler configuration of some machines I have.

@rsmarples rsmarples added the bug label Feb 7, 2023
@rsmarples
Copy link
Member

The control_free: No such file or directory diagnostic you noted has been removed from the current source.
I don't recall if dhcpcd -U on OpenBSD triggers a code path failure with ppoll in dhcpcd-9.4.x which has also been fixed in the current branch.

The master branch here on github has new process management code for privsep which may help here..
Is there any chance you can pull down, compile and test please?

@rsmarples
Copy link
Member

I've left the master branch running for over 5 days and it seems to be working without issue on OpenBSD 6.6

@zacknewman
Copy link
Author

zacknewman commented Mar 2, 2023

I didn't mean to "ghost" you Roy. I had some stuff come up personally and professionally that I was/am dealing with. I know I don't have to explain that to you, but I want to to apologize anyway.

I am in the middle of rotating out the machine that I have been using as my router; and when that is done, I'll try to replicate this problem in a controlled setting by hooking up the old router to an SBC running a DHCP and DCHPv6 server. Assuming I can replicate this problem, I'll build dhcpcd from the master branch to see if it resolves the problem. OpenBSD 6.6 is quite old, and so I don't believe much can be gained from testing on it. I know I have been dealing with this problem since OpenBSD 7.0 and I even think OpenBSD 6.9.

I don't want you to be held up on me though, so feel free to close this if you'd like. I can always open a new bug report.

@rsmarples
Copy link
Member

OK, I've got dhcpcd setup in OpenBSD-7.3 and I'll keep it running as long as I can.

@zacknewman
Copy link
Author

zacknewman commented May 1, 2023

I still get crashes on OpenBSD 7.3, and it is normally every 5 days still. Sometimes it'll be 4 days or even 2 days. I was assigned new IPs from my ISP, and the lease is currently only 2 hours instead of 4 days like it was before; but the problem is still there. I also experimented with manually terminating dhcpcd every 3 days to see if that would "reset" this 5-day crash thing, but it does not: it would crash 2 days after I manually brought it back down. Point being it doesn't appear to be related to the length of the lease, using a "long-time assigned" IP, or having the process run a long time. Perhaps there is something specific with the packets that my ISP sends (e.g., they send some kind of packet that forces a disconnect)?

@igalic
Copy link

igalic commented May 1, 2023

how bad would it be to run a packet capture for n days to find out?


and would running a protocol fuzzer get us there quicker?

@zacknewman
Copy link
Author

zacknewman commented May 2, 2023

The penultimate time dhcpcd crashed was Apr 25 18:47:44 MDT, and the last time it crashed was Apr 30 19:00:26 MDT which means it should crash next around May 5 19:00 MDT. In order to reduce how long I will run tcpdump, I will start running it around May 5 18:30 MDT until it crashes. The command that I will run is the following: tcpdump -ntt -i em0 -w pkts.dat '(udp and ((ip and (dst port 67 or dst port 68)) or (ip6 and (dst port 546 or dst port 547)))) or icmp6' &. That command writes about 7500 bytes/minute.

If there is a better command, then I am all ears. Just for reference em0 is the only interface dhcpcd controls.

@zacknewman
Copy link
Author

I should clarify that the version of dhcpcd is the same for OpenBSD 7.3 as it was for OpenBSD 7.2 (i.e., version 9.4.1) and not the current version of 10.0.1.

@rsmarples
Copy link
Member

If it crashes again and you can't get a log, crashdump or anything useful you can try to do two things

  1. Try 10.0.1
  2. Compile without privsep - then you should get a crash dump.

@igalic
Copy link

igalic commented May 2, 2023

as an aside: does privsep prevent crashdumps on all platforms, or just on OpenBSD?

@rsmarples
Copy link
Member

All platforms.

@zacknewman
Copy link
Author

OK. I won't run that tcpdump then. I went ahead and ran the following:

router$ git clone https://github.com/NetworkConfiguration/dhcpcd && cd dhcpcd/ && ./configure && ./make
router$ su root
Password:
router# rcctl stop dhcpcd && ifconfig em0 down && ifconfig em0 -inet6 && ifconfig em0 -inet && rm -rf /var/run/dhcpcd && mv /usr/local/sbin/dhcpcd /home/zack/dhcp && mv src/dhcpcd /usr/local/sbin && chown root:daemon /usr/local/sbin/dhcpcd && rcctl start dhcpcd

I will report back.

@zacknewman
Copy link
Author

It crashed in the expected time frame that 9.4.1 was going to crash. Unlike 9.4.1 though, I don't see

Jan 26 03:24:19 router dhcpcd[41538]: ps_inet_dodispatch: Connection reset by peer
Jan 26 03:24:19 router dhcpcd[41538]: control_free: No such file or directory
Jan 26 03:24:19 router dhcpcd[41538]: ps_sendpsmmsg: Destination address required
Jan 26 03:24:19 router dhcpcd[41538]: ps_dostop: Destination address required

in /var/log/daemon. I take the next thing to do is compile this without privsep and log the crash dump? I also went ahead and ran the tcpdump anyway, but nothing stood out.

@zacknewman
Copy link
Author

zacknewman commented May 6, 2023

Wait, there actually is something in that tcpdump:

18:56:55.920416 104.152.52.138.54574 > 73.14.244.157.68: bootp-#0 htype-#0 hlen:0 [|bootp] [tos 0x20]

That IP belongs to Rethem Hosting LLC. My ISP is Xfinity/Comcast though. dhcpcd -U em0 says that the DHCP server identifier is 96.113.84.152, but tcpdump reports 96.120.140.45. This is what tcpdump shows when dhcpcd is started up:

19:13:01.666134 :: > ff02::1:ff62:31fb: icmp6: neighbor sol: who has fe80::7ec2:55ff:fe62:31fb
19:13:02.685399 fe80::7ec2:55ff:fe62:31fb > ff02::2: icmp6: router solicitation
19:13:02.803661 fe80::21c:73ff:fe00:99 > ff02::1: icmp6: router advertisement
19:13:03.596053 0.0.0.0.68 > 255.255.255.255.67:  xid:0x68427ec [|bootp]
19:13:03.654892 96.120.140.45.67 > 73.14.244.157.68:  xid:0x68427ec Y:73.14.244.157 G:96.120.140.45 [|bootp] [tos 0x10]
19:13:03.695587 fe80::7ec2:55ff:fe62:31fb.546 > ff02::1:2.547: DHCPv6 Rebind xid 1f4cf9 [hlim 1]
19:13:03.753507 2001:558:4070:b8::10.547 > fe80::7ec2:55ff:fe62:31fb.546: DHCPv6 Reply xid 1f4cf9 [flowlabel 0x14181]
19:13:03.753918 :: > ff02::1:ff34:57d5: icmp6: neighbor sol: who has 2001:558:6040:b9:7dd1:10d5:7434:57d5

fe80::21c:73ff:fe00:99 is the link-local address of my ISP's router, fe80::7ec2:55ff:fe62:31fb is the link-local address of em0, 2001:558:6040:b9:7dd1:10d5:7434:57d5 is the IPv6 address assigned to em0, and 73.14.244.157 is the IPv4 address assigned to em0.

@zacknewman
Copy link
Author

zacknewman commented May 6, 2023

Does dhcpcd allow DHCP servers to send data from a port that is not 67? I know it is a very common misconception that DHCP servers MUST/SHOULD send traffic from port 67, but RFC 2131 makes no requirements or recommendations on what the source ports are. Similarly, RFC 8415 makes no requirements or recommendations on the source ports for DHCPv6 (i.e., servers can send data from a port that is not 547).

Stating this another way, RFC 2131 only says that servers MUST send traffic to port 68 on the client, the client MUST send traffic to port 67 on the server, RFC 8415 states servers MUST send traffic to port 546 on the client, and clients MUST send traffic to port 547 on the server.

@zacknewman
Copy link
Author

I forgot to report the warnings that were generated when I compiled dhcpcd 10.0.1. Not sure if these are relevant, but here are the following warnings:

if.c:909:30: warning: cast from 'char *' to 'struct cmsghdr *' increases required alignment from 1 to 4 [-Wcast-align]
             cm = (struct cmsghdr *)CMSG_NXTHDR(msg, cm))
                                    ^~~~~~~~~~~~~~~~~~~~
/usr/include/sys/socket.h:537:6: note: expanded from macro 'CMSG_NXTHDR'
            (struct cmsghdr *)((char *)(cmsg) + _ALIGN((cmsg)->cmsg_len)))
if-bsd.c:736:2: warning: kernel does not allow IPv6 address sharing [-W#warnings]
#warning kernel does not allow IPv6 address sharing
 ^
if-bsd.c:1405:2: warning: No SIOCGIFALIAS support [-W#warnings]
#warning No SIOCGIFALIAS support
 ^
if-bsd.c:1602:2: warning: kernel does not support RTM_MISS DST filtering [-W#warnings]
#warning kernel does not support RTM_MISS DST filtering
 ^
if-bsd.c:1723:2: warning: OS does not allow setting of RA bits hoplimit, retrans or reachable [-W#warnings]
#warning OS does not allow setting of RA bits hoplimit, retrans or reachable
ipv6.c:91:4: warning: kernel does not report IPv6 address flag changes [-W#warnings]
#  warning kernel does not report IPv6 address flag changes
   ^
ipv6.c:92:4: warning: polling tentative address flags periodically [-W#warnings]
#  warning polling tentative address flags periodically
ipv6nd.c:579:2: warning: kernel does not support userland sending ND6 advertisements [-W#warnings]
#warning kernel does not support userland sending ND6 advertisements
ipv4ll.c:77(ipv4ll.o:(ipv4ll_pickaddr)): warning: random() may return deterministic values, is that what you want?

rsmarples added a commit that referenced this issue May 8, 2023
RFC2131 and updates make no mention of what the source port
should or must be.

Update for #179.
@rsmarples
Copy link
Member

Does dhcpcd allow DHCP servers to send data from a port that is not 67? I know it is a very common misconception that DHCP servers MUST/SHOULD send traffic from port 67, but RFC 2131 makes no requirements or recommendations on what the source ports are. Similarly, RFC 8415 makes no requirements or recommendations on the source ports for DHCPv6 (i.e., servers can send data from a port that is not 547).

Stating this another way, RFC 2131 only says that servers MUST send traffic to port 68 on the client, the client MUST send traffic to port 67 on the server, RFC 8415 states servers MUST send traffic to port 546 on the client, and clients MUST send traffic to port 547 on the server.

I introduced checking the source port when I was improving the BPF code, but you're right there is no mention of what the source port must be so I've removed the check in the above commit on the master branch.

The first kernel warning is an OpenBSD header that needs fixing in OpenBSD.
The last warning is fine - OpenBSD is correctly telling us this but a random determinstic value is what we want per RFC 3927 section 2.1
The other warning are about the facilities dhcpcd wants that OpenBSD lacks.
None of the warnings shown should cause dhcpcd to crash.

Have you been able to get a crashdump with privsep disabled yet?

@zacknewman
Copy link
Author

I introduced checking the source port when I was improving the BPF code, but you're right there is no mention of what the source port must be so I've removed the check in the above commit on the master branch.

A similar fix is not required for DHCPv6? You are not enforcing source ports for it like you were for DHCP?

Have you been able to get a crashdump with privsep disabled yet?

No. As I have stated, this crashes very consistently in 5 day intervals; so it won't crash again until 5/10/2023 around 7:00 PM MDT. Is this required? I find it hard to believe that the reason for these crashes is not due to the incorrect enforcement of DHCP source port. This is a production environment, so I would rather not run it without privsep. Unless you are confident for some reason that this was not the problem, then I would prefer to re-compile from the master branch which has the port fix.

@zacknewman
Copy link
Author

zacknewman commented May 8, 2023

Also, do you happen to know if a DHCP server is even allowed to send unsolicited traffic to a client? While tcpdump captured 18:56:55.920416 104.152.52.138.54574 > 73.14.244.157.68: bootp-#0 htype-#0 hlen:0 [|bootp] [tos 0x20], it did not capture any other IPv4 UDP traffic with destination ports 67 or 68 before that. In particular my router did not send a renewal request or anything else to UDP port 67. I tried perusing RFC 2131, and I couldn't find anything that allowed for such a thing. If not, then I guess I should contact my ISP and request they block such traffic. According to them, they don't block traffic to UDP port 68, so anyone can attempt to send DHCP traffic to anyone else so long as they don't send such traffic from UDP port 67. Also, since DHCP is handled by BPF, I can't even use pf to block it.

@rsmarples
Copy link
Member

Yes, DHCP servers can send unsolicited messages using the FORCERENEW op code.

@rsmarples
Copy link
Member

DHCPv6 does not use BPF and does not need the similar fix.

@zacknewman
Copy link
Author

OK. I went ahead and compiled master and am running it. I will report back. If it ends up crashing again, then I will re-compile master without privsep to capture the crash dump.

@zacknewman
Copy link
Author

Well, it still crashed. :(

I re-compiled master without privsep. I'll report back on 5/15/2023 around 7:00 PM MDT with the crash dump.

@zacknewman
Copy link
Author

I should probably create a new issue or discussion, but I also noticed that despite having waitip 6 4 in /etc/dhcpcd.conf; it doesn't wait to get an IPv6 address. I don't recall having this problem with 9.4.1. Here is what /var/log/daemon shows when booting:

May 10 19:33:35 router dhcpcd[17177]: dhcpcd-10.0.1 starting
May 10 19:33:35 router dhcpcd[73937]: DUID 00:04:7d:eb:51:f7:c1:3f:11:ed:bd:36:7c:c2:55:62:31:fb
May 10 19:33:35 router dhcpcd[73937]: no interfaces have a carrier
May 10 19:33:35 router dhcpcd[73937]: em0: waiting for carrier
May 10 19:33:35 router dhcpcd[73937]: em0: carrier acquired
May 10 19:33:35 router dhcpcd[73937]: em0: IAID 00:00:00:00
May 10 19:33:35 router dhcpcd[73937]: em0: rebinding prior DHCPv6 lease
May 10 19:33:35 router dhcpcd[73937]: em0: carrier lost
May 10 19:33:35 router dhcpcd[73937]: em0: carrier acquired
May 10 19:33:35 router dhcpcd[73937]: em0: IAID 00:00:00:00
May 10 19:33:35 router dhcpcd[73937]: em0: rebinding prior DHCPv6 lease
May 10 19:33:35 router dhcpcd[73937]: em0: soliciting an IPv6 router
May 10 19:33:35 router dhcpcd[73937]: em0: Router Advertisement from fe80::21c:73ff:fe00:99
May 10 19:33:35 router dhcpcd[73937]: em0: advertised MTU 9192 is greater than link MTU 1500
May 10 19:33:35 router dhcpcd[73937]: em0: no global addresses for default route
May 10 19:33:35 router dhcpcd[73937]: em0: adding route to fd00:0:101:42::/64
May 10 19:33:35 router dhcpcd[73937]: em0: adding route to fd00:0:101:43::/64
May 10 19:33:35 router dhcpcd[73937]: em0: adding route to fd00:0:d:4::/64
May 10 19:33:35 router dhcpcd[73937]: em0: adding route to fd00:0:101:44::/64
May 10 19:33:35 router dhcpcd[73937]: em0: adding route to 2001:558:1028:3e9f::/64
May 10 19:33:35 router dhcpcd[73937]: em0: adding route to fd00:0:101:41::/64
May 10 19:33:35 router dhcpcd[73937]: em0: adding route to fd00:0:101:46::/64
May 10 19:33:35 router dhcpcd[73937]: em0: adding route to fd00:0:101:45::/64
May 10 19:33:35 router dhcpcd[73937]: em0: rebinding lease of 73.14.244.157
May 10 19:33:35 router dhcpcd[73937]: em0: probing address 73.14.244.157/23
May 10 19:33:35 router dhcpcd[73937]: em0: no global addresses for default route
May 10 19:33:35 router dhcpcd[73937]: em0: no global addresses for default route
May 10 19:33:35 router dhcpcd[73937]: em0: leased 73.14.244.157 for 3461 seconds
May 10 19:33:35 router dhcpcd[73937]: em0: adding route to 73.14.244.0/23
May 10 19:33:35 router dhcpcd[73937]: em0: adding default route via 73.14.244.1
May 10 19:33:35 router dhcpcd[73937]: em0: no global addresses for default route
May 10 19:33:35 router last message repeated 2 times
May 10 19:33:35 router dhcpcd[73937]: em0: failed to rebind prior DHCPv6 delegation
May 10 19:33:35 router dhcpcd[73937]: em0: no global addresses for default route
May 10 19:33:35 router last message repeated 7 times
May 10 19:33:35 router dhcpcd[73937]: em0: adding default route via fe80::21c:73ff:fe00:99
May 10 19:33:36 router ntpd[78749]: listening on ::1
May 10 19:33:36 router ntpd[78749]: listening on fdb5:d87:ae42:1::1
May 10 19:33:36 router ntpd[78749]: listening on 127.0.0.1
May 10 19:33:36 router ntpd[78749]: listening on 192.168.1.1
May 10 19:33:36 router ntpd[78749]: ntp engine ready
May 10 19:33:37 router ntpd[78749]: constraint reply from 2620:fe::fe: offset 0.865761
May 10 19:33:37 router ntpd[78749]: constraint reply from 9.9.9.9: offset 0.858526
May 10 19:33:39 router ntpd[78749]: ntp: couldn't bind to IPv6 query address: 2001:558:6040:b9:7dd1:10d5:7434:57d5: Can't assign requested address
May 10 19:33:39 router ntpd[97765]: Terminating
May 10 19:33:39 router savecore: no core dump
May 10 19:33:39 router rad[59423]: startup
May 10 19:33:39 router httpd[79025]: startup
May 10 19:33:49 router dhcpcd[73937]: em0: ADV 2001:558:6040:b9:7dd1:10d5:7434:57d5/128 from 2001:558:4070:b8::10
May 10 19:33:49 router dhcpcd[73937]: em0: REPLY6 received from 2001:558:4070:b8::10
May 10 19:33:49 router dhcpcd[73937]: em0: adding address 2001:558:6040:b9:7dd1:10d5:7434:57d5/128
May 10 19:33:49 router dhcpcd[73937]: em0: renew in 2353, rebind in 4513, expire in 5953 seconds
May 10 19:33:49 router dhcpcd[73937]: lo0: adding reject route to 2601:283:4e00:b1d0::/60 via ::1
May 10 19:33:49 router dhcpcd[73937]: em0: delegated prefix 2601:283:4e00:b1d0::/60

This forces me to have to remember to manually start ntpd after booting since it crashes due to it requiring an IPv6 address to have been assigned to em0.

@rsmarples
Copy link
Member

rsmarples commented May 11, 2023

Could I get you to add debug to the top of your /etc/dhcpcd.conf please and restart dhcpcd? It will give a log more debug output and hopefully the last point before it crashes.
Optionally add logfile /var/log/dhcpcd.log as well to get just the dhcpcd logs.

You'll need to restart dhcpcd for it to take effect.

@zacknewman
Copy link
Author

zacknewman commented May 11, 2023

I receive router advertisements like every second, so I was logging too much for my liking. I will restart dhcpcd with debug Monday afternoon, so that I only have to log a few hours. As I mentioned previously, restarting dhcpcd doesn't "reset" this 5-day crash thing.

@zacknewman
Copy link
Author

Well, it did not crash when it would normally crash. There are two reasons that I can think of for why it did not crash when it normally would. One, it won't crash without privsep enabled which sucks since that puts us in a Catch-22 situation: crash without logs when privsep is enabled or log but not crash without privsep. The second is I experienced a power outage that lasted almost 5 hours this morning, so perhaps that prevented the crash. I'll keep logging for another five days to see if it crashes.

@zacknewman
Copy link
Author

zacknewman commented May 16, 2023

Not only did it not crash, but now my leases are back to 4 days. I think that is a coincidence and has nothing to do with disabling privsep, but I am > 99% confident that privsep breaks dhcpcd -U <interface>. Since running version 10.0.1 as well as master, I have not once been able to get dhcpcd -U em0 to display DHCPv6 information. It would always time out after outputting DHCP information. On 9.4.1 it would fail at least 75% of the time, but on 10.0.1 it never succeeds. Now that I have privsep disabled it hasn't timed out a single time.

@zacknewman
Copy link
Author

zacknewman commented May 16, 2023

Here is everything that is "interesting" in dhcpcd.log:

router$ rg -v '^.+em0: Router Advertisement.+$' dhcpcd.log | rg -v '^.+em0: advertised MTU.+$'
May 15 12:53:00 [887]: dhcpcd-10.0.1 starting
May 15 12:53:00 [4563]: spawned manager process on PID 4563
May 15 12:53:00 [4563]: DUID 00:04:7d:eb:51:f7:c1:3f:11:ed:bd:36:7c:c2:55:62:31:fb
May 15 12:53:00 [4563]: no interfaces have a carrier
May 15 12:53:00 [4563]: em0: waiting for an IPv4 address
May 15 12:53:00 [4563]: em0: waiting for carrier
May 15 12:53:00 [4563]: em0: carrier acquired
May 15 12:53:00 [4563]: em0: interface updated
May 15 12:53:00 [4563]: em0: IAID 00:00:00:00
May 15 12:53:00 [4563]: em0: delaying IPv6 router solicitation for 0.6 seconds
May 15 12:53:00 [4563]: em0: delaying DHCPv6 for LL address
May 15 12:53:00 [4563]: em0: delaying IPv4 for 0.6 seconds
May 15 12:53:00 [4563]: em0: reading lease: /var/db/dhcpcd/em0.lease6
May 15 12:53:00 [4563]: em0: rebinding prior DHCPv6 lease
May 15 12:53:00 [4563]: em0: delaying REBIND6 (xid 0xa5b14d), next in 1.1 seconds
May 15 12:53:00 [4563]: em0: carrier lost
May 15 12:53:04 [4563]: em0: carrier acquired
May 15 12:53:04 [4563]: em0: interface updated
May 15 12:53:04 [4563]: em0: IAID 00:00:00:00
May 15 12:53:04 [4563]: em0: delaying IPv6 router solicitation for 0.8 seconds
May 15 12:53:04 [4563]: em0: reading lease: /var/db/dhcpcd/em0.lease6
May 15 12:53:04 [4563]: em0: rebinding prior DHCPv6 lease
May 15 12:53:04 [4563]: em0: delaying REBIND6 (xid 0x543bbf), next in 1.0 seconds
May 15 12:53:04 [4563]: em0: delaying IPv4 for 0.2 seconds
May 15 12:53:04 [4563]: em0: reading lease: /var/db/dhcpcd/em0.lease
May 15 12:53:04 [4563]: em0: rebinding lease of 73.14.244.157
May 15 12:53:04 [4563]: em0: sending REQUEST (xid 0x823e3ad3), next in 3.9 seconds
May 15 12:53:05 [4563]: em0: soliciting an IPv6 router
May 15 12:53:05 [4563]: em0: sending Router Solicitation
May 15 12:53:05 [4563]: em0: adding route to fd00:0:101:42::/64
May 15 12:53:05 [4563]: em0: adding route to fd00:0:101:43::/64
May 15 12:53:05 [4563]: em0: adding route to fd00:0:d:4::/64
May 15 12:53:05 [4563]: em0: adding route to fd00:0:101:44::/64
May 15 12:53:05 [4563]: em0: adding route to 2001:558:1028:3e9f::/64
May 15 12:53:05 [4563]: em0: adding route to fd00:0:101:41::/64
May 15 12:53:05 [4563]: em0: adding route to fd00:0:101:46::/64
May 15 12:53:05 [4563]: em0: adding route to fd00:0:101:45::/64
May 15 12:53:05 [4563]: em0: adding default route via fe80::21c:73ff:fe00:99
May 15 12:53:05 [4563]: em0: multicasting REBIND6 (xid 0x543bbf), next in 1.1 seconds
May 15 12:53:05 [4563]: em0: REPLY6 received from 2001:558:4070:b8::10
May 15 12:53:05 [4563]: em0: adding address 2001:558:6040:b9:7dd1:10d5:7434:57d5/128
May 15 12:53:05 [4563]: em0: pltime 190799 seconds, vltime 190799 seconds
May 15 12:53:05 [4563]: em0: renew in 17999, rebind in 121679, expire in 190799 seconds
May 15 12:53:05 [4563]: lo0: adding reject route to 2601:283:4e00:b1d0::/60 via ::1
May 15 12:53:05 [4563]: em0: writing lease: /var/db/dhcpcd/em0.lease6
May 15 12:53:05 [4563]: em0: delegated prefix 2601:283:4e00:b1d0::/60
May 15 12:53:05 [4563]: em0: waiting for DHCPv6 DAD to complete
May 15 12:53:06 [4563]: em0: DHCPv6 DAD completed
May 15 12:53:08 [4563]: em0: sending REQUEST (xid 0x823e3ad3), next in 9.0 seconds
May 15 12:53:08 [4563]: em0: acknowledged 73.14.244.157 from 96.113.84.152
May 15 12:53:08 [4563]: em0: probing address 73.14.244.157/23
May 15 12:53:08 [4563]: em0: probing for 73.14.244.157
May 15 12:53:08 [4563]: em0: ARP probing 73.14.244.157 (1 of 3), next in 1.2 seconds
May 15 12:53:09 [4563]: em0: ARP probing 73.14.244.157 (2 of 3), next in 1.2 seconds
May 15 12:53:10 [4563]: em0: ARP probing 73.14.244.157 (3 of 3), next in 2.0 seconds
May 15 12:53:12 [4563]: em0: DAD completed for 73.14.244.157
May 15 12:53:12 [4563]: em0: leased 73.14.244.157 for 186842 seconds
May 15 12:53:12 [4563]: em0: renew in 14042 seconds, rebind in 143642 seconds
May 15 12:53:12 [4563]: em0: writing lease: /var/db/dhcpcd/em0.lease
May 15 12:53:12 [4563]: em0: adding IP address 73.14.244.157/23 broadcast 255.255.255.255
May 15 12:53:12 [4563]: em0: adding route to 73.14.244.0/23
May 15 12:53:12 [4563]: em0: adding default route via 73.14.244.1
May 15 12:53:12 [4563]: em0: ARP announcing 73.14.244.157 (1 of 2), next in 2.0 seconds
May 15 12:53:14 [4563]: em0: ARP announcing 73.14.244.157 (2 of 2)
May 15 16:47:14 [4563]: em0: renewing lease of 73.14.244.157
May 15 16:47:14 [4563]: em0: sending REQUEST (xid 0xd6ac90e0), next in 3.4 seconds
May 15 16:47:14 [4563]: em0: acknowledged 73.14.244.157 from 96.113.84.152
May 15 16:47:14 [4563]: em0: leased 73.14.244.157 for 345600 seconds
May 15 16:47:14 [4563]: em0: renew in 172800 seconds, rebind in 302400 seconds
May 15 16:47:14 [4563]: em0: writing lease: /var/db/dhcpcd/em0.lease
May 15 16:47:14 [4563]: em0: IP address 73.14.244.157/23 already exists
May 15 16:47:14 [4563]: em0: ARP announcing 73.14.244.157 (1 of 2), next in 2.0 seconds
May 15 16:47:16 [4563]: em0: ARP announcing 73.14.244.157 (2 of 2)
May 15 17:53:04 [4563]: em0: multicasting RENEW6 (xid 0x644eb6), next in 10.3 seconds
May 15 17:53:04 [4563]: em0: REPLY6 received from 2001:558:4070:b8::10
May 15 17:53:04 [4563]: em0: adding address 2001:558:6040:b9:7dd1:10d5:7434:57d5/128
May 15 17:53:04 [4563]: em0: pltime 345600 seconds, vltime 345600 seconds
May 15 17:53:04 [4563]: em0: renew in 172800, rebind in 276480, expire in 345600 seconds
May 15 17:53:04 [4563]: em0: writing lease: /var/db/dhcpcd/em0.lease6
May 15 17:53:04 [4563]: em0: delegated prefix 2601:283:4e00:b1d0::/60
May 15 19:18:00 [4563]: em0: truncated packet (0) from 104.152.52.210

Notice the last line above. That is the same packet that tcpdump caught before. I am pretty sure that is what caused it to crash the last time. If you look at the timestamp, that is around the time dhcpcd would normally crash. My guess is that packet interacts badly with privsep and causes the program to terminate. To avoid having to scroll up in this thread, this is the packet I am referring to:

18:56:55.920416 104.152.52.138.54574 > 73.14.244.157.68: bootp-#0 htype-#0 hlen:0 [|bootp] [tos 0x20]

While the IPv4 address is not exactly the same, it is from the same /24 IPv4 network. That IP also belongs to Rethem Hosting LLC.

My original hypothesis was the UDP port not being 67 caused it to crash; however I already tried running master since the bug fix that no longer required the port to be 67, and it still crashed. Perhaps there is something with that port in the privsep code? Or perhaps that was a red herring (but still a bug), and it has to do with the "truncated packet" that privsep does not like?

rsmarples added a commit that referenced this issue May 16, 2023
They should be handled gracefully without privsep anyway.
Fix for #179.
@rsmarples
Copy link
Member

That's an excellent catch! So the privsep processes exit gracefully on a zero length packet which is exactly what you received so the above commit should fix things up for you now :)

Not only did it not crash, but now my leases are back to 4 days. I think that is a coincidence and has nothing to do with disabling privsep, but I am > 99% confident that privsep breaks dhcpcd -U . Since running version 10.0.1 as well as master, I have not once been able to get dhcpcd -U em0 to display DHCPv6 information. It would always time out after outputting DHCP information. On 9.4.1 it would fail at least 75% of the time, but on 10.0.1 it never succeeds. Now that I have privsep disabled it hasn't timed out a single time.

Please open a new ticket for this.

@zacknewman
Copy link
Author

I re-compiled master and kept privsep. I'll report back in a week if it hasn't crashed.

@zacknewman
Copy link
Author

Received a truncated packet at 6:44 PM MDT, and dhcpcd did not crash. It is already almost 8:00 PM MDT, and dhcpcd has not crashed which it normally would have by now. I will close this bug. If it starts to crash again, I'll open a new a bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants