dhcpcd crashes every 5 days #179
Comments
I should add that during the ≈ 5 days that
Finally, if there is one aspect about my setup that I have always felt "uneasy" about, it is the
The way that I interpret this means that |
The master branch here on GitHub has new process management code for privsep which may help here. |
I've left the master branch running for over 5 days and it seems to be working without issue on OpenBSD 6.6 |
I didn't mean to "ghost" you Roy. I had some stuff come up personally and professionally that I was/am dealing with. I know I don't have to explain that to you, but I want to apologize anyway. I am in the middle of rotating out the machine that I have been using as my router; and when that is done, I'll try to replicate this problem in a controlled setting by hooking up the old router to an SBC running a DHCP and DHCPv6 server. Assuming I can replicate this problem, I'll build I don't want you to be held up on me though, so feel free to close this if you'd like. I can always open a new bug report. |
OK, I've got dhcpcd set up in OpenBSD-7.3 and I'll keep it running as long as I can. |
I still get crashes on OpenBSD 7.3, and it is normally every 5 days still. Sometimes it'll be 4 days or even 2 days. I was assigned new IPs from my ISP, and the lease is currently only 2 hours instead of 4 days like it was before; but the problem is still there. I also experimented with manually terminating dhcpcd every 3 days to see if that would "reset" this 5-day crash thing, but it does not: it would crash 2 days after I manually brought it back up. Point being it doesn't appear to be related to the length of the lease, using a "long-time assigned" IP, or having the process run a long time. Perhaps there is something specific with the packets that my ISP sends (e.g., they send some kind of packet that forces a disconnect)? |
how bad would it be to run a packet capture for n days to find out? and would running a protocol fuzzer get us there quicker? |
The penultimate time If there is a better command, then I am all ears. Just for reference |
I should clarify that the version of |
If it crashes again and you can't get a log, crashdump or anything useful you can try to do two things
|
as an aside: does privsep prevent crashdumps on all platforms, or just on OpenBSD? |
All platforms. |
OK. I won't run that
router$ git clone https://github.com/NetworkConfiguration/dhcpcd && cd dhcpcd/ && ./configure && make
router$ su root
Password:
router# rcctl stop dhcpcd && ifconfig em0 down && ifconfig em0 -inet6 && ifconfig em0 -inet && rm -rf /var/run/dhcpcd && mv /usr/local/sbin/dhcpcd /home/zack/dhcp && mv src/dhcpcd /usr/local/sbin && chown root:daemon /usr/local/sbin/dhcpcd && rcctl start dhcpcd
I will report back. |
It crashed in the expected time frame that 9.4.1 was going to crash. Unlike 9.4.1 though, I don't see
Jan 26 03:24:19 router dhcpcd[41538]: ps_inet_dodispatch: Connection reset by peer
Jan 26 03:24:19 router dhcpcd[41538]: control_free: No such file or directory
Jan 26 03:24:19 router dhcpcd[41538]: ps_sendpsmmsg: Destination address required
Jan 26 03:24:19 router dhcpcd[41538]: ps_dostop: Destination address required
in |
Wait, there actually is something in that
That IP belongs to Rethem Hosting LLC. My ISP is Xfinity/Comcast though.
19:13:01.666134 :: > ff02::1:ff62:31fb: icmp6: neighbor sol: who has fe80::7ec2:55ff:fe62:31fb
19:13:02.685399 fe80::7ec2:55ff:fe62:31fb > ff02::2: icmp6: router solicitation
19:13:02.803661 fe80::21c:73ff:fe00:99 > ff02::1: icmp6: router advertisement
19:13:03.596053 0.0.0.0.68 > 255.255.255.255.67: xid:0x68427ec [|bootp]
19:13:03.654892 96.120.140.45.67 > 73.14.244.157.68: xid:0x68427ec Y:73.14.244.157 G:96.120.140.45 [|bootp] [tos 0x10]
19:13:03.695587 fe80::7ec2:55ff:fe62:31fb.546 > ff02::1:2.547: DHCPv6 Rebind xid 1f4cf9 [hlim 1]
19:13:03.753507 2001:558:4070:b8::10.547 > fe80::7ec2:55ff:fe62:31fb.546: DHCPv6 Reply xid 1f4cf9 [flowlabel 0x14181]
19:13:03.753918 :: > ff02::1:ff34:57d5: icmp6: neighbor sol: who has 2001:558:6040:b9:7dd1:10d5:7434:57d5
|
Does Stating this another way, RFC 2131 only says that servers MUST send traffic to port 68 on the client and that clients MUST send traffic to port 67 on the server; likewise, RFC 8415 states that servers MUST send traffic to port 546 on the client and that clients MUST send traffic to port 547 on the server. |
I forgot to report the warnings that were generated when I compiled
if.c:909:30: warning: cast from 'char *' to 'struct cmsghdr *' increases required alignment from 1 to 4 [-Wcast-align]
cm = (struct cmsghdr *)CMSG_NXTHDR(msg, cm))
^~~~~~~~~~~~~~~~~~~~
/usr/include/sys/socket.h:537:6: note: expanded from macro 'CMSG_NXTHDR'
(struct cmsghdr *)((char *)(cmsg) + _ALIGN((cmsg)->cmsg_len)))
if-bsd.c:736:2: warning: kernel does not allow IPv6 address sharing [-W#warnings]
#warning kernel does not allow IPv6 address sharing
^
if-bsd.c:1405:2: warning: No SIOCGIFALIAS support [-W#warnings]
#warning No SIOCGIFALIAS support
^
if-bsd.c:1602:2: warning: kernel does not support RTM_MISS DST filtering [-W#warnings]
#warning kernel does not support RTM_MISS DST filtering
^
if-bsd.c:1723:2: warning: OS does not allow setting of RA bits hoplimit, retrans or reachable [-W#warnings]
#warning OS does not allow setting of RA bits hoplimit, retrans or reachable
ipv6.c:91:4: warning: kernel does not report IPv6 address flag changes [-W#warnings]
# warning kernel does not report IPv6 address flag changes
^
ipv6.c:92:4: warning: polling tentative address flags periodically [-W#warnings]
# warning polling tentative address flags periodically
ipv6nd.c:579:2: warning: kernel does not support userland sending ND6 advertisements [-W#warnings]
#warning kernel does not support userland sending ND6 advertisements
ipv4ll.c:77(ipv4ll.o:(ipv4ll_pickaddr)): warning: random() may return deterministic values, is that what you want? |
RFC2131 and updates make no mention of what the source port should or must be. Update for #179.
I introduced checking the source port when I was improving the BPF code, but you're right there is no mention of what the source port must be so I've removed the check in the above commit on the master branch. The first kernel warning is an OpenBSD header that needs fixing in OpenBSD. Have you been able to get a crashdump with privsep disabled yet? |
A similar fix is not required for DHCPv6? You are not enforcing source ports for it like you were for DHCP?
No. As I have stated, this crashes very consistently in 5 day intervals; so it won't crash again until 5/10/2023 around 7:00 PM MDT. Is this required? I find it hard to believe that these crashes are not due to the incorrect enforcement of the DHCP source port. This is a production environment, so I would rather not run it without privsep. Unless you are confident for some reason that this was not the problem, then I would prefer to re-compile from the |
Also, do you happen to know if a DHCP server is even allowed to send unsolicited traffic to a client? While |
Yes, DHCP servers can send unsolicited messages using the FORCERENEW op code. |
DHCPv6 does not use BPF and does not need the similar fix. |
OK. I went ahead and compiled |
Well, it still crashed. :( I re-compiled |
I should probably create a new issue or discussion, but I also noticed that despite having
May 10 19:33:35 router dhcpcd[17177]: dhcpcd-10.0.1 starting
May 10 19:33:35 router dhcpcd[73937]: DUID 00:04:7d:eb:51:f7:c1:3f:11:ed:bd:36:7c:c2:55:62:31:fb
May 10 19:33:35 router dhcpcd[73937]: no interfaces have a carrier
May 10 19:33:35 router dhcpcd[73937]: em0: waiting for carrier
May 10 19:33:35 router dhcpcd[73937]: em0: carrier acquired
May 10 19:33:35 router dhcpcd[73937]: em0: IAID 00:00:00:00
May 10 19:33:35 router dhcpcd[73937]: em0: rebinding prior DHCPv6 lease
May 10 19:33:35 router dhcpcd[73937]: em0: carrier lost
May 10 19:33:35 router dhcpcd[73937]: em0: carrier acquired
May 10 19:33:35 router dhcpcd[73937]: em0: IAID 00:00:00:00
May 10 19:33:35 router dhcpcd[73937]: em0: rebinding prior DHCPv6 lease
May 10 19:33:35 router dhcpcd[73937]: em0: soliciting an IPv6 router
May 10 19:33:35 router dhcpcd[73937]: em0: Router Advertisement from fe80::21c:73ff:fe00:99
May 10 19:33:35 router dhcpcd[73937]: em0: advertised MTU 9192 is greater than link MTU 1500
May 10 19:33:35 router dhcpcd[73937]: em0: no global addresses for default route
May 10 19:33:35 router dhcpcd[73937]: em0: adding route to fd00:0:101:42::/64
May 10 19:33:35 router dhcpcd[73937]: em0: adding route to fd00:0:101:43::/64
May 10 19:33:35 router dhcpcd[73937]: em0: adding route to fd00:0:d:4::/64
May 10 19:33:35 router dhcpcd[73937]: em0: adding route to fd00:0:101:44::/64
May 10 19:33:35 router dhcpcd[73937]: em0: adding route to 2001:558:1028:3e9f::/64
May 10 19:33:35 router dhcpcd[73937]: em0: adding route to fd00:0:101:41::/64
May 10 19:33:35 router dhcpcd[73937]: em0: adding route to fd00:0:101:46::/64
May 10 19:33:35 router dhcpcd[73937]: em0: adding route to fd00:0:101:45::/64
May 10 19:33:35 router dhcpcd[73937]: em0: rebinding lease of 73.14.244.157
May 10 19:33:35 router dhcpcd[73937]: em0: probing address 73.14.244.157/23
May 10 19:33:35 router dhcpcd[73937]: em0: no global addresses for default route
May 10 19:33:35 router dhcpcd[73937]: em0: no global addresses for default route
May 10 19:33:35 router dhcpcd[73937]: em0: leased 73.14.244.157 for 3461 seconds
May 10 19:33:35 router dhcpcd[73937]: em0: adding route to 73.14.244.0/23
May 10 19:33:35 router dhcpcd[73937]: em0: adding default route via 73.14.244.1
May 10 19:33:35 router dhcpcd[73937]: em0: no global addresses for default route
May 10 19:33:35 router last message repeated 2 times
May 10 19:33:35 router dhcpcd[73937]: em0: failed to rebind prior DHCPv6 delegation
May 10 19:33:35 router dhcpcd[73937]: em0: no global addresses for default route
May 10 19:33:35 router last message repeated 7 times
May 10 19:33:35 router dhcpcd[73937]: em0: adding default route via fe80::21c:73ff:fe00:99
May 10 19:33:36 router ntpd[78749]: listening on ::1
May 10 19:33:36 router ntpd[78749]: listening on fdb5:d87:ae42:1::1
May 10 19:33:36 router ntpd[78749]: listening on 127.0.0.1
May 10 19:33:36 router ntpd[78749]: listening on 192.168.1.1
May 10 19:33:36 router ntpd[78749]: ntp engine ready
May 10 19:33:37 router ntpd[78749]: constraint reply from 2620:fe::fe: offset 0.865761
May 10 19:33:37 router ntpd[78749]: constraint reply from 9.9.9.9: offset 0.858526
May 10 19:33:39 router ntpd[78749]: ntp: couldn't bind to IPv6 query address: 2001:558:6040:b9:7dd1:10d5:7434:57d5: Can't assign requested address
May 10 19:33:39 router ntpd[97765]: Terminating
May 10 19:33:39 router savecore: no core dump
May 10 19:33:39 router rad[59423]: startup
May 10 19:33:39 router httpd[79025]: startup
May 10 19:33:49 router dhcpcd[73937]: em0: ADV 2001:558:6040:b9:7dd1:10d5:7434:57d5/128 from 2001:558:4070:b8::10
May 10 19:33:49 router dhcpcd[73937]: em0: REPLY6 received from 2001:558:4070:b8::10
May 10 19:33:49 router dhcpcd[73937]: em0: adding address 2001:558:6040:b9:7dd1:10d5:7434:57d5/128
May 10 19:33:49 router dhcpcd[73937]: em0: renew in 2353, rebind in 4513, expire in 5953 seconds
May 10 19:33:49 router dhcpcd[73937]: lo0: adding reject route to 2601:283:4e00:b1d0::/60 via ::1
May 10 19:33:49 router dhcpcd[73937]: em0: delegated prefix 2601:283:4e00:b1d0::/60
This forces me to have to remember to manually start |
Could I get you to add You'll need to restart dhcpcd for it to take effect. |
I receive router advertisements like every second, so I was logging too much for my liking. I will restart |
Well, it did not crash when it would normally crash. There are two reasons that I can think of for why it did not crash when it normally would. One, it won't crash without privsep enabled which sucks since that puts us in a Catch-22 situation: crash without logs when privsep is enabled or log but not crash without privsep. The second is I experienced a power outage that lasted almost 5 hours this morning, so perhaps that prevented the crash. I'll keep logging for another five days to see if it crashes. |
Not only did it not crash, but now my leases are back to 4 days. I think that is a coincidence and has nothing to do with disabling privsep, but I am > 99% confident that privsep breaks |
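For anyone comparing lease lengths across restarts, the "leased ... for N seconds" lines in the log can be converted to hours directly; a logged lease of 345600 seconds is 345600 / 3600 = 96 hours, i.e. 4 days. The following is only a rough sketch that assumes the log line format seen in this thread; lease_durations is an illustrative helper name and is not part of dhcpcd.

```sh
#!/bin/sh
# Hypothetical helper (not part of dhcpcd): list every IPv4 lease that was
# logged, with its duration in seconds and in hours.
# Assumes lines of the form "... em0: leased ADDRESS for N seconds".
lease_durations() {
    awk '/: leased .* for [0-9]+ seconds/ {
        secs = $(NF - 1)   # the field just before the word "seconds"
        printf "%s %s %s %d s = %.1f h\n", $1, $2, $3, secs, secs / 3600
    }' "$1"
}
```

Running lease_durations dhcpcd.log prints one line per lease, starting with the month, day and time taken from the log entry.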
Here is everything that is "interesting" in
router$ rg -v '^.+em0: Router Advertisement.+$' dhcpcd.log | rg -v '^.+em0: advertised MTU.+$'
May 15 12:53:00 [887]: dhcpcd-10.0.1 starting
May 15 12:53:00 [4563]: spawned manager process on PID 4563
May 15 12:53:00 [4563]: DUID 00:04:7d:eb:51:f7:c1:3f:11:ed:bd:36:7c:c2:55:62:31:fb
May 15 12:53:00 [4563]: no interfaces have a carrier
May 15 12:53:00 [4563]: em0: waiting for an IPv4 address
May 15 12:53:00 [4563]: em0: waiting for carrier
May 15 12:53:00 [4563]: em0: carrier acquired
May 15 12:53:00 [4563]: em0: interface updated
May 15 12:53:00 [4563]: em0: IAID 00:00:00:00
May 15 12:53:00 [4563]: em0: delaying IPv6 router solicitation for 0.6 seconds
May 15 12:53:00 [4563]: em0: delaying DHCPv6 for LL address
May 15 12:53:00 [4563]: em0: delaying IPv4 for 0.6 seconds
May 15 12:53:00 [4563]: em0: reading lease: /var/db/dhcpcd/em0.lease6
May 15 12:53:00 [4563]: em0: rebinding prior DHCPv6 lease
May 15 12:53:00 [4563]: em0: delaying REBIND6 (xid 0xa5b14d), next in 1.1 seconds
May 15 12:53:00 [4563]: em0: carrier lost
May 15 12:53:04 [4563]: em0: carrier acquired
May 15 12:53:04 [4563]: em0: interface updated
May 15 12:53:04 [4563]: em0: IAID 00:00:00:00
May 15 12:53:04 [4563]: em0: delaying IPv6 router solicitation for 0.8 seconds
May 15 12:53:04 [4563]: em0: reading lease: /var/db/dhcpcd/em0.lease6
May 15 12:53:04 [4563]: em0: rebinding prior DHCPv6 lease
May 15 12:53:04 [4563]: em0: delaying REBIND6 (xid 0x543bbf), next in 1.0 seconds
May 15 12:53:04 [4563]: em0: delaying IPv4 for 0.2 seconds
May 15 12:53:04 [4563]: em0: reading lease: /var/db/dhcpcd/em0.lease
May 15 12:53:04 [4563]: em0: rebinding lease of 73.14.244.157
May 15 12:53:04 [4563]: em0: sending REQUEST (xid 0x823e3ad3), next in 3.9 seconds
May 15 12:53:05 [4563]: em0: soliciting an IPv6 router
May 15 12:53:05 [4563]: em0: sending Router Solicitation
May 15 12:53:05 [4563]: em0: adding route to fd00:0:101:42::/64
May 15 12:53:05 [4563]: em0: adding route to fd00:0:101:43::/64
May 15 12:53:05 [4563]: em0: adding route to fd00:0:d:4::/64
May 15 12:53:05 [4563]: em0: adding route to fd00:0:101:44::/64
May 15 12:53:05 [4563]: em0: adding route to 2001:558:1028:3e9f::/64
May 15 12:53:05 [4563]: em0: adding route to fd00:0:101:41::/64
May 15 12:53:05 [4563]: em0: adding route to fd00:0:101:46::/64
May 15 12:53:05 [4563]: em0: adding route to fd00:0:101:45::/64
May 15 12:53:05 [4563]: em0: adding default route via fe80::21c:73ff:fe00:99
May 15 12:53:05 [4563]: em0: multicasting REBIND6 (xid 0x543bbf), next in 1.1 seconds
May 15 12:53:05 [4563]: em0: REPLY6 received from 2001:558:4070:b8::10
May 15 12:53:05 [4563]: em0: adding address 2001:558:6040:b9:7dd1:10d5:7434:57d5/128
May 15 12:53:05 [4563]: em0: pltime 190799 seconds, vltime 190799 seconds
May 15 12:53:05 [4563]: em0: renew in 17999, rebind in 121679, expire in 190799 seconds
May 15 12:53:05 [4563]: lo0: adding reject route to 2601:283:4e00:b1d0::/60 via ::1
May 15 12:53:05 [4563]: em0: writing lease: /var/db/dhcpcd/em0.lease6
May 15 12:53:05 [4563]: em0: delegated prefix 2601:283:4e00:b1d0::/60
May 15 12:53:05 [4563]: em0: waiting for DHCPv6 DAD to complete
May 15 12:53:06 [4563]: em0: DHCPv6 DAD completed
May 15 12:53:08 [4563]: em0: sending REQUEST (xid 0x823e3ad3), next in 9.0 seconds
May 15 12:53:08 [4563]: em0: acknowledged 73.14.244.157 from 96.113.84.152
May 15 12:53:08 [4563]: em0: probing address 73.14.244.157/23
May 15 12:53:08 [4563]: em0: probing for 73.14.244.157
May 15 12:53:08 [4563]: em0: ARP probing 73.14.244.157 (1 of 3), next in 1.2 seconds
May 15 12:53:09 [4563]: em0: ARP probing 73.14.244.157 (2 of 3), next in 1.2 seconds
May 15 12:53:10 [4563]: em0: ARP probing 73.14.244.157 (3 of 3), next in 2.0 seconds
May 15 12:53:12 [4563]: em0: DAD completed for 73.14.244.157
May 15 12:53:12 [4563]: em0: leased 73.14.244.157 for 186842 seconds
May 15 12:53:12 [4563]: em0: renew in 14042 seconds, rebind in 143642 seconds
May 15 12:53:12 [4563]: em0: writing lease: /var/db/dhcpcd/em0.lease
May 15 12:53:12 [4563]: em0: adding IP address 73.14.244.157/23 broadcast 255.255.255.255
May 15 12:53:12 [4563]: em0: adding route to 73.14.244.0/23
May 15 12:53:12 [4563]: em0: adding default route via 73.14.244.1
May 15 12:53:12 [4563]: em0: ARP announcing 73.14.244.157 (1 of 2), next in 2.0 seconds
May 15 12:53:14 [4563]: em0: ARP announcing 73.14.244.157 (2 of 2)
May 15 16:47:14 [4563]: em0: renewing lease of 73.14.244.157
May 15 16:47:14 [4563]: em0: sending REQUEST (xid 0xd6ac90e0), next in 3.4 seconds
May 15 16:47:14 [4563]: em0: acknowledged 73.14.244.157 from 96.113.84.152
May 15 16:47:14 [4563]: em0: leased 73.14.244.157 for 345600 seconds
May 15 16:47:14 [4563]: em0: renew in 172800 seconds, rebind in 302400 seconds
May 15 16:47:14 [4563]: em0: writing lease: /var/db/dhcpcd/em0.lease
May 15 16:47:14 [4563]: em0: IP address 73.14.244.157/23 already exists
May 15 16:47:14 [4563]: em0: ARP announcing 73.14.244.157 (1 of 2), next in 2.0 seconds
May 15 16:47:16 [4563]: em0: ARP announcing 73.14.244.157 (2 of 2)
May 15 17:53:04 [4563]: em0: multicasting RENEW6 (xid 0x644eb6), next in 10.3 seconds
May 15 17:53:04 [4563]: em0: REPLY6 received from 2001:558:4070:b8::10
May 15 17:53:04 [4563]: em0: adding address 2001:558:6040:b9:7dd1:10d5:7434:57d5/128
May 15 17:53:04 [4563]: em0: pltime 345600 seconds, vltime 345600 seconds
May 15 17:53:04 [4563]: em0: renew in 172800, rebind in 276480, expire in 345600 seconds
May 15 17:53:04 [4563]: em0: writing lease: /var/db/dhcpcd/em0.lease6
May 15 17:53:04 [4563]: em0: delegated prefix 2601:283:4e00:b1d0::/60
May 15 19:18:00 [4563]: em0: truncated packet (0) from 104.152.52.210
Notice the last line above. That is the same packet that
While the IPv4 address is not exactly the same, it is from the same /24 IPv4 network. That IP also belongs to Rethem Hosting LLC. My original hypothesis was the UDP port not being 67 caused it to crash; however I already tried running |
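Because the truncated packets come from more than one address in the same network, it may help to tally the senders over a longer log. This is a rough sketch that assumes log lines of the form "truncated packet (N) from ADDRESS" as shown in this thread; truncated_sources is a made-up helper name, not something dhcpcd provides.

```sh
#!/bin/sh
# Hypothetical helper: print "COUNT ADDRESS" for every sender of a
# truncated packet recorded in a dhcpcd log file.
truncated_sources() {
    awk '/truncated packet \([0-9]+\) from / { print $NF }' "$1" |
        sort |
        uniq -c |
        awk '{ print $1, $2 }'   # normalise the padding that uniq -c adds
}
```

Invoked as truncated_sources dhcpcd.log, it lists each source address once, prefixed by how many truncated packets it sent.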
They should be handled gracefully without privsep anyway. Fix for #179.
That's an excellent catch! So the privsep processes exit gracefully on a zero length packet which is exactly what you received so the above commit should fix things up for you now :)
Please open a new ticket for this. |
I re-compiled |
Received a truncated packet at 6:44 PM MDT, and |
For over a year now, dhcpcd has crashed every 5 days. I "solved" this by running a ksh script every 30 minutes that starts it up when it is not running.
Basic machine info:
Error that is logged to /var/log/daemon when it crashes:
I ran sysctl kern.nosuidcoredump=2, but unfortunately no crash dump was created in /var/crash/dhcpcd.
I allow all egress traffic, ICMPv6 and ICMP ingress traffic, IPv6 UDP ingress traffic on the external interface destined to port 546, and IPv4 UDP ingress traffic on the external interface destined to port 68. I am not sure what additional information is useful. When dhcpcd is started up, the following is logged to /var/log/daemon:
Potentially relevant: when I run dhcpcd -U em0, it times out more often than not between the Neighbor Discovery info and the DHCPv6 info. Regardless of whether it times out, /var/log/daemon shows:
Not sure if the control_free error being one of the errors that is shown when crashing is important.
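The watchdog workaround mentioned in this report (a script run every 30 minutes that starts dhcpcd when it is not running) might look roughly like the sketch below. This is not the reporter's actual script; the function names are illustrative, and it assumes pgrep(1), logger(1) and OpenBSD's rcctl(8) are available.

```sh
#!/bin/sh
# Hypothetical watchdog sketch; not the script used in this report.

# Succeed if a process with exactly this name exists.
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# Start the daemon through rcctl(8) only when it is not already running.
ensure_running() {
    daemon=$1
    if is_running "$daemon"; then
        return 0
    fi
    logger -t watchdog "$daemon is not running; starting it"
    rcctl start "$daemon"
}
```

To use it, a call such as ensure_running dhcpcd would be added at the end of the script, and the script scheduled from root's crontab (for example with a */30 minute field). This only hides the crash rather than explaining it, which is why the thread goes on to look for the root cause.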