Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jool causes boot to hang for five minutes #409

Closed
DanielG opened this issue Jul 22, 2023 · 15 comments
Closed

jool causes boot to hang for five minutes #409

DanielG opened this issue Jul 22, 2023 · 15 comments
Labels
Milestone

Comments

@DanielG
Copy link

DanielG commented Jul 22, 2023

Hi,

I'm observing some rather weird behaviour. If I modprobe jool during boot the system seems to hang for about 5min and then resume booting. You'll notice the kernel throws WARNING: CPU: 1 PID: 0 at net/core/flow_dissector.c:1016 __skb_flow_dissect+0xa91/0x1cd0 just before the hang. I don't think that's jool related. If I disable the service(s) it's still happening and boot is normal.

Any idea what's going on?

Thanks,
--Daniel

Loading Linux 6.1.0-10-amd64 ...
Loading initial ramdisk ...     
[    0.000000] Linux version 6.1.0-10-amd64 (debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.1.37-1 (2023-07-03)
[    0.000000] Command line: BOOT_IMAGE=/@rootfs/boot/vmlinuz-6.1.0-10-amd64 root=UUID=1c087fb1-cbda-4b4b-be70-9c94093bb63c ro rootflags=subvol=@rootfs console=ttyS0 console=tty1
...
[    8.412506] ixgbevf 0000:01:00.0: NIC Link is Up 10 Gbps
[    8.416575] IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0: link becomes ready
[    8.806954] ixgbevf 0000:02:00.0: NIC Link is Up 10 Gbps
[    8.810903] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0: link becomes ready
[   10.535751] ixgbevf 0000:03:00.0: NIC Link is Up 10 Gbps
[   10.539741] IPv6: ADDRCONF(NETDEV_CHANGE): enp3s0: link becomes ready
[   12.294510] jool_common: loading out-of-tree module taints kernel.
[   12.296899] jool_common: module verification failed: signature and/or required key missing - tainting kernel
[   12.305080] Jool: Core Jool v4.1.9.0 module inserted.
[   12.307984] NAT64 Jool v4.1.9.0 module inserted.
[   20.687031] ------------[ cut here ]------------
[   20.688316] WARNING: CPU: 1 PID: 0 at net/core/flow_dissector.c:1016 __skb_flow_dissect+0xa91/0x1cd0
[   20.690187] Modules linked in: jool(OE) jool_common(OE) dummy nft_redir nft_masq nft_nat nf_log_syslog nft_log nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_limit intel_rapl_msr intel_rapl_common ghash_clmulni_intel sha512_ssse3 sha512_generic snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm iTCO_wdt snd_timer intel_pmc_bxt snd aesni_intel iTCO_vendor_support qxl crypto_simd soundcore virtio_rng pcspkr cryptd watchdog virtio_console drm_ttm_helper rng_core virtio_balloon ttm drm_kms_helper button evdev serio_raw binfmt_misc nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink loop fuse drm efi_pstore dm_mod configfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_generic crct10dif_pclmul xhci_pci crct10dif_common crc32_pclmul i2c_i801 crc32c_intel
[   20.691022]  xhci_hcd ahci libahci psmouse libata usbcore i2c_smbus lpc_ich usb_common virtio_scsi scsi_mod virtio_blk scsi_common virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring ixgbevf
[   20.705472] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           OE      6.1.0-10-amd64 #1  Debian 6.1.37-1
[   20.706473] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
[   20.707377] RIP: 0010:__skb_flow_dissect+0xa91/0x1cd0
[   20.708067] Code: 00 bf 02 00 00 00 49 89 0c 12 66 89 78 02 41 8b 55 00 e9 e7 f6 ff ff 49 8b 44 24 18 48 85 c0 74 09 4c 8b 78 30 e9 30 f7 ff ff <0f> 0b e9 f6 f7 ff ff f6 c2 20 0f 84 e4 f6 ff ff 41 0f b7 45 0e 4c
[   20.710030] RSP: 0018:ffffa611000e0340 EFLAGS: 00010246
[   20.710742] RAX: 0000000000000000 RBX: 0000000000000008 RCX: ffffa611000e0458
[   20.711580] RDX: ffffffffb5640b00 RSI: ffff9a221215f800 RDI: 0000000000000000
[   20.712421] RBP: ffffa611000e0428 R08: 0000000000000000 R09: 0000000000000000
[   20.713262] R10: ffffa611000e0458 R11: 0000000000000001 R12: ffff9a221215f800
[   20.714102] R13: ffffffffb5640b00 R14: ffff9a22062a9480 R15: 0000000000000000
[   20.714942] FS:  0000000000000000(0000) GS:ffff9a223ec80000(0000) knlGS:0000000000000000
[   20.715842] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   20.716582] CR2: 0000556bdfcb1010 CR3: 0000000034648000 CR4: 00000000003506e0
[   20.717421] Call Trace:
[   20.717947]  <IRQ>
[   20.718430]  ? __warn+0x7d/0xc0
[   20.718988]  ? __skb_flow_dissect+0xa91/0x1cd0
[   20.719631]  ? report_bug+0xe6/0x170
[   20.720207]  ? handle_bug+0x41/0x70
[   20.720772]  ? exc_invalid_op+0x13/0x60
[   20.721359]  ? asm_exc_invalid_op+0x16/0x20
[   20.721990]  ? __skb_flow_dissect+0xa91/0x1cd0
[   20.722615]  __skb_get_hash+0x60/0x200
[   20.723195]  nft_trace_init+0xf9/0x110 [nf_tables]
[   20.723856]  nft_do_chain+0x553/0x5d0 [nf_tables]
[   20.724506]  nft_do_chain_inet+0xa4/0x110 [nf_tables]
[   20.725177]  nf_hook_slow+0x41/0xc0
[   20.725723]  __ip_local_out+0xdd/0x160
[   20.726299]  ? ip_output+0x130/0x130
[   20.726854]  ip_local_out+0x17/0x70
[   20.727391]  nf_send_reset+0x29f/0x2d0 [nf_reject_ipv4]
[   20.728049]  nft_reject_inet_eval+0xa9/0xf2 [nft_reject_inet]
[   20.728743]  nft_do_chain+0x198/0x5d0 [nf_tables]
[   20.729372]  nft_do_chain_inet+0xa4/0x110 [nf_tables]
[   20.730087]  nf_hook_slow+0x41/0xc0
[   20.730655]  ip_local_deliver+0xce/0x110
[   20.731254]  ? ip_protocol_deliver_rcu+0x230/0x230
[   20.731911]  ip_sublist_rcv_finish+0x7e/0x90
[   20.732522]  ip_sublist_rcv+0x183/0x230
[   20.733090]  ? ip_sublist_rcv+0x230/0x230
[   20.733664]  ip_list_rcv+0x139/0x170
[   20.734204]  __netif_receive_skb_list_core+0x29e/0x2c0
[   20.734856]  netif_receive_skb_list_internal+0x1cd/0x300
[   20.735517]  ? dev_gro_receive+0x3b1/0x730
[   20.736075]  napi_complete_done+0x6d/0x1a0
[   20.736623]  ixgbevf_poll+0x1067/0x1380 [ixgbevf]
[   20.737221]  ? blake2s_update+0x48/0xc0
[   20.737740]  __napi_poll+0x2b/0x160
[   20.738234]  net_rx_action+0x2a5/0x360
[   20.738733]  __do_softirq+0xf0/0x2fe
[   20.739215]  ? handle_edge_irq+0x9f/0x240
[   20.739693]  __irq_exit_rcu+0xc7/0x130
[   20.740181]  common_interrupt+0xb9/0xd0
[   20.740669]  </IRQ>
[   20.741013]  <TASK>
[   20.741357]  asm_common_interrupt+0x22/0x40
[   20.741877] RIP: 0010:native_safe_halt+0xb/0x10
[   20.742418] Code: 01 00 f0 80 48 02 20 48 8b 00 a8 08 75 bf e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc eb 07 0f 00 2d 59 59 5d 00 fb f4 <e9> 20 c2 3c 00 eb 07 0f 00 2d 49 59 5d 00 f4 e9 11 c2 3c 00 cc 0f
[   20.744260] RSP: 0018:ffffa6110009bed8 EFLAGS: 00000282
[   20.744866] RAX: ffffffffb4435740 RBX: ffff9a2201266600 RCX: 0000000000000000
[   20.745623] RDX: 4000000000000000 RSI: ffffffffb4d4105e RDI: ffffffffb4d1a58d
[   20.746392] RBP: 0000000000000001 R08: 0000000a0eb79338 R09: 00000000000000a7
[   20.747155] R10: 0000000000000000 R11: 000000000000004c R12: 0000000000000000
[   20.747916] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   20.748679]  ? __sched_text_end+0x6/0x6
[   20.749181]  default_idle+0xa/0x10
[   20.749648]  default_idle_call+0x36/0xf0
[   20.750528]  do_idle+0x225/0x2b0
[   20.751439]  cpu_startup_entry+0x19/0x20
[   20.752488]  start_secondary+0x130/0x150
[   20.753532]  secondary_startup_64_no_verify+0xe5/0xeb
[   20.754743]  </TASK>
[   20.755458] ---[ end trace 0000000000000000 ]---
[  308.534910] SIIT Jool v4.1.9.0 module inserted.
@DanielG
Copy link
Author

DanielG commented Jul 22, 2023

So the way I managed to trigger this is by starting the jool services via /etc/network/interfaces instead of directly via the system unit:

auto nat64
iface nat64 inet6 loopback
        pre-up    ip link add $IFACE type dummy
        post-down ip link del $IFACE

	address 2001:678:4d8:ac64:1::1/72
	
	up   systemctl start jool jool_siit
	down systemctl stop  jool jool_siit

	# Note: these are attached to this device for convenience
	# only. Jool will (TODO: Check) intercept these before we get to
	# routing.
        up ip -6 route add 2001:678:4d8:ac64::10.10.0.0/112 dev $IFACE proto static
        up ip -6 route add 2001:678:4d8:64::192.168.0.0/112 dev $IFACE proto static

I do this to have a convenient way to start/stop jool and have a place to install routes to be redistributed into my IGP.

In testing this I disabled the jool/jool_siit systemd units to make sure the DHCP script I have for dynamically updating pool4 doesn't break when jool is not loaded or no jool instance is active.

So my workaround for now is to just re-enable the jool services. It seems if jool is loaded before any interfaces go up it works alright.

One more, probably irrelevant, detail, I had already modified the jool units slightly: instead of After=network.target I have After=network-pre.target. Perhaps without this the bug might rear it's head again?

--Daniel

@DanielG
Copy link
Author

DanielG commented Jul 24, 2023

I've just observed the hang happening even when jool is modprob'ed before any interfaces come up:

[    0.000000] Linux version 6.1.0-10-amd64 (debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.1.37-1 (2023-07-03)
...
[    8.453845] jool_common: loading out-of-tree module taints kernel.
[    8.456250] jool_common: module verification failed: signature and/or required key missing - tainting kernel
[    8.464125] Jool: Core Jool v4.1.9.0 module inserted.
[    8.467158] NAT64 Jool v4.1.9.0 module inserted.
[    8.543762] ixgbevf 0000:01:00.0: NIC Link is Up 10 Gbps
[    8.547918] IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0: link becomes ready
[    9.017364] ixgbevf 0000:02:00.0: NIC Link is Up 10 Gbps
[    9.020713] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0: link becomes ready
[   10.321697] ixgbevf 0000:03:00.0: NIC Link is Up 10 Gbps
[   10.325625] IPv6: ADDRCONF(NETDEV_CHANGE): enp3s0: link becomes ready

--Daniel

@ydahhrk
Copy link
Member

ydahhrk commented Jul 25, 2023

What's your distro?

I don't think that's jool related.

I mean both warning and delay happen right between the jool modprobe and the jool_siit modprobe. Jool looks pretty sus to me.

If I disable the service(s) it's still happening and boot is normal.

What does "it" and "boot is normal" refer to?

The WARNING still happens but there is no 5 minute delay?

Or the 5 minute delay still happens but the WARNING disappears?

One more, probably irrelevant, detail, I had already modified the jool units slightly: instead of After=network.target I have After=network-pre.target.

What was the goal of this?

I mean... if Jool starts before the interface starts... wouldn't Jool intercept interface setup traffic? And maybe this is what's causing the delay?

Jool cannot ignore the addresses that belong to its own node if said addresses have not yet been assigned to the interface. That's how it tells them apart. Maybe it has something to do with that?

Do you have a means to enable Jool's debug logging while the delay is happening?

I've just observed the hang happening even when jool is modprob'ed before any interfaces come up:

What do you mean? There doesn't seem to be any 5 minute hang in this output.

@DanielG
Copy link
Author

DanielG commented Jul 25, 2023 via email

@DanielG
Copy link
Author

DanielG commented Jul 25, 2023

If I disable the service(s) it's still happening and boot is normal. What does "it" and "boot is normal" refer to? The WARNING still happens but there is no 5 minute delay?
"it": the 5min delay, "normal" no delay to ability to login via console (or otherwise). As I said, the WARNING is probably a red herring.

Oh, my bad I didn't read enough context. Fixed: "it": the printed WARNING. I.e. put another way if I disable jool and it's not loaded the WARNING still prints and the delay doesn't happen ergo it's (likely) unrelated to the delay.

@ydahhrk
Copy link
Member

ydahhrk commented Jul 26, 2023

Sure, I already did that when debugging #396, but I'm afraid I've forgotten how to do it. Do you have a docs link for me?

The normal way is

sudo jool global update logging-debug true

But that requires a running instance, so it's probably too late for your purposes.

The other way is enabling all debug logging by default by compiling the kernel modules with -DDEBUG:

make CFLAGS_MODULE=-DDEBUG

But I don't know if you can pass these kinds of flags to a Debian packaged module.

If it would help I'd be happy to give you ssh/libvirt access to a clone of this VM

Okay.

@DanielG
Copy link
Author

DanielG commented Aug 27, 2023

I compiled the modules with the debug target from src/mod/{common,nat64,siit} but I don't see anything interesting. In between the load of the nat64 and siit modules (where the hang happens) I see packets being dumped, so the system isn't hard locked or anything but still no clue as to what is hanging.

I also checked systemd-analyze blame to make sure this isn't a userspace problem and that just shows networking.service (which calls ifup) taking 5min.

One thing I note is -DDEBUG doesn't seem to enable messages like LOG_DEBUG("Inserting Core Jool..."); which would tell us if we're hanging during or before the modprobe jool_siit call.

Any other ideas?

--Daniel

@ydahhrk
Copy link
Member

ydahhrk commented Aug 28, 2023

One thing I note is -DDEBUG doesn't seem to enable messages like LOG_DEBUG("Inserting Core Jool..."); which would tell us if we're hanging during or before the modprobe jool_siit call.

That's a pretty healthy indication that -DDEBUG is not kicking in:

al@debian11:~$ make debug && sudo make install
...
al@debian11:~$ sudo modprobe jool_siit
al@debian11:~$ sudo dmesg -c
[ 3508.501898] Jool: Inserting Core Jool...
[ 3508.501901] Jool: Initializing common modules.
[ 3508.501947] Jool: Registering Generic Netlink family...
[ 3508.501973] Jool: Core Jool v4.1.10.0 module inserted.
[ 3508.502833] Inserting SIIT Jool...
[ 3508.502835] SIIT Jool v4.1.10.0 module inserted.

If enabling the flag is too much trouble, try raising the severity of the relevant logs:

LOG_DEBUG("Inserting Core Jool..."); -> pr_info("Inserting Core Jool...\n");

Probably mind the newline.

@ydahhrk
Copy link
Member

ydahhrk commented Aug 30, 2023

Ok, I replicated the bug. Sorry for the wait.

Some notes:

One thing I note is -DDEBUG doesn't seem to enable messages like LOG_DEBUG("Inserting Core Jool..."); which would tell us if we're hanging during or before the modprobe jool_siit call.

I think this happened because you have two versions of Jool installed: The debian package and the custom -DDEBUG one you compiled. When the two exist simultaneously, the Debian package takes precedence.

However, if you uninstall the Debian package, you lose the systemd unit files, so you have to reinstall them manually:

sudo curl https://raw.githubusercontent.com/NICMx/Jool/debian/debian/jool-tools.jool.service -o /lib/systemd/system/jool.service
sudo curl https://raw.githubusercontent.com/NICMx/Jool/debian/debian/jool-tools.jool_siit.service -o /lib/systemd/system/jool_siit.service

AND you have to rename the userspace clients in them, because the custom binaries are installed somewhere else:

sudo sed -i s:/usr/bin/jool:/usr/local/bin/jool: /lib/systemd/system/jool.service
sudo sed -i s:/usr/bin/jool_siit:/usr/local/bin/jool_siit: /lib/systemd/system/jool_siit.service

Make sure the configuration files are in place...

sudo mkdir -p /etc/jool
sudo tee /etc/jool/jool.conf << EOF
{
	"instance": "sample-nat64",
	"framework": "netfilter",
	"global": { "pool6": "64:ff9b::/96" }
}
EOF
sudo tee /etc/jool/jool_siit.conf << EOF
{
	"instance": "sample-siit",
	"framework": "netfilter"
}
EOF

Also force the old Jool out, and fulfill some systemd bureaucracy...

sudo modprobe -r jool jool_siit
sudo systemctl daemon-reload

Anyway, that logs the 5-minute gap a couple seconds before the first Jool DEBUG message.

I'll try researching more tomorrow.

@ydahhrk
Copy link
Member

ydahhrk commented Aug 30, 2023

Found it: https://github.com/NICMx/Jool/blob/debian/debian/jool-tools.jool.service#L4

The systemd unit file states Jool's service needs to start after the network has been initialized. You're starting the Jool service during network initialization. Ergo, they deadlock until timeout.

I see two solutions, and they both seem to work, but I don't know all the consequences:

  • Replace After=network.target with After=network-pre.target.
  • Remove After=network.target.

Any opinions?

Jool doesn't really have anything to do with the configuration of the interfaces, and can stand idle while there is no network.

But NAT64 Jool requires defrag, and depending on configuration, either Jool might depend on iptables. They're both kernel modules that probably belong in the category "firewall services".

@DanielG
Copy link
Author

DanielG commented Aug 30, 2023

/facepalm that's likely why I had After=network-pre.target in there at the beginning. How I forgot about why I had this is beyond understanding. Bloddy degenerating fleshy squish brain :]

Yeah I think network-pre.target is appropriate to use for jool. On my system it is odered like this:

Before=systemd-networkd.service NetworkManager.service connman.service shutdown.target networking.service network.target
After=dpdk.service netfilter-persistent.service nftables.service

So (eg) nftables will come up first then network-pre (and the jool if we use After=network-pre.target), then ifup though networking.service, all the connection managers and network.target.

Looks good to me.

But NAT64 Jool requires defrag

IIRC the kernel has a refcount for whether any module wants defrag and starts doing it if anyone requests it. iptables/nftables does this when any rules involving NAT are installed for example.

If jool needs defrag (does it really?) it should likely do the same thing and not depend on module load ordering, no?

FYI: Someone finally noticed the WARN flow_dissector thing: https://lore.kernel.org/lkml/e2b4ee49-0998-ff3b-f817-5076d0b43b77@gmail.com/T/

@ydahhrk
Copy link
Member

ydahhrk commented Aug 30, 2023

/facepalm that's likely why I had After=network-pre.target in there at the beginning. How I forgot about why I had this is beyond understanding. Bloddy degenerating fleshy squish brain :]

I'm on the same boat. I should have paid more attention to that comment of yours :/

IIRC the kernel has a refcount for whether any module wants defrag and starts doing it if anyone requests it. iptables/nftables does this when any rules involving NAT are installed for example.

Right.

If jool needs defrag (does it really?)

NAT64 Jool relies on defrag because it needs to know the transport header ports of all packets. If defrag is disabled, NAT64 Jool can't access the ports of nonzero fragment-offset fragments (because the TCP/UDP header is missing).

it should likely do the same thing and not depend on module load ordering, no?

I think you're right.

My concern is that I'm not fluent with systemd's philosophy and ruleset, so I'm afraid of another deadlock.

ie. Jool depends on defrag, but something on systemd forces defrag to start after wherever Jool plants itself.

Looks good to me.

Ok, then let's settle on After=network-pre.target.

@DanielG
Copy link
Author

DanielG commented Aug 30, 2023

NAT64 Jool relies on defrag

Right, I was thinking in SIIT terms.

My concern is that I'm not fluent with systemd's philosophy and ruleset, so I'm afraid of another deadlock. ie. Jool depends on defrag, but something on systemd forces defrag to start after wherever Jool plants itself.

Context: I've fought my fair share of battles with systemd and am a frequent reader of its (decent) man pages.

I'm not sure what you mean by "forces defrag to start". There's no service for that unless you mean firewall rules i.e nftables/iptabes.service?

Very few units declare an ordering on network-pre.target so I woudn't worry too much either way. Jf it's better for jool to start after the firewall this is where we want to be anyway, since nftables.service declares Before=network-pre.target (so the order would be nftables, then jool)

Honestly I'd also be fine with just leaving the jool units as they are, my config is pretty odd I'll admit : )

And this time I'll make a bloody note as to why I change my jool units to network-pre.target haha.

Thanks,
--Daniel

ydahhrk added a commit that referenced this issue Aug 31, 2023
The old After=network.target causes deadlock when users attempt to start
the service during network initialization.

Fixes #409.
@ydahhrk
Copy link
Member

ydahhrk commented Aug 31, 2023

Context: I've fought my fair share of battles with systemd and am a frequent reader of its (decent) man pages.

I'm not sure what you mean by "forces defrag to start". There's no service for that unless you mean firewall rules i.e nftables/iptabes.service?

Very few units declare an ordering on network-pre.target so I woudn't worry too much either way. Jf it's better for jool to start after the firewall this is where we want to be anyway, since nftables.service declares Before=network-pre.target (so the order would be nftables, then jool)

Thank you; I needed to hear this.

Honestly I'd also be fine with just leaving the jool units as they are, my config is pretty odd I'll admit : )

I think your configuration is pretty reasonable.

Patch is up, I'll have it released in 4.1.11, once #410 is settled.

@ydahhrk ydahhrk added the Bug label Dec 22, 2023
@ydahhrk ydahhrk added this to the 4.1.11 milestone Dec 22, 2023
@ydahhrk
Copy link
Member

ydahhrk commented Dec 24, 2023

Jool 4.1.11 has been released; closing.

@ydahhrk ydahhrk closed this as completed Dec 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants