Externally initiated IPv4 sessions in Stateful NAT64 mode crash the kernel #137

Closed
toreanderson opened this Issue Mar 10, 2015 · 8 comments

Projects

None yet

2 participants

@toreanderson
Contributor

I just had one of my Jool test nodes experience the following kernel panic:

[  498.576084] general protection fault: 0000 [#1] SMP 
[  498.576637] Modules linked in: jool(OX) nf_defrag_ipv6 nf_defrag_ipv4 cirrus ttm drm_kms_helper drm syscopyarea sysfillrect i2c_piix4 sysimgblt mac_hid serio_raw lp parport psmouse pata_acpi floppy
[  498.578742] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           OX 3.13.0-46-generic #77-Ubuntu
[  498.579561] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[  498.580049] task: ffffffff81c15480 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[  498.580049] RIP: 0010:[<ffffffffa010de42>]  [<ffffffffa010de42>] icmp64_send+0x32/0x250 [jool]
[  498.580049] RSP: 0018:ffff88011fc03d88  EFLAGS: 00010206
[  498.580049] RAX: 0000000080000101 RBX: 04007f4698060081 RCX: ffff8801197193f8
[  498.580049] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff88011fc03c50
[  498.580049] RBP: ffff88011fc03db0 R08: 0000000000000000 R09: 0000000000000000
[  498.580049] R10: 0000000000000002 R11: 0000000000000005 R12: ffff880119719410
[  498.580049] R13: ffff88011fc03e08 R14: ffff88011fc03e08 R15: ffff880036e1da50
[  498.580049] FS:  0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[  498.580049] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  498.580049] CR2: 00007f58138cc000 CR3: 0000000115475000 CR4: 00000000000006f0
[  498.580049] Stack:
[  498.580049]  ffffffffa01247e0 ffff880119719448 ffff880119719410 ffff88011fc03e08
[  498.580049]  ffff88011fc03e08 ffff88011fc03dd0 ffffffffa0111287 0000000000000001
[  498.580049]  ffff880036e1da00 ffff88011fc03e70 ffffffffa0116232 ffff880036e1d0f0
[  498.580049] Call Trace:
[  498.580049]  <IRQ> 
[  498.580049]  [<ffffffffa0111287>] pktqueue_send+0xb7/0x150 [jool]
[  498.580049]  [<ffffffffa0116232>] cleaner_timer+0x1d2/0x6e0 [jool]
[  498.580049]  [<ffffffff81099524>] ? scheduler_tick+0xa4/0xf0
[  498.580049]  [<ffffffffa0116060>] ? flush_aux+0x110/0x110 [jool]
[  498.580049]  [<ffffffff81074396>] call_timer_fn+0x36/0x100
[  498.580049]  [<ffffffffa0116060>] ? flush_aux+0x110/0x110 [jool]
[  498.580049]  [<ffffffff8107532f>] run_timer_softirq+0x1ef/0x2f0
[  498.580049]  [<ffffffff8106cc2c>] __do_softirq+0xec/0x2c0
[  498.580049]  [<ffffffff8106d175>] irq_exit+0x105/0x110
[  498.580049]  [<ffffffff81734155>] smp_apic_timer_interrupt+0x45/0x60
[  498.580049]  [<ffffffff81732add>] apic_timer_interrupt+0x6d/0x80
[  498.580049]  <EOI> 
[  498.580049]  [<ffffffff8104f596>] ? native_safe_halt+0x6/0x10
[  498.580049]  [<ffffffff8101caaf>] default_idle+0x1f/0xc0
[  498.580049]  [<ffffffff8101d376>] arch_cpu_idle+0x26/0x30
[  498.580049]  [<ffffffff810bef85>] cpu_startup_entry+0xc5/0x290
[  498.580049]  [<ffffffff8170f787>] rest_init+0x77/0x80
[  498.580049]  [<ffffffff81d35f70>] start_kernel+0x438/0x443
[  498.580049]  [<ffffffff81d35941>] ? repair_env_string+0x5c/0x5c
[  498.580049]  [<ffffffff81d35120>] ? early_idt_handlers+0x120/0x120
[  498.580049]  [<ffffffff81d355ee>] x86_64_start_reservations+0x2a/0x2c
[  498.580049]  [<ffffffff81d35733>] x86_64_start_kernel+0x143/0x152
[  498.580049] Code: ff 0f 84 8e 00 00 00 48 8b 7f 28 48 85 ff 0f 84 81 00 00 00 55 48 89 e5 41 56 41 55 41 54 53 48 83 ec 08 48 8b 1f 48 85 db 74 5e <48> 83 7b 20 00 74 57 0f b7 43 7e 41 89 d4 66 c1 c0 08 66 3d 00 
[  498.580049] RIP  [<ffffffffa010de42>] icmp64_send+0x32/0x250 [jool]
[  498.580049]  RSP <ffff88011fc03d88>
[  498.614509] ---[ end trace b580988ab7b98a01 ]---
[  498.615862] Kernel panic - not syncing: Fatal exception in interrupt
[  498.617493] drm_kms_helper: panic occurred, switching back to text console

The node is a KVM-based virtual machine, running x86_64 Ubuntu 14.04.2. Jool was loaded using modprobe jool pool6=2a02:c0::46:43:0:0/96 pool4=185.47.43.0/24, all other config was left at its defaults.

I will try to see if I can figure out a way to reproduce the crash.

@toreanderson
Contributor

It seems to crash quite quickly after bootup actually. Console messages from another crash with Jool built with debugging:

[   31.005133] ===============================================
[   31.005734] Catching IPv4 packet: 119.28.2.37->185.47.43.1
[   31.006289] Step 1: Determining the Incoming Tuple
[   31.006776] tuple IPv4-ICMP 119.28.2.37#28026 -> 185.47.43.1#28026
[   31.007599] Done step 1.
[   31.009097] Step 2: Filtering and Updating
[   31.009097] There is no BIB entry for the incoming IPv4 packet.
[   31.009097] Done: Step 2.
[   31.009097] Returning the packet to the kernel.
[   31.735136] ===============================================
[   31.736886] Catching IPv4 packet: 183.22.142.135->185.47.43.202
[   31.738569] Step 1: Determining the Incoming Tuple
[   31.739095] tuple IPv4-UDP 183.22.142.135#31920 -> 185.47.43.202#35255
[   31.739095] Done step 1.
[   31.739095] Step 2: Filtering and Updating
[   31.739095] There is no BIB entry for the incoming IPv4 packet.
[   31.739095] Done: Step 2.
[   31.739095] Returning the packet to the kernel.
[   33.206638] ===============================================
[   33.210617] Catching IPv4 packet: 183.22.142.135->185.47.43.202
[   33.210617] Step 1: Determining the Incoming Tuple
[   33.210617] tuple IPv4-TCP 183.22.142.135#31976 -> 185.47.43.202#35255
[   33.210617] Done step 1.
[   33.210617] Step 2: Filtering and Updating
[   33.210617] Session entry: ::#0 - 2a02:c0::46:43:b716:8e87#31976 | 185.47.43.202#35255 - 183.22.142.135#31976
[   33.210617] Done: Step 2.
[   33.210617] Step 3: Computing the Outgoing Tuple
[   33.210617] tuple IPv6-TCP 2a02:c0::46:43:b716:8e87#31976 -> ::#0
[   33.210617] Done step 3.
[   33.210617] Step 4: Translating the Packet
[   33.210617] Done step 4.
[   33.210617] Sending skb via device 'eth0'
[   33.210617] Success.
[   34.765369] ===============================================
[   34.767950] Catching IPv4 packet: 183.22.142.135->185.47.43.202
[   34.769343] Step 1: Determining the Incoming Tuple
[   34.769343] tuple IPv4-UDP 183.22.142.135#31920 -> 185.47.43.202#35255
[   34.769343] Done step 1.
[   34.769343] Step 2: Filtering and Updating
[   34.769343] There is no BIB entry for the incoming IPv4 packet.
[   34.769343] Done: Step 2.
[   34.769343] Returning the packet to the kernel.
[   36.220044] ===============================================
[   36.222686] Deleting expired sessions...
[   36.224915] Cleaner name: TCP_SYN
[   36.227028] BUG: unable to handle kernel NULL pointer dereference at 0000000000000035
[   36.228901] IP: [<ffffffffa0105e42>] icmp64_send+0x32/0x250 [jool]
[   36.228901] PGD 0 
[   36.228901] Oops: 0000 [#1] SMP 
[   36.228901] Modules linked in: jool(OX) nf_defrag_ipv6 nf_defrag_ipv4 cirrus i2c_piix4 ttm drm_kms_helper serio_raw drm syscopyarea sysfillrect sysimgblt mac_hid lp parport psmouse pata_acpi floppy
[   36.228901] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           OX 3.13.0-46-generic #77-Ubuntu
[   36.228901] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[   36.228901] task: ffff880119b5c800 ti: ffff880119b8c000 task.ti: ffff880119b8c000
[   36.228901] RIP: 0010:[<ffffffffa0105e42>]  [<ffffffffa0105e42>] icmp64_send+0x32/0x250 [jool]
[   36.228901] RSP: 0018:ffff88011fc83d88  EFLAGS: 00010202
[   36.228901] RAX: 0000000000000101 RBX: 0000000000000015 RCX: 0000000000000000
[   36.228901] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff88011fc83c50
[   36.228901] RBP: ffff88011fc83db0 R08: 0000000000000000 R09: 0000000000000000
[   36.228901] R10: 0000000000000007 R11: 0000000000cccccc R12: ffff880118bcb000
[   36.228901] R13: ffff88011fc83e08 R14: ffff88011fc83e08 R15: ffff880115b38690
[   36.228901] FS:  0000000000000000(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
[   36.228901] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   36.228901] CR2: 0000000000000035 CR3: 0000000115b02000 CR4: 00000000000006e0
[   36.228901] Stack:
[   36.228901]  ffff88011fc83e20 ffff880118bcb038 ffff880118bcb000 ffff88011fc83e08
[   36.228901]  ffff88011fc83e08 ffff88011fc83dd0 ffffffffa0109287 0000000000000001
[   36.228901]  ffff880115b38640 ffff88011fc83e70 ffffffffa010e232 ffff880115b38b90
[   36.228901] Call Trace:
[   36.228901]  <IRQ> 
[   36.228901]  [<ffffffffa0109287>] pktqueue_send+0xb7/0x150 [jool]
[   36.228901]  [<ffffffffa010e232>] cleaner_timer+0x1d2/0x6e0 [jool]
[   36.228901]  [<ffffffff81099524>] ? scheduler_tick+0xa4/0xf0
[   36.228901]  [<ffffffffa010e060>] ? flush_aux+0x110/0x110 [jool]
[   36.228901]  [<ffffffff81074396>] call_timer_fn+0x36/0x100
[   36.228901]  [<ffffffffa010e060>] ? flush_aux+0x110/0x110 [jool]
[   36.228901]  [<ffffffff8107532f>] run_timer_softirq+0x1ef/0x2f0
[   36.228901]  [<ffffffff8106cc2c>] __do_softirq+0xec/0x2c0
[   36.228901]  [<ffffffff8106d175>] irq_exit+0x105/0x110
[   36.228901]  [<ffffffff81734155>] smp_apic_timer_interrupt+0x45/0x60
[   36.228901]  [<ffffffff81732add>] apic_timer_interrupt+0x6d/0x80
[   36.228901]  <EOI> 
[   36.228901]  [<ffffffff8104f596>] ? native_safe_halt+0x6/0x10
[   36.228901]  [<ffffffff8101caaf>] default_idle+0x1f/0xc0
[   36.228901]  [<ffffffff8101d376>] arch_cpu_idle+0x26/0x30
[   36.228901]  [<ffffffff810bef85>] cpu_startup_entry+0xc5/0x290
[   36.228901]  [<ffffffff810414dd>] start_secondary+0x21d/0x2d0
[   36.228901] Code: ff 0f 84 8e 00 00 00 48 8b 7f 28 48 85 ff 0f 84 81 00 00 00 55 48 89 e5 41 56 41 55 41 54 53 48 83 ec 08 48 8b 1f 48 85 db 74 5e <48> 83 7b 20 00 74 57 0f b7 43 7e 41 89 d4 66 c1 c0 08 66 3d 00 
[   36.228901] RIP  [<ffffffffa0105e42>] icmp64_send+0x32/0x250 [jool]
[   36.228901]  RSP <ffff88011fc83d88>
[   36.228901] CR2: 0000000000000035
[   36.228901] ---[ end trace cd4214f5c9b66c0a ]---
[   36.228901] Kernel panic - not syncing: Fatal exception in interrupt
[   36.228901] drm_kms_helper: panic occurred, switching back to text console

A tcpdump taken on the software bridge on the VM hypervisor shows the packets arriving on or being received from the Jool node in the moments leading up to the crash (I have the full PCAP if you want it):

11:29:55.845271 IP (tos 0x0, ttl 46, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
    119.28.2.37 > 185.47.43.1: ICMP echo request, id 28026, seq 0, length 64
11:29:56.575281 IP (tos 0x0, ttl 116, id 50664, offset 0, flags [none], proto UDP (17), length 61)
    183.22.142.135.31920 > 185.47.43.202.35255: UDP, length 33
11:29:58.046777 IP (tos 0x0, ttl 115, id 50844, offset 0, flags [DF], proto TCP (6), length 48)
    183.22.142.135.31976 > 185.47.43.202.35255: Flags [S], cksum 0x2c3d (correct), seq 2953467304, win 8192, options [mss 1452,nop,nop,sackOK], length 0
11:29:58.075045 IP6 (hlim 114, next-header TCP (6) payload length: 28) 2a02:c0::46:43:b716:8e87.31976 > ::.0: Flags [S], cksum 0x6fa3 (correct), seq 2953467304, win 8192, options [mss 1452,nop,nop,sackOK], length 0
11:29:59.605528 IP (tos 0x0, ttl 116, id 50979, offset 0, flags [none], proto UDP (17), length 61)
    183.22.142.135.31920 > 185.47.43.202.35255: UDP, length 33
[..the crash seems to occur at this point, at least there is no debugging output relating to the following packets..]
11:30:02.189778 IP (tos 0x0, ttl 53, id 9664, offset 0, flags [none], proto TCP (6), length 44)
    188.165.57.236.http > 185.47.43.11.48416: Flags [S.], cksum 0xaa82 (correct), seq 606526479, ack 354418689, win 16384, options [mss 1460], length 0
11:30:02.532500 IP (tos 0x0, ttl 115, id 49793, offset 0, flags [DF], proto TCP (6), length 48)
    36.74.10.105.intersan > 185.47.43.111.microsoft-ds: Flags [S], cksum 0xe94f (correct), seq 1413557117, win 65535, options [mss 1412,nop,nop,sackOK], length 0
11:30:04.047054 IP (tos 0x0, ttl 115, id 51468, offset 0, flags [DF], proto TCP (6), length 48)
    183.22.142.135.31976 > 185.47.43.202.35255: Flags [S], cksum 0x2c3d (correct), seq 2953467304, win 8192, options [mss 1452,nop,nop,sackOK], length 0

There's also something really odd going on with those IPv4 packets that gets translated with to an IPv6 destination addr#port of ::#0. I don't know if that has any significance or not to the kernel crash or should be considered a separate bug.

@toreanderson
Contributor

Ok, so with an IPv4 /24 routed to it and added to the IPv4 pool, it crashes within minutes just from background internet noise. Every crash I've seen is preceded by the Cleaner name: TCP_SYN messages. I believe --drop-externally-initiated-tcp ON does seem to help. So I'm getting more and more suspicious about those bogus sessions to ::#0 having something to do with this.

@toreanderson
Contributor

I've bisected it. The problem with crashes starts occuring with commit b2dbe4b:

b2dbe4bd5baa67264a50ed3191ba01f0a0ab709f is the first bad commit
commit b2dbe4bd5baa67264a50ed3191ba01f0a0ab709f
Author: Alberto Leiva Popper <ydahhrk@gmail.com>
Date:   Fri Feb 20 19:05:29 2015 -0600

    Removed everything related to control buffers. Jool now encapsulates packets so the kernel doesn't waste any NF_ACCEPTed packets.
    Man, this one was a disaster.
    See https://github.com/NICMx/NAT64/issues/116#issuecomment-74343381.

However the bogus sessions to ::#0 actually start occurring earlier, beginning with commit d85857e:

d85857ec0395152ae6154f8c53dd20270f312a52 is the first bad commit
commit d85857ec0395152ae6154f8c53dd20270f312a52
Author: Alberto Leiva Popper <ydahhrk@gmail.com>
Date:   Tue Feb 17 10:02:36 2015 -0600

    Removed some Host-Based Edge Translation code, clarified some issues in the doc.
    I found a fairly important bug in packet, so this is mostly just a checkpoint.

However they don't appear to actually do any damage (like crash the kernel) until commit b2dbe4b, the bogus sessions seem to expire just fine after the five second V4_INIT state, with no ill effects.

I'm very sorry for not noticing this issue before the v3.3.0 release (my testing was mostly focused on the stateless mode).

@toreanderson toreanderson changed the title from Linux kernel panic (GPF) with Jool in Stateful NAT64 mode to Externally initiated IPv4 sessions in Stateful NAT64 mode crash the kernel Mar 10, 2015
@ydahhrk
Member
ydahhrk commented Mar 10, 2015

However the bogus sessions to ::#0 actually start occurring earlier, beginning with commit d85857e:

From RFC 6146:

The STE transport IPv6 source address is left unspecified and may be populated by other protocols that are out of the scope of this specification.

(...)

The result is that the NAT64 will not drop the packet based on the filtering, nor create a BIB entry. Instead, the NAT64 will only create the Session Table Entry and store the packet. The motivation for this is to support simultaneous open of TCP connections.

That's weird. I do remember testing this before releasing.

I guess if these sessions are really causing this, we should find another way to represent them and/or store them elsewhere.

"Cleaner name: TCP_SYN" happens when Jool is trying to delete expired sessions. I'm going to take a look at this code right away.

I'm very sorry for not noticing this issue before the v3.3.0 release (my testing was mostly focused on the stateless mode).

On the contrary, sorry for the trouble, and thank you for being so thorough with your testing.

@ydahhrk
Member
ydahhrk commented Mar 10, 2015

I think I found it.

https://github.com/NICMx/NAT64/blob/8c9988a61c4e9e0409244f79082faed4f5b9aaf6/mod/common/icmp_wrapper.c#L126

The "original packet" no longer exists; it's a bogus pointer.

I just uploaded the likely solution to the issue137 branch. I can't test it yet because I have an appointment right now. Will be back in an hour.

This logic was also used in the fragment database, so I tweaked that too.

@toreanderson
Contributor

6c0773c does indeed appear to fix the crashes. (At least my test node hasn't crashed after 20 minutes of uptime with an /24 used as the IPv4 pool, that didn't happen before.)

The ::#0 session entries do still show up whenever there's an unsolicited TCP packet (with SYN and optionally ACK set) coming in from the IPv4 side. I suppose that's as expected.

Instead, the NAT64 will only create the Session Table Entry and store the packet.

This doesn't actually seem to work right, though. Maybe I wasn't too clear about it earlier, but these IPv6 packets destined for [::]:0 do make it out onto the wire (6c0773c made no difference). This tcpdump taken on the hypervisor (i.e., "outside" the Jool VM with MAC address 00:18:59:19:04:05) demonstrates it:

18:20:15.681590 e4:11:5b:9b:8f:29 > 00:18:59:19:04:05, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 60, id 60348, offset 0, flags [none], proto TCP (6), length 40)
    87.238.49.250.29150 > 185.47.43.111.http: Flags [S.], cksum 0xe331 (correct), seq 494103772, ack 4235812698, win 1480, length 0
18:20:15.681789 00:18:59:19:04:05 > e4:11:5b:9b:8f:29, ethertype IPv6 (0x86dd), length 74: (hlim 59, next-header TCP (6) payload length: 20) 2a02:c0::46:43:57ee:31fa.29150 > ::.0: Flags [S.], cksum 0x9cd5 (correct), seq 494103772, ack 4235812698, win 1480, length 0

(The IPv4 packet was generated on 87.238.49.250 using nping --tcp --flags syn,ack -p 80 185.47.43.111.)

The fact that Jool forwards such packets violates section 2.5.2 of RFC4291, which states:

The unspecified address must not be used as the destination address of IPv6 packets or in IPv6 Routing headers.

@ydahhrk ydahhrk added a commit that referenced this issue Mar 10, 2015
@ydahhrk ydahhrk Second half of issue #137.
"Simultaneous Open Sessions" were being trated as normal sessions, which lead to bogus packets.

SO sessions are no longer returned by session_get().
9f63ff5
@ydahhrk
Member
ydahhrk commented Mar 10, 2015

Oops, sorry :/

It seems we had never realized this because we do not assign default routes to the NAT64 machine in the lab. I actually had to add it now to force the :: packet to reach the cord.

(Isn't it a little weird that Linux routes packets towards :: without complaining?)

Anyway, I just commited a fix and there no longer seems to be any bogus traffic. Please bounce back if you see any more quirks.

Since this is critical, we will release Jool 3.3.1 right away (with #132 and #133).

(The code is still at the issue137 branch)

@toreanderson
Contributor

The issue137 branch looks good to me now - no crashes, no packets to [::]:0 seen on the wire.

(Isn't it a little weird that Linux routes packets towards :: without complaining?)

Yes and no I guess. One could argue that the Class E space in IPv4 was forever broken because of host stacks enforced that those addresses shouldn't be used. Now it's impossible to change that because of the huge installed base of host stacks that refuse to work with those addresses. So by allowing packets destined for the unspecified address, I guess the door is kept open for someone to standardise some use of such packets...

@ydahhrk ydahhrk added a commit that closed this issue Mar 10, 2015
@ydahhrk ydahhrk Whenever a packet gets copied, its "original packet" needs to be upda…
…ted. Otherwise it points to memory that, after translation, will become garbage.

Probably fixes #137.
6c0773c
@ydahhrk ydahhrk closed this in 6c0773c Mar 10, 2015
@ydahhrk ydahhrk added this to the 3.3.1 milestone Mar 10, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment