Kernel crash with Jool v3.5.1 #232

toreanderson · 2016-11-10T12:22:58Z

One of our SIIT-DC BRs just crashed. It's an x86_64 server running Ubuntu 14.04.5 and kernel 4.4.0-45-generic. This could be the hardware going faulty for all I know (it's the first time this has happened), but I'm including the oops from the serial console below. It mentions various Jool-related functions, so I'm assuming you'd be interested in taking a look.

[1418084.279692] BUG: unable to handle kernel paging request at ffff88007a402000
[1418084.284507] IP: [<ffffffff813e2d12>] __memcpy+0x12/0x20
[1418084.292070] PGD 2205067 PUD 47ffff067 PMD 274a98063 PTE 800000007a402161
[1418084.295390] Oops: 0003 [#1] SMP 
[1418084.312621] Modules linked in: mptctl 8021q garp mrp stp llc jool_siit(OE) ipmi_ssif bonding gpio_ich coretemp kvm_intel kvm ast irqbypass ttm drm_kms_helper input_leds joydev drm fb_sys_fops syscopyarea sysfillrect tpm_infineon 8250_fintek sysimgblt mac_hid ipmi_si ipmi_msghandler i7core_edac edac_core ioatdma lpc_ich shpchp i5500_temp lp parport xt_mark ip6table_mangle nf_log_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables btrfs xor raid6_pq igb mptsas hid_generic i2c_algo_bit mptscsih dca ahci usbhid mptbase ptp libahci hid scsi_transport_sas pps_core fjes
[1418084.416369] CPU: 8 PID: 0 Comm: swapper/8 Tainted: G          IOE   4.4.0-45-generic #66~14.04.1-Ubuntu
[1418084.433389] Hardware name: SUN MICROSYSTEMS SUN FIRE X4170 SERVER          /ASSY,MOTHERBOARD,X4170, BIOS 07060309 07/10/2013
[1418084.453445] task: ffff880276a0f080 ti: ffff880276a18000 task.ti: ffff880276a18000
[1418084.472151] RIP: 0010:[<ffffffff813e2d12>]  [<ffffffff813e2d12>] __memcpy+0x12/0x20
[1418084.491447] RSP: 0018:ffff880277d03750  EFLAGS: 00010202
[1418084.492814] RAX: ffff88006a69f4fc RBX: 00000000fffffffc RCX: 000000001e053a9f
[1418084.511816] RDX: 0000000000000004 RSI: ffff88024d27f712 RDI: ffff88007a401ffc
[1418084.514370] RBP: ffff880277d037a8 R08: ffff88007b344000 R09: ffff880277d03a08
[1418084.532811] R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000028
[1418084.552382] R13: 00000000fffffffc R14: ffff88006a69f4fc R15: 000000000000002c
[1418084.555226] FS:  0000000000000000(0000) GS:ffff880277d00000(0000) knlGS:0000000000000000
[1418084.573104] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[1418084.591776] CR2: ffff88007a402000 CR3: 0000000001e0c000 CR4: 00000000000006e0
[1418084.594332] Stack:
[1418084.611423]  ffffffff816e2d66 0000000000004600 ffff880400000060 12d353951b5a0575
[1418084.615153]  ffffffff816f8450 ffff88007b344000 ffff880277d03a08 ffff88023d51cbfe
[1418084.633019]  ffff88006a69f4e8 ffff880277d03aa8 0000000000000060 ffff880277d037c0
[1418084.651732] Call Trace:
[1418084.652690]  <IRQ> 
[1418084.653349]  [<ffffffff816e2d66>] ? skb_copy_bits+0x66/0x2d0
[1418084.671567]  [<ffffffff816f8450>] ? dev_queue_xmit+0x10/0x20
[1418084.673536]  [<ffffffffc0446420>] copy_payload+0x90/0xe0 [jool_siit]
[1418084.691774]  [<ffffffffc044498d>] ttp46_tcp+0x10d/0x170 [jool_siit]
[1418084.694030]  [<ffffffffc04465be>] ttpcomm_translate_inner_packet+0xee/0x240 [jool_siit]
[1418084.712967]  [<ffffffffc044398f>] post_icmp6error+0xaf/0x1a0 [jool_siit]
[1418084.731816]  [<ffffffffc04444a3>] ttp46_icmp+0x143/0x520 [jool_siit]
[1418084.733552]  [<ffffffffc044685d>] translating_the_packet+0xad/0x2f0 [jool_siit]
[1418084.752745]  [<ffffffffc044ce50>] core_common+0x20/0xf0 [jool_siit]
[1418084.754707]  [<ffffffffc044cfc7>] core_4to6+0xa7/0x130 [jool_siit]
[1418084.773072]  [<ffffffffc0453185>] hook_ipv4+0x15/0x20 [jool_siit]
[1418084.791478]  [<ffffffff8172b2cd>] nf_iterate+0x5d/0x70
[1418084.793705]  [<ffffffff8172b346>] nf_hook_slow+0x66/0xc0
[1418084.811468]  [<ffffffff817326a3>] ip_rcv+0x303/0x3e0
[1418084.813389]  [<ffffffff81731c80>] ? inet_del_offload+0x40/0x40
[1418084.832558]  [<ffffffff816f57ab>] __netif_receive_skb_core+0x36b/0x9c0
[1418084.835651]  [<ffffffffc051a850>] ? bond_resend_igmp_join_requests_delayed+0x80/0x80 [bonding]
[1418084.854153]  [<ffffffff816f5e18>] __netif_receive_skb+0x18/0x60
[1418084.872538]  [<ffffffff816f5e83>] netif_receive_skb_internal+0x23/0x80
[1418084.876184]  [<ffffffff816f6a93>] napi_gro_receive+0xc3/0x110
[1418084.893145]  [<ffffffffc01ab97d>] igb_clean_rx_irq+0x38d/0x6c0 [igb]
[1418084.895906]  [<ffffffffc01ac013>] igb_poll+0x363/0x720 [igb]
[1418084.913155]  [<ffffffff81036929>] ? sched_clock+0x9/0x10
[1418084.915810]  [<ffffffff810ac8d2>] ? sched_clock_cpu+0x72/0xa0
[1418084.939181]  [<ffffffff810a6655>] ? check_preempt_curr+0x75/0x90
[1418084.942383]  [<ffffffff810a6689>] ? ttwu_do_wakeup+0x19/0xe0
[1418084.953452]  [<ffffffff816f6274>] net_rx_action+0x164/0x350
[1418084.956085]  [<ffffffff81081f7d>] __do_softirq+0xdd/0x290
[1418084.973315]  [<ffffffff81082355>] irq_exit+0x95/0xa0
[1418084.975518]  [<ffffffff817fcee6>] do_IRQ+0x56/0xd0
[1418084.993256]  [<ffffffff817fafc2>] common_interrupt+0x82/0x82
[1418084.996847]  <EOI> 
[1418085.011423]  [<ffffffff81695a35>] ? cpuidle_enter_state+0xd5/0x250
[1418085.015568]  [<ffffffff81695a14>] ? cpuidle_enter_state+0xb4/0x250
[1418085.033139]  [<ffffffff81695be7>] cpuidle_enter+0x17/0x20
[1418085.035561]  [<ffffffff810be102>] call_cpuidle+0x32/0x60
[1418085.053152]  [<ffffffff81695bc3>] ? cpuidle_select+0x13/0x20
[1418085.055579]  [<ffffffff810be3b9>] cpu_startup_entry+0x289/0x350
[1418085.073459]  [<ffffffff8104f319>] start_secondary+0x149/0x170
[1418085.075869] Code: 74 0e 48 8b 43 60 48 2b 43 50 88 43 4e 5b 5d c3 e8 b4 fc ff ff eb eb 90 90 66 66 90 66 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 f3 
[1418085.120794] RIP  [<ffffffff813e2d12>] __memcpy+0x12/0x20
[1418085.132582]  RSP <ffff880277d03750>
[1418085.134732] CR2: ffff88007a402000
[1418085.140858] ---[ end trace 7537c99c0e03fc19 ]---
[1418085.140864] BUG: unable to handle kernel paging request at 0000571a90000190
[1418085.140871] IP: [<ffffffff817ac7bc>] rt6_score_route+0x10c/0x1c0
[1418085.140873] PGD 0 
[1418085.140874] Oops: 0000 [#2] SMP 
[1418085.140900] Modules linked in: mptctl 8021q garp mrp stp llc jool_siit(OE) ipmi_ssif bonding gpio_ich coretemp kvm_intel kvm ast irqbypass ttm drm_kms_helper input_leds joydev drm fb_sys_fops syscopyarea sysfillrect tpm_infineon 8250_fintek sysimgblt mac_hid ipmi_si ipmi_msghandler i7core_edac edac_core ioatdma lpc_ich shpchp i5500_temp lp parport xt_mark ip6table_mangle nf_log_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables btrfs xor raid6_pq igb mptsas hid_generic i2c_algo_bit mptscsih dca ahci usbhid mptbase ptp libahci hid scsi_transport_sas pps_core fjes
[1418085.140903] CPU: 15 PID: 0 Comm: swapper/15 Tainted: G      D   IOE   4.4.0-45-generic #66~14.04.1-Ubuntu
[1418085.140904] Hardware name: SUN MICROSYSTEMS SUN FIRE X4170 SERVER          /ASSY,MOTHERBOARD,X4170, BIOS 07060309 07/10/2013
[1418085.140904] task: ffff880476893e80 ti: ffff8804768fc000 task.ti: ffff8804768fc000
[1418085.140907] RIP: 0010:[<ffffffff817ac7bc>]  [<ffffffff817ac7bc>] rt6_score_route+0x10c/0x1c0
[1418085.140908] RSP: 0018:ffff88047fdc36e8  EFLAGS: 00010206
[1418085.140909] RAX: ffff880473df0000 RBX: 000000000000000a RCX: 000000000000001c
[1418085.140910] RDX: ffff88007a4007c0 RSI: 0000000000000008 RDI: 0000000000000000
[1418085.140910] RBP: ffff88047fdc3700 R08: 00000000c000022a R09: 0000000001020100
[1418085.140911] R10: 0000000010000000 R11: 00000000e86a0900 R12: 0000571a90000000
[1418085.140912] R13: 0000000000000002 R14: ffff88047fdc37ac R15: ffff88047fdc37ab
[1418085.140913] FS:  0000000000000000(0000) GS:ffff88047fdc0000(0000) knlGS:0000000000000000
[1418085.140914] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[1418085.140915] CR2: 0000571a90000190 CR3: 0000000001e0c000 CR4: 00000000000006e0
[1418085.140915] Stack:
[1418085.140917]  ffff88027627fd80 0000000000000000 0000000000000002 ffff88047fdc3750
[1418085.140918]  ffffffff817acbd8 0000000000000100 ffff88047fdc3950 0000000000000000
[1418085.140920]  0000000000000002 0000000000000000 0000000000000400 ffff88047fdc3930
[1418085.140920] Call Trace:
[1418085.140924]  <IRQ> 
[1418085.140924]  [<ffffffff817acbd8>] find_match+0x78/0x300
[1418085.140926]  [<ffffffff817acf36>] ip6_pol_route.isra.40+0xd6/0x5e0
[1418085.140928]  [<ffffffff817ad470>] ? ip6_pol_route_input+0x30/0x30
[1418085.140930]  [<ffffffff817ad49a>] ip6_pol_route_output+0x2a/0x30
[1418085.140935]  [<ffffffff817d5187>] fib6_rule_action+0xb7/0x1f0
[1418085.140940]  [<ffffffff81714227>] fib_rules_lookup+0xc7/0x160
[1418085.140949]  [<ffffffffc0453c20>] ? interface_contains+0x40/0xe0 [jool_siit]
[1418085.140951]  [<ffffffff817d5474>] fib6_rule_lookup+0x44/0xa0
[1418085.140953]  [<ffffffff817ad470>] ? ip6_pol_route_input+0x30/0x30
[1418085.140955]  [<ffffffff817ab33d>] ip6_route_output_flags+0xdd/0x120
[1418085.140961]  [<ffffffffc044cacf>] __route6+0x10f/0x200 [jool_siit]
[1418085.140968]  [<ffffffffc044cc25>] route+0x45/0x50 [jool_siit]
[1418085.140974]  [<ffffffffc044cc53>] sendpkt_send+0x23/0x200 [jool_siit]
[1418085.140980]  [<ffffffffc044cec6>] core_common+0x96/0xf0 [jool_siit]
[1418085.140986]  [<ffffffffc044cfc7>] core_4to6+0xa7/0x130 [jool_siit]
[1418085.140993]  [<ffffffffc0453185>] hook_ipv4+0x15/0x20 [jool_siit]
[1418085.140995]  [<ffffffff8172b2cd>] nf_iterate+0x5d/0x70
[1418085.140997]  [<ffffffff8172b346>] nf_hook_slow+0x66/0xc0
[1418085.140998]  [<ffffffff817326a3>] ip_rcv+0x303/0x3e0
[1418085.140999]  [<ffffffff81731c80>] ? inet_del_offload+0x40/0x40
[1418085.141002]  [<ffffffff816f57ab>] __netif_receive_skb_core+0x36b/0x9c0
[1418085.141007]  [<ffffffffc051a850>] ? bond_resend_igmp_join_requests_delayed+0x80/0x80 [bonding]
[1418085.141009]  [<ffffffff816f5e18>] __netif_receive_skb+0x18/0x60
[1418085.141010]  [<ffffffff816f5e83>] netif_receive_skb_internal+0x23/0x80
[1418085.141012]  [<ffffffff816f6a93>] napi_gro_receive+0xc3/0x110
[1418085.141018]  [<ffffffffc01ab97d>] igb_clean_rx_irq+0x38d/0x6c0 [igb]
[1418085.141023]  [<ffffffffc01ac013>] igb_poll+0x363/0x720 [igb]
[1418085.141026]  [<ffffffff810f74c0>] ? tick_sched_do_timer+0x30/0x30
[1418085.141028]  [<ffffffff816f6274>] net_rx_action+0x164/0x350
[1418085.141031]  [<ffffffff81081f7d>] __do_softirq+0xdd/0x290
[1418085.141033]  [<ffffffff81082355>] irq_exit+0x95/0xa0
[1418085.141034]  [<ffffffff817fcee6>] do_IRQ+0x56/0xd0
[1418085.141037]  [<ffffffff817fafc2>] common_interrupt+0x82/0x82
[1418085.141040]  <EOI> 
[1418085.141040]  [<ffffffff81695a35>] ? cpuidle_enter_state+0xd5/0x250
[1418085.141041]  [<ffffffff81695a14>] ? cpuidle_enter_state+0xb4/0x250
[1418085.141043]  [<ffffffff81695be7>] cpuidle_enter+0x17/0x20
[1418085.141045]  [<ffffffff810be102>] call_cpuidle+0x32/0x60
[1418085.141047]  [<ffffffff81695bc3>] ? cpuidle_select+0x13/0x20
[1418085.141048]  [<ffffffff810be3b9>] cpu_startup_entry+0x289/0x350
[1418085.141051]  [<ffffffff8104f319>] start_secondary+0x149/0x170
[1418085.141064] Code: 01 d9 01 ce b9 20 00 00 00 2b 4a 08 48 8b 12 d3 ee 48 8d 14 f2 4c 8b 22 4d 85 e4 75 0e e9 83 00 00 00 4d 8b 24 24 4d 85 e4 74 7a <49> 3b 84 24 90 01 00 00 75 ed 41 8b 94 24 9c 01 00 00 41 8b 8c 
[1418085.141067] RIP  [<ffffffff817ac7bc>] rt6_score_route+0x10c/0x1c0
[1418085.141067]  RSP <ffff88047fdc36e8>
[1418085.141068] CR2: 0000571a90000190
[1418085.141070] ---[ end trace 7537c99c0e03fc1a ]---
[1418085.141071] BUG: unable to handle kernel 
[1418085.141072] Kernel panic - not syncing: Fatal exception in interrupt
[1418085.141074] paging request at 0000571a90000190
[1418085.141078] IP: [<ffffffff817ac7bc>] rt6_score_route+0x10c/0x1c0
[1418085.141079] PGD 0 
[1418085.141081] Oops: 0000 [#3] SMP 
[1418085.141109] Modules linked in: mptctl 8021q garp mrp stp llc jool_siit(OE) ipmi_ssif bonding gpio_ich coretemp kvm_intel kvm ast irqbypass ttm drm_kms_helper input_leds joydev drm fb_sys_fops syscopyarea sysfillrect tpm_infineon 8250_fintek sysimgblt mac_hid ipmi_si ipmi_msghandler i7core_edac edac_core ioatdma lpc_ich shpchp i5500_temp lp parport xt_mark ip6table_mangle nf_log_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_ta

ydahhrk · 2016-11-10T22:44:46Z

Thank you.

It looks like Jool's fault to me. The bug is probably present in the 3.4 series too.

Going to allocate time for a review asap...

ydahhrk · 2016-11-15T00:16:04Z

Was defrag active? In other words, what is the output of

$ lsmod | grep defrag

?

(we'll just have to assume your current output is the same as it was when it crashed)

toreanderson · 2016-11-15T00:19:40Z

Looks that way:

$ lsmod | grep defrag
nf_defrag_ipv6         36864  1 nf_conntrack_ipv6
nf_defrag_ipv4         16384  1 nf_conntrack_ipv4

It is extremely likely that this was the case before the crash, too.
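
For anyone repeating this check across several boxes, here is a minimal sketch (Python, not part of Jool) that pulls the defrag modules out of `lsmod` output. The function name `loaded_defrag_modules` is hypothetical, and the sample text is simply the output pasted above.

```python
def loaded_defrag_modules(lsmod_output):
    """Return the module names in lsmod output that contain 'defrag'."""
    names = []
    for line in lsmod_output.splitlines():
        fields = line.split()
        # The first column of lsmod output is the module name.
        if fields and "defrag" in fields[0]:
            names.append(fields[0])
    return names


# Sample copied from the comment above.
sample = """\
nf_defrag_ipv6         36864  1 nf_conntrack_ipv6
nf_defrag_ipv4         16384  1 nf_conntrack_ipv4
"""

print(loaded_defrag_modules(sample))
```

An empty list would mean no defrag module was loaded at the time the command ran.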

ydahhrk · 2016-11-15T00:47:23Z

Thanks

So I've been scanning the code for roughly a week now and I feel like I should report something, for the sake of collecting my thoughts here if nothing else.

We've been trying to find #232. It's one hell of a bug. On one hand, it's easy to tell from the stack trace that the crash happened during the translation of the *inner* payload of an ICMP error from IPv4 to IPv6 caused by a TCP packet. On the other hand, there is no way to reproduce it yet, the review is yielding little more than optimizations and the failure rate (once in the two years the relevant code has existed) suggests that the problem is something otherworldly (ie. undefined behavior, which could have been triggered anywhere in the kernel).

There was no hairpinning involved. ICMP errors are never supposed to be fragmented. Even if this particular packet were, the crash happened during the first fragment's copy. The fact that SIIT is the one that crashed is fortunate since it means there's less code to worry about.

It crashed during one of the `memcpy()`s of the kernel's `skb_copy_bits()`, sitting at Jool's `copy_payload()`. The crash looks like a typical memory access fault, which would mean that at least one of the following fields had an incorrect value during the copy:

- state->in.skb
  - skb->len
  - skb->data
  - skb->data_len
  - skb_shinfo(skb)->nr_frags
  - skb_shinfo(skb)->frags
  - skb_shinfo(skb)->frag_list
- pkt_payload_offset(&state->in)
  - skb->head
  - skb->network_header
  - skb->data
  - pkt->payload
- pkt_payload(&state->out)
  - pkt->payload
- pkt_payload_len_frag(&state->out)
  - skb->len
  - skb->data_len
  - pkt->payload
  - skb->head
  - skb->network_header
  - skb_shinfo(skb)->nr_frags
  - skb_shinfo(skb)->frags
  - skb_shinfo(skb)->frag_list

Jool rarely needs to edit the incoming packet and when it does it's via the kernel API. The borked field is most likely one of the outgoing ones.

------------------------

Ok so I haven't necessarily fixed the bug but I did find room for improvement.

Fragment translation is one area where I feel Jool is too kernel-aware and, though I don't see even potential problems in this code now (given that fragmentation has little to do with the crashed packet to begin with), future kernel refactors regarding fragment representation can come back and shoot me in the foot.

The problem is that Jool is copying subsequent packet payload *and even pages* when a simple reference grab can do the job. Subsequent fragments lack headers so they can theoretically be quirklessly shared between incoming and translated packets. Fixing this would have the additional benefit of speeding up translation since only head data (not paged nor fragmented) would need to be copied. I can also see it trumping the offloading problem but I've been there before and I'm not getting my hopes up.

IIRC, I implemented it as it is because the kernel's suggested fragment-transparent solution does not necessarily account for the potential header growth (from IPv4 to IPv6) and the kernel can find itself in deep trouble if an skb cannot be `skb_push`ed enough. I did some tests however, and it seems that precisely when fragmentation is involved the kernel tends to reserve plenty of excess headroom for some reason. So I might be on to something.

I also found other small errors but I don't see a kernel panic coming out of any of them. In fact, since this is the first time I've seen them I'm somewhat skeptical as to whether I actually fixed something or introduced more problems. Currently testing.

If anything, this commit should stick because I added and updated loads of documentation during the review.
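
As a side note on the "typical memory access fault" reading, the arithmetic below is a small sketch (Python, not Jool code) of one way to interpret the register dumps in this thread. The register values are copied from the two oopses. Treating RBX as the copy length in the x86_64 dump is an assumption; in the i386 dump, EBX holding the length is my reading of the `Code:` bytes (`mov %ecx,%ebx` ahead of `rep movsl`), which the thread itself does not confirm. The helper name `as_signed32` is hypothetical.

```python
def as_signed32(value):
    """Interpret the low 32 bits of `value` as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# First oops (x86_64), values copied from the register dump.
cr2 = 0xFFFF88007A402000  # faulting address
rdi = 0xFFFF88007A401FFC  # destination pointer when `rep movsq` faulted
rbx = 0x00000000FFFFFFFC  # ASSUMPTION: the caller's copy length
rdx = 0x4                 # length modulo 8, left behind by `and $7,%edx`

print(cr2 - rdi)          # distance from the destination pointer to the fault
print(cr2 % 4096)         # offset of the fault within its 4 KiB page
print(as_signed32(rbx))   # the assumed length, read as a signed int
print(rbx % 8 == rdx)     # is RDX consistent with RBX being the length?

# Second oops (i386), value copied from the register dump.
ebx = 0xFFFFFFF9          # ASSUMPTION (from the Code: bytes): saved length
print(as_signed32(ebx))
```

If those assumptions hold, both dumps are consistent with a length that went slightly negative and was then treated as a huge unsigned size, so the copy ran forward until it hit a page it could not write. That is only an interpretation of the numbers, not a statement about which field in the list above was wrong.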

TheRedTrainer · 2016-11-22T22:52:41Z

Hi, @toreanderson. Question: was the offload off?

ethtool --show-offload eth0 | grep receive-offload

toreanderson · 2016-11-23T06:00:03Z

Yes.

$ ethtool --show-offload eth0 | grep receive-offload
generic-receive-offload: off
large-receive-offload: off [fixed]
$ ethtool --show-offload eth1 | grep receive-offload
generic-receive-offload: off
large-receive-offload: off [fixed] 
$ ethtool --show-offload bond0 | grep receive-offload
generic-receive-offload: off
large-receive-offload: on
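
A quick way to compare these per-interface results is to parse them into a table. The sketch below (Python, with a hypothetical helper `receive_offloads`) works on the `ethtool --show-offload` text pasted above; it only reads the `*-receive-offload` lines and treats the first word after the colon as the state.

```python
def receive_offloads(ethtool_output):
    """Map each '*-receive-offload' feature to True (on) or False (off)."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep or not name.strip().endswith("receive-offload"):
            continue
        words = value.split()
        if words:
            # Ignore trailing markers such as "[fixed]".
            features[name.strip()] = words[0] == "on"
    return features


# Samples copied from the comment above (eth1 matches eth0).
eth0 = """\
generic-receive-offload: off
large-receive-offload: off [fixed]
"""
bond0 = """\
generic-receive-offload: off
large-receive-offload: on
"""

for label, text in (("eth0", eth0), ("bond0", bond0)):
    enabled = [name for name, on in receive_offloads(text).items() if on]
    print(label, "enabled:", enabled)
```

Run on the pasted output, this flags that `bond0` reports `large-receive-offload` as on even though both slave interfaces report it off.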

ydahhrk · 2016-12-02T17:51:03Z

IIIIIIIII FFFFFFFFFOOOOOOOOOOUUUUUUUUUUNNNNNNNDDDDDDDDDDD IIIIIIITTTTTTTT!!!

Well, yours crashed in __memcpy. Mine crashed in memcpy. Close enough.

BUG: unable to handle kernel paging request at cec3b000
IP: [<c13035f4>] memcpy+0x14/0x30
*pdpt = 0000000001aae001 *pde = 0000000010e88063 *pte = 800000000ec3b161
Oops: 0003 [#1] SMP
Modules linked in: jool_siit(OX) vboxsf(OX) snd_intel8x0 snd_ac97_codec ac97_bus openvswitch gre vxlan snd_pcm ip_tunnel snd_page_alloc libcrc32c snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer vboxvideo(OX) snd drm joydev vboxguest(OX) rfcomm bnep bluetooth serio_raw soundcore i2c_piix4 video parport_pc mac_hid ppdev lp parport hid_generic usbhid hid psmouse ahci libahci e1000 pata_acpi
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           OX 3.13.0-103-generic #150-Ubuntu
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: c1934a00 ti: ddc0a000 task.ti: c1928000
EIP: 0060:[<c13035f4>] EFLAGS: 00210212 CPU: 0
EIP is at memcpy+0x14/0x30
EAX: cc70defc EBX: fffffff9 ECX: 3f6b4bbd EDX: d00c3ed2
ESI: d25f0fd6 EDI: cec3b000 EBP: ddc0ba2c ESP: ddc0ba20
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: cec3b000 CR3: 1b5d6000 CR4: 000006f0
Stack:
 fffffff9 00000029 00000021 ddc0ba5c c1571153 026433c6 d0c5a840 00000001
 00000000 cc70defc c99d2428 d0718a69 d0c5a840 00000044 fffffff9 ddc0ba7c
 e26a5486 fffffff9 00000014 00000000 cc70deb8 ddc0bc48 cc70dee8 ddc0bab8
Call Trace:
 [<c1571153>] skb_copy_bits+0x53/0x1d0
 [<e26a5486>] copy_payload+0x76/0xd0 [jool_siit]
 [<e26a39f5>] ttp46_tcp+0xf5/0x160 [jool_siit]
 [<e26a5617>] ttpcomm_translate_inner_packet+0xd7/0x240 [jool_siit]
 [<c15a7a5f>] ? nlmsg_notify+0x4f/0xb0
 [<c159c61f>] ? sch_direct_xmit+0x3f/0x190
 [<e26a2947>] post_icmp6error+0xa7/0x1c0 [jool_siit]
 [<e26a3613>] ttp46_icmp+0x253/0x540 [jool_siit]
 [<e26a588d>] translating_the_packet+0x8d/0x2a0 [jool_siit]
 [<e26a948c>] ? key_contains+0x1c/0x30 [jool_siit]
 [<e26a7c78>] ? pkt_init_ipv4+0x2d8/0x4a0 [jool_siit]
 [<e26ab852>] core_common+0x22/0xf0 [jool_siit]
 [<e26abf86>] ? xlator_find+0xa6/0xb0 [jool_siit]
 [<e26ab9b1>] core_4to6+0x91/0x130 [jool_siit]
 [<e0fcdc1f>] ? e1000_xmit_frame+0x89f/0xe40 [e1000]
 [<c1580001>] ? napi_gro_frags+0xc1/0x140
 [<c15a7a5f>] ? nlmsg_notify+0x4f/0xb0
 [<c159c61f>] ? sch_direct_xmit+0x3f/0x190
 [<c1581303>] ? __dev_queue_xmit+0x83/0x490
 [<c11303a6>] ? put_page+0x26/0x30
 [<c1572bc6>] ? skb_free_head+0x46/0x60
 [<c1570b24>] ? kfree_skbmem+0x34/0x90
 [<e26b14a2>] hook_ipv4+0x12/0x20 [jool_siit]
 [<c15a996c>] nf_iterate+0x6c/0x80
 [<c15af330>] ? inet_del_offload+0x30/0x30
 [<c15a99dc>] nf_hook_slow+0x5c/0x100
 [<c15af330>] ? inet_del_offload+0x30/0x30
 [<c15afdba>] ip_rcv+0x33a/0x430
 [<c15af330>] ? inet_del_offload+0x30/0x30
 [<c157f677>] __netif_receive_skb_core+0x577/0x750
 [<c157f866>] __netif_receive_skb+0x16/0x60
 [<c157f8cf>] netif_receive_skb+0x1f/0x80
 [<c15800e7>] napi_gro_receive+0x67/0x90
 [<e0fce4e5>] e1000_clean_rx_irq+0x275/0x4d0 [e1000]
 [<c1572b9c>] ? skb_free_head+0x1c/0x60
 [<c1570b24>] ? kfree_skbmem+0x34/0x90
 [<e0fcf37d>] e1000_clean+0x1cd/0x7f0 [e1000]
 [<c1063b43>] ? internal_add_timer+0x13/0x40
 [<c1065347>] ? mod_timer+0xe7/0x1f0
 [<c13617c9>] ? cursor_timer_handler+0x39/0x40
 [<c157fba0>] net_rx_action+0x110/0x210
 [<c105cf90>] __do_softirq+0xd0/0x250
 [<c105cec0>] ? cpu_callback+0x190/0x190
 <IRQ>
 [<c105d3c5>] ? irq_exit+0x95/0xa0
 [<c166e035>] ? do_IRQ+0x45/0xb0
 [<c107a837>] ? hrtimer_start+0x27/0x30
 [<c166ddf3>] ? common_interrupt+0x33/0x38
 [<c1017b30>] ? mwait_idle+0x50/0x70
 [<c10182b6>] ? arch_cpu_idle+0x26/0x30
 [<c10a7a21>] ? cpu_startup_entry+0x201/0x250
 [<c1655312>] ? rest_init+0x62/0x70
 [<c19c9ae7>] ? start_kernel+0x3a9/0x3af
 [<c19c9575>] ? repair_env_string+0x51/0x51
 [<c19c939c>] ? i386_start_kernel+0x137/0x13a
Code: 54 2b 43 50 88 43 4e 5b 5d c3 90 8d 74 26 00 e8 63 fc ff ff eb e8 90 55 89 e5 57 56 53 66 66 66 66 90 89 cb 89 c7 c1 e9 02 89 d6 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 5b 5e 5f 5d c3 8d b6 00 00 00
EIP: [<c13035f4>] memcpy+0x14/0x30 SS:ESP 0068:ddc0ba20
CR2: 00000000cec3b000
---[ end trace c48c84d63fd53114 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x0 from 0xc1000000 (relocation range: 0xc0000000-0xe07effff)

The bug has been fixed since commit 52deab1. In other words, it has been fixed all along. I'm so angry.

Releasing 3.5.2.

Includes

- A fix to the 6791 pool: was always using host addresses, regardless of whether the pool had elements or not.
- More graybox improvements.
- More comments.

toreanderson · 2016-12-03T15:18:00Z

Nice work! 👍

Could you say if the triggering factor is some kind of malformed packet or just some random memory corruption or similar that is very unlikely to happen very often? That is, should I consider this a security issue that could be triggered by a specially crafted packet sent from anywhere on the Internet?

ydahhrk · 2016-12-03T15:36:25Z

The trigger is a single, very specific packet that is all an attacker needs to murder the kernel. The packet itself is unlikely to happen naturally.

(The "security vulnerability" tag is rather redundant because ~~every critical bug so far has been a security vulnerability~~ oops! except for this one, and every security vulnerability should logically be treated as a serious bug.)

ydahhrk · 2016-12-03T15:49:12Z

By the way:

The bug takes a slightly different shape in Jool 3.4, and it's unclear to me whether it is conducive to a panic or not.

Jool 3.4.6 will be released on Monday regardless.

ydahhrk · 2016-12-03T15:54:06Z

Sorry for the inconvenience.

toreanderson · 2016-12-03T16:23:03Z

Not at all, thanks for the quick follow up!

Not sure if the bug yields a panic in 3.4, but at the very least this will prevent some legitimate packets from being dropped.

alexanderkjall · 2016-12-30T23:17:21Z

Hi, will you apply for a CVE number for this bug? I just happened to notice this bug at random and if there had been a CVE for it would have been possible to detect in an automated way.

Best regards

ydahhrk · 2016-12-31T18:43:46Z

> Hi, will you apply for a CVE number for this bug? I just happened to notice this bug at random and if there had been a CVE for it would have been possible to detect in an automated way.

Ok, request sent. I used the "Distributed Weakness Filing Project" (iwantacve.org) option.

ydahhrk added the Bug (critical) label Nov 10, 2016

ydahhrk added a commit that referenced this issue Nov 23, 2016

Progress on #232

25df84f

ydahhrk mentioned this issue Nov 28, 2016

Add Device Driver mode #140

Open

ydahhrk added this to the 3.5.2 milestone Dec 2, 2016

ydahhrk added a commit that referenced this issue Dec 2, 2016

Add more improvements inspired by the #232 review

d62b2ff

Includes

- A fix to the 6791 pool: was always using host addresses, regardless of whether the pool had elements or not.
- More graybox improvements.
- More comments.

ydahhrk closed this as completed Dec 2, 2016

ydahhrk added the Security vulnerability User is compromised label Dec 3, 2016

ydahhrk added a commit that referenced this issue Dec 6, 2016

Mirror the #232 tweaks on the 3.4 code

cb2eb07

Not sure if the bug yields a panic in 3.4, but at the very least this will prevent some legitimate packets from being dropped.

toreanderson mentioned this issue Jun 12, 2017

Kernel panic with Jool v3.5.6 (kernel BUG at .../skbuff.h:1826) #247

Closed
