Incorrect fragment handling on higher kernels #231

Open
ydahhrk opened this Issue Oct 6, 2016 · 1 comment

ydahhrk commented Oct 6, 2016

Updated to a newer kernel (4.4.1) and found this several times in the log:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 3260 at /home/aleiva/Jool-3.5.0/mod/stateful/fragment_db.c:364 fragdb_handle+0x45/0x50 [jool]()
This code is supposed to be unreachable in kernels 3.13+! Please report.
Modules linked in: jool(OE) nf_defrag_ipv6 nf_defrag_ipv4 coretemp crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper vmw_balloon joydev cryptd input_leds vmwgfx serio_raw ttm drm_kms_helper drm fb_sys_fops bnep syscopyarea rfcomm sysfillrect sysimgblt bluetooth vmw_vmci shpchp 8250_fintek pata_acpi mac_hid i2c_piix4 parport_pc ppdev lp parport psmouse vmxnet3 mptspi mptscsih mptbase scsi_transport_spi floppy fjes
CPU: 0 PID: 3260 Comm: ping6 Tainted: G        W  OE   4.4.1-040401-generic #201601311534
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
 0000000000000000 00000000b6bb53ca ffff88013fc03a98 ffffffff813c8e14
 ffff88013fc03ae0 ffff88013fc03ad0 ffffffff8107dba2 ffff8800b3a02000
 ffff88013fc03b98 0000000000000018 ffff8800b47ff200 000000000000dd86
Call Trace:
 <IRQ>  [<ffffffff813c8e14>] dump_stack+0x44/0x60
 [<ffffffff8107dba2>] warn_slowpath_common+0x82/0xc0
 [<ffffffff8107dc3c>] warn_slowpath_fmt+0x5c/0x80
 [<ffffffffc02f10a5>] fragdb_handle+0x45/0x50 [jool]
 [<ffffffffc02fd2ce>] core_6to4+0xbe/0x170 [jool]
 [<ffffffff810bf802>] ? __wake_up_common+0x52/0x90
 [<ffffffffc02dd5d0>] ? nf_ct_frag6_gather+0xd0/0xde0 [nf_defrag_ipv6]
 [<ffffffff81647f2f>] ? evdev_pass_values+0x1af/0x220
 [<ffffffffc02dd0dc>] ? ipv6_defrag+0xcc/0x1b0 [nf_defrag_ipv6]
 [<ffffffff8164842e>] ? evdev_events+0xae/0xd0
 [<ffffffffc02f2805>] hook_ipv6+0x15/0x20 [jool]
 [<ffffffff8172b282>] nf_iterate+0x62/0x80
 [<ffffffff8172b313>] nf_hook_slow+0x73/0xd0
 [<ffffffff817a263e>] ipv6_rcv+0x41e/0x4c0
 [<ffffffff817a1cd0>] ? ip6_make_skb+0x1e0/0x1e0
 [<ffffffff816f4536>] __netif_receive_skb_core+0x6e6/0xa40
 [<ffffffff810ad900>] ? sched_clock_init+0x60/0x90
 [<ffffffff810a74e4>] ? check_preempt_curr+0x54/0x90
 [<ffffffff810a7539>] ? ttwu_do_wakeup+0x19/0xc0
 [<ffffffff816f48a8>] __netif_receive_skb+0x18/0x60
 [<ffffffff816f5568>] process_backlog+0xa8/0x150
 [<ffffffff816f4dd0>] net_rx_action+0x210/0x320
 [<ffffffff81082446>] __do_softirq+0xf6/0x250
 [<ffffffff817ff8cc>] do_softirq_own_stack+0x1c/0x30
 <EOI>  [<ffffffff81081f28>] do_softirq.part.19+0x38/0x40
 [<ffffffff81081fad>] __local_bh_enable_ip+0x7d/0x80
 [<ffffffff8179ea67>] ip6_finish_output2+0x1a7/0x4d0
 [<ffffffff817beb8c>] ? raw6_getfrag+0xac/0x100
 [<ffffffff817a1386>] ip6_finish_output+0xa6/0x100
 [<ffffffff817a1433>] ip6_output+0x53/0x110
 [<ffffffff817d9e27>] ? __ip6_local_out+0xb7/0xd0
 [<ffffffff817d9e75>] ip6_local_out+0x35/0x40
 [<ffffffff817a1a53>] ip6_send_skb+0x23/0x70
 [<ffffffff817a1aed>] ip6_push_pending_frames+0x4d/0x50
 [<ffffffff817bff01>] rawv6_sendmsg+0xa41/0xcd0
 [<ffffffff8121a3d0>] ? poll_select_copy_remaining+0x140/0x140
 [<ffffffff816da570>] ? sock_common_recvmsg+0x40/0x70
 [<ffffffff816d7bcb>] ? sock_recvmsg+0x3b/0x50
 [<ffffffff8176e4e5>] inet_sendmsg+0x65/0xa0
 [<ffffffff816d80f8>] sock_sendmsg+0x38/0x50
 [<ffffffff816d86e1>] SYSC_sendto+0x101/0x190
 [<ffffffff816d9781>] ? __sys_recvmsg+0x51/0x90
 [<ffffffff810f06c5>] ? ktime_get_ts64+0x45/0xf0
 [<ffffffff816d920e>] SyS_sendto+0xe/0x10
 [<ffffffff817fdbb6>] entry_SYSCALL_64_fastpath+0x16/0x75
---[ end trace 79082aedf6dc6af0 ]---

This is a warning, not a panic. NAT64 only. Tested on Jool 3.5. The relevant and surrounding code did not change between 3.4 and 3.5, so this is likely a bug in both Jool series. Will confirm later.

It affects fragmented packets. I don't know what happens to them; they probably get dropped. While trying to replicate it, I seem to have triggered it once by querying either Steam or the PlayStation Store for the first time in a while. If my traffic really caused it, the end nodes managed to stabilize the connection on their own; I didn't notice any disruption.

ydahhrk commented Oct 6, 2016

Totally a 3.4 bug as well.

It seems to happen when packets are both paged and fragmented. This seems to prevent nf_defrag_ipv6 from purging the fragment header. Indeed, the packet is getting dropped silently. I guess TCP notices the loss, which would explain the automatic recovery.

This changes the way Jool has to cope with nf_defrag_ipv6. I'm going to have to study this again.
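
For illustration only, here is a minimal userspace sketch of the check at the heart of this: does a packet still carry an IPv6 Fragment header after defragmentation was supposed to remove it? This is not Jool's code. The function name `find_fragment_header` and the `NH_*` constants are hypothetical, and the packet is assumed to be one flat byte buffer. In the kernel a paged skb may not have these bytes in its linear area, so real code would need something like `skb_header_pointer()` instead of direct indexing.

```c
#include <stddef.h>
#include <stdint.h>

/* IANA protocol numbers for the extension headers this sketch walks.
 * The NH_* names are made up for this example. */
#define NH_HOP        0   /* Hop-by-Hop Options */
#define NH_ROUTING   43   /* Routing header */
#define NH_FRAGMENT  44   /* Fragment header */
#define NH_DEST      60   /* Destination Options */

#define IPV6_FIXED_HDR_LEN 40
#define FRAG_HDR_LEN        8

/*
 * Walk the extension header chain of an IPv6 packet stored in a flat
 * buffer. Returns the byte offset of the Fragment header, or -1 if the
 * chain ends without one (or the buffer is too short to tell).
 */
long find_fragment_header(const uint8_t *pkt, size_t len)
{
    uint8_t next;
    size_t off = IPV6_FIXED_HDR_LEN;

    if (len < IPV6_FIXED_HDR_LEN)
        return -1;

    next = pkt[6]; /* Next Header field of the fixed IPv6 header */

    for (;;) {
        size_t ext_len;

        if (next == NH_FRAGMENT)
            return (off + FRAG_HDR_LEN <= len) ? (long)off : -1;

        /* Anything that is not a skippable extension header ends the walk. */
        if (next != NH_HOP && next != NH_ROUTING && next != NH_DEST)
            return -1;

        if (off + 2 > len)
            return -1;

        /* Hdr Ext Len counts 8-octet units, not including the first 8. */
        ext_len = ((size_t)pkt[off + 1] + 1) * 8;
        if (off + ext_len > len)
            return -1;

        next = pkt[off];
        off += ext_len;
    }
}
```

A non-negative return value on a packet that has already passed through the defragmenter would correspond to the situation the warning reports, assuming the diagnosis above is right.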

ydahhrk added a commit that referenced this issue Oct 6, 2016
Silence the #231 warning
The bug is already known so there's no need to keep adding noise to
the kernel log.
ae5a6dd