Incorrect fragment handling on higher kernels #231

ydahhrk · 2016-10-06T16:59:30Z

Updated to a newer kernel (4.4.1) and found this several times in the log:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 3260 at /home/aleiva/Jool-3.5.0/mod/stateful/fragment_db.c:364 fragdb_handle+0x45/0x50 [jool]()
This code is supposed to be unreachable in kernels 3.13+! Please report.
Modules linked in: jool(OE) nf_defrag_ipv6 nf_defrag_ipv4 coretemp crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper vmw_balloon joydev cryptd input_leds vmwgfx serio_raw ttm drm_kms_helper drm fb_sys_fops bnep syscopyarea rfcomm sysfillrect sysimgblt bluetooth vmw_vmci shpchp 8250_fintek pata_acpi mac_hid i2c_piix4 parport_pc ppdev lp parport psmouse vmxnet3 mptspi mptscsih mptbase scsi_transport_spi floppy fjes
CPU: 0 PID: 3260 Comm: ping6 Tainted: G        W  OE   4.4.1-040401-generic #201601311534
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
 0000000000000000 00000000b6bb53ca ffff88013fc03a98 ffffffff813c8e14
 ffff88013fc03ae0 ffff88013fc03ad0 ffffffff8107dba2 ffff8800b3a02000
 ffff88013fc03b98 0000000000000018 ffff8800b47ff200 000000000000dd86
Call Trace:
 <IRQ>  [<ffffffff813c8e14>] dump_stack+0x44/0x60
 [<ffffffff8107dba2>] warn_slowpath_common+0x82/0xc0
 [<ffffffff8107dc3c>] warn_slowpath_fmt+0x5c/0x80
 [<ffffffffc02f10a5>] fragdb_handle+0x45/0x50 [jool]
 [<ffffffffc02fd2ce>] core_6to4+0xbe/0x170 [jool]
 [<ffffffff810bf802>] ? __wake_up_common+0x52/0x90
 [<ffffffffc02dd5d0>] ? nf_ct_frag6_gather+0xd0/0xde0 [nf_defrag_ipv6]
 [<ffffffff81647f2f>] ? evdev_pass_values+0x1af/0x220
 [<ffffffffc02dd0dc>] ? ipv6_defrag+0xcc/0x1b0 [nf_defrag_ipv6]
 [<ffffffff8164842e>] ? evdev_events+0xae/0xd0
 [<ffffffffc02f2805>] hook_ipv6+0x15/0x20 [jool]
 [<ffffffff8172b282>] nf_iterate+0x62/0x80
 [<ffffffff8172b313>] nf_hook_slow+0x73/0xd0
 [<ffffffff817a263e>] ipv6_rcv+0x41e/0x4c0
 [<ffffffff817a1cd0>] ? ip6_make_skb+0x1e0/0x1e0
 [<ffffffff816f4536>] __netif_receive_skb_core+0x6e6/0xa40
 [<ffffffff810ad900>] ? sched_clock_init+0x60/0x90
 [<ffffffff810a74e4>] ? check_preempt_curr+0x54/0x90
 [<ffffffff810a7539>] ? ttwu_do_wakeup+0x19/0xc0
 [<ffffffff816f48a8>] __netif_receive_skb+0x18/0x60
 [<ffffffff816f5568>] process_backlog+0xa8/0x150
 [<ffffffff816f4dd0>] net_rx_action+0x210/0x320
 [<ffffffff81082446>] __do_softirq+0xf6/0x250
 [<ffffffff817ff8cc>] do_softirq_own_stack+0x1c/0x30
 <EOI>  [<ffffffff81081f28>] do_softirq.part.19+0x38/0x40
 [<ffffffff81081fad>] __local_bh_enable_ip+0x7d/0x80
 [<ffffffff8179ea67>] ip6_finish_output2+0x1a7/0x4d0
 [<ffffffff817beb8c>] ? raw6_getfrag+0xac/0x100
 [<ffffffff817a1386>] ip6_finish_output+0xa6/0x100
 [<ffffffff817a1433>] ip6_output+0x53/0x110
 [<ffffffff817d9e27>] ? __ip6_local_out+0xb7/0xd0
 [<ffffffff817d9e75>] ip6_local_out+0x35/0x40
 [<ffffffff817a1a53>] ip6_send_skb+0x23/0x70
 [<ffffffff817a1aed>] ip6_push_pending_frames+0x4d/0x50
 [<ffffffff817bff01>] rawv6_sendmsg+0xa41/0xcd0
 [<ffffffff8121a3d0>] ? poll_select_copy_remaining+0x140/0x140
 [<ffffffff816da570>] ? sock_common_recvmsg+0x40/0x70
 [<ffffffff816d7bcb>] ? sock_recvmsg+0x3b/0x50
 [<ffffffff8176e4e5>] inet_sendmsg+0x65/0xa0
 [<ffffffff816d80f8>] sock_sendmsg+0x38/0x50
 [<ffffffff816d86e1>] SYSC_sendto+0x101/0x190
 [<ffffffff816d9781>] ? __sys_recvmsg+0x51/0x90
 [<ffffffff810f06c5>] ? ktime_get_ts64+0x45/0xf0
 [<ffffffff816d920e>] SyS_sendto+0xe/0x10
 [<ffffffff817fdbb6>] entry_SYSCALL_64_fastpath+0x16/0x75
---[ end trace 79082aedf6dc6af0 ]---

This is a warning, not a panic. NAT64 only. Tested on Jool 3.5. The relevant and surrounding code did not change between 3.4 and 3.5, so this is likely a bug in both Jool series. Will confirm later.

Affects fragmented packets. I don't know what happens to them; they probably get dropped. While trying to replicate it, I seem to have triggered it once by querying either Steam or the PlayStation Store for the first time in a while. If my traffic really caused it, the end nodes managed to stabilize the connection automatically; I didn't notice any disruptions.


ydahhrk · 2016-10-06T17:50:38Z

Totally a 3.4 bug as well.

It seems to happen when the packets are both paged and fragmented. This seems to prevent nf_defrag_ipv6 from purging the fragment header. Indeed, the packet is getting dropped silently. I guess TCP is noticing this, hence the automatic fix.
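
For illustration only, here is a minimal userspace sketch of how a lingering fragment header could be detected by walking the IPv6 extension header chain. It is not Jool's code: `find_frag_hdr` is a hypothetical name, and the sketch assumes the whole packet sits in one flat buffer. A paged skb in the kernel does not guarantee that, which is part of what makes the paged and fragmented case awkward. The `NEXTHDR_*` values are the standard IPv6 protocol numbers.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard IPv6 next-header protocol numbers. */
#define NEXTHDR_HOP       0
#define NEXTHDR_ROUTING  43
#define NEXTHDR_FRAGMENT 44
#define NEXTHDR_DEST     60

#define IPV6_HDR_LEN 40
#define FRAG_HDR_LEN  8

/*
 * Hypothetical helper: returns the offset of the fragment header inside
 * pkt, or -1 if the packet has none (or is too short to tell).
 * Assumes pkt is a flat buffer that starts at the IPv6 header.
 */
long find_frag_hdr(const uint8_t *pkt, size_t len)
{
	uint8_t nexthdr;
	size_t offset = IPV6_HDR_LEN;

	if (len < IPV6_HDR_LEN)
		return -1;
	nexthdr = pkt[6]; /* "Next Header" field of the fixed IPv6 header */

	for (;;) {
		switch (nexthdr) {
		case NEXTHDR_FRAGMENT:
			if (offset + FRAG_HDR_LEN > len)
				return -1;
			return (long)offset;
		case NEXTHDR_HOP:
		case NEXTHDR_ROUTING:
		case NEXTHDR_DEST:
			/* Byte 0: next header. Byte 1: length in 8-byte units, minus 1. */
			if (offset + 2 > len)
				return -1;
			nexthdr = pkt[offset];
			offset += ((size_t)pkt[offset + 1] + 1) * 8;
			break;
		default:
			/* Upper-layer protocol reached: no fragment header. */
			return -1;
		}
	}
}
```

A result other than -1 on a packet that nf_defrag_ipv6 has already processed would correspond to the situation the warning complains about.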

This changes the way Jool has to cope with nf_defrag_ipv6. I'm going to have to study ~~this~~ [Link was dead] again.

Commit ae5a6dd ("Silence the #231 warning"): The bug is already known so there's no need to keep adding noise to the kernel log.

ydahhrk · 2017-12-07T17:59:52Z

Planning to remove the defrag dependency in Jool 4, so this will be indirectly fixed by #140.

ydahhrk · 2017-12-07T18:06:27Z

~~Things~~ Things

ydahhrk · 2018-07-06T22:38:05Z

> This code is supposed to be unreachable in kernels 3.13+! Please report.

Turns out that this bug took too long to get fixed, and the kernels that were supposed to reach the stated code are seemingly phasing out of relevance. Right now, the earliest relevant kernels that I'm aware of are:

- core: 3.16
- Debian 8: 3.16 (Debian 7 support ended in May.)
- Ubuntu 14.04: 3.13
- LEDE: 4.4.42
- RHEL 7 (CentOS, Fedora, Red Hat): They claim their kernel is 3.10, but this has always been an annoying misnomer. In practice, and as far as this issue is concerned, the RHEL 7.0 kernel is precisely 3.13.

At this point it seems that the sensible solution is to just drop support for kernels 3.12- and call it a day. This is the current plan. If anyone disagrees, please comment.
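
As a rough sketch of what "drop support for kernels 3.12-" could look like as a version check: `KVER` below mirrors the classic encoding of the kernel's `KERNEL_VERSION()` macro, while `kernel_is_supported` and its `is_rhel7` flag are hypothetical names introduced here to model the RHEL 7 exception described above. In an actual module this would more likely be a compile-time `#if LINUX_VERSION_CODE < KERNEL_VERSION(3, 13, 0)` guard; the runtime function is only there to make the logic easy to test.

```c
#include <stdbool.h>

/* Classic encoding used by KERNEL_VERSION() in <linux/version.h>. */
#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Oldest kernel that would remain supported under the plan above. */
#define MIN_SUPPORTED KVER(3, 13, 0)

/*
 * Hypothetical helper. is_rhel7 is an assumption of this sketch: RHEL 7
 * reports 3.10 but, as far as this issue is concerned, behaves like 3.13,
 * so it is treated as supported regardless of the reported version.
 */
bool kernel_is_supported(unsigned int version_code, bool is_rhel7)
{
	if (is_rhel7)
		return true;
	return version_code >= MIN_SUPPORTED;
}
```

For example, `kernel_is_supported(KVER(4, 4, 1), false)` is true, while `kernel_is_supported(KVER(3, 12, 0), false)` is false.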

- Removed glue code for kernels older than 3.13. (Mostly the fragment database. Including --fragment-arrival-timeout.) Basically terminates #231.
- Fixed some --instance bugs. Usage of the command should also be simpler. It is certainly easier to explain in the documentation, at least.
- The stats system got in the way of something (can't recall what), and I decided that it needed a refactor. Only the API and callers were updated; the innards were removed because I'm out of time.
- Separate Netfilter hook code (kernel_hook_netfilter.c) from iptables hook code (kernel_hook_iptables.c).
- To account for the design of iptables, VERDICT_ACCEPT became VERDICT_UNTRANSLATABLE (accept on Netfilter, drop on iptables). Jool no longer NF_ACCEPTs at iptables; it didn't make sense because of the rule matching.
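
The last point can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: only `VERDICT_UNTRANSLATABLE` comes from the message above; `VERDICT_DROP`, `enum hook_type`, and `verdict_to_hook_result` are hypothetical names. The `NF_DROP` and `NF_ACCEPT` values mirror the ones in `<linux/netfilter.h>` and are redefined here only so the sketch builds in userspace.

```c
/* Values mirror <linux/netfilter.h>; redefined for a userspace build. */
#define NF_DROP   0
#define NF_ACCEPT 1

/* Hypothetical: which framework the translator is attached to. */
enum hook_type {
	HOOK_NETFILTER,
	HOOK_IPTABLES,
};

/* VERDICT_UNTRANSLATABLE is from the commit message; VERDICT_DROP is assumed. */
enum verdict {
	VERDICT_UNTRANSLATABLE,
	VERDICT_DROP,
};

/*
 * Hypothetical mapping: an untranslatable packet is handed back to the
 * kernel on Netfilter, but dropped on iptables, where the packet already
 * matched a rule that asked for translation.
 */
unsigned int verdict_to_hook_result(enum verdict v, enum hook_type hook)
{
	switch (v) {
	case VERDICT_UNTRANSLATABLE:
		return (hook == HOOK_NETFILTER) ? NF_ACCEPT : NF_DROP;
	case VERDICT_DROP:
		return NF_DROP;
	}
	return NF_DROP; /* Unknown verdicts: fail closed. */
}
```

Keeping the decision in one function means the two hook files only differ in which `hook_type` they pass in.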

ydahhrk · 2019-01-17T18:35:52Z

Released; closing.

ydahhrk added the Bug label Oct 6, 2016

ydahhrk added a commit that referenced this issue Oct 6, 2016

Silence the #231 warning

ae5a6dd

The bug is already known so there's no need to keep adding noise to the kernel log.

ydahhrk modified the milestone: 3.5.5 Jun 5, 2017

ydahhrk removed this from the 3.5.5 milestone Nov 24, 2017

ydahhrk added the "Depends on #140" label (Cannot fix until issue #140 is addressed) Dec 7, 2017

ydahhrk removed the "Depends on #140" label (Cannot fix until issue #140 is addressed) Jul 6, 2018

ydahhrk added this to the 3.6.0 milestone Nov 23, 2018

ydahhrk modified the milestones: 3.6.0, 4.0.0 Jan 9, 2019

ydahhrk closed this as completed Jan 17, 2019
