Incorrect fragment handling on higher kernels #231

Open
ydahhrk opened this Issue Oct 6, 2016 · 4 comments

Member

ydahhrk commented Oct 6, 2016

Updated a kernel (to 4.4.1) and found this several times in the log:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 3260 at /home/aleiva/Jool-3.5.0/mod/stateful/fragment_db.c:364 fragdb_handle+0x45/0x50 [jool]()
This code is supposed to be unreachable in kernels 3.13+! Please report.
Modules linked in: jool(OE) nf_defrag_ipv6 nf_defrag_ipv4 coretemp crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper vmw_balloon joydev cryptd input_leds vmwgfx serio_raw ttm drm_kms_helper drm fb_sys_fops bnep syscopyarea rfcomm sysfillrect sysimgblt bluetooth vmw_vmci shpchp 8250_fintek pata_acpi mac_hid i2c_piix4 parport_pc ppdev lp parport psmouse vmxnet3 mptspi mptscsih mptbase scsi_transport_spi floppy fjes
CPU: 0 PID: 3260 Comm: ping6 Tainted: G        W  OE   4.4.1-040401-generic #201601311534
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
 0000000000000000 00000000b6bb53ca ffff88013fc03a98 ffffffff813c8e14
 ffff88013fc03ae0 ffff88013fc03ad0 ffffffff8107dba2 ffff8800b3a02000
 ffff88013fc03b98 0000000000000018 ffff8800b47ff200 000000000000dd86
Call Trace:
 <IRQ>  [<ffffffff813c8e14>] dump_stack+0x44/0x60
 [<ffffffff8107dba2>] warn_slowpath_common+0x82/0xc0
 [<ffffffff8107dc3c>] warn_slowpath_fmt+0x5c/0x80
 [<ffffffffc02f10a5>] fragdb_handle+0x45/0x50 [jool]
 [<ffffffffc02fd2ce>] core_6to4+0xbe/0x170 [jool]
 [<ffffffff810bf802>] ? __wake_up_common+0x52/0x90
 [<ffffffffc02dd5d0>] ? nf_ct_frag6_gather+0xd0/0xde0 [nf_defrag_ipv6]
 [<ffffffff81647f2f>] ? evdev_pass_values+0x1af/0x220
 [<ffffffffc02dd0dc>] ? ipv6_defrag+0xcc/0x1b0 [nf_defrag_ipv6]
 [<ffffffff8164842e>] ? evdev_events+0xae/0xd0
 [<ffffffffc02f2805>] hook_ipv6+0x15/0x20 [jool]
 [<ffffffff8172b282>] nf_iterate+0x62/0x80
 [<ffffffff8172b313>] nf_hook_slow+0x73/0xd0
 [<ffffffff817a263e>] ipv6_rcv+0x41e/0x4c0
 [<ffffffff817a1cd0>] ? ip6_make_skb+0x1e0/0x1e0
 [<ffffffff816f4536>] __netif_receive_skb_core+0x6e6/0xa40
 [<ffffffff810ad900>] ? sched_clock_init+0x60/0x90
 [<ffffffff810a74e4>] ? check_preempt_curr+0x54/0x90
 [<ffffffff810a7539>] ? ttwu_do_wakeup+0x19/0xc0
 [<ffffffff816f48a8>] __netif_receive_skb+0x18/0x60
 [<ffffffff816f5568>] process_backlog+0xa8/0x150
 [<ffffffff816f4dd0>] net_rx_action+0x210/0x320
 [<ffffffff81082446>] __do_softirq+0xf6/0x250
 [<ffffffff817ff8cc>] do_softirq_own_stack+0x1c/0x30
 <EOI>  [<ffffffff81081f28>] do_softirq.part.19+0x38/0x40
 [<ffffffff81081fad>] __local_bh_enable_ip+0x7d/0x80
 [<ffffffff8179ea67>] ip6_finish_output2+0x1a7/0x4d0
 [<ffffffff817beb8c>] ? raw6_getfrag+0xac/0x100
 [<ffffffff817a1386>] ip6_finish_output+0xa6/0x100
 [<ffffffff817a1433>] ip6_output+0x53/0x110
 [<ffffffff817d9e27>] ? __ip6_local_out+0xb7/0xd0
 [<ffffffff817d9e75>] ip6_local_out+0x35/0x40
 [<ffffffff817a1a53>] ip6_send_skb+0x23/0x70
 [<ffffffff817a1aed>] ip6_push_pending_frames+0x4d/0x50
 [<ffffffff817bff01>] rawv6_sendmsg+0xa41/0xcd0
 [<ffffffff8121a3d0>] ? poll_select_copy_remaining+0x140/0x140
 [<ffffffff816da570>] ? sock_common_recvmsg+0x40/0x70
 [<ffffffff816d7bcb>] ? sock_recvmsg+0x3b/0x50
 [<ffffffff8176e4e5>] inet_sendmsg+0x65/0xa0
 [<ffffffff816d80f8>] sock_sendmsg+0x38/0x50
 [<ffffffff816d86e1>] SYSC_sendto+0x101/0x190
 [<ffffffff816d9781>] ? __sys_recvmsg+0x51/0x90
 [<ffffffff810f06c5>] ? ktime_get_ts64+0x45/0xf0
 [<ffffffff816d920e>] SyS_sendto+0xe/0x10
 [<ffffffff817fdbb6>] entry_SYSCALL_64_fastpath+0x16/0x75
---[ end trace 79082aedf6dc6af0 ]---

This is a warning, not a panic. NAT64 only. Tested on Jool 3.5. The relevant and surrounding code did not change between 3.4 and 3.5, so this is likely a bug in both Jool series. Will confirm later.

Affects fragmented packets. I don't know what happens to them; they probably get dropped. Trying to replicate it, I seem to have triggered it once by querying either Steam or the PlayStation Store for the first time in a while. If my traffic really caused it, the end nodes managed to stabilize the connection automatically; I didn't notice any disruptions.


Member

ydahhrk commented Oct 6, 2016

Totally a 3.4 bug as well.

It seems to happen when packets are both paged and fragmented, which apparently prevents nf_defrag_ipv6 from purging the fragment header. Indeed, the packet is getting dropped silently. I guess TCP is noticing this, hence the automatic fix.

This changes the way Jool has to cope with nf_defrag_ipv6. I'm going to have to study this [Link was dead] again.

ydahhrk added a commit that referenced this issue Oct 6, 2016

Silence the #231 warning
The bug is already known so there's no need to keep adding noise to
the kernel log.

@ydahhrk ydahhrk modified the milestone: 3.5.5 Jun 5, 2017

@ydahhrk ydahhrk removed this from the 3.5.5 milestone Nov 24, 2017


Member

ydahhrk commented Dec 7, 2017

Planning to remove the defrag dependency in Jool 4, so this will be indirectly fixed by #140.


Member

ydahhrk commented Dec 7, 2017

Things Things


Member

ydahhrk commented Jul 6, 2018

This code is supposed to be unreachable in kernels 3.13+! Please report.

Turns out that this bug took too long to get fixed, and the kernels that were supposed to reach the stated code are seemingly phasing out of relevance. Right now, the earliest most relevant kernels that I'm aware of are

At this point it seems that the sensible solution is to just drop support for kernels 3.12- and call it a day. This is the current plan. If anyone disagrees, please comment.

@ydahhrk ydahhrk removed the Depends on #140 label Jul 6, 2018
