Incorrect fragment handling on higher kernels #231

Closed
ydahhrk opened this issue Oct 6, 2016 · 5 comments


ydahhrk commented Oct 6, 2016

After updating to a newer kernel (4.4.1), I found this warning several times in the log:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 3260 at /home/aleiva/Jool-3.5.0/mod/stateful/fragment_db.c:364 fragdb_handle+0x45/0x50 [jool]()
This code is supposed to be unreachable in kernels 3.13+! Please report.
Modules linked in: jool(OE) nf_defrag_ipv6 nf_defrag_ipv4 coretemp crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper vmw_balloon joydev cryptd input_leds vmwgfx serio_raw ttm drm_kms_helper drm fb_sys_fops bnep syscopyarea rfcomm sysfillrect sysimgblt bluetooth vmw_vmci shpchp 8250_fintek pata_acpi mac_hid i2c_piix4 parport_pc ppdev lp parport psmouse vmxnet3 mptspi mptscsih mptbase scsi_transport_spi floppy fjes
CPU: 0 PID: 3260 Comm: ping6 Tainted: G        W  OE   4.4.1-040401-generic #201601311534
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
 0000000000000000 00000000b6bb53ca ffff88013fc03a98 ffffffff813c8e14
 ffff88013fc03ae0 ffff88013fc03ad0 ffffffff8107dba2 ffff8800b3a02000
 ffff88013fc03b98 0000000000000018 ffff8800b47ff200 000000000000dd86
Call Trace:
 <IRQ>  [<ffffffff813c8e14>] dump_stack+0x44/0x60
 [<ffffffff8107dba2>] warn_slowpath_common+0x82/0xc0
 [<ffffffff8107dc3c>] warn_slowpath_fmt+0x5c/0x80
 [<ffffffffc02f10a5>] fragdb_handle+0x45/0x50 [jool]
 [<ffffffffc02fd2ce>] core_6to4+0xbe/0x170 [jool]
 [<ffffffff810bf802>] ? __wake_up_common+0x52/0x90
 [<ffffffffc02dd5d0>] ? nf_ct_frag6_gather+0xd0/0xde0 [nf_defrag_ipv6]
 [<ffffffff81647f2f>] ? evdev_pass_values+0x1af/0x220
 [<ffffffffc02dd0dc>] ? ipv6_defrag+0xcc/0x1b0 [nf_defrag_ipv6]
 [<ffffffff8164842e>] ? evdev_events+0xae/0xd0
 [<ffffffffc02f2805>] hook_ipv6+0x15/0x20 [jool]
 [<ffffffff8172b282>] nf_iterate+0x62/0x80
 [<ffffffff8172b313>] nf_hook_slow+0x73/0xd0
 [<ffffffff817a263e>] ipv6_rcv+0x41e/0x4c0
 [<ffffffff817a1cd0>] ? ip6_make_skb+0x1e0/0x1e0
 [<ffffffff816f4536>] __netif_receive_skb_core+0x6e6/0xa40
 [<ffffffff810ad900>] ? sched_clock_init+0x60/0x90
 [<ffffffff810a74e4>] ? check_preempt_curr+0x54/0x90
 [<ffffffff810a7539>] ? ttwu_do_wakeup+0x19/0xc0
 [<ffffffff816f48a8>] __netif_receive_skb+0x18/0x60
 [<ffffffff816f5568>] process_backlog+0xa8/0x150
 [<ffffffff816f4dd0>] net_rx_action+0x210/0x320
 [<ffffffff81082446>] __do_softirq+0xf6/0x250
 [<ffffffff817ff8cc>] do_softirq_own_stack+0x1c/0x30
 <EOI>  [<ffffffff81081f28>] do_softirq.part.19+0x38/0x40
 [<ffffffff81081fad>] __local_bh_enable_ip+0x7d/0x80
 [<ffffffff8179ea67>] ip6_finish_output2+0x1a7/0x4d0
 [<ffffffff817beb8c>] ? raw6_getfrag+0xac/0x100
 [<ffffffff817a1386>] ip6_finish_output+0xa6/0x100
 [<ffffffff817a1433>] ip6_output+0x53/0x110
 [<ffffffff817d9e27>] ? __ip6_local_out+0xb7/0xd0
 [<ffffffff817d9e75>] ip6_local_out+0x35/0x40
 [<ffffffff817a1a53>] ip6_send_skb+0x23/0x70
 [<ffffffff817a1aed>] ip6_push_pending_frames+0x4d/0x50
 [<ffffffff817bff01>] rawv6_sendmsg+0xa41/0xcd0
 [<ffffffff8121a3d0>] ? poll_select_copy_remaining+0x140/0x140
 [<ffffffff816da570>] ? sock_common_recvmsg+0x40/0x70
 [<ffffffff816d7bcb>] ? sock_recvmsg+0x3b/0x50
 [<ffffffff8176e4e5>] inet_sendmsg+0x65/0xa0
 [<ffffffff816d80f8>] sock_sendmsg+0x38/0x50
 [<ffffffff816d86e1>] SYSC_sendto+0x101/0x190
 [<ffffffff816d9781>] ? __sys_recvmsg+0x51/0x90
 [<ffffffff810f06c5>] ? ktime_get_ts64+0x45/0xf0
 [<ffffffff816d920e>] SyS_sendto+0xe/0x10
 [<ffffffff817fdbb6>] entry_SYSCALL_64_fastpath+0x16/0x75
---[ end trace 79082aedf6dc6af0 ]---

This is a warning, not a panic. It affects NAT64 only. Tested on Jool 3.5. The relevant and surrounding code did not change between 3.4 and 3.5, so this is likely a bug in both Jool series. Will confirm later.

It affects fragmented packets. I don't know what happens to them; they probably get dropped. While trying to replicate it, I seem to have triggered it once by querying either Steam or the PlayStation Store for the first time in a while. If my traffic really caused it, the end nodes managed to stabilize the connection automatically; I didn't notice any disruptions.

ydahhrk added the Bug label Oct 6, 2016

ydahhrk commented Oct 6, 2016

Totally a 3.4 bug as well.

It seems to happen when the packets are both paged and fragmented. This seems to prevent nf_defrag_ipv6 from purging the fragment header. Indeed, the packet is getting dropped silently. I guess TCP is noticing this, hence the automatic fix.
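
To make the symptom concrete: once nf_defrag_ipv6 has reassembled a packet, no Fragment header (next-header value 44) is expected to remain in it. The following is a minimal user-space sketch, not Jool or kernel code, that walks the extension header chain of a raw IPv6 packet and reports whether a Fragment header is still present. The names `has_fragment_header` and `build_ipv6` are made up for this illustration, and only the Hop-by-Hop, Routing and Destination Options headers are walked.

```python
import struct

# IPv6 next-header values (IANA protocol numbers).
NEXTHDR_HOP = 0
NEXTHDR_ROUTING = 43
NEXTHDR_FRAGMENT = 44
NEXTHDR_DEST = 60

IPV6_HDR_LEN = 40


def build_ipv6(next_header: int, payload: bytes) -> bytes:
    """Hypothetical helper: build a bare IPv6 packet with zeroed addresses."""
    version_tc_flow = 6 << 28  # version 6, traffic class 0, flow label 0
    fixed = struct.pack("!IHBB", version_tc_flow, len(payload), next_header, 64)
    return fixed + bytes(16) + bytes(16) + payload


def has_fragment_header(packet: bytes) -> bool:
    """Return True if the extension header chain contains a Fragment header."""
    if len(packet) < IPV6_HDR_LEN:
        raise ValueError("truncated IPv6 header")

    next_header = packet[6]  # "Next Header" field of the fixed header
    offset = IPV6_HDR_LEN

    while True:
        if next_header == NEXTHDR_FRAGMENT:
            return True
        if next_header not in (NEXTHDR_HOP, NEXTHDR_ROUTING, NEXTHDR_DEST):
            # Upper-layer protocol (or a header this sketch does not walk).
            return False
        if len(packet) < offset + 2:
            raise ValueError("truncated extension header")
        # Hdr Ext Len counts 8-octet units, not including the first 8 octets.
        header_len = (packet[offset + 1] + 1) * 8
        next_header = packet[offset]
        offset += header_len
```

For example, `has_fragment_header(build_ipv6(58, b"\x80\x00\x00\x00"))` is false for a plain ICMPv6 packet, while a packet built with next header 44 followed by an 8-byte Fragment header yields true. The warning above corresponds to the second case showing up after defragmentation was supposed to have run.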

This changes the way Jool has to cope with nf_defrag_ipv6. I'm going to have to study this [Link was dead] again.

ydahhrk added a commit that referenced this issue Oct 6, 2016
The bug is already known so there's no need to keep adding noise to
the kernel log.
ydahhrk modified the milestone: 3.5.5 Jun 5, 2017
ydahhrk removed this from the 3.5.5 milestone Nov 24, 2017

ydahhrk commented Dec 7, 2017

Planning to remove the defrag dependency in Jool 4, so this will be indirectly fixed by #140.

ydahhrk added the "Depends on #140" label (Cannot fix until issue #140 is addressed) Dec 7, 2017

ydahhrk commented Dec 7, 2017

Things Things


ydahhrk commented Jul 6, 2018

"This code is supposed to be unreachable in kernels 3.13+! Please report."

Turns out that this bug took too long to get fixed, and the kernels that were supposed to reach the stated code are seemingly phasing out of relevance. Right now, the earliest most relevant kernels that I'm aware of are

At this point, the sensible solution seems to be to simply drop support for kernels 3.12 and older and call it a day. This is the current plan. If anyone disagrees, please comment.

ydahhrk removed the "Depends on #140" label (Cannot fix until issue #140 is addressed) Jul 6, 2018
ydahhrk added a commit that referenced this issue Oct 8, 2018
- Removed glue code for kernels older than 3.13.
  (Mostly the fragment database. Including --fragment-arrival-timeout.)
  Basically terminates #231.
- Fixed some --instance bugs.
  Usage of the command should also be simpler. It is certainly easier to
  explain in the documentation, at least.
- The stats system got in the way of something (can't recall what), and
  I decided that it needed a refactor.
  Only the API and callers were updated; the innards were removed
  because I'm out of time.
- Separate Netfilter hook code (kernel_hook_netfilter.c) from iptables
  hook code (kernel_hook_iptables.c).
- To account for the design of iptables, VERDICT_ACCEPT became
  VERDICT_UNTRANSLATABLE (accept on Netfilter, drop on iptables).
  Jool no longer NF_ACCEPTs at iptables; it didn't make sense because
  of the rule matching.
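
The verdict change in the last item can be pictured with a small sketch. This is not Jool's code: the `Verdict` and `Framework` enums and the `to_kernel_verdict` function are hypothetical names, and the sketch assumes that "drop on iptables" means returning `NF_DROP`. Only the numeric values of `NF_DROP` and `NF_ACCEPT` are taken from the kernel's Netfilter headers.

```python
from enum import Enum, auto

# Netfilter verdict values as defined by the Linux kernel.
NF_DROP = 0
NF_ACCEPT = 1


class Framework(Enum):
    """Which hook the packet arrived through (hypothetical names)."""
    NETFILTER = auto()
    IPTABLES = auto()


class Verdict(Enum):
    """Internal verdicts; only the two needed for the illustration."""
    UNTRANSLATABLE = auto()
    DROP = auto()


def to_kernel_verdict(verdict: Verdict, framework: Framework) -> int:
    """Map an internal verdict to a kernel verdict for the given framework."""
    if verdict is Verdict.UNTRANSLATABLE:
        # A Netfilter hook sees all traffic, so untranslatable packets are
        # handed back to the kernel. An iptables rule already matched the
        # packet, so returning it untranslated is assumed to make no sense.
        if framework is Framework.NETFILTER:
            return NF_ACCEPT
        return NF_DROP
    return NF_DROP
```

Under these assumptions, the same internal verdict yields `NF_ACCEPT` when the packet came through the Netfilter hook and `NF_DROP` when it came through an iptables rule.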
ydahhrk added this to the 3.6.0 milestone Nov 23, 2018
ydahhrk modified the milestones: 3.6.0, 4.0.0 Jan 9, 2019

ydahhrk commented Jan 17, 2019

Released; closing.

ydahhrk closed this as completed Jan 17, 2019