pvts-mat

[LTS 8.8]
CVE-2023-3812
VULN-3182

Problem

https://www.cve.org/CVERecord?id=CVE-2023-3812

An out-of-bounds memory access flaw was found in the Linux kernel’s TUN/TAP device driver functionality in how a user generates a malicious (too big) networking packet when napi frags is enabled. This flaw allows a local user to crash or potentially escalate their privileges on the system.

Applicability

The tun/tap devices are enabled in ciqlts8_8:

grep '^CONFIG_TUN=' configs/*.config

configs/kernel-aarch64-debug.config:CONFIG_TUN=m
configs/kernel-aarch64.config:CONFIG_TUN=m
configs/kernel-ppc64le-debug.config:CONFIG_TUN=m
configs/kernel-ppc64le.config:CONFIG_TUN=m
configs/kernel-s390x-debug.config:CONFIG_TUN=m
configs/kernel-s390x.config:CONFIG_TUN=m
configs/kernel-x86_64-debug.config:CONFIG_TUN=m
configs/kernel-x86_64.config:CONFIG_TUN=m

The "enablement of napi frags" mentioned in the CVE description does not correspond to any kernel configuration option. It refers to the IFF_NAPI_FRAGS flag, implemented for tun/tap devices in commit 90e33d4 (the one that introduced the bug). A user can pass this flag to the ioctl call made while allocating a tun device (see https://www.kernel.org/doc/Documentation/networking/tuntap.txt), so it must be assumed that the feature can be enabled.
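For illustration, a minimal userspace sketch of how the flag is requested at device allocation time. The helper names `fill_napi_frags_ifreq` and `open_napi_frags_tap` are hypothetical, and the fallback flag values are assumptions for toolchains whose `linux/if_tun.h` predates these flags. As far as I can tell, the kernel accepts IFF_NAPI_FRAGS only together with IFF_NAPI on a TAP device, and the ioctl typically needs elevated privileges.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* Fallbacks for older UAPI headers (assumed values). */
#ifndef IFF_NAPI
#define IFF_NAPI 0x0010
#endif
#ifndef IFF_NAPI_FRAGS
#define IFF_NAPI_FRAGS 0x0020
#endif

/* Hypothetical helper: prepare an ifreq asking for a TAP device with
 * NAPI and NAPI frags enabled. Passing NULL as name lets the kernel
 * choose the interface name. */
void fill_napi_frags_ifreq(struct ifreq *ifr, const char *name)
{
    assert(ifr != NULL);
    memset(ifr, 0, sizeof(*ifr));
    ifr->ifr_flags = IFF_TAP | IFF_NO_PI | IFF_NAPI | IFF_NAPI_FRAGS;
    if (name)
        strncpy(ifr->ifr_name, name, IFNAMSIZ - 1); /* stays NUL-terminated */
}

/* Hypothetical helper: allocate the device. Returns an open file
 * descriptor on success, or -1 on failure with errno set. */
int open_napi_frags_tap(const char *name)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);

    if (fd < 0)
        return -1;

    fill_napi_frags_ifreq(&ifr, name);
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Nothing in the kernel configuration gates this path: the flag is chosen by the caller at runtime, which is why the applicability check above looks only at CONFIG_TUN.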

Solution

The mainline fix is provided in commit 363a532. The official backport to kernel version 4.19 (the closest to ciqlts8_8's 4.18) was done in aa815bf without any changes. Additionally, the fix has already been applied in the same form to Rocky ciqlts8_6 in 680bcfa and to Rocky ciqlts9_4 in 3eb34f2.
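The commit message describes the fix as a length check in tun_napi_alloc_frags() against (ETH_MAX_MTU - NET_SKB_PAD - NET_IP_ALIGN). The following is a standalone sketch of that kind of bound, not a copy of the patch. The `sketch_` names are hypothetical, the NET_SKB_PAD and NET_IP_ALIGN values are assumptions (in the kernel they depend on the architecture), and treating a length exactly equal to the limit as acceptable is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative constants only. */
#define SKETCH_ETH_MAX_MTU  65535U /* value given in the commit message */
#define SKETCH_NET_SKB_PAD  64U    /* assumption: architecture-dependent */
#define SKETCH_NET_IP_ALIGN 0U     /* assumption: architecture-dependent */

/* Largest packet length the sketch accepts, leaving room for the
 * headroom reserved at the front of the skb. */
size_t sketch_max_frags_len(void)
{
    return SKETCH_ETH_MAX_MTU - SKETCH_NET_SKB_PAD - SKETCH_NET_IP_ALIGN;
}

/* True if a packet of this length would pass the bound. */
bool sketch_len_allowed(size_t len)
{
    return len <= sketch_max_frags_len();
}

/* Both oversized lengths quoted in the commit message are rejected,
 * while an ordinary Ethernet-sized packet is accepted. */
void sketch_demo(void)
{
    assert(!sketch_len_allowed(2147479538UL)); /* first syzkaller report */
    assert(!sketch_len_allowed(2127925UL));    /* second syzkaller report */
    assert(sketch_len_allowed(1500));
}
```

Because the bound stays below 65535 even after the reserved headroom is added back, header offsets measured from skb->head remain representable in 16 bits, which is the overflow the second report in the commit message hinges on.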

kABI check: passed

DEBUG=1 CVE=CVE-2023-3812 ./ninja.sh _kabi_checked__x86_64--test--ciqlts8_8-CVE-2023-3812

ninja: Entering directory `/data/build/rocky-patching'
[0/1] Check ABI of kernel [ciqlts8_8-CVE-2023-3812]
++ uname -m
+ python3 /data/src/ctrliq-github/kernel-dist-git-el-8.8/SOURCES/check-kabi -k /data/src/ctrliq-github/kernel-dist-git-el-8.8/SOURCES/Module.kabi_x86_64 -s vms/x86_64--build--ciqlts8_8/build_files/kernel-src-tree-ciqlts8_8-CVE-2023-3812/Module.symvers
kABI check passed
+ touch state/kernels/ciqlts8_8-CVE-2023-3812/x86_64/kabi_checked

Boot test: passed

boot-test.log

Kselftests: passed relative

Coverage

Tests that proved unreliable in the past were skipped. See the rocky.yml file for details.

android, bpf (except test_progs, test_progs-no_alu32, test_kmod.sh, test_xsk.sh, test_sockmap), breakpoints, capabilities, cgroup, core, cpu-hotplug, cpufreq, drivers/net/bonding, drivers/net/team, efivarfs, exec, firmware, fpu, ftrace, futex, gpio, intel_pstate, ipc, kcmp, kexec, kvm, lib, livepatch, membarrier, memfd, memory-hotplug, mount, mqueue, net/forwarding (except sch_tbf_root.sh, sch_tbf_prio.sh, mirror_gre_vlan_bridge_1q.sh, sch_ets.sh, sch_tbf_ets.sh, mirror_gre_bridge_1d_vlan.sh, ipip_hier_gre_keys.sh, tc_actions.sh), net/mptcp (except simult_flows.sh), net (except udpgro_fwd.sh, ip_defrag.sh, xfrm_policy.sh, reuseaddr_conflict, reuseport_addr_any.sh, gro.sh, txtimestamp.sh, udpgso_bench.sh), netfilter (except nft_trans_stress.sh), nsfs, pstore, ptrace, rseq, sgx, sigaltstack, size, splice, static_keys, sync, sysctl, tc-testing, tdx, timens, timers (except raw_skew), tpm2, user, vm, x86, zram

Reference

kselftests--ciqlts8_8--run1.log
kselftests--ciqlts8_8--run2.log
kselftests--ciqlts8_8--run3.log

Patch

kselftests--ciqlts8_8-CVE-2023-3812--run1.log
kselftests--ciqlts8_8-CVE-2023-3812--run2.log

Comparison

The results are the same in the patched and reference kernel.

ktests.xsh diff -d kselftests*.log

Column    File
--------  ---------------------------------------------
Status0   kselftests--ciqlts8_8--run1.log
Status1   kselftests--ciqlts8_8--run2.log
Status2   kselftests--ciqlts8_8--run3.log
Status3   kselftests--ciqlts8_8-CVE-2023-3812--run1.log
Status4   kselftests--ciqlts8_8-CVE-2023-3812--run2.log

Specific tests: dropped

An attempt was made to reproduce the bug described in the CVE. It was dropped once it became clear that it would require handling too many technicalities of programmatically setting up a tun device, which was not judged worth the effort. It can be done on demand.

jira VULN-3182
cve CVE-2023-3812
commit-author Ziyang Xuan <william.xuanziyang@huawei.com>
commit 363a532

Recently, we got two syzkaller problems because of oversize packet
when napi frags enabled.

One of the problems is because the first seg size of the iov_iter
from user space is very big, it is 2147479538 which is bigger than
the threshold value for bail out early in __alloc_pages(). And
skb->pfmemalloc is true, __kmalloc_reserve() would use pfmemalloc
reserves without __GFP_NOWARN flag. Thus we got a warning as following:

========================================================
WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
...
Call trace:
 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
 __alloc_pages_node include/linux/gfp.h:550 [inline]
 alloc_pages_node include/linux/gfp.h:564 [inline]
 kmalloc_large_node+0x94/0x350 mm/slub.c:4038
 __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545
 __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151
 pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654
 __skb_grow include/linux/skbuff.h:2779 [inline]
 tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477
 tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835
 tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036

The other problem is because odd IPv6 packets without NEXTHDR_NONE
extension header and have big packet length, it is 2127925 which is
bigger than ETH_MAX_MTU(65535). After ipv6_gso_pull_exthdrs() in
ipv6_gro_receive(), network_header offset and transport_header offset
are all bigger than U16_MAX. That would trigger skb->network_header
and skb->transport_header overflow error, because they are all '__u16'
type. Eventually, it would affect the value for __skb_push(skb, value),
and make it be a big value. After __skb_push() in ipv6_gro_receive(),
skb->data would less than skb->head, an out of bounds memory bug occurred.
That would trigger the problem as following:

==================================================================
BUG: KASAN: use-after-free in eth_type_trans+0x100/0x260
...
Call trace:
 dump_backtrace+0xd8/0x130
 show_stack+0x1c/0x50
 dump_stack_lvl+0x64/0x7c
 print_address_description.constprop.0+0xbc/0x2e8
 print_report+0x100/0x1e4
 kasan_report+0x80/0x120
 __asan_load8+0x78/0xa0
 eth_type_trans+0x100/0x260
 napi_gro_frags+0x164/0x550
 tun_get_user+0xda4/0x1270
 tun_chr_write_iter+0x74/0x130
 do_iter_readv_writev+0x130/0x1ec
 do_iter_write+0xbc/0x1e0
 vfs_writev+0x13c/0x26c

To fix the problems, restrict the packet size less than
(ETH_MAX_MTU - NET_SKB_PAD - NET_IP_ALIGN) which has considered reserved
skb space in napi_alloc_skb() because transport_header is an offset from
skb->head. Add len check in tun_napi_alloc_frags() simply.

Fixes: 90e33d4 ("tun: enable napi_gro_frags() for TUN/TAP driver")
	Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
	Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221029094101.1653855-1-william.xuanziyang@huawei.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 363a532)
	Signed-off-by: Marcin Wcisło <marcin.wcislo@conclusive.pl>

@thefossguy-ciq thefossguy-ciq left a comment

🚤

@bmastbergen bmastbergen left a comment

🥌

@PlaidCat PlaidCat left a comment

:shipit:

@PlaidCat PlaidCat merged commit 35d1c79 into ctrliq:ciqlts8_8 May 15, 2025
2 checks passed
bmastbergen pushed a commit to bmastbergen/kernel-src-tree that referenced this pull request Aug 29, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-5.14.0-427.18.1.el9_4
commit-author Daniel Borkmann <daniel@iogearbox.net>
commit d1a783d

Add various tests to check maximum number of supported programs
being attached:

  # ./vmtest.sh -- ./test_progs -t tc_opts
  [...]
  ./test_progs -t tc_opts
  [    1.185325] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.186826] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  [    1.270123] tsc: Refined TSC clocksource calibration: 3407.988 MHz
  [    1.272428] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc932722, max_idle_ns: 440795381586 ns
  [    1.276408] clocksource: Switched to clocksource tsc
  ctrliq#252     tc_opts_after:OK
  ctrliq#253     tc_opts_append:OK
  ctrliq#254     tc_opts_basic:OK
  ctrliq#255     tc_opts_before:OK
  ctrliq#256     tc_opts_chain_classic:OK
  ctrliq#257     tc_opts_chain_mixed:OK
  ctrliq#258     tc_opts_delete_empty:OK
  ctrliq#259     tc_opts_demixed:OK
  ctrliq#260     tc_opts_detach:OK
  ctrliq#261     tc_opts_detach_after:OK
  ctrliq#262     tc_opts_detach_before:OK
  ctrliq#263     tc_opts_dev_cleanup:OK
  ctrliq#264     tc_opts_invalid:OK
  ctrliq#265     tc_opts_max:OK              <--- (new test)
  ctrliq#266     tc_opts_mixed:OK
  ctrliq#267     tc_opts_prepend:OK
  ctrliq#268     tc_opts_replace:OK
  ctrliq#269     tc_opts_revision:OK
  Summary: 18/0 PASSED, 0 SKIPPED, 0 FAILED

	Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
	Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230929204121.20305-2-daniel@iogearbox.net
(cherry picked from commit d1a783d)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
bmastbergen pushed a commit to bmastbergen/kernel-src-tree that referenced this pull request Aug 29, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-5.14.0-427.18.1.el9_4
commit-author Daniel Borkmann <daniel@iogearbox.net>
commit f9b0879

Add a new test case which performs double query of the bpf_mprog through
libbpf API, but also via raw bpf(2) syscall. This is testing to gather
first the count and then in a subsequent probe the full information with
the program array without clearing passed structs in between.

  # ./vmtest.sh -- ./test_progs -t tc_opts
  [...]
  ./test_progs -t tc_opts
  [    1.398818] tsc: Refined TSC clocksource calibration: 3407.999 MHz
  [    1.400263] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fd336761, max_idle_ns: 440795243819 ns
  [    1.402734] clocksource: Switched to clocksource tsc
  [    1.426639] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.428112] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  ctrliq#252     tc_opts_after:OK
  ctrliq#253     tc_opts_append:OK
  ctrliq#254     tc_opts_basic:OK
  ctrliq#255     tc_opts_before:OK
  ctrliq#256     tc_opts_chain_classic:OK
  ctrliq#257     tc_opts_chain_mixed:OK
  ctrliq#258     tc_opts_delete_empty:OK
  ctrliq#259     tc_opts_demixed:OK
  ctrliq#260     tc_opts_detach:OK
  ctrliq#261     tc_opts_detach_after:OK
  ctrliq#262     tc_opts_detach_before:OK
  ctrliq#263     tc_opts_dev_cleanup:OK
  ctrliq#264     tc_opts_invalid:OK
  ctrliq#265     tc_opts_max:OK
  ctrliq#266     tc_opts_mixed:OK
  ctrliq#267     tc_opts_prepend:OK
  ctrliq#268     tc_opts_query:OK            <--- (new test)
  ctrliq#269     tc_opts_replace:OK
  ctrliq#270     tc_opts_revision:OK
  Summary: 19/0 PASSED, 0 SKIPPED, 0 FAILED

	Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20231006220655.1653-4-daniel@iogearbox.net
	Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
(cherry picked from commit f9b0879)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
bmastbergen pushed a commit to bmastbergen/kernel-src-tree that referenced this pull request Aug 29, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-5.14.0-427.18.1.el9_4
commit-author Daniel Borkmann <daniel@iogearbox.net>
commit 685446b

Add a new test case to query on an empty bpf_mprog and pass the revision
directly into expected_revision for attachment to assert that this does
succeed.

  ./test_progs -t tc_opts
  [    1.406778] tsc: Refined TSC clocksource calibration: 3407.990 MHz
  [    1.408863] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fcaf6eb0, max_idle_ns: 440795321766 ns
  [    1.412419] clocksource: Switched to clocksource tsc
  [    1.428671] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.430260] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  ctrliq#252     tc_opts_after:OK
  ctrliq#253     tc_opts_append:OK
  ctrliq#254     tc_opts_basic:OK
  ctrliq#255     tc_opts_before:OK
  ctrliq#256     tc_opts_chain_classic:OK
  ctrliq#257     tc_opts_chain_mixed:OK
  ctrliq#258     tc_opts_delete_empty:OK
  ctrliq#259     tc_opts_demixed:OK
  ctrliq#260     tc_opts_detach:OK
  ctrliq#261     tc_opts_detach_after:OK
  ctrliq#262     tc_opts_detach_before:OK
  ctrliq#263     tc_opts_dev_cleanup:OK
  ctrliq#264     tc_opts_invalid:OK
  ctrliq#265     tc_opts_max:OK
  ctrliq#266     tc_opts_mixed:OK
  ctrliq#267     tc_opts_prepend:OK
  ctrliq#268     tc_opts_query:OK
  ctrliq#269     tc_opts_query_attach:OK     <--- (new test)
  ctrliq#270     tc_opts_replace:OK
  ctrliq#271     tc_opts_revision:OK
  Summary: 20/0 PASSED, 0 SKIPPED, 0 FAILED

	Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20231006220655.1653-6-daniel@iogearbox.net
	Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
(cherry picked from commit 685446b)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
bmastbergen pushed a commit to bmastbergen/kernel-src-tree that referenced this pull request Aug 29, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-5.14.0-427.18.1.el9_4
commit-author Daniel Borkmann <daniel@iogearbox.net>
commit 2451630

Add several new test cases which assert corner cases on the mprog query
mechanism, for example, around passing in a too small or a larger array
than the current count.

  ./test_progs -t tc_opts
  ctrliq#252     tc_opts_after:OK
  ctrliq#253     tc_opts_append:OK
  ctrliq#254     tc_opts_basic:OK
  ctrliq#255     tc_opts_before:OK
  ctrliq#256     tc_opts_chain_classic:OK
  ctrliq#257     tc_opts_chain_mixed:OK
  ctrliq#258     tc_opts_delete_empty:OK
  ctrliq#259     tc_opts_demixed:OK
  ctrliq#260     tc_opts_detach:OK
  ctrliq#261     tc_opts_detach_after:OK
  ctrliq#262     tc_opts_detach_before:OK
  ctrliq#263     tc_opts_dev_cleanup:OK
  ctrliq#264     tc_opts_invalid:OK
  ctrliq#265     tc_opts_max:OK
  ctrliq#266     tc_opts_mixed:OK
  ctrliq#267     tc_opts_prepend:OK
  ctrliq#268     tc_opts_query:OK
  ctrliq#269     tc_opts_query_attach:OK
  ctrliq#270     tc_opts_replace:OK
  ctrliq#271     tc_opts_revision:OK
  Summary: 20/0 PASSED, 0 SKIPPED, 0 FAILED

	Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
	Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
	Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20231017081728.24769-1-daniel@iogearbox.net
(cherry picked from commit 2451630)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>