@pvts-mat pvts-mat commented Aug 11, 2025

[CBR 7.9]
CVE-2023-5717
VULN-7623

Problem

https://www.cve.org/CVERecord?id=CVE-2023-5717

A heap out-of-bounds write vulnerability in the Linux kernel's Linux Kernel Performance Events (perf) component can be exploited to achieve local privilege escalation. If perf_read_group() is called while an event's sibling_list is smaller than its child's sibling_list, it can increment or write to memory locations outside of the allocated buffer.
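The failure mode can be illustrated with a minimal user-space toy model (not the kernel code; all names here are made up): the output buffer is sized from the parent's view of the group, but the accumulation loop walks the child's sibling list, so a child list that grew larger than the parent's writes past the end of the buffer.

```c
#include <stddef.h>

/* Toy model of the pre-fix accumulation loop (NOT kernel code): buf is
 * sized from the parent's sibling count, but the loop is driven by the
 * child's sibling count. */
static int toy_read_group(unsigned long long *buf, size_t buf_len,
                          const unsigned long long *child_counts,
                          size_t child_nr_siblings)
{
	size_t n = 0;

	for (size_t i = 0; i < child_nr_siblings; i++) {
		if (n >= buf_len)
			return -1; /* the pre-fix kernel had no equivalent bound check */
		buf[n++] += child_counts[i];
	}
	return (int)n;
}
```

With a child list of 3 siblings and a buffer sized for 2, the guarded toy version bails out where the unguarded kernel loop would have written out of bounds.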

Applicability: yes

The perf component is gated by the CONFIG_PERF_EVENTS option, which is enabled in all ciqcbr7_9 configs:

$ grep 'CONFIG_PERF_EVENTS\b' configs/*.config

configs/kernel-3.10.0-x86_64-debug.config:CONFIG_PERF_EVENTS=y
configs/kernel-3.10.0-x86_64.config:CONFIG_PERF_EVENTS=y

The bug was introduced by commit fa8c269 ("perf/core: Invert perf_read_group() loops"), which flipped the order of child_list and sibling_list; it was backported to ciqcbr7_9 in 170ca9a. The fixing commit 32671e3 was never backported and is missing from the branch.

Solution

The mainline fix 32671e3 adds a new group_generation field to the perf_event struct, which breaks CBR 7.9 kABI. The field was kept, but moved to the end of the struct and wrapped in the RH_KABI_EXTEND macro. Unlike in the LTS 8.6 case (#475), no investigation of whether this is safe was needed, because the struct already ends with multiple RH_KABI_EXTEND(…) fields, which could not have been added there otherwise:

RH_KABI_EXTEND(struct list_head migrate_entry)
RH_KABI_EXTEND(struct list_head active_entry)
RH_KABI_EXTEND(void *pmu_private)
#if defined(CONFIG_FUNCTION_TRACER) && !defined(CONFIG_X86_64)
RH_KABI_EXTEND(struct ftrace_ops ftrace_ops)
#endif
/* address range filters */
RH_KABI_EXTEND(struct perf_addr_filters_head addr_filters)
/* vma address array for file-based filters */
RH_KABI_EXTEND(unsigned long *addr_filters_offs)
RH_KABI_EXTEND(unsigned long addr_filters_gen)
RH_KABI_EXTEND(struct list_head sb_list)
RH_KABI_EXTEND(u64 (*clock)(void))
/* The cumulative AND of all event_caps for events in this group. */
RH_KABI_EXTEND(int group_caps)
/*
* Node on the pinned or flexible tree located at the event context;
*/
RH_KABI_EXTEND(struct rb_node group_node)
RH_KABI_EXTEND(u64 group_index)
RH_KABI_EXTEND(struct list_head active_list)
#ifdef CONFIG_BPF_SYSCALL
RH_KABI_EXTEND(perf_overflow_handler_t orig_overflow_handler)
RH_KABI_EXTEND(struct bpf_prog *prog)
#endif
RH_KABI_EXTEND(unsigned long rcu_batches)
RH_KABI_EXTEND(int rcu_pending)
#endif /* CONFIG_PERF_EVENTS */
};
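The kABI reasoning can be checked in miniature with two hypothetical structs (not the real perf_event): appending a field at the end leaves the offset of every pre-existing member unchanged, which is why end-of-struct RH_KABI_EXTEND additions are tolerated while a mid-struct insertion would shift everything after it.

```c
#include <stddef.h>

/* Hypothetical "old" layout, as a module compiled against it saw it. */
struct old_event {
	int state;
	unsigned long long count;
};

/* Hypothetical "new" layout: the extension is appended at the end,
 * mimicking what RH_KABI_EXTEND does in the backport. */
struct new_event {
	int state;
	unsigned long long count;
	unsigned long long group_generation; /* appended field */
};
```

All pre-existing offsets agree between the two layouts; only the total size grows.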

Additionally, a follow-up fix for the mainline commit was committed in a71ef31; it is also included in this backport.
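The group_generation mechanism (described in the upstream commit message quoted below) can be sketched in user-space C with hypothetical names: the leader's generation is bumped on every attach and detach, so a reader can detect any intervening change in group composition even when the sibling count nets out to the same number.

```c
/* Toy sketch of the generation scheme (NOT the kernel implementation). */
struct toy_group {
	int nr_siblings;
	unsigned long long group_generation;
};

static void toy_attach(struct toy_group *g)
{
	g->nr_siblings++;
	g->group_generation++;
}

static void toy_detach(struct toy_group *g)
{
	g->nr_siblings--;
	g->group_generation++;
}

/* Mirrors the fix's idea: an equal sibling count is not enough, the
 * generations must match too; a mismatch maps to the -ECHILD error. */
static int toy_groups_match(const struct toy_group *parent,
                            const struct toy_group *child)
{
	return parent->nr_siblings == child->nr_siblings &&
	       parent->group_generation == child->group_generation;
}
```

Removing and re-adding an event leaves the count unchanged but advances the generation, so the mismatch is still caught.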

kABI check: passed

$ python /mnt/code/kernel-dist-git-el-7.9/SOURCES/check-kabi -k /mnt/code/kernel-dist-git-el-7.9/SOURCES/Module.kabi_x86_64 -s /mnt/build_files/kernel-src-tree-ciqcbr7_9-CVE-2023-5717/Module.symvers
$ echo $?
0

Boot test: passed

boot-test.log

Kselftests: passed (relative)

Reference

kselftests–ciqcbr7_9–run1.log
kselftests–ciqcbr7_9–run2.log
kselftests–ciqcbr7_9–run3.log

Patch

kselftests–ciqcbr7_9-CVE-2023-5717–run1.log
kselftests–ciqcbr7_9-CVE-2023-5717–run2.log
kselftests–ciqcbr7_9-CVE-2023-5717–run3.log

Comparison

The results were compared manually with Meld. No differences indicative of a problem introduced by the patch were found.

Specific tests: passed

While it does not strictly test the provided patch, a basic sanity check of the perf events subsystem was performed to confirm that it remains functional.

Reference

$ uname -r 
3.10.0-ciqcbr7_9
$ sudo perf stat -B dd if=/dev/zero of=/dev/null count=1000000

1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB) copied, 1.36361 s, 375 MB/s

 Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':

          1,366.22 msec task-clock                #    0.995 CPUs utilized          
                 3      context-switches          #    0.002 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               217      page-faults               #    0.159 K/sec                  
     5,856,326,009      cycles                    #    4.287 GHz                    
     2,520,750,706      instructions              #    0.43  insn per cycle         
       544,415,208      branches                  #  398.483 M/sec                  
        11,042,875      branch-misses             #    2.03% of all branches        

       1.372717065 seconds time elapsed

       0.602402000 seconds user
       0.770071000 seconds sys

Patch

$ uname -r 
3.10.0-ciqcbr7_9-CVE-2023-5717
$ sudo perf stat -B dd if=/dev/zero of=/dev/null count=1000000

1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB) copied, 1.39469 s, 367 MB/s

 Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':

          1,396.45 msec task-clock                #    0.995 CPUs utilized          
                 5      context-switches          #    0.004 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               218      page-faults               #    0.156 K/sec                  
     5,900,275,843      cycles                    #    4.225 GHz                    
     2,520,173,133      instructions              #    0.43  insn per cycle         
       544,174,329      branches                  #  389.684 M/sec                  
        11,045,660      branch-misses             #    2.03% of all branches        

       1.404119544 seconds time elapsed

       0.669277000 seconds user
       0.734597000 seconds sys

jira VULN-7623
cve CVE-2023-5717
commit-author Peter Zijlstra <peterz@infradead.org>
commit 32671e3
upstream-diff The mainline fix 32671e3
  adds a new `group_generation' field to the `perf_event' struct. This
  breaks CBR 7.9 kABI. The new field was preserved, but moved to the end
  of the struct and wrapped in the `RH_KABI_EXTEND' macro. It can be
  assumed the kABI in this particular case is preserved based on the fact
  that there are already plenty of `RH_KABI_EXTEND(...)' fields at the end
  which could not have been added if the premise was false.

Because group consistency is non-atomic between parent (filedesc) and children
(inherited) events, it is possible for PERF_FORMAT_GROUP read() to try and sum
non-matching counter groups -- with non-sensical results.

Add group_generation to distinguish the case where a parent group removes and
adds an event and thus has the same number, but a different configuration of
events as inherited groups.

This became a problem when commit fa8c269 ("perf/core: Invert
perf_read_group() loops") flipped the order of child_list and sibling_list.
Previously it would iterate the group (sibling_list) first, and for each
sibling traverse the child_list. In this order, only the group composition of
the parent is relevant. By flipping the order the group composition of the
child (inherited) events becomes an issue and the mis-match in group
composition becomes evident.

That said; even prior to this commit, while reading of a group that is not
equally inherited was not broken, it still made no sense.

(Ab)use ECHILD as error return to indicate issues with child process group
composition.

Fixes: fa8c269 ("perf/core: Invert perf_read_group() loops")
	Reported-by: Budimir Markovic <markovicbudimir@gmail.com>
	Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231018115654.GK33217@noisy.programming.kicks-ass.net
(cherry picked from commit 32671e3)
	Signed-off-by: Marcin Wcisło <marcin.wcislo@conclusive.pl>
jira VULN-7623
cve-bf CVE-2023-5717
commit-author Peter Zijlstra <peterz@infradead.org>
commit a71ef31

Smatch is awesome.

Fixes: 32671e3 ("perf: Disallow mis-matched inherited group reads")
	Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
	Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit a71ef31)
	Signed-off-by: Marcin Wcisło <marcin.wcislo@conclusive.pl>
@bmastbergen bmastbergen left a comment

🥌


@thefossguy-ciq thefossguy-ciq left a comment

🚤

@PlaidCat PlaidCat merged commit 83b75d7 into ctrliq:ciqcbr7_9 Aug 13, 2025
1 of 2 checks passed
github-actions bot pushed a commit that referenced this pull request Sep 29, 2025
Migration may be raced with fallocating hole.  remove_inode_single_folio
will unmap the folio if the folio is still mapped.  However, it's called
without folio lock.  If the folio is migrated and the mapped pte has been
converted to migration entry, folio_mapped() returns false, and won't
unmap it.  Due to extra refcount held by remove_inode_single_folio,
migration fails, restores migration entry to normal pte, and the folio is
mapped again.  As a result, we triggered BUG in filemap_unaccount_folio.

The log is as follows:
 BUG: Bad page cache in process hugetlb  pfn:156c00
 page: refcount:515 mapcount:0 mapping:0000000099fef6e1 index:0x0 pfn:0x156c00
 head: order:9 mapcount:1 entire_mapcount:1 nr_pages_mapped:0 pincount:0
 aops:hugetlbfs_aops ino:dcc dentry name(?):"my_hugepage_file"
 flags: 0x17ffffc00000c1(locked|waiters|head|node=0|zone=2|lastcpupid=0x1fffff)
 page_type: f4(hugetlb)
 page dumped because: still mapped when deleted
 CPU: 1 UID: 0 PID: 395 Comm: hugetlb Not tainted 6.17.0-rc5-00044-g7aac71907bde-dirty #484 NONE
 Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
 Call Trace:
  <TASK>
  dump_stack_lvl+0x4f/0x70
  filemap_unaccount_folio+0xc4/0x1c0
  __filemap_remove_folio+0x38/0x1c0
  filemap_remove_folio+0x41/0xd0
  remove_inode_hugepages+0x142/0x250
  hugetlbfs_fallocate+0x471/0x5a0
  vfs_fallocate+0x149/0x380

Hold folio lock before checking if the folio is mapped to avold race with
migration.

Link: https://lkml.kernel.org/r/20250912074139.3575005-1-tujinjiang@huawei.com
Fixes: 4aae8d1 ("mm/hugetlbfs: unmap pages if page fault raced with hole punch")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
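The race pattern fixed by the commit above (a check-then-act sequence performed without the lock that serializes changes to the checked state) reduces to a simple user-space analogue; this is a drastically simplified sketch with made-up names, not the hugetlbfs code.

```c
#include <pthread.h>

struct toy_folio {
	pthread_mutex_t lock;
	int mapped;
};

/* Post-fix ordering: take the lock *before* testing the mapped state,
 * so the check and the removal cannot interleave with a concurrent
 * "migration" that restores the mapping. */
static int toy_remove_if_unmapped(struct toy_folio *f)
{
	int removed = 0;

	pthread_mutex_lock(&f->lock);
	if (!f->mapped)
		removed = 1; /* nothing can remap while we hold the lock */
	pthread_mutex_unlock(&f->lock);
	return removed;
}
```

The pre-fix code performed the equivalent of the `!f->mapped` test before acquiring the lock, leaving a window in which the answer could change.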
github-actions bot pushed a commit that referenced this pull request Oct 2, 2025
commit 7b73876 upstream.

Migration may be raced with fallocating hole.  remove_inode_single_folio
will unmap the folio if the folio is still mapped.  However, it's called
without folio lock.  If the folio is migrated and the mapped pte has been
converted to migration entry, folio_mapped() returns false, and won't
unmap it.  Due to extra refcount held by remove_inode_single_folio,
migration fails, restores migration entry to normal pte, and the folio is
mapped again.  As a result, we triggered BUG in filemap_unaccount_folio.

The log is as follows:
 BUG: Bad page cache in process hugetlb  pfn:156c00
 page: refcount:515 mapcount:0 mapping:0000000099fef6e1 index:0x0 pfn:0x156c00
 head: order:9 mapcount:1 entire_mapcount:1 nr_pages_mapped:0 pincount:0
 aops:hugetlbfs_aops ino:dcc dentry name(?):"my_hugepage_file"
 flags: 0x17ffffc00000c1(locked|waiters|head|node=0|zone=2|lastcpupid=0x1fffff)
 page_type: f4(hugetlb)
 page dumped because: still mapped when deleted
 CPU: 1 UID: 0 PID: 395 Comm: hugetlb Not tainted 6.17.0-rc5-00044-g7aac71907bde-dirty #484 NONE
 Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
 Call Trace:
  <TASK>
  dump_stack_lvl+0x4f/0x70
  filemap_unaccount_folio+0xc4/0x1c0
  __filemap_remove_folio+0x38/0x1c0
  filemap_remove_folio+0x41/0xd0
  remove_inode_hugepages+0x142/0x250
  hugetlbfs_fallocate+0x471/0x5a0
  vfs_fallocate+0x149/0x380

Hold folio lock before checking if the folio is mapped to avold race with
migration.

Link: https://lkml.kernel.org/r/20250912074139.3575005-1-tujinjiang@huawei.com
Fixes: 4aae8d1 ("mm/hugetlbfs: unmap pages if page fault raced with hole punch")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>