Skip to content

[pull] master from torvalds:master#102

Merged
pull[bot] merged 19 commits intoantiscam-ico:masterfrom
torvalds:master
Feb 22, 2020
Merged

[pull] master from torvalds:master#102
pull[bot] merged 19 commits intoantiscam-ico:masterfrom
torvalds:master

Conversation

@pull
Copy link
Copy Markdown

@pull pull bot commented Feb 22, 2020

See Commits and Changes for more details.


Created by pull[bot]. Want to support this open source service? Please star it : )

Sam Bobroff and others added 19 commits February 17, 2020 12:47
Recovering a dead PHB can currently cause a deadlock as the PCI
rescan/remove lock is taken twice.

This is caused as part of an existing bug in
eeh_handle_special_event(). The pe is processed while traversing the
PHBs even though the pe is unrelated to the loop. This causes the pe
to be, incorrectly, processed more than once.

Untangling this section can move the pe processing out of the loop and
also outside the locked section, correcting both problems.

Fixes: 2e25505 ("powerpc/eeh: Fix crash when edev->pdev changes")
Cc: stable@vger.kernel.org # 5.4+
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Tested-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0547e82dbf90ee0729a2979a8cac5c91665c621f.1581051445.git.sbobroff@linux.ibm.com
Commit 55c8fc3 ("powerpc/8xx: reintroduce 16K pages with HW
assistance") redefined pte_t as a struct of 4 pte_basic_t, because
in 16K pages mode there are four identical entries in the
page table. But the size of hugepage tables is calculated based
of the size of (void *). Therefore, we end up with page tables
of size 1k instead of 4k for 512k pages.

As 512k hugepage tables are the same size as standard page tables,
ie 4k, use the standard page tables instead of PGT_CACHE tables.

Fixes: 3fb69c6 ("powerpc/8xx: Enable 512k hugepage support with HW assistance")
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/90ec56a2315be602494619ed0223bba3b0b8d619.1580997007.git.christophe.leroy@c-s.fr
With HW assistance all page tables must be 4k aligned, the 8xx drops
the last 12 bits during the walk.

Redefine HUGEPD_SHIFT_MASK to mask last 12 bits out. HUGEPD_SHIFT_MASK
is used to for alignment of page table cache.

Fixes: 22569b8 ("powerpc/8xx: Enable 8M hugepage support with HW assistance")
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/778b1a248c4c7ca79640eeff7740044da6a220a0.1581264115.git.christophe.leroy@c-s.fr
In ITLB miss handled the line supposed to clear bits 20-23 on the L2
ITLB entry is buggy and does indeed nothing, leading to undefined
value which could allow execution when it shouldn't.

Properly do the clearing with the relevant instruction.

Fixes: 74fabca ("powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers")
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Leonardo Bras <leonardo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4f70c2778163affce8508a210f65d140e84524b4.1581272050.git.christophe.leroy@c-s.fr
… delivery

After a treclaim, we expect to be in non-transactional state. If we
don't clear the current thread's MSR[TS] before we get preempted, then
tm_recheckpoint_new_task() will recheckpoint and we get rescheduled in
suspended transaction state.

When handling a signal caught in transactional state,
handle_rt_signal64() calls get_tm_stackpointer() that treclaims the
transaction using tm_reclaim_current() but without clearing the
thread's MSR[TS]. This can cause the TM Bad Thing exception below if
later we pagefault and get preempted trying to access the user's
sigframe, using __put_user(). Afterwards, when we are rescheduled back
into do_page_fault() (but now in suspended state since the thread's
MSR[TS] was not cleared), upon executing 'rfid' after completion of
the page fault handling, the exception is raised because a transition
from suspended to non-transactional state is invalid.

  Unexpected TM Bad Thing exception at c00000000000de44 (msr 0x8000000302a03031) tm_scratch=800000010280b033
  Oops: Unrecoverable exception, sig: 6 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  CPU: 25 PID: 15547 Comm: a.out Not tainted 5.4.0-rc2 #32
  NIP:  c00000000000de44 LR: c000000000034728 CTR: 0000000000000000
  REGS: c00000003fe7bd70 TRAP: 0700   Not tainted  (5.4.0-rc2)
  MSR:  8000000302a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[SE]>  CR: 44000884  XER: 00000000
  CFAR: c00000000000dda4 IRQMASK: 0
  PACATMSCRATCH: 800000010280b033
  GPR00: c000000000034728 c000000f65a17c80 c000000001662800 00007fffacf3fd78
  GPR04: 0000000000001000 0000000000001000 0000000000000000 c000000f611f8af0
  GPR08: 0000000000000000 0000000078006001 0000000000000000 000c000000000000
  GPR12: c000000f611f84b0 c00000003ffcb200 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000f611f8140
  GPR24: 0000000000000000 00007fffacf3fd68 c000000f65a17d90 c000000f611f7800
  GPR28: c000000f65a17e90 c000000f65a17e90 c000000001685e18 00007fffacf3f000
  NIP [c00000000000de44] fast_exception_return+0xf4/0x1b0
  LR [c000000000034728] handle_rt_signal64+0x78/0xc50
  Call Trace:
  [c000000f65a17c80] [c000000000034710] handle_rt_signal64+0x60/0xc50 (unreliable)
  [c000000f65a17d30] [c000000000023640] do_notify_resume+0x330/0x460
  [c000000f65a17e20] [c00000000000dcc4] ret_from_except_lite+0x70/0x74
  Instruction dump:
  7c4ff120 e8410170 7c5a03a6 38400000 f8410060 e8010070 e8410080 e8610088
  60000000 60000000 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed0989
  ---[ end trace 93094aa44b442f87 ]---

The simplified sequence of events that triggers the above exception is:

  ...				# userspace in NON-TRANSACTIONAL state
  tbegin			# userspace in TRANSACTIONAL state
  signal delivery		# kernelspace in SUSPENDED state
  handle_rt_signal64()
    get_tm_stackpointer()
      treclaim			# kernelspace in NON-TRANSACTIONAL state
    __put_user()
      page fault happens. We will never get back here because of the TM Bad Thing exception.

  page fault handling kicks in and we voluntarily preempt ourselves
  do_page_fault()
    __schedule()
      __switch_to(other_task)

  our task is rescheduled and we recheckpoint because the thread's MSR[TS] was not cleared
  __switch_to(our_task)
    switch_to_tm()
      tm_recheckpoint_new_task()
        trechkpt			# kernelspace in SUSPENDED state

  The page fault handling resumes, but now we are in suspended transaction state
  do_page_fault()    completes
  rfid     <----- trying to get back where the page fault happened (we were non-transactional back then)
  TM Bad Thing			# illegal transition from suspended to non-transactional

This patch fixes that issue by clearing the current thread's MSR[TS]
just after treclaim in get_tm_stackpointer() so that we stay in
non-transactional state in case we are preempted. In order to make
treclaim and clearing the thread's MSR[TS] atomic from a preemption
perspective when CONFIG_PREEMPT is set, preempt_disable/enable() is
used. It's also necessary to save the previous value of the thread's
MSR before get_tm_stackpointer() is called so that it can be exposed
to the signal handler later in setup_tm_sigcontexts() to inform the
userspace MSR at the moment of the signal delivery.

Found with tm-signal-context-force-tm kernel selftest.

Fixes: 2b0a576 ("powerpc: Add new transactional memory state to the signal context")
Cc: stable@vger.kernel.org # v3.9
Signed-off-by: Gustavo Luiz Duarte <gustavold@linux.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200211033831.11165-1-gustavold@linux.ibm.com
hash_page() needs to read page tables from kernel memory. When entire
kernel memory is mapped by BATs, which is normally the case when
CONFIG_STRICT_KERNEL_RWX is not set, it works even if the page hosting
the page table is not referenced in the MMU hash table.

However, if the page where the page table resides is not covered by
a BAT, a DSI fault can be encountered from hash_page(), and it loops
forever. This can happen when CONFIG_STRICT_KERNEL_RWX is selected
and the alignment of the different regions is too small to allow
covering the entire memory with BATs. This also happens when
CONFIG_DEBUG_PAGEALLOC is selected or when booting with 'nobats'
flag.

Also, if the page containing the kernel stack is not present in the
MMU hash table, registers cannot be saved and a recursive DSI fault
is encountered.

To allow hash_page() to properly do its job at all time and load the
MMU hash table whenever needed, it must run with data MMU disabled.
This means it must be called before re-enabling data MMU. To allow
this, registers clobbered by hash_page() and create_hpte() have to
be saved in the thread struct together with SRR0, SSR1, DAR and DSISR.
It is also necessary to ensure that DSI prolog doesn't overwrite
regs saved by prolog of the current running exception. That means:
- DSI can only use SPRN_SPRG_SCRATCH0
- Exceptions must free SPRN_SPRG_SCRATCH0 before writing to the stack.

This also fixes the Oops reported by Erhard when create_hpte() is
called by add_hash_page().

Due to prolog size increase, a few more exceptions had to get split
in two parts.

Fixes: cd08f10 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
Reported-by: Erhard F. <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Tested-by: Erhard F. <erhard_f@mailbox.org>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206501
Link: https://lore.kernel.org/r/64a4aa44686e9fd4b01333401367029771d9b231.1581761633.git.christophe.leroy@c-s.fr
With CONFIG_VMAP_STACK, data MMU has to be enabled
to read data on the stack.

Fixes: cd08f10 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d2330584f8c42d3039896e2b56f5d39676dc919c.1581669558.git.christophe.leroy@c-s.fr
power_save_ppc32_restore() is called during exception entry, before
re-enabling the MMU. It substracts KERNELBASE from the address
of nap_save_msscr0 to access it.

With CONFIG_VMAP_STACK enabled, data MMU translation has already been
re-enabled, so power_save_ppc32_restore() has to access
nap_save_msscr0 by its virtual address.

Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Fixes: cd08f10 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7bce32ccbab3ba3e3e0f27da6961bf6313df97ed.1581663140.git.christophe.leroy@c-s.fr
The ls (lookup symbol) and zr (reboot) commands use xmon's getstring()
helper to read a string argument from the xmon prompt. This function
skips over leading whitespace, but doesn't check if the first
"non-whitespace" character is a newline which causes some odd
behaviour (<enter> indicates a the enter key was pressed):

  0:mon> ls printk<enter>
  printk: c0000000001680c4

  0:mon> ls<enter>
  printk<enter>
  Symbol '
  printk' not found.
  0:mon>

With commit 2d9b332 ("powerpc/xmon: Allow passing an argument to
ppc_md.restart()") we have a similar problem with the zr command.
Previously zr took no arguments so "zr<enter> would trigger a reboot.
With that patch applied a second newline needs to be sent in order for
the reboot to occur. Fix this by checking if the leading whitespace
ended on a newline:

  0:mon> ls<enter>
  Symbol '' not found.

Fixes: 2d9b332 ("powerpc/xmon: Allow passing an argument to ppc_md.restart()")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200217041343.2454-1-oohall@gmail.com
Commit e0d5896 ("arm64: lse: fix LSE atomics with LLVM's integrated
assembler") broke the build when clang is used in connjunction with the
binutils assembler ("-no-integrated-as"). This happens because
__LSE_PREAMBLE is defined as ".arch armv8-a+lse", which overrides the
version of the CPU architecture passed via the "-march" paramter to gas:

$ aarch64-none-linux-gnu-as -EL -I ./arch/arm64/include
                                -I ./arch/arm64/include/generated
                                -I ./include -I ./include
                                -I ./arch/arm64/include/uapi
                                -I ./arch/arm64/include/generated/uapi
                                -I ./include/uapi -I ./include/generated/uapi
                                -I ./init -I ./init
                                -march=armv8.3-a -o init/do_mounts.o
                                /tmp/do_mounts-d7992a.s
/tmp/do_mounts-d7992a.s: Assembler messages:
/tmp/do_mounts-d7992a.s:1959: Error: selected processor does not support `autiasp'
/tmp/do_mounts-d7992a.s:2021: Error: selected processor does not support `paciasp'
/tmp/do_mounts-d7992a.s:2157: Error: selected processor does not support `autiasp'
/tmp/do_mounts-d7992a.s:2175: Error: selected processor does not support `paciasp'
/tmp/do_mounts-d7992a.s:2494: Error: selected processor does not support `autiasp'

Fix the issue by replacing ".arch armv8-a+lse" with ".arch_extension lse".
Sami confirms that the clang integrated assembler does now support the
'.arch_extension' directive, so this change will be fine even for LTO
builds in future.

Fixes: e0d5896 ("arm64: lse: fix LSE atomics with LLVM's integrated assembler")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reported-by: Amit Kachhap <Amit.Kachhap@arm.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Fixes: 12c3f1f ("powerpc/32s: get rid of CPU_FTR_601 feature")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a99fc0ad65b87a1ba51cfa3e0e9034ee294c3e07.1582034961.git.christophe.leroy@c-s.fr
Add brackets around the evaluation of the 'addr' parameter to the
untagged_addr() macro so that the cast to 'u64' applies to the result
of the expression.

Cc: <stable@vger.kernel.org>
Fixes: 597399d ("arm64: tags: Preserve tags for addresses translated via TTBR1")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will@kernel.org>
Fix trivial spelling error enought to enough in memory.rst.

Cc: trivial@kernel.org
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Will Deacon <will@kernel.org>
Currently the arm64 kernel ignores the top address byte passed to brk(),
mmap() and mremap(). When the user is not aware of the 56-bit address
limit or relies on the kernel to return an error, untagging such
pointers has the potential to create address aliases in user-space.
Passing a tagged address to munmap(), madvise() is permitted since the
tagged pointer is expected to be inside an existing mapping.

The current behaviour breaks the existing glibc malloc() implementation
which relies on brk() with an address beyond 56-bit to be rejected by
the kernel.

Remove untagging in the above functions by partially reverting commit
ce18d17 ("mm: untag user pointers in mmap/munmap/mremap/brk"). In
addition, update the arm64 tagged-address-abi.rst document accordingly.

Link: https://bugzilla.redhat.com/1797052
Fixes: ce18d17 ("mm: untag user pointers in mmap/munmap/mremap/brk")
Cc: <stable@vger.kernel.org> # 5.4.x-
Cc: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Victor Stinner <vstinner@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Variables declared in a switch statement before any case statements
cannot be automatically initialized with compiler instrumentation (as
they are not part of any execution flow). With GCC's proposed automatic
stack variable initialization feature, this triggers a warning (and they
don't get initialized). Clang's automatic stack variable initialization
(via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also
doesn't initialize such variables[1]. Note that these warnings (or silent
skipping) happen before the dead-store elimination optimization phase,
so even when the automatic initializations are later elided in favor of
direct initializations, the warnings remain.

To avoid these problems, move such variables into the "case" where
they're used or lift them up into the main function body.

arch/x86/xen/enlighten_pv.c: In function ‘xen_write_msr_safe’:
arch/x86/xen/enlighten_pv.c:904:12: warning: statement will never be executed [-Wswitch-unreachable]
  904 |   unsigned which;
      |            ^~~~~

[1] https://bugs.llvm.org/show_bug.cgi?id=44916

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200220062318.69299-1-keescook@chromium.org
Reviewed-by: Juergen Gross <jgross@suse.com>
[boris: made @which an 'unsigned int']
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
xen_maybe_preempt_hcall() is called from the exception entry point
xen_do_hypervisor_callback with interrupts disabled.

_cond_resched() evades the might_sleep() check in cond_resched() which
would have caught that and schedule_debug() unfortunately lacks a check
for irqs_disabled().

Enable interrupts around the call and use cond_resched() to catch future
issues.

Fixes: fdfd811 ("x86/xen: allow privcmd hypercalls to be preempted")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/878skypjrh.fsf@nanos.tec.linutronix.de
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
…l/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Some more powerpc fixes for 5.6. This is two weeks worth as I was out
  sick last week:

   - Three fixes for the recently added VMAP_STACK on 32-bit.

   - Three fixes related to hugepages on 8xx (32-bit).

   - A fix for a bug in our transactional memory handling that could
     lead to a kernel crash if we saw a page fault during signal
     delivery.

   - A fix for a deadlock in our PCI EEH (Enhanced Error Handling) code.

   - A couple of other minor fixes.

  Thanks to: Christophe Leroy, Erhard F, Frederic Barrat, Gustavo Luiz
  Duarte, Larry Finger, Leonardo Bras, Oliver O'Halloran, Sam Bobroff"

* tag 'powerpc-5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/entry: Fix an #if which should be an #ifdef in entry_32.S
  powerpc/xmon: Fix whitespace handling in getstring()
  powerpc/6xx: Fix power_save_ppc32_restore() with CONFIG_VMAP_STACK
  powerpc/chrp: Fix enter_rtas() with CONFIG_VMAP_STACK
  powerpc/32s: Fix DSI and ISI exceptions for CONFIG_VMAP_STACK
  powerpc/tm: Fix clearing MSR[TS] in current when reclaiming on signal delivery
  powerpc/8xx: Fix clearing of bits 20-23 in ITLB miss
  powerpc/hugetlb: Fix 8M hugepages on 8xx
  powerpc/hugetlb: Fix 512k hugepages on 8xx with 16k page size
  powerpc/eeh: Fix deadlock handling dead PHB
…git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "It's all straightforward apart from the changes to mmap()/mremap() in
  relation to their handling of address arguments from userspace with
  non-zero tag bits in the upper byte.

  The change to brk() is necessary to fix a nasty user-visible
  regression in malloc(), but we tightened up mmap() and mremap() at the
  same time because they also allow the user to create virtual aliases
  by accident. It's much less likely than brk() to matter in practice,
  but enforcing the principle of "don't permit the creation of mappings
  using tagged addresses" leads to a straightforward ABI without having
  to worry about the "but what if a crazy program did foo?" aspect of
  things.

  Summary:

   - Fix regression in malloc() caused by ignored address tags in brk()

   - Add missing brackets around argument to untagged_addr() macro

   - Fix clang build when using binutils assembler

   - Fix silly typo in virtual memory map documentation"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  mm: Avoid creating virtual address aliases in brk()/mmap()/mremap()
  docs: arm64: fix trivial spelling enought to enough in memory.rst
  arm64: memory: Add missing brackets to untagged_addr() macro
  arm64: lse: Fix LSE atomics with LLVM
…ux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Two small fixes for Xen:

   - a fix to avoid warnings with new gcc

   - a fix for incorrectly disabled interrupts when calling
     _cond_resched()"

* tag 'for-linus-5.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: Enable interrupts when calling _cond_resched()
  x86/xen: Distribute switch variables for initialization
@pull pull bot added the ⤵️ pull label Feb 22, 2020
@pull pull bot merged commit 54dedb5 into antiscam-ico:master Feb 22, 2020
pull bot pushed a commit that referenced this pull request Oct 30, 2020
I got the following lockdep splat with tree locks converted to rwsem
patches on btrfs/104:

  ======================================================
  WARNING: possible circular locking dependency detected
  5.9.0+ #102 Not tainted
  ------------------------------------------------------
  btrfs-cleaner/903 is trying to acquire lock:
  ffff8e7fab6ffe30 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170

  but task is already holding lock:
  ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #3 (&fs_info->commit_root_sem){++++}-{3:3}:
	 down_read+0x40/0x130
	 caching_thread+0x53/0x5a0
	 btrfs_work_helper+0xfa/0x520
	 process_one_work+0x238/0x540
	 worker_thread+0x55/0x3c0
	 kthread+0x13a/0x150
	 ret_from_fork+0x1f/0x30

  -> #2 (&caching_ctl->mutex){+.+.}-{3:3}:
	 __mutex_lock+0x7e/0x7b0
	 btrfs_cache_block_group+0x1e0/0x510
	 find_free_extent+0xb6e/0x12f0
	 btrfs_reserve_extent+0xb3/0x1b0
	 btrfs_alloc_tree_block+0xb1/0x330
	 alloc_tree_block_no_bg_flush+0x4f/0x60
	 __btrfs_cow_block+0x11d/0x580
	 btrfs_cow_block+0x10c/0x220
	 commit_cowonly_roots+0x47/0x2e0
	 btrfs_commit_transaction+0x595/0xbd0
	 sync_filesystem+0x74/0x90
	 generic_shutdown_super+0x22/0x100
	 kill_anon_super+0x14/0x30
	 btrfs_kill_super+0x12/0x20
	 deactivate_locked_super+0x36/0xa0
	 cleanup_mnt+0x12d/0x190
	 task_work_run+0x5c/0xa0
	 exit_to_user_mode_prepare+0x1df/0x200
	 syscall_exit_to_user_mode+0x54/0x280
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #1 (&space_info->groups_sem){++++}-{3:3}:
	 down_read+0x40/0x130
	 find_free_extent+0x2ed/0x12f0
	 btrfs_reserve_extent+0xb3/0x1b0
	 btrfs_alloc_tree_block+0xb1/0x330
	 alloc_tree_block_no_bg_flush+0x4f/0x60
	 __btrfs_cow_block+0x11d/0x580
	 btrfs_cow_block+0x10c/0x220
	 commit_cowonly_roots+0x47/0x2e0
	 btrfs_commit_transaction+0x595/0xbd0
	 sync_filesystem+0x74/0x90
	 generic_shutdown_super+0x22/0x100
	 kill_anon_super+0x14/0x30
	 btrfs_kill_super+0x12/0x20
	 deactivate_locked_super+0x36/0xa0
	 cleanup_mnt+0x12d/0x190
	 task_work_run+0x5c/0xa0
	 exit_to_user_mode_prepare+0x1df/0x200
	 syscall_exit_to_user_mode+0x54/0x280
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #0 (btrfs-root-00){++++}-{3:3}:
	 __lock_acquire+0x1167/0x2150
	 lock_acquire+0xb9/0x3d0
	 down_read_nested+0x43/0x130
	 __btrfs_tree_read_lock+0x32/0x170
	 __btrfs_read_lock_root_node+0x3a/0x50
	 btrfs_search_slot+0x614/0x9d0
	 btrfs_find_root+0x35/0x1b0
	 btrfs_read_tree_root+0x61/0x120
	 btrfs_get_root_ref+0x14b/0x600
	 find_parent_nodes+0x3e6/0x1b30
	 btrfs_find_all_roots_safe+0xb4/0x130
	 btrfs_find_all_roots+0x60/0x80
	 btrfs_qgroup_trace_extent_post+0x27/0x40
	 btrfs_add_delayed_data_ref+0x3fd/0x460
	 btrfs_free_extent+0x42/0x100
	 __btrfs_mod_ref+0x1d7/0x2f0
	 walk_up_proc+0x11c/0x400
	 walk_up_tree+0xf0/0x180
	 btrfs_drop_snapshot+0x1c7/0x780
	 btrfs_clean_one_deleted_snapshot+0xfb/0x110
	 cleaner_kthread+0xd4/0x140
	 kthread+0x13a/0x150
	 ret_from_fork+0x1f/0x30

  other info that might help us debug this:

  Chain exists of:
    btrfs-root-00 --> &caching_ctl->mutex --> &fs_info->commit_root_sem

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(&fs_info->commit_root_sem);
				 lock(&caching_ctl->mutex);
				 lock(&fs_info->commit_root_sem);
    lock(btrfs-root-00);

   *** DEADLOCK ***

  3 locks held by btrfs-cleaner/903:
   #0: ffff8e7fab628838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
   #1: ffff8e7faadac640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
   #2: ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80

  stack backtrace:
  CPU: 0 PID: 903 Comm: btrfs-cleaner Not tainted 5.9.0+ #102
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
  Call Trace:
   dump_stack+0x8b/0xb0
   check_noncircular+0xcf/0xf0
   __lock_acquire+0x1167/0x2150
   ? __bfs+0x42/0x210
   lock_acquire+0xb9/0x3d0
   ? __btrfs_tree_read_lock+0x32/0x170
   down_read_nested+0x43/0x130
   ? __btrfs_tree_read_lock+0x32/0x170
   __btrfs_tree_read_lock+0x32/0x170
   __btrfs_read_lock_root_node+0x3a/0x50
   btrfs_search_slot+0x614/0x9d0
   ? find_held_lock+0x2b/0x80
   btrfs_find_root+0x35/0x1b0
   ? do_raw_spin_unlock+0x4b/0xa0
   btrfs_read_tree_root+0x61/0x120
   btrfs_get_root_ref+0x14b/0x600
   find_parent_nodes+0x3e6/0x1b30
   btrfs_find_all_roots_safe+0xb4/0x130
   btrfs_find_all_roots+0x60/0x80
   btrfs_qgroup_trace_extent_post+0x27/0x40
   btrfs_add_delayed_data_ref+0x3fd/0x460
   btrfs_free_extent+0x42/0x100
   __btrfs_mod_ref+0x1d7/0x2f0
   walk_up_proc+0x11c/0x400
   walk_up_tree+0xf0/0x180
   btrfs_drop_snapshot+0x1c7/0x780
   ? btrfs_clean_one_deleted_snapshot+0x73/0x110
   btrfs_clean_one_deleted_snapshot+0xfb/0x110
   cleaner_kthread+0xd4/0x140
   ? btrfs_alloc_root+0x50/0x50
   kthread+0x13a/0x150
   ? kthread_create_worker_on_cpu+0x40/0x40
   ret_from_fork+0x1f/0x30
  BTRFS info (device sdb): disk space caching is enabled
  BTRFS info (device sdb): has skinny extents

This happens because qgroups does a backref lookup when we create a
delayed ref.  From here it may have to look up a root from an indirect
ref, which does a normal lookup on the tree_root, which takes the read
lock on the tree_root nodes.

To fix this we need to add a variant for looking up roots that searches
the commit root of the tree_root.  Then when we do the backref search
using the commit root we are sure to not take any locks on the tree_root
nodes.  This gets rid of the lockdep splat when running btrfs/104.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
pull bot pushed a commit that referenced this pull request Jun 28, 2021
Before referencing 'host->data', the driver needs to check whether it is
null pointer, otherwise it will cause a null pointer reference.

This log reveals it:

[   29.355199] BUG: kernel NULL pointer dereference, address:
0000000000000014
[   29.357323] #PF: supervisor write access in kernel mode
[   29.357706] #PF: error_code(0x0002) - not-present page
[   29.358088] PGD 0 P4D 0
[   29.358280] Oops: 0002 [#1] PREEMPT SMP PTI
[   29.358595] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.12.4-
g70e7f0549188-dirty #102
[   29.359164] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[   29.359978] RIP: 0010:via_sdc_isr+0x21f/0x410
[   29.360314] Code: ff ff e8 84 aa d0 fd 66 45 89 7e 28 66 41 f7 c4 00
10 75 56 e8 72 aa d0 fd 66 41 f7 c4 00 c0 74 10 e8 65 aa d0 fd 48 8b 43
18 <c7> 40 14 ac ff ff ff e8 55 aa d0 fd 48 89 df e8 ad fb ff ff e9 77
[   29.361661] RSP: 0018:ffffc90000118e98 EFLAGS: 00010046
[   29.362042] RAX: 0000000000000000 RBX: ffff888107d77880
RCX: 0000000000000000
[   29.362564] RDX: 0000000000000000 RSI: ffffffff835d20bb
RDI: 00000000ffffffff
[   29.363085] RBP: ffffc90000118ed8 R08: 0000000000000001
R09: 0000000000000001
[   29.363604] R10: 0000000000000000 R11: 0000000000000001
R12: 0000000000008600
[   29.364128] R13: ffff888107d779c8 R14: ffffc90009c00200
R15: 0000000000008000
[   29.364651] FS:  0000000000000000(0000) GS:ffff88817bc80000(0000)
knlGS:0000000000000000
[   29.365235] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   29.365655] CR2: 0000000000000014 CR3: 0000000005a2e000
CR4: 00000000000006e0
[   29.366170] DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
[   29.366683] DR3: 0000000000000000 DR6: 00000000fffe0ff0
DR7: 0000000000000400
[   29.367197] Call Trace:
[   29.367381]  <IRQ>
[   29.367537]  __handle_irq_event_percpu+0x53/0x3e0
[   29.367916]  handle_irq_event_percpu+0x35/0x90
[   29.368247]  handle_irq_event+0x39/0x60
[   29.368632]  handle_fasteoi_irq+0xc2/0x1d0
[   29.368950]  __common_interrupt+0x7f/0x150
[   29.369254]  common_interrupt+0xb4/0xd0
[   29.369547]  </IRQ>
[   29.369708]  asm_common_interrupt+0x1e/0x40
[   29.370016] RIP: 0010:native_safe_halt+0x17/0x20
[   29.370360] Code: 07 0f 00 2d db 80 43 00 f4 5d c3 0f 1f 84 00 00 00
00 00 8b 05 c2 37 e5 01 55 48 89 e5 85 c0 7e 07 0f 00 2d bb 80 43 00 fb
f4 <5d> c3 cc cc cc cc cc cc cc 55 48 89 e5 e8 67 53 ff ff 8b 0d f9 91
[   29.371696] RSP: 0018:ffffc9000008fe90 EFLAGS: 00000246
[   29.372079] RAX: 0000000000000000 RBX: 0000000000000002
RCX: 0000000000000000
[   29.372595] RDX: 0000000000000000 RSI: ffffffff854f67a4
RDI: ffffffff85403406
[   29.373122] RBP: ffffc9000008fe90 R08: 0000000000000001
R09: 0000000000000001
[   29.373646] R10: 0000000000000000 R11: 0000000000000001
R12: ffffffff86009188
[   29.374160] R13: 0000000000000000 R14: 0000000000000000
R15: ffff888100258000
[   29.374690]  default_idle+0x9/0x10
[   29.374944]  arch_cpu_idle+0xa/0x10
[   29.375198]  default_idle_call+0x6e/0x250
[   29.375491]  do_idle+0x1f0/0x2d0
[   29.375740]  cpu_startup_entry+0x18/0x20
[   29.376034]  start_secondary+0x11f/0x160
[   29.376328]  secondary_startup_64_no_verify+0xb0/0xbb
[   29.376705] Modules linked in:
[   29.376939] Dumping ftrace buffer:
[   29.377187]    (ftrace buffer empty)
[   29.377460] CR2: 0000000000000014
[   29.377712] ---[ end trace 51a473dffb618c47 ]---
[   29.378056] RIP: 0010:via_sdc_isr+0x21f/0x410
[   29.378380] Code: ff ff e8 84 aa d0 fd 66 45 89 7e 28 66 41 f7 c4 00
10 75 56 e8 72 aa d0 fd 66 41 f7 c4 00 c0 74 10 e8 65 aa d0 fd 48 8b 43
18 <c7> 40 14 ac ff ff ff e8 55 aa d0 fd 48 89 df e8 ad fb ff ff e9 77
[   29.379714] RSP: 0018:ffffc90000118e98 EFLAGS: 00010046
[   29.380098] RAX: 0000000000000000 RBX: ffff888107d77880
RCX: 0000000000000000
[   29.380614] RDX: 0000000000000000 RSI: ffffffff835d20bb
RDI: 00000000ffffffff
[   29.381134] RBP: ffffc90000118ed8 R08: 0000000000000001
R09: 0000000000000001
[   29.381653] R10: 0000000000000000 R11: 0000000000000001
R12: 0000000000008600
[   29.382176] R13: ffff888107d779c8 R14: ffffc90009c00200
R15: 0000000000008000
[   29.382697] FS:  0000000000000000(0000) GS:ffff88817bc80000(0000)
knlGS:0000000000000000
[   29.383277] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   29.383697] CR2: 0000000000000014 CR3: 0000000005a2e000
CR4: 00000000000006e0
[   29.384223] DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
[   29.384736] DR3: 0000000000000000 DR6: 00000000fffe0ff0
DR7: 0000000000000400
[   29.385260] Kernel panic - not syncing: Fatal exception in interrupt
[   29.385882] Dumping ftrace buffer:
[   29.386135]    (ftrace buffer empty)
[   29.386401] Kernel Offset: disabled
[   29.386656] Rebooting in 1 seconds..

Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Link: https://lore.kernel.org/r/1622727200-15808-1-git-send-email-zheyuma97@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
pull bot pushed a commit that referenced this pull request Jul 3, 2022
Commit a7259df ("memblock: make memblock_find_in_range method
private") changed the API using which memory is reserved for the pKVM
hypervisor. However, memblock_phys_alloc() differs from the original API in
terms of kmemleak semantics -- the old one didn't report the reserved
regions to kmemleak while the new one does. Unfortunately, when protected
KVM is enabled, all kernel accesses to pKVM-private memory result in a
fatal exception, which can now happen because of kmemleak scans:

$ echo scan > /sys/kernel/debug/kmemleak
[   34.991354] kvm [304]: nVHE hyp BUG at: [<ffff800008fa3750>] __kvm_nvhe_handle_host_mem_abort+0x270/0x290!
[   34.991580] kvm [304]: Hyp Offset: 0xfffe8be807e00000
[   34.991813] Kernel panic - not syncing: HYP panic:
[   34.991813] PS:600003c9 PC:0000f418011a3750 ESR:00000000f2000800
[   34.991813] FAR:ffff000439200000 HPFAR:0000000004792000 PAR:0000000000000000
[   34.991813] VCPU:0000000000000000
[   34.993660] CPU: 0 PID: 304 Comm: bash Not tainted 5.19.0-rc2 #102
[   34.994059] Hardware name: linux,dummy-virt (DT)
[   34.994452] Call trace:
[   34.994641]  dump_backtrace.part.0+0xcc/0xe0
[   34.994932]  show_stack+0x18/0x6c
[   34.995094]  dump_stack_lvl+0x68/0x84
[   34.995276]  dump_stack+0x18/0x34
[   34.995484]  panic+0x16c/0x354
[   34.995673]  __hyp_pgtable_total_pages+0x0/0x60
[   34.995933]  scan_block+0x74/0x12c
[   34.996129]  scan_gray_list+0xd8/0x19c
[   34.996332]  kmemleak_scan+0x2c8/0x580
[   34.996535]  kmemleak_write+0x340/0x4a0
[   34.996744]  full_proxy_write+0x60/0xbc
[   34.996967]  vfs_write+0xc4/0x2b0
[   34.997136]  ksys_write+0x68/0xf4
[   34.997311]  __arm64_sys_write+0x20/0x2c
[   34.997532]  invoke_syscall+0x48/0x114
[   34.997779]  el0_svc_common.constprop.0+0x44/0xec
[   34.998029]  do_el0_svc+0x2c/0xc0
[   34.998205]  el0_svc+0x2c/0x84
[   34.998421]  el0t_64_sync_handler+0xf4/0x100
[   34.998653]  el0t_64_sync+0x18c/0x190
[   34.999252] SMP: stopping secondary CPUs
[   35.000034] Kernel Offset: disabled
[   35.000261] CPU features: 0x800,00007831,00001086
[   35.000642] Memory Limit: none
[   35.001329] ---[ end Kernel panic - not syncing: HYP panic:
[   35.001329] PS:600003c9 PC:0000f418011a3750 ESR:00000000f2000800
[   35.001329] FAR:ffff000439200000 HPFAR:0000000004792000 PAR:0000000000000000
[   35.001329] VCPU:0000000000000000 ]---

Fix this by explicitly excluding the hypervisor's memory pool from
kmemleak like we already do for the hyp BSS.

Cc: Mike Rapoport <rppt@kernel.org>
Fixes: a7259df ("memblock: make memblock_find_in_range method private")
Signed-off-by: Quentin Perret <qperret@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220616161135.3997786-1-qperret@google.com
pull bot pushed a commit that referenced this pull request Feb 20, 2025
After this patch:

 #102/1   flow_dissector_classification/ipv4:OK
 #102/2   flow_dissector_classification/ipv4_continue_dissect:OK
 #102/3   flow_dissector_classification/ipip:OK
 #102/4   flow_dissector_classification/gre:OK
 #102/5   flow_dissector_classification/port_range:OK
 #102/6   flow_dissector_classification/ipv6:OK
 #102     flow_dissector_classification:OK
 Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://patch.msgid.link/20250218043210.732959-5-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
pull bot pushed a commit that referenced this pull request Mar 27, 2025
When MPOA_cache_impos_rcvd() receives the msg, it can trigger
Null Pointer Dereference Vulnerability if both entry and
holding_time are NULL. Because there is only for the situation
where entry is NULL and holding_time exists, it can be passed
when both entry and holding_time are NULL. If these are NULL,
the entry will be passd to eg_cache_put() as parameter and
it is referenced by entry->use code in it.

kasan log:

[    3.316691] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006:I
[    3.317568] KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
[    3.318188] CPU: 3 UID: 0 PID: 79 Comm: ex Not tainted 6.14.0-rc2 #102
[    3.318601] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[    3.319298] RIP: 0010:eg_cache_remove_entry+0xa5/0x470
[    3.319677] Code: c1 f7 6e fd 48 c7 c7 00 7e 38 b2 e8 95 64 54 fd 48 c7 c7 40 7e 38 b2 48 89 ee e80
[    3.321220] RSP: 0018:ffff88800583f8a8 EFLAGS: 00010006
[    3.321596] RAX: 0000000000000006 RBX: ffff888005989000 RCX: ffffffffaecc2d8e
[    3.322112] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000030
[    3.322643] RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6558b88
[    3.323181] R10: 0000000000000003 R11: 203a207972746e65 R12: 1ffff11000b07f15
[    3.323707] R13: dffffc0000000000 R14: ffff888005989000 R15: ffff888005989068
[    3.324185] FS:  000000001b6313c0(0000) GS:ffff88806d380000(0000) knlGS:0000000000000000
[    3.325042] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.325545] CR2: 00000000004b4b40 CR3: 000000000248e000 CR4: 00000000000006f0
[    3.326430] Call Trace:
[    3.326725]  <TASK>
[    3.326927]  ? die_addr+0x3c/0xa0
[    3.327330]  ? exc_general_protection+0x161/0x2a0
[    3.327662]  ? asm_exc_general_protection+0x26/0x30
[    3.328214]  ? vprintk_emit+0x15e/0x420
[    3.328543]  ? eg_cache_remove_entry+0xa5/0x470
[    3.328910]  ? eg_cache_remove_entry+0x9a/0x470
[    3.329294]  ? __pfx_eg_cache_remove_entry+0x10/0x10
[    3.329664]  ? console_unlock+0x107/0x1d0
[    3.329946]  ? __pfx_console_unlock+0x10/0x10
[    3.330283]  ? do_syscall_64+0xa6/0x1a0
[    3.330584]  ? entry_SYSCALL_64_after_hwframe+0x47/0x7f
[    3.331090]  ? __pfx_prb_read_valid+0x10/0x10
[    3.331395]  ? down_trylock+0x52/0x80
[    3.331703]  ? vprintk_emit+0x15e/0x420
[    3.331986]  ? __pfx_vprintk_emit+0x10/0x10
[    3.332279]  ? down_trylock+0x52/0x80
[    3.332527]  ? _printk+0xbf/0x100
[    3.332762]  ? __pfx__printk+0x10/0x10
[    3.333007]  ? _raw_write_lock_irq+0x81/0xe0
[    3.333284]  ? __pfx__raw_write_lock_irq+0x10/0x10
[    3.333614]  msg_from_mpoad+0x1185/0x2750
[    3.333893]  ? __build_skb_around+0x27b/0x3a0
[    3.334183]  ? __pfx_msg_from_mpoad+0x10/0x10
[    3.334501]  ? __alloc_skb+0x1c0/0x310
[    3.334809]  ? __pfx___alloc_skb+0x10/0x10
[    3.335283]  ? _raw_spin_lock+0xe0/0xe0
[    3.335632]  ? finish_wait+0x8d/0x1e0
[    3.335975]  vcc_sendmsg+0x684/0xba0
[    3.336250]  ? __pfx_vcc_sendmsg+0x10/0x10
[    3.336587]  ? __pfx_autoremove_wake_function+0x10/0x10
[    3.337056]  ? fdget+0x176/0x3e0
[    3.337348]  __sys_sendto+0x4a2/0x510
[    3.337663]  ? __pfx___sys_sendto+0x10/0x10
[    3.337969]  ? ioctl_has_perm.constprop.0.isra.0+0x284/0x400
[    3.338364]  ? sock_ioctl+0x1bb/0x5a0
[    3.338653]  ? __rseq_handle_notify_resume+0x825/0xd20
[    3.339017]  ? __pfx_sock_ioctl+0x10/0x10
[    3.339316]  ? __pfx___rseq_handle_notify_resume+0x10/0x10
[    3.339727]  ? selinux_file_ioctl+0xa4/0x260
[    3.340166]  __x64_sys_sendto+0xe0/0x1c0
[    3.340526]  ? syscall_exit_to_user_mode+0x123/0x140
[    3.340898]  do_syscall_64+0xa6/0x1a0
[    3.341170]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[    3.341533] RIP: 0033:0x44a380
[    3.341757] Code: 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c00
[    3.343078] RSP: 002b:00007ffc1d404098 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[    3.343631] RAX: ffffffffffffffda RBX: 00007ffc1d404458 RCX: 000000000044a380
[    3.344306] RDX: 000000000000019c RSI: 00007ffc1d4040b0 RDI: 0000000000000003
[    3.344833] RBP: 00007ffc1d404260 R08: 0000000000000000 R09: 0000000000000000
[    3.345381] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[    3.346015] R13: 00007ffc1d404448 R14: 00000000004c17d0 R15: 0000000000000001
[    3.346503]  </TASK>
[    3.346679] Modules linked in:
[    3.346956] ---[ end trace 0000000000000000 ]---
[    3.347315] RIP: 0010:eg_cache_remove_entry+0xa5/0x470
[    3.347737] Code: c1 f7 6e fd 48 c7 c7 00 7e 38 b2 e8 95 64 54 fd 48 c7 c7 40 7e 38 b2 48 89 ee e80
[    3.349157] RSP: 0018:ffff88800583f8a8 EFLAGS: 00010006
[    3.349517] RAX: 0000000000000006 RBX: ffff888005989000 RCX: ffffffffaecc2d8e
[    3.350103] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000030
[    3.350610] RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6558b88
[    3.351246] R10: 0000000000000003 R11: 203a207972746e65 R12: 1ffff11000b07f15
[    3.351785] R13: dffffc0000000000 R14: ffff888005989000 R15: ffff888005989068
[    3.352404] FS:  000000001b6313c0(0000) GS:ffff88806d380000(0000) knlGS:0000000000000000
[    3.353099] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.353544] CR2: 00000000004b4b40 CR3: 000000000248e000 CR4: 00000000000006f0
[    3.354072] note: ex[79] exited with irqs disabled
[    3.354458] note: ex[79] exited with preempt_count 1

Signed-off-by: Minjoong Kim <pwn9uin@gmail.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250322105200.14981-1-pwn9uin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
pull bot pushed a commit that referenced this pull request Aug 21, 2025
Function kvm_io_bus_read() is called in function send_ipi_data(), buffer
size of parameter *val should be at least 8 bytes. Since some emulation
functions like loongarch_ipi_readl() and kvm_eiointc_read() will write
the buffer *val with 8 bytes signed extension regardless parameter len.

Otherwise there will be buffer overflow issue when CONFIG_STACKPROTECTOR
is enabled. The bug report is shown as follows:

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: send_ipi_data+0x194/0x1a0 [kvm]
CPU: 11 UID: 107 PID: 2692 Comm: CPU 0/KVM Not tainted 6.17.0-rc1+ #102 PREEMPT(full)
Stack : 9000000005901568 0000000000000000 9000000003af371c 900000013c68c000
        900000013c68f850 900000013c68f858 0000000000000000 900000013c68f998
        900000013c68f990 900000013c68f990 900000013c68f6c0 fffffffffffdb058
        fffffffffffdb0e0 900000013c68f858 911e1d4d39cf0ec2 9000000105657a00
        0000000000000001 fffffffffffffffe 0000000000000578 282049464555206e
        6f73676e6f6f4c20 0000000000000001 00000000086b4000 0000000000000000
        0000000000000000 0000000000000000 9000000005709968 90000000058f9000
        900000013c68fa68 900000013c68fab4 90000000029279f0 900000010153f940
        900000010001f360 0000000000000000 9000000003af3734 000000004390000c
        00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d
        ...
Call Trace:
[<9000000003af3734>] show_stack+0x5c/0x180
[<9000000003aed168>] dump_stack_lvl+0x6c/0x9c
[<9000000003ad0ab0>] vpanic+0x108/0x2c4
[<9000000003ad0ca8>] panic+0x3c/0x40
[<9000000004eb0a1c>] __stack_chk_fail+0x14/0x18
[<ffff8000023473f8>] send_ipi_data+0x190/0x1a0 [kvm]
[<ffff8000023313e4>] __kvm_io_bus_write+0xa4/0xe8 [kvm]
[<ffff80000233147c>] kvm_io_bus_write+0x54/0x90 [kvm]
[<ffff80000233f9f8>] kvm_emu_iocsr+0x180/0x310 [kvm]
[<ffff80000233fe08>] kvm_handle_gspr+0x280/0x478 [kvm]
[<ffff8000023443e8>] kvm_handle_exit+0xc0/0x130 [kvm]

Cc: stable@vger.kernel.org
Fixes: daee2f9 ("LoongArch: KVM: Add IPI read and write function")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants