Skip to content

Conversation

PlaidCat
Copy link
Collaborator

@PlaidCat PlaidCat commented Sep 4, 2025

General Process:

Checking Rebuild Commits for Potentially missing commits:

kernel-4.18.0-553.72.1.el8_10

[jmaple@devbox kernel-src-tree]$ cat ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..kernel-mainline: 567023
Number of commits in rpm: 61
Number of commits matched with upstream: 54 (88.52%)
Number of commits in upstream but not in rpm: 566969
Number of commits NOT found in upstream: 7 (11.48%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.72.1.el8_10 for kernel-4.18.0-553.72.1.el8_10
Clean Cherry Picks: 35 (64.81%)
Empty Cherry Picks: 19 (35.19%)
_______________________________

__EMPTY COMMITS__________________________
1d6123102e9fbedc8d25bf4731da6d513173e49e Bluetooth: hci_core: Fix use-after-free in vhci_flush()
f80d882edcf242d0256d9e51b09d5fb7a3a0d3b4 gfs2: Get rid of gfs2_glock_queue_put in signal_our_withdraw
deb016c1669002e48c431d6fd32ea1c20ef41756 gfs2: No more self recovery
ae9f3bd8259a0a8f67be2420e66bb05fbb95af48 gfs2: replace sd_aspace with sd_inode
df207de9d9e7a4d92f8567e2c539d9c8c12fd99d udp: Fix memory accounting leak.
180dccb0dba4f5e84a4a70c1be1d34cbb6528b32 blk-mq: fix tag_get wait task can't be awakened
10825410b956dc1ed8c5fbc8bbedaffdadde7f20 blk-mq: Fix wrong wakeup batch configuration which will cause hang
4f8126bb2308066b877859e4b5923ffb54143630 sbitmap: Use single per-bitmap counting to wake up queued tags
976570b4ecd30d3ec6e1b0910da8e5edc591f2b6 sbitmap: Advance the queue index before waking up a queue
26edb30dd1c0c9be11fa676b4f330ada7b794ba6 sbitmap: Try each queue to wake up at least one waiter
b5fcf7871acb7f9a3a8ed341a68bd86aba3e254a sbitmap: correct wake_batch recalculation to avoid potential IO hung
106397376c0369fcc01c58dd189ff925a2724a57 sbitmap: fix batching wakeup
72d04bdcf3f7d7e07d82f9757946f68802a7270a sbitmap: fix io hung due to race on sbitmap_word::cleared
65f666c6203600053478ce8e34a1db269a8701c9 lib/sbitmap: define swap_lock as raw_spinlock_t
4f1731df60f9033669f024d06ae26a6301260b55 blk-mq: fix potential io hang by wrong 'wake_batch'
b313a8c835516bdda85025500be866ac8a74e022 block: Fix lockdep warning in blk_mq_mark_tag_wait
7dde67f2a23ac11962b17128d9a71698430ced95 efi/x86: Support builtin command line
055042bedd4e0af8186266b189b81e73e708b82b efi/libstub: Check return value of efi_parse_options
15aa8fb852f995dd234a57f12dfb989044968bb6 x86/efistub: Omit physical KASLR when memory reservations exist

__CHANGES NOT IN UPSTREAM________________
Adding prod certs and changed cert date to 20210620
Adding Rocky secure boot certs
Fixing vmlinuz removal
Fixing UEFI CA path
Porting to 8.10, debranding and Rocky branding
Fixing pesign_key_name values
aacraid: fix a buffer overflow

Build

[jmaple@devbox code]$ egrep -B 5 -A 5 "\[TIMER\]|^Starting Build" $(ls -t kbuild* | head -n1)
/mnt/code/kernel-src-tree
Running make mrproper...
  CLEAN   scripts/basic
  CLEAN   scripts/kconfig
[TIMER]{MRPROPER}: 5s
x86_64 architecture detected, copying config
'configs/kernel-x86_64.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-rocky8_10_rebuild-06a1b8b22bc2"
Making olddefconfig
--
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf  --olddefconfig Kconfig
#
# configuration written to .config
#
Starting Build
scripts/kconfig/conf  --syncconfig Kconfig
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_64_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_64.h
--
  LD [M]  sound/usb/usx2y/snd-usb-usx2y.ko
  LD [M]  sound/virtio/virtio_snd.ko
  LD [M]  sound/x86/snd-hdmi-lpe-audio.ko
  LD [M]  sound/xen/snd_xen_front.ko
  LD [M]  virt/lib/irqbypass.ko
[TIMER]{BUILD}: 1537s
Making Modules
  INSTALL arch/x86/crypto/blowfish-x86_64.ko
  INSTALL arch/x86/crypto/camellia-aesni-avx-x86_64.ko
  INSTALL arch/x86/crypto/camellia-aesni-avx2.ko
  INSTALL arch/x86/crypto/camellia-x86_64.ko
--
  INSTALL sound/x86/snd-hdmi-lpe-audio.ko
  INSTALL sound/virtio/virtio_snd.ko
  INSTALL sound/xen/snd_xen_front.ko
  INSTALL virt/lib/irqbypass.ko
  DEPMOD  4.18.0-rocky8_10_rebuild-06a1b8b22bc2+
[TIMER]{MODULES}: 12s
Making Install
sh ./arch/x86/boot/install.sh 4.18.0-rocky8_10_rebuild-06a1b8b22bc2+ arch/x86/boot/bzImage \
        System.map "/boot"
[TIMER]{INSTALL}: 17s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-4.18.0-rocky8_10_rebuild-f349c3d9c12c+ and Index to 2
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 5s
[TIMER]{BUILD}: 1537s
[TIMER]{MODULES}: 12s
[TIMER]{INSTALL}: 17s
[TIMER]{TOTAL} 1575s
Rebooting in 10 seconds

KselfTest

[jmaple@devbox code]$ ls -rt kselftest.* | tail -n4 | while read line; do echo $line; grep '^ok ' $line | wc -l ; done
kselftest.4.18.0-rocky8_10_rebuild-e252413ceae1+.log
206
kselftest.4.18.0-rocky8_10_rebuild-fe693a356d6f+.log
206
kselftest.4.18.0-rocky8_10_rebuild-f349c3d9c12c+.log
206
kselftest.4.18.0-rocky8_10_rebuild-06a1b8b22bc2+.log
206

jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Niklas Schnelle <schnelle@linux.ibm.com>
commit 4553792

The error event information for PCI error events contains a function
handle for the respective function. This handle is generally captured at
the time the error event was recorded. Due to delays in processing or
cascading issues, it may happen that during firmware recovery multiple
events are generated. When processing these events in order Linux may
already have recovered an affected function making the event information
stale. Fix this by doing an unconditional CLP List PCI function
retrieving the current function handle with the zdev->state_lock held
and ignoring the event if its function handle is stale.

	Cc: stable@vger.kernel.org
Fixes: 4cdf2f4 ("s390/pci: implement minimal PCI error recovery")
	Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
	Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
	Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
	Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
	Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
(cherry picked from commit 4553792)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Niklas Schnelle <schnelle@linux.ibm.com>
commit b97a797

If a device is disabled unblocking load/store on its own is not useful
as a full re-enable of the function is necessary anyway. Note that SCLP
Write Event Data Action Qualifier 0 (Reset) leaves the device disabled
and triggers this case unless the driver already requests a reset.

	Cc: stable@vger.kernel.org
Fixes: 4cdf2f4 ("s390/pci: implement minimal PCI error recovery")
	Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
	Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
	Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
(cherry picked from commit b97a797)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
cve CVE-2025-22097
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author José Expósito <jose.exposito89@gmail.com>
commit ed15511

If the driver initialization fails, the vkms_exit() function might
access an uninitialized or freed default_config pointer and it might
double free it.

Fix both possible errors by initializing default_config only when the
driver initialization succeeded.

	Reported-by: Louis Chauvet <louis.chauvet@bootlin.com>
Closes: https://lore.kernel.org/all/Z5uDHcCmAwiTsGte@louis-chauvet-laptop/
Fixes: 2df7af9 ("drm/vkms: Add vkms_config type")
	Signed-off-by: José Expósito <jose.exposito89@gmail.com>
	Reviewed-by: Thomas Zimmermann <tzimmremann@suse.de>
	Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250212084912.3196-1-jose.exposito89@gmail.com
	Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
(cherry picked from commit ed15511)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Cong Wang <xiyou.wangcong@gmail.com>
commit a7a15f3

est_qlen_notify() deletes its class from its active list with
list_del() when qlen is 0, therefore, it is not idempotent and
not friendly to its callers, like fq_codel_dequeue().

Let's make it idempotent to ease qdisc_tree_reduce_backlog() callers'
life. Also change other list_del()'s to list_del_init() just to be
extra safe.

	Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
	Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://patch.msgid.link/20250403211033.166059-6-xiyou.wangcong@gmail.com
	Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
	Signed-off-by: Paolo Abeni <pabeni@redhat.com>
(cherry picked from commit a7a15f3)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
cve CVE-2025-37914
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Victor Nogueira <victor@mojatatu.com>
commit 1a6d0c0

As described in Gerrard's report [1], there are use cases where a netem
child qdisc will make the parent qdisc's enqueue callback reentrant.
In the case of ets, there won't be a UAF, but the code will add the same
classifier to the list twice, which will cause memory corruption.

In addition to checking for qlen being zero, this patch checks whether
the class was already added to the active_list (cl_is_active) before
doing the addition to cater for the reentrant case.

[1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/

Fixes: 37d9cf1 ("sched: Fix detection of empty queues in child qdiscs")
	Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
	Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250425220710.3964791-4-victor@mojatatu.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 1a6d0c0)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
cve CVE-2025-38250
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Kuniyuki Iwashima <kuniyu@google.com>
commit 1d61231
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/1d612310.failed

syzbot reported use-after-free in vhci_flush() without repro. [0]

From the splat, a thread close()d a vhci file descriptor while
its device was being used by iotcl() on another thread.

Once the last fd refcnt is released, vhci_release() calls
hci_unregister_dev(), hci_free_dev(), and kfree() for struct
vhci_data, which is set to hci_dev->dev->driver_data.

The problem is that there is no synchronisation after unlinking
hdev from hci_dev_list in hci_unregister_dev().  There might be
another thread still accessing the hdev which was fetched before
the unlink operation.

We can use SRCU for such synchronisation.

Let's run hci_dev_reset() under SRCU and wait for its completion
in hci_unregister_dev().

Another option would be to restore hci_dev->destruct(), which was
removed in commit 587ae08 ("Bluetooth: Remove unused
hci-destruct cb").  However, this would not be a good solution, as
we should not run hci_unregister_dev() while there are in-flight
ioctl() requests, which could lead to another data-race KCSAN splat.

Note that other drivers seem to have the same problem, for exmaple,
virtbt_remove().

[0]:
BUG: KASAN: slab-use-after-free in skb_queue_empty_lockless include/linux/skbuff.h:1891 [inline]
BUG: KASAN: slab-use-after-free in skb_queue_purge_reason+0x99/0x360 net/core/skbuff.c:3937
Read of size 8 at addr ffff88807cb8d858 by task syz.1.219/6718

CPU: 1 UID: 0 PID: 6718 Comm: syz.1.219 Not tainted 6.16.0-rc1-syzkaller-00196-g08207f42d3ff #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
Call Trace:
 <TASK>
 dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:408 [inline]
 print_report+0xd2/0x2b0 mm/kasan/report.c:521
 kasan_report+0x118/0x150 mm/kasan/report.c:634
 skb_queue_empty_lockless include/linux/skbuff.h:1891 [inline]
 skb_queue_purge_reason+0x99/0x360 net/core/skbuff.c:3937
 skb_queue_purge include/linux/skbuff.h:3368 [inline]
 vhci_flush+0x44/0x50 drivers/bluetooth/hci_vhci.c:69
 hci_dev_do_reset net/bluetooth/hci_core.c:552 [inline]
 hci_dev_reset+0x420/0x5c0 net/bluetooth/hci_core.c:592
 sock_do_ioctl+0xd9/0x300 net/socket.c:1190
 sock_ioctl+0x576/0x790 net/socket.c:1311
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:907 [inline]
 __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fcf5b98e929
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fcf5c7b9038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fcf5bbb6160 RCX: 00007fcf5b98e929
RDX: 0000000000000000 RSI: 00000000400448cb RDI: 0000000000000009
RBP: 00007fcf5ba10b39 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007fcf5bbb6160 R15: 00007ffd6353d528
 </TASK>

Allocated by task 6535:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:68
 poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:394
 kasan_kmalloc include/linux/kasan.h:260 [inline]
 __kmalloc_cache_noprof+0x230/0x3d0 mm/slub.c:4359
 kmalloc_noprof include/linux/slab.h:905 [inline]
 kzalloc_noprof include/linux/slab.h:1039 [inline]
 vhci_open+0x57/0x360 drivers/bluetooth/hci_vhci.c:635
 misc_open+0x2bc/0x330 drivers/char/misc.c:161
 chrdev_open+0x4c9/0x5e0 fs/char_dev.c:414
 do_dentry_open+0xdf0/0x1970 fs/open.c:964
 vfs_open+0x3b/0x340 fs/open.c:1094
 do_open fs/namei.c:3887 [inline]
 path_openat+0x2ee5/0x3830 fs/namei.c:4046
 do_filp_open+0x1fa/0x410 fs/namei.c:4073
 do_sys_openat2+0x121/0x1c0 fs/open.c:1437
 do_sys_open fs/open.c:1452 [inline]
 __do_sys_openat fs/open.c:1468 [inline]
 __se_sys_openat fs/open.c:1463 [inline]
 __x64_sys_openat+0x138/0x170 fs/open.c:1463
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 6535:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:68
 kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:576
 poison_slab_object mm/kasan/common.c:247 [inline]
 __kasan_slab_free+0x62/0x70 mm/kasan/common.c:264
 kasan_slab_free include/linux/kasan.h:233 [inline]
 slab_free_hook mm/slub.c:2381 [inline]
 slab_free mm/slub.c:4643 [inline]
 kfree+0x18e/0x440 mm/slub.c:4842
 vhci_release+0xbc/0xd0 drivers/bluetooth/hci_vhci.c:671
 __fput+0x44c/0xa70 fs/file_table.c:465
 task_work_run+0x1d1/0x260 kernel/task_work.c:227
 exit_task_work include/linux/task_work.h:40 [inline]
 do_exit+0x6ad/0x22e0 kernel/exit.c:955
 do_group_exit+0x21c/0x2d0 kernel/exit.c:1104
 __do_sys_exit_group kernel/exit.c:1115 [inline]
 __se_sys_exit_group kernel/exit.c:1113 [inline]
 __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1113
 x64_sys_call+0x21ba/0x21c0 arch/x86/include/generated/asm/syscalls_64.h:232
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff88807cb8d800
 which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 88 bytes inside of
 freed 1024-byte region [ffff88807cb8d800, ffff88807cb8dc00)

Fixes: bf18c71 ("Bluetooth: vhci: Free driver_data on file release")
	Reported-by: syzbot+2faa4825e556199361f9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f62d64848fc4c7c30cd6
	Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
	Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
	Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
(cherry picked from commit 1d61231)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	net/bluetooth/hci_core.c
jira LE-4066
cve CVE-2025-38380
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Michael J. Ruhl <michael.j.ruhl@intel.com>
commit 3d30048

The i2c_dw_xfer_init() function requires msgs and msg_write_idx from the
dev context to be initialized.

amd_i2c_dw_xfer_quirk() inits msgs and msgs_num, but not msg_write_idx.

This could allow an out of bounds access (of msgs).

Initialize msg_write_idx before calling i2c_dw_xfer_init().

	Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Fixes: 17631e8 ("i2c: designware: Add driver support for AMD NAVI GPU")
	Cc: <stable@vger.kernel.org> # v5.13+
	Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
	Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20250627143511.489570-1-michael.j.ruhl@intel.com
(cherry picked from commit 3d30048)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Andreas Gruenbacher <agruenba@redhat.com>
commit e3da6be

Function gfs2_withdraw() tries to synchronize concurrent callers by
atomically setting the SDF_WITHDRAWN flag in the first caller, setting
the SDF_WITHDRAW_IN_PROG flag to indicate that a withdraw is in
progress, performing the actual withdraw, and clearing the
SDF_WITHDRAW_IN_PROG flag when done.  All other callers wait for the
SDF_WITHDRAW_IN_PROG flag to be cleared before returning.

This leaves a small window in which callers can find the SDF_WITHDRAWN
flag set before the SDF_WITHDRAW_IN_PROG flag has been set, causing them
to return prematurely, before the withdraw has been completed.

Fix that by setting the SDF_WITHDRAWN and SDF_WITHDRAW_IN_PROG flags
atomically.

	Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
(cherry picked from commit e3da6be)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Andreas Gruenbacher <agruenba@redhat.com>
commit f80d882
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/f80d882e.failed

In function signal_our_withdraw(), we are calling gfs2_glock_queue_put()
in a context in which we are actually allowed to sleep, so replace that
with a simple call to gfs2_glock_put().

	Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
(cherry picked from commit f80d882)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	fs/gfs2/util.c
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Andreas Gruenbacher <agruenba@redhat.com>
commit deb016c
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/deb016c1.failed

When a node withdraws and it turns out that it is the only node that has
the filesystem mounted, gfs2 currently tries to replay the local journal
to bring the filesystem back into a consistent state.  Not only is that
a very bad idea, it has also never worked because gfs2_recover_func()
will refuse to do anything during a withdraw.

However, before even getting to this point, gfs2_recover_func()
dereferences sdp->sd_jdesc->jd_inode.  This was a use-after-free before
commit 04133b6 ("gfs2: Prevent double iput for journal on error")
and is a NULL pointer dereference since then.

Simply get rid of self recovery to fix that.

Fixes: 601ef0d ("gfs2: Force withdraw to replay journals and wait for it to finish")
	Reported-by: Chunjie Zhu <chunjie.zhu@cloud.com>
	Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
(cherry picked from commit deb016c)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	fs/gfs2/util.c
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Andreas Gruenbacher <agruenba@redhat.com>
commit 9e88899

inode_to_wb() is used also for filesystems that don't support cgroup
writeback.  For these filesystems inode->i_wb is stable during the
lifetime of the inode (it points to bdi->wb) and there's no need to hold
locks protecting the inode->i_wb dereference.  Improve the warning in
inode_to_wb() to not trigger for these filesystems.

Link: https://lkml.kernel.org/r/20250412163914.3773459-3-agruenba@redhat.com
Fixes: aaa2cac ("writeback: add lockdep annotation to inode_to_wb()")
	Signed-off-by: Jan Kara <jack@suse.cz>
	Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
	Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
	Cc: <stable@vger.kernel.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 9e88899)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Andreas Gruenbacher <agruenba@redhat.com>
commit ae9f3bd
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/ae9f3bd8.failed

Currently, sdp->sd_aspace and the per-inode metadata address spaces use
sb->s_bdev->bd_mapping->host as their ->host; folios in those address
spaces will thus appear to be on bdev rather than on gfs2 filesystems.
This is a problem because gfs2 doesn't support cgroup writeback
(SB_I_CGROUPWB), but bdev does.

Fix that by using a "dummy" gfs2 inode as ->host in those address
spaces.  When coming from a folio, folio->mapping->host->i_sb will then
be a gfs2 super block and the SB_I_CGROUPWB flag will not be set in
sb->s_iflags.

Based on a previous version from Bob Peterson from several years ago.
Thanks to Tetsuo Handa, Jan Kara, and Rafael Aquini for helping figure
this out.

Fixes: aaa2cac ("writeback: add lockdep annotation to inode_to_wb()")
	Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
(cherry picked from commit ae9f3bd)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	fs/gfs2/glock.c
#	fs/gfs2/ops_fstype.c
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Chen Ni <nichen@iscas.ac.cn>
commit 4023c3c

free_percpu() checks for NULL pointers internally.
Remove unneeded NULL check here.

	Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
	Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
(cherry picked from commit 4023c3c)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Andrew Price <anprice@redhat.com>
commit 9126d27

When gfs2_sys_fs_add() fails, it sets sb->s_fs_info to NULL on its error
path (see commit 0d51521 ("GFS2: Add kobject release method")).
The intention seems to be to prevent dereferencing sb->s_fs_info once
the object pointed to has been deallocated, but that would be better
achieved by setting the pointer to NULL in free_sbd().

As a consequence, when the call to gfs2_sys_fs_add() fails in
gfs2_fill_super(), sdp = GFS2_SB(inode) will evaluate to NULL in iput()
-> gfs2_drop_inode(), and accessing sdp->sd_flags will be a NULL pointer
dereference.

Fix that by only setting sb->s_fs_info to NULL when actually freeing the
object pointed to in free_sbd().

Fixes: ae9f3bd ("gfs2: replace sd_aspace with sd_inode")
	Reported-by: syzbot+b12826218502df019f9d@syzkaller.appspotmail.com
	Signed-off-by: Andrew Price <anprice@redhat.com>
	Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
(cherry picked from commit 9126d27)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
cve CVE-2025-38200
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Kyungwook Boo <bookyungwook@gmail.com>
commit 015bac5

When the device sends a specific input, an integer underflow can occur, leading
to MMIO write access to an invalid page.

Prevent the integer underflow by changing the type of related variables.

	Signed-off-by: Kyungwook Boo <bookyungwook@gmail.com>
Link: https://lore.kernel.org/lkml/ffc91764-1142-4ba2-91b6-8c773f6f7095@gmail.com/T/
	Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
	Reviewed-by: Simon Horman <horms@kernel.org>
	Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
	Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
	Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
(cherry picked from commit 015bac5)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
cve CVE-2025-22058
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Kuniyuki Iwashima <kuniyu@amazon.com>
commit df207de
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/df207de9.failed

Matt Dowling reported a weird UDP memory usage issue.

Under normal operation, the UDP memory usage reported in /proc/net/sockstat
remains close to zero.  However, it occasionally spiked to 524,288 pages
and never dropped.  Moreover, the value doubled when the application was
terminated.  Finally, it caused intermittent packet drops.

We can reproduce the issue with the script below [0]:

  1. /proc/net/sockstat reports 0 pages

    # cat /proc/net/sockstat | grep UDP:
    UDP: inuse 1 mem 0

  2. Run the script till the report reaches 524,288

    # python3 test.py & sleep 5
    # cat /proc/net/sockstat | grep UDP:
    UDP: inuse 3 mem 524288  <-- (INT_MAX + 1) >> PAGE_SHIFT

  3. Kill the socket and confirm the number never drops

    # pkill python3 && sleep 5
    # cat /proc/net/sockstat | grep UDP:
    UDP: inuse 1 mem 524288

  4. (necessary since v6.0) Trigger proto_memory_pcpu_drain()

    # python3 test.py & sleep 1 && pkill python3

  5. The number doubles

    # cat /proc/net/sockstat | grep UDP:
    UDP: inuse 1 mem 1048577

The application set INT_MAX to SO_RCVBUF, which triggered an integer
overflow in udp_rmem_release().

When a socket is close()d, udp_destruct_common() purges its receive
queue and sums up skb->truesize in the queue.  This total is calculated
and stored in a local unsigned integer variable.

The total size is then passed to udp_rmem_release() to adjust memory
accounting.  However, because the function takes a signed integer
argument, the total size can wrap around, causing an overflow.

Then, the released amount is calculated as follows:

  1) Add size to sk->sk_forward_alloc.
  2) Round down sk->sk_forward_alloc to the nearest lower multiple of
      PAGE_SIZE and assign it to amount.
  3) Subtract amount from sk->sk_forward_alloc.
  4) Pass amount >> PAGE_SHIFT to __sk_mem_reduce_allocated().

When the issue occurred, the total in udp_destruct_common() was 2147484480
(INT_MAX + 833), which was cast to -2147482816 in udp_rmem_release().

At 1) sk->sk_forward_alloc is changed from 3264 to -2147479552, and
2) sets -2147479552 to amount.  3) reverts the wraparound, so we don't
see a warning in inet_sock_destruct().  However, udp_memory_allocated
ends up doubling at 4).

Since commit 3cd3399 ("net: implement per-cpu reserves for
memory_allocated"), memory usage no longer doubles immediately after
a socket is close()d because __sk_mem_reduce_allocated() caches the
amount in udp_memory_per_cpu_fw_alloc.  However, the next time a UDP
socket receives a packet, the subtraction takes effect, causing UDP
memory usage to double.

This issue makes further memory allocation fail once the socket's
sk->sk_rmem_alloc exceeds net.ipv4.udp_rmem_min, resulting in packet
drops.

To prevent this issue, let's use unsigned int for the calculation and
call sk_forward_alloc_add() only once for the small delta.

Note that first_packet_length() also potentially has the same problem.

[0]:
from socket import *

SO_RCVBUFFORCE = 33
INT_MAX = (2 ** 31) - 1

s = socket(AF_INET, SOCK_DGRAM)
s.bind(('', 0))
s.setsockopt(SOL_SOCKET, SO_RCVBUFFORCE, INT_MAX)

c = socket(AF_INET, SOCK_DGRAM)
c.connect(s.getsockname())

data = b'a' * 100

while True:
    c.send(data)

Fixes: f970bd9 ("udp: implement memory accounting helpers")
	Reported-by: Matt Dowling <madowlin@amazon.com>
	Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
	Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250401184501.67377-3-kuniyu@amazon.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit df207de)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	net/ipv4/udp.c
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Laibin Qiu <qiulaibin@huawei.com>
commit 180dccb
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/180dccb0.failed

In case of shared tags, there might be more than one hctx which
allocates from the same tags, and each hctx is limited to allocate at
most:
        hctx_max_depth = max((bt->sb.depth + users - 1) / users, 4U);

tag idle detection is lazy, and may be delayed for 30sec, so there
could be just one real active hctx(queue) but all others are actually
idle and still accounted as active because of the lazy idle detection.
Then if wake_batch is > hctx_max_depth, driver tag allocation may wait
forever on this real active hctx.

Fix this by recalculating wake_batch when inc or dec active_queues.

Fixes: 0d2602c ("blk-mq: improve support for shared tags maps")
	Suggested-by: Ming Lei <ming.lei@redhat.com>
	Suggested-by: John Garry <john.garry@huawei.com>
	Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
	Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20220113025536.1479653-1-qiulaibin@huawei.com
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 180dccb)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	block/blk-mq-tag.c
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Laibin Qiu <qiulaibin@huawei.com>
commit 1082541
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/10825410.failed

Commit 180dccb ("blk-mq: fix tag_get wait task can't be
awakened") will recalculate wake_batch when incrementing or decrementing
active_queues to avoid wake_batch > hctx_max_depth. At the same time, in
order to not affect performance as much as possible, the minimum wakeup
batch is set to 4. But when the QD is small (such as QD=1), if inc or dec
active_queues increases wakeup batch, that can lead to a hang:

Fix this problem with the following strategies:
QD          :  >= 32 | < 32
---------------------------------
wakeup batch:  8~4   | 3~1

Fixes: 180dccb ("blk-mq: fix tag_get wait task can't be awakened")
Link: https://lore.kernel.org/linux-block/78cafe94-a787-e006-8851-69906f0c2128@huawei.com/T/#t
	Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
	Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
	Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Link: https://lore.kernel.org/r/20220127100047.1763746-1-qiulaibin@huawei.com
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 1082541)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	lib/sbitmap.c
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Gabriel Krisman Bertazi <krisman@suse.de>
commit 4f8126b
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/4f8126bb.failed

sbitmap suffers from code complexity, as demonstrated by recent fixes,
and eventual lost wake ups on nested I/O completion.  The later happens,
from what I understand, due to the non-atomic nature of the updates to
wait_cnt, which needs to be subtracted and eventually reset when equal
to zero.  This two step process can eventually miss an update when a
nested completion happens to interrupt the CPU in between the wait_cnt
updates.  This is very hard to fix, as shown by the recent changes to
this code.

The code complexity arises mostly from the corner cases to avoid missed
wakes in this scenario.  In addition, the handling of wake_batch
recalculation plus the synchronization with sbq_queue_wake_up is
non-trivial.

This patchset implements the idea originally proposed by Jan [1], which
removes the need for the two-step updates of wait_cnt.  This is done by
tracking the number of completions and wakeups in always increasing,
per-bitmap counters.  Instead of having to reset the wait_cnt when it
reaches zero, we simply keep counting, and attempt to wake up N threads
in a single wait queue whenever there is enough space for a batch.
Waking up less than batch_wake shouldn't be a problem, because we
haven't changed the conditions for wake up, and the existing batch
calculation guarantees at least enough remaining completions to wake up
a batch for each queue at any time.

Performance-wise, one should expect very similar performance to the
original algorithm for the case where there is no queueing.  In both the
old algorithm and this implementation, the first thing is to check
ws_active, which bails out if there is no queueing to be managed. In the
new code, we took care to avoid accounting completions and wakeups when
there is no queueing, to not pay the cost of atomic operations
unnecessarily, since it doesn't skew the numbers.

For more interesting cases, where there is queueing, we need to take
into account the cross-communication of the atomic operations.  I've
been benchmarking by running parallel fio jobs against a single hctx
nullb in different hardware queue depth scenarios, and verifying both
IOPS and queueing.

Each experiment was repeated 5 times on a 20-CPU box, with 20 parallel
jobs. fio was issuing fixed-size randwrites with qd=64 against nullb,
varying only the hardware queue length per test.

queue size 2                 4                 8                 16                 32                 64
6.1-rc2    1681.1K (1.6K)    2633.0K (12.7K)   6940.8K (16.3K)   8172.3K (617.5K)   8391.7K (367.1K)   8606.1K (351.2K)
patched    1721.8K (15.1K)   3016.7K (3.8K)    7543.0K (89.4K)   8132.5K (303.4K)   8324.2K (230.6K)   8401.8K (284.7K)

The following is a similar experiment, ran against a nullb with a single
bitmap shared by 20 hctx spread across 2 NUMA nodes. This has 40
parallel fio jobs operating on the same device

queue size 2 	             4                 8              	16             	    32		       64
6.1-rc2	   1081.0K (2.3K)    957.2K (1.5K)     1699.1K (5.7K) 	6178.2K (124.6K)    12227.9K (37.7K)   13286.6K (92.9K)
patched	   1081.8K (2.8K)    1316.5K (5.4K)    2364.4K (1.8K) 	6151.4K  (20.0K)    11893.6K (17.5K)   12385.6K (18.4K)

It has also survived blktests and a 12h-stress run against nullb. I also
ran the code against nvme and a scsi SSD, and I didn't observe
performance regression in those. If there are other tests you think I
should run, please let me know and I will follow up with results.

[1] https://lore.kernel.org/all/aef9de29-e9f5-259a-f8be-12d1b734e72@google.com/

	Cc: Hugh Dickins <hughd@google.com>
	Cc: Keith Busch <kbusch@kernel.org>
	Cc: Liu Song <liusong@linux.alibaba.com>
	Suggested-by: Jan Kara <jack@suse.cz>
	Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20221105231055.25953-1-krisman@suse.de
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 4f8126b)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	lib/sbitmap.c
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Gabriel Krisman Bertazi <krisman@suse.de>
commit 976570b
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/976570b4.failed

When a queue is awaken, the wake_index written by sbq_wake_ptr currently
keeps pointing to the same queue.  On the next wake up, it will thus
retry the same queue, which is unfair to other queues, and can lead to
starvation.  This patch, moves the index update to happen before the
queue is returned, such that it will now try a different queue first on
the next wake up, improving fairness.

Fixes: 4f8126b ("sbitmap: Use single per-bitmap counting to wake up queued tags")
	Reported-by: Jan Kara <jack@suse.cz>
	Reviewed-by: Jan Kara <jack@suse.cz>
	Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20221115224553.23594-2-krisman@suse.de
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 976570b)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	lib/sbitmap.c
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Pavel Begunkov <asml.silence@gmail.com>
commit 016190a

Statements in the loop's body and before it are identical.
Use do-while to not repeat it.

	Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
	Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
	Cc: Linus Torvalds <torvalds@linux-foundation.org>
	Cc: Peter Zijlstra <peterz@infradead.org>
	Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/43ffea6ee2152b90dedf962eac851609e4197218.1560256112.git.asml.silence@gmail.com
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 016190a)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Gabriel Krisman Bertazi <krisman@suse.de>
commit ee7dc86

Sbitmap code will need to know how many waiters were actually woken for
its batched wakeups implementation.  Return the number of woken
exclusive waiters from __wake_up() to facilitate that.

	Suggested-by: Jan Kara <jack@suse.cz>
	Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
	Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221115224553.23594-3-krisman@suse.de
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit ee7dc86)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Gabriel Krisman Bertazi <krisman@suse.de>
commit 26edb30
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/26edb30d.failed

Jan reported the new algorithm as merged might be problematic if the
queue being awaken becomes empty between the waitqueue_active inside
sbq_wake_ptr check and the wake up.  If that happens, wake_up_nr will
not wake up any waiter and we loose too many wake ups.  In order to
guarantee progress, we need to wake up at least one waiter here, if
there are any.  This now requires trying to wake up from every queue.

Instead of walking through all the queues with sbq_wake_ptr, this call
moves the wake up inside that function.  In a previous version of the
patch, I found that updating wake_index several times when walking
through queues had a measurable overhead.  This ensures we only update
it once, at the end.

Fixes: 4f8126b ("sbitmap: Use single per-bitmap counting to wake up queued tags")
	Reported-by: Jan Kara <jack@suse.cz>
	Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
	Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221115224553.23594-4-krisman@suse.de
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 26edb30)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	lib/sbitmap.c
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Jens Axboe <axboe@kernel.dk>
commit 9672b0d

The block layer tag allocation batching still calls into sbitmap to get
each tag, but we can improve on that. Add __sbitmap_queue_get_batch(),
which returns a mask of tags all at once, along with an offset for
those tags.

An example return would be 0xff, where bits 0..7 are set, with
tag_offset == 128. The valid tags in this case would be 128..135.

A batch is specific to an individual sbitmap_map, hence it cannot be
larger than that. The requested number of tags is automatically reduced
to the max that can be satisfied with a single map.

On failure, 0 is returned. Caller should fall back to single tag
allocation at that point/

	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 9672b0d)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Ming Lei <ming.lei@redhat.com>
commit 3301bc5

Only the last sbitmap_word can have different depth, and all the others
must have same depth of 1U << sb->shift, so not necessary to store it in
sbitmap_word, and it can be retrieved easily and efficiently by adding
one internal helper of __map_depth(sb, index).

Remove 'depth' field from sbitmap_word, then the annotation of
____cacheline_aligned_in_smp for 'word' isn't needed any more.

Not see performance effect when running high parallel IOPS test on
null_blk.

This way saves us one cacheline(usually 64 words) per each sbitmap_word.

	Cc: Martin Wilck <martin.wilck@suse.com>
	Signed-off-by: Ming Lei <ming.lei@redhat.com>
	Reviewed-by: Martin Wilck <mwilck@suse.com>
	Reviewed-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/20220110072945.347535-1-ming.lei@redhat.com
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 3301bc5)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author wuchi <wuchi.zero@gmail.com>
commit fbb564a

1. Getting next index before continue branch.
2. Checking free bits when setting the target bits. Otherwise,
it may reuse the busying bits.

	Signed-off-by: wuchi <wuchi.zero@gmail.com>
	Reviewed-by: Martin Wilck <mwilck@suse.com>
Link: https://lore.kernel.org/r/20220605145835.26916-1-wuchi.zero@gmail.com
Fixes: 9672b0d ("sbitmap: add __sbitmap_queue_get_batch()")
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit fbb564a)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Liu Song <liusong@linux.alibaba.com>
commit ddbfc34

If "nr + nr_tags <= map_depth", then the value of nr_tags will not be
greater than map_depth, so no additional comparison is required.

	Signed-off-by: Liu Song <liusong@linux.alibaba.com>
Link: https://lore.kernel.org/r/1661483653-27326-1-git-send-email-liusong@linux.alibaba.com
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit ddbfc34)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Uros Bizjak <ubizjak@gmail.com>
commit c35227d

Use atomic_long_try_cmpxchg instead of
atomic_long_cmpxchg (*ptr, old, new) == old in __sbitmap_queue_get_batch.
x86 CMPXCHG instruction returns success in ZF flag, so this change
saves a compare after cmpxchg (and related move instruction in front
of cmpxchg).

Also, atomic_long_cmpxchg implicitly assigns old *ptr value to "old"
when cmpxchg fails, enabling further code simplifications, e.g.
an extra memory read can be avoided in the loop.

No functional change intended.

	Cc: Jens Axboe <axboe@kernel.dk>
	Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20220908151200.9993-1-ubizjak@gmail.com
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit c35227d)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
…t_shallow

jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Kemeng Shi <shikemeng@huaweicloud.com>
commit f1591a8

Updates to alloc_hint in the loop in __sbitmap_get_shallow() are mostly
pointless and equivalent to setting alloc_hint to zero (because
SB_NR_TO_BIT() considers only low sb->shift bits from alloc_hint). So
simplify the logic.

	Reviewed-by: Jan Kara <jack@suse.cz>
	Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/20230116205059.3821738-2-shikemeng@huaweicloud.com
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit f1591a8)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Kemeng Shi <shikemeng@huaweicloud.com>
commit 903e86f

Commit fbb564a ("lib/sbitmap: Fix invalid loop in
__sbitmap_queue_get_batch()") mentioned that "Checking free bits when
setting the target bits. Otherwise, it may reuse the busying bits."
This commit add check to make sure all masked bits in word before
cmpxchg is zero. Then the existing check after cmpxchg to check any
zero bit is existing in masked bits in word is redundant.

Actually, old value of word before cmpxchg is stored in val and we
will filter out busy bits in val by "(get_mask & ~val)" after cmpxchg.
So we will not reuse busy bits methioned in commit fbb564a
("lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()"). Revert
new-added check to remove redundant check.

Fixes: fbb564a ("lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()")
	Reviewed-by: Jan Kara <jack@suse.cz>
	Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/20230116205059.3821738-3-shikemeng@huaweicloud.com
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 903e86f)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Ming Lei <ming.lei@redhat.com>
commit 65f666c
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/65f666c6.failed

When called from sbitmap_queue_get(), sbitmap_deferred_clear() may be run
with preempt disabled. In RT kernel, spin_lock() can sleep, then warning
of "BUG: sleeping function called from invalid context" can be triggered.

Fix it by replacing it with raw_spin_lock.

	Cc: Yang Yang <yang.yang@vivo.com>
Fixes: 72d04bd ("sbitmap: fix io hung due to race on sbitmap_word::cleared")
	Signed-off-by: Ming Lei <ming.lei@redhat.com>
	Reviewed-by: Yang Yang <yang.yang@vivo.com>
Link: https://lore.kernel.org/r/20240919021709.511329-1-ming.lei@redhat.com
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 65f666c)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	include/linux/sbitmap.h
#	lib/sbitmap.c
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Yu Kuai <yukuai3@huawei.com>
commit 4f1731d
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/4f1731df.failed

In __blk_mq_tag_busy/idle(), updating 'active_queues' and calculating
'wake_batch' is not atomic:

t1:			t2:
_blk_mq_tag_busy	blk_mq_tag_busy
inc active_queues
// assume 1->2
			inc active_queues
			// 2 -> 3
			blk_mq_update_wake_batch
			// calculate based on 3
blk_mq_update_wake_batch
/* calculate based on 2, while active_queues is actually 3. */

Fix this problem by protecting them wih 'tags->lock', this is not a hot
path, so performance should not be concerned. And now that all writers
are inside the lock, switch 'actives_queues' from atomic to unsigned
int.

Fixes: 180dccb ("blk-mq: fix tag_get wait task can't be awakened")
	Signed-off-by: Yu Kuai <yukuai3@huawei.com>
	Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230610023043.2559121-1-yukuai1@huaweicloud.com
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 4f1731d)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	block/blk-mq-tag.c
#	block/blk-mq.h
#	include/linux/blk-mq.h
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Li Lingfeng <lilingfeng3@huawei.com>
commit b313a8c
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/b313a8c8.failed

Lockdep reported a warning in Linux version 6.6:

[  414.344659] ================================
[  414.345155] WARNING: inconsistent lock state
[  414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted
[  414.346221] --------------------------------
[  414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[  414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes:
[  414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0
[  414.351204] {IN-SOFTIRQ-W} state was registered at:
[  414.351751]   lock_acquire+0x18d/0x460
[  414.352218]   _raw_spin_lock_irqsave+0x39/0x60
[  414.352769]   __wake_up_common_lock+0x22/0x60
[  414.353289]   sbitmap_queue_wake_up+0x375/0x4f0
[  414.353829]   sbitmap_queue_clear+0xdd/0x270
[  414.354338]   blk_mq_put_tag+0xdf/0x170
[  414.354807]   __blk_mq_free_request+0x381/0x4d0
[  414.355335]   blk_mq_free_request+0x28b/0x3e0
[  414.355847]   __blk_mq_end_request+0x242/0xc30
[  414.356367]   scsi_end_request+0x2c1/0x830
[  414.345155] WARNING: inconsistent lock state
[  414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted
[  414.346221] --------------------------------
[  414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[  414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes:
[  414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0
[  414.351204] {IN-SOFTIRQ-W} state was registered at:
[  414.351751]   lock_acquire+0x18d/0x460
[  414.352218]   _raw_spin_lock_irqsave+0x39/0x60
[  414.352769]   __wake_up_common_lock+0x22/0x60
[  414.353289]   sbitmap_queue_wake_up+0x375/0x4f0
[  414.353829]   sbitmap_queue_clear+0xdd/0x270
[  414.354338]   blk_mq_put_tag+0xdf/0x170
[  414.354807]   __blk_mq_free_request+0x381/0x4d0
[  414.355335]   blk_mq_free_request+0x28b/0x3e0
[  414.355847]   __blk_mq_end_request+0x242/0xc30
[  414.356367]   scsi_end_request+0x2c1/0x830
[  414.356863]   scsi_io_completion+0x177/0x1610
[  414.357379]   scsi_complete+0x12f/0x260
[  414.357856]   blk_complete_reqs+0xba/0xf0
[  414.358338]   __do_softirq+0x1b0/0x7a2
[  414.358796]   irq_exit_rcu+0x14b/0x1a0
[  414.359262]   sysvec_call_function_single+0xaf/0xc0
[  414.359828]   asm_sysvec_call_function_single+0x1a/0x20
[  414.360426]   default_idle+0x1e/0x30
[  414.360873]   default_idle_call+0x9b/0x1f0
[  414.361390]   do_idle+0x2d2/0x3e0
[  414.361819]   cpu_startup_entry+0x55/0x60
[  414.362314]   start_secondary+0x235/0x2b0
[  414.362809]   secondary_startup_64_no_verify+0x18f/0x19b
[  414.363413] irq event stamp: 428794
[  414.363825] hardirqs last  enabled at (428793): [<ffffffff816bfd1c>] ktime_get+0x1dc/0x200
[  414.364694] hardirqs last disabled at (428794): [<ffffffff85470177>] _raw_spin_lock_irq+0x47/0x50
[  414.365629] softirqs last  enabled at (428444): [<ffffffff85474780>] __do_softirq+0x540/0x7a2
[  414.366522] softirqs last disabled at (428419): [<ffffffff813f65ab>] irq_exit_rcu+0x14b/0x1a0
[  414.367425]
               other info that might help us debug this:
[  414.368194]  Possible unsafe locking scenario:
[  414.368900]        CPU0
[  414.369225]        ----
[  414.369548]   lock(&sbq->ws[i].wait);
[  414.370000]   <Interrupt>
[  414.370342]     lock(&sbq->ws[i].wait);
[  414.370802]
                *** DEADLOCK ***
[  414.371569] 5 locks held by kworker/u10:3/1152:
[  414.372088]  #0: ffff88810130e938 ((wq_completion)writeback){+.+.}-{0:0}, at: process_scheduled_works+0x357/0x13f0
[  414.373180]  #1: ffff88810201fdb8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x3a3/0x13f0
[  414.374384]  #2: ffffffff86ffbdc0 (rcu_read_lock){....}-{1:2}, at: blk_mq_run_hw_queue+0x637/0xa00
[  414.375342]  #3: ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0
[  414.376377]  #4: ffff888106205a08 (&hctx->dispatch_wait_lock){+.-.}-{2:2}, at: blk_mq_dispatch_rq_list+0x1337/0x1ee0
[  414.378607]
               stack backtrace:
[  414.379177] CPU: 0 PID: 1152 Comm: kworker/u10:3 Not tainted 6.6.0-07439-gba2303cacfda #6
[  414.380032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[  414.381177] Workqueue: writeback wb_workfn (flush-253:0)
[  414.381805] Call Trace:
[  414.382136]  <TASK>
[  414.382429]  dump_stack_lvl+0x91/0xf0
[  414.382884]  mark_lock_irq+0xb3b/0x1260
[  414.383367]  ? __pfx_mark_lock_irq+0x10/0x10
[  414.383889]  ? stack_trace_save+0x8e/0xc0
[  414.384373]  ? __pfx_stack_trace_save+0x10/0x10
[  414.384903]  ? graph_lock+0xcf/0x410
[  414.385350]  ? save_trace+0x3d/0xc70
[  414.385808]  mark_lock.part.20+0x56d/0xa90
[  414.386317]  mark_held_locks+0xb0/0x110
[  414.386791]  ? __pfx_do_raw_spin_lock+0x10/0x10
[  414.387320]  lockdep_hardirqs_on_prepare+0x297/0x3f0
[  414.387901]  ? _raw_spin_unlock_irq+0x28/0x50
[  414.388422]  trace_hardirqs_on+0x58/0x100
[  414.388917]  _raw_spin_unlock_irq+0x28/0x50
[  414.389422]  __blk_mq_tag_busy+0x1d6/0x2a0
[  414.389920]  __blk_mq_get_driver_tag+0x761/0x9f0
[  414.390899]  blk_mq_dispatch_rq_list+0x1780/0x1ee0
[  414.391473]  ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10
[  414.392070]  ? sbitmap_get+0x2b8/0x450
[  414.392533]  ? __blk_mq_get_driver_tag+0x210/0x9f0
[  414.393095]  __blk_mq_sched_dispatch_requests+0xd99/0x1690
[  414.393730]  ? elv_attempt_insert_merge+0x1b1/0x420
[  414.394302]  ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10
[  414.394970]  ? lock_acquire+0x18d/0x460
[  414.395456]  ? blk_mq_run_hw_queue+0x637/0xa00
[  414.395986]  ? __pfx_lock_acquire+0x10/0x10
[  414.396499]  blk_mq_sched_dispatch_requests+0x109/0x190
[  414.397100]  blk_mq_run_hw_queue+0x66e/0xa00
[  414.397616]  blk_mq_flush_plug_list.part.17+0x614/0x2030
[  414.398244]  ? __pfx_blk_mq_flush_plug_list.part.17+0x10/0x10
[  414.398897]  ? writeback_sb_inodes+0x241/0xcc0
[  414.399429]  blk_mq_flush_plug_list+0x65/0x80
[  414.399957]  __blk_flush_plug+0x2f1/0x530
[  414.400458]  ? __pfx___blk_flush_plug+0x10/0x10
[  414.400999]  blk_finish_plug+0x59/0xa0
[  414.401467]  wb_writeback+0x7cc/0x920
[  414.401935]  ? __pfx_wb_writeback+0x10/0x10
[  414.402442]  ? mark_held_locks+0xb0/0x110
[  414.402931]  ? __pfx_do_raw_spin_lock+0x10/0x10
[  414.403462]  ? lockdep_hardirqs_on_prepare+0x297/0x3f0
[  414.404062]  wb_workfn+0x2b3/0xcf0
[  414.404500]  ? __pfx_wb_workfn+0x10/0x10
[  414.404989]  process_scheduled_works+0x432/0x13f0
[  414.405546]  ? __pfx_process_scheduled_works+0x10/0x10
[  414.406139]  ? do_raw_spin_lock+0x101/0x2a0
[  414.406641]  ? assign_work+0x19b/0x240
[  414.407106]  ? lock_is_held_type+0x9d/0x110
[  414.407604]  worker_thread+0x6f2/0x1160
[  414.408075]  ? __kthread_parkme+0x62/0x210
[  414.408572]  ? lockdep_hardirqs_on_prepare+0x297/0x3f0
[  414.409168]  ? __kthread_parkme+0x13c/0x210
[  414.409678]  ? __pfx_worker_thread+0x10/0x10
[  414.410191]  kthread+0x33c/0x440
[  414.410602]  ? __pfx_kthread+0x10/0x10
[  414.411068]  ret_from_fork+0x4d/0x80
[  414.411526]  ? __pfx_kthread+0x10/0x10
[  414.411993]  ret_from_fork_asm+0x1b/0x30
[  414.412489]  </TASK>

When interrupt is turned on while a lock holding by spin_lock_irq it
throws a warning because of potential deadlock.

blk_mq_prep_dispatch_rq
 blk_mq_get_driver_tag
  __blk_mq_get_driver_tag
   __blk_mq_alloc_driver_tag
    blk_mq_tag_busy -> tag is already busy
    // failed to get driver tag
 blk_mq_mark_tag_wait
  spin_lock_irq(&wq->lock) -> lock A (&sbq->ws[i].wait)
  __add_wait_queue(wq, wait) -> wait queue active
  blk_mq_get_driver_tag
  __blk_mq_tag_busy
-> 1) tag must be idle, which means there can't be inflight IO
   spin_lock_irq(&tags->lock) -> lock B (hctx->tags)
   spin_unlock_irq(&tags->lock) -> unlock B, turn on interrupt accidentally
-> 2) context must be preempt by IO interrupt to trigger deadlock.

As shown above, the deadlock is not possible in theory, but the warning
still need to be fixed.

Fix it by using spin_lock_irqsave to get lockB instead of spin_lock_irq.

Fixes: 4f1731d ("blk-mq: fix potential io hang by wrong 'wake_batch'")
	Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
	Reviewed-by: Ming Lei <ming.lei@redhat.com>
	Reviewed-by: Yu Kuai <yukuai3@huawei.com>
	Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240815024736.2040971-1-lilingfeng@huaweicloud.com
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit b313a8c)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	block/blk-mq-tag.c
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Kemeng Shi <shikemeng@huaweicloud.com>
commit 5c7fa5c

After commit 1063973 ("sbitmap: fix batching wakeup"), we may wake
up more than one queue for each batch. Just remove stale comment that
we wake up only one queue for each batch.

	Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/20240115145626.665562-1-shikemeng@huaweicloud.com
	Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 5c7fa5c)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
cve CVE-2025-38464
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Kuniyuki Iwashima <kuniyu@google.com>
commit 667eeab

syzbot reported a null-ptr-deref in tipc_conn_close() during netns
dismantle. [0]

tipc_topsrv_stop() iterates tipc_net(net)->topsrv->conn_idr and calls
tipc_conn_close() for each tipc_conn.

The problem is that tipc_conn_close() is called after releasing the
IDR lock.

At the same time, there might be tipc_conn_recv_work() running and it
could call tipc_conn_close() for the same tipc_conn and release its
last ->kref.

Once we release the IDR lock in tipc_topsrv_stop(), there is no
guarantee that the tipc_conn is alive.

Let's hold the ref before releasing the lock and put the ref after
tipc_conn_close() in tipc_topsrv_stop().

[0]:
BUG: KASAN: use-after-free in tipc_conn_close+0x122/0x140 net/tipc/topsrv.c:165
Read of size 8 at addr ffff888099305a08 by task kworker/u4:3/435

CPU: 0 PID: 435 Comm: kworker/u4:3 Not tainted 4.19.204-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1fc/0x2ef lib/dump_stack.c:118
 print_address_description.cold+0x54/0x219 mm/kasan/report.c:256
 kasan_report_error.cold+0x8a/0x1b9 mm/kasan/report.c:354
 kasan_report mm/kasan/report.c:412 [inline]
 __asan_report_load8_noabort+0x88/0x90 mm/kasan/report.c:433
 tipc_conn_close+0x122/0x140 net/tipc/topsrv.c:165
 tipc_topsrv_stop net/tipc/topsrv.c:701 [inline]
 tipc_topsrv_exit_net+0x27b/0x5c0 net/tipc/topsrv.c:722
 ops_exit_list+0xa5/0x150 net/core/net_namespace.c:153
 cleanup_net+0x3b4/0x8b0 net/core/net_namespace.c:553
 process_one_work+0x864/0x1570 kernel/workqueue.c:2153
 worker_thread+0x64c/0x1130 kernel/workqueue.c:2296
 kthread+0x33f/0x460 kernel/kthread.c:259
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415

Allocated by task 23:
 kmem_cache_alloc_trace+0x12f/0x380 mm/slab.c:3625
 kmalloc include/linux/slab.h:515 [inline]
 kzalloc include/linux/slab.h:709 [inline]
 tipc_conn_alloc+0x43/0x4f0 net/tipc/topsrv.c:192
 tipc_topsrv_accept+0x1b5/0x280 net/tipc/topsrv.c:470
 process_one_work+0x864/0x1570 kernel/workqueue.c:2153
 worker_thread+0x64c/0x1130 kernel/workqueue.c:2296
 kthread+0x33f/0x460 kernel/kthread.c:259
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415

Freed by task 23:
 __cache_free mm/slab.c:3503 [inline]
 kfree+0xcc/0x210 mm/slab.c:3822
 tipc_conn_kref_release net/tipc/topsrv.c:150 [inline]
 kref_put include/linux/kref.h:70 [inline]
 conn_put+0x2cd/0x3a0 net/tipc/topsrv.c:155
 process_one_work+0x864/0x1570 kernel/workqueue.c:2153
 worker_thread+0x64c/0x1130 kernel/workqueue.c:2296
 kthread+0x33f/0x460 kernel/kthread.c:259
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415

The buggy address belongs to the object at ffff888099305a00
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 8 bytes inside of
 512-byte region [ffff888099305a00, ffff888099305c00)
The buggy address belongs to the page:
page:ffffea000264c140 count:1 mapcount:0 mapping:ffff88813bff0940 index:0x0
flags: 0xfff00000000100(slab)
raw: 00fff00000000100 ffffea00028b6b88 ffffea0002cd2b08 ffff88813bff0940
raw: 0000000000000000 ffff888099305000 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888099305900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888099305980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888099305a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                      ^
 ffff888099305a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888099305b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: c5fa7b3 ("tipc: introduce new TIPC server infrastructure")
	Reported-by: syzbot+d333febcf8f4bc5f6110@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=27169a847a70550d17be
	Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
	Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech>
Link: https://patch.msgid.link/20250702014350.692213-1-kuniyu@google.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 667eeab)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Arvind Sankar <nivedita@alum.mit.edu>
commit 7dde67f
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/7dde67f2.failed

Add support for the x86 CMDLINE_BOOL and CMDLINE_OVERRIDE configuration
options.

	Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200430182843.2510180-11-nivedita@alum.mit.edu
	Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 7dde67f)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	drivers/firmware/efi/libstub/x86-stub.c
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Arvind Sankar <nivedita@alum.mit.edu>
commit 055042b
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/055042be.failed

efi_parse_options can fail if it is unable to allocate space for a copy
of the command line. Check the return value to make sure it succeeded.

	Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200430182843.2510180-12-nivedita@alum.mit.edu
	Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 055042b)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	drivers/firmware/efi/libstub/arm-stub.c
#	drivers/firmware/efi/libstub/x86-stub.c
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Ard Biesheuvel <ardb@kernel.org>
commit 15aa8fb
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/15aa8fb8.failed

The legacy decompressor has elaborate logic to ensure that the
randomized physical placement of the decompressed kernel image does not
conflict with any memory reservations, including ones specified on the
command line using mem=, memmap=, efi_fake_mem= or hugepages=, which are
taken into account by the kernel proper at a later stage.

When booting in EFI mode, it is the firmware's job to ensure that the
chosen range does not conflict with any memory reservations that it
knows about, and this is trivially achieved by using the firmware's
memory allocation APIs.

That leaves reservations specified on the command line, though, which
the firmware knows nothing about, as these regions have no other special
significance to the platform. Since commit

  a1b87d5 ("x86/efistub: Avoid legacy decompressor when doing EFI boot")

these reservations are not taken into account when randomizing the
physical placement, which may result in conflicts where the memory
cannot be reserved by the kernel proper because its own executable image
resides there.

To avoid having to duplicate or reuse the existing complicated logic,
disable physical KASLR entirely when such overrides are specified. These
are mostly diagnostic tools or niche features, and physical KASLR (as
opposed to virtual KASLR, which is much more important as it affects the
memory addresses observed by code executing in the kernel) is something
we can live without.

Closes: https://lkml.kernel.org/r/FA5F6719-8824-4B04-803E-82990E65E627%40akamai.com
	Reported-by: Ben Chaney <bchaney@akamai.com>
Fixes: a1b87d5 ("x86/efistub: Avoid legacy decompressor when doing EFI boot")
	Cc:  <stable@vger.kernel.org> # v6.1+
	Reviewed-by: Kees Cook <keescook@chromium.org>
	Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
(cherry picked from commit 15aa8fb)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
#	arch/x86/boot/compressed/eboot.c
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Matthew Wilcox (Oracle) <willy@infradead.org>
commit b0b598e

The original problem of the overly long list of waiters on a locked page
was solved properly by commit 9a1ea43 ("mm:
put_and_wait_on_page_locked() while page is migrated").  In the meantime,
using bookmarks for the writeback bit can cause livelocks, so we need to
stop using them.

Link: https://lkml.kernel.org/r/20231010035829.544242-1-willy@infradead.org
	Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
	Cc: Bin Lai <sclaibin@gmail.com>
	Cc: Benjamin Segall <bsegall@google.com>
	Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
	Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
	Cc: Ingo Molnar <mingo@redhat.com>
	Cc: Juri Lelli <juri.lelli@redhat.com>
	Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
	Cc: Mel Gorman <mgorman@suse.de>
	Cc: Peter Zijlstra <peterz@infradead.org>
	Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
	Cc: Valentin Schneider <vschneid@redhat.com>
	Cc: Vincent Guittot <vincent.guittot@linaro.org>

	Cc: Ingo Molnar <mingo@kernel.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit b0b598e)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Victor Nogueira <victor@mojatatu.com>
commit ffdde7b

Lion's patch [1] revealed an ancient bug in the qdisc API.
Whenever a user creates/modifies a qdisc specifying as a parent another
qdisc, the qdisc API will, during grafting, detect that the user is
not trying to attach to a class and reject. However grafting is
performed after qdisc_create (and thus the qdiscs' init callback) is
executed. In qdiscs that eventually call qdisc_tree_reduce_backlog
during init or change (such as fq, hhf, choke, etc), an issue
arises. For example, executing the following commands:

sudo tc qdisc add dev lo root handle a: htb default 2
sudo tc qdisc add dev lo parent a: handle beef fq

Qdiscs such as fq, hhf, choke, etc unconditionally invoke
qdisc_tree_reduce_backlog() in their control path init() or change() which
then causes a failure to find the child class; however, that does not stop
the unconditional invocation of the assumed child qdisc's qlen_notify with
a null class. All these qdiscs make the assumption that class is non-null.

The solution is ensure that qdisc_leaf() which looks up the parent
class, and is invoked prior to qdisc_create(), should return failure on
not finding the class.
In this patch, we leverage qdisc_leaf to return ERR_PTRs whenever the
parentid doesn't correspond to a class, so that we can detect it
earlier on and abort before qdisc_create is called.

[1] https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/

Fixes: 5e50da0 ("[NET_SCHED]: Fix endless loops (part 2): "simple" qdiscs")
	Reported-by: syzbot+d8b58d7b0ad89a678a16@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68663c93.a70a0220.5d25f.0857.GAE@google.com/
	Reported-by: syzbot+5eccb463fa89309d8bdc@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68663c94.a70a0220.5d25f.0858.GAE@google.com/
	Reported-by: syzbot+1261670bbdefc5485a06@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/686764a5.a00a0220.c7b3.0013.GAE@google.com/
	Reported-by: syzbot+15b96fc3aac35468fe77@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/686764a5.a00a0220.c7b3.0014.GAE@google.com/
	Reported-by: syzbot+4dadc5aecf80324d5a51@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68679e81.a70a0220.29cf51.0016.GAE@google.com/
	Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
	Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
	Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250707210801.372995-1-victor@mojatatu.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit ffdde7b)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Anumula Murali Mohan Reddy <anumula@chelsio.com>
commit 356983f

t4_set_vf_mac_acl() uses pf to set mac addr, but t4vf_get_vf_mac_acl()
uses port number to get mac addr, this leads to error when an attempt
to set MAC address on VF's of PF2 and PF3.
This patch fixes the issue by using port number to set mac address.

Fixes: e0cdac6 ("cxgb4vf: configure ports accessible by the VF")
	Signed-off-by: Anumula Murali Mohan Reddy <anumula@chelsio.com>
	Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
	Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241206062014.49414-1-anumula@chelsio.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 356983f)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
cve CVE-2025-38477
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Xiang Mei <xmei5@asu.edu>
commit 5e28d5a

A race condition can occur when 'agg' is modified in qfq_change_agg
(called during qfq_enqueue) while other threads access it
concurrently. For example, qfq_dump_class may trigger a NULL
dereference, and qfq_delete_class may cause a use-after-free.

This patch addresses the issue by:

1. Moved qfq_destroy_class into the critical section.

2. Added sch_tree_lock protection to qfq_dump_class and
qfq_dump_class_stats.

Fixes: 462dbc9 ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
	Signed-off-by: Xiang Mei <xmei5@asu.edu>
	Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5e28d5a)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
… qfq_delete_class

jira LE-4066
cve CVE-2025-38477
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Xiang Mei <xmei5@asu.edu>
commit cf074ec

might_sleep could be trigger in the atomic context in qfq_delete_class.

qfq_destroy_class was moved into atomic context locked
by sch_tree_lock to avoid a race condition bug on
qfq_aggregate. However, might_sleep could be triggered by
qfq_destroy_class, which introduced sleeping in atomic context (path:
qfq_destroy_class->qdisc_put->__qdisc_destroy->lockdep_unregister_key
->might_sleep).

Considering the race is on the qfq_aggregate objects, keeping
qfq_rm_from_agg in the lock but moving the left part out can solve
this issue.

Fixes: 5e28d5a ("net/sched: sch_qfq: Fix race condition on qfq_aggregate")
	Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
	Signed-off-by: Xiang Mei <xmei5@asu.edu>
Link: https://patch.msgid.link/4a04e0cc-a64b-44e7-9213-2880ed641d77@sabinyo.mountain
	Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
	Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/20250717230128.159766-1-xmei5@asu.edu
	Signed-off-by: Paolo Abeni <pabeni@redhat.com>

(cherry picked from commit cf074ec)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
cve CVE-2024-42285
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Bart Van Assche <bvanassche@acm.org>
commit aee2424

iw_conn_req_handler() associates a new struct rdma_id_private (conn_id) with
an existing struct iw_cm_id (cm_id) as follows:

        conn_id->cm_id.iw = cm_id;
        cm_id->context = conn_id;
        cm_id->cm_handler = cma_iw_handler;

rdma_destroy_id() frees both the cm_id and the struct rdma_id_private. Make
sure that cm_work_handler() does not trigger a use-after-free by only
freeing of the struct rdma_id_private after all pending work has finished.

	Cc: stable@vger.kernel.org
Fixes: 59c68ac ("iw_cm: free cm_id resources on the last deref")
	Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
	Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
	Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240605145117.397751-6-bvanassche@acm.org
	Signed-off-by: Leon Romanovsky <leon@kernel.org>
(cherry picked from commit aee2424)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
cve CVE-2024-47696
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Zhu Yanjun <yanjun.zhu@linux.dev>
commit 86dfdd8

In the commit aee2424 ("RDMA/iwcm: Fix a use-after-free related to
destroying CM IDs"), the function flush_workqueue is invoked to flush the
work queue iwcm_wq.

But at that time, the work queue iwcm_wq was created via the function
alloc_ordered_workqueue without the flag WQ_MEM_RECLAIM.

Because the current process is trying to flush the whole iwcm_wq, if
iwcm_wq doesn't have the flag WQ_MEM_RECLAIM, verify that the current
process is not reclaiming memory or running on a workqueue which doesn't
have the flag WQ_MEM_RECLAIM as that can break forward-progress guarantee
leading to a deadlock.

The call trace is as below:

[  125.350876][ T1430] Call Trace:
[  125.356281][ T1430]  <TASK>
[ 125.361285][ T1430] ? __warn (kernel/panic.c:693)
[ 125.367640][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9))
[ 125.375689][ T1430] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 125.382505][ T1430] ? handle_bug (arch/x86/kernel/traps.c:239)
[ 125.388987][ T1430] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
[ 125.395831][ T1430] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
[ 125.403125][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9))
[ 125.410984][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9))
[ 125.418764][ T1430] __flush_workqueue (kernel/workqueue.c:3970)
[ 125.426021][ T1430] ? __pfx___might_resched (kernel/sched/core.c:10151)
[ 125.433431][ T1430] ? destroy_cm_id (drivers/infiniband/core/iwcm.c:375) iw_cm
[ 125.441209][ T1430] ? __pfx___flush_workqueue (kernel/workqueue.c:3910)
[ 125.473900][ T1430] ? _raw_spin_lock_irqsave (arch/x86/include/asm/atomic.h:107 include/linux/atomic/atomic-arch-fallback.h:2170 include/linux/atomic/atomic-instrumented.h:1302 include/asm-generic/qspinlock.h:111 include/linux/spinlock.h:187 include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[ 125.473909][ T1430] ? __pfx__raw_spin_lock_irqsave (kernel/locking/spinlock.c:161)
[ 125.482537][ T1430] _destroy_id (drivers/infiniband/core/cma.c:2044) rdma_cm
[ 125.495072][ T1430] nvme_rdma_free_queue (drivers/nvme/host/rdma.c:656 drivers/nvme/host/rdma.c:650) nvme_rdma
[ 125.505827][ T1430] nvme_rdma_reset_ctrl_work (drivers/nvme/host/rdma.c:2180) nvme_rdma
[ 125.505831][ T1430] process_one_work (kernel/workqueue.c:3231)
[ 125.515122][ T1430] worker_thread (kernel/workqueue.c:3306 kernel/workqueue.c:3393)
[ 125.515127][ T1430] ? __pfx_worker_thread (kernel/workqueue.c:3339)
[ 125.531837][ T1430] kthread (kernel/kthread.c:389)
[ 125.539864][ T1430] ? __pfx_kthread (kernel/kthread.c:342)
[ 125.550628][ T1430] ret_from_fork (arch/x86/kernel/process.c:147)
[ 125.558840][ T1430] ? __pfx_kthread (kernel/kthread.c:342)
[ 125.558844][ T1430] ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
[  125.566487][ T1430]  </TASK>
[  125.566488][ T1430] ---[ end trace 0000000000000000 ]---

Fixes: aee2424 ("RDMA/iwcm: Fix a use-after-free related to destroying CM IDs")
Link: https://patch.msgid.link/r/20240820113336.19860-1-yanjun.zhu@linux.dev
	Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202408151633.fc01893c-oliver.sang@intel.com
	Tested-by: kernel test robot <oliver.sang@intel.com>
	Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
	Reviewed-by: Bart Van Assche <bvanassche@acm.org>
	Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
(cherry picked from commit 86dfdd8)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
cve CVE-2025-38211
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
commit 6883b68

The commit 59c68ac ("iw_cm: free cm_id resources on the last
deref") simplified cm_id resource management by freeing cm_id once all
references to the cm_id were removed. The references are removed either
upon completion of iw_cm event handlers or when the application destroys
the cm_id. This commit introduced the use-after-free condition where
cm_id_private object could still be in use by event handler works during
the destruction of cm_id. The commit aee2424 ("RDMA/iwcm: Fix a
use-after-free related to destroying CM IDs") addressed this use-after-
free by flushing all pending works at the cm_id destruction.

However, still another use-after-free possibility remained. It happens
with the work objects allocated for each cm_id_priv within
alloc_work_entries() during cm_id creation, and subsequently freed in
dealloc_work_entries() once all references to the cm_id are removed.
If the cm_id's last reference is decremented in the event handler work,
the work object for the work itself gets removed, and causes the use-
after-free BUG below:

  BUG: KASAN: slab-use-after-free in __pwq_activate_work+0x1ff/0x250
  Read of size 8 at addr ffff88811f9cf800 by task kworker/u16:1/147091

  CPU: 2 UID: 0 PID: 147091 Comm: kworker/u16:1 Not tainted 6.15.0-rc2+ #27 PREEMPT(voluntary)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
  Workqueue:  0x0 (iw_cm_wq)
  Call Trace:
   <TASK>
   dump_stack_lvl+0x6a/0x90
   print_report+0x174/0x554
   ? __virt_addr_valid+0x208/0x430
   ? __pwq_activate_work+0x1ff/0x250
   kasan_report+0xae/0x170
   ? __pwq_activate_work+0x1ff/0x250
   __pwq_activate_work+0x1ff/0x250
   pwq_dec_nr_in_flight+0x8c5/0xfb0
   process_one_work+0xc11/0x1460
   ? __pfx_process_one_work+0x10/0x10
   ? assign_work+0x16c/0x240
   worker_thread+0x5ef/0xfd0
   ? __pfx_worker_thread+0x10/0x10
   kthread+0x3b0/0x770
   ? __pfx_kthread+0x10/0x10
   ? rcu_is_watching+0x11/0xb0
   ? _raw_spin_unlock_irq+0x24/0x50
   ? rcu_is_watching+0x11/0xb0
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x30/0x70
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1a/0x30
   </TASK>

  Allocated by task 147416:
   kasan_save_stack+0x2c/0x50
   kasan_save_track+0x10/0x30
   __kasan_kmalloc+0xa6/0xb0
   alloc_work_entries+0xa9/0x260 [iw_cm]
   iw_cm_connect+0x23/0x4a0 [iw_cm]
   rdma_connect_locked+0xbfd/0x1920 [rdma_cm]
   nvme_rdma_cm_handler+0x8e5/0x1b60 [nvme_rdma]
   cma_cm_event_handler+0xae/0x320 [rdma_cm]
   cma_work_handler+0x106/0x1b0 [rdma_cm]
   process_one_work+0x84f/0x1460
   worker_thread+0x5ef/0xfd0
   kthread+0x3b0/0x770
   ret_from_fork+0x30/0x70
   ret_from_fork_asm+0x1a/0x30

  Freed by task 147091:
   kasan_save_stack+0x2c/0x50
   kasan_save_track+0x10/0x30
   kasan_save_free_info+0x37/0x60
   __kasan_slab_free+0x4b/0x70
   kfree+0x13a/0x4b0
   dealloc_work_entries+0x125/0x1f0 [iw_cm]
   iwcm_deref_id+0x6f/0xa0 [iw_cm]
   cm_work_handler+0x136/0x1ba0 [iw_cm]
   process_one_work+0x84f/0x1460
   worker_thread+0x5ef/0xfd0
   kthread+0x3b0/0x770
   ret_from_fork+0x30/0x70
   ret_from_fork_asm+0x1a/0x30

  Last potentially related work creation:
   kasan_save_stack+0x2c/0x50
   kasan_record_aux_stack+0xa3/0xb0
   __queue_work+0x2ff/0x1390
   queue_work_on+0x67/0xc0
   cm_event_handler+0x46a/0x820 [iw_cm]
   siw_cm_upcall+0x330/0x650 [siw]
   siw_cm_work_handler+0x6b9/0x2b20 [siw]
   process_one_work+0x84f/0x1460
   worker_thread+0x5ef/0xfd0
   kthread+0x3b0/0x770
   ret_from_fork+0x30/0x70
   ret_from_fork_asm+0x1a/0x30

This BUG is reproducible by repeating the blktests test case nvme/061
for the rdma transport and the siw driver.

To avoid the use-after-free of cm_id_private work objects, ensure that
the last reference to the cm_id is decremented not in the event handler
works, but in the cm_id destruction context. For that purpose, move
iwcm_deref_id() call from destroy_cm_id() to the callers of
destroy_cm_id(). In iw_destroy_cm_id(), call iwcm_deref_id() after
flushing the pending works.

During the fix work, I noticed that iw_destroy_cm_id() is called from
cm_work_handler() and process_event() context. However, the comment of
iw_destroy_cm_id() notes that the function "cannot be called by the
event thread". Drop the false comment.

Closes: https://lore.kernel.org/linux-rdma/r5676e754sv35aq7cdsqrlnvyhiq5zktteaurl7vmfih35efko@z6lay7uypy3c/
Fixes: 59c68ac ("iw_cm: free cm_id resources on the last deref")
	Cc: stable@vger.kernel.org
	Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://patch.msgid.link/20250510101036.1756439-1-shinichiro.kawasaki@wdc.com
	Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
	Signed-off-by: Leon Romanovsky <leon@kernel.org>
(cherry picked from commit 6883b68)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
… counter

jira LE-4066
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Thomas Gleixner <tglx@linutronix.de>
commit f944ffc

For systems on which the performance counter can expire early due to turbo
modes the watchdog handler has a safety net in place which validates that
since the last watchdog event there has at least 4/5th of the watchdog
period elapsed.

This works reliably only after the first watchdog event because the per
CPU variable which holds the timestamp of the last event is never
initialized.

So a first spurious event will validate against a timestamp of 0 which
results in a delta which is likely to be way over the 4/5 threshold of the
period.  As this might happen before the first watchdog hrtimer event
increments the watchdog counter, this can lead to false positives.

Fix this by initializing the timestamp before enabling the hardware event.
Reset the rearm counter as well, as that might be non zero after the
watchdog was disabled and reenabled.

Link: https://lkml.kernel.org/r/87frsfu15a.ffs@tglx
Fixes: 7edaeb6 ("kernel/watchdog: Prevent false positives with turbo modes")
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
	Cc: Arjan van de Ven <arjan@linux.intel.com>
	Cc: Peter Zijlstra <peterz@infradead.org>
	Cc: <stable@vger.kernel.org>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit f944ffc)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066
cve CVE-2025-38332
Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10
commit-author Daniel Wagner <wagi@kernel.org>
commit ae82eaf

The strlcat() with FORTIFY support is triggering a panic because it
thinks the target buffer will overflow although the correct target
buffer size is passed in.

Anyway, instead of memset() with 0 followed by a strlcat(), just use
memcpy() and ensure that the resulting buffer is NULL terminated.

BIOSVersion is only used for the lpfc_printf_log() which expects a
properly terminated string.

	Signed-off-by: Daniel Wagner <wagi@kernel.org>
Link: https://lore.kernel.org/r/20250409-fix-lpfc-bios-str-v1-1-05dac9e51e13@kernel.org
	Reviewed-by: Justin Tee <justin.tee@broadcom.com>
	Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit ae82eaf)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..kernel-mainline: 567023
Number of commits in rpm: 61
Number of commits matched with upstream: 54 (88.52%)
Number of commits in upstream but not in rpm: 566969
Number of commits NOT found in upstream: 7 (11.48%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.72.1.el8_10 for kernel-4.18.0-553.72.1.el8_10
Clean Cherry Picks: 35 (64.81%)
Empty Cherry Picks: 19 (35.19%)
_______________________________

Full Details Located here:
ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual Empty Commit failures contained in the same containing directory.
The git message for empty commits will have the path for the failed commit.
File names are the first 8 characters of the upstream SHA
Copy link
Collaborator

@bmastbergen bmastbergen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥌

@PlaidCat PlaidCat merged commit 06a1b8b into rocky8_10 Sep 5, 2025
2 checks passed
@PlaidCat PlaidCat deleted the rocky8_10_rebuild branch September 5, 2025 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants