-
Notifications
You must be signed in to change notification settings - Fork 12
[rocky8_10] History rebuild for kernel-4.18.0-553.72.1.el8_10 #551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Niklas Schnelle <schnelle@linux.ibm.com> commit 4553792 The error event information for PCI error events contains a function handle for the respective function. This handle is generally captured at the time the error event was recorded. Due to delays in processing or cascading issues, it may happen that during firmware recovery multiple events are generated. When processing these events in order Linux may already have recovered an affected function making the event information stale. Fix this by doing an unconditional CLP List PCI function retrieving the current function handle with the zdev->state_lock held and ignoring the event if its function handle is stale. Cc: stable@vger.kernel.org Fixes: 4cdf2f4 ("s390/pci: implement minimal PCI error recovery") Reviewed-by: Julian Ruess <julianr@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> (cherry picked from commit 4553792) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Niklas Schnelle <schnelle@linux.ibm.com> commit b97a797 If a device is disabled unblocking load/store on its own is not useful as a full re-enable of the function is necessary anyway. Note that SCLP Write Event Data Action Qualifier 0 (Reset) leaves the device disabled and triggers this case unless the driver already requests a reset. Cc: stable@vger.kernel.org Fixes: 4cdf2f4 ("s390/pci: implement minimal PCI error recovery") Reviewed-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> (cherry picked from commit b97a797) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 cve CVE-2025-22097 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author José Expósito <jose.exposito89@gmail.com> commit ed15511 If the driver initialization fails, the vkms_exit() function might access an uninitialized or freed default_config pointer and it might double free it. Fix both possible errors by initializing default_config only when the driver initialization succeeded. Reported-by: Louis Chauvet <louis.chauvet@bootlin.com> Closes: https://lore.kernel.org/all/Z5uDHcCmAwiTsGte@louis-chauvet-laptop/ Fixes: 2df7af9 ("drm/vkms: Add vkms_config type") Signed-off-by: José Expósito <jose.exposito89@gmail.com> Reviewed-by: Thomas Zimmermann <tzimmremann@suse.de> Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250212084912.3196-1-jose.exposito89@gmail.com Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com> (cherry picked from commit ed15511) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Cong Wang <xiyou.wangcong@gmail.com> commit a7a15f3 est_qlen_notify() deletes its class from its active list with list_del() when qlen is 0, therefore, it is not idempotent and not friendly to its callers, like fq_codel_dequeue(). Let's make it idempotent to ease qdisc_tree_reduce_backlog() callers' life. Also change other list_del()'s to list_del_init() just to be extra safe. Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://patch.msgid.link/20250403211033.166059-6-xiyou.wangcong@gmail.com Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> (cherry picked from commit a7a15f3) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 cve CVE-2025-37914 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Victor Nogueira <victor@mojatatu.com> commit 1a6d0c0 As described in Gerrard's report [1], there are use cases where a netem child qdisc will make the parent qdisc's enqueue callback reentrant. In the case of ets, there won't be a UAF, but the code will add the same classifier to the list twice, which will cause memory corruption. In addition to checking for qlen being zero, this patch checks whether the class was already added to the active_list (cl_is_active) before doing the addition to cater for the reentrant case. [1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/ Fixes: 37d9cf1 ("sched: Fix detection of empty queues in child qdiscs") Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20250425220710.3964791-4-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> (cherry picked from commit 1a6d0c0) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 cve CVE-2025-38250 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Kuniyuki Iwashima <kuniyu@google.com> commit 1d61231 Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/1d612310.failed syzbot reported use-after-free in vhci_flush() without repro. [0] From the splat, a thread close()d a vhci file descriptor while its device was being used by iotcl() on another thread. Once the last fd refcnt is released, vhci_release() calls hci_unregister_dev(), hci_free_dev(), and kfree() for struct vhci_data, which is set to hci_dev->dev->driver_data. The problem is that there is no synchronisation after unlinking hdev from hci_dev_list in hci_unregister_dev(). There might be another thread still accessing the hdev which was fetched before the unlink operation. We can use SRCU for such synchronisation. Let's run hci_dev_reset() under SRCU and wait for its completion in hci_unregister_dev(). Another option would be to restore hci_dev->destruct(), which was removed in commit 587ae08 ("Bluetooth: Remove unused hci-destruct cb"). However, this would not be a good solution, as we should not run hci_unregister_dev() while there are in-flight ioctl() requests, which could lead to another data-race KCSAN splat. Note that other drivers seem to have the same problem, for exmaple, virtbt_remove(). [0]: BUG: KASAN: slab-use-after-free in skb_queue_empty_lockless include/linux/skbuff.h:1891 [inline] BUG: KASAN: slab-use-after-free in skb_queue_purge_reason+0x99/0x360 net/core/skbuff.c:3937 Read of size 8 at addr ffff88807cb8d858 by task syz.1.219/6718 CPU: 1 UID: 0 PID: 6718 Comm: syz.1.219 Not tainted 6.16.0-rc1-syzkaller-00196-g08207f42d3ff #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 Call Trace: <TASK> dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:408 [inline] print_report+0xd2/0x2b0 mm/kasan/report.c:521 kasan_report+0x118/0x150 mm/kasan/report.c:634 skb_queue_empty_lockless include/linux/skbuff.h:1891 [inline] skb_queue_purge_reason+0x99/0x360 net/core/skbuff.c:3937 skb_queue_purge include/linux/skbuff.h:3368 [inline] vhci_flush+0x44/0x50 drivers/bluetooth/hci_vhci.c:69 hci_dev_do_reset net/bluetooth/hci_core.c:552 [inline] hci_dev_reset+0x420/0x5c0 net/bluetooth/hci_core.c:592 sock_do_ioctl+0xd9/0x300 net/socket.c:1190 sock_ioctl+0x576/0x790 net/socket.c:1311 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fcf5b98e929 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fcf5c7b9038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fcf5bbb6160 RCX: 00007fcf5b98e929 RDX: 0000000000000000 RSI: 00000000400448cb RDI: 0000000000000009 RBP: 00007fcf5ba10b39 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007fcf5bbb6160 R15: 00007ffd6353d528 </TASK> Allocated by task 6535: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3e/0x80 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:377 [inline] __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:394 kasan_kmalloc include/linux/kasan.h:260 [inline] __kmalloc_cache_noprof+0x230/0x3d0 mm/slub.c:4359 kmalloc_noprof include/linux/slab.h:905 [inline] kzalloc_noprof include/linux/slab.h:1039 [inline] vhci_open+0x57/0x360 drivers/bluetooth/hci_vhci.c:635 misc_open+0x2bc/0x330 drivers/char/misc.c:161 chrdev_open+0x4c9/0x5e0 fs/char_dev.c:414 do_dentry_open+0xdf0/0x1970 fs/open.c:964 vfs_open+0x3b/0x340 fs/open.c:1094 do_open fs/namei.c:3887 [inline] path_openat+0x2ee5/0x3830 fs/namei.c:4046 do_filp_open+0x1fa/0x410 fs/namei.c:4073 do_sys_openat2+0x121/0x1c0 fs/open.c:1437 do_sys_open fs/open.c:1452 [inline] __do_sys_openat fs/open.c:1468 [inline] __se_sys_openat fs/open.c:1463 [inline] __x64_sys_openat+0x138/0x170 fs/open.c:1463 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 6535: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3e/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:576 poison_slab_object mm/kasan/common.c:247 [inline] __kasan_slab_free+0x62/0x70 mm/kasan/common.c:264 kasan_slab_free include/linux/kasan.h:233 [inline] slab_free_hook mm/slub.c:2381 [inline] slab_free mm/slub.c:4643 [inline] kfree+0x18e/0x440 mm/slub.c:4842 vhci_release+0xbc/0xd0 drivers/bluetooth/hci_vhci.c:671 __fput+0x44c/0xa70 fs/file_table.c:465 task_work_run+0x1d1/0x260 kernel/task_work.c:227 exit_task_work include/linux/task_work.h:40 [inline] do_exit+0x6ad/0x22e0 kernel/exit.c:955 do_group_exit+0x21c/0x2d0 kernel/exit.c:1104 __do_sys_exit_group kernel/exit.c:1115 [inline] __se_sys_exit_group kernel/exit.c:1113 [inline] __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1113 x64_sys_call+0x21ba/0x21c0 arch/x86/include/generated/asm/syscalls_64.h:232 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f The buggy address belongs to the object at ffff88807cb8d800 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 88 bytes inside of freed 1024-byte region [ffff88807cb8d800, ffff88807cb8dc00) Fixes: bf18c71 ("Bluetooth: vhci: Free driver_data on file release") Reported-by: syzbot+2faa4825e556199361f9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f62d64848fc4c7c30cd6 Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> (cherry picked from commit 1d61231) Signed-off-by: Jonathan Maple <jmaple@ciq.com> # Conflicts: # net/bluetooth/hci_core.c
jira LE-4066 cve CVE-2025-38380 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Michael J. Ruhl <michael.j.ruhl@intel.com> commit 3d30048 The i2c_dw_xfer_init() function requires msgs and msg_write_idx from the dev context to be initialized. amd_i2c_dw_xfer_quirk() inits msgs and msgs_num, but not msg_write_idx. This could allow an out of bounds access (of msgs). Initialize msg_write_idx before calling i2c_dw_xfer_init(). Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Fixes: 17631e8 ("i2c: designware: Add driver support for AMD NAVI GPU") Cc: <stable@vger.kernel.org> # v5.13+ Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20250627143511.489570-1-michael.j.ruhl@intel.com (cherry picked from commit 3d30048) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Andreas Gruenbacher <agruenba@redhat.com> commit e3da6be Function gfs2_withdraw() tries to synchronize concurrent callers by atomically setting the SDF_WITHDRAWN flag in the first caller, setting the SDF_WITHDRAW_IN_PROG flag to indicate that a withdraw is in progress, performing the actual withdraw, and clearing the SDF_WITHDRAW_IN_PROG flag when done. All other callers wait for the SDF_WITHDRAW_IN_PROG flag to be cleared before returning. This leaves a small window in which callers can find the SDF_WITHDRAWN flag set before the SDF_WITHDRAW_IN_PROG flag has been set, causing them to return prematurely, before the withdraw has been completed. Fix that by setting the SDF_WITHDRAWN and SDF_WITHDRAW_IN_PROG flags atomically. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> (cherry picked from commit e3da6be) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Andreas Gruenbacher <agruenba@redhat.com> commit f80d882 Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/f80d882e.failed In function signal_our_withdraw(), we are calling gfs2_glock_queue_put() in a context in which we are actually allowed to sleep, so replace that with a simple call to gfs2_glock_put(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> (cherry picked from commit f80d882) Signed-off-by: Jonathan Maple <jmaple@ciq.com> # Conflicts: # fs/gfs2/util.c
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Andreas Gruenbacher <agruenba@redhat.com> commit deb016c Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/deb016c1.failed When a node withdraws and it turns out that it is the only node that has the filesystem mounted, gfs2 currently tries to replay the local journal to bring the filesystem back into a consistent state. Not only is that a very bad idea, it has also never worked because gfs2_recover_func() will refuse to do anything during a withdraw. However, before even getting to this point, gfs2_recover_func() dereferences sdp->sd_jdesc->jd_inode. This was a use-after-free before commit 04133b6 ("gfs2: Prevent double iput for journal on error") and is a NULL pointer dereference since then. Simply get rid of self recovery to fix that. Fixes: 601ef0d ("gfs2: Force withdraw to replay journals and wait for it to finish") Reported-by: Chunjie Zhu <chunjie.zhu@cloud.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> (cherry picked from commit deb016c) Signed-off-by: Jonathan Maple <jmaple@ciq.com> # Conflicts: # fs/gfs2/util.c
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Andreas Gruenbacher <agruenba@redhat.com> commit 9e88899 inode_to_wb() is used also for filesystems that don't support cgroup writeback. For these filesystems inode->i_wb is stable during the lifetime of the inode (it points to bdi->wb) and there's no need to hold locks protecting the inode->i_wb dereference. Improve the warning in inode_to_wb() to not trigger for these filesystems. Link: https://lkml.kernel.org/r/20250412163914.3773459-3-agruenba@redhat.com Fixes: aaa2cac ("writeback: add lockdep annotation to inode_to_wb()") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 9e88899) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Andreas Gruenbacher <agruenba@redhat.com> commit ae9f3bd Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/ae9f3bd8.failed Currently, sdp->sd_aspace and the per-inode metadata address spaces use sb->s_bdev->bd_mapping->host as their ->host; folios in those address spaces will thus appear to be on bdev rather than on gfs2 filesystems. This is a problem because gfs2 doesn't support cgroup writeback (SB_I_CGROUPWB), but bdev does. Fix that by using a "dummy" gfs2 inode as ->host in those address spaces. When coming from a folio, folio->mapping->host->i_sb will then be a gfs2 super block and the SB_I_CGROUPWB flag will not be set in sb->s_iflags. Based on a previous version from Bob Peterson from several years ago. Thanks to Tetsuo Handa, Jan Kara, and Rafael Aquini for helping figure this out. Fixes: aaa2cac ("writeback: add lockdep annotation to inode_to_wb()") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> (cherry picked from commit ae9f3bd) Signed-off-by: Jonathan Maple <jmaple@ciq.com> # Conflicts: # fs/gfs2/glock.c # fs/gfs2/ops_fstype.c
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Chen Ni <nichen@iscas.ac.cn> commit 4023c3c free_percpu() checks for NULL pointers internally. Remove unneeded NULL check here. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> (cherry picked from commit 4023c3c) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Andrew Price <anprice@redhat.com> commit 9126d27 When gfs2_sys_fs_add() fails, it sets sb->s_fs_info to NULL on its error path (see commit 0d51521 ("GFS2: Add kobject release method")). The intention seems to be to prevent dereferencing sb->s_fs_info once the object pointed to has been deallocated, but that would be better achieved by setting the pointer to NULL in free_sbd(). As a consequence, when the call to gfs2_sys_fs_add() fails in gfs2_fill_super(), sdp = GFS2_SB(inode) will evaluate to NULL in iput() -> gfs2_drop_inode(), and accessing sdp->sd_flags will be a NULL pointer dereference. Fix that by only setting sb->s_fs_info to NULL when actually freeing the object pointed to in free_sbd(). Fixes: ae9f3bd ("gfs2: replace sd_aspace with sd_inode") Reported-by: syzbot+b12826218502df019f9d@syzkaller.appspotmail.com Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> (cherry picked from commit 9126d27) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 cve CVE-2025-38200 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Kyungwook Boo <bookyungwook@gmail.com> commit 015bac5 When the device sends a specific input, an integer underflow can occur, leading to MMIO write access to an invalid page. Prevent the integer underflow by changing the type of related variables. Signed-off-by: Kyungwook Boo <bookyungwook@gmail.com> Link: https://lore.kernel.org/lkml/ffc91764-1142-4ba2-91b6-8c773f6f7095@gmail.com/T/ Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> (cherry picked from commit 015bac5) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 cve CVE-2025-22058 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Kuniyuki Iwashima <kuniyu@amazon.com> commit df207de Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/df207de9.failed Matt Dowling reported a weird UDP memory usage issue. Under normal operation, the UDP memory usage reported in /proc/net/sockstat remains close to zero. However, it occasionally spiked to 524,288 pages and never dropped. Moreover, the value doubled when the application was terminated. Finally, it caused intermittent packet drops. We can reproduce the issue with the script below [0]: 1. /proc/net/sockstat reports 0 pages # cat /proc/net/sockstat | grep UDP: UDP: inuse 1 mem 0 2. Run the script till the report reaches 524,288 # python3 test.py & sleep 5 # cat /proc/net/sockstat | grep UDP: UDP: inuse 3 mem 524288 <-- (INT_MAX + 1) >> PAGE_SHIFT 3. Kill the socket and confirm the number never drops # pkill python3 && sleep 5 # cat /proc/net/sockstat | grep UDP: UDP: inuse 1 mem 524288 4. (necessary since v6.0) Trigger proto_memory_pcpu_drain() # python3 test.py & sleep 1 && pkill python3 5. The number doubles # cat /proc/net/sockstat | grep UDP: UDP: inuse 1 mem 1048577 The application set INT_MAX to SO_RCVBUF, which triggered an integer overflow in udp_rmem_release(). When a socket is close()d, udp_destruct_common() purges its receive queue and sums up skb->truesize in the queue. This total is calculated and stored in a local unsigned integer variable. The total size is then passed to udp_rmem_release() to adjust memory accounting. However, because the function takes a signed integer argument, the total size can wrap around, causing an overflow. Then, the released amount is calculated as follows: 1) Add size to sk->sk_forward_alloc. 2) Round down sk->sk_forward_alloc to the nearest lower multiple of PAGE_SIZE and assign it to amount. 3) Subtract amount from sk->sk_forward_alloc. 4) Pass amount >> PAGE_SHIFT to __sk_mem_reduce_allocated(). When the issue occurred, the total in udp_destruct_common() was 2147484480 (INT_MAX + 833), which was cast to -2147482816 in udp_rmem_release(). At 1) sk->sk_forward_alloc is changed from 3264 to -2147479552, and 2) sets -2147479552 to amount. 3) reverts the wraparound, so we don't see a warning in inet_sock_destruct(). However, udp_memory_allocated ends up doubling at 4). Since commit 3cd3399 ("net: implement per-cpu reserves for memory_allocated"), memory usage no longer doubles immediately after a socket is close()d because __sk_mem_reduce_allocated() caches the amount in udp_memory_per_cpu_fw_alloc. However, the next time a UDP socket receives a packet, the subtraction takes effect, causing UDP memory usage to double. This issue makes further memory allocation fail once the socket's sk->sk_rmem_alloc exceeds net.ipv4.udp_rmem_min, resulting in packet drops. To prevent this issue, let's use unsigned int for the calculation and call sk_forward_alloc_add() only once for the small delta. Note that first_packet_length() also potentially has the same problem. [0]: from socket import * SO_RCVBUFFORCE = 33 INT_MAX = (2 ** 31) - 1 s = socket(AF_INET, SOCK_DGRAM) s.bind(('', 0)) s.setsockopt(SOL_SOCKET, SO_RCVBUFFORCE, INT_MAX) c = socket(AF_INET, SOCK_DGRAM) c.connect(s.getsockname()) data = b'a' * 100 while True: c.send(data) Fixes: f970bd9 ("udp: implement memory accounting helpers") Reported-by: Matt Dowling <madowlin@amazon.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250401184501.67377-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> (cherry picked from commit df207de) Signed-off-by: Jonathan Maple <jmaple@ciq.com> # Conflicts: # net/ipv4/udp.c
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Laibin Qiu <qiulaibin@huawei.com> commit 180dccb Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/180dccb0.failed In case of shared tags, there might be more than one hctx which allocates from the same tags, and each hctx is limited to allocate at most: hctx_max_depth = max((bt->sb.depth + users - 1) / users, 4U); tag idle detection is lazy, and may be delayed for 30sec, so there could be just one real active hctx(queue) but all others are actually idle and still accounted as active because of the lazy idle detection. Then if wake_batch is > hctx_max_depth, driver tag allocation may wait forever on this real active hctx. Fix this by recalculating wake_batch when inc or dec active_queues. Fixes: 0d2602c ("blk-mq: improve support for shared tags maps") Suggested-by: Ming Lei <ming.lei@redhat.com> Suggested-by: John Garry <john.garry@huawei.com> Signed-off-by: Laibin Qiu <qiulaibin@huawei.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220113025536.1479653-1-qiulaibin@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 180dccb) Signed-off-by: Jonathan Maple <jmaple@ciq.com> # Conflicts: # block/blk-mq-tag.c
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Laibin Qiu <qiulaibin@huawei.com> commit 1082541 Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/10825410.failed Commit 180dccb ("blk-mq: fix tag_get wait task can't be awakened") will recalculate wake_batch when incrementing or decrementing active_queues to avoid wake_batch > hctx_max_depth. At the same time, in order to not affect performance as much as possible, the minimum wakeup batch is set to 4. But when the QD is small (such as QD=1), if inc or dec active_queues increases wakeup batch, that can lead to a hang: Fix this problem with the following strategies: QD : >= 32 | < 32 --------------------------------- wakeup batch: 8~4 | 3~1 Fixes: 180dccb ("blk-mq: fix tag_get wait task can't be awakened") Link: https://lore.kernel.org/linux-block/78cafe94-a787-e006-8851-69906f0c2128@huawei.com/T/#t Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Signed-off-by: Laibin Qiu <qiulaibin@huawei.com> Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Link: https://lore.kernel.org/r/20220127100047.1763746-1-qiulaibin@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 1082541) Signed-off-by: Jonathan Maple <jmaple@ciq.com> # Conflicts: # lib/sbitmap.c
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Gabriel Krisman Bertazi <krisman@suse.de> commit 4f8126b Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/4f8126bb.failed sbitmap suffers from code complexity, as demonstrated by recent fixes, and eventual lost wake ups on nested I/O completion. The later happens, from what I understand, due to the non-atomic nature of the updates to wait_cnt, which needs to be subtracted and eventually reset when equal to zero. This two step process can eventually miss an update when a nested completion happens to interrupt the CPU in between the wait_cnt updates. This is very hard to fix, as shown by the recent changes to this code. The code complexity arises mostly from the corner cases to avoid missed wakes in this scenario. In addition, the handling of wake_batch recalculation plus the synchronization with sbq_queue_wake_up is non-trivial. This patchset implements the idea originally proposed by Jan [1], which removes the need for the two-step updates of wait_cnt. This is done by tracking the number of completions and wakeups in always increasing, per-bitmap counters. Instead of having to reset the wait_cnt when it reaches zero, we simply keep counting, and attempt to wake up N threads in a single wait queue whenever there is enough space for a batch. Waking up less than batch_wake shouldn't be a problem, because we haven't changed the conditions for wake up, and the existing batch calculation guarantees at least enough remaining completions to wake up a batch for each queue at any time. Performance-wise, one should expect very similar performance to the original algorithm for the case where there is no queueing. In both the old algorithm and this implementation, the first thing is to check ws_active, which bails out if there is no queueing to be managed. In the new code, we took care to avoid accounting completions and wakeups when there is no queueing, to not pay the cost of atomic operations unnecessarily, since it doesn't skew the numbers. For more interesting cases, where there is queueing, we need to take into account the cross-communication of the atomic operations. I've been benchmarking by running parallel fio jobs against a single hctx nullb in different hardware queue depth scenarios, and verifying both IOPS and queueing. Each experiment was repeated 5 times on a 20-CPU box, with 20 parallel jobs. fio was issuing fixed-size randwrites with qd=64 against nullb, varying only the hardware queue length per test. queue size 2 4 8 16 32 64 6.1-rc2 1681.1K (1.6K) 2633.0K (12.7K) 6940.8K (16.3K) 8172.3K (617.5K) 8391.7K (367.1K) 8606.1K (351.2K) patched 1721.8K (15.1K) 3016.7K (3.8K) 7543.0K (89.4K) 8132.5K (303.4K) 8324.2K (230.6K) 8401.8K (284.7K) The following is a similar experiment, ran against a nullb with a single bitmap shared by 20 hctx spread across 2 NUMA nodes. This has 40 parallel fio jobs operating on the same device queue size 2 4 8 16 32 64 6.1-rc2 1081.0K (2.3K) 957.2K (1.5K) 1699.1K (5.7K) 6178.2K (124.6K) 12227.9K (37.7K) 13286.6K (92.9K) patched 1081.8K (2.8K) 1316.5K (5.4K) 2364.4K (1.8K) 6151.4K (20.0K) 11893.6K (17.5K) 12385.6K (18.4K) It has also survived blktests and a 12h-stress run against nullb. I also ran the code against nvme and a scsi SSD, and I didn't observe performance regression in those. If there are other tests you think I should run, please let me know and I will follow up with results. [1] https://lore.kernel.org/all/aef9de29-e9f5-259a-f8be-12d1b734e72@google.com/ Cc: Hugh Dickins <hughd@google.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Liu Song <liusong@linux.alibaba.com> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20221105231055.25953-1-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 4f8126b) Signed-off-by: Jonathan Maple <jmaple@ciq.com> # Conflicts: # lib/sbitmap.c
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Gabriel Krisman Bertazi <krisman@suse.de> commit 976570b Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/976570b4.failed When a queue is awaken, the wake_index written by sbq_wake_ptr currently keeps pointing to the same queue. On the next wake up, it will thus retry the same queue, which is unfair to other queues, and can lead to starvation. This patch, moves the index update to happen before the queue is returned, such that it will now try a different queue first on the next wake up, improving fairness. Fixes: 4f8126b ("sbitmap: Use single per-bitmap counting to wake up queued tags") Reported-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20221115224553.23594-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 976570b) Signed-off-by: Jonathan Maple <jmaple@ciq.com> # Conflicts: # lib/sbitmap.c
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Pavel Begunkov <asml.silence@gmail.com> commit 016190a Statements in the loop's body and before it are identical. Use do-while to not repeat it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/43ffea6ee2152b90dedf962eac851609e4197218.1560256112.git.asml.silence@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit 016190a) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Gabriel Krisman Bertazi <krisman@suse.de> commit ee7dc86 Sbitmap code will need to know how many waiters were actually woken for its batched wakeups implementation. Return the number of woken exclusive waiters from __wake_up() to facilitate that. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221115224553.23594-3-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit ee7dc86) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Gabriel Krisman Bertazi <krisman@suse.de> commit 26edb30 Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/26edb30d.failed Jan reported the new algorithm as merged might be problematic if the queue being awaken becomes empty between the waitqueue_active inside sbq_wake_ptr check and the wake up. If that happens, wake_up_nr will not wake up any waiter and we loose too many wake ups. In order to guarantee progress, we need to wake up at least one waiter here, if there are any. This now requires trying to wake up from every queue. Instead of walking through all the queues with sbq_wake_ptr, this call moves the wake up inside that function. In a previous version of the patch, I found that updating wake_index several times when walking through queues had a measurable overhead. This ensures we only update it once, at the end. Fixes: 4f8126b ("sbitmap: Use single per-bitmap counting to wake up queued tags") Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221115224553.23594-4-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 26edb30) Signed-off-by: Jonathan Maple <jmaple@ciq.com> # Conflicts: # lib/sbitmap.c
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Jens Axboe <axboe@kernel.dk> commit 9672b0d The block layer tag allocation batching still calls into sbitmap to get each tag, but we can improve on that. Add __sbitmap_queue_get_batch(), which returns a mask of tags all at once, along with an offset for those tags. An example return would be 0xff, where bits 0..7 are set, with tag_offset == 128. The valid tags in this case would be 128..135. A batch is specific to an individual sbitmap_map, hence it cannot be larger than that. The requested number of tags is automatically reduced to the max that can be satisfied with a single map. On failure, 0 is returned. Caller should fall back to single tag allocation at that point/ Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 9672b0d) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Ming Lei <ming.lei@redhat.com> commit 3301bc5 Only the last sbitmap_word can have different depth, and all the others must have same depth of 1U << sb->shift, so not necessary to store it in sbitmap_word, and it can be retrieved easily and efficiently by adding one internal helper of __map_depth(sb, index). Remove 'depth' field from sbitmap_word, then the annotation of ____cacheline_aligned_in_smp for 'word' isn't needed any more. Not see performance effect when running high parallel IOPS test on null_blk. This way saves us one cacheline(usually 64 words) per each sbitmap_word. Cc: Martin Wilck <martin.wilck@suse.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Reviewed-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/20220110072945.347535-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 3301bc5) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author wuchi <wuchi.zero@gmail.com> commit fbb564a 1. Getting next index before continue branch. 2. Checking free bits when setting the target bits. Otherwise, it may reuse the busying bits. Signed-off-by: wuchi <wuchi.zero@gmail.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Link: https://lore.kernel.org/r/20220605145835.26916-1-wuchi.zero@gmail.com Fixes: 9672b0d ("sbitmap: add __sbitmap_queue_get_batch()") Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit fbb564a) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Liu Song <liusong@linux.alibaba.com> commit ddbfc34 If "nr + nr_tags <= map_depth", then the value of nr_tags will not be greater than map_depth, so no additional comparison is required. Signed-off-by: Liu Song <liusong@linux.alibaba.com> Link: https://lore.kernel.org/r/1661483653-27326-1-git-send-email-liusong@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit ddbfc34) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Uros Bizjak <ubizjak@gmail.com> commit c35227d Use atomic_long_try_cmpxchg instead of atomic_long_cmpxchg (*ptr, old, new) == old in __sbitmap_queue_get_batch. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, atomic_long_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg fails, enabling further code simplifications, e.g. an extra memory read can be avoided in the loop. No functional change intended. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20220908151200.9993-1-ubizjak@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit c35227d) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
…t_shallow jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Kemeng Shi <shikemeng@huaweicloud.com> commit f1591a8 Updates to alloc_hint in the loop in __sbitmap_get_shallow() are mostly pointless and equivalent to setting alloc_hint to zero (because SB_NR_TO_BIT() considers only low sb->shift bits from alloc_hint). So simplify the logic. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20230116205059.3821738-2-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit f1591a8) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Kemeng Shi <shikemeng@huaweicloud.com> commit 903e86f Commit fbb564a ("lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()") mentioned that "Checking free bits when setting the target bits. Otherwise, it may reuse the busying bits." This commit add check to make sure all masked bits in word before cmpxchg is zero. Then the existing check after cmpxchg to check any zero bit is existing in masked bits in word is redundant. Actually, old value of word before cmpxchg is stored in val and we will filter out busy bits in val by "(get_mask & ~val)" after cmpxchg. So we will not reuse busy bits methioned in commit fbb564a ("lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()"). Revert new-added check to remove redundant check. Fixes: fbb564a ("lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()") Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20230116205059.3821738-3-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 903e86f) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Ming Lei <ming.lei@redhat.com> commit 65f666c Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/65f666c6.failed When called from sbitmap_queue_get(), sbitmap_deferred_clear() may be run with preempt disabled. In RT kernel, spin_lock() can sleep, then warning of "BUG: sleeping function called from invalid context" can be triggered. Fix it by replacing it with raw_spin_lock. Cc: Yang Yang <yang.yang@vivo.com> Fixes: 72d04bd ("sbitmap: fix io hung due to race on sbitmap_word::cleared") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Yang Yang <yang.yang@vivo.com> Link: https://lore.kernel.org/r/20240919021709.511329-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 65f666c) Signed-off-by: Jonathan Maple <jmaple@ciq.com> # Conflicts: # include/linux/sbitmap.h # lib/sbitmap.c
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Yu Kuai <yukuai3@huawei.com> commit 4f1731d Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/4f1731df.failed In __blk_mq_tag_busy/idle(), updating 'active_queues' and calculating 'wake_batch' is not atomic: t1: t2: _blk_mq_tag_busy blk_mq_tag_busy inc active_queues // assume 1->2 inc active_queues // 2 -> 3 blk_mq_update_wake_batch // calculate based on 3 blk_mq_update_wake_batch /* calculate based on 2, while active_queues is actually 3. */ Fix this problem by protecting them wih 'tags->lock', this is not a hot path, so performance should not be concerned. And now that all writers are inside the lock, switch 'actives_queues' from atomic to unsigned int. Fixes: 180dccb ("blk-mq: fix tag_get wait task can't be awakened") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230610023043.2559121-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 4f1731d) Signed-off-by: Jonathan Maple <jmaple@ciq.com> # Conflicts: # block/blk-mq-tag.c # block/blk-mq.h # include/linux/blk-mq.h
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Li Lingfeng <lilingfeng3@huawei.com> commit b313a8c Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/b313a8c8.failed Lockdep reported a warning in Linux version 6.6: [ 414.344659] ================================ [ 414.345155] WARNING: inconsistent lock state [ 414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted [ 414.346221] -------------------------------- [ 414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes: [ 414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0 [ 414.351204] {IN-SOFTIRQ-W} state was registered at: [ 414.351751] lock_acquire+0x18d/0x460 [ 414.352218] _raw_spin_lock_irqsave+0x39/0x60 [ 414.352769] __wake_up_common_lock+0x22/0x60 [ 414.353289] sbitmap_queue_wake_up+0x375/0x4f0 [ 414.353829] sbitmap_queue_clear+0xdd/0x270 [ 414.354338] blk_mq_put_tag+0xdf/0x170 [ 414.354807] __blk_mq_free_request+0x381/0x4d0 [ 414.355335] blk_mq_free_request+0x28b/0x3e0 [ 414.355847] __blk_mq_end_request+0x242/0xc30 [ 414.356367] scsi_end_request+0x2c1/0x830 [ 414.345155] WARNING: inconsistent lock state [ 414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted [ 414.346221] -------------------------------- [ 414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes: [ 414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0 [ 414.351204] {IN-SOFTIRQ-W} state was registered at: [ 414.351751] lock_acquire+0x18d/0x460 [ 414.352218] _raw_spin_lock_irqsave+0x39/0x60 [ 414.352769] __wake_up_common_lock+0x22/0x60 [ 414.353289] sbitmap_queue_wake_up+0x375/0x4f0 [ 414.353829] sbitmap_queue_clear+0xdd/0x270 [ 414.354338] blk_mq_put_tag+0xdf/0x170 [ 414.354807] __blk_mq_free_request+0x381/0x4d0 [ 414.355335] blk_mq_free_request+0x28b/0x3e0 [ 414.355847] __blk_mq_end_request+0x242/0xc30 [ 414.356367] scsi_end_request+0x2c1/0x830 [ 414.356863] scsi_io_completion+0x177/0x1610 [ 414.357379] scsi_complete+0x12f/0x260 [ 414.357856] blk_complete_reqs+0xba/0xf0 [ 414.358338] __do_softirq+0x1b0/0x7a2 [ 414.358796] irq_exit_rcu+0x14b/0x1a0 [ 414.359262] sysvec_call_function_single+0xaf/0xc0 [ 414.359828] asm_sysvec_call_function_single+0x1a/0x20 [ 414.360426] default_idle+0x1e/0x30 [ 414.360873] default_idle_call+0x9b/0x1f0 [ 414.361390] do_idle+0x2d2/0x3e0 [ 414.361819] cpu_startup_entry+0x55/0x60 [ 414.362314] start_secondary+0x235/0x2b0 [ 414.362809] secondary_startup_64_no_verify+0x18f/0x19b [ 414.363413] irq event stamp: 428794 [ 414.363825] hardirqs last enabled at (428793): [<ffffffff816bfd1c>] ktime_get+0x1dc/0x200 [ 414.364694] hardirqs last disabled at (428794): [<ffffffff85470177>] _raw_spin_lock_irq+0x47/0x50 [ 414.365629] softirqs last enabled at (428444): [<ffffffff85474780>] __do_softirq+0x540/0x7a2 [ 414.366522] softirqs last disabled at (428419): [<ffffffff813f65ab>] irq_exit_rcu+0x14b/0x1a0 [ 414.367425] other info that might help us debug this: [ 414.368194] Possible unsafe locking scenario: [ 414.368900] CPU0 [ 414.369225] ---- [ 414.369548] lock(&sbq->ws[i].wait); [ 414.370000] <Interrupt> [ 414.370342] lock(&sbq->ws[i].wait); [ 414.370802] *** DEADLOCK *** [ 414.371569] 5 locks held by kworker/u10:3/1152: [ 414.372088] #0: ffff88810130e938 ((wq_completion)writeback){+.+.}-{0:0}, at: process_scheduled_works+0x357/0x13f0 [ 414.373180] #1: ffff88810201fdb8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x3a3/0x13f0 [ 414.374384] #2: ffffffff86ffbdc0 (rcu_read_lock){....}-{1:2}, at: blk_mq_run_hw_queue+0x637/0xa00 [ 414.375342] #3: ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0 [ 414.376377] #4: ffff888106205a08 (&hctx->dispatch_wait_lock){+.-.}-{2:2}, at: blk_mq_dispatch_rq_list+0x1337/0x1ee0 [ 414.378607] stack backtrace: [ 414.379177] CPU: 0 PID: 1152 Comm: kworker/u10:3 Not tainted 6.6.0-07439-gba2303cacfda #6 [ 414.380032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 414.381177] Workqueue: writeback wb_workfn (flush-253:0) [ 414.381805] Call Trace: [ 414.382136] <TASK> [ 414.382429] dump_stack_lvl+0x91/0xf0 [ 414.382884] mark_lock_irq+0xb3b/0x1260 [ 414.383367] ? __pfx_mark_lock_irq+0x10/0x10 [ 414.383889] ? stack_trace_save+0x8e/0xc0 [ 414.384373] ? __pfx_stack_trace_save+0x10/0x10 [ 414.384903] ? graph_lock+0xcf/0x410 [ 414.385350] ? save_trace+0x3d/0xc70 [ 414.385808] mark_lock.part.20+0x56d/0xa90 [ 414.386317] mark_held_locks+0xb0/0x110 [ 414.386791] ? __pfx_do_raw_spin_lock+0x10/0x10 [ 414.387320] lockdep_hardirqs_on_prepare+0x297/0x3f0 [ 414.387901] ? _raw_spin_unlock_irq+0x28/0x50 [ 414.388422] trace_hardirqs_on+0x58/0x100 [ 414.388917] _raw_spin_unlock_irq+0x28/0x50 [ 414.389422] __blk_mq_tag_busy+0x1d6/0x2a0 [ 414.389920] __blk_mq_get_driver_tag+0x761/0x9f0 [ 414.390899] blk_mq_dispatch_rq_list+0x1780/0x1ee0 [ 414.391473] ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10 [ 414.392070] ? sbitmap_get+0x2b8/0x450 [ 414.392533] ? __blk_mq_get_driver_tag+0x210/0x9f0 [ 414.393095] __blk_mq_sched_dispatch_requests+0xd99/0x1690 [ 414.393730] ? elv_attempt_insert_merge+0x1b1/0x420 [ 414.394302] ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10 [ 414.394970] ? lock_acquire+0x18d/0x460 [ 414.395456] ? blk_mq_run_hw_queue+0x637/0xa00 [ 414.395986] ? __pfx_lock_acquire+0x10/0x10 [ 414.396499] blk_mq_sched_dispatch_requests+0x109/0x190 [ 414.397100] blk_mq_run_hw_queue+0x66e/0xa00 [ 414.397616] blk_mq_flush_plug_list.part.17+0x614/0x2030 [ 414.398244] ? __pfx_blk_mq_flush_plug_list.part.17+0x10/0x10 [ 414.398897] ? writeback_sb_inodes+0x241/0xcc0 [ 414.399429] blk_mq_flush_plug_list+0x65/0x80 [ 414.399957] __blk_flush_plug+0x2f1/0x530 [ 414.400458] ? __pfx___blk_flush_plug+0x10/0x10 [ 414.400999] blk_finish_plug+0x59/0xa0 [ 414.401467] wb_writeback+0x7cc/0x920 [ 414.401935] ? __pfx_wb_writeback+0x10/0x10 [ 414.402442] ? mark_held_locks+0xb0/0x110 [ 414.402931] ? __pfx_do_raw_spin_lock+0x10/0x10 [ 414.403462] ? lockdep_hardirqs_on_prepare+0x297/0x3f0 [ 414.404062] wb_workfn+0x2b3/0xcf0 [ 414.404500] ? __pfx_wb_workfn+0x10/0x10 [ 414.404989] process_scheduled_works+0x432/0x13f0 [ 414.405546] ? __pfx_process_scheduled_works+0x10/0x10 [ 414.406139] ? do_raw_spin_lock+0x101/0x2a0 [ 414.406641] ? assign_work+0x19b/0x240 [ 414.407106] ? lock_is_held_type+0x9d/0x110 [ 414.407604] worker_thread+0x6f2/0x1160 [ 414.408075] ? __kthread_parkme+0x62/0x210 [ 414.408572] ? lockdep_hardirqs_on_prepare+0x297/0x3f0 [ 414.409168] ? __kthread_parkme+0x13c/0x210 [ 414.409678] ? __pfx_worker_thread+0x10/0x10 [ 414.410191] kthread+0x33c/0x440 [ 414.410602] ? __pfx_kthread+0x10/0x10 [ 414.411068] ret_from_fork+0x4d/0x80 [ 414.411526] ? __pfx_kthread+0x10/0x10 [ 414.411993] ret_from_fork_asm+0x1b/0x30 [ 414.412489] </TASK> When interrupt is turned on while a lock holding by spin_lock_irq it throws a warning because of potential deadlock. blk_mq_prep_dispatch_rq blk_mq_get_driver_tag __blk_mq_get_driver_tag __blk_mq_alloc_driver_tag blk_mq_tag_busy -> tag is already busy // failed to get driver tag blk_mq_mark_tag_wait spin_lock_irq(&wq->lock) -> lock A (&sbq->ws[i].wait) __add_wait_queue(wq, wait) -> wait queue active blk_mq_get_driver_tag __blk_mq_tag_busy -> 1) tag must be idle, which means there can't be inflight IO spin_lock_irq(&tags->lock) -> lock B (hctx->tags) spin_unlock_irq(&tags->lock) -> unlock B, turn on interrupt accidentally -> 2) context must be preempt by IO interrupt to trigger deadlock. As shown above, the deadlock is not possible in theory, but the warning still need to be fixed. Fix it by using spin_lock_irqsave to get lockB instead of spin_lock_irq. Fixes: 4f1731d ("blk-mq: fix potential io hang by wrong 'wake_batch'") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240815024736.2040971-1-lilingfeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit b313a8c) Signed-off-by: Jonathan Maple <jmaple@ciq.com> # Conflicts: # block/blk-mq-tag.c
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Kemeng Shi <shikemeng@huaweicloud.com> commit 5c7fa5c After commit 1063973 ("sbitmap: fix batching wakeup"), we may wake up more than one queue for each batch. Just remove stale comment that we wake up only one queue for each batch. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20240115145626.665562-1-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 5c7fa5c) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 cve CVE-2025-38464 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Kuniyuki Iwashima <kuniyu@google.com> commit 667eeab syzbot reported a null-ptr-deref in tipc_conn_close() during netns dismantle. [0] tipc_topsrv_stop() iterates tipc_net(net)->topsrv->conn_idr and calls tipc_conn_close() for each tipc_conn. The problem is that tipc_conn_close() is called after releasing the IDR lock. At the same time, there might be tipc_conn_recv_work() running and it could call tipc_conn_close() for the same tipc_conn and release its last ->kref. Once we release the IDR lock in tipc_topsrv_stop(), there is no guarantee that the tipc_conn is alive. Let's hold the ref before releasing the lock and put the ref after tipc_conn_close() in tipc_topsrv_stop(). [0]: BUG: KASAN: use-after-free in tipc_conn_close+0x122/0x140 net/tipc/topsrv.c:165 Read of size 8 at addr ffff888099305a08 by task kworker/u4:3/435 CPU: 0 PID: 435 Comm: kworker/u4:3 Not tainted 4.19.204-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1fc/0x2ef lib/dump_stack.c:118 print_address_description.cold+0x54/0x219 mm/kasan/report.c:256 kasan_report_error.cold+0x8a/0x1b9 mm/kasan/report.c:354 kasan_report mm/kasan/report.c:412 [inline] __asan_report_load8_noabort+0x88/0x90 mm/kasan/report.c:433 tipc_conn_close+0x122/0x140 net/tipc/topsrv.c:165 tipc_topsrv_stop net/tipc/topsrv.c:701 [inline] tipc_topsrv_exit_net+0x27b/0x5c0 net/tipc/topsrv.c:722 ops_exit_list+0xa5/0x150 net/core/net_namespace.c:153 cleanup_net+0x3b4/0x8b0 net/core/net_namespace.c:553 process_one_work+0x864/0x1570 kernel/workqueue.c:2153 worker_thread+0x64c/0x1130 kernel/workqueue.c:2296 kthread+0x33f/0x460 kernel/kthread.c:259 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415 Allocated by task 23: kmem_cache_alloc_trace+0x12f/0x380 mm/slab.c:3625 kmalloc include/linux/slab.h:515 [inline] kzalloc include/linux/slab.h:709 [inline] tipc_conn_alloc+0x43/0x4f0 net/tipc/topsrv.c:192 tipc_topsrv_accept+0x1b5/0x280 net/tipc/topsrv.c:470 process_one_work+0x864/0x1570 kernel/workqueue.c:2153 worker_thread+0x64c/0x1130 kernel/workqueue.c:2296 kthread+0x33f/0x460 kernel/kthread.c:259 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415 Freed by task 23: __cache_free mm/slab.c:3503 [inline] kfree+0xcc/0x210 mm/slab.c:3822 tipc_conn_kref_release net/tipc/topsrv.c:150 [inline] kref_put include/linux/kref.h:70 [inline] conn_put+0x2cd/0x3a0 net/tipc/topsrv.c:155 process_one_work+0x864/0x1570 kernel/workqueue.c:2153 worker_thread+0x64c/0x1130 kernel/workqueue.c:2296 kthread+0x33f/0x460 kernel/kthread.c:259 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415 The buggy address belongs to the object at ffff888099305a00 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 8 bytes inside of 512-byte region [ffff888099305a00, ffff888099305c00) The buggy address belongs to the page: page:ffffea000264c140 count:1 mapcount:0 mapping:ffff88813bff0940 index:0x0 flags: 0xfff00000000100(slab) raw: 00fff00000000100 ffffea00028b6b88 ffffea0002cd2b08 ffff88813bff0940 raw: 0000000000000000 ffff888099305000 0000000100000006 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888099305900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888099305980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff888099305a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888099305a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888099305b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: c5fa7b3 ("tipc: introduce new TIPC server infrastructure") Reported-by: syzbot+d333febcf8f4bc5f6110@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=27169a847a70550d17be Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech> Link: https://patch.msgid.link/20250702014350.692213-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> (cherry picked from commit 667eeab) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Arvind Sankar <nivedita@alum.mit.edu> commit 7dde67f Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/7dde67f2.failed Add support for the x86 CMDLINE_BOOL and CMDLINE_OVERRIDE configuration options. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200430182843.2510180-11-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit 7dde67f) Signed-off-by: Jonathan Maple <jmaple@ciq.com> # Conflicts: # drivers/firmware/efi/libstub/x86-stub.c
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Arvind Sankar <nivedita@alum.mit.edu> commit 055042b Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/055042be.failed efi_parse_options can fail if it is unable to allocate space for a copy of the command line. Check the return value to make sure it succeeded. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200430182843.2510180-12-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit 055042b) Signed-off-by: Jonathan Maple <jmaple@ciq.com> # Conflicts: # drivers/firmware/efi/libstub/arm-stub.c # drivers/firmware/efi/libstub/x86-stub.c
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Ard Biesheuvel <ardb@kernel.org> commit 15aa8fb Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/15aa8fb8.failed The legacy decompressor has elaborate logic to ensure that the randomized physical placement of the decompressed kernel image does not conflict with any memory reservations, including ones specified on the command line using mem=, memmap=, efi_fake_mem= or hugepages=, which are taken into account by the kernel proper at a later stage. When booting in EFI mode, it is the firmware's job to ensure that the chosen range does not conflict with any memory reservations that it knows about, and this is trivially achieved by using the firmware's memory allocation APIs. That leaves reservations specified on the command line, though, which the firmware knows nothing about, as these regions have no other special significance to the platform. Since commit a1b87d5 ("x86/efistub: Avoid legacy decompressor when doing EFI boot") these reservations are not taken into account when randomizing the physical placement, which may result in conflicts where the memory cannot be reserved by the kernel proper because its own executable image resides there. To avoid having to duplicate or reuse the existing complicated logic, disable physical KASLR entirely when such overrides are specified. These are mostly diagnostic tools or niche features, and physical KASLR (as opposed to virtual KASLR, which is much more important as it affects the memory addresses observed by code executing in the kernel) is something we can live without. Closes: https://lkml.kernel.org/r/FA5F6719-8824-4B04-803E-82990E65E627%40akamai.com Reported-by: Ben Chaney <bchaney@akamai.com> Fixes: a1b87d5 ("x86/efistub: Avoid legacy decompressor when doing EFI boot") Cc: <stable@vger.kernel.org> # v6.1+ Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit 15aa8fb) Signed-off-by: Jonathan Maple <jmaple@ciq.com> # Conflicts: # arch/x86/boot/compressed/eboot.c
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Matthew Wilcox (Oracle) <willy@infradead.org> commit b0b598e The original problem of the overly long list of waiters on a locked page was solved properly by commit 9a1ea43 ("mm: put_and_wait_on_page_locked() while page is migrated"). In the meantime, using bookmarks for the writeback bit can cause livelocks, so we need to stop using them. Link: https://lkml.kernel.org/r/20231010035829.544242-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Bin Lai <sclaibin@gmail.com> Cc: Benjamin Segall <bsegall@google.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit b0b598e) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Victor Nogueira <victor@mojatatu.com> commit ffdde7b Lion's patch [1] revealed an ancient bug in the qdisc API. Whenever a user creates/modifies a qdisc specifying as a parent another qdisc, the qdisc API will, during grafting, detect that the user is not trying to attach to a class and reject. However grafting is performed after qdisc_create (and thus the qdiscs' init callback) is executed. In qdiscs that eventually call qdisc_tree_reduce_backlog during init or change (such as fq, hhf, choke, etc), an issue arises. For example, executing the following commands: sudo tc qdisc add dev lo root handle a: htb default 2 sudo tc qdisc add dev lo parent a: handle beef fq Qdiscs such as fq, hhf, choke, etc unconditionally invoke qdisc_tree_reduce_backlog() in their control path init() or change() which then causes a failure to find the child class; however, that does not stop the unconditional invocation of the assumed child qdisc's qlen_notify with a null class. All these qdiscs make the assumption that class is non-null. The solution is ensure that qdisc_leaf() which looks up the parent class, and is invoked prior to qdisc_create(), should return failure on not finding the class. In this patch, we leverage qdisc_leaf to return ERR_PTRs whenever the parentid doesn't correspond to a class, so that we can detect it earlier on and abort before qdisc_create is called. [1] https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/ Fixes: 5e50da0 ("[NET_SCHED]: Fix endless loops (part 2): "simple" qdiscs") Reported-by: syzbot+d8b58d7b0ad89a678a16@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68663c93.a70a0220.5d25f.0857.GAE@google.com/ Reported-by: syzbot+5eccb463fa89309d8bdc@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68663c94.a70a0220.5d25f.0858.GAE@google.com/ Reported-by: syzbot+1261670bbdefc5485a06@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/686764a5.a00a0220.c7b3.0013.GAE@google.com/ Reported-by: syzbot+15b96fc3aac35468fe77@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/686764a5.a00a0220.c7b3.0014.GAE@google.com/ Reported-by: syzbot+4dadc5aecf80324d5a51@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68679e81.a70a0220.29cf51.0016.GAE@google.com/ Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20250707210801.372995-1-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> (cherry picked from commit ffdde7b) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Anumula Murali Mohan Reddy <anumula@chelsio.com> commit 356983f t4_set_vf_mac_acl() uses pf to set mac addr, but t4vf_get_vf_mac_acl() uses port number to get mac addr, this leads to error when an attempt to set MAC address on VF's of PF2 and PF3. This patch fixes the issue by using port number to set mac address. Fixes: e0cdac6 ("cxgb4vf: configure ports accessible by the VF") Signed-off-by: Anumula Murali Mohan Reddy <anumula@chelsio.com> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241206062014.49414-1-anumula@chelsio.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> (cherry picked from commit 356983f) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 cve CVE-2025-38477 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Xiang Mei <xmei5@asu.edu> commit 5e28d5a A race condition can occur when 'agg' is modified in qfq_change_agg (called during qfq_enqueue) while other threads access it concurrently. For example, qfq_dump_class may trigger a NULL dereference, and qfq_delete_class may cause a use-after-free. This patch addresses the issue by: 1. Moved qfq_destroy_class into the critical section. 2. Added sch_tree_lock protection to qfq_dump_class and qfq_dump_class_stats. Fixes: 462dbc9 ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost") Signed-off-by: Xiang Mei <xmei5@asu.edu> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 5e28d5a) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
… qfq_delete_class jira LE-4066 cve CVE-2025-38477 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Xiang Mei <xmei5@asu.edu> commit cf074ec might_sleep could be trigger in the atomic context in qfq_delete_class. qfq_destroy_class was moved into atomic context locked by sch_tree_lock to avoid a race condition bug on qfq_aggregate. However, might_sleep could be triggered by qfq_destroy_class, which introduced sleeping in atomic context (path: qfq_destroy_class->qdisc_put->__qdisc_destroy->lockdep_unregister_key ->might_sleep). Considering the race is on the qfq_aggregate objects, keeping qfq_rm_from_agg in the lock but moving the left part out can solve this issue. Fixes: 5e28d5a ("net/sched: sch_qfq: Fix race condition on qfq_aggregate") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Xiang Mei <xmei5@asu.edu> Link: https://patch.msgid.link/4a04e0cc-a64b-44e7-9213-2880ed641d77@sabinyo.mountain Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/20250717230128.159766-1-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com> (cherry picked from commit cf074ec) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 cve CVE-2024-42285 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Bart Van Assche <bvanassche@acm.org> commit aee2424 iw_conn_req_handler() associates a new struct rdma_id_private (conn_id) with an existing struct iw_cm_id (cm_id) as follows: conn_id->cm_id.iw = cm_id; cm_id->context = conn_id; cm_id->cm_handler = cma_iw_handler; rdma_destroy_id() frees both the cm_id and the struct rdma_id_private. Make sure that cm_work_handler() does not trigger a use-after-free by only freeing of the struct rdma_id_private after all pending work has finished. Cc: stable@vger.kernel.org Fixes: 59c68ac ("iw_cm: free cm_id resources on the last deref") Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240605145117.397751-6-bvanassche@acm.org Signed-off-by: Leon Romanovsky <leon@kernel.org> (cherry picked from commit aee2424) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 cve CVE-2024-47696 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Zhu Yanjun <yanjun.zhu@linux.dev> commit 86dfdd8 In the commit aee2424 ("RDMA/iwcm: Fix a use-after-free related to destroying CM IDs"), the function flush_workqueue is invoked to flush the work queue iwcm_wq. But at that time, the work queue iwcm_wq was created via the function alloc_ordered_workqueue without the flag WQ_MEM_RECLAIM. Because the current process is trying to flush the whole iwcm_wq, if iwcm_wq doesn't have the flag WQ_MEM_RECLAIM, verify that the current process is not reclaiming memory or running on a workqueue which doesn't have the flag WQ_MEM_RECLAIM as that can break forward-progress guarantee leading to a deadlock. The call trace is as below: [ 125.350876][ T1430] Call Trace: [ 125.356281][ T1430] <TASK> [ 125.361285][ T1430] ? __warn (kernel/panic.c:693) [ 125.367640][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9)) [ 125.375689][ T1430] ? report_bug (lib/bug.c:180 lib/bug.c:219) [ 125.382505][ T1430] ? handle_bug (arch/x86/kernel/traps.c:239) [ 125.388987][ T1430] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) [ 125.395831][ T1430] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621) [ 125.403125][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9)) [ 125.410984][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9)) [ 125.418764][ T1430] __flush_workqueue (kernel/workqueue.c:3970) [ 125.426021][ T1430] ? __pfx___might_resched (kernel/sched/core.c:10151) [ 125.433431][ T1430] ? destroy_cm_id (drivers/infiniband/core/iwcm.c:375) iw_cm [ 125.441209][ T1430] ? __pfx___flush_workqueue (kernel/workqueue.c:3910) [ 125.473900][ T1430] ? _raw_spin_lock_irqsave (arch/x86/include/asm/atomic.h:107 include/linux/atomic/atomic-arch-fallback.h:2170 include/linux/atomic/atomic-instrumented.h:1302 include/asm-generic/qspinlock.h:111 include/linux/spinlock.h:187 include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162) [ 125.473909][ T1430] ? __pfx__raw_spin_lock_irqsave (kernel/locking/spinlock.c:161) [ 125.482537][ T1430] _destroy_id (drivers/infiniband/core/cma.c:2044) rdma_cm [ 125.495072][ T1430] nvme_rdma_free_queue (drivers/nvme/host/rdma.c:656 drivers/nvme/host/rdma.c:650) nvme_rdma [ 125.505827][ T1430] nvme_rdma_reset_ctrl_work (drivers/nvme/host/rdma.c:2180) nvme_rdma [ 125.505831][ T1430] process_one_work (kernel/workqueue.c:3231) [ 125.515122][ T1430] worker_thread (kernel/workqueue.c:3306 kernel/workqueue.c:3393) [ 125.515127][ T1430] ? __pfx_worker_thread (kernel/workqueue.c:3339) [ 125.531837][ T1430] kthread (kernel/kthread.c:389) [ 125.539864][ T1430] ? __pfx_kthread (kernel/kthread.c:342) [ 125.550628][ T1430] ret_from_fork (arch/x86/kernel/process.c:147) [ 125.558840][ T1430] ? __pfx_kthread (kernel/kthread.c:342) [ 125.558844][ T1430] ret_from_fork_asm (arch/x86/entry/entry_64.S:257) [ 125.566487][ T1430] </TASK> [ 125.566488][ T1430] ---[ end trace 0000000000000000 ]--- Fixes: aee2424 ("RDMA/iwcm: Fix a use-after-free related to destroying CM IDs") Link: https://patch.msgid.link/r/20240820113336.19860-1-yanjun.zhu@linux.dev Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202408151633.fc01893c-oliver.sang@intel.com Tested-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> (cherry picked from commit 86dfdd8) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 cve CVE-2025-38211 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> commit 6883b68 The commit 59c68ac ("iw_cm: free cm_id resources on the last deref") simplified cm_id resource management by freeing cm_id once all references to the cm_id were removed. The references are removed either upon completion of iw_cm event handlers or when the application destroys the cm_id. This commit introduced the use-after-free condition where cm_id_private object could still be in use by event handler works during the destruction of cm_id. The commit aee2424 ("RDMA/iwcm: Fix a use-after-free related to destroying CM IDs") addressed this use-after- free by flushing all pending works at the cm_id destruction. However, still another use-after-free possibility remained. It happens with the work objects allocated for each cm_id_priv within alloc_work_entries() during cm_id creation, and subsequently freed in dealloc_work_entries() once all references to the cm_id are removed. If the cm_id's last reference is decremented in the event handler work, the work object for the work itself gets removed, and causes the use- after-free BUG below: BUG: KASAN: slab-use-after-free in __pwq_activate_work+0x1ff/0x250 Read of size 8 at addr ffff88811f9cf800 by task kworker/u16:1/147091 CPU: 2 UID: 0 PID: 147091 Comm: kworker/u16:1 Not tainted 6.15.0-rc2+ #27 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 Workqueue: 0x0 (iw_cm_wq) Call Trace: <TASK> dump_stack_lvl+0x6a/0x90 print_report+0x174/0x554 ? __virt_addr_valid+0x208/0x430 ? __pwq_activate_work+0x1ff/0x250 kasan_report+0xae/0x170 ? __pwq_activate_work+0x1ff/0x250 __pwq_activate_work+0x1ff/0x250 pwq_dec_nr_in_flight+0x8c5/0xfb0 process_one_work+0xc11/0x1460 ? __pfx_process_one_work+0x10/0x10 ? assign_work+0x16c/0x240 worker_thread+0x5ef/0xfd0 ? __pfx_worker_thread+0x10/0x10 kthread+0x3b0/0x770 ? __pfx_kthread+0x10/0x10 ? rcu_is_watching+0x11/0xb0 ? _raw_spin_unlock_irq+0x24/0x50 ? rcu_is_watching+0x11/0xb0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 147416: kasan_save_stack+0x2c/0x50 kasan_save_track+0x10/0x30 __kasan_kmalloc+0xa6/0xb0 alloc_work_entries+0xa9/0x260 [iw_cm] iw_cm_connect+0x23/0x4a0 [iw_cm] rdma_connect_locked+0xbfd/0x1920 [rdma_cm] nvme_rdma_cm_handler+0x8e5/0x1b60 [nvme_rdma] cma_cm_event_handler+0xae/0x320 [rdma_cm] cma_work_handler+0x106/0x1b0 [rdma_cm] process_one_work+0x84f/0x1460 worker_thread+0x5ef/0xfd0 kthread+0x3b0/0x770 ret_from_fork+0x30/0x70 ret_from_fork_asm+0x1a/0x30 Freed by task 147091: kasan_save_stack+0x2c/0x50 kasan_save_track+0x10/0x30 kasan_save_free_info+0x37/0x60 __kasan_slab_free+0x4b/0x70 kfree+0x13a/0x4b0 dealloc_work_entries+0x125/0x1f0 [iw_cm] iwcm_deref_id+0x6f/0xa0 [iw_cm] cm_work_handler+0x136/0x1ba0 [iw_cm] process_one_work+0x84f/0x1460 worker_thread+0x5ef/0xfd0 kthread+0x3b0/0x770 ret_from_fork+0x30/0x70 ret_from_fork_asm+0x1a/0x30 Last potentially related work creation: kasan_save_stack+0x2c/0x50 kasan_record_aux_stack+0xa3/0xb0 __queue_work+0x2ff/0x1390 queue_work_on+0x67/0xc0 cm_event_handler+0x46a/0x820 [iw_cm] siw_cm_upcall+0x330/0x650 [siw] siw_cm_work_handler+0x6b9/0x2b20 [siw] process_one_work+0x84f/0x1460 worker_thread+0x5ef/0xfd0 kthread+0x3b0/0x770 ret_from_fork+0x30/0x70 ret_from_fork_asm+0x1a/0x30 This BUG is reproducible by repeating the blktests test case nvme/061 for the rdma transport and the siw driver. To avoid the use-after-free of cm_id_private work objects, ensure that the last reference to the cm_id is decremented not in the event handler works, but in the cm_id destruction context. For that purpose, move iwcm_deref_id() call from destroy_cm_id() to the callers of destroy_cm_id(). In iw_destroy_cm_id(), call iwcm_deref_id() after flushing the pending works. During the fix work, I noticed that iw_destroy_cm_id() is called from cm_work_handler() and process_event() context. However, the comment of iw_destroy_cm_id() notes that the function "cannot be called by the event thread". Drop the false comment. Closes: https://lore.kernel.org/linux-rdma/r5676e754sv35aq7cdsqrlnvyhiq5zktteaurl7vmfih35efko@z6lay7uypy3c/ Fixes: 59c68ac ("iw_cm: free cm_id resources on the last deref") Cc: stable@vger.kernel.org Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://patch.msgid.link/20250510101036.1756439-1-shinichiro.kawasaki@wdc.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org> (cherry picked from commit 6883b68) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
… counter jira LE-4066 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Thomas Gleixner <tglx@linutronix.de> commit f944ffc For systems on which the performance counter can expire early due to turbo modes the watchdog handler has a safety net in place which validates that since the last watchdog event there has at least 4/5th of the watchdog period elapsed. This works reliably only after the first watchdog event because the per CPU variable which holds the timestamp of the last event is never initialized. So a first spurious event will validate against a timestamp of 0 which results in a delta which is likely to be way over the 4/5 threshold of the period. As this might happen before the first watchdog hrtimer event increments the watchdog counter, this can lead to false positives. Fix this by initializing the timestamp before enabling the hardware event. Reset the rearm counter as well, as that might be non zero after the watchdog was disabled and reenabled. Link: https://lkml.kernel.org/r/87frsfu15a.ffs@tglx Fixes: 7edaeb6 ("kernel/watchdog: Prevent false positives with turbo modes") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit f944ffc) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
jira LE-4066 cve CVE-2025-38332 Rebuild_History Non-Buildable kernel-4.18.0-553.72.1.el8_10 commit-author Daniel Wagner <wagi@kernel.org> commit ae82eaf The strlcat() with FORTIFY support is triggering a panic because it thinks the target buffer will overflow although the correct target buffer size is passed in. Anyway, instead of memset() with 0 followed by a strlcat(), just use memcpy() and ensure that the resulting buffer is NULL terminated. BIOSVersion is only used for the lpfc_printf_log() which expects a properly terminated string. Signed-off-by: Daniel Wagner <wagi@kernel.org> Link: https://lore.kernel.org/r/20250409-fix-lpfc-bios-str-v1-1-05dac9e51e13@kernel.org Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from commit ae82eaf) Signed-off-by: Jonathan Maple <jmaple@ciq.com>
Rebuild_History BUILDABLE Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50% Number of commits in upstream range v4.18~1..kernel-mainline: 567023 Number of commits in rpm: 61 Number of commits matched with upstream: 54 (88.52%) Number of commits in upstream but not in rpm: 566969 Number of commits NOT found in upstream: 7 (11.48%) Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.72.1.el8_10 for kernel-4.18.0-553.72.1.el8_10 Clean Cherry Picks: 35 (64.81%) Empty Cherry Picks: 19 (35.19%) _______________________________ Full Details Located here: ciq/ciq_backports/kernel-4.18.0-553.72.1.el8_10/rebuild.details.txt Includes: * git commit header above * Empty Commits with upstream SHA * RPM ChangeLog Entries that could not be matched Individual Empty Commit failures contained in the same containing directory. The git message for empty commits will have the path for the failed commit. File names are the first 8 characters of the upstream SHA
jdieter
approved these changes
Sep 4, 2025
bmastbergen
approved these changes
Sep 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥌
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
General Process:
src.rpm
4.18.0-553
git cherry-pick
rpmbuild -bp
from corresponding src.rpm.Checking Rebuild Commits for Potentially missing commits:
kernel-4.18.0-553.72.1.el8_10
Build
KselfTest