-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[syzbot] KMSAN: uninit-value in hci_event_packet (3) #2
Comments
BluezTestBot
pushed a commit
that referenced
this issue
Jun 30, 2021
syzbot complained in neigh_reduce(), because rcu_read_lock_bh() is treated differently than rcu_read_lock() WARNING: suspicious RCU usage 5.13.0-rc6-syzkaller #0 Not tainted ----------------------------- include/net/addrconf.h:313 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by kworker/0:0/5: #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline] #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline] #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:617 [inline] #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline] #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x871/0x1600 kernel/workqueue.c:2247 #1: ffffc90000ca7da8 ((work_completion)(&port->wq)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1600 kernel/workqueue.c:2251 #2: ffffffff8bf795c0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1da/0x3130 net/core/dev.c:4180 stack backtrace: CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.13.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events ipvlan_process_multicast Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 __in6_dev_get include/net/addrconf.h:313 [inline] __in6_dev_get include/net/addrconf.h:311 [inline] neigh_reduce drivers/net/vxlan.c:2167 [inline] vxlan_xmit+0x34d5/0x4c30 drivers/net/vxlan.c:2919 __netdev_start_xmit include/linux/netdevice.h:4944 [inline] netdev_start_xmit include/linux/netdevice.h:4958 [inline] xmit_one net/core/dev.c:3654 [inline] dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3670 __dev_queue_xmit+0x2133/0x3130 net/core/dev.c:4246 ipvlan_process_multicast+0xa99/0xd70 drivers/net/ipvlan/ipvlan_core.c:287 process_one_work+0x98d/0x1600 kernel/workqueue.c:2276 worker_thread+0x64c/0x1120 kernel/workqueue.c:2422 kthread+0x3b1/0x4a0 kernel/kthread.c:313 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Fixes: f564f45 ("vxlan: add ipv6 proxy support") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
BluezTestBot
pushed a commit
that referenced
this issue
Jun 30, 2021
Patch was based on wrong presumption that be_poll can be called only from bh context. It reintroducing old regression (also reverted) and causing deadlock when we use netconsole with benet in bonding. Old revert: commit 072a9c4 ("netpoll: revert 6bdb7fe and fix be_poll() instead") [ 331.269715] bond0: (slave enp0s7f0): Releasing backup interface [ 331.270121] CPU: 4 PID: 1479 Comm: ifenslave Not tainted 5.13.0-rc7+ #2 [ 331.270122] Call Trace: [ 331.270122] [c00000001789f200] [c0000000008c505c] dump_stack+0x100/0x174 (unreliable) [ 331.270124] [c00000001789f240] [c008000001238b9c] be_poll+0x64/0xe90 [be2net] [ 331.270125] [c00000001789f330] [c000000000d1e6e4] netpoll_poll_dev+0x174/0x3d0 [ 331.270127] [c00000001789f400] [c008000001bc167c] bond_poll_controller+0xb4/0x130 [bonding] [ 331.270128] [c00000001789f450] [c000000000d1e624] netpoll_poll_dev+0xb4/0x3d0 [ 331.270129] [c00000001789f520] [c000000000d1ed88] netpoll_send_skb+0x448/0x470 [ 331.270130] [c00000001789f5d0] [c0080000011f14f8] write_msg+0x180/0x1b0 [netconsole] [ 331.270131] [c00000001789f640] [c000000000230c0c] console_unlock+0x54c/0x790 [ 331.270132] [c00000001789f7b0] [c000000000233098] vprintk_emit+0x2d8/0x450 [ 331.270133] [c00000001789f810] [c000000000234758] vprintk+0xc8/0x270 [ 331.270134] [c00000001789f850] [c000000000233c28] printk+0x40/0x54 [ 331.270135] [c00000001789f870] [c000000000ccf908] __netdev_printk+0x150/0x198 [ 331.270136] [c00000001789f910] [c000000000ccfdb4] netdev_info+0x68/0x94 [ 331.270137] [c00000001789f950] [c008000001bcbd70] __bond_release_one+0x188/0x6b0 [bonding] [ 331.270138] [c00000001789faa0] [c008000001bcc6f4] bond_do_ioctl+0x42c/0x490 [bonding] [ 331.270139] [c00000001789fb60] [c000000000d0d17c] dev_ifsioc+0x17c/0x400 [ 331.270140] [c00000001789fbc0] [c000000000d0db70] dev_ioctl+0x390/0x890 [ 331.270141] [c00000001789fc10] [c000000000c7c76c] sock_do_ioctl+0xac/0x1b0 [ 331.270142] [c00000001789fc90] [c000000000c7ffac] sock_ioctl+0x31c/0x6e0 [ 331.270143] [c00000001789fd60] [c0000000005b9728] sys_ioctl+0xf8/0x150 [ 331.270145] [c00000001789fdb0] [c0000000000336c0] system_call_exception+0x160/0x2f0 [ 331.270146] [c00000001789fe10] [c00000000000d35c] system_call_common+0xec/0x278 [ 331.270147] --- interrupt: c00 at 0x7fffa6c6ec00 [ 331.270147] NIP: 00007fffa6c6ec00 LR: 0000000105c4185c CTR: 0000000000000000 [ 331.270148] REGS: c00000001789fe80 TRAP: 0c00 Not tainted (5.13.0-rc7+) [ 331.270148] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 28000428 XER: 00000000 [ 331.270155] IRQMASK: 0 [ 331.270156] GPR00: 0000000000000036 00007fffd494d5b0 00007fffa6d57100 0000000000000003 [ 331.270158] GPR04: 0000000000008991 00007fffd494d6d0 0000000000000008 00007fffd494f28c [ 331.270161] GPR08: 0000000000000003 0000000000000000 0000000000000000 0000000000000000 [ 331.270164] GPR12: 0000000000000000 00007fffa6dfa220 0000000000000000 0000000000000000 [ 331.270167] GPR16: 0000000105c44880 0000000000000000 0000000105c60088 0000000105c60318 [ 331.270170] GPR20: 0000000105c602c0 0000000105c44560 0000000000000000 0000000000000000 [ 331.270172] GPR24: 00007fffd494dc50 00007fffd494d6a8 0000000105c60008 00007fffd494d6d0 [ 331.270175] GPR28: 00007fffd494f27e 0000000105c6026c 00007fffd494f284 0000000000000000 [ 331.270178] NIP [00007fffa6c6ec00] 0x7fffa6c6ec00 [ 331.270178] LR [0000000105c4185c] 0x105c4185c [ 331.270179] --- interrupt: c00 This reverts commit d0d006a. Fixes: d0d006a ("be2net: disable bh with spin_lock in be_process_mcc") Signed-off-by: Petr Oros <poros@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
BluezTestBot
pushed a commit
that referenced
this issue
Aug 27, 2021
In a future patch, calls to bh_lock_sock in sco.c should be replaced by lock_sock now that none of the functions are run in IRQ context. However, doing so results in a circular locking dependency: ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc4-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor.2/14867 is trying to acquire lock: ffff88803e3c1120 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1613 [inline] ffff88803e3c1120 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at: sco_conn_del+0x12a/0x2a0 net/bluetooth/sco.c:191 but task is already holding lock: ffffffff8d2dc7c8 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_disconn_cfm include/net/bluetooth/hci_core.h:1497 [inline] ffffffff8d2dc7c8 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_hash_flush+0xda/0x260 net/bluetooth/hci_conn.c:1608 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (hci_cb_list_lock){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:959 [inline] __mutex_lock+0x12a/0x10a0 kernel/locking/mutex.c:1104 hci_connect_cfm include/net/bluetooth/hci_core.h:1482 [inline] hci_remote_features_evt net/bluetooth/hci_event.c:3263 [inline] hci_event_packet+0x2f4d/0x7c50 net/bluetooth/hci_event.c:6240 hci_rx_work+0x4f8/0xd30 net/bluetooth/hci_core.c:5122 process_one_work+0x98d/0x1630 kernel/workqueue.c:2276 worker_thread+0x658/0x11f0 kernel/workqueue.c:2422 kthread+0x3e5/0x4d0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 -> #1 (&hdev->lock){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:959 [inline] __mutex_lock+0x12a/0x10a0 kernel/locking/mutex.c:1104 sco_connect net/bluetooth/sco.c:245 [inline] sco_sock_connect+0x227/0xa10 net/bluetooth/sco.c:601 __sys_connect_file+0x155/0x1a0 net/socket.c:1879 __sys_connect+0x161/0x190 net/socket.c:1896 __do_sys_connect net/socket.c:1906 [inline] __se_sys_connect net/socket.c:1903 [inline] __x64_sys_connect+0x6f/0xb0 net/socket.c:1903 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}: check_prev_add kernel/locking/lockdep.c:3051 [inline] check_prevs_add kernel/locking/lockdep.c:3174 [inline] validate_chain kernel/locking/lockdep.c:3789 [inline] __lock_acquire+0x2a07/0x54a0 kernel/locking/lockdep.c:5015 lock_acquire kernel/locking/lockdep.c:5625 [inline] lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590 lock_sock_nested+0xca/0x120 net/core/sock.c:3170 lock_sock include/net/sock.h:1613 [inline] sco_conn_del+0x12a/0x2a0 net/bluetooth/sco.c:191 sco_disconn_cfm+0x71/0xb0 net/bluetooth/sco.c:1202 hci_disconn_cfm include/net/bluetooth/hci_core.h:1500 [inline] hci_conn_hash_flush+0x127/0x260 net/bluetooth/hci_conn.c:1608 hci_dev_do_close+0x528/0x1130 net/bluetooth/hci_core.c:1778 hci_unregister_dev+0x1c0/0x5a0 net/bluetooth/hci_core.c:4015 vhci_release+0x70/0xe0 drivers/bluetooth/hci_vhci.c:340 __fput+0x288/0x920 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 exit_task_work include/linux/task_work.h:32 [inline] do_exit+0xbd4/0x2a60 kernel/exit.c:825 do_group_exit+0x125/0x310 kernel/exit.c:922 get_signal+0x47f/0x2160 kernel/signal.c:2808 arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:865 handle_signal_work kernel/entry/common.c:148 [inline] exit_to_user_mode_loop kernel/entry/common.c:172 [inline] exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:209 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:302 ret_from_fork+0x15/0x30 arch/x86/entry/entry_64.S:288 other info that might help us debug this: Chain exists of: sk_lock-AF_BLUETOOTH-BTPROTO_SCO --> &hdev->lock --> hci_cb_list_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(hci_cb_list_lock); lock(&hdev->lock); lock(hci_cb_list_lock); lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO); *** DEADLOCK *** The issue is that the lock hierarchy should go from &hdev->lock --> hci_cb_list_lock --> sk_lock-AF_BLUETOOTH-BTPROTO_SCO. For example, one such call trace is: hci_dev_do_close(): hci_dev_lock(); hci_conn_hash_flush(): hci_disconn_cfm(): mutex_lock(&hci_cb_list_lock); sco_disconn_cfm(): sco_conn_del(): lock_sock(sk); However, in sco_sock_connect, we call lock_sock before calling hci_dev_lock inside sco_connect, thus inverting the lock hierarchy. We fix this by pulling the call to hci_dev_lock out from sco_connect. Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
BluezTestBot
pushed a commit
that referenced
this issue
Nov 24, 2021
Add a convenience function, folio_inode() that will get the host inode from a folio's mapping. Changes: ver #3: - Fix mistake in function description[2]. ver #2: - Fix contradiction between doc and implementation by disallowing use with swap caches[1]. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dominique Martinet <asmadeus@codewreck.org> Tested-by: kafs-testing@auristor.com Link: https://lore.kernel.org/r/YST8OcVNy02Rivbm@casper.infradead.org/ [1] Link: https://lore.kernel.org/r/YYKLkBwQdtn4ja+i@casper.infradead.org/ [2] Link: https://lore.kernel.org/r/162880453171.3369675.3704943108660112470.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/162981151155.1901565.7010079316994382707.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/163005744370.2472992.18324470937328925723.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163584184628.4023316.9386282630968981869.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/163649325519.309189.15072332908703129455.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/163657850401.834781.1031963517399283294.stgit@warthog.procyon.org.uk/ # v5
BluezTestBot
pushed a commit
that referenced
this issue
Nov 24, 2021
Convert the netfs helper library to use folios throughout, convert the 9p and afs filesystems to use folios in their file I/O paths and convert the ceph filesystem to use just enough folios to compile. With these changes, afs passes -g quick xfstests. Changes ======= ver #5: - Got rid of folio_end{io,_read,_write}() and inlined the stuff it does instead (Willy decided he didn't want this after all). ver #4: - Fixed a bug in afs_redirty_page() whereby it didn't set the next page index in the loop and returned too early. - Simplified a check in v9fs_vfs_write_folio_locked()[1]. - Undid a change to afs_symlink_readpage()[1]. - Used offset_in_folio() in afs_write_end()[1]. - Changed from using page_endio() to folio_end{io,_read,_write}()[1]. ver #2: - Add 9p foliation. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dominique Martinet <asmadeus@codewreck.org> Tested-by: kafs-testing@auristor.com cc: Matthew Wilcox (Oracle) <willy@infradead.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Dominique Martinet <asmadeus@codewreck.org> cc: v9fs-developer@lists.sourceforge.net cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/YYKa3bfQZxK5/wDN@casper.infradead.org/ [1] Link: https://lore.kernel.org/r/2408234.1628687271@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/162877311459.3085614.10601478228012245108.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/162981153551.1901565.3124454657133703341.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/163005745264.2472992.9852048135392188995.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163584187452.4023316.500389675405550116.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/163649328026.309189.1124218109373941936.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/163657852454.834781.9265101983152100556.stgit@warthog.procyon.org.uk/ # v5
BluezTestBot
pushed a commit
that referenced
this issue
Nov 24, 2021
The exit function fixes a memory leak with the src field as detected by leak sanitizer. An example of which is: Indirect leak of 25133184 byte(s) in 207 object(s) allocated from: #0 0x7f199ecfe987 in __interceptor_calloc libsanitizer/asan/asan_malloc_linux.cpp:154 #1 0x55defe638224 in annotated_source__alloc_histograms util/annotate.c:803 #2 0x55defe6397e4 in symbol__hists util/annotate.c:952 #3 0x55defe639908 in symbol__inc_addr_samples util/annotate.c:968 #4 0x55defe63aa29 in hist_entry__inc_addr_samples util/annotate.c:1119 #5 0x55defe499a79 in hist_iter__report_callback tools/perf/builtin-report.c:182 #6 0x55defe7a859d in hist_entry_iter__add util/hist.c:1236 #7 0x55defe49aa63 in process_sample_event tools/perf/builtin-report.c:315 #8 0x55defe731bc8 in evlist__deliver_sample util/session.c:1473 #9 0x55defe731e38 in machines__deliver_event util/session.c:1510 #10 0x55defe732a23 in perf_session__deliver_event util/session.c:1590 #11 0x55defe72951e in ordered_events__deliver_event util/session.c:183 #12 0x55defe740082 in do_flush util/ordered-events.c:244 #13 0x55defe7407cb in __ordered_events__flush util/ordered-events.c:323 #14 0x55defe740a61 in ordered_events__flush util/ordered-events.c:341 #15 0x55defe73837f in __perf_session__process_events util/session.c:2390 #16 0x55defe7385ff in perf_session__process_events util/session.c:2420 ... Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin Liška <mliska@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20211112035124.94327-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
BluezTestBot
pushed a commit
that referenced
this issue
Nov 24, 2021
Often some test cases like btrfs/161 trigger lockdep splats that complain about possible unsafe lock scenario due to the fact that during mount, when reading the chunk tree we end up calling blkdev_get_by_path() while holding a read lock on a leaf of the chunk tree. That produces a lockdep splat like the following: [ 3653.683975] ====================================================== [ 3653.685148] WARNING: possible circular locking dependency detected [ 3653.686301] 5.15.0-rc7-btrfs-next-103 #1 Not tainted [ 3653.687239] ------------------------------------------------------ [ 3653.688400] mount/447465 is trying to acquire lock: [ 3653.689320] ffff8c6b0c76e528 (&disk->open_mutex){+.+.}-{3:3}, at: blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.691054] but task is already holding lock: [ 3653.692155] ffff8c6b0a9f39e0 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs] [ 3653.693978] which lock already depends on the new lock. [ 3653.695510] the existing dependency chain (in reverse order) is: [ 3653.696915] -> #3 (btrfs-chunk-00){++++}-{3:3}: [ 3653.698053] down_read_nested+0x4b/0x140 [ 3653.698893] __btrfs_tree_read_lock+0x24/0x110 [btrfs] [ 3653.699988] btrfs_read_lock_root_node+0x31/0x40 [btrfs] [ 3653.701205] btrfs_search_slot+0x537/0xc00 [btrfs] [ 3653.702234] btrfs_insert_empty_items+0x32/0x70 [btrfs] [ 3653.703332] btrfs_init_new_device+0x563/0x15b0 [btrfs] [ 3653.704439] btrfs_ioctl+0x2110/0x3530 [btrfs] [ 3653.705405] __x64_sys_ioctl+0x83/0xb0 [ 3653.706215] do_syscall_64+0x3b/0xc0 [ 3653.706990] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.708040] -> #2 (sb_internal#2){.+.+}-{0:0}: [ 3653.708994] lock_release+0x13d/0x4a0 [ 3653.709533] up_write+0x18/0x160 [ 3653.710017] btrfs_sync_file+0x3f3/0x5b0 [btrfs] [ 3653.710699] __loop_update_dio+0xbd/0x170 [loop] [ 3653.711360] lo_ioctl+0x3b1/0x8a0 [loop] [ 3653.711929] block_ioctl+0x48/0x50 [ 3653.712442] __x64_sys_ioctl+0x83/0xb0 [ 3653.712991] do_syscall_64+0x3b/0xc0 [ 3653.713519] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.714233] -> #1 (&lo->lo_mutex){+.+.}-{3:3}: [ 3653.715026] __mutex_lock+0x92/0x900 [ 3653.715648] lo_open+0x28/0x60 [loop] [ 3653.716275] blkdev_get_whole+0x28/0x90 [ 3653.716867] blkdev_get_by_dev.part.0+0x142/0x320 [ 3653.717537] blkdev_open+0x5e/0xa0 [ 3653.718043] do_dentry_open+0x163/0x390 [ 3653.718604] path_openat+0x3f0/0xa80 [ 3653.719128] do_filp_open+0xa9/0x150 [ 3653.719652] do_sys_openat2+0x97/0x160 [ 3653.720197] __x64_sys_openat+0x54/0x90 [ 3653.720766] do_syscall_64+0x3b/0xc0 [ 3653.721285] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.721986] -> #0 (&disk->open_mutex){+.+.}-{3:3}: [ 3653.722775] __lock_acquire+0x130e/0x2210 [ 3653.723348] lock_acquire+0xd7/0x310 [ 3653.723867] __mutex_lock+0x92/0x900 [ 3653.724394] blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.725041] blkdev_get_by_path+0xb8/0xd0 [ 3653.725614] btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs] [ 3653.726332] open_fs_devices+0xd7/0x2c0 [btrfs] [ 3653.726999] btrfs_read_chunk_tree+0x3ad/0x870 [btrfs] [ 3653.727739] open_ctree+0xb8e/0x17bf [btrfs] [ 3653.728384] btrfs_mount_root.cold+0x12/0xde [btrfs] [ 3653.729130] legacy_get_tree+0x30/0x50 [ 3653.729676] vfs_get_tree+0x28/0xc0 [ 3653.730192] vfs_kern_mount.part.0+0x71/0xb0 [ 3653.730800] btrfs_mount+0x11d/0x3a0 [btrfs] [ 3653.731427] legacy_get_tree+0x30/0x50 [ 3653.731970] vfs_get_tree+0x28/0xc0 [ 3653.732486] path_mount+0x2d4/0xbe0 [ 3653.732997] __x64_sys_mount+0x103/0x140 [ 3653.733560] do_syscall_64+0x3b/0xc0 [ 3653.734080] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.734782] other info that might help us debug this: [ 3653.735784] Chain exists of: &disk->open_mutex --> sb_internal#2 --> btrfs-chunk-00 [ 3653.737123] Possible unsafe locking scenario: [ 3653.737865] CPU0 CPU1 [ 3653.738435] ---- ---- [ 3653.739007] lock(btrfs-chunk-00); [ 3653.739449] lock(sb_internal#2); [ 3653.740193] lock(btrfs-chunk-00); [ 3653.740955] lock(&disk->open_mutex); [ 3653.741431] *** DEADLOCK *** [ 3653.742176] 3 locks held by mount/447465: [ 3653.742739] #0: ffff8c6acf85c0e8 (&type->s_umount_key#44/1){+.+.}-{3:3}, at: alloc_super+0xd5/0x3b0 [ 3653.744114] #1: ffffffffc0b28f70 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x59/0x870 [btrfs] [ 3653.745563] #2: ffff8c6b0a9f39e0 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs] [ 3653.747066] stack backtrace: [ 3653.747723] CPU: 4 PID: 447465 Comm: mount Not tainted 5.15.0-rc7-btrfs-next-103 #1 [ 3653.748873] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 3653.750592] Call Trace: [ 3653.750967] dump_stack_lvl+0x57/0x72 [ 3653.751526] check_noncircular+0xf3/0x110 [ 3653.752136] ? stack_trace_save+0x4b/0x70 [ 3653.752748] __lock_acquire+0x130e/0x2210 [ 3653.753356] lock_acquire+0xd7/0x310 [ 3653.753898] ? blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.754596] ? lock_is_held_type+0xe8/0x140 [ 3653.755125] ? blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.755729] ? blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.756338] __mutex_lock+0x92/0x900 [ 3653.756794] ? blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.757400] ? do_raw_spin_unlock+0x4b/0xa0 [ 3653.757930] ? _raw_spin_unlock+0x29/0x40 [ 3653.758437] ? bd_prepare_to_claim+0x129/0x150 [ 3653.758999] ? trace_module_get+0x2b/0xd0 [ 3653.759508] ? try_module_get.part.0+0x50/0x80 [ 3653.760072] blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.760661] ? devcgroup_check_permission+0xc1/0x1f0 [ 3653.761288] blkdev_get_by_path+0xb8/0xd0 [ 3653.761797] btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs] [ 3653.762454] open_fs_devices+0xd7/0x2c0 [btrfs] [ 3653.763055] ? clone_fs_devices+0x8f/0x170 [btrfs] [ 3653.763689] btrfs_read_chunk_tree+0x3ad/0x870 [btrfs] [ 3653.764370] ? kvm_sched_clock_read+0x14/0x40 [ 3653.764922] open_ctree+0xb8e/0x17bf [btrfs] [ 3653.765493] ? super_setup_bdi_name+0x79/0xd0 [ 3653.766043] btrfs_mount_root.cold+0x12/0xde [btrfs] [ 3653.766780] ? rcu_read_lock_sched_held+0x3f/0x80 [ 3653.767488] ? kfree+0x1f2/0x3c0 [ 3653.767979] legacy_get_tree+0x30/0x50 [ 3653.768548] vfs_get_tree+0x28/0xc0 [ 3653.769076] vfs_kern_mount.part.0+0x71/0xb0 [ 3653.769718] btrfs_mount+0x11d/0x3a0 [btrfs] [ 3653.770381] ? rcu_read_lock_sched_held+0x3f/0x80 [ 3653.771086] ? kfree+0x1f2/0x3c0 [ 3653.771574] legacy_get_tree+0x30/0x50 [ 3653.772136] vfs_get_tree+0x28/0xc0 [ 3653.772673] path_mount+0x2d4/0xbe0 [ 3653.773201] __x64_sys_mount+0x103/0x140 [ 3653.773793] do_syscall_64+0x3b/0xc0 [ 3653.774333] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.775094] RIP: 0033:0x7f648bc45aaa This happens because through btrfs_read_chunk_tree(), which is called only during mount, ends up acquiring the mutex open_mutex of a block device while holding a read lock on a leaf of the chunk tree while other paths need to acquire other locks before locking extent buffers of the chunk tree. Since at mount time when we call btrfs_read_chunk_tree() we know that we don't have other tasks running in parallel and modifying the chunk tree, we can simply skip locking of chunk tree extent buffers. So do that and move the assertion that checks the fs is not yet mounted to the top block of btrfs_read_chunk_tree(), with a comment before doing it. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
BluezTestBot
pushed a commit
that referenced
this issue
Nov 24, 2021
Ido Schimmel says: ==================== mlxsw: Various updates Patch #1 removes deadcode reported by Coverity. Patch #2 adds a shutdown method in the PCI driver to ensure the kexeced kernel starts working with a device that is in a sane state. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
BluezTestBot
pushed a commit
that referenced
this issue
Jan 31, 2022
Tony Lu says: ==================== net/smc: Improvements for TCP_CORK and sendfile() Currently, SMC use default implement for syscall sendfile() [1], which is wildly used in nginx and big data sences. Usually, applications use sendfile() with TCP_CORK: fstat(20, {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0 setsockopt(19, SOL_TCP, TCP_CORK, [1], 4) = 0 writev(19, [{iov_base="HTTP/1.1 200 OK\r\nServer: nginx/1"..., iov_len=240}], 1) = 240 sendfile(19, 20, [0] => [4096], 4096) = 4096 close(20) = 0 setsockopt(19, SOL_TCP, TCP_CORK, [0], 4) = 0 The above is an example of Nginx, when sendfile() on, Nginx first enables TCP_CORK, write headers, the data will not be sent. Then call sendfile(), it reads file and write to sndbuf. When TCP_CORK is cleared, all pending data is sent out. The performance of the default implement of sendfile is lower than when it is off. After investigation, it shows two parts to improve: - unnecessary lock contention of delayed work - less data per send than when sendfile off Patch #1 tries to reduce lock_sock() contention in smc_tx_work(). Patch #2 removes timed work for corking, and let applications control it. See TCP_CORK [2] MSG_MORE [3]. Patch #3 adds MSG_SENDPAGE_NOTLAST for corking more data when sendfile(). Test environments: - CPU Intel Xeon Platinum 8 core, mem 32 GiB, nic Mellanox CX4 - socket sndbuf / rcvbuf: 16384 / 131072 bytes - server: smc_run nginx - client: smc_run ./wrk -c 100 -t 2 -d 30 http://192.168.100.1:8080/4k.html - payload: 4KB local disk file Items QPS sendfile off 272477.10 sendfile on (orig) 223622.79 sendfile on (this) 395847.21 This benchmark shows +45.28% improvement compared with sendfile off, and +77.02% compared with original sendfile implement. [1] https://man7.org/linux/man-pages/man2/sendfile.2.html [2] https://linux.die.net/man/7/tcp [3] https://man7.org/linux/man-pages/man2/send.2.html ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
BluezTestBot
pushed a commit
that referenced
this issue
Mar 5, 2022
…ux/kernel/git/arm64/linux Will Deacon says: ==================== On Tue, Feb 22, 2022 at 10:38:02PM +0000, Will Deacon wrote: > On Thu, 17 Feb 2022 15:22:28 +0800, Hou Tao wrote: > > Atomics support in bpf has already been done by "Atomics for eBPF" > > patch series [1], but it only adds support for x86, and this patchset > > adds support for arm64. > > > > Patch #1 & patch #2 are arm64 related. Patch #1 moves the common used > > macro AARCH64_BREAK_FAULT into insn-def.h for insn.h. Patch #2 adds > > necessary encoder helpers for atomic operations. > > > > [...] > > Applied to arm64 (for-next/insn), thanks! > > [1/4] arm64: move AARCH64_BREAK_FAULT into insn-def.h > https://git.kernel.org/arm64/c/97e58e395e9c > [2/4] arm64: insn: add encoders for atomic operations > https://git.kernel.org/arm64/c/fa1114d9eba5 Daniel -- let's give this a day or so in -next, then if nothing catches fire you're more than welcome to pull this branch as a base for the rest of the series. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220222224211.GB16976@willie-the-truck
BluezTestBot
pushed a commit
that referenced
this issue
Apr 1, 2022
The per-channel data is available directly in the driver data struct. So use it without making use of pwm_[gs]et_chip_data(). The relevant change introduced by this patch to lpc18xx_pwm_disable() at the assembler level (for an arm lpc18xx_defconfig build) is: push {r3, r4, r5, lr} mov r4, r0 mov r0, r1 mov r5, r1 bl 0 <pwm_get_chip_data> ldr r3, [r0, #0] changes to ldr r3, [r1, #8] push {r4, lr} add.w r3, r0, r3, lsl #2 ldr r3, [r3, #92] ; 0x5c So this reduces stack usage, has an improved runtime behavior because of better pipeline usage, doesn't branch to an external function and the generated code is a bit smaller occupying less memory. The codesize of lpc18xx_pwm_probe() is reduced by 32 bytes. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 5, 2022
Adjust helper function names and comments after mass rename of struct netfs_read_*request to struct netfs_io_*request. Changes ======= ver #2) - Make the changes in the docs also. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/164622992433.3564931.6684311087845150271.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678196111.1200972.5001114956865989528.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692892567.2099075.13895804222087028813.stgit@warthog.procyon.org.uk/ # v3
BluezTestBot
pushed a commit
that referenced
this issue
Apr 5, 2022
Pass start and len to the rreq allocator. This should ensure that the fields are set so that ->init_request() can use them. Also add a parameter to indicates the origin of the request. Ceph can use this to tell whether to get caps. Changes ======= ver #3) - Change the author to me as Jeff feels that most of the patch is my changes now. ver #2) - Show the request origin in the netfs_rreq tracepoint. Signed-off-by: Jeff Layton <jlayton@kernel.org> Co-developed-by: David Howells <dhowells@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/164622989020.3564931.17517006047854958747.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678208569.1200972.12153682697842916557.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692904155.2099075.14717645623034355995.stgit@warthog.procyon.org.uk/ # v3
BluezTestBot
pushed a commit
that referenced
this issue
Apr 5, 2022
Add a netfs_i_context struct that should be included in the network filesystem's own inode struct wrapper, directly after the VFS's inode struct, e.g.: struct my_inode { struct { /* These must be contiguous */ struct inode vfs_inode; struct netfs_i_context netfs_ctx; }; }; The netfs_i_context struct so far contains a single field for the network filesystem to use - the cache cookie: struct netfs_i_context { ... struct fscache_cookie *cache; }; Three functions are provided to help with this: (1) void netfs_i_context_init(struct inode *inode, const struct netfs_request_ops *ops); Initialise the netfs context and set the operations. (2) struct netfs_i_context *netfs_i_context(struct inode *inode); Find the netfs context from the VFS inode. (3) struct inode *netfs_inode(struct netfs_i_context *ctx); Find the VFS inode from the netfs context. Changes ======= ver #4) - Fix netfs_is_cache_enabled() to check cookie->cache_priv to see if a cache is present[3]. - Fix netfs_skip_folio_read() to zero out all of the page, not just some of it[3]. ver #3) - Split out the bit to move ceph cap-getting on readahead into ceph_init_request()[1]. - Stick in a comment to the netfs inode structs indicating the contiguity requirements[2]. ver #2) - Adjust documentation to match. - Use "#if IS_ENABLED()" in netfs_i_cookie(), not "#ifdef". - Move the cap check from ceph_readahead() to ceph_init_request() to be called from netfslib. - Remove ceph_readahead() and use netfs_readahead() directly instead. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/8af0d47f17d89c06bbf602496dd845f2b0bf25b3.camel@kernel.org/ [1] Link: https://lore.kernel.org/r/beaf4f6a6c2575ed489adb14b257253c868f9a5c.camel@kernel.org/ [2] Link: https://lore.kernel.org/r/3536452.1647421585@warthog.procyon.org.uk/ [3] Link: https://lore.kernel.org/r/164622984545.3564931.15691742939278418580.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678213320.1200972.16807551936267647470.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692909854.2099075.9535537286264248057.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/306388.1647595110@warthog.procyon.org.uk/ # v4
BluezTestBot
pushed a commit
that referenced
this issue
Apr 5, 2022
Add a function to do the steps needed to begin a read request, allowing this code to be removed from several other functions and consolidated. Changes ======= ver #2) - Move before the unstaticking patch so that some functions can be left static. - Set uninitialised return code in netfs_begin_read()[1][2]. - Fixed a refleak caused by non-removal of a get from netfs_write_begin() when the request submission code got moved to netfs_begin_read(). - Use INIT_WORK() to (re-)init the request work_struct[3]. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/20220303163826.1120936-1-nathan@kernel.org/ [1] Link: https://lore.kernel.org/r/20220303235647.1297171-1-colin.i.king@gmail.com/ [2] Link: https://lore.kernel.org/r/9d69be49081bccff44260e4c6e0049c63d6d04a1.camel@redhat.com/ [3] Link: https://lore.kernel.org/r/164623004355.3564931.7275693529042495641.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678214287.1200972.16734134007649832160.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692911113.2099075.1060868473229451371.stgit@warthog.procyon.org.uk/ # v3
BluezTestBot
pushed a commit
that referenced
this issue
Apr 5, 2022
Rename netfs_rreq_unlock() to netfs_rreq_unlock_folios() to make it sound less like it's dropping a lock on an netfs_io_request struct. Remove the 'static' marker on netfs_rreq_unlock_folios() and declaring it in internal.h preparatory to splitting the file. Changes ======= ver #2) - Slide this patch to after the one adding netfs_begin_read(). - As a consequence, don't need to unstatic so many functions. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/164623002861.3564931.17340149482236413375.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678215208.1200972.9761906209395002182.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692912709.2099075.4349905992838317797.stgit@warthog.procyon.org.uk/ # v3
BluezTestBot
pushed a commit
that referenced
this issue
Apr 5, 2022
Rename the read_helper.c file to io.c before splitting out the buffered read functions and some other bits. Changes ======= ver #2) - Rename read_helper.c before splitting. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/164678216109.1200972.16567696909952495832.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692918076.2099075.8120961172717347610.stgit@warthog.procyon.org.uk/ # v3
BluezTestBot
pushed a commit
that referenced
this issue
Apr 5, 2022
Split fs/netfs/read_helper.c into two pieces, one to deal with buffered writes and one to deal with the I/O mechanism. Changes ======= ver #2) - Add kdoc reference to new file. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/164623005586.3564931.6149556072728481767.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678217075.1200972.5101072043126828757.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692919953.2099075.7156989585513833046.stgit@warthog.procyon.org.uk/ # v3
BluezTestBot
pushed a commit
that referenced
this issue
Apr 5, 2022
We've got a mess on our hands. 1. xfs_trans_commit() cannot cancel transactions because the mount is shut down - that causes dirty, aborted, unlogged log items to sit unpinned in memory and potentially get written to disk before the log is shut down. Hence xfs_trans_commit() can only abort transactions when xlog_is_shutdown() is true. 2. xfs_force_shutdown() is used in places to cause the current modification to be aborted via xfs_trans_commit() because it may be impractical or impossible to cancel the transaction directly, and hence xfs_trans_commit() must cancel transactions when xfs_is_shutdown() is true in this situation. But we can't do that because of #1. 3. Log IO errors cause log shutdowns by calling xfs_force_shutdown() to shut down the mount and then the log from log IO completion. 4. xfs_force_shutdown() can result in a log force being issued, which has to wait for log IO completion before it will mark the log as shut down. If #3 races with some other shutdown trigger that runs a log force, we rely on xfs_force_shutdown() silently ignoring #3 and avoiding shutting down the log until the failed log force completes. 5. To ensure #2 always works, we have to ensure that xfs_force_shutdown() does not return until the the log is shut down. But in the case of #4, this will result in a deadlock because the log Io completion will block waiting for a log force to complete which is blocked waiting for log IO to complete.... So the very first thing we have to do here to untangle this mess is dissociate log shutdown triggers from mount shutdowns. We already have xlog_forced_shutdown, which will atomically transistion to the log a shutdown state. Due to internal asserts it cannot be called multiple times, but was done simply because the only place that could call it was xfs_do_force_shutdown() (i.e. the mount shutdown!) and that could only call it once and once only. So the first thing we do is remove the asserts. We then convert all the internal log shutdown triggers to call xlog_force_shutdown() directly instead of xfs_force_shutdown(). This allows the log shutdown triggers to shut down the log without needing to care about mount based shutdown constraints. This means we shut down the log independently of the mount and the mount may not notice this until it's next attempt to read or modify metadata. At that point (e.g. xfs_trans_commit()) it will see that the log is shutdown, error out and shutdown the mount. To ensure that all the unmount behaviours and asserts track correctly as a result of a log shutdown, propagate the shutdown up to the mount if it is not already set. This keeps the mount and log state in sync, and saves a huge amount of hassle where code fails because of a log shutdown but only checks for mount shutdowns and hence ends up doing the wrong thing. Cleaning up that mess is an exercise for another day. This enables us to address the other problems noted above in followup patches. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 5, 2022
When calling smb2_ioctl_query_info() with invalid smb_query_info::flags, a NULL ptr dereference is triggered when trying to kfree() uninitialised rqst[n].rq_iov array. This also fixes leaked paths that are created in SMB2_open_init() which required SMB2_open_free() to properly free them. Here is a small C reproducer that triggers it #include <stdio.h> #include <stdlib.h> #include <stdint.h> #include <unistd.h> #include <fcntl.h> #include <sys/ioctl.h> #define die(s) perror(s), exit(1) #define QUERY_INFO 0xc018cf07 int main(int argc, char *argv[]) { int fd; if (argc < 2) exit(1); fd = open(argv[1], O_RDONLY); if (fd == -1) die("open"); if (ioctl(fd, QUERY_INFO, (uint32_t[]) { 0, 0, 0, 4, 0, 0}) == -1) die("ioctl"); close(fd); return 0; } mount.cifs //srv/share /mnt -o ... gcc repro.c && ./a.out /mnt/f0 [ 1832.124468] CIFS: VFS: \\w22-dc.zelda.test\test Invalid passthru query flags: 0x4 [ 1832.125043] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 1832.125764] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] [ 1832.126241] CPU: 3 PID: 1133 Comm: a.out Not tainted 5.17.0-rc8 #2 [ 1832.126630] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014 [ 1832.127322] RIP: 0010:smb2_ioctl_query_info+0x7a3/0xe30 [cifs] [ 1832.127749] Code: 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 6c 05 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 74 24 28 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 cb 04 00 00 49 8b 3e e8 bb fc fa ff 48 89 da 48 [ 1832.128911] RSP: 0018:ffffc90000957b08 EFLAGS: 00010256 [ 1832.129243] RAX: dffffc0000000000 RBX: ffff888117e9b850 RCX: ffffffffa020580d [ 1832.129691] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffa043a2c0 [ 1832.130137] RBP: ffff888117e9b878 R08: 0000000000000001 R09: 0000000000000003 [ 1832.130585] R10: fffffbfff4087458 R11: 0000000000000001 R12: ffff888117e9b800 [ 1832.131037] R13: 00000000ffffffea R14: 0000000000000000 R15: ffff888117e9b8a8 [ 1832.131485] FS: 00007fcee9900740(0000) GS:ffff888151a00000(0000) knlGS:0000000000000000 [ 1832.131993] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1832.132354] CR2: 00007fcee9a1ef5e CR3: 0000000114cd2000 CR4: 0000000000350ee0 [ 1832.132801] Call Trace: [ 1832.132962] <TASK> [ 1832.133104] ? smb2_query_reparse_tag+0x890/0x890 [cifs] [ 1832.133489] ? cifs_mapchar+0x460/0x460 [cifs] [ 1832.133822] ? rcu_read_lock_sched_held+0x3f/0x70 [ 1832.134125] ? cifs_strndup_to_utf16+0x15b/0x250 [cifs] [ 1832.134502] ? lock_downgrade+0x6f0/0x6f0 [ 1832.134760] ? cifs_convert_path_to_utf16+0x198/0x220 [cifs] [ 1832.135170] ? smb2_check_message+0x1080/0x1080 [cifs] [ 1832.135545] cifs_ioctl+0x1577/0x3320 [cifs] [ 1832.135864] ? lock_downgrade+0x6f0/0x6f0 [ 1832.136125] ? cifs_readdir+0x2e60/0x2e60 [cifs] [ 1832.136468] ? rcu_read_lock_sched_held+0x3f/0x70 [ 1832.136769] ? __rseq_handle_notify_resume+0x80b/0xbe0 [ 1832.137096] ? __up_read+0x192/0x710 [ 1832.137327] ? __ia32_sys_rseq+0xf0/0xf0 [ 1832.137578] ? __x64_sys_openat+0x11f/0x1d0 [ 1832.137850] __x64_sys_ioctl+0x127/0x190 [ 1832.138103] do_syscall_64+0x3b/0x90 [ 1832.138378] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 1832.138702] RIP: 0033:0x7fcee9a253df [ 1832.138937] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00 [ 1832.140107] RSP: 002b:00007ffeba94a8a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 1832.140606] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcee9a253df [ 1832.141058] RDX: 00007ffeba94a910 RSI: 00000000c018cf07 RDI: 0000000000000003 [ 1832.141503] RBP: 00007ffeba94a930 R08: 00007fcee9b24db0 R09: 00007fcee9b45c4e [ 1832.141948] R10: 00007fcee9918d40 R11: 0000000000000246 R12: 00007ffeba94aa48 [ 1832.142396] R13: 0000000000401176 R14: 0000000000403df8 R15: 00007fcee9b78000 [ 1832.142851] </TASK> [ 1832.142994] Modules linked in: cifs cifs_arc4 cifs_md4 bpf_preload [last unloaded: cifs] Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 5, 2022
As guest_irq is coming from KVM_IRQFD API call, it may trigger crash in svm_update_pi_irte() due to out-of-bounds: crash> bt PID: 22218 TASK: ffff951a6ad74980 CPU: 73 COMMAND: "vcpu8" #0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397 #1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d #2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d #3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d #4 [ffffb1ba6707fb90] no_context at ffffffff856692c9 #5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51 #6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace [exception RIP: svm_update_pi_irte+227] RIP: ffffffffc0761b53 RSP: ffffb1ba6707fd08 RFLAGS: 00010086 RAX: ffffb1ba6707fd78 RBX: ffffb1ba66d91000 RCX: 0000000000000001 RDX: 00003c803f63f1c0 RSI: 000000000000019a RDI: ffffb1ba66db2ab8 RBP: 000000000000019a R8: 0000000000000040 R9: ffff94ca41b82200 R10: ffffffffffffffcf R11: 0000000000000001 R12: 0000000000000001 R13: 0000000000000001 R14: ffffffffffffffcf R15: 000000000000005f ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm] #8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm] #9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm] RIP: 00007f143c36488b RSP: 00007f143a4e04b8 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 00007f05780041d0 RCX: 00007f143c36488b RDX: 00007f05780041d0 RSI: 000000004008ae6a RDI: 0000000000000020 RBP: 00000000000004e8 R8: 0000000000000008 R9: 00007f05780041e0 R10: 00007f0578004560 R11: 0000000000000246 R12: 00000000000004e0 R13: 000000000000001a R14: 00007f1424001c60 R15: 00007f0578003bc0 ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b Vmx have been fix this in commit 3a8b067 (KVM: VMX: Do not BUG() on out-of-bounds guest IRQ), so we can just copy source from that to fix this. Co-developed-by: Yi Liu <liu.yi24@zte.com.cn> Signed-off-by: Yi Liu <liu.yi24@zte.com.cn> Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Message-Id: <20220309113025.44469-1-wang.yi59@zte.com.cn> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 15, 2022
…e_zone btrfs_can_activate_zone() can be called with the device_list_mutex already held, which will lead to a deadlock: insert_dev_extents() // Takes device_list_mutex `-> insert_dev_extent() `-> btrfs_insert_empty_item() `-> btrfs_insert_empty_items() `-> btrfs_search_slot() `-> btrfs_cow_block() `-> __btrfs_cow_block() `-> btrfs_alloc_tree_block() `-> btrfs_reserve_extent() `-> find_free_extent() `-> find_free_extent_update_loop() `-> can_allocate_chunk() `-> btrfs_can_activate_zone() // Takes device_list_mutex again Instead of using the RCU on fs_devices->device_list we can use fs_devices->alloc_list, protected by the chunk_mutex to traverse the list of active devices. We are in the chunk allocation thread. The newer chunk allocation happens from the devices in the fs_device->alloc_list protected by the chunk_mutex. btrfs_create_chunk() lockdep_assert_held(&info->chunk_mutex); gather_device_info list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) Also, a device that reappears after the mount won't join the alloc_list yet and, it will be in the dev_list, which we don't want to consider in the context of the chunk alloc. [15.166572] WARNING: possible recursive locking detected [15.167117] 5.17.0-rc6-dennis #79 Not tainted [15.167487] -------------------------------------------- [15.167733] kworker/u8:3/146 is trying to acquire lock: [15.167733] ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: find_free_extent+0x15a/0x14f0 [btrfs] [15.167733] [15.167733] but task is already holding lock: [15.167733] ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_create_pending_block_groups+0x20a/0x560 [btrfs] [15.167733] [15.167733] other info that might help us debug this: [15.167733] Possible unsafe locking scenario: [15.167733] [15.171834] CPU0 [15.171834] ---- [15.171834] lock(&fs_devs->device_list_mutex); [15.171834] lock(&fs_devs->device_list_mutex); [15.171834] [15.171834] *** DEADLOCK *** [15.171834] [15.171834] May be due to missing lock nesting notation [15.171834] [15.171834] 5 locks held by kworker/u8:3/146: [15.171834] #0: ffff888100050938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c3/0x5a0 [15.171834] #1: ffffc9000067be80 ((work_completion)(&fs_info->async_data_reclaim_work)){+.+.}-{0:0}, at: process_one_work+0x1c3/0x5a0 [15.176244] #2: ffff88810521e620 (sb_internal){.+.+}-{0:0}, at: flush_space+0x335/0x600 [btrfs] [15.176244] #3: ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_create_pending_block_groups+0x20a/0x560 [btrfs] [15.176244] #4: ffff8881152e4b78 (btrfs-dev-00){++++}-{3:3}, at: __btrfs_tree_lock+0x27/0x130 [btrfs] [15.179641] [15.179641] stack backtrace: [15.179641] CPU: 1 PID: 146 Comm: kworker/u8:3 Not tainted 5.17.0-rc6-dennis #79 [15.179641] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014 [15.179641] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs] [15.179641] Call Trace: [15.179641] <TASK> [15.179641] dump_stack_lvl+0x45/0x59 [15.179641] __lock_acquire.cold+0x217/0x2b2 [15.179641] lock_acquire+0xbf/0x2b0 [15.183838] ? find_free_extent+0x15a/0x14f0 [btrfs] [15.183838] __mutex_lock+0x8e/0x970 [15.183838] ? find_free_extent+0x15a/0x14f0 [btrfs] [15.183838] ? find_free_extent+0x15a/0x14f0 [btrfs] [15.183838] ? lock_is_held_type+0xd7/0x130 [15.183838] ? find_free_extent+0x15a/0x14f0 [btrfs] [15.183838] find_free_extent+0x15a/0x14f0 [btrfs] [15.183838] ? _raw_spin_unlock+0x24/0x40 [15.183838] ? btrfs_get_alloc_profile+0x106/0x230 [btrfs] [15.187601] btrfs_reserve_extent+0x131/0x260 [btrfs] [15.187601] btrfs_alloc_tree_block+0xb5/0x3b0 [btrfs] [15.187601] __btrfs_cow_block+0x138/0x600 [btrfs] [15.187601] btrfs_cow_block+0x10f/0x230 [btrfs] [15.187601] btrfs_search_slot+0x55f/0xbc0 [btrfs] [15.187601] ? lock_is_held_type+0xd7/0x130 [15.187601] btrfs_insert_empty_items+0x2d/0x60 [btrfs] [15.187601] btrfs_create_pending_block_groups+0x2b3/0x560 [btrfs] [15.187601] __btrfs_end_transaction+0x36/0x2a0 [btrfs] [15.192037] flush_space+0x374/0x600 [btrfs] [15.192037] ? find_held_lock+0x2b/0x80 [15.192037] ? btrfs_async_reclaim_data_space+0x49/0x180 [btrfs] [15.192037] ? lock_release+0x131/0x2b0 [15.192037] btrfs_async_reclaim_data_space+0x70/0x180 [btrfs] [15.192037] process_one_work+0x24c/0x5a0 [15.192037] worker_thread+0x4a/0x3d0 Fixes: a85f05e ("btrfs: zoned: avoid chunk allocation if active block group has enough space") CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 15, 2022
Andrii Nakryiko says: ==================== Add libbpf support for USDT (User Statically-Defined Tracing) probes. USDTs is important part of tracing, and BPF, ecosystem, widely used in mission-critical production applications for observability, performance analysis, and debugging. And while USDTs themselves are pretty complicated abstraction built on top of uprobes, for end-users USDT is as natural a primitive as uprobes themselves. And thus it's important for libbpf to provide best possible user experience when it comes to build tracing applications relying on USDTs. USDTs historically presented a lot of challenges for libbpf's no compilation-on-the-fly general approach to BPF tracing. BCC utilizes power of on-the-fly source code generation and compilation using its embedded Clang toolchain, which was impractical for more lightweight and thus more rigid libbpf-based approach. But still, with enough diligence and BPF cookies it's possible to implement USDT support that feels as natural as tracing any uprobe. This patch set is the culmination of such effort to add libbpf USDT support following the spirit and philosophy of BPF CO-RE (even though it's not inherently relying on BPF CO-RE much, see patch #1 for some notes regarding this). Each respective patch has enough details and explanations, so I won't go into details here. In the end, I think the overall usability of libbpf's USDT support *exceeds* the status quo set by BCC due to the elimination of awkward runtime USDT supporting code generation. It also exceeds BCC's capabilities due to the use of BPF cookie. This eliminates the need to determine a USDT call site (and thus specifics about how exactly to fetch arguments) based on its *absolute IP address*, which is impossible with shared libraries if no PID is specified (as we then just *can't* know absolute IP at which shared library is loaded, because it might be different for each process). With BPF cookie this is not a problem as we record "call site ID" directly in a BPF cookie value. This makes it possible to do a system-wide tracing of a USDT defined in a shared library. Think about tracing some USDT in libc across any process in the system, both running at the time of attachment and all the new processes started *afterwards*. This is a very powerful capability that allows more efficient observability and tracing tooling. Once this functionality lands, the plan is to extend libbpf-bootstrap ([0]) with an USDT example. It will also become possible to start converting BCC tools that rely on USDTs to their libbpf-based counterparts ([1]). It's worth noting that preliminary version of this code was currently used and tested in production code running fleet-wide observability toolkit. Libbpf functionality is broken down into 5 mostly logically independent parts, for ease of reviewing: - patch #1 adds BPF-side implementation; - patch #2 adds user-space APIs and wires bpf_link for USDTs; - patch #3 adds the most mundate pieces: handling ELF, parsing USDT notes, dealing with memory segments, relative vs absolute addresses, etc; - patch #4 adds internal ID allocation and setting up/tearing down of BPF-side state (spec and IP-to-ID mapping); - patch #5 implements x86/x86-64-specific logic of parsing USDT argument specifications; - patch #6 adds testing of various basic aspects of handling of USDT; - patch #7 extends the set of tests with more combinations of semaphore, executable vs shared library, and PID filter options. [0] https://github.com/libbpf/libbpf-bootstrap [1] https://github.com/iovisor/bcc/tree/master/libbpf-tools v2->v3: - fix typos, leave link to systemtap doc, acks, etc (Dave); - include sys/sdt.h to avoid extra system-wide package dependencies; v1->v2: - huge high-level comment describing how all the moving parts fit together (Alan, Alexei); - switched from `__hidden __weak` to `static inline __noinline` for now, as there is a bug in BPF linker breaking final BPF object file due to invalid .BTF.ext data; I want to fix it separately at which point I'll switch back to __hidden __weak again. The fix isn't trivial, so I don't want to block on that. Same for __weak variable lookup bug that Henqi reported. - various fixes and improvements, addressing other feedback (Alan, Hengqi); Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Dave Marchevsky <davemarchevsky@fb.com> Cc: Hengqi Chen <hengqi.chen@gmail.com> ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 15, 2022
The function does not check that parsing_end is false after parsing argument. Thus, if the final part of the argument is something like '4-', which is invalid, parse_num_list() will discard it instead of returning -EINVAL. Before: $ ./test_progs -n 2,4- #2 atomic_bounds:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED After: $ ./test_progs -n 2,4- Failed to parse test numbers. Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220406003622.73539-1-ytcoode@gmail.com
BluezTestBot
pushed a commit
that referenced
this issue
Apr 15, 2022
Ido Schimmel says: ==================== net/sched: Better error reporting for offload failures This patchset improves error reporting to user space when offload fails during the flow action setup phase. That is, when failures occur in the actions themselves, even before calling device drivers. Requested / reported in [1]. This is done by passing extack to the offload_act_setup() callback and making use of it in the various actions. Patches #1-#2 change matchall and flower to log error messages to user space in accordance with the verbose flag. Patch #3 passes extack to the offload_act_setup() callback from the various call sites, including matchall and flower. Patches #4-#11 make use of extack in the various actions to report offload failures. Patch #12 adds an error message when the action does not support offload at all. Patches #13-#14 change matchall and flower to stop overwriting more specific error messages. [1] https://lore.kernel.org/netdev/20220317185249.5mff5u2x624pjewv@skbuf/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 21, 2022
…de-initialization Add callback functions for line card thermal area initialization and de-initialization. Each line card is associated with the relevant thermal area, which may contain thermal zones for cages and gearboxes found on this line card. The line card thermal initialization / de-initialization APIs are to be called when line card is set to active / inactive state by got_active() / got_inactive() callbacks from line card state machine. For example thermal zone for module #9 located at line card #7 will have type: mlxsw-lc7-module9. And thermal zone for gearbox #2 located at line card #5 will have type: mlxsw-lc5-gearbox2. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 21, 2022
Ido Schimmel says: ==================== mlxsw: Line cards status tracking When a line card is provisioned, netdevs corresponding to the ports found on the line card are registered. User space can then perform various logical configurations (e.g., splitting, setting MTU) on these netdevs. However, since the line card is not present / powered on (i.e., it is not in 'active' state), user space cannot access the various components found on the line card. For example, user space cannot read the temperature of gearboxes or transceiver modules found on the line card via hwmon / thermal. Similarly, it cannot dump the EEPROM contents of these transceiver modules. The above is only possible when the line card becomes active. This patchset solves the problem by tracking the status of each line card and invoking callbacks from interested parties when a line card becomes active / inactive. Patchset overview: Patch #1 adds the infrastructure in the line cards core that allows users to registers a set of callbacks that are invoked when a line card becomes active / inactive. To avoid races, if a line card is already active during registration, the got_active() callback is invoked. Patches #2-#3 are preparations. Patch #4 changes the port module core to register a set of callbacks with the line cards core. See detailed description with examples in the commit message. Patches #5-#6 do the same with regards to thermal / hwmon support, so that user space will be able to monitor the temperature of various components on the line card when it becomes active. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 25, 2022
There is possible circular locking dependency detected on event_mutex (see below logs). This is due to set fail safe mode is done at dp_panel_read_sink_caps() within event_mutex scope. To break this possible circular locking, this patch move setting fail safe mode out of event_mutex scope. [ 23.958078] ====================================================== [ 23.964430] WARNING: possible circular locking dependency detected [ 23.970777] 5.17.0-rc2-lockdep-00088-g05241de1f69e #148 Not tainted [ 23.977219] ------------------------------------------------------ [ 23.983570] DrmThread/1574 is trying to acquire lock: [ 23.988763] ffffff808423aab0 (&dp->event_mutex){+.+.}-{3:3}, at: msm_dp_displ ay_enable+0x58/0x164 [ 23.997895] [ 23.997895] but task is already holding lock: [ 24.003895] ffffff808420b280 (&kms->commit_lock[i]/1){+.+.}-{3:3}, at: lock_c rtcs+0x80/0x8c [ 24.012495] [ 24.012495] which lock already depends on the new lock. [ 24.012495] [ 24.020886] [ 24.020886] the existing dependency chain (in reverse order) is: [ 24.028570] [ 24.028570] -> #5 (&kms->commit_lock[i]/1){+.+.}-{3:3}: [ 24.035472] __mutex_lock+0xc8/0x384 [ 24.039695] mutex_lock_nested+0x54/0x74 [ 24.044272] lock_crtcs+0x80/0x8c [ 24.048222] msm_atomic_commit_tail+0x1e8/0x3d0 [ 24.053413] commit_tail+0x7c/0xfc [ 24.057452] drm_atomic_helper_commit+0x158/0x15c [ 24.062826] drm_atomic_commit+0x60/0x74 [ 24.067403] drm_mode_atomic_ioctl+0x6b0/0x908 [ 24.072508] drm_ioctl_kernel+0xe8/0x168 [ 24.077086] drm_ioctl+0x320/0x370 [ 24.081123] drm_compat_ioctl+0x40/0xdc [ 24.085602] __arm64_compat_sys_ioctl+0xe0/0x150 [ 24.090895] invoke_syscall+0x80/0x114 [ 24.095294] el0_svc_common.constprop.3+0xc4/0xf8 [ 24.100668] do_el0_svc_compat+0x2c/0x54 [ 24.105242] el0_svc_compat+0x4c/0xe4 [ 24.109548] el0t_32_sync_handler+0xc4/0xf4 [ 24.114381] el0t_32_sync+0x178 [ 24.118688] [ 24.118688] -> #4 (&kms->commit_lock[i]){+.+.}-{3:3}: [ 24.125408] __mutex_lock+0xc8/0x384 [ 24.129628] mutex_lock_nested+0x54/0x74 [ 24.134204] lock_crtcs+0x80/0x8c [ 24.138155] msm_atomic_commit_tail+0x1e8/0x3d0 [ 24.143345] commit_tail+0x7c/0xfc [ 24.147382] drm_atomic_helper_commit+0x158/0x15c [ 24.152755] drm_atomic_commit+0x60/0x74 [ 24.157323] drm_atomic_helper_set_config+0x68/0x90 [ 24.162869] drm_mode_setcrtc+0x394/0x648 [ 24.167535] drm_ioctl_kernel+0xe8/0x168 [ 24.172102] drm_ioctl+0x320/0x370 [ 24.176135] drm_compat_ioctl+0x40/0xdc [ 24.180621] __arm64_compat_sys_ioctl+0xe0/0x150 [ 24.185904] invoke_syscall+0x80/0x114 [ 24.190302] el0_svc_common.constprop.3+0xc4/0xf8 [ 24.195673] do_el0_svc_compat+0x2c/0x54 [ 24.200241] el0_svc_compat+0x4c/0xe4 [ 24.204544] el0t_32_sync_handler+0xc4/0xf4 [ 24.209378] el0t_32_sync+0x174/0x178 [ 24.213680] -> #3 (crtc_ww_class_mutex){+.+.}-{3:3}: [ 24.220308] __ww_mutex_lock.constprop.20+0xe8/0x878 [ 24.225951] ww_mutex_lock+0x60/0xd0 [ 24.230166] modeset_lock+0x190/0x19c [ 24.234467] drm_modeset_lock+0x34/0x54 [ 24.238953] drmm_mode_config_init+0x550/0x764 [ 24.244065] msm_drm_bind+0x170/0x59c [ 24.248374] try_to_bring_up_master+0x244/0x294 [ 24.253572] __component_add+0xf4/0x14c [ 24.258057] component_add+0x2c/0x38 [ 24.262273] dsi_dev_attach+0x2c/0x38 [ 24.266575] dsi_host_attach+0xc4/0x120 [ 24.271060] mipi_dsi_attach+0x34/0x48 [ 24.275456] devm_mipi_dsi_attach+0x28/0x68 [ 24.280298] ti_sn_bridge_probe+0x2b4/0x2dc [ 24.285137] auxiliary_bus_probe+0x78/0x90 [ 24.289893] really_probe+0x1e4/0x3d8 [ 24.294194] __driver_probe_device+0x14c/0x164 [ 24.299298] driver_probe_device+0x54/0xf8 [ 24.304043] __device_attach_driver+0xb4/0x118 [ 24.309145] bus_for_each_drv+0xb0/0xd4 [ 24.313628] __device_attach+0xcc/0x158 [ 24.318112] device_initial_probe+0x24/0x30 [ 24.322954] bus_probe_device+0x38/0x9c [ 24.327439] deferred_probe_work_func+0xd4/0xf0 [ 24.332628] process_one_work+0x2f0/0x498 [ 24.337289] process_scheduled_works+0x44/0x48 [ 24.342391] worker_thread+0x1e4/0x26c [ 24.346788] kthread+0xe4/0xf4 [ 24.350470] ret_from_fork+0x10/0x20 [ 24.354683] [ 24.354683] [ 24.354683] -> #2 (crtc_ww_class_acquire){+.+.}-{0:0}: [ 24.361489] drm_modeset_acquire_init+0xe4/0x138 [ 24.366777] drm_helper_probe_detect_ctx+0x44/0x114 [ 24.372327] check_connector_changed+0xbc/0x198 [ 24.377517] drm_helper_hpd_irq_event+0xcc/0x11c [ 24.382804] dsi_hpd_worker+0x24/0x30 [ 24.387104] process_one_work+0x2f0/0x498 [ 24.391762] worker_thread+0x1d0/0x26c [ 24.396158] kthread+0xe4/0xf4 [ 24.399840] ret_from_fork+0x10/0x20 [ 24.404053] [ 24.404053] -> #1 (&dev->mode_config.mutex){+.+.}-{3:3}: [ 24.411032] __mutex_lock+0xc8/0x384 [ 24.415247] mutex_lock_nested+0x54/0x74 [ 24.419819] dp_panel_read_sink_caps+0x23c/0x26c [ 24.425108] dp_display_process_hpd_high+0x34/0xd4 [ 24.430570] dp_display_usbpd_configure_cb+0x30/0x3c [ 24.436205] hpd_event_thread+0x2ac/0x550 [ 24.440864] kthread+0xe4/0xf4 [ 24.444544] ret_from_fork+0x10/0x20 [ 24.448757] [ 24.448757] -> #0 (&dp->event_mutex){+.+.}-{3:3}: [ 24.455116] __lock_acquire+0xe2c/0x10d8 [ 24.459690] lock_acquire+0x1ac/0x2d0 [ 24.463988] __mutex_lock+0xc8/0x384 [ 24.468201] mutex_lock_nested+0x54/0x74 [ 24.472773] msm_dp_display_enable+0x58/0x164 [ 24.477789] dp_bridge_enable+0x24/0x30 [ 24.482273] drm_atomic_bridge_chain_enable+0x78/0x9c [ 24.488006] drm_atomic_helper_commit_modeset_enables+0x1bc/0x244 [ 24.494801] msm_atomic_commit_tail+0x248/0x3d0 [ 24.499992] commit_tail+0x7c/0xfc [ 24.504031] drm_atomic_helper_commit+0x158/0x15c [ 24.509404] drm_atomic_commit+0x60/0x74 [ 24.513976] drm_mode_atomic_ioctl+0x6b0/0x908 [ 24.519079] drm_ioctl_kernel+0xe8/0x168 [ 24.523650] drm_ioctl+0x320/0x370 [ 24.527689] drm_compat_ioctl+0x40/0xdc [ 24.532175] __arm64_compat_sys_ioctl+0xe0/0x150 [ 24.537463] invoke_syscall+0x80/0x114 [ 24.541861] el0_svc_common.constprop.3+0xc4/0xf8 [ 24.547235] do_el0_svc_compat+0x2c/0x54 [ 24.551806] el0_svc_compat+0x4c/0xe4 [ 24.556106] el0t_32_sync_handler+0xc4/0xf4 [ 24.560948] el0t_32_sync+0x174/0x178 Changes in v2: -- add circular lockiing trace Fixes: d4aca42 ("drm/msm/dp: always add fail-safe mode into connector mode list") Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Patchwork: https://patchwork.freedesktop.org/patch/481396/ Link: https://lore.kernel.org/r/1649451894-554-1-git-send-email-quic_khsieh@quicinc.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 25, 2022
The root cause is the race as follows: Thread #1 Thread #2(irq ctx) z_erofs_runqueue() struct z_erofs_decompressqueue io_A[]; submit bio A z_erofs_decompress_kickoff(,,1) z_erofs_decompressqueue_endio(bio A) z_erofs_decompress_kickoff(,,-1) spin_lock_irqsave() atomic_add_return() io_wait_event() -> pending_bios is already 0 [end of function] wake_up_locked(io_A[]) // crash Referenced backtrace in kernel 5.4: [ 10.129422] Unable to handle kernel paging request at virtual address eb0454a4 [ 10.364157] CPU: 0 PID: 709 Comm: getprop Tainted: G WC O 5.4.147-ab09225 #1 [ 11.556325] [<c01b33b8>] (__wake_up_common) from [<c01b3300>] (__wake_up_locked+0x40/0x48) [ 11.565487] [<c01b3300>] (__wake_up_locked) from [<c044c8d0>] (z_erofs_vle_unzip_kickoff+0x6c/0xc0) [ 11.575438] [<c044c8d0>] (z_erofs_vle_unzip_kickoff) from [<c044c854>] (z_erofs_vle_read_endio+0x16c/0x17c) [ 11.586082] [<c044c854>] (z_erofs_vle_read_endio) from [<c06a80e8>] (clone_endio+0xb4/0x1d0) [ 11.595428] [<c06a80e8>] (clone_endio) from [<c04a1280>] (blk_update_request+0x150/0x4dc) [ 11.604516] [<c04a1280>] (blk_update_request) from [<c06dea28>] (mmc_blk_cqe_complete_rq+0x144/0x15c) [ 11.614640] [<c06dea28>] (mmc_blk_cqe_complete_rq) from [<c04a5d90>] (blk_done_softirq+0xb0/0xcc) [ 11.624419] [<c04a5d90>] (blk_done_softirq) from [<c010242c>] (__do_softirq+0x184/0x56c) [ 11.633419] [<c010242c>] (__do_softirq) from [<c01051e8>] (irq_exit+0xd4/0x138) [ 11.641640] [<c01051e8>] (irq_exit) from [<c010c314>] (__handle_domain_irq+0x94/0xd0) [ 11.650381] [<c010c314>] (__handle_domain_irq) from [<c04fde70>] (gic_handle_irq+0x50/0xd4) [ 11.659641] [<c04fde70>] (gic_handle_irq) from [<c0101b70>] (__irq_svc+0x70/0xb0) Signed-off-by: Hongyu Jin <hongyu.jin@unisoc.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220401115527.4935-1-hongyu.jin.cn@gmail.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 30, 2022
into HEAD KVM/riscv fixes for 5.18, take #2 - Remove 's' & 'u' as valid ISA extension - Do not allow disabling the base extensions 'i'/'m'/'a'/'c'
BluezTestBot
pushed a commit
that referenced
this issue
Apr 25, 2024
Daniel Machon says: ==================== Add support for flower actions mirred and redirect This series adds support for the two tc flower actions mirred and redirect. Both actions are implemented by means of a port mask and a mask mode. The mask mode controls how the mask is applied, and together they are used by the switch to make a forwarding decision. Both actions are configurable via the IS0 or IS2 VCAP's (ingress stage 0 and 2, respectively). Patch #1: adds support for tc flower mirred action. Patch #2: adds support for tc flower redirect action. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> ==================== Link: https://lore.kernel.org/r/20240405-mirror-redirect-actions-v2-0-875d4c1927c8@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 25, 2024
When reading received messages from a socket with MSG_PEEK, we may want to read the contents with an offset, like we can do with pread/preadv() when reading files. Currently, it is not possible to do that. In this commit, we add support for the SO_PEEK_OFF socket option for TCP, in a similar way it is done for Unix Domain sockets. In the iperf3 log examples shown below, we can observe a throughput improvement of 15-20 % in the direction host->namespace when using the protocol splicer 'pasta' (https://passt.top). This is a consistent result. pasta(1) and passt(1) implement user-mode networking for network namespaces (containers) and virtual machines by means of a translation layer between Layer-2 network interface and native Layer-4 sockets (TCP, UDP, ICMP/ICMPv6 echo). Received, pending TCP data to the container/guest is kept in kernel buffers until acknowledged, so the tool routinely needs to fetch new data from socket, skipping data that was already sent. At the moment this is implemented using a dummy buffer passed to recvmsg(). With this change, we don't need a dummy buffer and the related buffer copy (copy_to_user()) anymore. passt and pasta are supported in KubeVirt and libvirt/qemu. jmaloy@freyr:~/passt$ perf record -g ./pasta --config-net -f SO_PEEK_OFF not supported by kernel. jmaloy@freyr:~/passt# iperf3 -s ----------------------------------------------------------- Server listening on 5201 (test #1) ----------------------------------------------------------- Accepted connection from 192.168.122.1, port 44822 [ 5] local 192.168.122.180 port 5201 connected to 192.168.122.1 port 44832 [ ID] Interval Transfer Bitrate [ 5] 0.00-1.00 sec 1.02 GBytes 8.78 Gbits/sec [ 5] 1.00-2.00 sec 1.06 GBytes 9.08 Gbits/sec [ 5] 2.00-3.00 sec 1.07 GBytes 9.15 Gbits/sec [ 5] 3.00-4.00 sec 1.10 GBytes 9.46 Gbits/sec [ 5] 4.00-5.00 sec 1.03 GBytes 8.85 Gbits/sec [ 5] 5.00-6.00 sec 1.10 GBytes 9.44 Gbits/sec [ 5] 6.00-7.00 sec 1.11 GBytes 9.56 Gbits/sec [ 5] 7.00-8.00 sec 1.07 GBytes 9.20 Gbits/sec [ 5] 8.00-9.00 sec 667 MBytes 5.59 Gbits/sec [ 5] 9.00-10.00 sec 1.03 GBytes 8.83 Gbits/sec [ 5] 10.00-10.04 sec 30.1 MBytes 6.36 Gbits/sec - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate [ 5] 0.00-10.04 sec 10.3 GBytes 8.78 Gbits/sec receiver ----------------------------------------------------------- Server listening on 5201 (test #2) ----------------------------------------------------------- ^Ciperf3: interrupt - the server has terminated jmaloy@freyr:~/passt# logout [ perf record: Woken up 23 times to write data ] [ perf record: Captured and wrote 5.696 MB perf.data (35580 samples) ] jmaloy@freyr:~/passt$ jmaloy@freyr:~/passt$ perf record -g ./pasta --config-net -f SO_PEEK_OFF supported by kernel. jmaloy@freyr:~/passt# iperf3 -s ----------------------------------------------------------- Server listening on 5201 (test #1) ----------------------------------------------------------- Accepted connection from 192.168.122.1, port 52084 [ 5] local 192.168.122.180 port 5201 connected to 192.168.122.1 port 52098 [ ID] Interval Transfer Bitrate [ 5] 0.00-1.00 sec 1.32 GBytes 11.3 Gbits/sec [ 5] 1.00-2.00 sec 1.19 GBytes 10.2 Gbits/sec [ 5] 2.00-3.00 sec 1.26 GBytes 10.8 Gbits/sec [ 5] 3.00-4.00 sec 1.36 GBytes 11.7 Gbits/sec [ 5] 4.00-5.00 sec 1.33 GBytes 11.4 Gbits/sec [ 5] 5.00-6.00 sec 1.21 GBytes 10.4 Gbits/sec [ 5] 6.00-7.00 sec 1.31 GBytes 11.2 Gbits/sec [ 5] 7.00-8.00 sec 1.25 GBytes 10.7 Gbits/sec [ 5] 8.00-9.00 sec 1.33 GBytes 11.5 Gbits/sec [ 5] 9.00-10.00 sec 1.24 GBytes 10.7 Gbits/sec [ 5] 10.00-10.04 sec 56.0 MBytes 12.1 Gbits/sec - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate [ 5] 0.00-10.04 sec 12.9 GBytes 11.0 Gbits/sec receiver ----------------------------------------------------------- Server listening on 5201 (test #2) ----------------------------------------------------------- ^Ciperf3: interrupt - the server has terminated logout [ perf record: Woken up 20 times to write data ] [ perf record: Captured and wrote 5.040 MB perf.data (33411 samples) ] jmaloy@freyr:~/passt$ The perf record confirms this result. Below, we can observe that the CPU spends significantly less time in the function ____sys_recvmsg() when we have offset support. Without offset support: ---------------------- jmaloy@freyr:~/passt$ perf report -q --symbol-filter=do_syscall_64 \ -p ____sys_recvmsg -x --stdio -i perf.data | head -1 46.32% 0.00% passt.avx2 [kernel.vmlinux] [k] do_syscall_64 ____sys_recvmsg With offset support: ---------------------- jmaloy@freyr:~/passt$ perf report -q --symbol-filter=do_syscall_64 \ -p ____sys_recvmsg -x --stdio -i perf.data | head -1 28.12% 0.00% passt.avx2 [kernel.vmlinux] [k] do_syscall_64 ____sys_recvmsg Suggested-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jon Maloy <jmaloy@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240409152805.913891-1-jmaloy@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 25, 2024
…git/netfilter/nf netfilter pull request 24-04-11 Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: Patches #1 and #2 add missing rcu read side lock when iterating over expression and object type list which could race with module removal. Patch #3 prevents promisc packet from visiting the bridge/input hook to amend a recent fix to address conntrack confirmation race in br_netfilter and nf_conntrack_bridge. Patch #4 adds and uses iterate decorator type to fetch the current pipapo set backend datastructure view when netlink dumps the set elements. Patch #5 fixes removal of duplicate elements in the pipapo set backend. Patch #6 flowtable validates pppoe header before accessing it. Patch #7 fixes flowtable datapath for pppoe packets, otherwise lookup fails and pppoe packets follow classic path. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 25, 2024
When disabling aRFS under the `priv->state_lock`, any scheduled aRFS works are canceled using the `cancel_work_sync` function, which waits for the work to end if it has already started. However, while waiting for the work handler, the handler will try to acquire the `state_lock` which is already acquired. The worker acquires the lock to delete the rules if the state is down, which is not the worker's responsibility since disabling aRFS deletes the rules. Add an aRFS state variable, which indicates whether the aRFS is enabled and prevent adding rules when the aRFS is disabled. Kernel log: ====================================================== WARNING: possible circular locking dependency detected 6.7.0-rc4_net_next_mlx5_5483eb2 #1 Tainted: G I ------------------------------------------------------ ethtool/386089 is trying to acquire lock: ffff88810f21ce68 ((work_completion)(&rule->arfs_work)){+.+.}-{0:0}, at: __flush_work+0x74/0x4e0 but task is already holding lock: ffff8884a1808cc0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_ethtool_set_channels+0x53/0x200 [mlx5_core] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&priv->state_lock){+.+.}-{3:3}: __mutex_lock+0x80/0xc90 arfs_handle_work+0x4b/0x3b0 [mlx5_core] process_one_work+0x1dc/0x4a0 worker_thread+0x1bf/0x3c0 kthread+0xd7/0x100 ret_from_fork+0x2d/0x50 ret_from_fork_asm+0x11/0x20 -> #0 ((work_completion)(&rule->arfs_work)){+.+.}-{0:0}: __lock_acquire+0x17b4/0x2c80 lock_acquire+0xd0/0x2b0 __flush_work+0x7a/0x4e0 __cancel_work_timer+0x131/0x1c0 arfs_del_rules+0x143/0x1e0 [mlx5_core] mlx5e_arfs_disable+0x1b/0x30 [mlx5_core] mlx5e_ethtool_set_channels+0xcb/0x200 [mlx5_core] ethnl_set_channels+0x28f/0x3b0 ethnl_default_set_doit+0xec/0x240 genl_family_rcv_msg_doit+0xd0/0x120 genl_rcv_msg+0x188/0x2c0 netlink_rcv_skb+0x54/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x1a1/0x270 netlink_sendmsg+0x214/0x460 __sock_sendmsg+0x38/0x60 __sys_sendto+0x113/0x170 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x40/0xe0 entry_SYSCALL_64_after_hwframe+0x46/0x4e other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&priv->state_lock); lock((work_completion)(&rule->arfs_work)); lock(&priv->state_lock); lock((work_completion)(&rule->arfs_work)); *** DEADLOCK *** 3 locks held by ethtool/386089: #0: ffffffff82ea7210 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 #1: ffffffff82e94c88 (rtnl_mutex){+.+.}-{3:3}, at: ethnl_default_set_doit+0xd3/0x240 #2: ffff8884a1808cc0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_ethtool_set_channels+0x53/0x200 [mlx5_core] stack backtrace: CPU: 15 PID: 386089 Comm: ethtool Tainted: G I 6.7.0-rc4_net_next_mlx5_5483eb2 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x60/0xa0 check_noncircular+0x144/0x160 __lock_acquire+0x17b4/0x2c80 lock_acquire+0xd0/0x2b0 ? __flush_work+0x74/0x4e0 ? save_trace+0x3e/0x360 ? __flush_work+0x74/0x4e0 __flush_work+0x7a/0x4e0 ? __flush_work+0x74/0x4e0 ? __lock_acquire+0xa78/0x2c80 ? lock_acquire+0xd0/0x2b0 ? mark_held_locks+0x49/0x70 __cancel_work_timer+0x131/0x1c0 ? mark_held_locks+0x49/0x70 arfs_del_rules+0x143/0x1e0 [mlx5_core] mlx5e_arfs_disable+0x1b/0x30 [mlx5_core] mlx5e_ethtool_set_channels+0xcb/0x200 [mlx5_core] ethnl_set_channels+0x28f/0x3b0 ethnl_default_set_doit+0xec/0x240 genl_family_rcv_msg_doit+0xd0/0x120 genl_rcv_msg+0x188/0x2c0 ? ethnl_ops_begin+0xb0/0xb0 ? genl_family_rcv_msg_dumpit+0xf0/0xf0 netlink_rcv_skb+0x54/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x1a1/0x270 netlink_sendmsg+0x214/0x460 __sock_sendmsg+0x38/0x60 __sys_sendto+0x113/0x170 ? do_user_addr_fault+0x53f/0x8f0 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x40/0xe0 entry_SYSCALL_64_after_hwframe+0x46/0x4e </TASK> Fixes: 45bf454 ("net/mlx5e: Enabling aRFS mechanism") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240411115444.374475-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 25, 2024
trace_drop_common() is called with preemption disabled, and it acquires a spin_lock. This is problematic for RT kernels because spin_locks are sleeping locks in this configuration, which causes the following splat: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 449, name: rcuc/47 preempt_count: 1, expected: 0 RCU nest depth: 2, expected: 2 5 locks held by rcuc/47/449: #0: ff1100086ec30a60 ((softirq_ctrl.lock)){+.+.}-{2:2}, at: __local_bh_disable_ip+0x105/0x210 #1: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0xbf/0x130 #2: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip+0x11c/0x210 #3: ffffffffb394a160 (rcu_callback){....}-{0:0}, at: rcu_do_batch+0x360/0xc70 #4: ff1100086ee07520 (&data->lock){+.+.}-{2:2}, at: trace_drop_common.constprop.0+0xb5/0x290 irq event stamp: 139909 hardirqs last enabled at (139908): [<ffffffffb1df2b33>] _raw_spin_unlock_irqrestore+0x63/0x80 hardirqs last disabled at (139909): [<ffffffffb19bd03d>] trace_drop_common.constprop.0+0x26d/0x290 softirqs last enabled at (139892): [<ffffffffb07a1083>] __local_bh_enable_ip+0x103/0x170 softirqs last disabled at (139898): [<ffffffffb0909b33>] rcu_cpu_kthread+0x93/0x1f0 Preemption disabled at: [<ffffffffb1de786b>] rt_mutex_slowunlock+0xab/0x2e0 CPU: 47 PID: 449 Comm: rcuc/47 Not tainted 6.9.0-rc2-rt1+ #7 Hardware name: Dell Inc. PowerEdge R650/0Y2G81, BIOS 1.6.5 04/15/2022 Call Trace: <TASK> dump_stack_lvl+0x8c/0xd0 dump_stack+0x14/0x20 __might_resched+0x21e/0x2f0 rt_spin_lock+0x5e/0x130 ? trace_drop_common.constprop.0+0xb5/0x290 ? skb_queue_purge_reason.part.0+0x1bf/0x230 trace_drop_common.constprop.0+0xb5/0x290 ? preempt_count_sub+0x1c/0xd0 ? _raw_spin_unlock_irqrestore+0x4a/0x80 ? __pfx_trace_drop_common.constprop.0+0x10/0x10 ? rt_mutex_slowunlock+0x26a/0x2e0 ? skb_queue_purge_reason.part.0+0x1bf/0x230 ? __pfx_rt_mutex_slowunlock+0x10/0x10 ? skb_queue_purge_reason.part.0+0x1bf/0x230 trace_kfree_skb_hit+0x15/0x20 trace_kfree_skb+0xe9/0x150 kfree_skb_reason+0x7b/0x110 skb_queue_purge_reason.part.0+0x1bf/0x230 ? __pfx_skb_queue_purge_reason.part.0+0x10/0x10 ? mark_lock.part.0+0x8a/0x520 ... trace_drop_common() also disables interrupts, but this is a minor issue because we could easily replace it with a local_lock. Replace the spin_lock with raw_spin_lock to avoid sleeping in atomic context. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Reported-by: Hu Chunyu <chuhu@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 25, 2024
Petr Machata says: ==================== selftests: Assortment of fixes This is a loose follow-up to the Kernel CI patchset posted recently. It contains various fixes that were supposed to be part of said patchset, but didn't fit due to its size. The latter 4 patches were written independently of the CI effort, but again didn't fit in their intended patchsets. - Patch #1 unifies code of two very similar looking functions, busywait() and slowwait(). - Patch #2 adds sanity checks around the setting of NETIFS, which carries list of interfaces to run on. - Patch #3 changes bail_on_lldpad() to SKIP instead of FAILing. - Patches #4 to #7 fix issues in selftests. - Patches #8 to #10 add topology diagrams to several selftests. This should have been part of the mlxsw leg of NH group stats patches, but again, it did not fit in due to size. ==================== Link: https://lore.kernel.org/r/cover.1712940759.git.petrm@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 25, 2024
vhost_worker will call tun call backs to receive packets. If too many illegal packets arrives, tun_do_read will keep dumping packet contents. When console is enabled, it will costs much more cpu time to dump packet and soft lockup will be detected. net_ratelimit mechanism can be used to limit the dumping rate. PID: 33036 TASK: ffff949da6f20000 CPU: 23 COMMAND: "vhost-32980" #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e #3 [fffffe00003fced0] do_nmi at ffffffff8922660d #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663 [exception RIP: io_serial_in+20] RIP: ffffffff89792594 RSP: ffffa655314979e8 RFLAGS: 00000002 RAX: ffffffff89792500 RBX: ffffffff8af428a0 RCX: 0000000000000000 RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff8af428a0 RBP: 0000000000002710 R8: 0000000000000004 R9: 000000000000000f R10: 0000000000000000 R11: ffffffff8acbf64f R12: 0000000000000020 R13: ffffffff8acbf698 R14: 0000000000000058 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07 #12 [ffffa65531497b68] printk at ffffffff89318306 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun] #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun] #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net] #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost] #18 [ffffa65531497f10] kthread at ffffffff892d2e72 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors") Signed-off-by: Lei Chen <lei.chen@smartx.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 25, 2024
…git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: Patch #1 amends a missing spot where the set iterator type is unset. This is fixing a issue in the previous pull request. Patch #2 fixes the delete set command abort path by restoring state of the elements. Reverse logic for the activate (abort) case otherwise element state is not restored, this requires to move the check for active/inactive elements to the set iterator callback. From the deactivate path, toggle the next generation bit and from the activate (abort) path, clear the next generation bitmask. Patch #3 skips elements already restored by delete set command from the abort path in case there is a previous delete element command in the batch. Check for the next generation bit just like it is done via set iteration to restore maps. netfilter pull request 24-04-18 * tag 'nf-24-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: fix memleak in map from abort path netfilter: nf_tables: restore set elements when delete set fails netfilter: nf_tables: missing iterator type in lookup walk ==================== Link: https://lore.kernel.org/r/20240418010948.3332346-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 30, 2024
Andrii Nakryiko says: ==================== bench: fast in-kernel triggering benchmarks Remove "legacy" triggering benchmarks which rely on syscalls (and thus syscall overhead is a noticeable part of benchmark, unfortunately). Replace them with faster versions that rely on triggering BPF programs in-kernel through another simple "driver" BPF program. See patch #2 with comparison results. raw_tp/tp/fmodret benchmarks required adding a simple kfunc in kernel to be able to trigger a simple tracepoint from BPF program (plus it is also allowed to be replaced by fmod_ret programs). This limits raw_tp/tp/fmodret benchmarks to new kernels only, but it keeps bench tool itself very portable and most of other benchmarks will still work on wide variety of kernels without the need to worry about building and deploying custom kernel module. See patches #5 and #6 for details. v1->v2: - move new TP closer to BPF test run code; - rename/move kfunc and register it for fmod_rets (Alexei); - limit --trig-batch-iters param to [1, 1000] (Alexei). ==================== Link: https://lore.kernel.org/r/20240326162151.3981687-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 30, 2024
Drop support for virtualizing adaptive PEBS, as KVM's implementation is architecturally broken without an obvious/easy path forward, and because exposing adaptive PEBS can leak host LBRs to the guest, i.e. can leak host kernel addresses to the guest. Bug #1 is that KVM doesn't account for the upper 32 bits of IA32_FIXED_CTR_CTRL when (re)programming fixed counters, e.g fixed_ctrl_field() drops the upper bits, reprogram_fixed_counters() stores local variables as u8s and truncates the upper bits too, etc. Bug #2 is that, because KVM _always_ sets precise_ip to a non-zero value for PEBS events, perf will _always_ generate an adaptive record, even if the guest requested a basic record. Note, KVM will also enable adaptive PEBS in individual *counter*, even if adaptive PEBS isn't exposed to the guest, but this is benign as MSR_PEBS_DATA_CFG is guaranteed to be zero, i.e. the guest will only ever see Basic records. Bug #3 is in perf. intel_pmu_disable_fixed() doesn't clear the upper bits either, i.e. leaves ICL_FIXED_0_ADAPTIVE set, and intel_pmu_enable_fixed() effectively doesn't clear ICL_FIXED_0_ADAPTIVE either. I.e. perf _always_ enables ADAPTIVE counters, regardless of what KVM requests. Bug #4 is that adaptive PEBS *might* effectively bypass event filters set by the host, as "Updated Memory Access Info Group" records information that might be disallowed by userspace via KVM_SET_PMU_EVENT_FILTER. Bug #5 is that KVM doesn't ensure LBR MSRs hold guest values (or at least zeros) when entering a vCPU with adaptive PEBS, which allows the guest to read host LBRs, i.e. host RIPs/addresses, by enabling "LBR Entries" records. Disable adaptive PEBS support as an immediate fix due to the severity of the LBR leak in particular, and because fixing all of the bugs will be non-trivial, e.g. not suitable for backporting to stable kernels. Note! This will break live migration, but trying to make KVM play nice with live migration would be quite complicated, wouldn't be guaranteed to work (i.e. KVM might still kill/confuse the guest), and it's not clear that there are any publicly available VMMs that support adaptive PEBS, let alone live migrate VMs that support adaptive PEBS, e.g. QEMU doesn't support PEBS in any capacity. Link: https://lore.kernel.org/all/20240306230153.786365-1-seanjc@google.com Link: https://lore.kernel.org/all/ZeepGjHCeSfadANM@google.com Fixes: c59a1f1 ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") Cc: stable@vger.kernel.org Cc: Like Xu <like.xu.linux@gmail.com> Cc: Mingwei Zhang <mizhang@google.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhang Xiong <xiong.y.zhang@intel.com> Cc: Lv Zhiyuan <zhiyuan.lv@intel.com> Cc: Dapeng Mi <dapeng1.mi@intel.com> Cc: Jim Mattson <jmattson@google.com> Acked-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20240307005833.827147-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 30, 2024
…"RESET" Set the enable bits for general purpose counters in IA32_PERF_GLOBAL_CTRL when refreshing the PMU to emulate the MSR's architecturally defined post-RESET behavior. Per Intel's SDM: IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits. and Where "n" is the number of general-purpose counters available in the processor. AMD also documents this behavior for PerfMonV2 CPUs in one of AMD's many PPRs. Do not set any PERF_GLOBAL_CTRL bits if there are no general purpose counters, although a literal reading of the SDM would require the CPU to set either bits 63:0 or 31:0. The intent of the behavior is to globally enable all GP counters; honor the intent, if not the letter of the law. Leaving PERF_GLOBAL_CTRL '0' effectively breaks PMU usage in guests that haven't been updated to work with PMUs that support PERF_GLOBAL_CTRL. This bug was recently exposed when KVM added supported for AMD's PerfMonV2, i.e. when KVM started exposing a vPMU with PERF_GLOBAL_CTRL to guest software that only knew how to program v1 PMUs (that don't support PERF_GLOBAL_CTRL). Failure to emulate the post-RESET behavior results in such guests unknowingly leaving all general purpose counters globally disabled (the entire reason the post-RESET value sets the GP counter enable bits is to maintain backwards compatibility). The bug has likely gone unnoticed because PERF_GLOBAL_CTRL has been supported on Intel CPUs for as long as KVM has existed, i.e. hardly anyone is running guest software that isn't aware of PERF_GLOBAL_CTRL on Intel PMUs. And because up until v6.0, KVM _did_ emulate the behavior for Intel CPUs, although the old behavior was likely dumb luck. Because (a) that old code was also broken in its own way (the history of this code is a comedy of errors), and (b) PERF_GLOBAL_CTRL was documented as having a value of '0' post-RESET in all SDMs before March 2023. Initial vPMU support in commit f5132b0 ("KVM: Expose a version 2 architectural PMU to a guests") *almost* got it right (again likely by dumb luck), but for some reason only set the bits if the guest PMU was advertised as v1: if (pmu->version == 1) { pmu->global_ctrl = (1 << pmu->nr_arch_gp_counters) - 1; return; } Commit f19a0c2 ("KVM: PMU emulation: GLOBAL_CTRL MSR should be enabled on reset") then tried to remedy that goof, presumably because guest PMUs were leaving PERF_GLOBAL_CTRL '0', i.e. weren't enabling counters. pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) | (((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED); pmu->global_ctrl_mask = ~pmu->global_ctrl; That was KVM's behavior up until commit c49467a ("KVM: x86/pmu: Don't overwrite the pmu->global_ctrl when refreshing") removed *everything*. However, it did so based on the behavior defined by the SDM , which at the time stated that "Global Perf Counter Controls" is '0' at Power-Up and RESET. But then the March 2023 SDM (325462-079US), stealthily changed its "IA-32 and Intel 64 Processor States Following Power-up, Reset, or INIT" table to say: IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits. Note, kvm_pmu_refresh() can be invoked multiple times, i.e. it's not a "pure" RESET flow. But it can only be called prior to the first KVM_RUN, i.e. the guest will only ever observe the final value. Note #2, KVM has always cleared global_ctrl during refresh (see commit f5132b0 ("KVM: Expose a version 2 architectural PMU to a guests")), i.e. there is no danger of breaking existing setups by clobbering a value set by userspace. Reported-by: Babu Moger <babu.moger@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Like Xu <like.xu.linux@gmail.com> Cc: Mingwei Zhang <mizhang@google.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20240309013641.1413400-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 30, 2024
When I did hard offline test with hugetlb pages, below deadlock occurs: ====================================================== WARNING: possible circular locking dependency detected 6.8.0-11409-gf6cef5f8c37f #1 Not tainted ------------------------------------------------------ bash/46904 is trying to acquire lock: ffffffffabe68910 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_slow_dec+0x16/0x60 but task is already holding lock: ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (pcp_batch_high_lock){+.+.}-{3:3}: __mutex_lock+0x6c/0x770 page_alloc_cpu_online+0x3c/0x70 cpuhp_invoke_callback+0x397/0x5f0 __cpuhp_invoke_callback_range+0x71/0xe0 _cpu_up+0xeb/0x210 cpu_up+0x91/0xe0 cpuhp_bringup_mask+0x49/0xb0 bringup_nonboot_cpus+0xb7/0xe0 smp_init+0x25/0xa0 kernel_init_freeable+0x15f/0x3e0 kernel_init+0x15/0x1b0 ret_from_fork+0x2f/0x50 ret_from_fork_asm+0x1a/0x30 -> #0 (cpu_hotplug_lock){++++}-{0:0}: __lock_acquire+0x1298/0x1cd0 lock_acquire+0xc0/0x2b0 cpus_read_lock+0x2a/0xc0 static_key_slow_dec+0x16/0x60 __hugetlb_vmemmap_restore_folio+0x1b9/0x200 dissolve_free_huge_page+0x211/0x260 __page_handle_poison+0x45/0xc0 memory_failure+0x65e/0xc70 hard_offline_page_store+0x55/0xa0 kernfs_fop_write_iter+0x12c/0x1d0 vfs_write+0x387/0x550 ksys_write+0x64/0xe0 do_syscall_64+0xca/0x1e0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(pcp_batch_high_lock); lock(cpu_hotplug_lock); lock(pcp_batch_high_lock); rlock(cpu_hotplug_lock); *** DEADLOCK *** 5 locks held by bash/46904: #0: ffff98f6c3bb23f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0 #1: ffff98f6c328e488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0 #2: ffff98ef83b31890 (kn->active#113){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0 #3: ffffffffabf9db48 (mf_mutex){+.+.}-{3:3}, at: memory_failure+0x44/0xc70 #4: ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40 stack backtrace: CPU: 10 PID: 46904 Comm: bash Kdump: loaded Not tainted 6.8.0-11409-gf6cef5f8c37f #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x68/0xa0 check_noncircular+0x129/0x140 __lock_acquire+0x1298/0x1cd0 lock_acquire+0xc0/0x2b0 cpus_read_lock+0x2a/0xc0 static_key_slow_dec+0x16/0x60 __hugetlb_vmemmap_restore_folio+0x1b9/0x200 dissolve_free_huge_page+0x211/0x260 __page_handle_poison+0x45/0xc0 memory_failure+0x65e/0xc70 hard_offline_page_store+0x55/0xa0 kernfs_fop_write_iter+0x12c/0x1d0 vfs_write+0x387/0x550 ksys_write+0x64/0xe0 do_syscall_64+0xca/0x1e0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 RIP: 0033:0x7fc862314887 Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:00007fff19311268 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007fc862314887 RDX: 000000000000000c RSI: 000056405645fe10 RDI: 0000000000000001 RBP: 000056405645fe10 R08: 00007fc8623d1460 R09: 000000007fffffff R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c R13: 00007fc86241b780 R14: 00007fc862417600 R15: 00007fc862416a00 In short, below scene breaks the lock dependency chain: memory_failure __page_handle_poison zone_pcp_disable -- lock(pcp_batch_high_lock) dissolve_free_huge_page __hugetlb_vmemmap_restore_folio static_key_slow_dec cpus_read_lock -- rlock(cpu_hotplug_lock) Fix this by calling drain_all_pages() instead. This issue won't occur until commit a6b4085 ("mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key"). As it introduced rlock(cpu_hotplug_lock) in dissolve_free_huge_page() code path while lock(pcp_batch_high_lock) is already in the __page_handle_poison(). [linmiaohe@huawei.com: extend comment per Oscar] [akpm@linux-foundation.org: reflow block comment] Link: https://lkml.kernel.org/r/20240407085456.2798193-1-linmiaohe@huawei.com Fixes: a6b4085 ("mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Jane Chu <jane.chu@oracle.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 30, 2024
Petr Machata says: ==================== mlxsw: Fixes This patchset fixes the following issues: - During driver de-initialization the driver unregisters the EMAD response trap by setting its action to DISCARD. However the manual only permits TRAP and FORWARD, and future firmware versions will enforce this. In patch #1, suppress the error message by aligning the driver to the manual and use a FORWARD (NOP) action when unregistering the trap. - The driver queries the Management Capabilities Mask (MCAM) register during initialization to understand if certain features are supported. However, not all firmware versions support this register, leading to the driver failing to load. Patches #2 and #3 fix this issue by treating an error in the register query as an indication that the feature is not supported. v2: - Patch #2: - Make mlxsw_env_max_module_eeprom_len_query() void ==================== Link: https://lore.kernel.org/r/cover.1713446092.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 30, 2024
… update The rule activity update delayed work periodically traverses the list of configured rules and queries their activity from the device. As part of this task it accesses the entry pointed by 'ventry->entry', but this entry can be changed concurrently by the rehash delayed work, leading to a use-after-free [1]. Fix by closing the race and perform the activity query under the 'vregion->lock' mutex. [1] BUG: KASAN: slab-use-after-free in mlxsw_sp_acl_tcam_flower_rule_activity_get+0x121/0x140 Read of size 8 at addr ffff8881054ed808 by task kworker/0:18/181 CPU: 0 PID: 181 Comm: kworker/0:18 Not tainted 6.9.0-rc2-custom-00781-gd5ab772d32f7 #2 Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019 Workqueue: mlxsw_core mlxsw_sp_acl_rule_activity_update_work Call Trace: <TASK> dump_stack_lvl+0xc6/0x120 print_report+0xce/0x670 kasan_report+0xd7/0x110 mlxsw_sp_acl_tcam_flower_rule_activity_get+0x121/0x140 mlxsw_sp_acl_rule_activity_update_work+0x219/0x400 process_one_work+0x8eb/0x19b0 worker_thread+0x6c9/0xf70 kthread+0x2c9/0x3b0 ret_from_fork+0x4d/0x80 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 1039: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 __kmalloc+0x19c/0x360 mlxsw_sp_acl_tcam_entry_create+0x7b/0x1f0 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x30d/0xb50 mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300 process_one_work+0x8eb/0x19b0 worker_thread+0x6c9/0xf70 kthread+0x2c9/0x3b0 ret_from_fork+0x4d/0x80 ret_from_fork_asm+0x1a/0x30 Freed by task 1039: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 poison_slab_object+0x102/0x170 __kasan_slab_free+0x14/0x30 kfree+0xc1/0x290 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x3d7/0xb50 mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300 process_one_work+0x8eb/0x19b0 worker_thread+0x6c9/0xf70 kthread+0x2c9/0x3b0 ret_from_fork+0x4d/0x80 ret_from_fork_asm+0x1a/0x30 Fixes: 2bffc53 ("mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/1fcce0a60b231ebeb2515d91022284ba7b4ffe7a.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 30, 2024
9f74a3d ("ice: Fix VF Reset paths when interface in a failed over aggregate"), the ice driver has acquired the LAG mutex in ice_reset_vf(). The commit placed this lock acquisition just prior to the acquisition of the VF configuration lock. If ice_reset_vf() acquires the configuration lock via the ICE_VF_RESET_LOCK flag, this could deadlock with ice_vc_cfg_qs_msg() because it always acquires the locks in the order of the VF configuration lock and then the LAG mutex. Lockdep reports this violation almost immediately on creating and then removing 2 VF: ====================================================== WARNING: possible circular locking dependency detected 6.8.0-rc6 #54 Tainted: G W O ------------------------------------------------------ kworker/60:3/6771 is trying to acquire lock: ff40d43e099380a0 (&vf->cfg_lock){+.+.}-{3:3}, at: ice_reset_vf+0x22f/0x4d0 [ice] but task is already holding lock: ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&pf->lag_mutex){+.+.}-{3:3}: __lock_acquire+0x4f8/0xb40 lock_acquire+0xd4/0x2d0 __mutex_lock+0x9b/0xbf0 ice_vc_cfg_qs_msg+0x45/0x690 [ice] ice_vc_process_vf_msg+0x4f5/0x870 [ice] __ice_clean_ctrlq+0x2b5/0x600 [ice] ice_service_task+0x2c9/0x480 [ice] process_one_work+0x1e9/0x4d0 worker_thread+0x1e1/0x3d0 kthread+0x104/0x140 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x1b/0x30 -> #0 (&vf->cfg_lock){+.+.}-{3:3}: check_prev_add+0xe2/0xc50 validate_chain+0x558/0x800 __lock_acquire+0x4f8/0xb40 lock_acquire+0xd4/0x2d0 __mutex_lock+0x9b/0xbf0 ice_reset_vf+0x22f/0x4d0 [ice] ice_process_vflr_event+0x98/0xd0 [ice] ice_service_task+0x1cc/0x480 [ice] process_one_work+0x1e9/0x4d0 worker_thread+0x1e1/0x3d0 kthread+0x104/0x140 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x1b/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&pf->lag_mutex); lock(&vf->cfg_lock); lock(&pf->lag_mutex); lock(&vf->cfg_lock); *** DEADLOCK *** 4 locks held by kworker/60:3/6771: #0: ff40d43e05428b38 ((wq_completion)ice){+.+.}-{0:0}, at: process_one_work+0x176/0x4d0 #1: ff50d06e05197e58 ((work_completion)(&pf->serv_task)){+.+.}-{0:0}, at: process_one_work+0x176/0x4d0 #2: ff40d43ea1960e50 (&pf->vfs.table_lock){+.+.}-{3:3}, at: ice_process_vflr_event+0x48/0xd0 [ice] #3: ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice] stack backtrace: CPU: 60 PID: 6771 Comm: kworker/60:3 Tainted: G W O 6.8.0-rc6 #54 Hardware name: Workqueue: ice ice_service_task [ice] Call Trace: <TASK> dump_stack_lvl+0x4a/0x80 check_noncircular+0x12d/0x150 check_prev_add+0xe2/0xc50 ? save_trace+0x59/0x230 ? add_chain_cache+0x109/0x450 validate_chain+0x558/0x800 __lock_acquire+0x4f8/0xb40 ? lockdep_hardirqs_on+0x7d/0x100 lock_acquire+0xd4/0x2d0 ? ice_reset_vf+0x22f/0x4d0 [ice] ? lock_is_held_type+0xc7/0x120 __mutex_lock+0x9b/0xbf0 ? ice_reset_vf+0x22f/0x4d0 [ice] ? ice_reset_vf+0x22f/0x4d0 [ice] ? rcu_is_watching+0x11/0x50 ? ice_reset_vf+0x22f/0x4d0 [ice] ice_reset_vf+0x22f/0x4d0 [ice] ? process_one_work+0x176/0x4d0 ice_process_vflr_event+0x98/0xd0 [ice] ice_service_task+0x1cc/0x480 [ice] process_one_work+0x1e9/0x4d0 worker_thread+0x1e1/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0x104/0x140 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> To avoid deadlock, we must acquire the LAG mutex only after acquiring the VF configuration lock. Fix the ice_reset_vf() to acquire the LAG mutex only after we either acquire or check that the VF configuration lock is held. Fixes: 9f74a3d ("ice: Fix VF Reset paths when interface in a failed over aggregate") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Dave Ertman <david.m.ertman@intel.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20240423182723.740401-5-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 30, 2024
…nix_gc(). syzbot reported a lockdep splat regarding unix_gc_lock and unix_state_lock(). One is called from recvmsg() for a connected socket, and another is called from GC for TCP_LISTEN socket. So, the splat is false-positive. Let's add a dedicated lock class for the latter to suppress the splat. Note that this change is not necessary for net-next.git as the issue is only applied to the old GC impl. [0]: WARNING: possible circular locking dependency detected 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Not tainted ----------------------------------------------------- kworker/u8:1/11 is trying to acquire lock: ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 but task is already holding lock: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (unix_gc_lock){+.+.}-{2:2}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] unix_notinflight+0x13d/0x390 net/unix/garbage.c:140 unix_detach_fds net/unix/af_unix.c:1819 [inline] unix_destruct_scm+0x221/0x350 net/unix/af_unix.c:1876 skb_release_head_state+0x100/0x250 net/core/skbuff.c:1188 skb_release_all net/core/skbuff.c:1200 [inline] __kfree_skb net/core/skbuff.c:1216 [inline] kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1252 kfree_skb include/linux/skbuff.h:1262 [inline] manage_oob net/unix/af_unix.c:2672 [inline] unix_stream_read_generic+0x1125/0x2700 net/unix/af_unix.c:2749 unix_stream_splice_read+0x239/0x320 net/unix/af_unix.c:2981 do_splice_read fs/splice.c:985 [inline] splice_file_to_pipe+0x299/0x500 fs/splice.c:1295 do_splice+0xf2d/0x1880 fs/splice.c:1379 __do_splice fs/splice.c:1436 [inline] __do_sys_splice fs/splice.c:1652 [inline] __se_sys_splice+0x331/0x4a0 fs/splice.c:1634 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&u->lock){+.+.}-{2:2}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(unix_gc_lock); lock(&u->lock); lock(unix_gc_lock); lock(&u->lock); *** DEADLOCK *** 3 locks held by kworker/u8:1/11: #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline] #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x17c0 kernel/workqueue.c:3335 #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline] #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x17c0 kernel/workqueue.c:3335 #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261 stack backtrace: CPU: 0 PID: 11 Comm: kworker/u8:1 Not tainted 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Workqueue: events_unbound __unix_gc Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187 check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Fixes: 47d8ac0 ("af_unix: Fix garbage collector racing against connect()") Reported-and-tested-by: syzbot+fa379358c28cc87cc307@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fa379358c28cc87cc307 Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240424170443.9832-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 30, 2024
…git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains two Netfilter/IPVS fixes for net: Patch #1 fixes SCTP checksumming for IPVS with gso packets, from Ismael Luceno. Patch #2 honor dormant flag from netdev event path to fix a possible double hook unregistration. * tag 'nf-24-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: honor table dormant flag from netdev release event path ipvs: Fix checksumming on GSO of SCTP packets ==================== Link: https://lore.kernel.org/r/20240425090149.1359547-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 30, 2024
Eduard Zingerman says: ==================== check bpf_dummy_struct_ops program params for test runs When doing BPF_PROG_TEST_RUN for bpf_dummy_struct_ops programs, execution should be rejected when NULL is passed for non-nullable params, because for such params verifier assumes that such params are never NULL and thus might optimize out NULL checks. This problem was reported by Jose E. Marchesi in off-list discussion. The code generated by GCC for dummy_st_ops_success/test_1() function differs from LLVM variant in a way that allows verifier to remove the NULL check. The test dummy_st_ops/dummy_init_ret_value actually sets the 'state' parameter to NULL, thus GCC-generated version of the test triggers NULL pointer dereference when BPF program is executed. This patch-set addresses the issue in the following steps: - patch #1 marks bpf_dummy_struct_ops.test_1 parameter as nullable, for verifier to have correct assumptions about test_1() programs; - patch #2 modifies dummy_st_ops/dummy_init_ret_value to trigger NULL dereference with both GCC and LLVM (if patch #1 is not applied); - patch #3 adjusts a few dummy_st_ops test cases to avoid passing NULL for 'state' parameter of test_2() and test_sleepable() functions, as parameters of these functions are not marked as nullable; - patch #4 adjusts bpf_dummy_struct_ops to reject test execution of programs if NULL is passed for non-nullable parameter; - patch #5 adds a test to verify logic from patch #4. ==================== Link: https://lore.kernel.org/r/20240424012821.595216-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 30, 2024
Xiumei and Christoph reported the following lockdep splat, complaining of the qdisc root lock being taken twice: ============================================ WARNING: possible recursive locking detected 6.7.0-rc3+ #598 Not tainted -------------------------------------------- swapper/2/0 is trying to acquire lock: ffff888177190110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70 but task is already holding lock: ffff88811995a110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&sch->q.lock); lock(&sch->q.lock); *** DEADLOCK *** May be due to missing lock nesting notation 5 locks held by swapper/2/0: #0: ffff888135a09d98 ((&in_dev->mr_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x11a/0x510 #1: ffffffffaaee5260 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x2c0/0x1ed0 #2: ffffffffaaee5200 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x209/0x2e70 #3: ffff88811995a110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70 #4: ffffffffaaee5200 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x209/0x2e70 stack backtrace: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.7.0-rc3+ #598 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x4a/0x80 __lock_acquire+0xfdd/0x3150 lock_acquire+0x1ca/0x540 _raw_spin_lock+0x34/0x80 __dev_queue_xmit+0x1560/0x2e70 tcf_mirred_act+0x82e/0x1260 [act_mirred] tcf_action_exec+0x161/0x480 tcf_classify+0x689/0x1170 prio_enqueue+0x316/0x660 [sch_prio] dev_qdisc_enqueue+0x46/0x220 __dev_queue_xmit+0x1615/0x2e70 ip_finish_output2+0x1218/0x1ed0 __ip_finish_output+0x8b3/0x1350 ip_output+0x163/0x4e0 igmp_ifc_timer_expire+0x44b/0x930 call_timer_fn+0x1a2/0x510 run_timer_softirq+0x54d/0x11a0 __do_softirq+0x1b3/0x88f irq_exit_rcu+0x18f/0x1e0 sysvec_apic_timer_interrupt+0x6f/0x90 </IRQ> This happens when TC does a mirred egress redirect from the root qdisc of device A to the root qdisc of device B. As long as these two locks aren't protecting the same qdisc, they can be acquired in chain: add a per-qdisc lockdep key to silence false warnings. This dynamic key should safely replace the static key we have in sch_htb: it was added to allow enqueueing to the device "direct qdisc" while still holding the qdisc root lock. v2: don't use static keys anymore in HTB direct qdiscs (thanks Eric Dumazet) CC: Maxim Mikityanskiy <maxim@isovalent.com> CC: Xiumei Mu <xmu@redhat.com> Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: multipath-tcp/mptcp_net-next#451 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/7dc06d6158f72053cf877a82e2a7a5bd23692faa.1713448007.git.dcaratti@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 30, 2024
Petr Machata says: ==================== mlxsw: Improve events processing performance Amit Cohen writes: Spectrum ASICs only support a single interrupt, it means that all the events are handled by one IRQ (interrupt request) handler. Currently, we schedule a tasklet to handle events in EQ, then we also use tasklet for CQ, SDQ and RDQ. Tasklet runs in softIRQ (software IRQ) context, and will be run on the same CPU which scheduled it. It means that today we have one CPU which handles all the packets (both network packets and EMADs) from hardware. The existing implementation is not efficient and can be improved. Measuring latency of EMADs in the driver (without the time in FW) shows that latency is increased by factor of 28 (x28) when network traffic is handled by the driver. Measuring throughput in CPU shows that CPU can handle ~35% less packets of specific flow when corrupted packets are also handled by the driver. There are cases that these values even worse, we measure decrease of ~44% packet rate. This can be improved if network packet and EMADs will be handled in parallel by several CPUs, and more than that, if different types of traffic will be handled in parallel. We can achieve this using NAPI. This set converts the driver to process completions from hardware via NAPI. The idea is to add NAPI instance per CQ (which is mapped 1:1 to SDQ/RDQ), which means that each DQ can be handled separately. we have DQ for EMADs and DQs for each trap group (like LLDP, BGP, L3 drops, etc..). See more details in commit messages. An additional improvement which is done as part of this set is related to doorbells' ring. The idea is to handle small chunks of Rx packets (which is also recommended using NAPI) and ring doorbells once per chunk. This reduces the access to hardware which is expensive (time wise) and might take time because of memory barriers. With this set we can see better performance. To summerize: EMADs latency: +------------------------------------------------------------------------+ | | Before this set | Now | |------------------|---------------------------|-------------------------| | Increased factor | x28 | x1.5 | +------------------------------------------------------------------------+ Note that we can see even measurements that show better latency when traffic is handled by the driver. Throughput: +------------------------------------------------------------------------+ | | Before this set | Now | |-------------|----------------------------|-----------------------------| | Reduced | 35% | 6% | | packet rate | | | +------------------------------------------------------------------------+ Additional improvements are planned - use page pool for buffer allocations and avoid cache miss of each SKB using napi_build_skb(). Patch set overview: Patches #1-#2 improve access to hardware by reducing dorbells' rings Patch #3-#4 are preaparations for NAPI usage Patch #5 converts the driver to use NAPI ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
BluezTestBot
pushed a commit
that referenced
this issue
Apr 30, 2024
Wen Gu says: ==================== net/smc: SMC intra-OS shortcut with loopback-ism This patch set acts as the second part of the new version of [1] (The first part can be referred from [2]), the updated things of this version are listed at the end. - Background SMC-D is now used in IBM z with ISM function to optimize network interconnect for intra-CPC communications. Inspired by this, we try to make SMC-D available on the non-s390 architecture through a software-implemented Emulated-ISM device, that is the loopback-ism device here, to accelerate inter-process or inter-containers communication within the same OS instance. - Design This patch set includes 3 parts: - Patch #1: some prepare work for loopback-ism. - Patch #2-#7: implement loopback-ism device and adapt SMC-D for it. loopback-ism now serves only SMC and no userspace interfaces exposed. - Patch #8-#11: memory copy optimization for intra-OS scenario. The loopback-ism device is designed as an ISMv2 device and not be limited to a specific net namespace, ends of both inter-process connection (1/1' in diagram below) or inter-container connection (2/2' in diagram below) can find the same available loopback-ism and choose it during the CLC handshake. Container 1 (ns1) Container 2 (ns2) +-----------------------------------------+ +-------------------------+ | +-------+ +-------+ +-------+ | | +-------+ | | | App A | | App B | | App C | | | | App D |<-+ | | +-------+ +---^---+ +-------+ | | +-------+ |(2') | | |127.0.0.1 (1')| |192.168.0.11 192.168.0.12| | | (1)| +--------+ | +--------+ |(2) | | +--------+ +--------+ | | `-->| lo |-` | eth0 |<-` | | | lo | | eth0 | | +---------+--|---^-+---+-----|--+---------+ +-+--------+---+-^------+-+ | | | | Kernel | | | | +----+-------v---+-----------v----------------------------------+---+----+ | | TCP | | | | | | | +--------------------------------------------------------------+ | | | | +--------------+ | | | smc loopback | | +---------------------------+--------------+-----------------------------+ loopback-ism device creates DMBs (shared memory) for each connection peer. Since data transfer occurs within the same kernel, the sndbuf of each peer is only a descriptor and point to the same memory region as peer DMB, so that the data copy from sndbuf to peer DMB can be avoided in loopback-ism case. Container 1 (ns1) Container 2 (ns2) +-----------------------------------------+ +-------------------------+ | +-------+ | | +-------+ | | | App C |-----+ | | | App D | | | +-------+ | | | +-^-----+ | | | | | | | | (2) | | | (2') | | | | | | | | +---------------|-------------------------+ +----------|--------------+ | | Kernel | | +---------------|-----------------------------------------|--------------+ | +--------+ +--v-----+ +--------+ +--------+ | | |dmb_desc| |snd_desc| |dmb_desc| |snd_desc| | | +-----|--+ +--|-----+ +-----|--+ +--------+ | | +-----|--+ | +-----|--+ | | | DMB C | +---------------------------------| DMB D | | | +--------+ +--------+ | | | | +--------------+ | | | smc loopback | | +---------------------------+--------------+-----------------------------+ - Benchmark Test * Test environments: - VM with Intel Xeon Platinum 8 core 2.50GHz, 16 GiB mem. - SMC sndbuf/DMB size 1MB. * Test object: - TCP: run on TCP loopback. - SMC lo: run on SMC loopback-ism. 1. ipc-benchmark (see [3]) - ./<foo> -c 1000000 -s 100 TCP SMC-lo Message rate (msg/s) 84991 151293(+78.01%) 2. sockperf - serv: <smc_run> sockperf sr --tcp - clnt: <smc_run> sockperf { tp | pp } --tcp --msg-size={ 64000 for tp | 14 for pp } -i 127.0.0.1 -t 30 TCP SMC-lo Bandwidth(MBps) 5033.569 7987.732(+58.69%) Latency(us) 5.986 3.398(-43.23%) 3. nginx/wrk - serv: <smc_run> nginx - clnt: <smc_run> wrk -t 8 -c 1000 -d 30 http://127.0.0.1:80 TCP SMC-lo Requests/s 187951.76 267107.90(+42.12%) 4. redis-benchmark - serv: <smc_run> redis-server - clnt: <smc_run> redis-benchmark -h 127.0.0.1 -q -t set,get -n 400000 -c 200 -d 1024 TCP SMC-lo GET(Requests/s) 86132.64 118133.49(+37.15%) SET(Requests/s) 87374.40 122887.86(+40.65%) Change log: v7->v6 - Patch #2: minor: remove unnecessary 'return' of inline smc_loopback_exit(). - Patch #10: minor: directly return 0 instead of 'rc' in smcd_cdc_msg_send(). - all: collect the Reviewed-by tags. v6->RFC v5 Link: https://lore.kernel.org/netdev/20240414040304.54255-1-guwen@linux.alibaba.com/ - Patch #2: make the use of CONFIG_SMC_LO cleaner. - Patch #5: mark some smcd_ops that loopback-ism doesn't support as optional and check for the support when they are called. - Patch #7: keep loopback-ism at the beginning of the SMC-D device list. - Some expression changes in commit logs and comments. RFC v5->RFC v4: Link: https://lore.kernel.org/netdev/20240324135522.108564-1-guwen@linux.alibaba.com/ - Patch #2: minor changes in description of config SMC_LO and comments. - Patch #10: minor changes in comments and if(smc_ism_support_dmb_nocopy()) check in smcd_cdc_msg_send(). - Patch #3: change smc_lo_generate_id() to smc_lo_generate_ids() and SMC_LO_CHID to SMC_LO_RESERVED_CHID. - Patch #5: memcpy while holding the ldev->dmb_ht_lock. - Some expression changes in commit logs. RFC v4->v3: Link: https://lore.kernel.org/netdev/20240317100545.96663-1-guwen@linux.alibaba.com/ - The merge window of v6.9 is open, so post this series as an RFC. - Patch #6: since some information fed back by smc_nl_handle_smcd_dev() dose not apply to Emulated-ISM (including loopback-ism here), loopback-ism is not exposed through smc netlink for the time being. we may refactor this part when smc netlink interface is updated. v3->v2: Link: https://lore.kernel.org/netdev/20240312142743.41406-1-guwen@linux.alibaba.com/ - Patch #11: use tasklet_schedule(&conn->rx_tsklet) instead of smcd_cdc_rx_handler() to avoid possible recursive locking of conn->send_lock and use {read|write}_lock_bh() to acquire dmb_ht_lock. v2->v1: Link: https://lore.kernel.org/netdev/20240307095536.29648-1-guwen@linux.alibaba.com/ - All the patches: changed the term virtual-ISM to Emulated-ISM as defined by SMCv2.1. - Patch #3: optimized the description of SMC_LO config. Avoid exposing loopback-ism to sysfs and remove all the knobs until future definition clear. - Patch #3: try to make lockdep happy by using read_lock_bh() in smc_lo_move_data(). - Patch #6: defaultly use physical contiguous DMB buffers. - Patch #11: defaultly enable DMB no-copy for loopback-ism and free the DMB in unregister_dmb or detach_dmb when dmb_node->refcnt reaches 0, instead of using wait_event to keep waiting in unregister_dmb. v1->RFC: Link: https://lore.kernel.org/netdev/20240111120036.109903-1-guwen@linux.alibaba.com/ - Patch #9: merge rx_bytes and tx_bytes as xfer_bytes statistics: /sys/devices/virtual/smc/loopback-ism/xfer_bytes - Patch #10: add support_dmb_nocopy operation to check if SMC-D device supports merging sndbuf with peer DMB. - Patch #13 & #14: introduce loopback-ism device control of DMB memory type and control of whether to merge sndbuf and DMB. They can be respectively set by: /sys/devices/virtual/smc/loopback-ism/dmb_type /sys/devices/virtual/smc/loopback-ism/dmb_copy The motivation for these two control is that a performance bottleneck was found when using vzalloced DMB and sndbuf is merged with DMB, and there are many CPUs and CONFIG_HARDENED_USERCOPY is set [4]. The bottleneck is caused by the lock contention in vmap_area_lock [5] which is involved in memcpy_from_msg() or memcpy_to_msg(). Currently, Uladzislau Rezki is working on mitigating the vmap lock contention [6]. It has significant effects, but using virtual memory still has additional overhead compared to using physical memory. So this new version provides controls of dmb_type and dmb_copy to suit different scenarios. - Some minor changes and comments improvements. RFC->old version([1]): Link: https://lore.kernel.org/netdev/1702214654-32069-1-git-send-email-guwen@linux.alibaba.com/ - Patch #1: improve the loopback-ism dump, it shows as follows now: # smcd d FID Type PCI-ID PCHID InUse #LGs PNET-ID 0000 0 loopback-ism ffff No 0 - Patch #3: introduce the smc_ism_set_v2_capable() helper and set smc_ism_v2_capable when ISMv2 or virtual ISM is registered, regardless of whether there is already a device in smcd device list. - Patch #3: loopback-ism will be added into /sys/devices/virtual/smc/loopback-ism/. - Patch #8: introduce the runtime switch /sys/devices/virtual/smc/loopback-ism/active to activate or deactivate the loopback-ism. - Patch #9: introduce the statistics of loopback-ism by /sys/devices/virtual/smc/loopback-ism/{{tx|rx}_tytes|dmbs_cnt}. - Some minor changes and comments improvements. [1] https://lore.kernel.org/netdev/1695568613-125057-1-git-send-email-guwen@linux.alibaba.com/ [2] https://lore.kernel.org/netdev/20231219142616.80697-1-guwen@linux.alibaba.com/ [3] https://github.com/goldsborough/ipc-bench [4] https://lore.kernel.org/all/3189e342-c38f-6076-b730-19a6efd732a5@linux.alibaba.com/ [5] https://lore.kernel.org/all/238e63cd-e0e8-4fbf-852f-bc4d5bc35d5a@linux.alibaba.com/ [6] https://lore.kernel.org/all/20240102184633.748113-1-urezki@gmail.com/ ==================== Link: https://lore.kernel.org/r/20240428060738.60843-1-guwen@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
BluezTestBot
pushed a commit
that referenced
this issue
May 3, 2024
Lockdep detects a possible deadlock as listed below. This is because it detects the IA55 interrupt controller .irq_eoi() API is called from interrupt context while configuration-specific API (e.g., .irq_enable()) could be called from process context on resume path (by calling rzg2l_gpio_irq_restore()). To avoid this, protect the call of rzg2l_gpio_irq_enable() with spin_lock_irqsave()/spin_unlock_irqrestore(). With this the same approach that is available in __setup_irq() is mimicked to pinctrl IRQ resume function. Below is the lockdep report: WARNING: inconsistent lock state 6.8.0-rc5-next-20240219-arm64-renesas-00030-gb17a289abf1f #90 Not tainted -------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. str_rwdt_t_001./159 [HC0[0]:SC0[0]:HE1:SE1] takes: ffff00000b001d70 (&rzg2l_irqc_data->lock){?...}-{2:2}, at: rzg2l_irqc_irq_enable+0x60/0xa4 {IN-HARDIRQ-W} state was registered at: lock_acquire+0x1e0/0x310 _raw_spin_lock+0x44/0x58 rzg2l_irqc_eoi+0x2c/0x130 irq_chip_eoi_parent+0x18/0x20 rzg2l_gpio_irqc_eoi+0xc/0x14 handle_fasteoi_irq+0x134/0x230 generic_handle_domain_irq+0x28/0x3c gic_handle_irq+0x4c/0xbc call_on_irq_stack+0x24/0x34 do_interrupt_handler+0x78/0x7c el1_interrupt+0x30/0x5c el1h_64_irq_handler+0x14/0x1c el1h_64_irq+0x64/0x68 _raw_spin_unlock_irqrestore+0x34/0x70 __setup_irq+0x4d4/0x6b8 request_threaded_irq+0xe8/0x1a0 request_any_context_irq+0x60/0xb8 devm_request_any_context_irq+0x74/0x104 gpio_keys_probe+0x374/0xb08 platform_probe+0x64/0xcc really_probe+0x140/0x2ac __driver_probe_device+0x74/0x124 driver_probe_device+0x3c/0x15c __driver_attach+0xec/0x1c4 bus_for_each_dev+0x70/0xcc driver_attach+0x20/0x28 bus_add_driver+0xdc/0x1d0 driver_register+0x5c/0x118 __platform_driver_register+0x24/0x2c gpio_keys_init+0x18/0x20 do_one_initcall+0x70/0x290 kernel_init_freeable+0x294/0x504 kernel_init+0x20/0x1cc ret_from_fork+0x10/0x20 irq event stamp: 69071 hardirqs last enabled at (69071): [<ffff800080e0dafc>] _raw_spin_unlock_irqrestore+0x6c/0x70 hardirqs last disabled at (69070): [<ffff800080e0cfec>] _raw_spin_lock_irqsave+0x7c/0x80 softirqs last enabled at (67654): [<ffff800080010614>] __do_softirq+0x494/0x4dc softirqs last disabled at (67645): [<ffff800080015238>] ____do_softirq+0xc/0x14 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&rzg2l_irqc_data->lock); <Interrupt> lock(&rzg2l_irqc_data->lock); *** DEADLOCK *** 4 locks held by str_rwdt_t_001./159: #0: ffff00000b10f3f0 (sb_writers#4){.+.+}-{0:0}, at: vfs_write+0x1a4/0x35c #1: ffff00000e43ba88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xe8/0x1a8 #2: ffff00000aa21dc8 (kn->active#40){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xf0/0x1a8 #3: ffff80008179d970 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend+0x9c/0x278 stack backtrace: CPU: 0 PID: 159 Comm: str_rwdt_t_001. Not tainted 6.8.0-rc5-next-20240219-arm64-renesas-00030-gb17a289abf1f #90 Hardware name: Renesas SMARC EVK version 2 based on r9a08g045s33 (DT) Call trace: dump_backtrace+0x94/0xe8 show_stack+0x14/0x1c dump_stack_lvl+0x88/0xc4 dump_stack+0x14/0x1c print_usage_bug.part.0+0x294/0x348 mark_lock+0x6b0/0x948 __lock_acquire+0x750/0x20b0 lock_acquire+0x1e0/0x310 _raw_spin_lock+0x44/0x58 rzg2l_irqc_irq_enable+0x60/0xa4 irq_chip_enable_parent+0x1c/0x34 rzg2l_gpio_irq_enable+0xc4/0xd8 rzg2l_pinctrl_resume_noirq+0x4cc/0x520 pm_generic_resume_noirq+0x28/0x3c genpd_finish_resume+0xc0/0xdc genpd_resume_noirq+0x14/0x1c dpm_run_callback+0x34/0x90 device_resume_noirq+0xa8/0x268 dpm_noirq_resume_devices+0x13c/0x160 dpm_resume_noirq+0xc/0x1c suspend_devices_and_enter+0x2c8/0x570 pm_suspend+0x1ac/0x278 state_store+0x88/0x124 kobj_attr_store+0x14/0x24 sysfs_kf_write+0x48/0x6c kernfs_fop_write_iter+0x118/0x1a8 vfs_write+0x270/0x35c ksys_write+0x64/0xec __arm64_sys_write+0x18/0x20 invoke_syscall+0x44/0x108 el0_svc_common.constprop.0+0xb4/0xd4 do_el0_svc+0x18/0x20 el0_svc+0x3c/0xb8 el0t_64_sync_handler+0xb8/0xbc el0t_64_sync+0x14c/0x150 Fixes: 254203f ("pinctrl: renesas: rzg2l: Add suspend/resume support") Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20240320104230.446400-2-claudiu.beznea.uj@bp.renesas.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
BluezTestBot
pushed a commit
that referenced
this issue
May 3, 2024
…active The default nna (node_nr_active) is used when the pool isn't tied to a specific NUMA node. This can happen in the following cases: 1. On NUMA, if per-node pwq init failure and the fallback pwq is used. 2. On NUMA, if a pool is configured to span multiple nodes. 3. On single node setups. 5797b1c ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues") set the default nna->max to min_active because only #1 was being considered. For #2 and #3, using min_active means that the max concurrency in normal operation is pushed down to min_active which is currently 8, which can obviously lead to performance issues. exact value nna->max is set to doesn't really matter. #2 can only happen if the workqueue is intentionally configured to ignore NUMA boundaries and there's no good way to distribute max_active in this case. #3 is the default behavior on single node machines. Let's set it the default nna->max to max_active. This fixes the artificially lowered concurrency problem on single node machines and shouldn't hurt anything for other cases. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: 5797b1c ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues") Link: https://lore.kernel.org/dm-devel/20240410084531.2134621-1-shinichiro.kawasaki@wdc.com/ Signed-off-by: Tejun Heo <tj@kernel.org>
BluezTestBot
pushed a commit
that referenced
this issue
May 3, 2024
…kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.9, part #2 - Fix + test for a NULL dereference resulting from unsanitised user input in the vgic-v2 device attribute accessors
BluezTestBot
pushed a commit
that referenced
this issue
May 15, 2024
One of my CI runs popped the following lockdep splat ====================================================== WARNING: possible circular locking dependency detected 6.9.0-rc4+ #1 Not tainted ------------------------------------------------------ btrfs/471533 is trying to acquire lock: ffff92ba46980850 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_quota_disable+0x54/0x4c0 but task is already holding lock: ffff92ba46980bd0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1c8f/0x2600 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&fs_info->subvol_sem){++++}-{3:3}: down_read+0x42/0x170 btrfs_rename+0x607/0xb00 btrfs_rename2+0x2e/0x70 vfs_rename+0xaf8/0xfc0 do_renameat2+0x586/0x600 __x64_sys_rename+0x43/0x50 do_syscall_64+0x95/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #1 (&sb->s_type->i_mutex_key#16){++++}-{3:3}: down_write+0x3f/0xc0 btrfs_inode_lock+0x40/0x70 prealloc_file_extent_cluster+0x1b0/0x370 relocate_file_extent_cluster+0xb2/0x720 relocate_data_extent+0x107/0x160 relocate_block_group+0x442/0x550 btrfs_relocate_block_group+0x2cb/0x4b0 btrfs_relocate_chunk+0x50/0x1b0 btrfs_balance+0x92f/0x13d0 btrfs_ioctl+0x1abf/0x2600 __x64_sys_ioctl+0x97/0xd0 do_syscall_64+0x95/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #0 (&fs_info->cleaner_mutex){+.+.}-{3:3}: __lock_acquire+0x13e7/0x2180 lock_acquire+0xcb/0x2e0 __mutex_lock+0xbe/0xc00 btrfs_quota_disable+0x54/0x4c0 btrfs_ioctl+0x206b/0x2600 __x64_sys_ioctl+0x97/0xd0 do_syscall_64+0x95/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e other info that might help us debug this: Chain exists of: &fs_info->cleaner_mutex --> &sb->s_type->i_mutex_key#16 --> &fs_info->subvol_sem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fs_info->subvol_sem); lock(&sb->s_type->i_mutex_key#16); lock(&fs_info->subvol_sem); lock(&fs_info->cleaner_mutex); *** DEADLOCK *** 2 locks held by btrfs/471533: #0: ffff92ba4319e420 (sb_writers#14){.+.+}-{0:0}, at: btrfs_ioctl+0x3b5/0x2600 #1: ffff92ba46980bd0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1c8f/0x2600 stack backtrace: CPU: 1 PID: 471533 Comm: btrfs Kdump: loaded Not tainted 6.9.0-rc4+ #1 Call Trace: <TASK> dump_stack_lvl+0x77/0xb0 check_noncircular+0x148/0x160 ? lock_acquire+0xcb/0x2e0 __lock_acquire+0x13e7/0x2180 lock_acquire+0xcb/0x2e0 ? btrfs_quota_disable+0x54/0x4c0 ? lock_is_held_type+0x9a/0x110 __mutex_lock+0xbe/0xc00 ? btrfs_quota_disable+0x54/0x4c0 ? srso_return_thunk+0x5/0x5f ? lock_acquire+0xcb/0x2e0 ? btrfs_quota_disable+0x54/0x4c0 ? btrfs_quota_disable+0x54/0x4c0 btrfs_quota_disable+0x54/0x4c0 btrfs_ioctl+0x206b/0x2600 ? srso_return_thunk+0x5/0x5f ? __do_sys_statfs+0x61/0x70 __x64_sys_ioctl+0x97/0xd0 do_syscall_64+0x95/0x180 ? srso_return_thunk+0x5/0x5f ? reacquire_held_locks+0xd1/0x1f0 ? do_user_addr_fault+0x307/0x8a0 ? srso_return_thunk+0x5/0x5f ? lock_acquire+0xcb/0x2e0 ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? find_held_lock+0x2b/0x80 ? srso_return_thunk+0x5/0x5f ? lock_release+0xca/0x2a0 ? srso_return_thunk+0x5/0x5f ? do_user_addr_fault+0x35c/0x8a0 ? srso_return_thunk+0x5/0x5f ? trace_hardirqs_off+0x4b/0xc0 ? srso_return_thunk+0x5/0x5f ? lockdep_hardirqs_on_prepare+0xde/0x190 ? srso_return_thunk+0x5/0x5f This happens because when we call rename we already have the inode mutex held, and then we acquire the subvol_sem if we are a subvolume. This makes the dependency inode lock -> subvol sem When we're running data relocation we will preallocate space for the data relocation inode, and we always run the relocation under the ->cleaner_mutex. This now creates the dependency of cleaner_mutex -> inode lock (from the prealloc) -> subvol_sem Qgroup delete is doing this in the opposite order, it is acquiring the subvol_sem and then it is acquiring the cleaner_mutex, which results in this lockdep splat. This deadlock can't happen in reality, because we won't ever rename the data reloc inode, nor is the data reloc inode a subvolume. However this is fairly easy to fix, simply take the cleaner mutex in the case where we are disabling qgroups before we take the subvol_sem. This resolves the lockdep splat. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
BluezTestBot
pushed a commit
that referenced
this issue
May 15, 2024
Merge series from Jerome Brunet <jbrunet@baylibre.com>: This patchset fixes 2 problems on TDM which both find a solution by properly implementing the .trigger() callback for the TDM backend. ATM, enabling the TDM formatters is done by the .prepare() callback because handling the formatter is slow due to necessary calls to CCF. The first problem affects the TDMIN. Because .prepare() is called on DPCM backend first, the formatter are started before the FIFOs and this may cause a random channel shifts if the TDMIN use multiple lanes with more than 2 slots per lanes. Using trigger() allows to set the FE/BE order, solving the problem. There has already been an attempt to fix this 3y ago [1] and reverted [2] It triggered a 'sleep in irq' error on the period IRQ. The solution is to just use the bottom half of threaded IRQ. This is patch #1. Patch #2 and #3 remain mostly the same as 3y ago. For TDMOUT, the problem is on pause. ATM pause only stops the FIFO and the TDMOUT just starves. When it does, it will actually repeat the last sample continuously. Depending on the platform, if there is no high-pass filter on the analog path, this may translate to a constant position of the speaker membrane. There is no audible glitch but it may damage the speaker coil. Properly stopping the TDMOUT in pause solves the problem. There is behaviour change associated with that fix. Clocks used to be continuous on pause because of the problem above. They will now be gated on pause by default, as they should. The last change introduce the proper support for continuous clocks, if needed. [1]: https://lore.kernel.org/linux-amlogic/20211020114217.133153-1-jbrunet@baylibre.com [2]: https://lore.kernel.org/linux-amlogic/20220421155725.2589089-1-narmstrong@baylibre.com
BluezTestBot
pushed a commit
that referenced
this issue
May 15, 2024
…/git/pablo/gtp Pablo neira Ayuso says: ==================== gtp pull request 24-05-07 This v3 includes: - fix for clang uninitialized variable per Jakub. - address Smatch and Coccinelle reports per Simon - remove inline in new IPv6 support per Simon - fix memleaks in netlink control plane per Simon -o- The following patchset contains IPv6 GTP driver support for net-next, this also includes IPv6 over IPv4 and vice-versa: Patch #1 removes a unnecessary stack variable initialization in the socket routine. Patch #2 deals with GTP extension headers. This variable length extension header to decapsulate packets accordingly. Otherwise, packets are dropped when these extension headers are present which breaks interoperation with other non-Linux based GTP implementations. Patch #3 prepares for IPv6 support by moving IPv4 specific fields in PDP context objects to a union. Patch #4 adds IPv6 support while retaining backward compatibility. Three new attributes allows to declare an IPv6 GTP tunnel GTPA_FAMILY, GTPA_PEER_ADDR6 and GTPA_MS_ADDR6 as well as IFLA_GTP_LOCAL6 to declare the IPv6 GTP UDP socket. Up to this patch, only IPv6 outer in IPv6 inner is supported. Patch #5 uses IPv6 address /64 prefix for UE/MS in the inner headers. Unlike IPv4, which provides a 1:1 mapping between UE/MS, IPv6 tunnel encapsulates traffic for /64 address as specified by 3GPP TS. Patch has been split from Patch #4 to highlight this behaviour. Patch #6 passes up IPv6 link-local traffic, such as IPv6 SLAAC, for handling to userspace so they are handled as control packets. Patch #7 prepares to allow for GTP IPv4 over IPv6 and vice-versa by moving IP specific debugging out of the function to build IPv4 and IPv6 GTP packets. Patch #8 generalizes TOS/DSCP handling following similar approach as in the existing iptunnel infrastructure. Patch #9 adds a helper function to build an IPv4 GTP packet in the outer header. Patch #10 adds a helper function to build an IPv6 GTP packet in the outer header. Patch #11 adds support for GTP IPv4-over-IPv6 and vice-versa. Patch #12 allows to use the same TID/TEID (tunnel identifier) for inner IPv4 and IPv6 packets for better UE/MS dual stack integration. This series integrates with the osmocom.org project CI and TTCN-3 test infrastructure (Oliver Smith) as well as the userspace libgtpnl library. Thanks to Harald Welte, Oliver Smith and Pau Espin for reviewing and providing feedback through the osmocom.org redmine platform to make this happen. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
BluezTestBot
pushed a commit
that referenced
this issue
May 15, 2024
…rnel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: Patch #1 skips transaction if object type provides no .update interface. Patch #2 skips NETDEV_CHANGENAME which is unused. Patch #3 enables conntrack to handle Multicast Router Advertisements and Multicast Router Solicitations from the Multicast Router Discovery protocol (RFC4286) as untracked opposed to invalid packets. From Linus Luessing. Patch #4 updates DCCP conntracker to mark invalid as invalid, instead of dropping them, from Jason Xing. Patch #5 uses NF_DROP instead of -NF_DROP since NF_DROP is 0, also from Jason. Patch #6 removes reference in netfilter's sysctl documentation on pickup entries which were already removed by Florian Westphal. Patch #7 removes check for IPS_OFFLOAD flag to disable early drop which allows to evict entries from the conntrack table, also from Florian. Patches #8 to #16 updates nf_tables pipapo set backend to allocate the datastructure copy on-demand from preparation phase, to better deal with OOM situations where .commit step is too late to fail. Series from Florian Westphal. Patch #17 adds a selftest with packetdrill to cover conntrack TCP state transitions, also from Florian. Patch #18 use GFP_KERNEL to clone elements from control plane to avoid quick atomic reserves exhaustion with large sets, reporter refers to million entries magnitude. * tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: allow clone callbacks to sleep selftests: netfilter: add packetdrill based conntrack tests netfilter: nft_set_pipapo: remove dirty flag netfilter: nft_set_pipapo: move cloning of match info to insert/removal path netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone netfilter: nft_set_pipapo: merge deactivate helper into caller netfilter: nft_set_pipapo: prepare walk function for on-demand clone netfilter: nft_set_pipapo: prepare destroy function for on-demand clone netfilter: nft_set_pipapo: make pipapo_clone helper return NULL netfilter: nft_set_pipapo: move prove_locking helper around netfilter: conntrack: remove flowtable early-drop test netfilter: conntrack: documentation: remove reference to non-existent sysctl netfilter: use NF_DROP instead of -NF_DROP netfilter: conntrack: dccp: try not to drop skb in conntrack netfilter: conntrack: fix ct-state for ICMPv6 Multicast Router Discovery netfilter: nf_tables: remove NETDEV_CHANGENAME from netdev chain event handler netfilter: nf_tables: skip transaction if update object is not implemented ==================== Link: https://lore.kernel.org/r/20240512161436.168973-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
BluezTestBot
pushed a commit
that referenced
this issue
May 15, 2024
Xuan Zhuo says: ==================== virtio_net: rx enable premapped mode by default Actually, for the virtio drivers, we can enable premapped mode whatever the value of use_dma_api. Because we provide the virtio dma apis. So the driver can enable premapped mode unconditionally. This patch set makes the big mode of virtio-net to support premapped mode. And enable premapped mode for rx by default. Based on the following points, we do not use page pool to manage these pages: 1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore, we can only prevent the page pool from performing DMA operations, and let the driver perform DMA operations on the allocated pages. 2. But when the page pool releases the page, we have no chance to execute dma unmap. 3. A solution to #2 is to execute dma unmap every time before putting the page back to the page pool. (This is actually a waste, we don't execute unmap so frequently.) 4. But there is another problem, we still need to use page.dma_addr to save the dma address. Using page.dma_addr while using page pool is unsafe behavior. 5. And we need space the chain the pages submitted once to virtio core. More: https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/ Why we do not use the page space to store the dma? http://lore.kernel.org/all/CACGkMEuyeJ9mMgYnnB42=hw6umNuo=agn7VBqBqYPd7GN=+39Q@mail.gmail.com ==================== Link: https://lore.kernel.org/r/20240511031404.30903-1-xuanzhuo@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
BluezTestBot
pushed a commit
that referenced
this issue
May 15, 2024
This adds a check before freeing the rx->skb in flush and close functions to handle the kernel crash seen while removing driver after FW download fails or before FW download completes. dmesg log: [ 54.634586] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000080 [ 54.643398] Mem abort info: [ 54.646204] ESR = 0x0000000096000004 [ 54.649964] EC = 0x25: DABT (current EL), IL = 32 bits [ 54.655286] SET = 0, FnV = 0 [ 54.658348] EA = 0, S1PTW = 0 [ 54.661498] FSC = 0x04: level 0 translation fault [ 54.666391] Data abort info: [ 54.669273] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 54.674768] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 54.674771] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 54.674775] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000048860000 [ 54.674780] [0000000000000080] pgd=0000000000000000, p4d=0000000000000000 [ 54.703880] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP [ 54.710152] Modules linked in: btnxpuart(-) overlay fsl_jr_uio caam_jr caamkeyblob_desc caamhash_desc caamalg_desc crypto_engine authenc libdes crct10dif_ce polyval_ce polyval_generic snd_soc_imx_spdif snd_soc_imx_card snd_soc_ak5558 snd_soc_ak4458 caam secvio error snd_soc_fsl_micfil snd_soc_fsl_spdif snd_soc_fsl_sai snd_soc_fsl_utils imx_pcm_dma gpio_ir_recv rc_core sch_fq_codel fuse [ 54.744357] CPU: 3 PID: 72 Comm: kworker/u9:0 Not tainted 6.6.3-otbr-g128004619037 #2 [ 54.744364] Hardware name: FSL i.MX8MM EVK board (DT) [ 54.744368] Workqueue: hci0 hci_power_on [ 54.757244] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 54.757249] pc : kfree_skb_reason+0x18/0xb0 [ 54.772299] lr : btnxpuart_flush+0x40/0x58 [btnxpuart] [ 54.782921] sp : ffff8000805ebca0 [ 54.782923] x29: ffff8000805ebca0 x28: ffffa5c6cf1869c0 x27: ffffa5c6cf186000 [ 54.782931] x26: ffff377b84852400 x25: ffff377b848523c0 x24: ffff377b845e7230 [ 54.782938] x23: ffffa5c6ce8dbe08 x22: ffffa5c6ceb65410 x21: 00000000ffffff92 [ 54.782945] x20: ffffa5c6ce8dbe98 x19: ffffffffffffffac x18: ffffffffffffffff [ 54.807651] x17: 0000000000000000 x16: ffffa5c6ce2824ec x15: ffff8001005eb857 [ 54.821917] x14: 0000000000000000 x13: ffffa5c6cf1a02e0 x12: 0000000000000642 [ 54.821924] x11: 0000000000000040 x10: ffffa5c6cf19d690 x9 : ffffa5c6cf19d688 [ 54.821931] x8 : ffff377b86000028 x7 : 0000000000000000 x6 : 0000000000000000 [ 54.821938] x5 : ffff377b86000000 x4 : 0000000000000000 x3 : 0000000000000000 [ 54.843331] x2 : 0000000000000000 x1 : 0000000000000002 x0 : ffffffffffffffac [ 54.857599] Call trace: [ 54.857601] kfree_skb_reason+0x18/0xb0 [ 54.863878] btnxpuart_flush+0x40/0x58 [btnxpuart] [ 54.863888] hci_dev_open_sync+0x3a8/0xa04 [ 54.872773] hci_power_on+0x54/0x2e4 [ 54.881832] process_one_work+0x138/0x260 [ 54.881842] worker_thread+0x32c/0x438 [ 54.881847] kthread+0x118/0x11c [ 54.881853] ret_from_fork+0x10/0x20 [ 54.896406] Code: a9be7bfd 910003fd f9000bf3 aa0003f3 (b940d400) [ 54.896410] ---[ end trace 0000000000000000 ]--- Signed-off-by: Neeraj Sanjay Kale <neeraj.sanjaykale@nxp.com> Tested-by: Guillaume Legoupil <guillaume.legoupil@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Hello,
syzbot found the following issue on:
HEAD commit: 29ad81a1 arch/x86: add missing include to sparsemem.h
git tree: https://github.com/google/kmsan.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=100da362d00000
kernel config: https://syzkaller.appspot.com/x/.config?x=9c7da2160236454
dashboard link: https://syzkaller.appspot.com/bug?extid=b12240a286aa7cd4f3fb
compiler: Debian clang version 11.0.1-2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17e08faed00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=147a978ad00000
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+b12240a286aa7cd4f3fb@syzkaller.appspotmail.com
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches
The text was updated successfully, but these errors were encountered: