CPU freqency scaling issue. #12

swejuggalo · 2020-11-30T10:59:45Z

I've looked around and haven't really found a good answer or reason for this.
This seems to affect firmware from beta 2 and forward (In my case stock firmware and 8 Pro).
I have a constant frequency monitor app running and hoping to find a pattern of why and when it happens.

The problem in short can be described as refusal of using freqency lower than 1056MHz. It is still allowed to deep sleep but locked out from using lower freqencies ranges on all CPU cores. This can happen randomly. From within an hour to 1 or 2 days uptime.

I've seen a few commits to CPU scheduler. And some seems to touch this area but I don't really know if they can be the fault of this behavior.
Could any recent changes auto adjust the ranges allowed?

For reference.
https://forums.oneplus.com/threads/8-pro-beta-3-cpu-frequency-scaling.1340830/

swejuggalo · 2020-11-30T11:11:53Z

b619cde

097243c

And perhaps most relevant
7816143
f177186
6b3fd22

commit e3336ca upstream. We've met softlockup with "CONFIG_PREEMPT_NONE=y", when the target memcg doesn't have any reclaimable memory. It can be easily reproduced as below: watchdog: BUG: soft lockup - CPU#0 stuck for 111s![memcg_test:2204] CPU: 0 PID: 2204 Comm: memcg_test Not tainted 5.9.0-rc2+ OnePlusOSS#12 Call Trace: shrink_lruvec+0x49f/0x640 shrink_node+0x2a6/0x6f0 do_try_to_free_pages+0xe9/0x3e0 try_to_free_mem_cgroup_pages+0xef/0x1f0 try_charge+0x2c1/0x750 mem_cgroup_charge+0xd7/0x240 __add_to_page_cache_locked+0x2fd/0x370 add_to_page_cache_lru+0x4a/0xc0 pagecache_get_page+0x10b/0x2f0 filemap_fault+0x661/0xad0 ext4_filemap_fault+0x2c/0x40 __do_fault+0x4d/0xf9 handle_mm_fault+0x1080/0x1790 It only happens on our 1-vcpu instances, because there's no chance for oom reaper to run to reclaim the to-be-killed process. Add a cond_resched() at the upper shrink_node_memcgs() to solve this issue, this will mean that we will get a scheduling point for each memcg in the reclaimed hierarchy without any dependency on the reclaimable memory in that memcg thus making it more predictable. Suggested-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Chris Down <chris@chrisdown.name> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: http://lkml.kernel.org/r/1598495549-67324-1-git-send-email-xlpang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Julius Hemanth Pitti <jpitti@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

@2

WQ_MEM_RECLAIM cause this warning [20210226_12:29:50.153395]@2 workqueue: WQ_MEM_RECLAIM mhi_w:mhi_pm_st_worker is flushing !WQ_MEM_RECLAIM events_highpri:flush_backlog WARNING: CPU: 2 PID: 14326 at kernel/workqueue.c:2545 check_flush_dependency+0x118/0x120 [20210226_12:29:50.154078]@2 Modules linked in: [20210226_12:29:50.154261]@2 CPU: 2 PID: 14326 Comm: kworker/u17:0 Tainted: GFS O 4.19.113-perf+ OnePlusOSS#12 [20210226_12:29:50.154591]@2 Hardware name: Qualcomm Technologies, Inc. kona MTP dvt/mp 19811 14 15 52 53 (DT) [20210226_12:29:50.154928]@2 Workqueue: mhi_w mhi_pm_st_worker [20210226_12:29:50.155109]@2 pstate: 60c00085 (nZCv daIf +PAN +UAO) [20210226_12:29:50.155437]@2 pc : check_flush_dependency+0x118/0x120 [20210226_12:29:50.155765]@2 lr : check_flush_dependency+0x118/0x120 [20210226_12:29:50.155939]@2 sp : ffffff803019b870 [20210226_12:29:50.156115]@2 x29: ffffff803019b870 x28: 0000000000000003 [20210226_12:29:50.156444]@2 x27: ffffff801d79f0d8 x26: ffffff801d79f098 [20210226_12:29:50.156771]@2 x25: ffffff9b26937620 x24: ffffff9b26937620 [20210226_12:29:50.156948]@2 x23: ffffffeb3fb23f00 x22: ffffffe957278000 [20210226_12:29:50.157272]@2 x21: ffffffeaece2db00 x20: ffffffe9bfc24a00 [20210226_12:29:50.157448]@2 x19: ffffff9b25646e94 x18: 0000000000000001 [20210226_12:29:50.157775]@2 x17: 0000000000000031 x16: ffffff9b258dde18 [20210226_12:29:50.157953]@2 x15: ffffff9b25fee7f7 x14: 775f74735f6d705f [20210226_12:29:50.158131]@2 x13: 69686d3a775f6968 x12: 0000000000000000 [20210226_12:29:50.158461]@2 x11: 0000000000000000 x10: ffffffffffffffff [20210226_12:29:50.158640]@2 x9 : 5950ec8a8f378000 x8 : 5950ec8a8f378000 [20210226_12:29:50.158969]@2 x7 : 0000000000000086 x6 : ffffff9b26d70136 [20210226_12:29:50.159148]@2 x5 : 000000000000001a x4 : 0000000000000008 [20210226_12:29:50.159475]@2 x3 : 000000000000676f x2 : fffffffffffffffe [20210226_12:29:50.159803]@2 x1 : ffffff9b26d6d7fe x0 : 0000000000000086 [20210226_12:29:50.159980]@2 Call trace: [20210226_12:29:50.160156]@2 check_flush_dependency+0x118/0x120 [20210226_12:29:50.160484]@2 __flush_work+0xc0/0x294 [20210226_12:29:50.160660]@2 flush_work+0x10/0x1c [20210226_12:29:50.160986]@2 rollback_registered_many+0x208/0x61c [20210226_12:29:50.161163]@2 unregister_netdevice_queue+0xe0/0x168 [20210226_12:29:50.161491]@2 unregister_netdev+0x20/0x30 [20210226_12:29:50.161670]@2 mhi_netdev_remove+0x1b0/0x20c [20210226_12:29:50.162001]@2 mhi_driver_remove+0x2dc/0x3dc [20210226_12:29:50.162183]@2 device_release_driver_internal+0x16c/0x220 [20210226_12:29:50.162505]@2 device_release_driver+0x14/0x1c [20210226_12:29:50.162678]@2 bus_remove_device+0xd0/0xf4 [20210226_12:29:50.162998]@2 device_del+0x27c/0x4a8 [20210226_12:29:50.163175]@2 mhi_destroy_device+0x74/0xb0 [20210226_12:29:50.163495]@2 device_for_each_child+0x54/0xa4 [20210226_12:29:50.163672]@2 mhi_pm_st_worker+0x6d4/0xdd0 [20210226_12:29:50.164002]@2 process_one_work+0x1ec/0x438 [20210226_12:29:50.164181]@2 worker_thread+0x310/0x4b0 [20210226_12:29:50.164362]@2 kthread+0x114/0x124 [20210226_12:29:50.164695]@2 ret_from_fork+0x10/0x18

commit e3336ca upstream. We've met softlockup with "CONFIG_PREEMPT_NONE=y", when the target memcg doesn't have any reclaimable memory. It can be easily reproduced as below: watchdog: BUG: soft lockup - CPU#0 stuck for 111s![memcg_test:2204] CPU: 0 PID: 2204 Comm: memcg_test Not tainted 5.9.0-rc2+ OnePlusOSS#12 Call Trace: shrink_lruvec+0x49f/0x640 shrink_node+0x2a6/0x6f0 do_try_to_free_pages+0xe9/0x3e0 try_to_free_mem_cgroup_pages+0xef/0x1f0 try_charge+0x2c1/0x750 mem_cgroup_charge+0xd7/0x240 __add_to_page_cache_locked+0x2fd/0x370 add_to_page_cache_lru+0x4a/0xc0 pagecache_get_page+0x10b/0x2f0 filemap_fault+0x661/0xad0 ext4_filemap_fault+0x2c/0x40 __do_fault+0x4d/0xf9 handle_mm_fault+0x1080/0x1790 It only happens on our 1-vcpu instances, because there's no chance for oom reaper to run to reclaim the to-be-killed process. Add a cond_resched() at the upper shrink_node_memcgs() to solve this issue, this will mean that we will get a scheduling point for each memcg in the reclaimed hierarchy without any dependency on the reclaimable memory in that memcg thus making it more predictable. Suggested-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Chris Down <chris@chrisdown.name> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: http://lkml.kernel.org/r/1598495549-67324-1-git-send-email-xlpang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Julius Hemanth Pitti <jpitti@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

@2

WQ_MEM_RECLAIM cause this warning [20210226_12:29:50.153395]@2 workqueue: WQ_MEM_RECLAIM mhi_w:mhi_pm_st_worker is flushing !WQ_MEM_RECLAIM events_highpri:flush_backlog WARNING: CPU: 2 PID: 14326 at kernel/workqueue.c:2545 check_flush_dependency+0x118/0x120 [20210226_12:29:50.154078]@2 Modules linked in: [20210226_12:29:50.154261]@2 CPU: 2 PID: 14326 Comm: kworker/u17:0 Tainted: GFS O 4.19.113-perf+ OnePlusOSS#12 [20210226_12:29:50.154591]@2 Hardware name: Qualcomm Technologies, Inc. kona MTP dvt/mp 19811 14 15 52 53 (DT) [20210226_12:29:50.154928]@2 Workqueue: mhi_w mhi_pm_st_worker [20210226_12:29:50.155109]@2 pstate: 60c00085 (nZCv daIf +PAN +UAO) [20210226_12:29:50.155437]@2 pc : check_flush_dependency+0x118/0x120 [20210226_12:29:50.155765]@2 lr : check_flush_dependency+0x118/0x120 [20210226_12:29:50.155939]@2 sp : ffffff803019b870 [20210226_12:29:50.156115]@2 x29: ffffff803019b870 x28: 0000000000000003 [20210226_12:29:50.156444]@2 x27: ffffff801d79f0d8 x26: ffffff801d79f098 [20210226_12:29:50.156771]@2 x25: ffffff9b26937620 x24: ffffff9b26937620 [20210226_12:29:50.156948]@2 x23: ffffffeb3fb23f00 x22: ffffffe957278000 [20210226_12:29:50.157272]@2 x21: ffffffeaece2db00 x20: ffffffe9bfc24a00 [20210226_12:29:50.157448]@2 x19: ffffff9b25646e94 x18: 0000000000000001 [20210226_12:29:50.157775]@2 x17: 0000000000000031 x16: ffffff9b258dde18 [20210226_12:29:50.157953]@2 x15: ffffff9b25fee7f7 x14: 775f74735f6d705f [20210226_12:29:50.158131]@2 x13: 69686d3a775f6968 x12: 0000000000000000 [20210226_12:29:50.158461]@2 x11: 0000000000000000 x10: ffffffffffffffff [20210226_12:29:50.158640]@2 x9 : 5950ec8a8f378000 x8 : 5950ec8a8f378000 [20210226_12:29:50.158969]@2 x7 : 0000000000000086 x6 : ffffff9b26d70136 [20210226_12:29:50.159148]@2 x5 : 000000000000001a x4 : 0000000000000008 [20210226_12:29:50.159475]@2 x3 : 000000000000676f x2 : fffffffffffffffe [20210226_12:29:50.159803]@2 x1 : ffffff9b26d6d7fe x0 : 0000000000000086 [20210226_12:29:50.159980]@2 Call trace: [20210226_12:29:50.160156]@2 check_flush_dependency+0x118/0x120 [20210226_12:29:50.160484]@2 __flush_work+0xc0/0x294 [20210226_12:29:50.160660]@2 flush_work+0x10/0x1c [20210226_12:29:50.160986]@2 rollback_registered_many+0x208/0x61c [20210226_12:29:50.161163]@2 unregister_netdevice_queue+0xe0/0x168 [20210226_12:29:50.161491]@2 unregister_netdev+0x20/0x30 [20210226_12:29:50.161670]@2 mhi_netdev_remove+0x1b0/0x20c [20210226_12:29:50.162001]@2 mhi_driver_remove+0x2dc/0x3dc [20210226_12:29:50.162183]@2 device_release_driver_internal+0x16c/0x220 [20210226_12:29:50.162505]@2 device_release_driver+0x14/0x1c [20210226_12:29:50.162678]@2 bus_remove_device+0xd0/0xf4 [20210226_12:29:50.162998]@2 device_del+0x27c/0x4a8 [20210226_12:29:50.163175]@2 mhi_destroy_device+0x74/0xb0 [20210226_12:29:50.163495]@2 device_for_each_child+0x54/0xa4 [20210226_12:29:50.163672]@2 mhi_pm_st_worker+0x6d4/0xdd0 [20210226_12:29:50.164002]@2 process_one_work+0x1ec/0x438 [20210226_12:29:50.164181]@2 worker_thread+0x310/0x4b0 [20210226_12:29:50.164362]@2 kthread+0x114/0x124 [20210226_12:29:50.164695]@2 ret_from_fork+0x10/0x18

commit e3336ca upstream. We've met softlockup with "CONFIG_PREEMPT_NONE=y", when the target memcg doesn't have any reclaimable memory. It can be easily reproduced as below: watchdog: BUG: soft lockup - CPU#0 stuck for 111s![memcg_test:2204] CPU: 0 PID: 2204 Comm: memcg_test Not tainted 5.9.0-rc2+ OnePlusOSS#12 Call Trace: shrink_lruvec+0x49f/0x640 shrink_node+0x2a6/0x6f0 do_try_to_free_pages+0xe9/0x3e0 try_to_free_mem_cgroup_pages+0xef/0x1f0 try_charge+0x2c1/0x750 mem_cgroup_charge+0xd7/0x240 __add_to_page_cache_locked+0x2fd/0x370 add_to_page_cache_lru+0x4a/0xc0 pagecache_get_page+0x10b/0x2f0 filemap_fault+0x661/0xad0 ext4_filemap_fault+0x2c/0x40 __do_fault+0x4d/0xf9 handle_mm_fault+0x1080/0x1790 It only happens on our 1-vcpu instances, because there's no chance for oom reaper to run to reclaim the to-be-killed process. Add a cond_resched() at the upper shrink_node_memcgs() to solve this issue, this will mean that we will get a scheduling point for each memcg in the reclaimed hierarchy without any dependency on the reclaimable memory in that memcg thus making it more predictable. Suggested-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Chris Down <chris@chrisdown.name> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: http://lkml.kernel.org/r/1598495549-67324-1-git-send-email-xlpang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Julius Hemanth Pitti <jpitti@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

@2

WQ_MEM_RECLAIM cause this warning [20210226_12:29:50.153395]@2 workqueue: WQ_MEM_RECLAIM mhi_w:mhi_pm_st_worker is flushing !WQ_MEM_RECLAIM events_highpri:flush_backlog WARNING: CPU: 2 PID: 14326 at kernel/workqueue.c:2545 check_flush_dependency+0x118/0x120 [20210226_12:29:50.154078]@2 Modules linked in: [20210226_12:29:50.154261]@2 CPU: 2 PID: 14326 Comm: kworker/u17:0 Tainted: GFS O 4.19.113-perf+ OnePlusOSS#12 [20210226_12:29:50.154591]@2 Hardware name: Qualcomm Technologies, Inc. kona MTP dvt/mp 19811 14 15 52 53 (DT) [20210226_12:29:50.154928]@2 Workqueue: mhi_w mhi_pm_st_worker [20210226_12:29:50.155109]@2 pstate: 60c00085 (nZCv daIf +PAN +UAO) [20210226_12:29:50.155437]@2 pc : check_flush_dependency+0x118/0x120 [20210226_12:29:50.155765]@2 lr : check_flush_dependency+0x118/0x120 [20210226_12:29:50.155939]@2 sp : ffffff803019b870 [20210226_12:29:50.156115]@2 x29: ffffff803019b870 x28: 0000000000000003 [20210226_12:29:50.156444]@2 x27: ffffff801d79f0d8 x26: ffffff801d79f098 [20210226_12:29:50.156771]@2 x25: ffffff9b26937620 x24: ffffff9b26937620 [20210226_12:29:50.156948]@2 x23: ffffffeb3fb23f00 x22: ffffffe957278000 [20210226_12:29:50.157272]@2 x21: ffffffeaece2db00 x20: ffffffe9bfc24a00 [20210226_12:29:50.157448]@2 x19: ffffff9b25646e94 x18: 0000000000000001 [20210226_12:29:50.157775]@2 x17: 0000000000000031 x16: ffffff9b258dde18 [20210226_12:29:50.157953]@2 x15: ffffff9b25fee7f7 x14: 775f74735f6d705f [20210226_12:29:50.158131]@2 x13: 69686d3a775f6968 x12: 0000000000000000 [20210226_12:29:50.158461]@2 x11: 0000000000000000 x10: ffffffffffffffff [20210226_12:29:50.158640]@2 x9 : 5950ec8a8f378000 x8 : 5950ec8a8f378000 [20210226_12:29:50.158969]@2 x7 : 0000000000000086 x6 : ffffff9b26d70136 [20210226_12:29:50.159148]@2 x5 : 000000000000001a x4 : 0000000000000008 [20210226_12:29:50.159475]@2 x3 : 000000000000676f x2 : fffffffffffffffe [20210226_12:29:50.159803]@2 x1 : ffffff9b26d6d7fe x0 : 0000000000000086 [20210226_12:29:50.159980]@2 Call trace: [20210226_12:29:50.160156]@2 check_flush_dependency+0x118/0x120 [20210226_12:29:50.160484]@2 __flush_work+0xc0/0x294 [20210226_12:29:50.160660]@2 flush_work+0x10/0x1c [20210226_12:29:50.160986]@2 rollback_registered_many+0x208/0x61c [20210226_12:29:50.161163]@2 unregister_netdevice_queue+0xe0/0x168 [20210226_12:29:50.161491]@2 unregister_netdev+0x20/0x30 [20210226_12:29:50.161670]@2 mhi_netdev_remove+0x1b0/0x20c [20210226_12:29:50.162001]@2 mhi_driver_remove+0x2dc/0x3dc [20210226_12:29:50.162183]@2 device_release_driver_internal+0x16c/0x220 [20210226_12:29:50.162505]@2 device_release_driver+0x14/0x1c [20210226_12:29:50.162678]@2 bus_remove_device+0xd0/0xf4 [20210226_12:29:50.162998]@2 device_del+0x27c/0x4a8 [20210226_12:29:50.163175]@2 mhi_destroy_device+0x74/0xb0 [20210226_12:29:50.163495]@2 device_for_each_child+0x54/0xa4 [20210226_12:29:50.163672]@2 mhi_pm_st_worker+0x6d4/0xdd0 [20210226_12:29:50.164002]@2 process_one_work+0x1ec/0x438 [20210226_12:29:50.164181]@2 worker_thread+0x310/0x4b0 [20210226_12:29:50.164362]@2 kthread+0x114/0x124 [20210226_12:29:50.164695]@2 ret_from_fork+0x10/0x18

swejuggalo · 2021-06-13T10:58:36Z

New discovery. Pretty sure it applies to both 8/9 non Pro and Pro non modified kernels.

Turning OFF Bubbles in system settings removes this issue for me. So bubbles is apparently a trigger of this behaviour.

Problem remains though. Nothing should be able to trigger the CPU to get stuck like this.

[ Upstream commit 4224cfd7fb6523f7a9d1c8bb91bb5df1e38eb624 ] When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... OnePlusOSS#9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 OnePlusOSS#10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] OnePlusOSS#11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] OnePlusOSS#12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] OnePlusOSS#13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] OnePlusOSS#14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] OnePlusOSS#15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] OnePlusOSS#16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] OnePlusOSS#17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 OnePlusOSS#18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 OnePlusOSS#19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 OnePlusOSS#20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf OnePlusOSS#21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 OnePlusOSS#22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 OnePlusOSS#23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 OnePlusOSS#24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff OnePlusOSS#25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f OnePlusOSS#26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 0e435be ] crq->msgs could be NULL if the previous reset did not complete after freeing crq->msgs. Check for NULL before dereferencing them. Snippet of call trace: ... ibmvnic 30000003 env3 (unregistering): Releasing sub-CRQ ibmvnic 30000003 env3 (unregistering): Releasing CRQ BUG: Kernel NULL pointer dereference on read at 0x00000000 Faulting instruction address: 0xc0000000000c1a30 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: ibmvnic(E-) rpadlpar_io rpaphp xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables xsk_diag tcp_diag udp_diag tun raw_diag inet_diag unix_diag bridge af_packet_diag netlink_diag stp llc rfkill sunrpc pseries_rng xts vmx_crypto uio_pdrv_genirq uio binfmt_misc ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ibmvnic] CPU: 20 PID: 8426 Comm: kworker/20:0 Tainted: G E 5.10.0-rc1+ #12 Workqueue: events __ibmvnic_reset [ibmvnic] NIP: c0000000000c1a30 LR: c008000001b00c18 CTR: 0000000000000400 REGS: c00000000d05b7a0 TRAP: 0380 Tainted: G E (5.10.0-rc1+) MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44002480 XER: 20040000 CFAR: c0000000000c19ec IRQMASK: 0 GPR00: 0000000000000400 c00000000d05ba30 c008000001b17c00 0000000000000000 GPR04: 0000000000000000 0000000000000000 0000000000000000 00000000000001e2 GPR08: 000000000001f400 ffffffffffffd950 0000000000000000 c008000001b0b280 GPR12: c0000000000c19c8 c00000001ec72e00 c00000000019a778 c00000002647b440 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000006 0000000000000001 0000000000000003 0000000000000002 GPR24: 0000000000001000 c008000001b0d570 0000000000000005 c00000007ab5d550 GPR28: c00000007ab5c000 c000000032fcf848 c00000007ab5cc00 c000000032fcf800 NIP [c0000000000c1a30] memset+0x68/0x104 LR [c008000001b00c18] ibmvnic_reset_crq+0x70/0x110 [ibmvnic] Call Trace: [c00000000d05ba30] [0000000000000800] 0x800 (unreliable) [c00000000d05bab0] [c008000001b0a930] do_reset.isra.40+0x224/0x634 [ibmvnic] [c00000000d05bb80] [c008000001b08574] __ibmvnic_reset+0x17c/0x3c0 [ibmvnic] [c00000000d05bc50] [c00000000018d9ac] process_one_work+0x2cc/0x800 [c00000000d05bd20] [c00000000018df58] worker_thread+0x78/0x520 [c00000000d05bdb0] [c00000000019a934] kthread+0x1c4/0x1d0 [c00000000d05be20] [c00000000000d5d0] ret_from_kernel_thread+0x5c/0x6c Fixes: 032c5e8 ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit c5c97cadd7ed13381cb6b4bef5c841a66938d350 ] The ubsan reported the following error. It was because sample's raw data missed u32 padding at the end. So it broke the alignment of the array after it. The raw data contains an u32 size prefix so the data size should have an u32 padding after 8-byte aligned data. 27: Sample parsing :util/synthetic-events.c:1539:4: runtime error: store to misaligned address 0x62100006b9bc for type '__u64' (aka 'unsigned long long'), which requires 8 byte alignment 0x62100006b9bc: note: pointer points here 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ #0 0x561532a9fc96 in perf_event__synthesize_sample util/synthetic-events.c:1539:13 #1 0x5615327f4a4f in do_test tests/sample-parsing.c:284:8 #2 0x5615327f3f50 in test__sample_parsing tests/sample-parsing.c:381:9 #3 0x56153279d3a1 in run_test tests/builtin-test.c:424:9 #4 0x56153279c836 in test_and_print tests/builtin-test.c:454:9 #5 0x56153279b7eb in __cmd_test tests/builtin-test.c:675:4 #6 0x56153279abf0 in cmd_test tests/builtin-test.c:821:9 #7 0x56153264e796 in run_builtin perf.c:312:11 #8 0x56153264cf03 in handle_internal_command perf.c:364:8 #9 0x56153264e47d in run_argv perf.c:408:2 #10 0x56153264c9a9 in main perf.c:538:3 #11 0x7f137ab6fbbc in __libc_start_main (/lib64/libc.so.6+0x38bbc) #12 0x561532596828 in _start ... SUMMARY: UndefinedBehaviorSanitizer: misaligned-pointer-use util/synthetic-events.c:1539:4 in Fixes: 045f8cd ("perf tests: Add a sample parsing test") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20210214091638.519643-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

This patch is to fix a crash: #3 [ffffb6580689f898] oops_end at ffffffffa2835bc2 #4 [ffffb6580689f8b8] no_context at ffffffffa28766e7 #5 [ffffb6580689f920] async_page_fault at ffffffffa320135e [exception RIP: f2fs_is_compressed_page+34] RIP: ffffffffa2ba83a2 RSP: ffffb6580689f9d8 RFLAGS: 00010213 RAX: 0000000000000001 RBX: fffffc0f50b34bc0 RCX: 0000000000002122 RDX: 0000000000002123 RSI: 0000000000000c00 RDI: fffffc0f50b34bc0 RBP: ffff97e815a40178 R8: 0000000000000000 R9: ffff97e83ffc9000 R10: 0000000000032300 R11: 0000000000032380 R12: ffffb6580689fa38 R13: fffffc0f50b34bc0 R14: ffff97e825cbd000 R15: 0000000000000c00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #6 [ffffb6580689f9d8] __is_cp_guaranteed at ffffffffa2b7ea98 #7 [ffffb6580689f9f0] f2fs_submit_page_write at ffffffffa2b81a69 #8 [ffffb6580689fa30] f2fs_do_write_meta_page at ffffffffa2b99777 #9 [ffffb6580689fae0] __f2fs_write_meta_page at ffffffffa2b75f1a #10 [ffffb6580689fb18] f2fs_sync_meta_pages at ffffffffa2b77466 #11 [ffffb6580689fc98] do_checkpoint at ffffffffa2b78e46 #12 [ffffb6580689fd88] f2fs_write_checkpoint at ffffffffa2b79c29 #13 [ffffb6580689fdd0] f2fs_sync_fs at ffffffffa2b69d95 #14 [ffffb6580689fe20] sync_filesystem at ffffffffa2ad2574 #15 [ffffb6580689fe30] generic_shutdown_super at ffffffffa2a9b582 #16 [ffffb6580689fe48] kill_block_super at ffffffffa2a9b6d1 OnePlusOSS#17 [ffffb6580689fe60] kill_f2fs_super at ffffffffa2b6abe1 OnePlusOSS#18 [ffffb6580689fea0] deactivate_locked_super at ffffffffa2a9afb6 OnePlusOSS#19 [ffffb6580689feb8] cleanup_mnt at ffffffffa2abcad4 OnePlusOSS#20 [ffffb6580689fee0] task_work_run at ffffffffa28bca28 OnePlusOSS#21 [ffffb6580689ff00] exit_to_usermode_loop at ffffffffa28050b7 OnePlusOSS#22 [ffffb6580689ff38] do_syscall_64 at ffffffffa280560e OnePlusOSS#23 [ffffb6580689ff50] entry_SYSCALL_64_after_hwframe at ffffffffa320008c This occurred when umount f2fs if enable F2FS_FS_COMPRESSION with F2FS_IO_TRACE. Fixes it by adding IS_IO_TRACED_PAGE to check validity of pid for page_private. Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

https://bugzilla.kernel.org/show_bug.cgi?id=208565 PID: 257 TASK: ecdd0000 CPU: 0 COMMAND: "init" #0 [<c0b420ec>] (__schedule) from [<c0b423c8>] #1 [<c0b423c8>] (schedule) from [<c0b459d4>] #2 [<c0b459d4>] (rwsem_down_read_failed) from [<c0b44fa0>] #3 [<c0b44fa0>] (down_read) from [<c044233c>] #4 [<c044233c>] (f2fs_truncate_blocks) from [<c0442890>] #5 [<c0442890>] (f2fs_truncate) from [<c044d408>] #6 [<c044d408>] (f2fs_evict_inode) from [<c030be18>] #7 [<c030be18>] (evict) from [<c030a558>] #8 [<c030a558>] (iput) from [<c047c600>] #9 [<c047c600>] (f2fs_sync_node_pages) from [<c0465414>] #10 [<c0465414>] (f2fs_write_checkpoint) from [<c04575f4>] #11 [<c04575f4>] (f2fs_sync_fs) from [<c0441918>] #12 [<c0441918>] (f2fs_do_sync_file) from [<c0441098>] #13 [<c0441098>] (f2fs_sync_file) from [<c0323fa0>] #14 [<c0323fa0>] (vfs_fsync_range) from [<c0324294>] #15 [<c0324294>] (do_fsync) from [<c0324014>] #16 [<c0324014>] (sys_fsync) from [<c0108bc0>] This can be caused by flush_dirty_inode() in f2fs_sync_node_pages() where iput() requires f2fs_lock_op() again resulting in livelock. Reported-by: Zhiguo Niu <Zhiguo.Niu@unisoc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

…g the sock [ Upstream commit 3cf7203ca620682165706f70a1b12b5194607dce ] There is a race condition in vxlan that when deleting a vxlan device during receiving packets, there is a possibility that the sock is released after getting vxlan_sock vs from sk_user_data. Then in later vxlan_ecn_decapsulate(), vxlan_get_sk_family() we will got NULL pointer dereference. e.g. #0 [ffffa25ec6978a38] machine_kexec at ffffffff8c669757 OnePlusOSS#1 [ffffa25ec6978a90] __crash_kexec at ffffffff8c7c0a4d OnePlusOSS#2 [ffffa25ec6978b58] crash_kexec at ffffffff8c7c1c48 OnePlusOSS#3 [ffffa25ec6978b60] oops_end at ffffffff8c627f2b OnePlusOSS#4 [ffffa25ec6978b80] page_fault_oops at ffffffff8c678fcb OnePlusOSS#5 [ffffa25ec6978bd8] exc_page_fault at ffffffff8d109542 OnePlusOSS#6 [ffffa25ec6978c00] asm_exc_page_fault at ffffffff8d200b62 [exception RIP: vxlan_ecn_decapsulate+0x3b] RIP: ffffffffc1014e7b RSP: ffffa25ec6978cb0 RFLAGS: 00010246 RAX: 0000000000000008 RBX: ffff8aa000888000 RCX: 0000000000000000 RDX: 000000000000000e RSI: ffff8a9fc7ab803e RDI: ffff8a9fd1168700 RBP: ffff8a9fc7ab803e R8: 0000000000700000 R9: 00000000000010ae R10: ffff8a9fcb748980 R11: 0000000000000000 R12: ffff8a9fd1168700 R13: ffff8aa000888000 R14: 00000000002a0000 R15: 00000000000010ae ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 OnePlusOSS#7 [ffffa25ec6978ce8] vxlan_rcv at ffffffffc10189cd [vxlan] OnePlusOSS#8 [ffffa25ec6978d90] udp_queue_rcv_one_skb at ffffffff8cfb6507 OnePlusOSS#9 [ffffa25ec6978dc0] udp_unicast_rcv_skb at ffffffff8cfb6e45 OnePlusOSS#10 [ffffa25ec6978dc8] __udp4_lib_rcv at ffffffff8cfb8807 OnePlusOSS#11 [ffffa25ec6978e20] ip_protocol_deliver_rcu at ffffffff8cf76951 OnePlusOSS#12 [ffffa25ec6978e48] ip_local_deliver at ffffffff8cf76bde OnePlusOSS#13 [ffffa25ec6978ea0] __netif_receive_skb_one_core at ffffffff8cecde9b OnePlusOSS#14 [ffffa25ec6978ec8] process_backlog at ffffffff8cece139 OnePlusOSS#15 [ffffa25ec6978f00] __napi_poll at ffffffff8ceced1a OnePlusOSS#16 [ffffa25ec6978f28] net_rx_action at ffffffff8cecf1f3 OnePlusOSS#17 [ffffa25ec6978fa0] __softirqentry_text_start at ffffffff8d4000ca OnePlusOSS#18 [ffffa25ec6978ff0] do_softirq at ffffffff8c6fbdc3 Reproducer: https://github.com/Mellanox/ovs-tests/blob/master/test-ovs-vxlan-remove-tunnel-during-traffic.sh Fix this by waiting for all sk_user_data reader to finish before releasing the sock. Reported-by: Jianlin Shi <jishi@redhat.com> Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Fixes: 6a93cc9 ("udp-tunnel: Add a few more UDP tunnel APIs") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 05bb0167c80b8f93c6a4e0451b7da9b96db990c2 ] ACPICA commit 770653e3ba67c30a629ca7d12e352d83c2541b1e Before this change we see the following UBSAN stack trace in Fuchsia: #0 0x000021e4213b3302 in acpi_ds_init_aml_walk(struct acpi_walk_state*, union acpi_parse_object*, struct acpi_namespace_node*, u8*, u32, struct acpi_evaluate_info*, u8) ../../third_party/acpica/source/components/dispatcher/dswstate.c:682 <platform-bus-x86.so>+0x233302 realme-sm8250-devs#1.2 0x000020d0f660777f in ubsan_get_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:41 <libclang_rt.asan.so>+0x3d77f realme-sm8250-devs#1.1 0x000020d0f660777f in maybe_print_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:51 <libclang_rt.asan.so>+0x3d77f realme-sm8250-devs#1 0x000020d0f660777f in ~scoped_report() compiler-rt/lib/ubsan/ubsan_diag.cpp:387 <libclang_rt.asan.so>+0x3d77f realme-sm8250-devs#2 0x000020d0f660b96d in handlepointer_overflow_impl() compiler-rt/lib/ubsan/ubsan_handlers.cpp:809 <libclang_rt.asan.so>+0x4196d realme-sm8250-devs#3 0x000020d0f660b50d in compiler-rt/lib/ubsan/ubsan_handlers.cpp:815 <libclang_rt.asan.so>+0x4150d OnePlusOSS#4 0x000021e4213b3302 in acpi_ds_init_aml_walk(struct acpi_walk_state*, union acpi_parse_object*, struct acpi_namespace_node*, u8*, u32, struct acpi_evaluate_info*, u8) ../../third_party/acpica/source/components/dispatcher/dswstate.c:682 <platform-bus-x86.so>+0x233302 OnePlusOSS#5 0x000021e4213e2369 in acpi_ds_call_control_method(struct acpi_thread_state*, struct acpi_walk_state*, union acpi_parse_object*) ../../third_party/acpica/source/components/dispatcher/dsmethod.c:605 <platform-bus-x86.so>+0x262369 OnePlusOSS#6 0x000021e421437fac in acpi_ps_parse_aml(struct acpi_walk_state*) ../../third_party/acpica/source/components/parser/psparse.c:550 <platform-bus-x86.so>+0x2b7fac OnePlusOSS#7 0x000021e4214464d2 in acpi_ps_execute_method(struct acpi_evaluate_info*) ../../third_party/acpica/source/components/parser/psxface.c:244 <platform-bus-x86.so>+0x2c64d2 OnePlusOSS#8 0x000021e4213aa052 in acpi_ns_evaluate(struct acpi_evaluate_info*) ../../third_party/acpica/source/components/namespace/nseval.c:250 <platform-bus-x86.so>+0x22a052 OnePlusOSS#9 0x000021e421413dd8 in acpi_ns_init_one_device(acpi_handle, u32, void*, void**) ../../third_party/acpica/source/components/namespace/nsinit.c:735 <platform-bus-x86.so>+0x293dd8 OnePlusOSS#10 0x000021e421429e98 in acpi_ns_walk_namespace(acpi_object_type, acpi_handle, u32, u32, acpi_walk_callback, acpi_walk_callback, void*, void**) ../../third_party/acpica/source/components/namespace/nswalk.c:298 <platform-bus-x86.so>+0x2a9e98 OnePlusOSS#11 0x000021e4214131ac in acpi_ns_initialize_devices(u32) ../../third_party/acpica/source/components/namespace/nsinit.c:268 <platform-bus-x86.so>+0x2931ac OnePlusOSS#12 0x000021e42147c40d in acpi_initialize_objects(u32) ../../third_party/acpica/source/components/utilities/utxfinit.c:304 <platform-bus-x86.so>+0x2fc40d OnePlusOSS#13 0x000021e42126d603 in acpi::acpi_impl::initialize_acpi(acpi::acpi_impl*) ../../src/devices/board/lib/acpi/acpi-impl.cc:224 <platform-bus-x86.so>+0xed603 Add a simple check that avoids incrementing a pointer by zero, but otherwise behaves as before. Note that our findings are against ACPICA 20221020, but the same code exists on master. Link: acpica/acpica@770653e3 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 0b0747d507bffb827e40fc0f9fb5883fffc23477 ] The following processes run into a deadlock. CPU 41 was waiting for CPU 29 to handle a CSD request while holding spinlock "crashdump_lock", but CPU 29 was hung by that spinlock with IRQs disabled. PID: 17360 TASK: ffff95c1090c5c40 CPU: 41 COMMAND: "mrdiagd" !# 0 [ffffb80edbf37b58] __read_once_size at ffffffff9b871a40 include/linux/compiler.h:185:0 !# 1 [ffffb80edbf37b58] atomic_read at ffffffff9b871a40 arch/x86/include/asm/atomic.h:27:0 !# 2 [ffffb80edbf37b58] dump_stack at ffffffff9b871a40 lib/dump_stack.c:54:0 # 3 [ffffb80edbf37b78] csd_lock_wait_toolong at ffffffff9b131ad5 kernel/smp.c:364:0 # 4 [ffffb80edbf37b78] __csd_lock_wait at ffffffff9b131ad5 kernel/smp.c:384:0 # 5 [ffffb80edbf37bf8] csd_lock_wait at ffffffff9b13267a kernel/smp.c:394:0 # 6 [ffffb80edbf37bf8] smp_call_function_many at ffffffff9b13267a kernel/smp.c:843:0 # 7 [ffffb80edbf37c50] smp_call_function at ffffffff9b13279d kernel/smp.c:867:0 # 8 [ffffb80edbf37c50] on_each_cpu at ffffffff9b13279d kernel/smp.c:976:0 # 9 [ffffb80edbf37c78] flush_tlb_kernel_range at ffffffff9b085c4b arch/x86/mm/tlb.c:742:0 OnePlusOSS#10 [ffffb80edbf37cb8] __purge_vmap_area_lazy at ffffffff9b23a1e0 mm/vmalloc.c:701:0 OnePlusOSS#11 [ffffb80edbf37ce0] try_purge_vmap_area_lazy at ffffffff9b23a2cc mm/vmalloc.c:722:0 OnePlusOSS#12 [ffffb80edbf37ce0] free_vmap_area_noflush at ffffffff9b23a2cc mm/vmalloc.c:754:0 OnePlusOSS#13 [ffffb80edbf37cf8] free_unmap_vmap_area at ffffffff9b23bb3b mm/vmalloc.c:764:0 OnePlusOSS#14 [ffffb80edbf37cf8] remove_vm_area at ffffffff9b23bb3b mm/vmalloc.c:1509:0 OnePlusOSS#15 [ffffb80edbf37d18] __vunmap at ffffffff9b23bb8a mm/vmalloc.c:1537:0 OnePlusOSS#16 [ffffb80edbf37d40] vfree at ffffffff9b23bc85 mm/vmalloc.c:1612:0 OnePlusOSS#17 [ffffb80edbf37d58] megasas_free_host_crash_buffer [megaraid_sas] at ffffffffc020b7f2 drivers/scsi/megaraid/megaraid_sas_fusion.c:3932:0 OnePlusOSS#18 [ffffb80edbf37d80] fw_crash_state_store [megaraid_sas] at ffffffffc01f804d drivers/scsi/megaraid/megaraid_sas_base.c:3291:0 OnePlusOSS#19 [ffffb80edbf37dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 OnePlusOSS#20 [ffffb80edbf37dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 OnePlusOSS#21 [ffffb80edbf37de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 OnePlusOSS#22 [ffffb80edbf37e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 OnePlusOSS#23 [ffffb80edbf37ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 OnePlusOSS#24 [ffffb80edbf37ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 OnePlusOSS#25 [ffffb80edbf37ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 OnePlusOSS#26 [ffffb80edbf37f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 OnePlusOSS#27 [ffffb80edbf37f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 PID: 17355 TASK: ffff95c1090c3d80 CPU: 29 COMMAND: "mrdiagd" !# 0 [ffffb80f2d3c7d30] __read_once_size at ffffffff9b0f2ab0 include/linux/compiler.h:185:0 !# 1 [ffffb80f2d3c7d30] native_queued_spin_lock_slowpath at ffffffff9b0f2ab0 kernel/locking/qspinlock.c:368:0 # 2 [ffffb80f2d3c7d58] pv_queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/paravirt.h:674:0 # 3 [ffffb80f2d3c7d58] queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/qspinlock.h:53:0 # 4 [ffffb80f2d3c7d68] queued_spin_lock at ffffffff9b8961a6 include/asm-generic/qspinlock.h:90:0 # 5 [ffffb80f2d3c7d68] do_raw_spin_lock_flags at ffffffff9b8961a6 include/linux/spinlock.h:173:0 # 6 [ffffb80f2d3c7d68] __raw_spin_lock_irqsave at ffffffff9b8961a6 include/linux/spinlock_api_smp.h:122:0 # 7 [ffffb80f2d3c7d68] _raw_spin_lock_irqsave at ffffffff9b8961a6 kernel/locking/spinlock.c:160:0 # 8 [ffffb80f2d3c7d88] fw_crash_buffer_store [megaraid_sas] at ffffffffc01f8129 drivers/scsi/megaraid/megaraid_sas_base.c:3205:0 # 9 [ffffb80f2d3c7dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 OnePlusOSS#10 [ffffb80f2d3c7dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 OnePlusOSS#11 [ffffb80f2d3c7de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 OnePlusOSS#12 [ffffb80f2d3c7e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 OnePlusOSS#13 [ffffb80f2d3c7ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 OnePlusOSS#14 [ffffb80f2d3c7ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 OnePlusOSS#15 [ffffb80f2d3c7ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 OnePlusOSS#16 [ffffb80f2d3c7f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 OnePlusOSS#17 [ffffb80f2d3c7f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 The lock is used to synchronize different sysfs operations, it doesn't protect any resource that will be touched by an interrupt. Consequently it's not required to disable IRQs. Replace the spinlock with a mutex to fix the deadlock. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20230828221018.19471-1-junxiao.bi@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit a154f5f643c6ecddd44847217a7a3845b4350003 ] The following call trace shows a deadlock issue due to recursive locking of mutex "device_mutex". First lock acquire is in target_for_each_device() and second in target_free_device(). PID: 148266 TASK: ffff8be21ffb5d00 CPU: 10 COMMAND: "iscsi_ttx" #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f OnePlusOSS#1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224 OnePlusOSS#2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee OnePlusOSS#3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7 OnePlusOSS#4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3 OnePlusOSS#5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c OnePlusOSS#6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod] OnePlusOSS#7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod] OnePlusOSS#8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f OnePlusOSS#9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583 OnePlusOSS#10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod] OnePlusOSS#11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc OnePlusOSS#12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod] OnePlusOSS#13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod] OnePlusOSS#14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod] OnePlusOSS#15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod] OnePlusOSS#16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07 OnePlusOSS#17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod] OnePlusOSS#18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod] OnePlusOSS#19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080 OnePlusOSS#20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364 Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20230918225848.66463-1-junxiao.bi@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

swejuggalo mentioned this issue May 9, 2021

CPU frequency gets stuck OnePlusOSS/android_kernel_oneplus_sm8350#3

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CPU freqency scaling issue. #12

CPU freqency scaling issue. #12

swejuggalo commented Nov 30, 2020

swejuggalo commented Nov 30, 2020

swejuggalo commented Jun 13, 2021

CPU freqency scaling issue. #12

CPU freqency scaling issue. #12

Comments

swejuggalo commented Nov 30, 2020

swejuggalo commented Nov 30, 2020

swejuggalo commented Jun 13, 2021