RK3328 : 4K doesn't get any modes over 30 fps. #15

Closed
LongChair opened this issue Jun 17, 2017 · 5 comments

Comments

@LongChair
Collaborator

When I connect my RK3328 box to my 4K display, which supports 4K@60 modes, I can't get that mode.

The following modes are listed:

cat /sys/class/drm/card0-HDMI-A-1/modes
4096x2160p30
4096x2160p30
4096x2160p25
4096x2160p24
4096x2160p24
3840x2160p30
3840x2160p30
3840x2160p25
3840x2160p24
3840x2160p24
1920x1080p60
1920x1080p60
1920x1080i60
1920x1080i60
1920x1080p50
1920x1080i50
1920x1080i48
1920x1080p30
1920x1080p30
1920x1080p25
1920x1080p24
1920x1080p24
1280x720p60
1280x720p60
1280x720p50
800x600p60
720x576p50
720x576i50
720x480p60
720x480i60
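
To check a listing like the one above without reading it by eye, the mode names can be parsed and reduced to the highest refresh rate offered at 4K. This is a minimal sketch in Python; `Mode`, `parse_modes` and `max_refresh` are hypothetical helper names, and it assumes the `WIDTHxHEIGHT{p|i}RATE` naming shown in this listing.

```python
import re
from typing import List, NamedTuple, Optional


class Mode(NamedTuple):
    width: int
    height: int
    interlaced: bool
    refresh: int


# Assumed name format, taken from the listing: "3840x2160p30", "1920x1080i60".
_MODE_RE = re.compile(r"^(\d+)x(\d+)([pi])(\d+)$")


def parse_modes(text: str) -> List[Mode]:
    """Parse the contents of a connector's 'modes' file, dropping repeated names."""
    modes: List[Mode] = []
    for line in text.splitlines():
        match = _MODE_RE.match(line.strip())
        if not match:
            continue  # skip blank lines or names in another format
        mode = Mode(int(match[1]), int(match[2]), match[3] == "i", int(match[4]))
        if mode not in modes:
            modes.append(mode)
    return modes


def max_refresh(modes: List[Mode], min_width: int = 3840) -> Optional[int]:
    """Highest refresh rate among modes at least min_width wide, or None."""
    rates = [m.refresh for m in modes if m.width >= min_width]
    return max(rates) if rates else None


if __name__ == "__main__":
    # Path as used in this issue; the connector name may differ on other boards.
    with open("/sys/class/drm/card0-HDMI-A-1/modes") as f:
        print(max_refresh(parse_modes(f.read())))
```

For the listing above, the highest 4K entry is a p30 mode, which is the symptom this issue describes.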

I grabbed the EDID from:

hexdump /sys/class/drm/card0-HDMI-A-1/edid
0000000 ff00 ffff ffff 00ff b358 3700 0000 0000
0000010 1901 0301 5f80 7836 cf0a a374 4c57 23b0
0000020 4809 af4c 80ef 00b3 0095 40a9 4090 0081
0000030 8081 4081 0101 e808 3000 70f2 805a 58b0
0000040 008a 1d50 0074 1e00 3a02 1880 3871 402d
0000050 2c58 0045 1d50 0074 1e00 0000 fc00 3400
0000060 5533 4448 4c5f 4443 545f 0a56 0000 fd00
0000070 3000 0f3e 3c46 0a00 2020 2020 2020 c301
0000080 0302 f147 015c 0706 0302 1615 1211 0413
0000090 0514 901f 2120 5d22 5f5e 6160 6362 6564
00000a0 2c66 0709 1507 5007 063f 57c0 0006 0183
00000b0 0000 036e 000c 0020 3cf8 0020 0480 0203
00000c0 e501 000f 6000 010c 801d 733e 2d38 7e40
00000d0 452c 0080 52d0 0000 011e 801d 72d0 2d1c
00000e0 1020 252c 0080 52d0 0000 009e 0000 0000
00000f0 0000 0000 0000 0000 0000 0000 0000 f800
0000100

@Kwiboo showed me a site that can interpret this, http://www.edidreader.com/, and the EDID seems to report the correct modes there, but they are not usable on the RK3328.
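
Note that the dump above comes from plain `hexdump`, which prints 16-bit words, so on a little-endian machine each byte pair appears swapped (the EDID header bytes `00 ff` show up as `ff00`). Before feeding it to a decoder it helps to turn it back into raw bytes. A minimal sketch in Python, assuming a little-endian host and a dump with no `*` repeat lines; the function names are hypothetical:

```python
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])


def hexdump_words_to_bytes(text: str) -> bytes:
    """Convert default `hexdump` output (offset column + 16-bit words) to raw bytes.

    Assumes the words were printed on a little-endian host, so the low byte
    of each word is the byte that comes first in the file.
    """
    out = bytearray()
    for line in text.splitlines():
        fields = line.split()
        for word in fields[1:]:  # fields[0] is the offset column
            value = int(word, 16)
            out.append(value & 0xFF)         # first byte in the file
            out.append((value >> 8) & 0xFF)  # second byte in the file
    return bytes(out)


def edid_block_ok(block: bytes) -> bool:
    """An EDID block is 128 bytes whose byte sum is 0 modulo 256."""
    return len(block) == 128 and sum(block) % 256 == 0


if __name__ == "__main__":
    first_line = "0000000 ff00 ffff ffff 00ff b358 3700 0000 0000"
    raw = hexdump_words_to_bytes(first_line)
    print(raw[:8] == EDID_HEADER)  # header check on the first dump line
```

Using `hexdump -C` instead, as in the later dump in this thread, avoids the byte swapping altogether.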

@yanghanxing: any thoughts?

This was referenced Jun 19, 2017
@LongChair
Collaborator Author

@zheng2012: let me know if there is anything I can test to fix this :)

@Kwiboo
Owner

Kwiboo commented Jun 25, 2017

At https://github.com/rockchip-linux/kernel/blob/release-4.4/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c#L459-L460 there is a check for 300 MHz that limits the use of 4K 50/60 Hz modes.
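
To see why a 300 MHz limit would drop exactly the 50/60 Hz modes, it helps to work out the pixel clocks. The sketch below is not the driver's code; it only does the arithmetic. The total timings (active plus blanking) are assumed from the standard CTA-861 4K formats, and the function names are hypothetical.

```python
LIMIT_KHZ = 300_000  # the 300 MHz figure mentioned in the comment above

# (htotal, vtotal, refresh in Hz): totals including blanking,
# assumed from the standard CTA-861 timings for these formats.
TIMINGS = {
    "3840x2160p30": (4400, 2250, 30),
    "3840x2160p50": (5280, 2250, 50),
    "3840x2160p60": (4400, 2250, 60),
}


def pixel_clock_khz(htotal: int, vtotal: int, refresh_hz: int) -> int:
    """Pixel clock = total pixels per frame times frames per second."""
    return htotal * vtotal * refresh_hz // 1000


def exceeds_limit(name: str, limit_khz: int = LIMIT_KHZ) -> bool:
    """True if the named mode needs a pixel clock above the limit."""
    return pixel_clock_khz(*TIMINGS[name]) > limit_khz


if __name__ == "__main__":
    for name, timing in TIMINGS.items():
        print(name, pixel_clock_khz(*timing), "kHz", exceeds_limit(name))
```

Under these assumptions the 30 Hz mode needs 297 MHz and fits under the limit, while the 50 Hz and 60 Hz modes need 594 MHz and do not, which matches the mode list reported at the top of this issue.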

@LongChair
Collaborator Author

@yanghanxing: I tried the patch you provided on ROCK64.

It gives somewhat different results: it does not list the 4K@60Hz modes, yet it reports 4K@60Hz as the current mode.
However, my 4K TV says there is no signal.

I grabbed the kernel log:

kernel.txt

Kwiboo pushed a commit that referenced this issue Jul 28, 2017
commit 2474623 upstream.

When a process runs out of stack the parisc kernel wrongly faults with SIGBUS
instead of the expected SIGSEGV signal.

This example shows how the kernel faults:
do_page_fault() command='a.out' type=15 address=0xfaac2000 in libc-2.24.so[f8308000+16c000]
trap #15: Data TLB miss fault, vm_start = 0xfa2c2000, vm_end = 0xfaac2000

The vma->vm_end value is the first address which does not belong to the vma, so
adjust the check to include vma->vm_end to the range for which to send the
SIGSEGV signal.

This patch unbreaks building the debian libsigsegv package.

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
@Kwiboo
Owner

Kwiboo commented Jul 30, 2017

The recent kernel update seems to detect a lot more modes, but not all of them work: some produce a 'no signal' message on the TV. I will raise a new issue regarding the non-working modes.

EDID

LibreELEC:~ # hexdump -C /sys/class/drm/card0-HDMI-A-1/edid
00000000  00 ff ff ff ff ff ff 00  4d d9 03 c8 01 01 01 01  |........M.......|
00000010  01 19 01 03 80 90 51 78  0a 0d c9 a0 57 47 98 27  |......Qx....WG.'|
00000020  12 48 4c 21 08 00 81 80  a9 c0 71 4f b3 00 01 01  |.HL!......qO....|
00000030  01 01 01 01 01 01 02 3a  80 18 71 38 2d 40 58 2c  |.......:..q8-@X,|
00000040  45 00 9f 29 53 00 00 1e  01 1d 00 72 51 d0 1e 20  |E..)S......rQ.. |
00000050  6e 28 55 00 9f 29 53 00  00 1e 00 00 00 fc 00 53  |n(U..)S........S|
00000060  4f 4e 59 20 54 56 20 20  2a 30 32 0a 00 00 00 fd  |ONY TV  *02.....|
00000070  00 30 3e 0e 46 3c 00 0a  20 20 20 20 20 20 01 df  |.0>.F<..      ..|
00000080  02 03 60 f0 5b 61 60 5d  5e 5f 62 1f 10 14 05 13  |..`.[a`]^_b.....|
00000090  04 20 22 3c 3e 12 16 03  07 11 15 02 06 01 65 66  |. "<>.........ef|
000000a0  29 0d 7f 07 15 07 50 3d  07 bc 83 0f 00 00 78 03  |).....P=......x.|
000000b0  0c 00 10 00 b8 3c 2f d0  8a 01 02 03 04 01 40 1f  |.....</.......@.|
000000c0  c0 80 90 d0 e0 f0 d6 67  d8 5d c4 01 78 80 01 e2  |.......g.]..x...|
000000d0  00 f9 e3 05 ff 01 e5 0f  03 00 00 06 e3 06 0d 01  |................|
000000e0  01 1d 80 18 71 1c 16 20  58 2c 25 00 9f 29 53 00  |....q.. X,%..)S.|
000000f0  00 9e 00 00 00 00 00 00  00 00 00 00 00 00 00 62  |...............b|

Reported modes

LibreELEC:~ # cat /sys/class/drm/card0-HDMI-A-1/modes
1920x1080p60
4096x2160p60
4096x2160p60
4096x2160p60
4096x2160p50
4096x2160p50
4096x2160p24
4096x2160p24
3840x2160p60
3840x2160p60
3840x2160p60
3840x2160p50
3840x2160p50
3840x2160p30
3840x2160p30
3840x2160p25
3840x2160p24
3840x2160p24
1920x1080p60
1920x1080p60
1920x1080p60
1920x1080p60
1920x1080p60
1920x1080i60
1920x1080i60
1920x1080i60
1920x1080i60
1920x1080i60
1920x1080i60
1920x1080i60
1920x1080i60
1920x1080p50
1920x1080p50
1920x1080p50
1920x1080i50
1920x1080i50
1920x1080i50
1920x1080i50
1920x1080p30
1920x1080p30
1920x1080p30
1920x1080p30
1920x1080p30
1920x1080p30
1920x1080p24
1920x1080p24
1920x1080p24
1920x1080p24
1920x1080p24
1920x1080p24
1920x1080p24
1920x1080p24
1280x720p60
1280x720p60
1280x720p60
1280x720p60
1280x720p60
1280x720p60
1280x720p60
1280x720p60
1280x720p50
1280x720p50
1280x720p50
1280x720p50
1280x720p30
1280x720p30
1280x720p30
1280x720p30
1280x720p24
1280x720p24
1280x720p24
1280x720p24
800x600p60
720x576p50
720x576i50
720x480p60
720x480i60

@Kwiboo Kwiboo closed this as completed Jul 30, 2017
Kwiboo pushed a commit that referenced this issue Mar 1, 2018
[ Upstream commit 293d264 ]

drv->cpumask defaults to cpu_possible_mask in __cpuidle_driver_init().
On PowerNV platform cpu_present could be less than cpu_possible in cases
where firmware detects the cpu, but it is not available to the OS.  When
CONFIG_HOTPLUG_CPU=n, such cpus are not hotplugable at runtime and hence
we skip creating cpu_device.

This breaks cpuidle on powernv where register_cpu() is not called for
cpus in cpu_possible_mask that cannot be hot-added at runtime.

Trying cpuidle_register_device() on cpu without cpu_device will cause
crash like this:

cpu 0xf: Vector: 380 (Data SLB Access) at [c000000ff1503490]
    pc: c00000000022c8bc: string+0x34/0x60
    lr: c00000000022ed78: vsnprintf+0x284/0x42c
    sp: c000000ff1503710
   msr: 9000000000009033
   dar: 6000000060000000
  current = 0xc000000ff1480000
  paca    = 0xc00000000fe82d00   softe: 0        irq_happened: 0x01
    pid   = 1, comm = swapper/8
Linux version 4.11.0-rc2 (sv@sagarika) (gcc version 4.9.4
(Buildroot 2017.02-00004-gc28573e) ) #15 SMP Fri Mar 17 19:32:02 IST 2017
enter ? for help
[link register   ] c00000000022ed78 vsnprintf+0x284/0x42c
[c000000ff1503710] c00000000022ebb8 vsnprintf+0xc4/0x42c (unreliable)
[c000000ff1503800] c00000000022ef40 vscnprintf+0x20/0x44
[c000000ff1503830] c0000000000ab61c vprintk_emit+0x94/0x2cc
[c000000ff15038a0] c0000000000acc9c vprintk_func+0x60/0x74
[c000000ff15038c0] c000000000619694 printk+0x38/0x4c
[c000000ff15038e0] c000000000224950 kobject_get+0x40/0x60
[c000000ff1503950] c00000000022507c kobject_add_internal+0x60/0x2c4
[c000000ff15039e0] c000000000225350 kobject_init_and_add+0x70/0x78
[c000000ff1503a60] c00000000053c288 cpuidle_add_sysfs+0x9c/0xe0
[c000000ff1503ae0] c00000000053aeac cpuidle_register_device+0xd4/0x12c
[c000000ff1503b30] c00000000053b108 cpuidle_register+0x98/0xcc
[c000000ff1503bc0] c00000000085eaf0 powernv_processor_idle_init+0x140/0x1e0
[c000000ff1503c60] c00000000000cd60 do_one_initcall+0xc0/0x15c
[c000000ff1503d20] c000000000833e84 kernel_init_freeable+0x1a0/0x25c
[c000000ff1503dc0] c00000000000d478 kernel_init+0x24/0x12c
[c000000ff1503e30] c00000000000b564 ret_from_kernel_thread+0x5c/0x78

This patch fixes the bug by passing correct cpumask from
powernv-cpuidle driver.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
[ rjw: Comment massage ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kwiboo pushed a commit that referenced this issue Jul 3, 2018
[ Upstream commit 2bbea6e ]

when mounting an ISO filesystem sometimes (very rarely)
the system hangs because of a race condition between two tasks.

PID: 6766   TASK: ffff88007b2a6dd0  CPU: 0   COMMAND: "mount"
 #0 [ffff880078447ae0] __schedule at ffffffff8168d605
 #1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
 #2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
 #3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
 #4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
 #5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
 #6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
 #7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
 #8 [ffff880078447da8] mount_bdev at ffffffff81202570
 #9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
#10 [ffff880078447e28] mount_fs at ffffffff81202d09
#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
#12 [ffff880078447ea8] do_mount at ffffffff81220fee
#13 [ffff880078447f28] sys_mount at ffffffff812218d6
#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007fd9ea914e9a  RSP: 00007ffd5d9bf648  RFLAGS: 00010246
    RAX: 00000000000000a5  RBX: ffffffff81698c49  RCX: 0000000000000010
    RDX: 00007fd9ec2bc210  RSI: 00007fd9ec2bc290  RDI: 00007fd9ec2bcf30
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000010
    R10: 00000000c0ed0001  R11: 0000000000000206  R12: 00007fd9ec2bc040
    R13: 00007fd9eb6b2380  R14: 00007fd9ec2bc210  R15: 00007fd9ec2bcf30
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

This task was trying to mount the cdrom.  It allocated and configured a
super_block struct and owned the write-lock for the super_block->s_umount
rwsem. While exclusively owning the s_umount lock, it called
sr_block_ioctl and waited to acquire the global sr_mutex lock.

PID: 6785   TASK: ffff880078720fb0  CPU: 0   COMMAND: "systemd-udevd"
 #0 [ffff880078417898] __schedule at ffffffff8168d605
 #1 [ffff880078417900] schedule at ffffffff8168dc59
 #2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
 #3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
 #4 [ffff8800784179d0] down_read at ffffffff8168cde0
 #5 [ffff8800784179e8] get_super at ffffffff81201cc7
 #6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
 #7 [ffff880078417a40] flush_disk at ffffffff8123a94b
 #8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
 #9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
#16 [ffff880078417d00] do_last at ffffffff8120d53d
#17 [ffff880078417db0] path_openat at ffffffff8120e6b2
#18 [ffff880078417e48] do_filp_open at ffffffff8121082b
#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
#20 [ffff880078417f70] sys_open at ffffffff811fde4e
#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007f29438b0c20  RSP: 00007ffc76624b78  RFLAGS: 00010246
    RAX: 0000000000000002  RBX: ffffffff81698c49  RCX: 0000000000000000
    RDX: 00007f2944a5fa70  RSI: 00000000000a0800  RDI: 00007f2944a5fa70
    RBP: 00007f2944a5f540   R8: 0000000000000000   R9: 0000000000000020
    R10: 00007f2943614c40  R11: 0000000000000246  R12: ffffffff811fde4e
    R13: ffff880078417f78  R14: 000000000000000c  R15: 00007f2944a4b010
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

This task tried to open the cdrom device, the sr_block_open function
acquired the global sr_mutex lock. The call to check_disk_change()
then saw an event flag indicating a possible media change and tried
to flush any cached data for the device.
As part of the flush, it tried to acquire the super_block->s_umount
lock associated with the cdrom device.
This was the same super_block as created and locked by the previous task.

The first task acquires the s_umount lock and then the sr_mutex_lock;
the second task acquires the sr_mutex_lock and then the s_umount lock.

This patch fixes the issue by moving check_disk_change() out of
cdrom_open() and letting the caller take care of it.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
@prisoner881

prisoner881 commented Aug 4, 2018

Anything new on this? My PINE64 RK3328 shows this after booting LibreELEC Alpha 3 from MicroSD:

LibreELEC:~ # cat /sys/class/drm/card0-HDMI-A-1/modes
1280x720p60
1920x1080p60
1920x1080p50
1280x720p50
720x576p50
720x480p60

This is hooked directly to an LG 4K TV. My prior ODroid C2 automatically recognized the 4K resolutions but the RK3328 will not. I don't know if this is a hardware issue or something to do with the LibreELEC Alpha (I suspect the Alpha).

Is there a way to force the 4K resolutions to appear?

I'm running LibreELEC-RK3328.arm-9.0-nightly-20180803-c26408a-rock64.img.gz as my image, downloaded from http://test.libreelec.tv/

Kwiboo pushed a commit that referenced this issue Dec 15, 2018
Increase kasan instrumented kernel stack size from 32k to 64k. Other
architectures seem to get away with just doubling kernel stack size under
kasan, but on s390 this appears to be not enough due to bigger frame size.
The particular pain point is kasan inlined checks (CONFIG_KASAN_INLINE
vs CONFIG_KASAN_OUTLINE). With inlined checks one particular case hitting
stack overflow is fs sync on xfs filesystem:

 #0 [9a0681e8]  704 bytes  check_usage at 34b1fc
 #1 [9a0684a8]  432 bytes  check_usage at 34c710
 #2 [9a068658]  1048 bytes  validate_chain at 35044a
 #3 [9a068a70]  312 bytes  __lock_acquire at 3559fe
 #4 [9a068ba8]  440 bytes  lock_acquire at 3576ee
 #5 [9a068d60]  104 bytes  _raw_spin_lock at 21b44e0
 #6 [9a068dc8]  1992 bytes  enqueue_entity at 2dbf72
 #7 [9a069590]  1496 bytes  enqueue_task_fair at 2df5f0
 #8 [9a069b68]  64 bytes  ttwu_do_activate at 28f438
 #9 [9a069ba8]  552 bytes  try_to_wake_up at 298c4c
 #10 [9a069dd0]  168 bytes  wake_up_worker at 23f97c
 #11 [9a069e78]  200 bytes  insert_work at 23fc2e
 #12 [9a069f40]  648 bytes  __queue_work at 2487c0
 #13 [9a06a1c8]  200 bytes  __queue_delayed_work at 24db28
 #14 [9a06a290]  248 bytes  mod_delayed_work_on at 24de84
 #15 [9a06a388]  24 bytes  kblockd_mod_delayed_work_on at 153e2a0
 #16 [9a06a3a0]  288 bytes  __blk_mq_delay_run_hw_queue at 158168c
 #17 [9a06a4c0]  192 bytes  blk_mq_run_hw_queue at 1581a3c
 #18 [9a06a580]  184 bytes  blk_mq_sched_insert_requests at 15a2192
 #19 [9a06a638]  1024 bytes  blk_mq_flush_plug_list at 1590f3a
 #20 [9a06aa38]  704 bytes  blk_flush_plug_list at 1555028
 #21 [9a06acf8]  320 bytes  schedule at 219e476
 #22 [9a06ae38]  760 bytes  schedule_timeout at 21b0aac
 #23 [9a06b130]  408 bytes  wait_for_common at 21a1706
 #24 [9a06b2c8]  360 bytes  xfs_buf_iowait at fa1540
 #25 [9a06b430]  256 bytes  __xfs_buf_submit at fadae6
 #26 [9a06b530]  264 bytes  xfs_buf_read_map at fae3f6
 #27 [9a06b638]  656 bytes  xfs_trans_read_buf_map at 10ac9a8
 #28 [9a06b8c8]  304 bytes  xfs_btree_kill_root at e72426
 #29 [9a06b9f8]  288 bytes  xfs_btree_lookup_get_block at e7bc5e
 #30 [9a06bb18]  624 bytes  xfs_btree_lookup at e7e1a6
 #31 [9a06bd88]  2664 bytes  xfs_alloc_ag_vextent_near at dfa070
 #32 [9a06c7f0]  144 bytes  xfs_alloc_ag_vextent at dff3ca
 #33 [9a06c880]  1128 bytes  xfs_alloc_vextent at e05fce
 #34 [9a06cce8]  584 bytes  xfs_bmap_btalloc at e58342
 #35 [9a06cf30]  1336 bytes  xfs_bmapi_write at e618de
 #36 [9a06d468]  776 bytes  xfs_iomap_write_allocate at ff678e
 #37 [9a06d770]  720 bytes  xfs_map_blocks at f82af8
 #38 [9a06da40]  928 bytes  xfs_writepage_map at f83cd6
 #39 [9a06dde0]  320 bytes  xfs_do_writepage at f85872
 #40 [9a06df20]  1320 bytes  write_cache_pages at 73dfe8
 #41 [9a06e448]  208 bytes  xfs_vm_writepages at f7f892
 #42 [9a06e518]  88 bytes  do_writepages at 73fe6a
 #43 [9a06e570]  872 bytes  __writeback_single_inode at a20cb6
 #44 [9a06e8d8]  664 bytes  writeback_sb_inodes at a23be2
 #45 [9a06eb70]  296 bytes  __writeback_inodes_wb at a242e0
 #46 [9a06ec98]  928 bytes  wb_writeback at a2500e
 #47 [9a06f038]  848 bytes  wb_do_writeback at a260ae
 #48 [9a06f388]  536 bytes  wb_workfn at a28228
 #49 [9a06f5a0]  1088 bytes  process_one_work at 24a234
 #50 [9a06f9e0]  1120 bytes  worker_thread at 24ba26
 #51 [9a06fe40]  104 bytes  kthread at 26545a
 #52 [9a06fea8]             kernel_thread_starter at 21b6b62

To be able to increase the stack size to 64k reuse LLILL instruction
in __switch_to function to load 64k - STACK_FRAME_OVERHEAD - __PT_SIZE
(65192) value as unsigned.

Reported-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Kwiboo pushed a commit that referenced this issue Dec 15, 2018
info->nr_rings isn't adjusted in case of ENOMEM error from
negotiate_mq(). This leads to kernel panic in error path.

Typical call stack involving panic -
 #8 page_fault at ffffffff8175936f
    [exception RIP: blkif_free_ring+33]
    RIP: ffffffffa0149491  RSP: ffff8804f7673c08  RFLAGS: 00010292
 ...
 #9 blkif_free at ffffffffa0149aaa [xen_blkfront]
 #10 talk_to_blkback at ffffffffa014c8cd [xen_blkfront]
 #11 blkback_changed at ffffffffa014ea8b [xen_blkfront]
 #12 xenbus_otherend_changed at ffffffff81424670
 #13 backend_changed at ffffffff81426dc3
 #14 xenwatch_thread at ffffffff81422f29
 #15 kthread at ffffffff810abe6a
 #16 ret_from_fork at ffffffff81754078

Cc: stable@vger.kernel.org
Fixes: 7ed8ce1 ("xen-blkfront: move negotiate_mq to cover all cases of new VBDs")
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Kwiboo pushed a commit that referenced this issue Apr 6, 2019
[ Upstream commit 0c81585 ]

After offlining a memory block, kmemleak scan will trigger a crash, as
it encounters a page ext address that has already been freed during
memory offlining.  At the beginning in alloc_page_ext(), it calls
kmemleak_alloc(), but it does not call kmemleak_free() in
free_page_ext().

    BUG: unable to handle kernel paging request at ffff888453d00000
    PGD 128a01067 P4D 128a01067 PUD 128a04067 PMD 47e09e067 PTE 800ffffbac2ff060
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    CPU: 1 PID: 1594 Comm: bash Not tainted 5.0.0-rc8+ #15
    Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9, BIOS U20 10/25/2017
    RIP: 0010:scan_block+0xb5/0x290
    Code: 85 6e 01 00 00 48 b8 00 00 30 f5 81 88 ff ff 48 39 c3 0f 84 5b 01 00 00 48 89 d8 48 c1 e8 03 42 80 3c 20 00 0f 85 87 01 00 00 <4c> 8b 3b e8 f3 0c fa ff 4c 39 3d 0c 6b 4c 01 0f 87 08 01 00 00 4c
    RSP: 0018:ffff8881ec57f8e0 EFLAGS: 00010082
    RAX: 0000000000000000 RBX: ffff888453d00000 RCX: ffffffffa61e5a54
    RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888453d00000
    RBP: ffff8881ec57f920 R08: fffffbfff4ed588d R09: fffffbfff4ed588c
    R10: fffffbfff4ed588c R11: ffffffffa76ac463 R12: dffffc0000000000
    R13: ffff888453d00ff9 R14: ffff8881f80cef48 R15: ffff8881f80cef48
    FS:  00007f6c0e3f8740(0000) GS:ffff8881f7680000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffff888453d00000 CR3: 00000001c4244003 CR4: 00000000001606a0
    Call Trace:
     scan_gray_list+0x269/0x430
     kmemleak_scan+0x5a8/0x10f0
     kmemleak_write+0x541/0x6ca
     full_proxy_write+0xf8/0x190
     __vfs_write+0xeb/0x980
     vfs_write+0x15a/0x4f0
     ksys_write+0xd2/0x1b0
     __x64_sys_write+0x73/0xb0
     do_syscall_64+0xeb/0xaaa
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f6c0dad73b8
    Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 63 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
    RSP: 002b:00007ffd5b863cb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f6c0dad73b8
    RDX: 0000000000000005 RSI: 000055a9216e1710 RDI: 0000000000000001
    RBP: 000055a9216e1710 R08: 000000000000000a R09: 00007ffd5b863840
    R10: 000000000000000a R11: 0000000000000246 R12: 00007f6c0dda9780
    R13: 0000000000000005 R14: 00007f6c0dda4740 R15: 0000000000000005
    Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables x_tables xfs sd_mod ahci libahci igb i2c_algo_bit libata i2c_core dm_mirror dm_region_hash dm_log dm_mod efivarfs
    CR2: ffff888453d00000
    ---[ end trace ccf646c7456717c5 ]---
    Kernel panic - not syncing: Fatal exception
    Shutting down cpus with NMI
    Kernel Offset: 0x24c00000 from 0xffffffff81000000 (relocation range:
    0xffffffff80000000-0xffffffffbfffffff)
    ---[ end Kernel panic - not syncing: Fatal exception ]---

Link: http://lkml.kernel.org/r/20190227173147.75650-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Kwiboo pushed a commit that referenced this issue Apr 6, 2019
[ Upstream commit 2e25644 ]

Syzbot with KMSAN reports (excerpt):

==================================================================
BUG: KMSAN: uninit-value in mpol_rebind_policy mm/mempolicy.c:353 [inline]
BUG: KMSAN: uninit-value in mpol_rebind_mm+0x249/0x370 mm/mempolicy.c:384
CPU: 1 PID: 17420 Comm: syz-executor4 Not tainted 4.20.0-rc7+ #15
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x173/0x1d0 lib/dump_stack.c:113
  kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
  __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:295
  mpol_rebind_policy mm/mempolicy.c:353 [inline]
  mpol_rebind_mm+0x249/0x370 mm/mempolicy.c:384
  update_tasks_nodemask+0x608/0xca0 kernel/cgroup/cpuset.c:1120
  update_nodemasks_hier kernel/cgroup/cpuset.c:1185 [inline]
  update_nodemask kernel/cgroup/cpuset.c:1253 [inline]
  cpuset_write_resmask+0x2a98/0x34b0 kernel/cgroup/cpuset.c:1728

...

Uninit was created at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
  kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
  kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
  kmem_cache_alloc+0x572/0xb90 mm/slub.c:2777
  mpol_new mm/mempolicy.c:276 [inline]
  do_mbind mm/mempolicy.c:1180 [inline]
  kernel_mbind+0x8a7/0x31a0 mm/mempolicy.c:1347
  __do_sys_mbind mm/mempolicy.c:1354 [inline]

As it's difficult to report where exactly the uninit value resides in
the mempolicy object, we have to guess a bit.  mm/mempolicy.c:353
contains this part of mpol_rebind_policy():

        if (!mpol_store_user_nodemask(pol) &&
            nodes_equal(pol->w.cpuset_mems_allowed, *newmask))

"mpol_store_user_nodemask(pol)" is testing pol->flags, which I couldn't
ever see being uninitialized after leaving mpol_new().  So I'll guess
it's actually about accessing pol->w.cpuset_mems_allowed on line 354,
but still part of statement starting on line 353.

For w.cpuset_mems_allowed to be not initialized, and the nodes_equal()
reachable for a mempolicy where mpol_set_nodemask() is called in
do_mbind(), it seems the only possibility is a MPOL_PREFERRED policy
with empty set of nodes, i.e.  MPOL_LOCAL equivalent, with MPOL_F_LOCAL
flag.  Let's exclude such policies from the nodes_equal() check.  Note
the uninit access should be benign anyway, as rebinding this kind of
policy is always a no-op.  Therefore no actual need for stable
inclusion.

Link: http://lkml.kernel.org/r/a71997c3-e8ae-a787-d5ce-3db05768b27c@suse.cz
Link: http://lkml.kernel.org/r/73da3e9c-cc84-509e-17d9-0c434bb9967d@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: syzbot+b19c2dc2c990ea657a71@syzkaller.appspotmail.com
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Kwiboo pushed a commit that referenced this issue Jul 1, 2019
[ Upstream commit 36a2ba0 ]

In a system where, through IORT firmware mappings, the SMMU device is
mapped to a NUMA node that is not online, the kernel bootstrap results
in the following crash:

  Unable to handle kernel paging request at virtual address 0000000000001388
  Mem abort info:
    ESR = 0x96000004
    Exception class = DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
  Data abort info:
    ISV = 0, ISS = 0x00000004
    CM = 0, WnR = 0
  [0000000000001388] user address but active_mm is swapper
  Internal error: Oops: 96000004 [#1] SMP
  Modules linked in:
  CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0 #15
  pstate: 80c00009 (Nzcv daif +PAN +UAO)
  pc : __alloc_pages_nodemask+0x13c/0x1068
  lr : __alloc_pages_nodemask+0xdc/0x1068
  ...
  Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
  Call trace:
   __alloc_pages_nodemask+0x13c/0x1068
   new_slab+0xec/0x570
   ___slab_alloc+0x3e0/0x4f8
   __slab_alloc+0x60/0x80
   __kmalloc_node_track_caller+0x10c/0x478
   devm_kmalloc+0x44/0xb0
   pinctrl_bind_pins+0x4c/0x188
   really_probe+0x78/0x2b8
   driver_probe_device+0x64/0x110
   device_driver_attach+0x74/0x98
   __driver_attach+0x9c/0xe8
   bus_for_each_dev+0x84/0xd8
   driver_attach+0x30/0x40
   bus_add_driver+0x170/0x218
   driver_register+0x64/0x118
   __platform_driver_register+0x54/0x60
   arm_smmu_driver_init+0x24/0x2c
   do_one_initcall+0xbc/0x328
   kernel_init_freeable+0x304/0x3ac
   kernel_init+0x18/0x110
   ret_from_fork+0x10/0x1c
  Code: f90013b5 b9410fa1 1a9f0694 b50014c2 (b9400804)
  ---[ end trace dfeaed4c373a32da ]--

Change the dev_set_proximity() hook prototype so that it returns a
value and make it return failure if the PXM->NUMA-node mapping
corresponds to an offline node, fixing the crash.

Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/linux-arm-kernel/20190315021940.86905-1-wangkefeng.wang@huawei.com/
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Kwiboo pushed a commit that referenced this issue Jul 1, 2019
commit 30d4057 upstream.

[BUG]
When a fs has orphan reloc tree along with unfinished balance:
  ...
        item 16 key (TREE_RELOC ROOT_ITEM FS_TREE) itemoff 12090 itemsize 439
                generation 12 root_dirid 256 bytenr 300400640 level 1 refs 0 <<<
                lastsnap 8 byte_limit 0 bytes_used 1359872 flags 0x0(none)
                uuid 7c48d938-33a3-4aae-ab19-6e5c9d406e46
        item 17 key (BALANCE TEMPORARY_ITEM 0) itemoff 11642 itemsize 448
                temporary item objectid BALANCE offset 0
                balance status flags 14

Then at mount time, we can hit the following kernel BUG_ON():
  BTRFS info (device dm-3): relocating block group 298844160 flags metadata|dup
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/relocation.c:1413!
  invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 1 PID: 897 Comm: btrfs-balance Tainted: G           O      5.2.0-rc1-custom #15
  RIP: 0010:create_reloc_root+0x1eb/0x200 [btrfs]
  Call Trace:
   btrfs_init_reloc_root+0x96/0xb0 [btrfs]
   record_root_in_trans+0xb2/0xe0 [btrfs]
   btrfs_record_root_in_trans+0x55/0x70 [btrfs]
   select_reloc_root+0x7e/0x230 [btrfs]
   do_relocation+0xc4/0x620 [btrfs]
   relocate_tree_blocks+0x592/0x6a0 [btrfs]
   relocate_block_group+0x47b/0x5d0 [btrfs]
   btrfs_relocate_block_group+0x183/0x2f0 [btrfs]
   btrfs_relocate_chunk+0x4e/0xe0 [btrfs]
   btrfs_balance+0x864/0xfa0 [btrfs]
   balance_kthread+0x3b/0x50 [btrfs]
   kthread+0x123/0x140
   ret_from_fork+0x27/0x50

[CAUSE]
In btrfs, reloc trees are used to record swapped tree blocks during
balance.
A reloc tree either gets merged (replacing old tree blocks of its parent
subvolume) in the next transaction if its ref is 1 (fresh),
or it is already merged and will be cleaned up if its ref is 0 (orphan).

After commit d2311e6 ("btrfs: relocation: Delay reloc tree deletion
after merge_reloc_roots"), reloc tree cleanup is delayed until one block
group is balanced.

Since fresh reloc roots are recorded during merge, as long as there
is no power loss, those orphan reloc roots converted from fresh ones are
handled without problem.

However when power loss happens, orphan reloc roots can be recorded
on-disk, thus at next mount time, we will have orphan reloc roots from
on-disk data directly, and ignored by clean_dirty_subvols() routine.

Then when background balance starts to balance another block group, and
needs to create new reloc root for the same root, btrfs_insert_item()
returns -EEXIST, and trigger that BUG_ON().

[FIX]
For orphan reloc roots, also queue them to rc->dirty_subvol_roots, so
all reloc roots no matter orphan or not, can be cleaned up properly and
avoid above BUG_ON().

And to cooperate with above change, clean_dirty_subvols() will check if
the queued root is a reloc root or a subvol root.
For a subvol root, do the old work, and for an orphan reloc root, clean it
up.

Fixes: d2311e6 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
CC: stable@vger.kernel.org # 5.1
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kwiboo pushed a commit that referenced this issue Jul 1, 2019
commit f01098c upstream.

Just like the case of commit 8b05a3a ("tracing/kprobes: Fix NULL
pointer dereference in trace_kprobe_create()"), writing an incorrectly
formatted string to uprobe_events can trigger NULL pointer dereference.

Reproducer:

  # echo r > /sys/kernel/debug/tracing/uprobe_events

dmesg:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 8000000079d12067 P4D 8000000079d12067 PUD 7b7ab067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP PTI
  CPU: 0 PID: 1903 Comm: bash Not tainted 5.2.0-rc3+ #15
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
  RIP: 0010:strchr+0x0/0x30
  Code: c0 eb 0d 84 c9 74 18 48 83 c0 01 48 39 d0 74 0f 0f b6 0c 07 3a 0c 06 74 ea 19 c0 83 c8 01 c3 31 c0 c3 0f 1f 84 00 00 00 00 00 <0f> b6 07 89 f2 40 38 f0 75 0e eb 13 0f b6 47 01 48 83 c
  RSP: 0018:ffffb55fc0403d10 EFLAGS: 00010293

  RAX: ffff993ffb793400 RBX: 0000000000000000 RCX: ffffffffa4852625
  RDX: 0000000000000000 RSI: 000000000000002f RDI: 0000000000000000
  RBP: ffffb55fc0403dd0 R08: ffff993ffb793400 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  R13: ffff993ff9cc1668 R14: 0000000000000001 R15: 0000000000000000
  FS:  00007f30c5147700(0000) GS:ffff993ffda00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 000000007b628000 CR4: 00000000000006f0
  Call Trace:
   trace_uprobe_create+0xe6/0xb10
   ? __kmalloc_track_caller+0xe6/0x1c0
   ? __kmalloc+0xf0/0x1d0
   ? trace_uprobe_create+0xb10/0xb10
   create_or_delete_trace_uprobe+0x35/0x90
   ? trace_uprobe_create+0xb10/0xb10
   trace_run_command+0x9c/0xb0
   trace_parse_run_command+0xf9/0x1eb
   ? probes_open+0x80/0x80
   __vfs_write+0x43/0x90
   vfs_write+0x14a/0x2a0
   ksys_write+0xa2/0x170
   do_syscall_64+0x7f/0x200
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
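
The failure mode above can be illustrated with a short Python sketch: the reproducer writes a probe type with no further argument, so any parser that inspects the second argument without first checking the argument count touches something that is not there. The function `parse_uprobe_command` and its returned dictionary are hypothetical; this is neither the kernel parser nor the actual patch, only an illustration of validating the argument count before use.

```python
def parse_uprobe_command(line):
    """Parse 'p|r[:EVENT] PATH:OFFSET' defensively (illustrative only)."""
    argv = line.split()
    # Reject commands that lack the PATH:OFFSET argument, such as "r",
    # before anything tries to look inside argv[1].
    if len(argv) < 2:
        raise ValueError("missing PATH:OFFSET argument")
    if argv[0][0] not in ("p", "r"):
        raise ValueError("unknown probe type")
    # Split on the last ':' so that the path itself may contain colons.
    path, sep, offset = argv[1].rpartition(":")
    if not sep or not path:
        raise ValueError("expected PATH:OFFSET")
    return {"is_return": argv[0][0] == "r", "path": path, "offset": int(offset, 16)}
```

With this check, the input `r` is rejected with an error instead of reaching the code that searches the missing argument.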

Link: http://lkml.kernel.org/r/20190614074026.8045-1-devel@etsukata.com

Cc: stable@vger.kernel.org
Fixes: 0597c49 ("tracing/uprobes: Use dyn_event framework for uprobe events")
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kwiboo pushed a commit that referenced this issue Aug 29, 2019
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#15:
[1] https://lore.kernel.org/linux-arm-kernel/20190819114420.2535-1-walter-zh.wu@mediatek.com/

WARNING: Use #include <linux/io.h> instead of <asm/io.h>
#38: FILE: lib/test_kasan.c:22:
+#include <asm/io.h>

total: 0 errors, 2 warnings, 59 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/lib-test_kasan-add-roundtrip-tests.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Kwiboo pushed a commit that referenced this issue Jul 20, 2020
The following deadlock was captured. The first process holds 'kernfs_mutex'
and is blocked waiting on I/O. That I/O is staged in 'r1conf.pending_bio_list'
of the raid1 device; this pending bio list would be flushed by the second
process, 'md127_raid1', but that process is blocked on 'kernfs_mutex'. Using
sysfs_notify_dirent_safe() to replace sysfs_notify() fixes it. There were other
sysfs_notify() calls invoked from the I/O path; all of them are removed.

 PID: 40430  TASK: ffff8ee9c8c65c40  CPU: 29  COMMAND: "probe_file"
  #0 [ffffb87c4df37260] __schedule at ffffffff9a8678ec
  #1 [ffffb87c4df372f8] schedule at ffffffff9a867f06
  #2 [ffffb87c4df37310] io_schedule at ffffffff9a0c73e6
  #3 [ffffb87c4df37328] __dta___xfs_iunpin_wait_3443 at ffffffffc03a4057 [xfs]
  #4 [ffffb87c4df373a0] xfs_iunpin_wait at ffffffffc03a6c79 [xfs]
  #5 [ffffb87c4df373b0] __dta_xfs_reclaim_inode_3357 at ffffffffc039a46c [xfs]
  #6 [ffffb87c4df37400] xfs_reclaim_inodes_ag at ffffffffc039a8b6 [xfs]
  #7 [ffffb87c4df37590] xfs_reclaim_inodes_nr at ffffffffc039bb33 [xfs]
  #8 [ffffb87c4df375b0] xfs_fs_free_cached_objects at ffffffffc03af0e9 [xfs]
  #9 [ffffb87c4df375c0] super_cache_scan at ffffffff9a287ec7
 #10 [ffffb87c4df37618] shrink_slab at ffffffff9a1efd93
 #11 [ffffb87c4df37700] shrink_node at ffffffff9a1f5968
 #12 [ffffb87c4df37788] do_try_to_free_pages at ffffffff9a1f5ea2
 #13 [ffffb87c4df377f0] try_to_free_mem_cgroup_pages at ffffffff9a1f6445
 #14 [ffffb87c4df37880] try_charge at ffffffff9a26cc5f
 #15 [ffffb87c4df37920] memcg_kmem_charge_memcg at ffffffff9a270f6a
 #16 [ffffb87c4df37958] new_slab at ffffffff9a251430
 #17 [ffffb87c4df379c0] ___slab_alloc at ffffffff9a251c85
 #18 [ffffb87c4df37a80] __slab_alloc at ffffffff9a25635d
 #19 [ffffb87c4df37ac0] kmem_cache_alloc at ffffffff9a251f89
 #20 [ffffb87c4df37b00] alloc_inode at ffffffff9a2a2b10
 #21 [ffffb87c4df37b20] iget_locked at ffffffff9a2a4854
 #22 [ffffb87c4df37b60] kernfs_get_inode at ffffffff9a311377
 #23 [ffffb87c4df37b80] kernfs_iop_lookup at ffffffff9a311e2b
 #24 [ffffb87c4df37ba8] lookup_slow at ffffffff9a290118
 #25 [ffffb87c4df37c10] walk_component at ffffffff9a291e83
 #26 [ffffb87c4df37c78] path_lookupat at ffffffff9a293619
 #27 [ffffb87c4df37cd8] filename_lookup at ffffffff9a2953af
 #28 [ffffb87c4df37de8] user_path_at_empty at ffffffff9a295566
 #29 [ffffb87c4df37e10] vfs_statx at ffffffff9a289787
 #30 [ffffb87c4df37e70] SYSC_newlstat at ffffffff9a289d5d
 #31 [ffffb87c4df37f18] sys_newlstat at ffffffff9a28a60e
 #32 [ffffb87c4df37f28] do_syscall_64 at ffffffff9a003949
 #33 [ffffb87c4df37f50] entry_SYSCALL_64_after_hwframe at ffffffff9aa001ad
     RIP: 00007f617a5f2905  RSP: 00007f607334f838  RFLAGS: 00000246
     RAX: ffffffffffffffda  RBX: 00007f6064044b20  RCX: 00007f617a5f2905
     RDX: 00007f6064044b20  RSI: 00007f6064044b20  RDI: 00007f6064005890
     RBP: 00007f6064044aa0   R8: 0000000000000030   R9: 000000000000011c
     R10: 0000000000000013  R11: 0000000000000246  R12: 00007f606417e6d0
     R13: 00007f6064044aa0  R14: 00007f6064044b10  R15: 00000000ffffffff
     ORIG_RAX: 0000000000000006  CS: 0033  SS: 002b

 PID: 927    TASK: ffff8f15ac5dbd80  CPU: 42  COMMAND: "md127_raid1"
  #0 [ffffb87c4df07b28] __schedule at ffffffff9a8678ec
  #1 [ffffb87c4df07bc0] schedule at ffffffff9a867f06
  #2 [ffffb87c4df07bd8] schedule_preempt_disabled at ffffffff9a86825e
  #3 [ffffb87c4df07be8] __mutex_lock at ffffffff9a869bcc
  #4 [ffffb87c4df07ca0] __mutex_lock_slowpath at ffffffff9a86a013
  #5 [ffffb87c4df07cb0] mutex_lock at ffffffff9a86a04f
  #6 [ffffb87c4df07cc8] kernfs_find_and_get_ns at ffffffff9a311d83
  #7 [ffffb87c4df07cf0] sysfs_notify at ffffffff9a314b3a
  #8 [ffffb87c4df07d18] md_update_sb at ffffffff9a688696
  #9 [ffffb87c4df07d98] md_update_sb at ffffffff9a6886d5
 #10 [ffffb87c4df07da8] md_check_recovery at ffffffff9a68ad9c
 #11 [ffffb87c4df07dd0] raid1d at ffffffffc01f0375 [raid1]
 #12 [ffffb87c4df07ea0] md_thread at ffffffff9a680348
 #13 [ffffb87c4df07f08] kthread at ffffffff9a0b8005
 #14 [ffffb87c4df07f50] ret_from_fork at ffffffff9aa00344
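
The circular wait in the two backtraces above can be written down as a wait-for graph and checked for a cycle. This is a sketch in Python under simplifying assumptions: each party waits for exactly one other, the node label "pending I/O" and the helper `find_cycle` are made up for the illustration, and nothing here is md or kernfs code.

```python
def find_cycle(waits_for):
    """Return one cycle in a wait-for graph as a list of nodes, or None.

    `waits_for` maps each waiter to the single thing it is waiting for.
    """
    for start in waits_for:
        path, node = [start], waits_for.get(start)
        while node is not None and node not in path:
            path.append(node)
            node = waits_for.get(node)
        if node is not None:
            # We came back to a node already on the path: that is the cycle.
            return path[path.index(node):]
    return None


# Each key waits for its value; the edges follow the description above.
waits_for = {
    "probe_file": "pending I/O",    # holds kernfs_mutex, blocked in io_schedule
    "pending I/O": "md127_raid1",   # the bio list is flushed by the raid1 thread
    "md127_raid1": "probe_file",    # sysfs_notify() needs kernfs_mutex
}

# Without the sysfs_notify() call in the I/O path, the last edge disappears.
waits_for_fixed = {k: v for k, v in waits_for.items() if k != "md127_raid1"}
```

Here `find_cycle(waits_for)` reports the three-party cycle, while `find_cycle(waits_for_fixed)` finds none, which is the effect the commit aims for by keeping the raid1 thread off 'kernfs_mutex'.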

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Kwiboo pushed a commit that referenced this issue Jan 5, 2023
The RK3128 vop-iommu skips requesting the IRQ due to a register read issue,
so don't free the IRQ at shutdown, or the warnings below will be output:

[  102.107589] WARNING: CPU: 3 PID: 1013 at kernel/irq/devres.c:143 devm_free_irq+0x68/0x9c
[  102.115720] Modules linked in:
[  102.118862] CPU: 3 PID: 1013 Comm: init Not tainted 5.10.110 #15
[  102.124907] Hardware name: Generic DT based system
[  102.129732] Backtrace:
[  102.132230] [<c0bb9218>] (dump_backtrace) from [<c0bb95b8>] (show_stack+0x20/0x24)
[  102.139843]  r7:600f0013 r6:c0e96442 r5:00000000 r4:c1219d6c
[  102.145553] [<c0bb9598>] (show_stack) from [<c0bbc7f8>] (dump_stack_lvl+0x94/0xac)
[  102.153171] [<c0bbc764>] (dump_stack_lvl) from [<c0bbc824>] (dump_stack+0x14/0x1c)
[  102.160780]  r7:00000000 r6:00000000 r5:00000009 r4:c017ac34
[  102.166488] [<c0bbc810>] (dump_stack) from [<c011fd80>] (__warn+0xd4/0x100)
[  102.173497] [<c011fcac>] (__warn) from [<c0bb9cd8>] (warn_slowpath_fmt+0x8c/0xc4)
[  102.181026]  r9:00000000 r8:00000009 r7:c017ac34 r6:0000008f r5:c0df4ea4 r4:c353c000
[  102.188820] [<c0bb9c50>] (warn_slowpath_fmt) from [<c017ac34>] (devm_free_irq+0x68/0x9c)
[  102.196962]  r9:c0e44260 r8:c128d018 r7:c12ea690 r6:c1a17f40 r5:0000002e r4:c353c000
[  102.204759] [<c017abcc>] (devm_free_irq) from [<c051e934>] (rk_iommu_shutdown+0x58/0x5c)
[  102.212895]  r6:c1a17f40 r5:00000001 r4:c197ec00
[  102.217558] [<c051e8dc>] (rk_iommu_shutdown) from [<c05b134c>] (platform_drv_shutdown+0x2c/0x30)
[  102.226385]  r7:c12ea690 r6:c12285f0 r5:c197ec10 r4:c197ec14
[  102.232094] [<c05b1320>] (platform_drv_shutdown) from [<c05ad464>] (device_shutdown+0x15c/0x1dc)
[  102.240938] [<c05ad308>] (device_shutdown) from [<c01441d4>] (kernel_restart_prepare+0x3c/0x48)
[  102.249684]  r10:00000058 r9:01234567 r8:00000010 r7:c353c000 r6:4321fedc r5:c11168ec
[  102.257548]  r4:00000000
[  102.260124] [<c0144198>] (kernel_restart_prepare) from [<c01442f4>] (kernel_restart+0x1c/0x60)
[  102.268785] [<c01442d8>] (kernel_restart) from [<c01445d8>] (__do_sys_reboot+0x154/0x1e0)
[  102.277004]  r5:c11168ec r4:00000000
[  102.280617] [<c0144484>] (__do_sys_reboot) from [<c01446d4>] (sys_reboot+0x18/0x1c)
[  102.288326]  r9:c353c000 r8:c01002c4 r7:00000058 r6:005241c4 r5:00524140 r4:00000000
[  102.296121] [<c01446bc>] (sys_reboot) from [<c0100060>] (ret_fast_syscall+0x0/0x54)
[  102.303821] Exception stack(0xc353dfa8 to 0xc353dff0)
[  102.308912] dfa0:                   00000000 00524140 fee1dead 28121969 01234567 00000010
[  102.317141] dfc0: 00000000 00524140 005241c4 00000058 beddbefc 0044b1a8 b6f39d00 b6f3a010
[  102.325364] dfe0: 00523b5c beddbc90 004dadf4 b6e584c8
[  102.330545] ---[ end trace adc766c58fa6634f ]---
[  102.335275] ------------[ cut here ]------------
[  102.339948] WARNING: CPU: 3 PID: 1013 at kernel/irq/manage.c:1756 free_irq+0x26c/0x29c
[  102.347907] Trying to free already-free IRQ 46
[  102.352381] Modules linked in:
[  102.355479] CPU: 3 PID: 1013 Comm: init Tainted: G        W         5.10.110 #15
[  102.362907] Hardware name: Generic DT based system
[  102.367721] Backtrace:
[  102.370212] [<c0bb9218>] (dump_backtrace) from [<c0bb95b8>] (show_stack+0x20/0x24)
[  102.377824]  r7:600f0093 r6:c0e96442 r5:00000000 r4:c1219d6c
[  102.383532] [<c0bb9598>] (show_stack) from [<c0bbc7f8>] (dump_stack_lvl+0x94/0xac)
[  102.391150] [<c0bbc764>] (dump_stack_lvl) from [<c0bbc824>] (dump_stack+0x14/0x1c)
[  102.398759]  r7:c353dd34 r6:00000000 r5:00000009 r4:c017823c
[  102.404466] [<c0bbc810>] (dump_stack) from [<c011fd80>] (__warn+0xd4/0x100)
[  102.411473] [<c011fcac>] (__warn) from [<c0bb9cd8>] (warn_slowpath_fmt+0x8c/0xc4)
[  102.419002]  r9:c0df48e7 r8:00000009 r7:c017823c r6:000006dc r5:c0df4854 r4:c353c000
[  102.426799] [<c0bb9c50>] (warn_slowpath_fmt) from [<c017823c>] (free_irq+0x26c/0x29c)
[  102.434676]  r9:600f0013 r8:0000002e r7:c1a17f40 r6:c1978a6c r5:00000000 r4:c1978a00
[  102.442471] [<c0177fd0>] (free_irq) from [<c017ac40>] (devm_free_irq+0x74/0x9c)
[  102.449832]  r10:c197ec54 r9:c0e44260 r8:c128d018 r7:c12ea690 r6:c1a17f40 r5:0000002e
[  102.457700]  r4:c353c000
[  102.460278] [<c017abcc>] (devm_free_irq) from [<c051e934>] (rk_iommu_shutdown+0x58/0x5c)
[  102.468414]  r6:c1a17f40 r5:00000001 r4:c197ec00
[  102.473075] [<c051e8dc>] (rk_iommu_shutdown) from [<c05b134c>] (platform_drv_shutdown+0x2c/0x30)
[  102.481901]  r7:c12ea690 r6:c12285f0 r5:c197ec10 r4:c197ec14
[  102.487611] [<c05b1320>] (platform_drv_shutdown) from [<c05ad464>] (device_shutdown+0x15c/0x1dc)
[  102.496453] [<c05ad308>] (device_shutdown) from [<c01441d4>] (kernel_restart_prepare+0x3c/0x48)
[  102.505200]  r10:00000058 r9:01234567 r8:00000010 r7:c353c000 r6:4321fedc r5:c11168ec
[  102.513066]  r4:00000000
[  102.515642] [<c0144198>] (kernel_restart_prepare) from [<c01442f4>] (kernel_restart+0x1c/0x60)
[  102.524302] [<c01442d8>] (kernel_restart) from [<c01445d8>] (__do_sys_reboot+0x154/0x1e0)
[  102.532521]  r5:c11168ec r4:00000000
[  102.536134] [<c0144484>] (__do_sys_reboot) from [<c01446d4>] (sys_reboot+0x18/0x1c)
[  102.543833]  r9:c353c000 r8:c01002c4 r7:00000058 r6:005241c4 r5:00524140 r4:00000000
[  102.551626] [<c01446bc>] (sys_reboot) from [<c0100060>] (ret_fast_syscall+0x0/0x54)
[  102.559324] Exception stack(0xc353dfa8 to 0xc353dff0)
[  102.564414] dfa0:                   00000000 00524140 fee1dead 28121969 01234567 00000010
[  102.572641] dfc0: 00000000 00524140 005241c4 00000058 beddbefc 0044b1a8 b6f39d00 b6f3a010
[  102.580864] dfe0: 00523b5c beddbc90 004dadf4 b6e584c8
[  102.585947] ---[ end trace adc766c58fa66350 ]---
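
The two warnings above come from freeing an IRQ that was never requested. A toy Python model of that bookkeeping is sketched below; `IrqTable`, `shutdown()` and the `skip_irq` flag are hypothetical names for the illustration and do not reflect the kernel's genirq code or the rockchip-iommu driver.

```python
class IrqTable:
    """Toy IRQ bookkeeping; a hypothetical model, not the kernel's genirq code."""

    def __init__(self):
        self.requested = set()
        self.warnings = []

    def request_irq(self, irq):
        self.requested.add(irq)

    def free_irq(self, irq):
        if irq not in self.requested:
            # Mirrors the message seen in the second warning above.
            self.warnings.append(f"Trying to free already-free IRQ {irq}")
            return
        self.requested.discard(irq)


def shutdown(table, irqs, skip_irq):
    """Free IRQs at shutdown, unless requesting them was skipped earlier."""
    if skip_irq:
        # Nothing was requested, so there is nothing to free.
        return
    for irq in irqs:
        table.free_irq(irq)
```

Calling `shutdown(table, [46], skip_irq=False)` on a table where IRQ 46 was never requested records the warning; with `skip_irq=True` the table stays quiet.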

Change-Id: Ic0603d4d00528dc6b5ef6d480b15d3c14585dec3
Signed-off-by: Simon Xue <xxm@rock-chips.com>
Kwiboo pushed a commit that referenced this issue Jun 17, 2023
Found by leak sanitizer:
```
==1632594==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 21 byte(s) in 1 object(s) allocated from:
    #0 0x7f2953a7077b in __interceptor_strdup ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439
    #1 0x556701d6fbbf in perf_env__read_cpuid util/env.c:369
    #2 0x556701d70589 in perf_env__cpuid util/env.c:465
    #3 0x55670204bba2 in x86__is_amd_cpu arch/x86/util/env.c:14
    #4 0x5567020487a2 in arch__post_evsel_config arch/x86/util/evsel.c:83
    #5 0x556701d8f78b in evsel__config util/evsel.c:1366
    #6 0x556701ef5872 in evlist__config util/record.c:108
    #7 0x556701cd6bcd in test__PERF_RECORD tests/perf-record.c:112
    #8 0x556701cacd07 in run_test tests/builtin-test.c:236
    #9 0x556701cacfac in test_and_print tests/builtin-test.c:265
    #10 0x556701cadddb in __cmd_test tests/builtin-test.c:402
    #11 0x556701caf2aa in cmd_test tests/builtin-test.c:559
    #12 0x556701d3b557 in run_builtin tools/perf/perf.c:323
    #13 0x556701d3bac8 in handle_internal_command tools/perf/perf.c:377
    #14 0x556701d3be90 in run_argv tools/perf/perf.c:421
    #15 0x556701d3c3f8 in main tools/perf/perf.c:537
    #16 0x7f2952a46189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

SUMMARY: AddressSanitizer: 21 byte(s) leaked in 1 allocation(s).
```

Fixes: f7b58cb ("perf mem/c2c: Add load store event mappings for AMD")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20230613235416.1650755-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kwiboo pushed a commit that referenced this issue Jul 29, 2023
[ Upstream commit 99d4850 ]

Found by leak sanitizer:
```
==1632594==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 21 byte(s) in 1 object(s) allocated from:
    #0 0x7f2953a7077b in __interceptor_strdup ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439
    #1 0x556701d6fbbf in perf_env__read_cpuid util/env.c:369
    #2 0x556701d70589 in perf_env__cpuid util/env.c:465
    #3 0x55670204bba2 in x86__is_amd_cpu arch/x86/util/env.c:14
    #4 0x5567020487a2 in arch__post_evsel_config arch/x86/util/evsel.c:83
    #5 0x556701d8f78b in evsel__config util/evsel.c:1366
    #6 0x556701ef5872 in evlist__config util/record.c:108
    #7 0x556701cd6bcd in test__PERF_RECORD tests/perf-record.c:112
    #8 0x556701cacd07 in run_test tests/builtin-test.c:236
    #9 0x556701cacfac in test_and_print tests/builtin-test.c:265
    #10 0x556701cadddb in __cmd_test tests/builtin-test.c:402
    #11 0x556701caf2aa in cmd_test tests/builtin-test.c:559
    #12 0x556701d3b557 in run_builtin tools/perf/perf.c:323
    #13 0x556701d3bac8 in handle_internal_command tools/perf/perf.c:377
    #14 0x556701d3be90 in run_argv tools/perf/perf.c:421
    #15 0x556701d3c3f8 in main tools/perf/perf.c:537
    #16 0x7f2952a46189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

SUMMARY: AddressSanitizer: 21 byte(s) leaked in 1 allocation(s).
```

Fixes: f7b58cb ("perf mem/c2c: Add load store event mappings for AMD")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20230613235416.1650755-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Kwiboo pushed a commit that referenced this issue Aug 23, 2023
The RK3128 vop-iommu skips requesting the IRQ due to a register read issue,
so don't free the IRQ at shutdown, or the kernel warns about freeing an
already-free IRQ.

Change-Id: Ic0603d4d00528dc6b5ef6d480b15d3c14585dec3
Signed-off-by: Simon Xue <xxm@rock-chips.com>
Kwiboo pushed a commit that referenced this issue Oct 23, 2023
…roy()

After the commit in Fixes:, if a module that created a slab cache does not
release all of its allocated objects before destroying the cache (at rmmod
time), we might end up releasing the kmem_cache object without removing it
from the slab_caches list thus corrupting the list as kmem_cache_destroy()
ignores the return value from shutdown_cache(), which in turn never removes
the kmem_cache object from slabs_list in case __kmem_cache_shutdown() fails
to release all of the cache's slabs.

This is easily observable on a kernel built with CONFIG_DEBUG_LIST=y,
as after that bad release the system will immediately trip on list_add
or list_del assertions similar to the one shown below as soon as another
kmem_cache gets created or destroyed:

  [ 1041.213632] list_del corruption. next->prev should be ffff89f596fb5768, but was 52f1e5016aeee75d. (next=ffff89f595a1b268)
  [ 1041.219165] ------------[ cut here ]------------
  [ 1041.221517] kernel BUG at lib/list_debug.c:62!
  [ 1041.223452] invalid opcode: 0000 [#1] PREEMPT SMP PTI
  [ 1041.225408] CPU: 2 PID: 1852 Comm: rmmod Kdump: loaded Tainted: G    B   W  OE      6.5.0 #15
  [ 1041.228244] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc37 05/24/2023
  [ 1041.231212] RIP: 0010:__list_del_entry_valid+0xae/0xb0

Another quick way to trigger this issue, in a kernel with CONFIG_SLUB=y,
is to set slub_debug to poison the released objects and then just run
cat /proc/slabinfo after removing the module that leaks slab objects,
in which case the kernel will panic:

  [   50.954843] general protection fault, probably for non-canonical address 0xa56b6b6b6b6b6b8b: 0000 [#1] PREEMPT SMP PTI
  [   50.961545] CPU: 2 PID: 1495 Comm: cat Kdump: loaded Tainted: G    B   W  OE      6.5.0 #15
  [   50.966808] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc37 05/24/2023
  [   50.972663] RIP: 0010:get_slabinfo+0x42/0xf0

This patch fixes this issue by properly checking shutdown_cache()'s
return value before taking the kmem_cache_release() branch.
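
A minimal Python model of the rule stated above: release the cache object only when shutdown actually unlinked it from the global list. The class `SlabCaches`, its dictionary-based caches and the use of `-errno.EBUSY` as the failure code are assumptions made for the sketch; it does not reproduce the mm/slab_common.c code.

```python
import errno


class SlabCaches:
    """Toy model of a global cache list; names are illustrative only."""

    def __init__(self):
        self.caches = []      # stands in for the global list of caches
        self.released = []    # names of caches whose object was released

    def create(self, name):
        cache = {"name": name, "live_objects": 0}
        self.caches.append(cache)
        return cache

    def shutdown(self, cache):
        """Unlink the cache, or fail if objects are still allocated."""
        if cache["live_objects"]:
            return -errno.EBUSY   # the cache stays on the list
        self.caches.remove(cache)
        return 0

    def destroy(self, cache):
        err = self.shutdown(cache)
        # Release the cache object only if it was really unlinked; releasing
        # it on failure would leave a dangling entry in self.caches.
        if err == 0:
            self.released.append(cache["name"])
        return err
```

Ignoring `err` in `destroy()` and releasing unconditionally would be the analogue of the bug: the list would still point at an object that no longer exists.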

Fixes: 0495e33 ("mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock")
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Kwiboo pushed a commit that referenced this issue Oct 23, 2023
Fix an error detected by memory sanitizer:
```
==4033==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x55fb0fbedfc7 in read_alias_info tools/perf/util/pmu.c:457:6
    #1 0x55fb0fbea339 in check_info_data tools/perf/util/pmu.c:1434:2
    #2 0x55fb0fbea339 in perf_pmu__check_alias tools/perf/util/pmu.c:1504:9
    #3 0x55fb0fbdca85 in parse_events_add_pmu tools/perf/util/parse-events.c:1429:32
    #4 0x55fb0f965230 in parse_events_parse tools/perf/util/parse-events.y:299:6
    #5 0x55fb0fbdf6b2 in parse_events__scanner tools/perf/util/parse-events.c:1822:8
    #6 0x55fb0fbdf8c1 in __parse_events tools/perf/util/parse-events.c:2094:8
    #7 0x55fb0fa8ffa9 in parse_events tools/perf/util/parse-events.h:41:9
    #8 0x55fb0fa8ffa9 in test_event tools/perf/tests/parse-events.c:2393:8
    #9 0x55fb0fa8f458 in test__pmu_events tools/perf/tests/parse-events.c:2551:15
    #10 0x55fb0fa6d93f in run_test tools/perf/tests/builtin-test.c:242:9
    #11 0x55fb0fa6d93f in test_and_print tools/perf/tests/builtin-test.c:271:8
    #12 0x55fb0fa6d082 in __cmd_test tools/perf/tests/builtin-test.c:442:5
    #13 0x55fb0fa6d082 in cmd_test tools/perf/tests/builtin-test.c:564:9
    #14 0x55fb0f942720 in run_builtin tools/perf/perf.c:322:11
    #15 0x55fb0f942486 in handle_internal_command tools/perf/perf.c:375:8
    #16 0x55fb0f941dab in run_argv tools/perf/perf.c:419:2
    #17 0x55fb0f941dab in main tools/perf/perf.c:535:3
```

Fixes: 7b723db ("perf pmu: Be lazy about loading event info files from sysfs")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230914022425.1489035-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Kwiboo pushed a commit that referenced this issue Oct 23, 2023
The following call trace shows a deadlock issue due to recursive locking of
mutex "device_mutex". The first lock acquisition is in target_for_each_device()
and the second is in target_free_device().

 PID: 148266   TASK: ffff8be21ffb5d00  CPU: 10   COMMAND: "iscsi_ttx"
  #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f
  #1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224
  #2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee
  #3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7
  #4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3
  #5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c
  #6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod]
  #7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod]
  #8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f
  #9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583
 #10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod]
 #11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc
 #12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod]
 #13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod]
 #14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod]
 #15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod]
 #16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07
 #17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod]
 #18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod]
 #19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080
 #20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364
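
The recursive acquisition in the trace above can be reproduced in miniature with a non-reentrant lock. This is a Python sketch, not target core code: `for_each_device()` and `free_device()` are simplified stand-ins for the two functions named above, and the second acquisition is made non-blocking so that the example reports the problem instead of hanging.

```python
import threading

device_mutex = threading.Lock()   # non-reentrant, like a kernel mutex


def free_device():
    # Second acquisition; non-blocking so the sketch reports instead of hanging.
    if not device_mutex.acquire(blocking=False):
        return "would deadlock"
    try:
        return "freed"
    finally:
        device_mutex.release()


def for_each_device(callback):
    # First acquisition, held across the callback.
    with device_mutex:
        return callback()
```

`for_each_device(free_device)` hits the already-held lock, whereas `free_device()` called on its own succeeds; a blocking acquire in the first case would wait forever on the same thread.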

Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Link: https://lore.kernel.org/r/20230918225848.66463-1-junxiao.bi@oracle.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kwiboo pushed a commit that referenced this issue Mar 5, 2024
[ Upstream commit a154f5f ]

The following call trace shows a deadlock issue due to recursive locking of
mutex "device_mutex". The first lock acquisition is in target_for_each_device()
and the second is in target_free_device().

 PID: 148266   TASK: ffff8be21ffb5d00  CPU: 10   COMMAND: "iscsi_ttx"
  #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f
  #1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224
  #2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee
  #3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7
  #4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3
  #5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c
  #6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod]
  #7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod]
  #8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f
  #9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583
 #10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod]
 #11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc
 #12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod]
 #13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod]
 #14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod]
 #15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod]
 #16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07
 #17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod]
 #18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod]
 #19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080
 #20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364

Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Link: https://lore.kernel.org/r/20230918225848.66463-1-junxiao.bi@oracle.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>