Added openvswitch kernel module compilation #22

carlosedp · 2018-03-21T20:31:10Z

No description provided.

carlosedp · 2018-06-19T17:53:30Z

Hi @ayufan , is it possible to port this to the mainline kernel?
Thanks

Change-Id: I9f2c1b0d8d56a8d3da1774f51a4701212ac6ffff added IPGRE module to enable: (#18) ip tunnel add grx mode gre Change-Id: I26bda9ece68f9f6be1277da0779076694d5d845c ayufan: rockchip_linux_defconfig compile ZRAM as module Change-Id: I2ee48973cb370130cc5d75008a07d5777c69208a ayufan: rockchip_defconfig: disable CONFIG_SCHED_WALT as it makes system unstable Change-Id: Id2a150311fcaad9f11127320558dc3caafb250e4 config: add more USB wifi chipsets (#19) * Add some more USB wifi chipsets to default config. * Fix typo in default config. Added openvswitch kernel module compilation (#22) ayufan: defconfig: add RK805 pinctrl Change-Id: I9144206544db4e86e73a4d553fb8db3aa194c619 ayufan: enable additional realtek devices Change-Id: Ie3598c47154b648540f161ad4aedd597bc33f83f config: Add support for modules/features requested by issues #24 (AUTOFS4), rockchip-linux#54 (DRBD), rockchip-linux#64 (Multiple Routing Tables), rockchip-linux#87 (RTL8188EU), rockchip-linux#107 (iSCSI), rockchip-linux#148 (1-W GPIO) (#24) ayufan: rtl8812au: add Edimax 600 USB adapter Add requested kernel modules/features for issue rockchip-linux#153 and LIRC, PPS GPIO support, remove wifi staging drivers. (#25) Fix compile error (#26) Multiple definition error due to midgard and bifrost both selected ayufan: defconfig: add a bunch of kernel modules Change-Id: I77f5c4809c35b6c74a3bb668487eba93a0a15169 ayufan: rockchip_wlan: revert changes Change-Id: I6aa0bbc24e2fdfb6da350ed167d59c4eea836c2a ayufan: defconfig: enable CONFIG_MEMTEST Change-Id: I82c3d1b2d35fb10172f891571ef600a692ac4e61 ayufan: defconfig: remove unusued kernel configs Change-Id: I2a4a2e5785dd5796bb7404718898127718b86425 ayufan: defconfig: enable initrd in bzip2/lzma/lzo/lz4 Change-Id: I88ea1f8127d8fdadf90dd4428a51a1582adc2687 ayufan: defconfig: enable PWM FAN Change-Id: I87a364e95d937c354e06eb41fead43d8d5cb0b4a external: defconfig: Add support for f2fs and crc32 (#32) Add support for f2fs and it's dependency crc32 to rock64. This patch may or may not need to be included as well due to this bug https://bugs.debian.org/819725 A official patch was added by Debian https://salsa.debian.org/kernel-team/linux/blob/master/debian/patches/bugfix/all/fs-add-module_softdep-declarations-for-hard-coded-cr.patch external: defconfig: enable kernel modules for RBD and IPVS (#33) external: defconfig: Make crc, crc32, f2fs to be compiled into kernel. (#35) This changes crc, crc32, and f2fs to be compiled into the kernel instead of as module. This also fixes the issue of crc32-arm64 not being compiled due to a incorrect driver name. CONFIG_CRYPTO_CRC32_ARM64 became CONFIG_CRYPTO_CRC32_ARM64_CE in newer kernel sources. That's why I had to change that here. ayufan: defconfig: use CONFIG_HZ=250 Change-Id: I8009db93ee7c5af5e7804a004ab03d0ba97d20e2 CONFIG_SQUASHFS_XZ=y (#41) cyberp: defconfig: squashfs xz for snap support defconfig: enable binfmt-misc (#36) ayufan: defconfig: organize with savedefconfig ayufan: defconfig: support tehuti 10gbps adapter ayufan: defconfig: enable usb gadget via configfs ayufan: dts: compile-in audio codecs Change-Id: I7c70870395a10a0eafff006aabc17faf1d33355e

commit 68faa67 upstream. 'chrdev_open()' calls 'cdev_get()' to obtain a reference to the 'struct cdev *' stashed in the 'i_cdev' field of the target inode structure. If the pointer is NULL, then it is initialised lazily by looking up the kobject in the 'cdev_map' and so the whole procedure is protected by the 'cdev_lock' spinlock to serialise initialisation of the shared pointer. Unfortunately, it is possible for the initialising thread to fail *after* installing the new pointer, for example if the subsequent '->open()' call on the file fails. In this case, 'cdev_put()' is called, the reference count on the kobject is dropped and, if nobody else has taken a reference, the release function is called which finally clears 'inode->i_cdev' from 'cdev_purge()' before potentially freeing the object. The problem here is that a racing thread can happily take the 'cdev_lock' and see the non-NULL pointer in the inode, which can result in a refcount increment from zero and a warning: | ------------[ cut here ]------------ | refcount_t: addition on 0; use-after-free. | WARNING: CPU: 2 PID: 6385 at lib/refcount.c:25 refcount_warn_saturate+0x6d/0xf0 | Modules linked in: | CPU: 2 PID: 6385 Comm: repro Not tainted 5.5.0-rc2+ ayufan-rock64#22 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 | RIP: 0010:refcount_warn_saturate+0x6d/0xf0 | Code: 05 55 9a 15 01 01 e8 9d aa c8 ff 0f 0b c3 80 3d 45 9a 15 01 00 75 ce 48 c7 c7 00 9c 62 b3 c6 08 | RSP: 0018:ffffb524c1b9bc70 EFLAGS: 00010282 | RAX: 0000000000000000 RBX: ffff9e9da1f71390 RCX: 0000000000000000 | RDX: ffff9e9dbbd27618 RSI: ffff9e9dbbd18798 RDI: ffff9e9dbbd18798 | RBP: 0000000000000000 R08: 000000000000095f R09: 0000000000000039 | R10: 0000000000000000 R11: ffffb524c1b9bb20 R12: ffff9e9da1e8c700 | R13: ffffffffb25ee8b0 R14: 0000000000000000 R15: ffff9e9da1e8c700 | FS: 00007f3b87d26700(0000) GS:ffff9e9dbbd00000(0000) knlGS:0000000000000000 | CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 | CR2: 00007fc16909c000 CR3: 000000012df9c000 CR4: 00000000000006e0 | DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 | DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 | Call Trace: | kobject_get+0x5c/0x60 | cdev_get+0x2b/0x60 | chrdev_open+0x55/0x220 | ? cdev_put.part.3+0x20/0x20 | do_dentry_open+0x13a/0x390 | path_openat+0x2c8/0x1470 | do_filp_open+0x93/0x100 | ? selinux_file_ioctl+0x17f/0x220 | do_sys_open+0x186/0x220 | do_syscall_64+0x48/0x150 | entry_SYSCALL_64_after_hwframe+0x44/0xa9 | RIP: 0033:0x7f3b87efcd0e | Code: 89 54 24 08 e8 a3 f4 ff ff 8b 74 24 0c 48 8b 3c 24 41 89 c0 44 8b 54 24 08 b8 01 01 00 00 89 f4 | RSP: 002b:00007f3b87d259f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101 | RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3b87efcd0e | RDX: 0000000000000000 RSI: 00007f3b87d25a80 RDI: 00000000ffffff9c | RBP: 00007f3b87d25e90 R08: 0000000000000000 R09: 0000000000000000 | R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffe188f504e | R13: 00007ffe188f504f R14: 00007f3b87d26700 R15: 0000000000000000 | ---[ end trace 24f53ca58db8180a ]--- Since 'cdev_get()' can already fail to obtain a reference, simply move it over to use 'kobject_get_unless_zero()' instead of 'kobject_get()', which will cause the racing thread to return -ENXIO if the initialising thread fails unexpectedly. Cc: Hillf Danton <hdanton@sina.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by: syzbot+82defefbbd8527e1c2cb@syzkaller.appspotmail.com Signed-off-by: Will Deacon <will@kernel.org> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20191219120203.32691-1-will@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit e24c644 ] I compiled with AddressSanitizer and I had these memory leaks while I was using the tep_parse_format function: Direct leak of 28 byte(s) in 4 object(s) allocated from: #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) #1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985 #2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140 #3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206 #4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291 #5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299 ayufan-rock64#6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849 ayufan-rock64#7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161 ayufan-rock64#8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207 ayufan-rock64#9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786 ayufan-rock64#10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285 ayufan-rock64#11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369 ayufan-rock64#12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335 ayufan-rock64#13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389 ayufan-rock64#14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431 ayufan-rock64#15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251 ayufan-rock64#16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284 ayufan-rock64#17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593 ayufan-rock64#18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727 ayufan-rock64#19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048 ayufan-rock64#20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127 ayufan-rock64#21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152 ayufan-rock64#22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252 ayufan-rock64#23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347 ayufan-rock64#24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461 ayufan-rock64#25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673 ayufan-rock64#26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) The token variable in the process_dynamic_array_len function is allocated in the read_expect_type function, but is not freed before calling the read_token function. Free the token variable before calling read_token in order to plug the leak. Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 4ff753f ] When an UE or memory error exception is encountered the MCE handler tries to find the pfn using addr_to_pfn() which takes effective address as an argument, later pfn is used to poison the page where memory error occurred, recent rework in this area made addr_to_pfn to run in real mode, which can be fatal as it may try to access memory outside RMO region. Have two helper functions to separate things to be done in real mode and virtual mode without changing any functionality. This also fixes the following error as the use of addr_to_pfn is now moved to virtual mode. Without this change following kernel crash is seen on hitting UE. [ 485.128036] Oops: Kernel access of bad area, sig: 11 [#1] [ 485.128040] LE SMP NR_CPUS=2048 NUMA pSeries [ 485.128047] Modules linked in: [ 485.128067] CPU: 15 PID: 6536 Comm: insmod Kdump: loaded Tainted: G OE 5.7.0 ayufan-rock64#22 [ 485.128074] NIP: c00000000009b24c LR: c0000000000398d8 CTR: c000000000cd57c0 [ 485.128078] REGS: c000000003f1f970 TRAP: 0300 Tainted: G OE (5.7.0) [ 485.128082] MSR: 8000000000001003 <SF,ME,RI,LE> CR: 28008284 XER: 00000001 [ 485.128088] CFAR: c00000000009b190 DAR: c0000001fab00000 DSISR: 40000000 IRQMASK: 1 [ 485.128088] GPR00: 0000000000000001 c000000003f1fbf0 c000000001634300 0000b0fa01000000 [ 485.128088] GPR04: d000000002220000 0000000000000000 00000000fab00000 0000000000000022 [ 485.128088] GPR08: c0000001fab00000 0000000000000000 c0000001fab00000 c000000003f1fc14 [ 485.128088] GPR12: 0000000000000008 c000000003ff5880 d000000002100008 0000000000000000 [ 485.128088] GPR16: 000000000000ff20 000000000000fff1 000000000000fff2 d0000000021a1100 [ 485.128088] GPR20: d000000002200000 c00000015c893c50 c000000000d49b28 c00000015c893c50 [ 485.128088] GPR24: d0000000021a0d08 c0000000014e5da8 d0000000021a0818 000000000000000a [ 485.128088] GPR28: 0000000000000008 000000000000000a c0000000017e2970 000000000000000a [ 485.128125] NIP [c00000000009b24c] __find_linux_pte+0x11c/0x310 [ 485.128130] LR [c0000000000398d8] addr_to_pfn+0x138/0x170 [ 485.128133] Call Trace: [ 485.128135] Instruction dump: [ 485.128138] 3929ffff 7d4a3378 7c883c36 7d2907b4 794a1564 7d294038 794af082 3900ffff [ 485.128144] 79291f24 790af00e 78e70020 7d095214 <7c69502a> 2fa30000 419e011c 70690040 [ 485.128152] ---[ end trace d34b27e29ae0e340 ]--- Fixes: 9ca766f ("powerpc/64s/pseries: machine check convert to use common event code") Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200724063946.21378-1-ganeshgr@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 4e79f02 ] When running in BE mode on LPAE hardware with a PA-to-VA translation that exceeds 4 GB, we patch bits 39:32 of the offset into the wrong byte of the opcode. So fix that, by rotating the offset in r0 to the right by 8 bits, which will put the 8-bit immediate in bits 31:24. Note that this will also move bit ayufan-rock64#22 in its correct place when applying the rotation to the constant #0x400000. Fixes: d9a790d ("ARM: 7883/1: fix mov to mvn conversion in case of 64 bit phys_addr_t and BE") Acked-by: Nicolas Pitre <nico@fluxnic.net> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit 45caeaa upstream. As Eric Dumazet pointed out this also needs to be fixed in IPv6. v2: Contains the IPv6 tcp/Ipv6 dccp patches as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: ayufan-rock64#8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . ayufan-rock64#9 [] tcp_rcv_established at ffffffff81580b64 ayufan-rock64#10 [] tcp_v4_do_rcv at ffffffff8158b54a ayufan-rock64#11 [] tcp_v4_rcv at ffffffff8158cd02 ayufan-rock64#12 [] ip_local_deliver_finish at ffffffff815668f4 ayufan-rock64#13 [] ip_local_deliver at ffffffff81566bd9 ayufan-rock64#14 [] ip_rcv_finish at ffffffff8156656d ayufan-rock64#15 [] ip_rcv at ffffffff81566f06 ayufan-rock64#16 [] __netif_receive_skb_core at ffffffff8152b3a2 ayufan-rock64#17 [] __netif_receive_skb at ffffffff8152b608 ayufan-rock64#18 [] netif_receive_skb at ffffffff8152b690 ayufan-rock64#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] ayufan-rock64#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] ayufan-rock64#21 [] net_rx_action at ffffffff8152bac2 ayufan-rock64#22 [] __do_softirq at ffffffff81084b4f ayufan-rock64#23 [] call_softirq at ffffffff8164845c ayufan-rock64#24 [] do_softirq at ffffffff81016fc5 ayufan-rock64#25 [] irq_exit at ffffffff81084ee5 ayufan-rock64#26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)â�© 225 {â�© 226 â�¹ const struct inet_connection_sock *icsk = inet_csk(sk);â�© 227 â�¹ const struct dst_entry *dst = __sk_dst_get(sk);â�© 228 â�© 229 â�¹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||â�© 230 â�¹ â�¹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);â�© 231 }â�© But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver <egarver@redhat.com> Cc: Hannes Sowa <hsowa@redhat.com> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>

[ Upstream commit 4224cfd ] When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... ayufan-rock64#9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 ayufan-rock64#10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] ayufan-rock64#11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] ayufan-rock64#12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] ayufan-rock64#13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] ayufan-rock64#14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] ayufan-rock64#15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] ayufan-rock64#16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] ayufan-rock64#17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 ayufan-rock64#18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 ayufan-rock64#19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 ayufan-rock64#20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf ayufan-rock64#21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 ayufan-rock64#22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 ayufan-rock64#23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 ayufan-rock64#24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff ayufan-rock64#25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f ayufan-rock64#26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 8cfb085 ] Li Huafei reports that mcount-based ftrace with module PLTs was broken by commit: a625357 ("arm64: ftrace: consistently handle PLTs.") When a module PLTs are used and a module is loaded sufficiently far away from the kernel, we'll create PLTs for any branches which are out-of-range. These are separate from the special ftrace trampoline PLTs, which the module PLT code doesn't directly manipulate. When mcount is in use this is a problem, as each mcount callsite in a module will be initialized to point to a module PLT, but since commit a625357 ftrace_make_nop() will assume that the callsite has been initialized to point to the special ftrace trampoline PLT, and ftrace_find_callable_addr() rejects other cases. This means that when ftrace tries to initialize a callsite via ftrace_make_nop(), the call to ftrace_find_callable_addr() will find that the `_mcount` stub is out-of-range and is not handled by the ftrace PLT, resulting in a splat: | ftrace_test: loading out-of-tree module taints kernel. | ftrace: no module PLT for _mcount | ------------[ ftrace bug ]------------ | ftrace failed to modify | [<ffff800029180014>] 0xffff800029180014 | actual: 44:00:00:94 | Initializing ftrace call sites | ftrace record flags: 2000000 | (0) | expected tramp: ffff80000802eb3c | ------------[ cut here ]------------ | WARNING: CPU: 3 PID: 157 at kernel/trace/ftrace.c:2120 ftrace_bug+0x94/0x270 | Modules linked in: | CPU: 3 PID: 157 Comm: insmod Tainted: G O 6.0.0-rc6-00151-gcd722513a189-dirty ayufan-rock64#22 | Hardware name: linux,dummy-virt (DT) | pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : ftrace_bug+0x94/0x270 | lr : ftrace_bug+0x21c/0x270 | sp : ffff80000b2bbaf0 | x29: ffff80000b2bbaf0 x28: 0000000000000000 x27: ffff0000c4d38000 | x26: 0000000000000001 x25: ffff800009d7e000 x24: ffff0000c4d86e00 | x23: 0000000002000000 x22: ffff80000a62b000 x21: ffff8000098ebea8 | x20: ffff0000c4d38000 x19: ffff80000aa24158 x18: ffffffffffffffff | x17: 0000000000000000 x16: 0a0d2d2d2d2d2d2d x15: ffff800009aa9118 | x14: 0000000000000000 x13: 6333626532303830 x12: 3030303866666666 | x11: 203a706d61727420 x10: 6465746365707865 x9 : 3362653230383030 | x8 : c0000000ffffefff x7 : 0000000000017fe8 x6 : 000000000000bff4 | x5 : 0000000000057fa8 x4 : 0000000000000000 x3 : 0000000000000001 | x2 : ad2cb14bb5438900 x1 : 0000000000000000 x0 : 0000000000000022 | Call trace: | ftrace_bug+0x94/0x270 | ftrace_process_locs+0x308/0x430 | ftrace_module_init+0x44/0x60 | load_module+0x15b4/0x1ce8 | __do_sys_init_module+0x1ec/0x238 | __arm64_sys_init_module+0x24/0x30 | invoke_syscall+0x54/0x118 | el0_svc_common.constprop.4+0x84/0x100 | do_el0_svc+0x3c/0xd0 | el0_svc+0x1c/0x50 | el0t_64_sync_handler+0x90/0xb8 | el0t_64_sync+0x15c/0x160 | ---[ end trace 0000000000000000 ]--- | ---------test_init----------- Fix this by reverting to the old behaviour of ignoring the old instruction when initialising an mcount callsite in a module, which was the behaviour prior to commit a625357. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Fixes: a625357 ("arm64: ftrace: consistently handle PLTs.") Reported-by: Li Huafei <lihuafei1@huawei.com> Link: https://lore.kernel.org/linux-arm-kernel/20220929094134.99512-1-lihuafei1@huawei.com Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220929134525.798593-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

After waking up a suspended VM, the kernel prints the following trace for virtio drivers which do not directly call virtio_device_ready() in the .restore: PM: suspend exit irq 22: nobody cared (try booting with the "irqpoll" option) Call Trace: <IRQ> dump_stack_lvl+0x38/0x49 dump_stack+0x10/0x12 __report_bad_irq+0x3a/0xaf note_interrupt.cold+0xb/0x60 handle_irq_event+0x71/0x80 handle_fasteoi_irq+0x95/0x1e0 __common_interrupt+0x6b/0x110 common_interrupt+0x63/0xe0 asm_common_interrupt+0x1e/0x40 ? __do_softirq+0x75/0x2f3 irq_exit_rcu+0x93/0xe0 sysvec_apic_timer_interrupt+0xac/0xd0 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch_cpu_idle+0x12/0x20 default_idle_call+0x39/0xf0 do_idle+0x1b5/0x210 cpu_startup_entry+0x20/0x30 start_secondary+0xf3/0x100 secondary_startup_64_no_verify+0xc3/0xcb </TASK> handlers: [<000000008f9bac49>] vp_interrupt [<000000008f9bac49>] vp_interrupt Disabling IRQ ayufan-rock64#22 This happens because we don't invoke .enable_cbs callback in virtio_device_restore(). That callback is used by some transports (e.g. virtio-pci) to enable interrupts. Let's fix it, by calling virtio_device_ready() as we do in virtio_dev_probe(). This function calls .enable_cts callback and sets DRIVER_OK status bit. This fix also avoids setting DRIVER_OK twice for those drivers that call virtio_device_ready() in the .restore. Bug: 254441685 Fixes: d50497e ("virtio_config: introduce a new .enable_cbs method") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20220322114313.116516-1-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit 8d65bc9) Signed-off-by: Lee Jones <joneslee@google.com> Change-Id: Id60e3e75c8ca30e2dc282567e8455d4bc2811210

[ Upstream commit 4e264be ] When a system with E810 with existing VFs gets rebooted the following hang may be observed. Pid 1 is hung in iavf_remove(), part of a network driver: PID: 1 TASK: ffff965400e5a340 CPU: 24 COMMAND: "systemd-shutdow" #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930 #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf] #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513 ayufan-rock64#6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa ayufan-rock64#7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc ayufan-rock64#8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e ayufan-rock64#9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429 ayufan-rock64#10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4 ayufan-rock64#11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice] ayufan-rock64#12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice] ayufan-rock64#13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice] ayufan-rock64#14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1 ayufan-rock64#15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386 ayufan-rock64#16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870 ayufan-rock64#17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6 ayufan-rock64#18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159 ayufan-rock64#19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc ayufan-rock64#20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d ayufan-rock64#21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169 ayufan-rock64#22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b RIP: 00007f1baa5c13d7 RSP: 00007fffbcc55a98 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1baa5c13d7 RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead RBP: 00007fffbcc55ca0 R8: 0000000000000000 R9: 00007fffbcc54e90 R10: 00007fffbcc55050 R11: 0000000000000202 R12: 0000000000000005 R13: 0000000000000000 R14: 00007fffbcc55af0 R15: 0000000000000000 ORIG_RAX: 00000000000000a9 CS: 0033 SS: 002b During reboot all drivers PM shutdown callbacks are invoked. In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE. In ice_shutdown() the call chain above is executed, which at some point calls iavf_remove(). However iavf_remove() expects the VF to be in one of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If that's not the case it sleeps forever. So if iavf_shutdown() gets invoked before iavf_remove() the system will hang indefinitely because the adapter is already in state __IAVF_REMOVE. Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE, as we already went through iavf_shutdown(). Fixes: 9745780 ("iavf: Add waiting so the port is initialized in remove") Fixes: a841733 ("iavf: Fix race condition between iavf_shutdown and iavf_remove") Reported-by: Marius Cornea <mcornea@redhat.com> Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit 0b0747d upstream. The following processes run into a deadlock. CPU 41 was waiting for CPU 29 to handle a CSD request while holding spinlock "crashdump_lock", but CPU 29 was hung by that spinlock with IRQs disabled. PID: 17360 TASK: ffff95c1090c5c40 CPU: 41 COMMAND: "mrdiagd" !# 0 [ffffb80edbf37b58] __read_once_size at ffffffff9b871a40 include/linux/compiler.h:185:0 !# 1 [ffffb80edbf37b58] atomic_read at ffffffff9b871a40 arch/x86/include/asm/atomic.h:27:0 !# 2 [ffffb80edbf37b58] dump_stack at ffffffff9b871a40 lib/dump_stack.c:54:0 # 3 [ffffb80edbf37b78] csd_lock_wait_toolong at ffffffff9b131ad5 kernel/smp.c:364:0 # 4 [ffffb80edbf37b78] __csd_lock_wait at ffffffff9b131ad5 kernel/smp.c:384:0 # 5 [ffffb80edbf37bf8] csd_lock_wait at ffffffff9b13267a kernel/smp.c:394:0 # 6 [ffffb80edbf37bf8] smp_call_function_many at ffffffff9b13267a kernel/smp.c:843:0 # 7 [ffffb80edbf37c50] smp_call_function at ffffffff9b13279d kernel/smp.c:867:0 # 8 [ffffb80edbf37c50] on_each_cpu at ffffffff9b13279d kernel/smp.c:976:0 # 9 [ffffb80edbf37c78] flush_tlb_kernel_range at ffffffff9b085c4b arch/x86/mm/tlb.c:742:0 ayufan-rock64#10 [ffffb80edbf37cb8] __purge_vmap_area_lazy at ffffffff9b23a1e0 mm/vmalloc.c:701:0 ayufan-rock64#11 [ffffb80edbf37ce0] try_purge_vmap_area_lazy at ffffffff9b23a2cc mm/vmalloc.c:722:0 ayufan-rock64#12 [ffffb80edbf37ce0] free_vmap_area_noflush at ffffffff9b23a2cc mm/vmalloc.c:754:0 ayufan-rock64#13 [ffffb80edbf37cf8] free_unmap_vmap_area at ffffffff9b23bb3b mm/vmalloc.c:764:0 ayufan-rock64#14 [ffffb80edbf37cf8] remove_vm_area at ffffffff9b23bb3b mm/vmalloc.c:1509:0 ayufan-rock64#15 [ffffb80edbf37d18] __vunmap at ffffffff9b23bb8a mm/vmalloc.c:1537:0 ayufan-rock64#16 [ffffb80edbf37d40] vfree at ffffffff9b23bc85 mm/vmalloc.c:1612:0 ayufan-rock64#17 [ffffb80edbf37d58] megasas_free_host_crash_buffer [megaraid_sas] at ffffffffc020b7f2 drivers/scsi/megaraid/megaraid_sas_fusion.c:3932:0 ayufan-rock64#18 [ffffb80edbf37d80] fw_crash_state_store [megaraid_sas] at ffffffffc01f804d drivers/scsi/megaraid/megaraid_sas_base.c:3291:0 ayufan-rock64#19 [ffffb80edbf37dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 ayufan-rock64#20 [ffffb80edbf37dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 ayufan-rock64#21 [ffffb80edbf37de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 ayufan-rock64#22 [ffffb80edbf37e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 ayufan-rock64#23 [ffffb80edbf37ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 ayufan-rock64#24 [ffffb80edbf37ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 ayufan-rock64#25 [ffffb80edbf37ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 ayufan-rock64#26 [ffffb80edbf37f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 ayufan-rock64#27 [ffffb80edbf37f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 PID: 17355 TASK: ffff95c1090c3d80 CPU: 29 COMMAND: "mrdiagd" !# 0 [ffffb80f2d3c7d30] __read_once_size at ffffffff9b0f2ab0 include/linux/compiler.h:185:0 !# 1 [ffffb80f2d3c7d30] native_queued_spin_lock_slowpath at ffffffff9b0f2ab0 kernel/locking/qspinlock.c:368:0 # 2 [ffffb80f2d3c7d58] pv_queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/paravirt.h:674:0 # 3 [ffffb80f2d3c7d58] queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/qspinlock.h:53:0 # 4 [ffffb80f2d3c7d68] queued_spin_lock at ffffffff9b8961a6 include/asm-generic/qspinlock.h:90:0 # 5 [ffffb80f2d3c7d68] do_raw_spin_lock_flags at ffffffff9b8961a6 include/linux/spinlock.h:173:0 # 6 [ffffb80f2d3c7d68] __raw_spin_lock_irqsave at ffffffff9b8961a6 include/linux/spinlock_api_smp.h:122:0 # 7 [ffffb80f2d3c7d68] _raw_spin_lock_irqsave at ffffffff9b8961a6 kernel/locking/spinlock.c:160:0 # 8 [ffffb80f2d3c7d88] fw_crash_buffer_store [megaraid_sas] at ffffffffc01f8129 drivers/scsi/megaraid/megaraid_sas_base.c:3205:0 # 9 [ffffb80f2d3c7dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 ayufan-rock64#10 [ffffb80f2d3c7dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 ayufan-rock64#11 [ffffb80f2d3c7de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 ayufan-rock64#12 [ffffb80f2d3c7e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 ayufan-rock64#13 [ffffb80f2d3c7ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 ayufan-rock64#14 [ffffb80f2d3c7ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 ayufan-rock64#15 [ffffb80f2d3c7ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 ayufan-rock64#16 [ffffb80f2d3c7f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 ayufan-rock64#17 [ffffb80f2d3c7f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 The lock is used to synchronize different sysfs operations, it doesn't protect any resource that will be touched by an interrupt. Consequently it's not required to disable IRQs. Replace the spinlock with a mutex to fix the deadlock. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20230828221018.19471-1-junxiao.bi@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 0b0747d ] The following processes run into a deadlock. CPU 41 was waiting for CPU 29 to handle a CSD request while holding spinlock "crashdump_lock", but CPU 29 was hung by that spinlock with IRQs disabled. PID: 17360 TASK: ffff95c1090c5c40 CPU: 41 COMMAND: "mrdiagd" !# 0 [ffffb80edbf37b58] __read_once_size at ffffffff9b871a40 include/linux/compiler.h:185:0 !# 1 [ffffb80edbf37b58] atomic_read at ffffffff9b871a40 arch/x86/include/asm/atomic.h:27:0 !# 2 [ffffb80edbf37b58] dump_stack at ffffffff9b871a40 lib/dump_stack.c:54:0 # 3 [ffffb80edbf37b78] csd_lock_wait_toolong at ffffffff9b131ad5 kernel/smp.c:364:0 # 4 [ffffb80edbf37b78] __csd_lock_wait at ffffffff9b131ad5 kernel/smp.c:384:0 # 5 [ffffb80edbf37bf8] csd_lock_wait at ffffffff9b13267a kernel/smp.c:394:0 # 6 [ffffb80edbf37bf8] smp_call_function_many at ffffffff9b13267a kernel/smp.c:843:0 # 7 [ffffb80edbf37c50] smp_call_function at ffffffff9b13279d kernel/smp.c:867:0 # 8 [ffffb80edbf37c50] on_each_cpu at ffffffff9b13279d kernel/smp.c:976:0 # 9 [ffffb80edbf37c78] flush_tlb_kernel_range at ffffffff9b085c4b arch/x86/mm/tlb.c:742:0 ayufan-rock64#10 [ffffb80edbf37cb8] __purge_vmap_area_lazy at ffffffff9b23a1e0 mm/vmalloc.c:701:0 ayufan-rock64#11 [ffffb80edbf37ce0] try_purge_vmap_area_lazy at ffffffff9b23a2cc mm/vmalloc.c:722:0 ayufan-rock64#12 [ffffb80edbf37ce0] free_vmap_area_noflush at ffffffff9b23a2cc mm/vmalloc.c:754:0 ayufan-rock64#13 [ffffb80edbf37cf8] free_unmap_vmap_area at ffffffff9b23bb3b mm/vmalloc.c:764:0 ayufan-rock64#14 [ffffb80edbf37cf8] remove_vm_area at ffffffff9b23bb3b mm/vmalloc.c:1509:0 ayufan-rock64#15 [ffffb80edbf37d18] __vunmap at ffffffff9b23bb8a mm/vmalloc.c:1537:0 ayufan-rock64#16 [ffffb80edbf37d40] vfree at ffffffff9b23bc85 mm/vmalloc.c:1612:0 ayufan-rock64#17 [ffffb80edbf37d58] megasas_free_host_crash_buffer [megaraid_sas] at ffffffffc020b7f2 drivers/scsi/megaraid/megaraid_sas_fusion.c:3932:0 ayufan-rock64#18 [ffffb80edbf37d80] fw_crash_state_store [megaraid_sas] at ffffffffc01f804d drivers/scsi/megaraid/megaraid_sas_base.c:3291:0 ayufan-rock64#19 [ffffb80edbf37dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 ayufan-rock64#20 [ffffb80edbf37dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 ayufan-rock64#21 [ffffb80edbf37de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 ayufan-rock64#22 [ffffb80edbf37e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 ayufan-rock64#23 [ffffb80edbf37ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 ayufan-rock64#24 [ffffb80edbf37ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 ayufan-rock64#25 [ffffb80edbf37ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 ayufan-rock64#26 [ffffb80edbf37f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 ayufan-rock64#27 [ffffb80edbf37f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 PID: 17355 TASK: ffff95c1090c3d80 CPU: 29 COMMAND: "mrdiagd" !# 0 [ffffb80f2d3c7d30] __read_once_size at ffffffff9b0f2ab0 include/linux/compiler.h:185:0 !# 1 [ffffb80f2d3c7d30] native_queued_spin_lock_slowpath at ffffffff9b0f2ab0 kernel/locking/qspinlock.c:368:0 # 2 [ffffb80f2d3c7d58] pv_queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/paravirt.h:674:0 # 3 [ffffb80f2d3c7d58] queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/qspinlock.h:53:0 # 4 [ffffb80f2d3c7d68] queued_spin_lock at ffffffff9b8961a6 include/asm-generic/qspinlock.h:90:0 # 5 [ffffb80f2d3c7d68] do_raw_spin_lock_flags at ffffffff9b8961a6 include/linux/spinlock.h:173:0 # 6 [ffffb80f2d3c7d68] __raw_spin_lock_irqsave at ffffffff9b8961a6 include/linux/spinlock_api_smp.h:122:0 # 7 [ffffb80f2d3c7d68] _raw_spin_lock_irqsave at ffffffff9b8961a6 kernel/locking/spinlock.c:160:0 # 8 [ffffb80f2d3c7d88] fw_crash_buffer_store [megaraid_sas] at ffffffffc01f8129 drivers/scsi/megaraid/megaraid_sas_base.c:3205:0 # 9 [ffffb80f2d3c7dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 ayufan-rock64#10 [ffffb80f2d3c7dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 ayufan-rock64#11 [ffffb80f2d3c7de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 ayufan-rock64#12 [ffffb80f2d3c7e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 ayufan-rock64#13 [ffffb80f2d3c7ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 ayufan-rock64#14 [ffffb80f2d3c7ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 ayufan-rock64#15 [ffffb80f2d3c7ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 ayufan-rock64#16 [ffffb80f2d3c7f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 ayufan-rock64#17 [ffffb80f2d3c7f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 The lock is used to synchronize different sysfs operations, it doesn't protect any resource that will be touched by an interrupt. Consequently it's not required to disable IRQs. Replace the spinlock with a mutex to fix the deadlock. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20230828221018.19471-1-junxiao.bi@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit 79d72c6 upstream. When configuring a hugetlb filesystem via the fsconfig() syscall, there is a possible NULL dereference in hugetlbfs_fill_super() caused by assigning NULL to ctx->hstate in hugetlbfs_parse_param() when the requested pagesize is non valid. E.g: Taking the following steps: fd = fsopen("hugetlbfs", FSOPEN_CLOEXEC); fsconfig(fd, FSCONFIG_SET_STRING, "pagesize", "1024", 0); fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); Given that the requested "pagesize" is invalid, ctxt->hstate will be replaced with NULL, losing its previous value, and we will print an error: ... ... case Opt_pagesize: ps = memparse(param->string, &rest); ctx->hstate = h; if (!ctx->hstate) { pr_err("Unsupported page size %lu MB\n", ps / SZ_1M); return -EINVAL; } return 0; ... ... This is a problem because later on, we will dereference ctxt->hstate in hugetlbfs_fill_super() ... ... sb->s_blocksize = huge_page_size(ctx->hstate); ... ... Causing below Oops. Fix this by replacing cxt->hstate value only when then pagesize is known to be valid. kernel: hugetlbfs: Unsupported page size 0 MB kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028 kernel: #PF: supervisor read access in kernel mode kernel: #PF: error_code(0x0000) - not-present page kernel: PGD 800000010f66c067 P4D 800000010f66c067 PUD 1b22f8067 PMD 0 kernel: Oops: 0000 [#1] PREEMPT SMP PTI kernel: CPU: 4 PID: 5659 Comm: syscall Tainted: G E 6.8.0-rc2-default+ ayufan-rock64#22 5a47c3fef76212addcc6eb71344aabc35190ae8f kernel: Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017 kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400 kernel: FS: 00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0 kernel: Call Trace: kernel: <TASK> kernel: ? __die_body+0x1a/0x60 kernel: ? page_fault_oops+0x16f/0x4a0 kernel: ? search_bpf_extables+0x65/0x70 kernel: ? fixup_exception+0x22/0x310 kernel: ? exc_page_fault+0x69/0x150 kernel: ? asm_exc_page_fault+0x22/0x30 kernel: ? __pfx_hugetlbfs_fill_super+0x10/0x10 kernel: ? hugetlbfs_fill_super+0xb4/0x1a0 kernel: ? hugetlbfs_fill_super+0x28/0x1a0 kernel: ? __pfx_hugetlbfs_fill_super+0x10/0x10 kernel: vfs_get_super+0x40/0xa0 kernel: ? __pfx_bpf_lsm_capable+0x10/0x10 kernel: vfs_get_tree+0x25/0xd0 kernel: vfs_cmd_create+0x64/0xe0 kernel: __x64_sys_fsconfig+0x395/0x410 kernel: do_syscall_64+0x80/0x160 kernel: ? syscall_exit_to_user_mode+0x82/0x240 kernel: ? do_syscall_64+0x8d/0x160 kernel: ? syscall_exit_to_user_mode+0x82/0x240 kernel: ? do_syscall_64+0x8d/0x160 kernel: ? exc_page_fault+0x69/0x150 kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76 kernel: RIP: 0033:0x7ffbc0cb87c9 kernel: Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 96 0d 00 f7 d8 64 89 01 48 kernel: RSP: 002b:00007ffc29d2f388 EFLAGS: 00000206 ORIG_RAX: 00000000000001af kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffbc0cb87c9 kernel: RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003 kernel: RBP: 00007ffc29d2f3b0 R08: 0000000000000000 R09: 0000000000000000 kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000 kernel: R13: 00007ffc29d2f4c0 R14: 0000000000000000 R15: 0000000000000000 kernel: </TASK> kernel: Modules linked in: rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) sunrpc(E) netfs(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) intel_rapl_msr(E) intel_rapl_common(E) iTCO_wdt(E) intel_pmc_bxt(E) sb_edac(E) iTCO_vendor_support(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) rfkill(E) ipmi_ssif(E) kvm(E) acpi_ipmi(E) irqbypass(E) pcspkr(E) igb(E) ipmi_si(E) mei_me(E) i2c_i801(E) joydev(E) intel_pch_thermal(E) i2c_smbus(E) dca(E) lpc_ich(E) mei(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_pad(E) tiny_power_button(E) button(E) fuse(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) ext4(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sd_mod(E) t10_pi(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) polyval_clmulni(E) ahci(E) xhci_pci(E) polyval_generic(E) gf128mul(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) xhci_pci_renesas(E) libahci(E) ehci_pci(E) sha1_ssse3(E) xhci_hcd(E) ehci_hcd(E) libata(E) kernel: mgag200(E) i2c_algo_bit(E) usbcore(E) wmi(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E) aesni_intel(E) crypto_simd(E) cryptd(E) kernel: Unloaded tainted modules: acpi_cpufreq(E):1 fjes(E):1 kernel: CR2: 0000000000000028 kernel: ---[ end trace 0000000000000000 ]--- kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400 kernel: FS: 00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0 Link: https://lkml.kernel.org/r/20240130210418.3771-1-osalvador@suse.de Fixes: 3202198 ("hugetlbfs: Convert to fs_context") Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit c319882 ] When the skb is reorganized during esp_output (!esp->inline), the pages coming from the original skb fragments are supposed to be released back to the system through put_page. But if the skb fragment pages are originating from a page_pool, calling put_page on them will trigger a page_pool leak which will eventually result in a crash. This leak can be easily observed when using CONFIG_DEBUG_VM and doing ipsec + gre (non offloaded) forwarding: BUG: Bad page state in process ksoftirqd/16 pfn:1451b6 page:00000000de2b8d32 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1451b6000 pfn:0x1451b6 flags: 0x200000000000000(node=0|zone=2) page_type: 0xffffffff() raw: 0200000000000000 dead000000000040 ffff88810d23c000 0000000000000000 raw: 00000001451b6000 0000000000000001 00000000ffffffff 0000000000000000 page dumped because: page_pool leak Modules linked in: ip_gre gre mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat xt_addrtype br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core overlay zram zsmalloc fuse [last unloaded: mlx5_core] CPU: 16 PID: 96 Comm: ksoftirqd/16 Not tainted 6.8.0-rc4+ ayufan-rock64#22 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x36/0x50 bad_page+0x70/0xf0 free_unref_page_prepare+0x27a/0x460 free_unref_page+0x38/0x120 esp_ssg_unref.isra.0+0x15f/0x200 esp_output_tail+0x66d/0x780 esp_xmit+0x2c5/0x360 validate_xmit_xfrm+0x313/0x370 ? validate_xmit_skb+0x1d/0x330 validate_xmit_skb_list+0x4c/0x70 sch_direct_xmit+0x23e/0x350 __dev_queue_xmit+0x337/0xba0 ? nf_hook_slow+0x3f/0xd0 ip_finish_output2+0x25e/0x580 iptunnel_xmit+0x19b/0x240 ip_tunnel_xmit+0x5fb/0xb60 ipgre_xmit+0x14d/0x280 [ip_gre] dev_hard_start_xmit+0xc3/0x1c0 __dev_queue_xmit+0x208/0xba0 ? nf_hook_slow+0x3f/0xd0 ip_finish_output2+0x1ca/0x580 ip_sublist_rcv_finish+0x32/0x40 ip_sublist_rcv+0x1b2/0x1f0 ? ip_rcv_finish_core.constprop.0+0x460/0x460 ip_list_rcv+0x103/0x130 __netif_receive_skb_list_core+0x181/0x1e0 netif_receive_skb_list_internal+0x1b3/0x2c0 napi_gro_receive+0xc8/0x200 gro_cell_poll+0x52/0x90 __napi_poll+0x25/0x1a0 net_rx_action+0x28e/0x300 __do_softirq+0xc3/0x276 ? sort_range+0x20/0x20 run_ksoftirqd+0x1e/0x30 smpboot_thread_fn+0xa6/0x130 kthread+0xcd/0x100 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x31/0x50 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork_asm+0x11/0x20 </TASK> The suggested fix is to introduce a new wrapper (skb_page_unref) that covers page refcounting for page_pool pages as well. Cc: stable@vger.kernel.org Fixes: 6a5bcd8 ("page_pool: Allow drivers to hint on SKB recycling") Reported-and-tested-by: Anatoli N.Chechelnickiy <Anatoli.Chechelnickiy@m.interpipe.biz> Reported-by: Ian Kumlien <ian.kumlien@gmail.com> Link: https://lore.kernel.org/netdev/CAA85sZvvHtrpTQRqdaOx6gd55zPAVsqMYk_Lwh4Md5knTq7AyA@mail.gmail.com Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit 79d72c6 upstream. When configuring a hugetlb filesystem via the fsconfig() syscall, there is a possible NULL dereference in hugetlbfs_fill_super() caused by assigning NULL to ctx->hstate in hugetlbfs_parse_param() when the requested pagesize is non valid. E.g: Taking the following steps: fd = fsopen("hugetlbfs", FSOPEN_CLOEXEC); fsconfig(fd, FSCONFIG_SET_STRING, "pagesize", "1024", 0); fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); Given that the requested "pagesize" is invalid, ctxt->hstate will be replaced with NULL, losing its previous value, and we will print an error: ... ... case Opt_pagesize: ps = memparse(param->string, &rest); ctx->hstate = h; if (!ctx->hstate) { pr_err("Unsupported page size %lu MB\n", ps / SZ_1M); return -EINVAL; } return 0; ... ... This is a problem because later on, we will dereference ctxt->hstate in hugetlbfs_fill_super() ... ... sb->s_blocksize = huge_page_size(ctx->hstate); ... ... Causing below Oops. Fix this by replacing cxt->hstate value only when then pagesize is known to be valid. kernel: hugetlbfs: Unsupported page size 0 MB kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028 kernel: #PF: supervisor read access in kernel mode kernel: #PF: error_code(0x0000) - not-present page kernel: PGD 800000010f66c067 P4D 800000010f66c067 PUD 1b22f8067 PMD 0 kernel: Oops: 0000 [#1] PREEMPT SMP PTI kernel: CPU: 4 PID: 5659 Comm: syscall Tainted: G E 6.8.0-rc2-default+ ayufan-rock64#22 5a47c3fef76212addcc6eb71344aabc35190ae8f kernel: Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017 kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400 kernel: FS: 00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0 kernel: Call Trace: kernel: <TASK> kernel: ? __die_body+0x1a/0x60 kernel: ? page_fault_oops+0x16f/0x4a0 kernel: ? search_bpf_extables+0x65/0x70 kernel: ? fixup_exception+0x22/0x310 kernel: ? exc_page_fault+0x69/0x150 kernel: ? asm_exc_page_fault+0x22/0x30 kernel: ? __pfx_hugetlbfs_fill_super+0x10/0x10 kernel: ? hugetlbfs_fill_super+0xb4/0x1a0 kernel: ? hugetlbfs_fill_super+0x28/0x1a0 kernel: ? __pfx_hugetlbfs_fill_super+0x10/0x10 kernel: vfs_get_super+0x40/0xa0 kernel: ? __pfx_bpf_lsm_capable+0x10/0x10 kernel: vfs_get_tree+0x25/0xd0 kernel: vfs_cmd_create+0x64/0xe0 kernel: __x64_sys_fsconfig+0x395/0x410 kernel: do_syscall_64+0x80/0x160 kernel: ? syscall_exit_to_user_mode+0x82/0x240 kernel: ? do_syscall_64+0x8d/0x160 kernel: ? syscall_exit_to_user_mode+0x82/0x240 kernel: ? do_syscall_64+0x8d/0x160 kernel: ? exc_page_fault+0x69/0x150 kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76 kernel: RIP: 0033:0x7ffbc0cb87c9 kernel: Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 96 0d 00 f7 d8 64 89 01 48 kernel: RSP: 002b:00007ffc29d2f388 EFLAGS: 00000206 ORIG_RAX: 00000000000001af kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffbc0cb87c9 kernel: RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003 kernel: RBP: 00007ffc29d2f3b0 R08: 0000000000000000 R09: 0000000000000000 kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000 kernel: R13: 00007ffc29d2f4c0 R14: 0000000000000000 R15: 0000000000000000 kernel: </TASK> kernel: Modules linked in: rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) sunrpc(E) netfs(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) intel_rapl_msr(E) intel_rapl_common(E) iTCO_wdt(E) intel_pmc_bxt(E) sb_edac(E) iTCO_vendor_support(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) rfkill(E) ipmi_ssif(E) kvm(E) acpi_ipmi(E) irqbypass(E) pcspkr(E) igb(E) ipmi_si(E) mei_me(E) i2c_i801(E) joydev(E) intel_pch_thermal(E) i2c_smbus(E) dca(E) lpc_ich(E) mei(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_pad(E) tiny_power_button(E) button(E) fuse(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) ext4(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sd_mod(E) t10_pi(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) polyval_clmulni(E) ahci(E) xhci_pci(E) polyval_generic(E) gf128mul(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) xhci_pci_renesas(E) libahci(E) ehci_pci(E) sha1_ssse3(E) xhci_hcd(E) ehci_hcd(E) libata(E) kernel: mgag200(E) i2c_algo_bit(E) usbcore(E) wmi(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E) aesni_intel(E) crypto_simd(E) cryptd(E) kernel: Unloaded tainted modules: acpi_cpufreq(E):1 fjes(E):1 kernel: CR2: 0000000000000028 kernel: ---[ end trace 0000000000000000 ]--- kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400 kernel: FS: 00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0 Link: https://lkml.kernel.org/r/20240130210418.3771-1-osalvador@suse.de Fixes: 3202198 ("hugetlbfs: Convert to fs_context") Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Vamsi Krishna Brahmajosyula <vamsi-krishna.brahmajosyula@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 769e6a1 ] ui_browser__show() is capturing the input title that is stack allocated memory in hist_browser__run(). Avoid a use after return by strdup-ing the string. Committer notes: Further explanation from Ian Rogers: My command line using tui is: $ sudo bash -c 'rm /tmp/asan.log*; export ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a sleep 1; /tmp/perf/perf mem report' I then go to the perf annotate view and quit. This triggers the asan error (from the log file): ``` ==1254591==ERROR: AddressSanitizer: stack-use-after-return on address 0x7f2813331920 at pc 0x7f28180 65991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10 READ of size 80 at 0x7f2813331920 thread T0 #0 0x7f2818065990 in __interceptor_strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461 #1 0x7f2817698251 in SLsmg_write_wrapped_string (/lib/x86_64-linux-gnu/libslang.so.2+0x98251) #2 0x7f28176984b9 in SLsmg_write_nstring (/lib/x86_64-linux-gnu/libslang.so.2+0x984b9) #3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60 #4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266 #5 0x55c94045c776 in ui_browser__show ui/browser.c:288 ayufan-rock64#6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206 ayufan-rock64#7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458 ayufan-rock64#8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412 ayufan-rock64#9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527 ayufan-rock64#10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613 ayufan-rock64#11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661 ayufan-rock64#12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671 ayufan-rock64#13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141 ayufan-rock64#14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805 ayufan-rock64#15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374 ayufan-rock64#16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516 ayufan-rock64#17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350 ayufan-rock64#18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403 ayufan-rock64#19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447 ayufan-rock64#20 0x55c9400e53ad in main tools/perf/perf.c:561 ayufan-rock64#21 0x7f28170456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 ayufan-rock64#22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360 ayufan-rock64#23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId: 84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93) Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746 This frame has 1 object(s): [32, 192) 'title' (line 747) <== Memory access at offset 32 is inside this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork ``` hist_browser__run isn't on the stack so the asan error looks legit. There's no clean init/exit on struct ui_browser so I may be trading a use-after-return for a memory leak, but that seems look a good trade anyway. Fixes: 05e8b08 ("perf ui browser: Stop using 'self'") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240507183545.1236093-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

…uddy pages commit 8cf360b upstream. When I did memory failure tests recently, below panic occurs: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00 flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff) raw: 06fffe0000000000 dead000000000100 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000009 00000000ffffffff 0000000000000000 page dumped because: VM_BUG_ON_PAGE(!PageBuddy(page)) ------------[ cut here ]------------ kernel BUG at include/linux/page-flags.h:1009! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:__del_page_from_free_list+0x151/0x180 RSP: 0018:ffffa49c90437998 EFLAGS: 00000046 RAX: 0000000000000035 RBX: 0000000000000009 RCX: ffff8dd8dfd1c9c8 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff8dd8dfd1c9c0 RBP: ffffd901233b8000 R08: ffffffffab5511f8 R09: 0000000000008c69 R10: 0000000000003c15 R11: ffffffffab5511f8 R12: ffff8dd8fffc0c80 R13: 0000000000000001 R14: ffff8dd8fffc0c80 R15: 0000000000000009 FS: 00007ff916304740(0000) GS:ffff8dd8dfd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055eae50124c8 CR3: 00000008479e0000 CR4: 00000000000006f0 Call Trace: <TASK> __rmqueue_pcplist+0x23b/0x520 get_page_from_freelist+0x26b/0xe40 __alloc_pages_noprof+0x113/0x1120 __folio_alloc_noprof+0x11/0xb0 alloc_buddy_hugetlb_folio.isra.0+0x5a/0x130 __alloc_fresh_hugetlb_folio+0xe7/0x140 alloc_pool_huge_folio+0x68/0x100 set_max_huge_pages+0x13d/0x340 hugetlb_sysctl_handler_common+0xe8/0x110 proc_sys_call_handler+0x194/0x280 vfs_write+0x387/0x550 ksys_write+0x64/0xe0 do_syscall_64+0xc2/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7ff916114887 RSP: 002b:00007ffec8a2fd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000055eae500e350 RCX: 00007ff916114887 RDX: 0000000000000004 RSI: 000055eae500e390 RDI: 0000000000000003 RBP: 000055eae50104c0 R08: 0000000000000000 R09: 000055eae50104c0 R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000004 R13: 0000000000000004 R14: 00007ff916216b80 R15: 00007ff916216a00 </TASK> Modules linked in: mce_inject hwpoison_inject ---[ end trace 0000000000000000 ]--- And before the panic, there had an warning about bad page state: BUG: Bad page state in process page-types pfn:8cee00 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00 flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff) page_type: 0xffffff7f(buddy) raw: 06fffe0000000000 ffffd901241c0008 ffffd901240f8008 0000000000000000 raw: 0000000000000000 0000000000000009 00000000ffffff7f 0000000000000000 page dumped because: nonzero mapcount Modules linked in: mce_inject hwpoison_inject CPU: 8 PID: 154211 Comm: page-types Not tainted 6.9.0-rc4-00499-g5544ec3178e2-dirty ayufan-rock64#22 Call Trace: <TASK> dump_stack_lvl+0x83/0xa0 bad_page+0x63/0xf0 free_unref_page+0x36e/0x5c0 unpoison_memory+0x50b/0x630 simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110 debugfs_attr_write+0x42/0x60 full_proxy_write+0x5b/0x80 vfs_write+0xcd/0x550 ksys_write+0x64/0xe0 do_syscall_64+0xc2/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f189a514887 RSP: 002b:00007ffdcd899718 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f189a514887 RDX: 0000000000000009 RSI: 00007ffdcd899730 RDI: 0000000000000003 RBP: 00007ffdcd8997a0 R08: 0000000000000000 R09: 00007ffdcd8994b2 R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcda199a8 R13: 0000000000404af1 R14: 000000000040ad78 R15: 00007f189a7a5040 </TASK> The root cause should be the below race: memory_failure try_memory_failure_hugetlb me_huge_page __page_handle_poison dissolve_free_hugetlb_folio drain_all_pages -- Buddy page can be isolated e.g. for compaction. take_page_off_buddy -- Failed as page is not in the buddy list. -- Page can be putback into buddy after compaction. page_ref_inc -- Leads to buddy page with refcnt = 1. Then unpoison_memory() can unpoison the page and send the buddy page back into buddy list again leading to the above bad page state warning. And bad_page() will call page_mapcount_reset() to remove PageBuddy from buddy page leading to later VM_BUG_ON_PAGE(!PageBuddy(page)) when trying to allocate this page. Fix this issue by only treating __page_handle_poison() as successful when it returns 1. Link: https://lkml.kernel.org/r/20240523071217.1696196-1-linmiaohe@huawei.com Fixes: ceaf8fb ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Added openvswitch kernel module compilation

c6d2fd0

ayufan merged commit 91d9eef into ayufan-rock64:release-4.4 Mar 21, 2018

ayufan pushed a commit that referenced this pull request Mar 28, 2018

Added openvswitch kernel module compilation (#22)

f0336e8

ayufan pushed a commit that referenced this pull request May 6, 2018

Added openvswitch kernel module compilation (#22)

f68136d

ayufan pushed a commit that referenced this pull request May 26, 2018

Added openvswitch kernel module compilation (#22)

e8bdd73

ayufan pushed a commit that referenced this pull request May 30, 2018

Added openvswitch kernel module compilation (#22)

41db3b7

ayufan pushed a commit that referenced this pull request Jun 2, 2018

Added openvswitch kernel module compilation (#22)

7016e9d

ayufan pushed a commit that referenced this pull request Jun 10, 2018

Added openvswitch kernel module compilation (#22)

40948ba

ayufan pushed a commit that referenced this pull request Jun 14, 2018

Added openvswitch kernel module compilation (#22)

dc671f7

ayufan pushed a commit that referenced this pull request Jul 3, 2018

Added openvswitch kernel module compilation (#22)

a9ea597

ayufan pushed a commit that referenced this pull request Jul 20, 2018

Added openvswitch kernel module compilation (#22)

2537077

ayufan pushed a commit that referenced this pull request Aug 9, 2018

Added openvswitch kernel module compilation (#22)

674a736

ayufan pushed a commit that referenced this pull request Sep 29, 2018

Added openvswitch kernel module compilation (#22)

11f1f48

ayufan pushed a commit that referenced this pull request Dec 11, 2018

Added openvswitch kernel module compilation (#22)

408d677

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added openvswitch kernel module compilation #22

Added openvswitch kernel module compilation #22

carlosedp commented Mar 21, 2018

carlosedp commented Jun 19, 2018

Added openvswitch kernel module compilation #22

Added openvswitch kernel module compilation #22

Conversation

carlosedp commented Mar 21, 2018

carlosedp commented Jun 19, 2018