forked from linux-sunxi/linux-sunxi
-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sunxi audio h3 #1
Closed
sergey-suloev
wants to merge
115
commits into
codekipper:sunxi-audio-h3
from
sergey-suloev:sunxi-audio-h3
Closed
Sunxi audio h3 #1
sergey-suloev
wants to merge
115
commits into
codekipper:sunxi-audio-h3
from
sergey-suloev:sunxi-audio-h3
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This STB has a type A socket which acts as OTG. Signed-off-by: Marcus Cooper <codekipper@gmail.com>
The dwmac-sun8i hardware is present on the Beelink X2. It uses the internal PHY. This patch create the needed emac node. Signed-off-by: Marcus Cooper <codekipper@gmail.com>
Orange Pi 2 features a MicroUSB port that can work in peripheral mode. In peripheral mode, its VBUS cannot be used to power up the board. Signed-off-by: Marcus Cooper <codekipper@gmail.com>
But haven't as I've not tested since rebasing or just plain lazy
SATA clock on sun4i/sun7i is of type (parent) / M / 6 where 6 is fixed post-divider. Signed-off-by: Priit Laes <plaes@plaes.org> Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Introduce a clock controller driver for sun4i A10 and sun7i A20 series SoCs. Signed-off-by: Priit Laes <plaes@plaes.org>
Allwinner A20 is now driven by sunxi-ng CCU driver. Add devicetree binding for it. Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Priit Laes <plaes@plaes.org> Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Allwinner A10 is now driven by sunxi-ng CCU driver. Add devicetree binding for it. Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Priit Laes <plaes@plaes.org> Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Convert sun7i-a20.dtsi to new CCU driver. Tested on Cubietruck. Signed-off-by: Priit Laes <plaes@plaes.org>
Convert sun4i-a10.dtsi to new CCU driver. Tested on Gemei G9 tablet. Signed-off-by: Priit Laes <plaes@plaes.org>
The documentation for drm_do_get_edid in drivers/gpu/drm/drm_edid.c states: "As in the general case the DDC bus is accessible by the kernel at the I2C level, drivers must make all reasonable efforts to expose it as an I2C adapter and use drm_get_edid() instead of abusing this function." Exposing the DDC bus as an I2C adapter is more beneficial as it can be used for purposes other than reading the EDID such as modifying the EDID or using the HDMI DDC pins as an I2C bus through the I2C dev interface from userspace (e.g. i2c-tools). Implement this for A10s. Signed-off-by: Jonathan Liu <net147@gmail.com> Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Until now we were not providing a way to read back the status of our reset controls. Consumers had no real way to be certain whether a peripheral was held in reset or not. Implement the status callback to complete the API support. Fixes: 1d80c14 ("clk: sunxi-ng: Add common infrastructure") Signed-off-by: Chen-Yu Tsai <wens@csie.org>
The 2x outputs of the 2 video PLL clocks are directly used by the HDMI controller block. Export them so they can be referenced in the device tree. Fixes: c6e6c96 ("clk: sunxi-ng: Add A31/A31s clocks") Signed-off-by: Chen-Yu Tsai <wens@csie.org>
The HDMI DDC clock found in the CCU is the parent of the actual DDC clock within the HDMI controller. That clock is also named "hdmi-ddc". Rename the one in the CCU to "hdmi-ddc-parent". This makes more sense than renaming the one in the HDMI controller to something else. Fixes: c6e6c96 ("clk: sunxi-ng: Add A31/A31s clocks") Signed-off-by: Chen-Yu Tsai <wens@csie.org>
When binding the TCON, we were checking the reset control status and asserting reset if it wasn't in reset. The check failed to account for the reset control API returning error codes if the status callback was not implemented. Since we want the TCON to be reset in all cases, and re-asserting the reset control does no harm, just assert the reset unconditionally. Fixes: 9026e0d ("drm: Add Allwinner A10 Display Engine support") Signed-off-by: Chen-Yu Tsai <wens@csie.org>
If we want to have vblank on both pipelines at the same time, we need to call drm_vblank_init with num_crtcs = 2. Instead, since the crtc init calls correctly set mode_config.num_crtc, we can move the drm_vblank_init call to after the crtc init code is called, which is the component bind part. Then we can just pass mode_config.num_crtc in. Signed-off-by: Chen-Yu Tsai <wens@csie.org>
… pass The encoder drivers use drm_of_find_possible_crtcs to get upstream crtcs from the device tree using of_graph. For the results to be correct, encoders must be probed/bound after _all_ crtcs have been created. The existing code uses a depth first recursive traversal of the of_graph, which means the encoders downstream of the TCON get add right after the first TCON. The second TCON or CRTC will never be properly associated with encoders connected to it. Other platforms, such as Rockchip, deal with this by probing all crtcs first, then all subsequent components. This is easy to do since the crtcs correspond to just one device node, and are the first nodes in the pipeline. However with Allwinner SoCs, the function of the CRTC is split between the display backend (DE 1.0) or mixer (DE 2.0), which does scan-out and compositing, and the TCON, which generating the display timing signals. Further complicating the process, there may be a Dynamic Range Controller between the backend and the TCON. Also, the backend is preceded by the frontend, with a Display Enhancement Unit possibly in between. One solution would be, instead of a depth first traversal of the component of_graph, we do a breadth first traversal, so that components at the same depth are grouped together. This however requires us to implement extra code for a queue structure that is only used here. Instead, since we can identify TCON device nodes, and since the component system can gracefully deal with duplicate entries, we can add components in two passes, using the existing recursive depth code. The first pass stops right after the TCON is added. The second pass will re-add all components up to the TCON, but these will be skipped since they will have already been bound with the entries from the first pass. The encoders added in the second pass will be the last entries in the list. Signed-off-by: Chen-Yu Tsai <wens@csie.org>
On systems with 2 TCONs such as the A31, it is possible to demux the output of the TCONs to one encoder. Add support for this for the A31. Signed-off-by: Chen-Yu Tsai <wens@csie.org>
…function The HDMI driver enables the bus and mod clocks in the bind function, but does not disable them if it then bails our due to any errors. Neither does it disable the clocks in the unbind function. Fix this by adding a proper error path to the bind function, and clk_disable_unprepare calls to the unbind function. Also rename the err_cleanup_connector label to err_cleanup_encoder, since it is the encoder that gets cleaned up. Fixes: 9c56810 ("drm/sun4i: Add HDMI support") Signed-off-by: Chen-Yu Tsai <wens@csie.org>
The HDMI driver is written with readl/writel I/O to the registers. However, to support the A31 variant, which has a different layout for the DDC registers, it was recommended to use regfields to have a cleaner implementation. To use regfields, we need to create an underlying regmap. This patch only adds the regmap. It does not convert the existing driver accesses to use regmap. Signed-off-by: Chen-Yu Tsai <wens@csie.org>
On SoCs with two display pipelines, it is possible that the two pipelines are active at the same time, with potentially incompatible dot clocks. Let the HDMI encoder's TMDS clock go through all of its parents when calculating possible clock rates. This allows usage of the second video PLL as its parent. Signed-off-by: Chen-Yu Tsai <wens@csie.org>
The HDMI controller in the A31 SoC is slightly different from the earlier version. In addition to the TMDS clock and DDC controls, this version now takes a second DDC clock input. Add a compatible string for it, and add the DDC clock input to the list of clocks required. Signed-off-by: Chen-Yu Tsai <wens@csie.org> Acked-by: Rob Herring <robh@kernel.org>
This patch adds a macro regmap_field_read_poll_timeout that works similar to the readx_poll_timeout defined in linux/iopoll.h, except that this can also return the error value returned by a failed regmap_field_read. Signed-off-by: Chen-Yu Tsai <wens@csie.org>
The HDMI controller found in earlier Allwinner SoCs have slight differences between the A10, A10s, and the A31: - Need different initial values for the PLL related registers - Different behavior of the DDC and TMDS clocks - Different register layout for the DDC portion - Separate DDC parent clock on the A31 - Explicit reset control For the A31, the HDMI TMDS clock has a different value offset for the divider. The HDMI DDC block is different from the one in the other SoCs. As far as the DDC clock goes, it has no pre-divider, as it is clocked from a slower parent clock, not the TMDS clock. The divider offset from the register value is different. And the clock control register is at a different offset. A new variant data structure is created to store pointers to the above functions, structures, and the different initial values. Another flag notates whether there is a separate DDC parent clock. If not, the TMDS clock is passed to the DDC clock create function, as before. Regmap fields are used to deal with the different register layout of the DDC block. Signed-off-by: Chen-Yu Tsai <wens@csie.org>
The DDC block for the HDMI controller is different on the A31. This patch adds the register definitions. Signed-off-by: Chen-Yu Tsai <wens@csie.org>
The HDMI controller found in the A31 SoCs is slightly different from the one already supported, which is found in the A10s: - Need different initial values for the PLL related registers - Different behavior of the DDC and TMDS clocks - Different register layout for the DDC portion - Separate DDC parent clock This patch adds support for it. Signed-off-by: Chen-Yu Tsai <wens@csie.org>
The TCON has a mux to select the source of the data to display. This mux includes selecting the display backends. On the A31, which has two display pipelines, this mux can let the TCON select either backend as its data source. Although the muxing can be changed on the fly, DRM needs to be able to group a bunch of layers such that they get switched to another crtc together. This is because the display backend does the layer compositing, while the TCON generates the display timings. This constraint is not supported by DRM. Here we simply pair up backends and TCONs with the same ID. Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Now that we support the HDMI controller on the A31 SoC, we can add it to the device tree. This adds a device node for the HDMI controller, and the of_graph nodes connecting it to the 2 TCONs. Signed-off-by: Chen-Yu Tsai <wens@csie.org>
The A31 Humminbird has an HDMI connector wired to the HDMI pins on the SoC. Enable HDMI support for this board. Signed-off-by: Chen-Yu Tsai <wens@csie.org>
The Sinlinx SinA31s has an HDMI connector wired to the HDMI pins from the SoC. Enable the display pipeline and the HDMI output. Signed-off-by: Chen-Yu Tsai <wens@csie.org>
codekipper
pushed a commit
that referenced
this pull request
Mar 7, 2023
Eduard Zingerman says: ==================== Changes v1 -> v2, suggested by Alexei: - Resolved conflict with recent commit: 6fcd486 ("bpf: Refactor RCU enforcement in the verifier"); - Variable `ctx_access` removed in function `convert_ctx_accesses()`; - Macro `BPF_COPY_STORE` renamed to `BPF_EMIT_STORE` and fixed to correctly extract original store instruction class from code. Original message follows: The function verifier.c:convert_ctx_access() applies some rewrites to BPF instructions that read from or write to the BPF program context. For example, the write instruction for the `struct bpf_sockopt::retval` field: *(u32 *)(r1 + offsetof(struct bpf_sockopt, retval)) = r2 Is transformed to: *(u64 *)(r1 + offsetof(struct bpf_sockopt_kern, tmp_reg)) = r9 r9 = *(u64 *)(r1 + offsetof(struct bpf_sockopt_kern, current_task)) r9 = *(u64 *)(r9 + offsetof(struct task_struct, bpf_ctx)) *(u32 *)(r9 + offsetof(struct bpf_cg_run_ctx, retval)) = r2 r9 = *(u64 *)(r1 + offsetof(struct bpf_sockopt_kern, tmp_reg)) Currently, the verifier only supports such transformations for LDX (memory-to-register read) and STX (register-to-memory write) instructions. Error is reported for ST instructions (immediate-to-memory write). This is fine because clang does not currently emit ST instructions. However, new `-mcpu=v4` clang flag is planned, which would allow to emit ST instructions (discussed in [1]). This patch-set adjusts the verifier to support ST instructions in `verifier.c:convert_ctx_access()`. The patches #1 and linux-sunxi#2 were previously shared as part of RFC [2]. The changes compared to that RFC are: - In patch #1, a bug in the handling of the `struct __sk_buff::queue_mapping` field was fixed. - Patch linux-sunxi#3 is added, which is a set of disassembler-based test cases for context access rewrites. The test cases cover all fields for which the handling code is modified in patch #1. [1] Propose some new instructions for -mcpu=v4 https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/ [2] RFC Support for BPF_ST instruction in LLVM C compiler https://lore.kernel.org/bpf/20221231163122.1360813-1-eddyz87@gmail.com/ [3] v1 https://lore.kernel.org/bpf/20230302225507.3413720-1-eddyz87@gmail.com/ ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
codekipper
pushed a commit
that referenced
this pull request
Mar 7, 2023
The background inode inactivation can attached dquots to inodes, but this can race with a foreground quotacheck failure that leads to disabling quotas and freeing the mp->m_quotainfo structure. The background inode inactivation then tries to allocate a quota, tries to dereference mp->m_quotainfo, and crashes like so: XFS (loop1): Quotacheck: Unsuccessful (Error -5): Disabling quotas. xfs filesystem being mounted at /root/syzkaller.qCVHXV/0/file0 supports timestamps until 2038 (0x7fffffff) BUG: kernel NULL pointer dereference, address: 00000000000002a8 .... CPU: 0 PID: 161 Comm: kworker/0:4 Not tainted 6.2.0-c9c3395d5e3d #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: xfs-inodegc/loop1 xfs_inodegc_worker RIP: 0010:xfs_dquot_alloc+0x95/0x1e0 .... Call Trace: <TASK> xfs_qm_dqread+0x46/0x440 xfs_qm_dqget_inode+0x154/0x500 xfs_qm_dqattach_one+0x142/0x3c0 xfs_qm_dqattach_locked+0x14a/0x170 xfs_qm_dqattach+0x52/0x80 xfs_inactive+0x186/0x340 xfs_inodegc_worker+0xd3/0x430 process_one_work+0x3b1/0x960 worker_thread+0x52/0x660 kthread+0x161/0x1a0 ret_from_fork+0x29/0x50 </TASK> .... Prevent this race by flushing all the queued background inode inactivations pending before purging all the cached dquots when quotacheck fails. Reported-by: Pengfei Xu <pengfei.xu@intel.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
codekipper
pushed a commit
that referenced
this pull request
Mar 7, 2023
The locking in phy_probe() and phy_remove() does very little to prevent any races with e.g. phy_attach_direct(), but instead causes lockdep ABBA warnings. Remove it. ====================================================== WARNING: possible circular locking dependency detected 6.2.0-dirty #1108 Tainted: G W E ------------------------------------------------------ ip/415 is trying to acquire lock: ffff5c268f81ef50 (&dev->lock){+.+.}-{3:3}, at: phy_attach_direct+0x17c/0x3a0 [libphy] but task is already holding lock: ffffaef6496cb518 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x154/0x560 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (rtnl_mutex){+.+.}-{3:3}: __lock_acquire+0x35c/0x6c0 lock_acquire.part.0+0xcc/0x220 lock_acquire+0x68/0x84 __mutex_lock+0x8c/0x414 mutex_lock_nested+0x34/0x40 rtnl_lock+0x24/0x30 sfp_bus_add_upstream+0x34/0x150 phy_sfp_probe+0x4c/0x94 [libphy] mv3310_probe+0x148/0x184 [marvell10g] phy_probe+0x8c/0x200 [libphy] call_driver_probe+0xbc/0x15c really_probe+0xc0/0x320 __driver_probe_device+0x84/0x120 driver_probe_device+0x44/0x120 __device_attach_driver+0xc4/0x160 bus_for_each_drv+0x80/0xe0 __device_attach+0xb0/0x1f0 device_initial_probe+0x1c/0x2c bus_probe_device+0xa4/0xb0 device_add+0x360/0x53c phy_device_register+0x60/0xa4 [libphy] fwnode_mdiobus_phy_device_register+0xc0/0x190 [fwnode_mdio] fwnode_mdiobus_register_phy+0x160/0xd80 [fwnode_mdio] of_mdiobus_register+0x140/0x340 [of_mdio] orion_mdio_probe+0x298/0x3c0 [mvmdio] platform_probe+0x70/0xe0 call_driver_probe+0x34/0x15c really_probe+0xc0/0x320 __driver_probe_device+0x84/0x120 driver_probe_device+0x44/0x120 __driver_attach+0x104/0x210 bus_for_each_dev+0x78/0xdc driver_attach+0x2c/0x3c bus_add_driver+0x184/0x240 driver_register+0x80/0x13c __platform_driver_register+0x30/0x3c xt_compat_calc_jump+0x28/0xa4 [x_tables] do_one_initcall+0x50/0x1b0 do_init_module+0x50/0x1fc load_module+0x684/0x744 __do_sys_finit_module+0xc4/0x140 __arm64_sys_finit_module+0x28/0x34 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0x6c/0x1b0 do_el0_svc+0x34/0x44 el0_svc+0x48/0xf0 el0t_64_sync_handler+0xb8/0xc0 el0t_64_sync+0x1a0/0x1a4 -> #0 (&dev->lock){+.+.}-{3:3}: check_prev_add+0xb4/0xc80 validate_chain+0x414/0x47c __lock_acquire+0x35c/0x6c0 lock_acquire.part.0+0xcc/0x220 lock_acquire+0x68/0x84 __mutex_lock+0x8c/0x414 mutex_lock_nested+0x34/0x40 phy_attach_direct+0x17c/0x3a0 [libphy] phylink_fwnode_phy_connect.part.0+0x70/0xe4 [phylink] phylink_fwnode_phy_connect+0x48/0x60 [phylink] mvpp2_open+0xec/0x2e0 [mvpp2] __dev_open+0x104/0x214 __dev_change_flags+0x1d4/0x254 dev_change_flags+0x2c/0x7c do_setlink+0x254/0xa50 __rtnl_newlink+0x430/0x514 rtnl_newlink+0x58/0x8c rtnetlink_rcv_msg+0x17c/0x560 netlink_rcv_skb+0x64/0x150 rtnetlink_rcv+0x20/0x30 netlink_unicast+0x1d4/0x2b4 netlink_sendmsg+0x1a4/0x400 ____sys_sendmsg+0x228/0x290 ___sys_sendmsg+0x88/0xec __sys_sendmsg+0x70/0xd0 __arm64_sys_sendmsg+0x2c/0x40 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0x6c/0x1b0 do_el0_svc+0x34/0x44 el0_svc+0x48/0xf0 el0t_64_sync_handler+0xb8/0xc0 el0t_64_sync+0x1a0/0x1a4 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock(&dev->lock); lock(rtnl_mutex); lock(&dev->lock); *** DEADLOCK *** Fixes: 298e54f ("net: phy: add core phylib sfp support") Reported-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
codekipper
pushed a commit
that referenced
this pull request
Mar 7, 2023
The driver needs to keep track of all the possible concurrent TPA (GRO/LRO) completions on the aggregation ring. On P5 chips, the maximum number of concurrent TPA is 256 and the amount of memory we allocate is order-5 on systems using 4K pages. Memory allocation failure has been reported: NetworkManager: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0-1 CPU: 15 PID: 2995 Comm: NetworkManager Kdump: loaded Not tainted 5.10.156 #1 Hardware name: Dell Inc. PowerEdge R660/0M1CC5, BIOS 0.2.25 08/12/2022 Call Trace: dump_stack+0x57/0x6e warn_alloc.cold.120+0x7b/0xdd ? _cond_resched+0x15/0x30 ? __alloc_pages_direct_compact+0x15f/0x170 __alloc_pages_slowpath.constprop.108+0xc58/0xc70 __alloc_pages_nodemask+0x2d0/0x300 kmalloc_order+0x24/0xe0 kmalloc_order_trace+0x19/0x80 bnxt_alloc_mem+0x1150/0x15c0 [bnxt_en] ? bnxt_get_func_stat_ctxs+0x13/0x60 [bnxt_en] __bnxt_open_nic+0x12e/0x780 [bnxt_en] bnxt_open+0x10b/0x240 [bnxt_en] __dev_open+0xe9/0x180 __dev_change_flags+0x1af/0x220 dev_change_flags+0x21/0x60 do_setlink+0x35c/0x1100 Instead of allocating this big chunk of memory and dividing it up for the concurrent TPA instances, allocate each small chunk separately for each TPA instance. This will reduce it to order-0 allocations. Fixes: 79632e9 ("bnxt_en: Expand bnxt_tpa_info struct to support 57500 chips.") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
codekipper
pushed a commit
that referenced
this pull request
Mar 7, 2023
On a 16 node virtual cluster with e1000 NICs joining the 12th node prints SYN flood warnings for the DLM port: Dec 21 01:46:41 localhost kernel: [ 2146.516664] TCP: request_sock_TCP: Possible SYN flooding on port 21064. Sending cookies. Check SNMP counters. And then joining a DLM lockspace hangs: ``` Dec 21 01:49:00 localhost kernel: [ 2285.780913] INFO: task xapi-clusterd:17638 blocked for more than 120 seconds. Dec 21 01:49:00 localhost kernel: [ 2285.786476] Not tainted 4.4.0+10 #1 Dec 21 01:49:00 localhost kernel: [ 2285.789043] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 21 01:49:00 localhost kernel: [ 2285.794611] xapi-clusterd D ffff88001930bc58 0 17638 1 0x00000000 Dec 21 01:49:00 localhost kernel: [ 2285.794615] ffff88001930bc58 ffff880025593800 ffff880022433800 ffff88001930c000 Dec 21 01:49:00 localhost kernel: [ 2285.794617] ffff88000ef4a660 ffff88000ef4a658 ffff880022433800 ffff88000ef4a000 Dec 21 01:49:00 localhost kernel: [ 2285.794619] ffff88001930bc70 ffffffff8159f6b4 7fffffffffffffff ffff88001930bd10 Dec 21 01:49:00 localhost kernel: [ 2285.794644] [<ffffffff811570fe>] ? printk+0x4d/0x4f Dec 21 01:49:00 localhost kernel: [ 2285.794647] [<ffffffff810b1741>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20 Dec 21 01:49:00 localhost kernel: [ 2285.794649] [<ffffffff815a085d>] wait_for_completion+0x9d/0x110 Dec 21 01:49:00 localhost kernel: [ 2285.794653] [<ffffffff810979e0>] ? wake_up_q+0x80/0x80 Dec 21 01:49:00 localhost kernel: [ 2285.794661] [<ffffffffa03fa4b8>] dlm_new_lockspace+0x908/0xac0 [dlm] Dec 21 01:49:00 localhost kernel: [ 2285.794665] [<ffffffff810aaa60>] ? prepare_to_wait_event+0x100/0x100 Dec 21 01:49:00 localhost kernel: [ 2285.794670] [<ffffffffa0402e37>] device_write+0x497/0x6b0 [dlm] Dec 21 01:49:00 localhost kernel: [ 2285.794673] [<ffffffff811834f0>] ? handle_mm_fault+0x7f0/0x13b0 Dec 21 01:49:00 localhost kernel: [ 2285.794677] [<ffffffff811b4438>] __vfs_write+0x28/0xd0 Dec 21 01:49:00 localhost kernel: [ 2285.794679] [<ffffffff811b4b7f>] ? rw_verify_area+0x6f/0xd0 Dec 21 01:49:00 localhost kernel: [ 2285.794681] [<ffffffff811b4dc1>] vfs_write+0xb1/0x190 Dec 21 01:49:00 localhost kernel: [ 2285.794686] [<ffffffff8105ffc2>] ? __do_page_fault+0x302/0x420 Dec 21 01:49:00 localhost kernel: [ 2285.794688] [<ffffffff811b5986>] SyS_write+0x46/0xa0 Dec 21 01:49:00 localhost kernel: [ 2285.794690] [<ffffffff815a31ae>] entry_SYSCALL_64_fastpath+0x12/0x71 ``` The previous limit of 5 seems like an arbitrary number, that doesn't match any known DLM cluster size upper bound limit. Signed-off-by: Edwin Török <edvin.torok@citrix.com> Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: David Teigland <teigland@redhat.com> Cc: cluster-devel@redhat.com Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
[ Upstream commit 8d21155 ] adjust_inuse_and_calc_cost() use spin_lock_irq() and IRQ will be enabled when unlock. DEADLOCK might happen if we have held other locks and disabled IRQ before invoking it. Fix it by using spin_lock_irqsave() instead, which can keep IRQ state consistent with before when unlock. ================================ WARNING: inconsistent lock state 5.10.0-02758-g8e5f91fd772f linux-sunxi#26 Not tainted -------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. kworker/2:3/388 [HC0[0]:SC0[0]:HE0:SE1] takes: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390 {IN-HARDIRQ-W} state was registered at: __lock_acquire+0x3d7/0x1070 lock_acquire+0x197/0x4a0 __raw_spin_lock_irqsave _raw_spin_lock_irqsave+0x3b/0x60 bfq_idle_slice_timer_body bfq_idle_slice_timer+0x53/0x1d0 __run_hrtimer+0x477/0xa70 __hrtimer_run_queues+0x1c6/0x2d0 hrtimer_interrupt+0x302/0x9e0 local_apic_timer_interrupt __sysvec_apic_timer_interrupt+0xfd/0x420 run_sysvec_on_irqstack_cond sysvec_apic_timer_interrupt+0x46/0xa0 asm_sysvec_apic_timer_interrupt+0x12/0x20 irq event stamp: 837522 hardirqs last enabled at (837521): [<ffffffff84b9419d>] __raw_spin_unlock_irqrestore hardirqs last enabled at (837521): [<ffffffff84b9419d>] _raw_spin_unlock_irqrestore+0x3d/0x40 hardirqs last disabled at (837522): [<ffffffff84b93fa3>] __raw_spin_lock_irq hardirqs last disabled at (837522): [<ffffffff84b93fa3>] _raw_spin_lock_irq+0x43/0x50 softirqs last enabled at (835852): [<ffffffff84e00558>] __do_softirq+0x558/0x8ec softirqs last disabled at (835845): [<ffffffff84c010ff>] asm_call_irq_on_stack+0xf/0x20 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&bfqd->lock); <Interrupt> lock(&bfqd->lock); *** DEADLOCK *** 3 locks held by kworker/2:3/388: #0: ffff888107af0f38 ((wq_completion)kthrotld){+.+.}-{0:0}, at: process_one_work+0x742/0x13f0 #1: ffff8881176bfdd8 ((work_completion)(&td->dispatch_work)){+.+.}-{0:0}, at: process_one_work+0x777/0x13f0 linux-sunxi#2: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq linux-sunxi#2: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390 stack backtrace: CPU: 2 PID: 388 Comm: kworker/2:3 Not tainted 5.10.0-02758-g8e5f91fd772f linux-sunxi#26 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Workqueue: kthrotld blk_throtl_dispatch_work_fn Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x107/0x167 print_usage_bug valid_state mark_lock_irq.cold+0x32/0x3a mark_lock+0x693/0xbc0 mark_held_locks+0x9e/0xe0 __trace_hardirqs_on_caller lockdep_hardirqs_on_prepare.part.0+0x151/0x360 trace_hardirqs_on+0x5b/0x180 __raw_spin_unlock_irq _raw_spin_unlock_irq+0x24/0x40 spin_unlock_irq adjust_inuse_and_calc_cost+0x4fb/0x970 ioc_rqos_merge+0x277/0x740 __rq_qos_merge+0x62/0xb0 rq_qos_merge bio_attempt_back_merge+0x12c/0x4a0 blk_mq_sched_try_merge+0x1b6/0x4d0 bfq_bio_merge+0x24a/0x390 __blk_mq_sched_bio_merge+0xa6/0x460 blk_mq_sched_bio_merge blk_mq_submit_bio+0x2e7/0x1ee0 __submit_bio_noacct_mq+0x175/0x3b0 submit_bio_noacct+0x1fb/0x270 blk_throtl_dispatch_work_fn+0x1ef/0x2b0 process_one_work+0x83e/0x13f0 process_scheduled_works worker_thread+0x7e3/0xd80 kthread+0x353/0x470 ret_from_fork+0x1f/0x30 Fixes: b0853ab ("blk-iocost: revamp in-period donation snapbacks") Signed-off-by: Li Nan <linan122@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20230527091904.3001833-1-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
[ Upstream commit 23fc8df ] Running the 'kfree_rcu_test' test case [1] results in a splat [2]. The root cause is the kfree_scale_thread thread(s) continue running after unloading the rcuscale module. This commit fixes that isue by invoking kfree_scale_cleanup() from rcu_scale_cleanup() when removing the rcuscale module. [1] modprobe rcuscale kfree_rcu_test=1 // After some time rmmod rcuscale rmmod torture [2] BUG: unable to handle page fault for address: ffffffffc0601a87 #PF: supervisor instruction fetch in kernel mode #PF: error_code(0x0010) - not-present page PGD 11de4f067 P4D 11de4f067 PUD 11de51067 PMD 112f4d067 PTE 0 Oops: 0010 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 1798 Comm: kfree_scale_thr Not tainted 6.3.0-rc1-rcu+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 RIP: 0010:0xffffffffc0601a87 Code: Unable to access opcode bytes at 0xffffffffc0601a5d. RSP: 0018:ffffb25bc2e57e18 EFLAGS: 00010297 RAX: 0000000000000000 RBX: ffffffffc061f0b6 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff962fd0de RDI: ffffffff962fd0de RBP: ffffb25bc2e57ea8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 000000000000000a R15: 00000000001c1dbe FS: 0000000000000000(0000) GS:ffff921fa2200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffc0601a5d CR3: 000000011de4c006 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? kvfree_call_rcu+0xf0/0x3a0 ? kthread+0xf3/0x120 ? kthread_complete_and_exit+0x20/0x20 ? ret_from_fork+0x1f/0x30 </TASK> Modules linked in: rfkill sunrpc ... [last unloaded: torture] CR2: ffffffffc0601a87 ---[ end trace 0000000000000000 ]--- Fixes: e6e78b0 ("rcuperf: Add kfree_rcu() performance Tests") Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
[ Upstream commit f655bad ] Fix propagate_precision() logic to perform propagation of all necessary registers and stack slots across all active frames *in one batch step*. Doing this for each register/slot in each individual frame is wasteful, but the main problem is that backtracking of instruction in any frame except the deepest one just doesn't work. This is due to backtracking logic relying on jump history, and available jump history always starts (or ends, depending how you view it) in current frame. So, if prog A (frame #0) called subprog B (frame #1) and we need to propagate precision of, say, register R6 (callee-saved) within frame #0, we actually don't even know where jump history that corresponds to prog A even starts. We'd need to skip subprog part of jump history first to be able to do this. Luckily, with struct backtrack_state and __mark_chain_precision() handling bitmasks tracking/propagation across all active frames at the same time (added in previous patch), propagate_precision() can be both fixed and sped up by setting all the necessary bits across all frames and then performing one __mark_chain_precision() pass. This makes it unnecessary to skip subprog parts of jump history. We also improve logging along the way, to clearly specify which registers' and slots' precision markings are propagated within which frame. Each frame will have dedicated line and all registers and stack slots from that frame will be reported in format similar to precision backtrack regs/stack logging. E.g.: frame 1: propagating r1,r2,r3,fp-8,fp-16 frame 0: propagating r3,r9,fp-120 Fixes: 529409e ("bpf: propagate precision across all frames, not just the last one") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230505043317.3629845-7-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
[ Upstream commit 904e6dd ] Change mark_chain_precision() to track precision in situations like below: r2 = unknown value ... --- state #0 --- ... r1 = r2 // r1 and r2 now share the same ID ... --- state #1 {r1.id = A, r2.id = A} --- ... if (r2 > 10) goto exit; // find_equal_scalars() assigns range to r1 ... --- state linux-sunxi#2 {r1.id = A, r2.id = A} --- r3 = r10 r3 += r1 // need to mark both r1 and r2 At the beginning of the processing of each state, ensure that if a register with a scalar ID is marked as precise, all registers sharing this ID are also marked as precise. This property would be used by a follow-up change in regsafe(). Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230613153824.3324830-2-eddyz87@gmail.com Stable-dep-of: 1ffc85d ("bpf: Verify scalar ids mapping in regsafe() using check_ids()") Signed-off-by: Sasha Levin <sashal@kernel.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
[ Upstream commit 99d4850 ] Found by leak sanitizer: ``` ==1632594==ERROR: LeakSanitizer: detected memory leaks Direct leak of 21 byte(s) in 1 object(s) allocated from: #0 0x7f2953a7077b in __interceptor_strdup ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439 #1 0x556701d6fbbf in perf_env__read_cpuid util/env.c:369 linux-sunxi#2 0x556701d70589 in perf_env__cpuid util/env.c:465 linux-sunxi#3 0x55670204bba2 in x86__is_amd_cpu arch/x86/util/env.c:14 linux-sunxi#4 0x5567020487a2 in arch__post_evsel_config arch/x86/util/evsel.c:83 linux-sunxi#5 0x556701d8f78b in evsel__config util/evsel.c:1366 linux-sunxi#6 0x556701ef5872 in evlist__config util/record.c:108 linux-sunxi#7 0x556701cd6bcd in test__PERF_RECORD tests/perf-record.c:112 linux-sunxi#8 0x556701cacd07 in run_test tests/builtin-test.c:236 linux-sunxi#9 0x556701cacfac in test_and_print tests/builtin-test.c:265 linux-sunxi#10 0x556701cadddb in __cmd_test tests/builtin-test.c:402 linux-sunxi#11 0x556701caf2aa in cmd_test tests/builtin-test.c:559 linux-sunxi#12 0x556701d3b557 in run_builtin tools/perf/perf.c:323 linux-sunxi#13 0x556701d3bac8 in handle_internal_command tools/perf/perf.c:377 linux-sunxi#14 0x556701d3be90 in run_argv tools/perf/perf.c:421 linux-sunxi#15 0x556701d3c3f8 in main tools/perf/perf.c:537 linux-sunxi#16 0x7f2952a46189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 SUMMARY: AddressSanitizer: 21 byte(s) leaked in 1 allocation(s). ``` Fixes: f7b58cb ("perf mem/c2c: Add load store event mappings for AMD") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Ravi Bangoria <ravi.bangoria@amd.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20230613235416.1650755-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
[ Upstream commit b684c09 ] ppc_save_regs() skips one stack frame while saving the CPU register states. Instead of saving current R1, it pulls the previous stack frame pointer. When vmcores caused by direct panic call (such as `echo c > /proc/sysrq-trigger`), are debugged with gdb, gdb fails to show the backtrace correctly. On further analysis, it was found that it was because of mismatch between r1 and NIP. GDB uses NIP to get current function symbol and uses corresponding debug info of that function to unwind previous frames, but due to the mismatching r1 and NIP, the unwinding does not work, and it fails to unwind to the 2nd frame and hence does not show the backtrace. GDB backtrace with vmcore of kernel without this patch: --------- (gdb) bt #0 0xc0000000002a53e8 in crash_setup_regs (oldregs=<optimized out>, newregs=0xc000000004f8f8d8) at ./arch/powerpc/include/asm/kexec.h:69 #1 __crash_kexec (regs=<optimized out>) at kernel/kexec_core.c:974 linux-sunxi#2 0x0000000000000063 in ?? () linux-sunxi#3 0xc000000003579320 in ?? () --------- Further analysis revealed that the mismatch occurred because "ppc_save_regs" was saving the previous stack's SP instead of the current r1. This patch fixes this by storing current r1 in the saved pt_regs. GDB backtrace with vmcore of patched kernel: -------- (gdb) bt #0 0xc0000000002a53e8 in crash_setup_regs (oldregs=0x0, newregs=0xc00000000670b8d8) at ./arch/powerpc/include/asm/kexec.h:69 #1 __crash_kexec (regs=regs@entry=0x0) at kernel/kexec_core.c:974 linux-sunxi#2 0xc000000000168918 in panic (fmt=fmt@entry=0xc000000001654a60 "sysrq triggered crash\n") at kernel/panic.c:358 linux-sunxi#3 0xc000000000b735f8 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:155 linux-sunxi#4 0xc000000000b742cc in __handle_sysrq (key=key@entry=99, check_mask=check_mask@entry=false) at drivers/tty/sysrq.c:602 linux-sunxi#5 0xc000000000b7506c in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>, count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1163 linux-sunxi#6 0xc00000000069a7bc in pde_write (ppos=<optimized out>, count=<optimized out>, buf=<optimized out>, file=<optimized out>, pde=0xc00000000362cb40) at fs/proc/inode.c:340 linux-sunxi#7 proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:352 linux-sunxi#8 0xc0000000005b3bbc in vfs_write (file=file@entry=0xc000000006aa6b00, buf=buf@entry=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>, count=count@entry=2, pos=pos@entry=0xc00000000670bda0) at fs/read_write.c:582 linux-sunxi#9 0xc0000000005b4264 in ksys_write (fd=<optimized out>, buf=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>, count=2) at fs/read_write.c:637 linux-sunxi#10 0xc00000000002ea2c in system_call_exception (regs=0xc00000000670be80, r0=<optimized out>) at arch/powerpc/kernel/syscall.c:171 linux-sunxi#11 0xc00000000000c270 in system_call_vectored_common () at arch/powerpc/kernel/interrupt_64.S:192 -------- Nick adds: So this now saves regs as though it was an interrupt taken in the caller, at the instruction after the call to ppc_save_regs, whereas previously the NIP was there, but R1 came from the caller's caller and that mismatch is what causes gdb's dwarf unwinder to go haywire. Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Fixes: d16a58f ("powerpc: Improve ppc_save_regs()") Reivewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230615091047.90433-1-adityag@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
…s list commit babebf0 upstream. When deleting the free space tree we are deleting the free space root from the list fs_info->dirty_cowonly_roots without taking the lock that protects it, which is struct btrfs_fs_info::trans_lock. This unsynchronized list manipulation may cause chaos if there's another concurrent manipulation of this list, such as when adding a root to it with ctree.c:add_root_to_dirty_list(). This can result in all sorts of weird failures caused by a race, such as the following crash: [337571.278245] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP PTI [337571.278933] CPU: 1 PID: 115447 Comm: btrfs Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1 [337571.279153] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [337571.279572] RIP: 0010:commit_cowonly_roots+0x11f/0x250 [btrfs] [337571.279928] Code: 85 38 06 00 (...) [337571.280363] RSP: 0018:ffff9f63446efba0 EFLAGS: 00010206 [337571.280582] RAX: ffff942d98ec2638 RBX: ffff9430b82b4c30 RCX: 0000000449e1c000 [337571.280798] RDX: dead000000000100 RSI: ffff9430021e4900 RDI: 0000000000036070 [337571.281015] RBP: ffff942d98ec2000 R08: ffff942d98ec2000 R09: 000000000000015b [337571.281254] R10: 0000000000000009 R11: 0000000000000001 R12: ffff942fe8fbf600 [337571.281476] R13: ffff942dabe23040 R14: ffff942dabe20800 R15: ffff942d92cf3b48 [337571.281723] FS: 00007f478adb7340(0000) GS:ffff94349fa40000(0000) knlGS:0000000000000000 [337571.281950] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [337571.282184] CR2: 00007f478ab9a3d5 CR3: 000000001e02c001 CR4: 0000000000370ee0 [337571.282416] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [337571.282647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [337571.282874] Call Trace: [337571.283101] <TASK> [337571.283327] ? __die_body+0x1b/0x60 [337571.283570] ? die_addr+0x39/0x60 [337571.283796] ? exc_general_protection+0x22e/0x430 [337571.284022] ? asm_exc_general_protection+0x22/0x30 [337571.284251] ? commit_cowonly_roots+0x11f/0x250 [btrfs] [337571.284531] btrfs_commit_transaction+0x42e/0xf90 [btrfs] [337571.284803] ? _raw_spin_unlock+0x15/0x30 [337571.285031] ? release_extent_buffer+0x103/0x130 [btrfs] [337571.285305] reset_balance_state+0x152/0x1b0 [btrfs] [337571.285578] btrfs_balance+0xa50/0x11e0 [btrfs] [337571.285864] ? __kmem_cache_alloc_node+0x14a/0x410 [337571.286086] btrfs_ioctl+0x249a/0x3320 [btrfs] [337571.286358] ? mod_objcg_state+0xd2/0x360 [337571.286577] ? refill_obj_stock+0xb0/0x160 [337571.286798] ? seq_release+0x25/0x30 [337571.287016] ? __rseq_handle_notify_resume+0x3ba/0x4b0 [337571.287235] ? percpu_counter_add_batch+0x2e/0xa0 [337571.287455] ? __x64_sys_ioctl+0x88/0xc0 [337571.287675] __x64_sys_ioctl+0x88/0xc0 [337571.287901] do_syscall_64+0x38/0x90 [337571.288126] entry_SYSCALL_64_after_hwframe+0x72/0xdc [337571.288352] RIP: 0033:0x7f478aaffe9b So fix this by locking struct btrfs_fs_info::trans_lock before deleting the free space root from that list. Fixes: a5ed918 ("Btrfs: implement the free space B-tree") CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
[ Upstream commit 303c9c6 ] Changes in VFIO caused a pseudo-device to be created as child of fsl-mc devices causing a crash [1] when trying to bind a fsl-mc device to VFIO. Fix this by checking the device type when enumerating fsl-mc child devices. [1] Modules linked in: Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP CPU: 6 PID: 1289 Comm: sh Not tainted 6.2.0-rc5-00047-g7c46948a6e9c linux-sunxi#2 Hardware name: NXP Layerscape LX2160ARDB (DT) pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : mc_send_command+0x24/0x1f0 lr : dprc_get_obj_region+0xfc/0x1c0 sp : ffff80000a88b900 x29: ffff80000a88b900 x28: ffff48a9429e1400 x27: 00000000000002b2 x26: ffff48a9429e1718 x25: 0000000000000000 x24: 0000000000000000 x23: ffffd59331ba3918 x22: ffffd59331ba3000 x21: 0000000000000000 x20: ffff80000a88b9b8 x19: 0000000000000000 x18: 0000000000000001 x17: 7270642f636d2d6c x16: 73662e3030303030 x15: ffffffffffffffff x14: ffffd59330f1d668 x13: ffff48a8727dc389 x12: ffff48a8727dc386 x11: 0000000000000002 x10: 00008ceaf02f35d4 x9 : 0000000000000012 x8 : 0000000000000000 x7 : 0000000000000006 x6 : ffff80000a88bab0 x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff80000a88b9e8 x2 : ffff80000a88b9e8 x1 : 0000000000000000 x0 : ffff48a945142b80 Call trace: mc_send_command+0x24/0x1f0 dprc_get_obj_region+0xfc/0x1c0 fsl_mc_device_add+0x340/0x590 fsl_mc_obj_device_add+0xd0/0xf8 dprc_scan_objects+0x1c4/0x340 dprc_scan_container+0x38/0x60 vfio_fsl_mc_probe+0x9c/0xf8 fsl_mc_driver_probe+0x24/0x70 really_probe+0xbc/0x2a8 __driver_probe_device+0x78/0xe0 device_driver_attach+0x30/0x68 bind_store+0xa8/0x130 drv_attr_store+0x24/0x38 sysfs_kf_write+0x44/0x60 kernfs_fop_write_iter+0x128/0x1b8 vfs_write+0x334/0x448 ksys_write+0x68/0xf0 __arm64_sys_write+0x1c/0x28 invoke_syscall+0x44/0x108 el0_svc_common.constprop.1+0x94/0xf8 do_el0_svc+0x38/0xb0 el0_svc+0x20/0x50 el0t_64_sync_handler+0x98/0xc0 el0t_64_sync+0x174/0x178 Code: aa0103f4 a9025bf5 d5384100 b9400801 (79401260) ---[ end trace 0000000000000000 ]--- Fixes: 3c28a76 ("vfio: Add struct device to vfio_device") Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Message-ID: <20230613160718.29500-1-laurentiu.tudor@nxp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
[ Upstream commit f06cf1e ] vduse_vdpa_set_vq_affinity callback can be called with NULL value as cpu_mask when deleting the vduse device. This patch resets virtqueue's IRQ affinity mask value to set all CPUs instead of dereferencing NULL cpu_mask. [ 4760.952149] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 4760.959110] #PF: supervisor read access in kernel mode [ 4760.964247] #PF: error_code(0x0000) - not-present page [ 4760.969385] PGD 0 P4D 0 [ 4760.971927] Oops: 0000 [#1] PREEMPT SMP PTI [ 4760.976112] CPU: 13 PID: 2346 Comm: vdpa Not tainted 6.4.0-rc6+ linux-sunxi#4 [ 4760.982291] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 2.8.1 06/26/2020 [ 4760.989769] RIP: 0010:memcpy_orig+0xc5/0x130 [ 4760.994049] Code: 16 f8 4c 89 07 4c 89 4f 08 4c 89 54 17 f0 4c 89 5c 17 f8 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 fa 08 72 1b <4c> 8b 06 4c 8b 4c 16 f8 4c 89 07 4c 89 4c 17 f8 c3 cc cc cc cc 66 [ 4761.012793] RSP: 0018:ffffb1d565abb830 EFLAGS: 00010246 [ 4761.018020] RAX: ffff9f4bf6b27898 RBX: ffff9f4be23969c0 RCX: ffff9f4bcadf6400 [ 4761.025152] RDX: 0000000000000008 RSI: 0000000000000000 RDI: ffff9f4bf6b27898 [ 4761.032286] RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000000 [ 4761.039416] R10: 0000000000000000 R11: 0000000000000600 R12: 0000000000000000 [ 4761.046549] R13: 0000000000000000 R14: 0000000000000080 R15: ffffb1d565abbb10 [ 4761.053680] FS: 00007f64c2ec2740(0000) GS:ffff9f635f980000(0000) knlGS:0000000000000000 [ 4761.061765] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4761.067513] CR2: 0000000000000000 CR3: 0000001875270006 CR4: 00000000007706e0 [ 4761.074645] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4761.081775] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4761.088909] PKRU: 55555554 [ 4761.091620] Call Trace: [ 4761.094074] <TASK> [ 4761.096180] ? __die+0x1f/0x70 [ 4761.099238] ? page_fault_oops+0x171/0x4f0 [ 4761.103340] ? exc_page_fault+0x7b/0x180 [ 4761.107265] ? asm_exc_page_fault+0x22/0x30 [ 4761.111460] ? memcpy_orig+0xc5/0x130 [ 4761.115126] vduse_vdpa_set_vq_affinity+0x3e/0x50 [vduse] [ 4761.120533] virtnet_clean_affinity.part.0+0x3d/0x90 [virtio_net] [ 4761.126635] remove_vq_common+0x1a4/0x250 [virtio_net] [ 4761.131781] virtnet_remove+0x5d/0x70 [virtio_net] [ 4761.136580] virtio_dev_remove+0x3a/0x90 [ 4761.140509] device_release_driver_internal+0x19b/0x200 [ 4761.145742] bus_remove_device+0xc2/0x130 [ 4761.149755] device_del+0x158/0x3e0 [ 4761.153245] ? kernfs_find_ns+0x35/0xc0 [ 4761.157086] device_unregister+0x13/0x60 [ 4761.161010] unregister_virtio_device+0x11/0x20 [ 4761.165543] device_release_driver_internal+0x19b/0x200 [ 4761.170770] bus_remove_device+0xc2/0x130 [ 4761.174782] device_del+0x158/0x3e0 [ 4761.178276] ? __pfx_vdpa_name_match+0x10/0x10 [vdpa] [ 4761.183336] device_unregister+0x13/0x60 [ 4761.187260] vdpa_nl_cmd_dev_del_set_doit+0x63/0xe0 [vdpa] Fixes: 28f6288 ("vduse: Support set_vq_affinity callback") Cc: xieyongji@bytedance.com Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Message-Id: <20230622204851.318125-1-maxime.coquelin@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
[ Upstream commit 85fadc0 ] The initial memblock metadata is accessed from kernel image mapping. The regions arrays need to "reallocated" from memblock and accessed through linear mapping to cover more memblock regions. So the resizing should not be allowed until linear mapping is ready. Note that there are memblock allocations when building linear mapping. This patch is similar to 24cc61d ("arm64: memblock: don't permit memblock resizing until linear mapping is up"). In following log, many memblock regions are reserved before create_linear_mapping_page_table(). And then it triggered reallocation of memblock.reserved.regions and memcpy the old array in kernel image mapping to the new array in linear mapping which caused a page fault. [ 0.000000] memblock_reserve: [0x00000000bf01f000-0x00000000bf01ffff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] memblock_reserve: [0x00000000bf021000-0x00000000bf021fff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] memblock_reserve: [0x00000000bf023000-0x00000000bf023fff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] memblock_reserve: [0x00000000bf025000-0x00000000bf025fff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] memblock_reserve: [0x00000000bf027000-0x00000000bf027fff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] memblock_reserve: [0x00000000bf029000-0x00000000bf029fff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] memblock_reserve: [0x00000000bf02b000-0x00000000bf02bfff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] memblock_reserve: [0x00000000bf02d000-0x00000000bf02dfff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] memblock_reserve: [0x00000000bf02f000-0x00000000bf02ffff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] memblock_reserve: [0x00000000bf030000-0x00000000bf030fff] early_init_fdt_scan_reserved_mem+0x28c/0x2c6 [ 0.000000] OF: reserved mem: 0x0000000080000000..0x000000008007ffff (512 KiB) map non-reusable mmode_resv0@80000000 [ 0.000000] memblock_reserve: [0x00000000bf000000-0x00000000bf001fed] paging_init+0x19a/0x5ae [ 0.000000] memblock_phys_alloc_range: 4096 bytes align=0x1000 from=0x0000000000000000 max_addr=0x0000000000000000 alloc_pmd_fixmap+0x14/0x1c [ 0.000000] memblock_reserve: [0x000000017ffff000-0x000000017fffffff] memblock_alloc_range_nid+0xb8/0x128 [ 0.000000] memblock: reserved is doubled to 256 at [0x000000017fffd000-0x000000017fffe7ff] [ 0.000000] Unable to handle kernel paging request at virtual address ff600000ffffd000 [ 0.000000] Oops [#1] [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.4.0-rc1-00011-g99a670b2069c linux-sunxi#66 [ 0.000000] Hardware name: riscv-virtio,qemu (DT) [ 0.000000] epc : __memcpy+0x60/0xf8 [ 0.000000] ra : memblock_double_array+0x192/0x248 [ 0.000000] epc : ffffffff8081d214 ra : ffffffff80a3dfc0 sp : ffffffff81403bd0 [ 0.000000] gp : ffffffff814fbb38 tp : ffffffff8140dac0 t0 : 0000000001600000 [ 0.000000] t1 : 0000000000000000 t2 : 000000008f001000 s0 : ffffffff81403c60 [ 0.000000] s1 : ffffffff80c0bc98 a0 : ff600000ffffd000 a1 : ffffffff80c0bcd8 [ 0.000000] a2 : 0000000000000c00 a3 : ffffffff80c0c8d8 a4 : 0000000080000000 [ 0.000000] a5 : 0000000000080000 a6 : 0000000000000000 a7 : 0000000080200000 [ 0.000000] s2 : ff600000ffffd000 s3 : 0000000000002000 s4 : 0000000000000c00 [ 0.000000] s5 : ffffffff80c0bc60 s6 : ffffffff80c0bcc8 s7 : 0000000000000000 [ 0.000000] s8 : ffffffff814fd0a8 s9 : 000000017fffe7ff s10: 0000000000000000 [ 0.000000] s11: 0000000000001000 t3 : 0000000000001000 t4 : 0000000000000000 [ 0.000000] t5 : 000000008f003000 t6 : ff600000ffffd000 [ 0.000000] status: 0000000200000100 badaddr: ff600000ffffd000 cause: 000000000000000f [ 0.000000] [<ffffffff8081d214>] __memcpy+0x60/0xf8 [ 0.000000] [<ffffffff80a3e1a2>] memblock_add_range.isra.14+0x12c/0x162 [ 0.000000] [<ffffffff80a3e36a>] memblock_reserve+0x6e/0x8c [ 0.000000] [<ffffffff80a123fc>] memblock_alloc_range_nid+0xb8/0x128 [ 0.000000] [<ffffffff80a1256a>] memblock_phys_alloc_range+0x5e/0x6a [ 0.000000] [<ffffffff80a04732>] alloc_pmd_fixmap+0x14/0x1c [ 0.000000] [<ffffffff80a0475a>] alloc_p4d_fixmap+0xc/0x14 [ 0.000000] [<ffffffff80a04a36>] create_pgd_mapping+0x98/0x17c [ 0.000000] [<ffffffff80a04e9e>] create_linear_mapping_range.constprop.10+0xe4/0x112 [ 0.000000] [<ffffffff80a05bb8>] paging_init+0x3ec/0x5ae [ 0.000000] [<ffffffff80a03354>] setup_arch+0xb2/0x576 [ 0.000000] [<ffffffff80a00726>] start_kernel+0x72/0x57e [ 0.000000] Code: b303 0285 b383 0305 be03 0385 be83 0405 bf03 0485 (b023) 00ef [ 0.000000] ---[ end trace 0000000000000000 ]--- [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- Fixes: 671f9a3 ("RISC-V: Setup initial page tables in two stages") Signed-off-by: Woody Zhang <woodylab@foxmail.com> Tested-by: Song Shuai <songshuaishuai@tinylab.org> Link: https://lore.kernel.org/r/tencent_FBB94CE615C5CCE7701CD39C15CCE0EE9706@qq.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
commit 160fe8f upstream. Callers of `btrfs_reduce_alloc_profile` expect it to return exactly one allocation profile flag, and failing to do so may ultimately result in a WARN_ON and remount-ro when allocating new blocks, like the below transaction abort on 6.1. `btrfs_reduce_alloc_profile` has two ways of determining the profile, first it checks if a conversion balance is currently running and uses the profile we're converting to. If no balance is currently running, it returns the max-redundancy profile which at least one block in the selected block group has. This works by simply checking each known allocation profile bit in redundancy order. However, `btrfs_reduce_alloc_profile` has not been updated as new flags have been added - first with the `DUP` profile and later with the RAID1C34 profiles. Because of the way it checks, if we have blocks with different profiles and at least one is known, that profile will be selected. However, if none are known we may return a flag set with multiple allocation profiles set. This is currently only possible when a balance from one of the three unhandled profiles to another of the unhandled profiles is canceled after allocating at least one block using the new profile. In that case, a transaction abort like the below will occur and the filesystem will need to be mounted with -o skip_balance to get it mounted rw again (but the balance cannot be resumed without a similar abort). [770.648] ------------[ cut here ]------------ [770.648] BTRFS: Transaction aborted (error -22) [770.648] WARNING: CPU: 43 PID: 1159593 at fs/btrfs/extent-tree.c:4122 find_free_extent+0x1d94/0x1e00 [btrfs] [770.648] CPU: 43 PID: 1159593 Comm: btrfs Tainted: G W 6.1.0-0.deb11.7-powerpc64le #1 Debian 6.1.20-2~bpo11+1a~test [770.648] Hardware name: T2P9D01 REV 1.00 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV [770.648] NIP: c00800000f6784fc LR: c00800000f6784f8 CTR: c000000000d746c0 [770.648] REGS: c000200089afe9a0 TRAP: 0700 Tainted: G W (6.1.0-0.deb11.7-powerpc64le Debian 6.1.20-2~bpo11+1a~test) [770.648] MSR: 9000000002029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 28848282 XER: 20040000 [770.648] CFAR: c000000000135110 IRQMASK: 0 GPR00: c00800000f6784f8 c000200089afec40 c00800000f7ea800 0000000000000026 GPR04: 00000001004820c2 c000200089afea00 c000200089afe9f8 0000000000000027 GPR08: c000200ffbfe7f98 c000000002127f90 ffffffffffffffd8 0000000026d6a6e8 GPR12: 0000000028848282 c000200fff7f3800 5deadbeef0000122 c00000002269d000 GPR16: c0002008c7797c40 c000200089afef17 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000001 c000200008bc5a98 0000000000000001 GPR24: 0000000000000000 c0000003c73088d0 c000200089afef17 c000000016d3a800 GPR28: c0000003c7308800 c00000002269d000 ffffffffffffffea 0000000000000001 [770.648] NIP [c00800000f6784fc] find_free_extent+0x1d94/0x1e00 [btrfs] [770.648] LR [c00800000f6784f8] find_free_extent+0x1d90/0x1e00 [btrfs] [770.648] Call Trace: [770.648] [c000200089afec40] [c00800000f6784f8] find_free_extent+0x1d90/0x1e00 [btrfs] (unreliable) [770.648] [c000200089afed30] [c00800000f681398] btrfs_reserve_extent+0x1a0/0x2f0 [btrfs] [770.648] [c000200089afeea0] [c00800000f681bf0] btrfs_alloc_tree_block+0x108/0x670 [btrfs] [770.648] [c000200089afeff0] [c00800000f66bd68] __btrfs_cow_block+0x170/0x850 [btrfs] [770.648] [c000200089aff100] [c00800000f66c58c] btrfs_cow_block+0x144/0x288 [btrfs] [770.648] [c000200089aff1b0] [c00800000f67113c] btrfs_search_slot+0x6b4/0xcb0 [btrfs] [770.648] [c000200089aff2a0] [c00800000f679f60] lookup_inline_extent_backref+0x128/0x7c0 [btrfs] [770.648] [c000200089aff3b0] [c00800000f67b338] lookup_extent_backref+0x70/0x190 [btrfs] [770.648] [c000200089aff470] [c00800000f67b54c] __btrfs_free_extent+0xf4/0x1490 [btrfs] [770.648] [c000200089aff5a0] [c00800000f67d770] __btrfs_run_delayed_refs+0x328/0x1530 [btrfs] [770.648] [c000200089aff740] [c00800000f67ea2c] btrfs_run_delayed_refs+0xb4/0x3e0 [btrfs] [770.648] [c000200089aff800] [c00800000f699aa4] btrfs_commit_transaction+0x8c/0x12b0 [btrfs] [770.648] [c000200089aff8f0] [c00800000f6dc628] reset_balance_state+0x1c0/0x290 [btrfs] [770.648] [c000200089aff9a0] [c00800000f6e2f7c] btrfs_balance+0x1164/0x1500 [btrfs] [770.648] [c000200089affb40] [c00800000f6f8e4c] btrfs_ioctl+0x2b54/0x3100 [btrfs] [770.648] [c000200089affc80] [c00000000053be14] sys_ioctl+0x794/0x1310 [770.648] [c000200089affd70] [c00000000002af98] system_call_exception+0x138/0x250 [770.648] [c000200089affe10] [c00000000000c654] system_call_common+0xf4/0x258 [770.648] --- interrupt: c00 at 0x7fff94126800 [770.648] NIP: 00007fff94126800 LR: 0000000107e0b594 CTR: 0000000000000000 [770.648] REGS: c000200089affe80 TRAP: 0c00 Tainted: G W (6.1.0-0.deb11.7-powerpc64le Debian 6.1.20-2~bpo11+1a~test) [770.648] MSR: 900000000000d033 <SF,HV,EE,PR,ME,IR,DR,RI,LE> CR: 24002848 XER: 00000000 [770.648] IRQMASK: 0 GPR00: 0000000000000036 00007fffc9439da0 00007fff94217100 0000000000000003 GPR04: 00000000c4009420 00007fffc9439ee8 0000000000000000 0000000000000000 GPR08: 00000000803c7416 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00007fff9467d120 0000000107e64c9c 0000000107e64d0a GPR16: 0000000107e64d06 0000000107e64cf1 0000000107e64cc4 0000000107e64c73 GPR20: 0000000107e64c31 0000000107e64bf1 0000000107e64be7 0000000000000000 GPR24: 0000000000000000 00007fffc9439ee0 0000000000000003 0000000000000001 GPR28: 00007fffc943f713 0000000000000000 00007fffc9439ee8 0000000000000000 [770.648] NIP [00007fff94126800] 0x7fff94126800 [770.648] LR [0000000107e0b594] 0x107e0b594 [770.648] --- interrupt: c00 [770.648] Instruction dump: [770.648] 3b00ffe4 e8898828 481175f5 60000000 4bfff4fc 3be00000 4bfff570 3d220000 [770.648] 7fc4f378 e8698830 4811cd95 e8410018 <0fe00000> f9c10060 f9e10068 fa010070 [770.648] ---[ end trace 0000000000000000 ]--- [770.648] BTRFS: error (device dm-2: state A) in find_free_extent_update_loop:4122: errno=-22 unknown [770.648] BTRFS info (device dm-2: state EA): forced readonly [770.648] BTRFS: error (device dm-2: state EA) in __btrfs_free_extent:3070: errno=-22 unknown [770.648] BTRFS error (device dm-2: state EA): failed to run delayed ref for logical 17838685708288 num_bytes 24576 type 184 action 2 ref_mod 1: -22 [770.648] BTRFS: error (device dm-2: state EA) in btrfs_run_delayed_refs:2144: errno=-22 unknown [770.648] BTRFS: error (device dm-2: state EA) in reset_balance_state:3599: errno=-22 unknown Fixes: 47e6f74 ("btrfs: add support for 3-copy replication (raid1c3)") Fixes: 8d6fac0 ("btrfs: add support for 4-copy replication (raid1c4)") CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Matt Corallo <blnxfsl@bluematt.me> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
commit 5cead54 upstream. There is a fairly unlikely race condition in tree mod log rewind that can result in a kernel panic which has the following trace: [530.569] BTRFS critical (device sda3): unable to find logical 0 length 4096 [530.585] BTRFS critical (device sda3): unable to find logical 0 length 4096 [530.602] BUG: kernel NULL pointer dereference, address: 0000000000000002 [530.618] #PF: supervisor read access in kernel mode [530.629] #PF: error_code(0x0000) - not-present page [530.641] PGD 0 P4D 0 [530.647] Oops: 0000 [#1] SMP [530.654] CPU: 30 PID: 398973 Comm: below Kdump: loaded Tainted: G S O K 5.12.0-0_fbk13_clang_7455_gb24de3bdb045 #1 [530.680] Hardware name: Quanta Mono Lake-M.2 SATA 1HY9U9Z001G/Mono Lake-M.2 SATA, BIOS F20_3A15 08/16/2017 [530.703] RIP: 0010:__btrfs_map_block+0xaa/0xd00 [530.755] RSP: 0018:ffffc9002c2f7600 EFLAGS: 00010246 [530.767] RAX: ffffffffffffffea RBX: ffff888292e41000 RCX: f2702d8b8be15100 [530.784] RDX: ffff88885fda6fb8 RSI: ffff88885fd973c8 RDI: ffff88885fd973c8 [530.800] RBP: ffff888292e410d0 R08: ffffffff82fd7fd0 R09: 00000000fffeffff [530.816] R10: ffffffff82e57fd0 R11: ffffffff82e57d70 R12: 0000000000000000 [530.832] R13: 0000000000001000 R14: 0000000000001000 R15: ffffc9002c2f76f0 [530.848] FS: 00007f38d64af000(0000) GS:ffff88885fd80000(0000) knlGS:0000000000000000 [530.866] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [530.880] CR2: 0000000000000002 CR3: 00000002b6770004 CR4: 00000000003706e0 [530.896] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [530.912] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [530.928] Call Trace: [530.934] ? btrfs_printk+0x13b/0x18c [530.943] ? btrfs_bio_counter_inc_blocked+0x3d/0x130 [530.955] btrfs_map_bio+0x75/0x330 [530.963] ? kmem_cache_alloc+0x12a/0x2d0 [530.973] ? btrfs_submit_metadata_bio+0x63/0x100 [530.984] btrfs_submit_metadata_bio+0xa4/0x100 [530.995] submit_extent_page+0x30f/0x360 [531.004] read_extent_buffer_pages+0x49e/0x6d0 [531.015] ? submit_extent_page+0x360/0x360 [531.025] btree_read_extent_buffer_pages+0x5f/0x150 [531.037] read_tree_block+0x37/0x60 [531.046] read_block_for_search+0x18b/0x410 [531.056] btrfs_search_old_slot+0x198/0x2f0 [531.066] resolve_indirect_ref+0xfe/0x6f0 [531.076] ? ulist_alloc+0x31/0x60 [531.084] ? kmem_cache_alloc_trace+0x12e/0x2b0 [531.095] find_parent_nodes+0x720/0x1830 [531.105] ? ulist_alloc+0x10/0x60 [531.113] iterate_extent_inodes+0xea/0x370 [531.123] ? btrfs_previous_extent_item+0x8f/0x110 [531.134] ? btrfs_search_path_in_tree+0x240/0x240 [531.146] iterate_inodes_from_logical+0x98/0xd0 [531.157] ? btrfs_search_path_in_tree+0x240/0x240 [531.168] btrfs_ioctl_logical_to_ino+0xd9/0x180 [531.179] btrfs_ioctl+0xe2/0x2eb0 This occurs when logical inode resolution takes a tree mod log sequence number, and then while backref walking hits a rewind on a busy node which has the following sequence of tree mod log operations (numbers filled in from a specific example, but they are somewhat arbitrary) REMOVE_WHILE_FREEING slot 532 REMOVE_WHILE_FREEING slot 531 REMOVE_WHILE_FREEING slot 530 ... REMOVE_WHILE_FREEING slot 0 REMOVE slot 455 REMOVE slot 454 REMOVE slot 453 ... REMOVE slot 0 ADD slot 455 ADD slot 454 ADD slot 453 ... ADD slot 0 MOVE src slot 0 -> dst slot 456 nritems 533 REMOVE slot 455 REMOVE slot 454 REMOVE slot 453 ... REMOVE slot 0 When this sequence gets applied via btrfs_tree_mod_log_rewind, it allocates a fresh rewind eb, and first inserts the correct key info for the 533 elements, then overwrites the first 456 of them, then decrements the count by 456 via the add ops, then rewinds the move by doing a memmove from 456:988->0:532. We have never written anything past 532, so that memmove writes garbage into the 0:532 range. In practice, this results in a lot of fully 0 keys. The rewind then puts valid keys into slots 0:455 with the last removes, but 456:532 are still invalid. When search_old_slot uses this eb, if it uses one of those invalid slots, it can then read the extent buffer and issue a bio for offset 0 which ultimately panics looking up extent mappings. This bad tree mod log sequence gets generated when the node balancing code happens to do a balance_node_right followed by a push_node_left while logging in the tree mod log. Illustrated for ebs L and R (left and right): L R start: [XXX|YYY|...] [ZZZ|...|...] balance_node_right: [XXX|YYY|...] [...|ZZZ|...] move Z to make room for Y [XXX|...|...] [YYY|ZZZ|...] copy Y from L to R push_node_left: [XXX|YYY|...] [...|ZZZ|...] copy Y from R to L [XXX|YYY|...] [ZZZ|...|...] move Z into emptied space (NOT LOGGED!) This is because balance_node_right logs a move, but push_node_left explicitly doesn't. That is because logging the move would remove the overwritten src < dst range in the right eb, which was already logged when we called btrfs_tree_mod_log_eb_copy. The correct sequence would include a move from 456:988 to 0:532 after remove 0:455 and before removing 0:532. Reversing that sequence would entail creating keys for 0:532, then moving those keys out to 456:988, then creating more keys for 0:455. i.e., REMOVE_WHILE_FREEING slot 532 REMOVE_WHILE_FREEING slot 531 REMOVE_WHILE_FREEING slot 530 ... REMOVE_WHILE_FREEING slot 0 MOVE src slot 456 -> dst slot 0 nritems 533 REMOVE slot 455 REMOVE slot 454 REMOVE slot 453 ... REMOVE slot 0 ADD slot 455 ADD slot 454 ADD slot 453 ... ADD slot 0 MOVE src slot 0 -> dst slot 456 nritems 533 REMOVE slot 455 REMOVE slot 454 REMOVE slot 453 ... REMOVE slot 0 Fix this to log the move but avoid the double remove by putting all the logging logic in btrfs_tree_mod_log_eb_copy which has enough information to detect these cases and properly log moves, removes, and adds. Leave btrfs_tree_mod_log_insert_move to handle insert_ptr and delete_ptr's tree mod logging. (Un)fortunately, this is quite difficult to reproduce, and I was only able to reproduce it by adding sleeps in btrfs_search_old_slot that would encourage more log rewinding during ino_to_logical ioctls. I was able to hit the warning in the previous patch in the series without the fix quite quickly, but not after this patch. CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
commit b31cb5a upstream. When disabling quotas we are deleting the quota root from the list fs_info->dirty_cowonly_roots without taking the lock that protects it, which is struct btrfs_fs_info::trans_lock. This unsynchronized list manipulation may cause chaos if there's another concurrent manipulation of this list, such as when adding a root to it with ctree.c:add_root_to_dirty_list(). This can result in all sorts of weird failures caused by a race, such as the following crash: [337571.278245] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP PTI [337571.278933] CPU: 1 PID: 115447 Comm: btrfs Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1 [337571.279153] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [337571.279572] RIP: 0010:commit_cowonly_roots+0x11f/0x250 [btrfs] [337571.279928] Code: 85 38 06 00 (...) [337571.280363] RSP: 0018:ffff9f63446efba0 EFLAGS: 00010206 [337571.280582] RAX: ffff942d98ec2638 RBX: ffff9430b82b4c30 RCX: 0000000449e1c000 [337571.280798] RDX: dead000000000100 RSI: ffff9430021e4900 RDI: 0000000000036070 [337571.281015] RBP: ffff942d98ec2000 R08: ffff942d98ec2000 R09: 000000000000015b [337571.281254] R10: 0000000000000009 R11: 0000000000000001 R12: ffff942fe8fbf600 [337571.281476] R13: ffff942dabe23040 R14: ffff942dabe20800 R15: ffff942d92cf3b48 [337571.281723] FS: 00007f478adb7340(0000) GS:ffff94349fa40000(0000) knlGS:0000000000000000 [337571.281950] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [337571.282184] CR2: 00007f478ab9a3d5 CR3: 000000001e02c001 CR4: 0000000000370ee0 [337571.282416] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [337571.282647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [337571.282874] Call Trace: [337571.283101] <TASK> [337571.283327] ? __die_body+0x1b/0x60 [337571.283570] ? die_addr+0x39/0x60 [337571.283796] ? exc_general_protection+0x22e/0x430 [337571.284022] ? asm_exc_general_protection+0x22/0x30 [337571.284251] ? commit_cowonly_roots+0x11f/0x250 [btrfs] [337571.284531] btrfs_commit_transaction+0x42e/0xf90 [btrfs] [337571.284803] ? _raw_spin_unlock+0x15/0x30 [337571.285031] ? release_extent_buffer+0x103/0x130 [btrfs] [337571.285305] reset_balance_state+0x152/0x1b0 [btrfs] [337571.285578] btrfs_balance+0xa50/0x11e0 [btrfs] [337571.285864] ? __kmem_cache_alloc_node+0x14a/0x410 [337571.286086] btrfs_ioctl+0x249a/0x3320 [btrfs] [337571.286358] ? mod_objcg_state+0xd2/0x360 [337571.286577] ? refill_obj_stock+0xb0/0x160 [337571.286798] ? seq_release+0x25/0x30 [337571.287016] ? __rseq_handle_notify_resume+0x3ba/0x4b0 [337571.287235] ? percpu_counter_add_batch+0x2e/0xa0 [337571.287455] ? __x64_sys_ioctl+0x88/0xc0 [337571.287675] __x64_sys_ioctl+0x88/0xc0 [337571.287901] do_syscall_64+0x38/0x90 [337571.288126] entry_SYSCALL_64_after_hwframe+0x72/0xdc [337571.288352] RIP: 0033:0x7f478aaffe9b So fix this by locking struct btrfs_fs_info::trans_lock before deleting the quota root from that list. Fixes: bed92ea ("Btrfs: qgroup implementation and prototypes") CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
[ Upstream commit 21a235b ] When attempting to run Xen on a QEMU/KVM virtual machine with virtio devices (all x86_64), function xen_dt_get_node() crashes on accessing bus->bridge->parent->of_node because a bridge of the PCI root bus has no parent set: [ 1.694192][ T1] BUG: kernel NULL pointer dereference, address: 0000000000000288 [ 1.695688][ T1] #PF: supervisor read access in kernel mode [ 1.696297][ T1] #PF: error_code(0x0000) - not-present page [ 1.696297][ T1] PGD 0 P4D 0 [ 1.696297][ T1] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 1.696297][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.3.7-1-default #1 openSUSE Tumbleweed a577eae57964bb7e83477b5a5645a1781df990f0 [ 1.696297][ T1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014 [ 1.696297][ T1] RIP: e030:xen_virtio_restricted_mem_acc+0xd9/0x1c0 [ 1.696297][ T1] Code: 45 0c 83 e8 c9 a3 ea ff 31 c0 eb d7 48 8b 87 40 ff ff ff 48 89 c2 48 8b 40 10 48 85 c0 75 f4 48 8b 82 10 01 00 00 48 8b 40 40 <48> 83 b8 88 02 00 00 00 0f 84 45 ff ff ff 66 90 31 c0 eb a5 48 89 [ 1.696297][ T1] RSP: e02b:ffffc90040013cc8 EFLAGS: 00010246 [ 1.696297][ T1] RAX: 0000000000000000 RBX: ffff888006c75000 RCX: 0000000000000029 [ 1.696297][ T1] RDX: ffff888005ed1000 RSI: ffffc900400f100c RDI: ffff888005ee30d0 [ 1.696297][ T1] RBP: ffff888006c75010 R08: 0000000000000001 R09: 0000000330000006 [ 1.696297][ T1] R10: ffff888005850028 R11: 0000000000000002 R12: ffffffff830439a0 [ 1.696297][ T1] R13: 0000000000000000 R14: ffff888005657900 R15: ffff888006e3e1e8 [ 1.696297][ T1] FS: 0000000000000000(0000) GS:ffff88804a000000(0000) knlGS:0000000000000000 [ 1.696297][ T1] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.696297][ T1] CR2: 0000000000000288 CR3: 0000000002e36000 CR4: 0000000000050660 [ 1.696297][ T1] Call Trace: [ 1.696297][ T1] <TASK> [ 1.696297][ T1] virtio_features_ok+0x1b/0xd0 [ 1.696297][ T1] virtio_dev_probe+0x19c/0x270 [ 1.696297][ T1] really_probe+0x19b/0x3e0 [ 1.696297][ T1] __driver_probe_device+0x78/0x160 [ 1.696297][ T1] driver_probe_device+0x1f/0x90 [ 1.696297][ T1] __driver_attach+0xd2/0x1c0 [ 1.696297][ T1] bus_for_each_dev+0x74/0xc0 [ 1.696297][ T1] bus_add_driver+0x116/0x220 [ 1.696297][ T1] driver_register+0x59/0x100 [ 1.696297][ T1] virtio_console_init+0x7f/0x110 [ 1.696297][ T1] do_one_initcall+0x47/0x220 [ 1.696297][ T1] kernel_init_freeable+0x328/0x480 [ 1.696297][ T1] kernel_init+0x1a/0x1c0 [ 1.696297][ T1] ret_from_fork+0x29/0x50 [ 1.696297][ T1] </TASK> [ 1.696297][ T1] Modules linked in: [ 1.696297][ T1] CR2: 0000000000000288 [ 1.696297][ T1] ---[ end trace 0000000000000000 ]--- The PCI root bus is in this case created from ACPI description via acpi_pci_root_add() -> pci_acpi_scan_root() -> acpi_pci_root_create() -> pci_create_root_bus() where the last function is called with parent=NULL. It indicates that no parent is present and then bus->bridge->parent is NULL too. Fix the problem by checking bus->bridge->parent in xen_dt_get_node() for NULL first. Fixes: ef8ae38 ("xen/virtio: Handle PCI devices which Host controller is described in DT") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/20230621131214.9398-2-petr.pavlu@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
[ Upstream commit 65e6464 ] Currently the check for NOT_READY flag is performed before obtaining the necessary lock. This opens a possibility for race condition when the flow is concurrently removed from unready_flows list by the workqueue task, which causes a double-removal from the list and a crash[0]. Fix the issue by moving the flag check inside the section protected by uplink_priv->unready_flows_lock mutex. [0]: [44376.389654] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] SMP [44376.391665] CPU: 7 PID: 59123 Comm: tc Not tainted 6.4.0-rc4+ #1 [44376.392984] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [44376.395342] RIP: 0010:mlx5e_tc_del_fdb_flow+0xb3/0x340 [mlx5_core] [44376.396857] Code: 00 48 8b b8 68 ce 02 00 e8 8a 4d 02 00 4c 8d a8 a8 01 00 00 4c 89 ef e8 8b 79 88 e1 48 8b 83 98 06 00 00 48 8b 93 90 06 00 00 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 90 06 [44376.399167] RSP: 0018:ffff88812cc97570 EFLAGS: 00010246 [44376.399680] RAX: dead000000000122 RBX: ffff8881088e3800 RCX: ffff8881881bac00 [44376.400337] RDX: dead000000000100 RSI: ffff88812cc97500 RDI: ffff8881242f71b0 [44376.401001] RBP: ffff88811cbb0940 R08: 0000000000000400 R09: 0000000000000001 [44376.401663] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88812c944000 [44376.402342] R13: ffff8881242f71a8 R14: ffff8881222b4000 R15: 0000000000000000 [44376.402999] FS: 00007f0451104800(0000) GS:ffff88852cb80000(0000) knlGS:0000000000000000 [44376.403787] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [44376.404343] CR2: 0000000000489108 CR3: 0000000123a79003 CR4: 0000000000370ea0 [44376.405004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [44376.405665] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [44376.406339] Call Trace: [44376.406651] <TASK> [44376.406939] ? die_addr+0x33/0x90 [44376.407311] ? exc_general_protection+0x192/0x390 [44376.407795] ? asm_exc_general_protection+0x22/0x30 [44376.408292] ? mlx5e_tc_del_fdb_flow+0xb3/0x340 [mlx5_core] [44376.408876] __mlx5e_tc_del_fdb_peer_flow+0xbc/0xe0 [mlx5_core] [44376.409482] mlx5e_tc_del_flow+0x42/0x210 [mlx5_core] [44376.410055] mlx5e_flow_put+0x25/0x50 [mlx5_core] [44376.410529] mlx5e_delete_flower+0x24b/0x350 [mlx5_core] [44376.411043] tc_setup_cb_reoffload+0x22/0x80 [44376.411462] fl_reoffload+0x261/0x2f0 [cls_flower] [44376.411907] ? mlx5e_rep_indr_setup_ft_cb+0x160/0x160 [mlx5_core] [44376.412481] ? mlx5e_rep_indr_setup_ft_cb+0x160/0x160 [mlx5_core] [44376.413044] tcf_block_playback_offloads+0x76/0x170 [44376.413497] tcf_block_unbind+0x7b/0xd0 [44376.413881] tcf_block_setup+0x17d/0x1c0 [44376.414269] tcf_block_offload_cmd.isra.0+0xf1/0x130 [44376.414725] tcf_block_offload_unbind+0x43/0x70 [44376.415153] __tcf_block_put+0x82/0x150 [44376.415532] ingress_destroy+0x22/0x30 [sch_ingress] [44376.415986] qdisc_destroy+0x3b/0xd0 [44376.416343] qdisc_graft+0x4d0/0x620 [44376.416706] tc_get_qdisc+0x1c9/0x3b0 [44376.417074] rtnetlink_rcv_msg+0x29c/0x390 [44376.419978] ? rep_movs_alternative+0x3a/0xa0 [44376.420399] ? rtnl_calcit.isra.0+0x120/0x120 [44376.420813] netlink_rcv_skb+0x54/0x100 [44376.421192] netlink_unicast+0x1f6/0x2c0 [44376.421573] netlink_sendmsg+0x232/0x4a0 [44376.421980] sock_sendmsg+0x38/0x60 [44376.422328] ____sys_sendmsg+0x1d0/0x1e0 [44376.422709] ? copy_msghdr_from_user+0x6d/0xa0 [44376.423127] ___sys_sendmsg+0x80/0xc0 [44376.423495] ? ___sys_recvmsg+0x8b/0xc0 [44376.423869] __sys_sendmsg+0x51/0x90 [44376.424226] do_syscall_64+0x3d/0x90 [44376.424587] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [44376.425046] RIP: 0033:0x7f045134f887 [44376.425403] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [44376.426914] RSP: 002b:00007ffd63a82b98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [44376.427592] RAX: ffffffffffffffda RBX: 000000006481955f RCX: 00007f045134f887 [44376.428195] RDX: 0000000000000000 RSI: 00007ffd63a82c00 RDI: 0000000000000003 [44376.428796] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 [44376.429404] R10: 00007f0451208708 R11: 0000000000000246 R12: 0000000000000001 [44376.430039] R13: 0000000000409980 R14: 000000000047e538 R15: 0000000000485400 [44376.430644] </TASK> [44376.430907] Modules linked in: mlx5_ib mlx5_core act_mirred act_tunnel_key cls_flower vxlan dummy sch_ingress openvswitch nsh rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_g ss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: mlx5_core] [44376.433936] ---[ end trace 0000000000000000 ]--- [44376.434373] RIP: 0010:mlx5e_tc_del_fdb_flow+0xb3/0x340 [mlx5_core] [44376.434951] Code: 00 48 8b b8 68 ce 02 00 e8 8a 4d 02 00 4c 8d a8 a8 01 00 00 4c 89 ef e8 8b 79 88 e1 48 8b 83 98 06 00 00 48 8b 93 90 06 00 00 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 90 06 [44376.436452] RSP: 0018:ffff88812cc97570 EFLAGS: 00010246 [44376.436924] RAX: dead000000000122 RBX: ffff8881088e3800 RCX: ffff8881881bac00 [44376.437530] RDX: dead000000000100 RSI: ffff88812cc97500 RDI: ffff8881242f71b0 [44376.438179] RBP: ffff88811cbb0940 R08: 0000000000000400 R09: 0000000000000001 [44376.438786] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88812c944000 [44376.439393] R13: ffff8881242f71a8 R14: ffff8881222b4000 R15: 0000000000000000 [44376.439998] FS: 00007f0451104800(0000) GS:ffff88852cb80000(0000) knlGS:0000000000000000 [44376.440714] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [44376.441225] CR2: 0000000000489108 CR3: 0000000123a79003 CR4: 0000000000370ea0 [44376.441843] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [44376.442471] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: ad86755 ("net/mlx5e: Protect unready flows with dedicated lock") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
[ Upstream commit c329b26 ] Ian reported several skb corruptions triggered by rx-gro-list, collecting different oops alike: [ 62.624003] BUG: kernel NULL pointer dereference, address: 00000000000000c0 [ 62.631083] #PF: supervisor read access in kernel mode [ 62.636312] #PF: error_code(0x0000) - not-present page [ 62.641541] PGD 0 P4D 0 [ 62.644174] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 62.648629] CPU: 1 PID: 913 Comm: napi/eno2-79 Not tainted 6.4.0 torvalds#364 [ 62.655162] Hardware name: Supermicro Super Server/A2SDi-12C-HLN4F, BIOS 1.7a 10/13/2022 [ 62.663344] RIP: 0010:__udp_gso_segment (./include/linux/skbuff.h:2858 ./include/linux/udp.h:23 net/ipv4/udp_offload.c:228 net/ipv4/udp_offload.c:261 net/ipv4/udp_offload.c:277) [ 62.687193] RSP: 0018:ffffbd3a83b4f868 EFLAGS: 00010246 [ 62.692515] RAX: 00000000000000ce RBX: 0000000000000000 RCX: 0000000000000000 [ 62.699743] RDX: ffffa124def8a000 RSI: 0000000000000079 RDI: ffffa125952a14d4 [ 62.706970] RBP: ffffa124def8a000 R08: 0000000000000022 R09: 00002000001558c9 [ 62.714199] R10: 0000000000000000 R11: 00000000be554639 R12: 00000000000000e2 [ 62.721426] R13: ffffa125952a1400 R14: ffffa125952a1400 R15: 00002000001558c9 [ 62.728654] FS: 0000000000000000(0000) GS:ffffa127efa40000(0000) knlGS:0000000000000000 [ 62.736852] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 62.742702] CR2: 00000000000000c0 CR3: 00000001034b0000 CR4: 00000000003526e0 [ 62.749948] Call Trace: [ 62.752498] <TASK> [ 62.779267] inet_gso_segment (net/ipv4/af_inet.c:1398) [ 62.787605] skb_mac_gso_segment (net/core/gro.c:141) [ 62.791906] __skb_gso_segment (net/core/dev.c:3403 (discriminator 2)) [ 62.800492] validate_xmit_skb (./include/linux/netdevice.h:4862 net/core/dev.c:3659) [ 62.804695] validate_xmit_skb_list (net/core/dev.c:3710) [ 62.809158] sch_direct_xmit (net/sched/sch_generic.c:330) [ 62.813198] __dev_queue_xmit (net/core/dev.c:3805 net/core/dev.c:4210) net/netfilter/core.c:626) [ 62.821093] br_dev_queue_push_xmit (net/bridge/br_forward.c:55) [ 62.825652] maybe_deliver (net/bridge/br_forward.c:193) [ 62.829420] br_flood (net/bridge/br_forward.c:233) [ 62.832758] br_handle_frame_finish (net/bridge/br_input.c:215) [ 62.837403] br_handle_frame (net/bridge/br_input.c:298 net/bridge/br_input.c:416) [ 62.851417] __netif_receive_skb_core.constprop.0 (net/core/dev.c:5387) [ 62.866114] __netif_receive_skb_list_core (net/core/dev.c:5570) [ 62.871367] netif_receive_skb_list_internal (net/core/dev.c:5638 net/core/dev.c:5727) [ 62.876795] napi_complete_done (./include/linux/list.h:37 ./include/net/gro.h:434 ./include/net/gro.h:429 net/core/dev.c:6067) [ 62.881004] ixgbe_poll (drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3191) [ 62.893534] __napi_poll (net/core/dev.c:6498) [ 62.897133] napi_threaded_poll (./include/linux/netpoll.h:89 net/core/dev.c:6640) [ 62.905276] kthread (kernel/kthread.c:379) [ 62.913435] ret_from_fork (arch/x86/entry/entry_64.S:314) [ 62.917119] </TASK> In the critical scenario, rx-gro-list GRO-ed packets are fed, via a bridge, both to the local input path and to an egress device (tun). The segmentation of such packets unsafely writes to the cloned skbs with shared heads. This change addresses the issue by uncloning as needed the to-be-segmented skbs. Reported-by: Ian Kumlien <ian.kumlien@gmail.com> Tested-by: Ian Kumlien <ian.kumlien@gmail.com> Fixes: 3a1296a ("net: Support GRO/GSO fraglist chaining.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
[ Upstream commit 2aaa8a1 ] With some IPv6 Ext Hdr (RPL, SRv6, etc.), we can send a packet that has the link-local address as src and dst IP and will be forwarded to an external IP in the IPv6 Ext Hdr. For example, the script below generates a packet whose src IP is the link-local address and dst is updated to 11::. # for f in $(find /proc/sys/net/ -name *seg6_enabled*); do echo 1 > $f; done # python3 >>> from socket import * >>> from scapy.all import * >>> >>> SRC_ADDR = DST_ADDR = "fe80::5054:ff:fe12:3456" >>> >>> pkt = IPv6(src=SRC_ADDR, dst=DST_ADDR) >>> pkt /= IPv6ExtHdrSegmentRouting(type=4, addresses=["11::", "22::"], segleft=1) >>> >>> sk = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW) >>> sk.sendto(bytes(pkt), (DST_ADDR, 0)) For such a packet, we call ip6_route_input() to look up a route for the next destination in these three functions depending on the header type. * ipv6_rthdr_rcv() * ipv6_rpl_srh_rcv() * ipv6_srh_rcv() If no route is found, ip6_null_entry is set to skb, and the following dst_input(skb) calls ip6_pkt_drop(). Finally, in icmp6_dev(), we dereference skb_rt6_info(skb)->rt6i_idev->dev as the input device is the loopback interface. Then, we have to check if skb_rt6_info(skb)->rt6i_idev is NULL or not to avoid NULL pointer deref for ip6_null_entry. BUG: kernel NULL pointer dereference, address: 0000000000000000 PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 157 Comm: python3 Not tainted 6.4.0-11996-gb121d614371c linux-sunxi#35 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:icmp6_send (net/ipv6/icmp.c:436 net/ipv6/icmp.c:503) Code: fe ff ff 48 c7 40 30 c0 86 5d 83 e8 c6 44 1c 00 e9 c8 fc ff ff 49 8b 46 58 48 83 e0 fe 0f 84 4a fb ff ff 48 8b 80 d0 00 00 00 <48> 8b 00 44 8b 88 e0 00 00 00 e9 34 fb ff ff 4d 85 ed 0f 85 69 01 RSP: 0018:ffffc90000003c70 EFLAGS: 00000286 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000e0 RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff888006d72a18 RBP: ffffc90000003d80 R08: 0000000000000000 R09: 0000000000000001 R10: ffffc90000003d98 R11: 0000000000000040 R12: ffff888006d72a10 R13: 0000000000000000 R14: ffff8880057fb800 R15: ffffffff835d86c0 FS: 00007f9dc72ee740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000057b2000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <IRQ> ip6_pkt_drop (net/ipv6/route.c:4513) ipv6_rthdr_rcv (net/ipv6/exthdrs.c:640 net/ipv6/exthdrs.c:686) ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:437 (discriminator 5)) ip6_input_finish (./include/linux/rcupdate.h:781 net/ipv6/ip6_input.c:483) __netif_receive_skb_one_core (net/core/dev.c:5455) process_backlog (./include/linux/rcupdate.h:781 net/core/dev.c:5895) __napi_poll (net/core/dev.c:6460) net_rx_action (net/core/dev.c:6529 net/core/dev.c:6660) __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554) do_softirq (kernel/softirq.c:454 kernel/softirq.c:441) </IRQ> <TASK> __local_bh_enable_ip (kernel/softirq.c:381) __dev_queue_xmit (net/core/dev.c:4231) ip6_finish_output2 (./include/net/neighbour.h:544 net/ipv6/ip6_output.c:135) rawv6_sendmsg (./include/net/dst.h:458 ./include/linux/netfilter.h:303 net/ipv6/raw.c:656 net/ipv6/raw.c:914) sock_sendmsg (net/socket.c:725 net/socket.c:748) __sys_sendto (net/socket.c:2134) __x64_sys_sendto (net/socket.c:2146 net/socket.c:2142 net/socket.c:2142) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) RIP: 0033:0x7f9dc751baea Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007ffe98712c38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007ffe98712cf8 RCX: 00007f9dc751baea RDX: 0000000000000060 RSI: 00007f9dc6460b90 RDI: 0000000000000003 RBP: 00007f9dc56e8be0 R08: 00007ffe98712d70 R09: 000000000000001c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f9dc6af5d1b </TASK> Modules linked in: CR2: 0000000000000000 ---[ end trace 0000000000000000 ]--- RIP: 0010:icmp6_send (net/ipv6/icmp.c:436 net/ipv6/icmp.c:503) Code: fe ff ff 48 c7 40 30 c0 86 5d 83 e8 c6 44 1c 00 e9 c8 fc ff ff 49 8b 46 58 48 83 e0 fe 0f 84 4a fb ff ff 48 8b 80 d0 00 00 00 <48> 8b 00 44 8b 88 e0 00 00 00 e9 34 fb ff ff 4d 85 ed 0f 85 69 01 RSP: 0018:ffffc90000003c70 EFLAGS: 00000286 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000e0 RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff888006d72a18 RBP: ffffc90000003d80 R08: 0000000000000000 R09: 0000000000000001 R10: ffffc90000003d98 R11: 0000000000000040 R12: ffff888006d72a10 R13: 0000000000000000 R14: ffff8880057fb800 R15: ffffffff835d86c0 FS: 00007f9dc72ee740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000057b2000 CR4: 00000000007506f0 PKRU: 55555554 Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: disabled Fixes: 4832c30 ("net: ipv6: put host and anycast routes on device with address") Reported-by: Wang Yufen <wangyufen@huawei.com> Closes: https://lore.kernel.org/netdev/c41403a9-c2f6-3b7e-0c96-e1901e605cd0@huawei.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
commit 458c15d upstream. syzbot reports a bug as below: general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] PREEMPT SMP KASAN RIP: 0010:__lock_acquire+0x69/0x2000 kernel/locking/lockdep.c:4942 Call Trace: lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5691 __raw_write_lock include/linux/rwlock_api_smp.h:209 [inline] _raw_write_lock+0x2e/0x40 kernel/locking/spinlock.c:300 __drop_extent_tree+0x3ac/0x660 fs/f2fs/extent_cache.c:1100 f2fs_drop_extent_tree+0x17/0x30 fs/f2fs/extent_cache.c:1116 f2fs_insert_range+0x2d5/0x3c0 fs/f2fs/file.c:1664 f2fs_fallocate+0x4e4/0x6d0 fs/f2fs/file.c:1838 vfs_fallocate+0x54b/0x6b0 fs/open.c:324 ksys_fallocate fs/open.c:347 [inline] __do_sys_fallocate fs/open.c:355 [inline] __se_sys_fallocate fs/open.c:353 [inline] __x64_sys_fallocate+0xbd/0x100 fs/open.c:353 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The root cause is race condition as below: - since it tries to remount rw filesystem, so that do_remount won't call sb_prepare_remount_readonly to block fallocate, there may be race condition in between remount and fallocate. - in f2fs_remount(), default_options() will reset mount option to default one, and then update it based on result of parse_options(), so there is a hole which race condition can happen. Thread A Thread B - f2fs_fill_super - parse_options - clear_opt(READ_EXTENT_CACHE) - f2fs_remount - default_options - set_opt(READ_EXTENT_CACHE) - f2fs_fallocate - f2fs_insert_range - f2fs_drop_extent_tree - __drop_extent_tree - __may_extent_tree - test_opt(READ_EXTENT_CACHE) return true - write_lock(&et->lock) access NULL pointer - parse_options - clear_opt(READ_EXTENT_CACHE) Cc: <stable@vger.kernel.org> Reported-by: syzbot+d015b6c2fbb5c383bf08@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/20230522124203.3838360-1-chao@kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
commit 5eda1ad upstream. Thread #1: [122554.641906][ T92] f2fs_getxattr+0xd4/0x5fc -> waiting for f2fs_down_read(&F2FS_I(inode)->i_xattr_sem); [122554.641927][ T92] __f2fs_get_acl+0x50/0x284 [122554.641948][ T92] f2fs_init_acl+0x84/0x54c [122554.641969][ T92] f2fs_init_inode_metadata+0x460/0x5f0 [122554.641990][ T92] f2fs_add_inline_entry+0x11c/0x350 -> Locked dir->inode_page by f2fs_get_node_page() [122554.642009][ T92] f2fs_do_add_link+0x100/0x1e4 [122554.642025][ T92] f2fs_create+0xf4/0x22c [122554.642047][ T92] vfs_create+0x130/0x1f4 Thread linux-sunxi#2: [123996.386358][ T92] __get_node_page+0x8c/0x504 -> waiting for dir->inode_page lock [123996.386383][ T92] read_all_xattrs+0x11c/0x1f4 [123996.386405][ T92] __f2fs_setxattr+0xcc/0x528 [123996.386424][ T92] f2fs_setxattr+0x158/0x1f4 -> f2fs_down_write(&F2FS_I(inode)->i_xattr_sem); [123996.386443][ T92] __f2fs_set_acl+0x328/0x430 [123996.386618][ T92] f2fs_set_acl+0x38/0x50 [123996.386642][ T92] posix_acl_chmod+0xc8/0x1c8 [123996.386669][ T92] f2fs_setattr+0x5e0/0x6bc [123996.386689][ T92] notify_change+0x4d8/0x580 [123996.386717][ T92] chmod_common+0xd8/0x184 [123996.386748][ T92] do_fchmodat+0x60/0x124 [123996.386766][ T92] __arm64_sys_fchmodat+0x28/0x3c Cc: <stable@vger.kernel.org> Fixes: 27161f1 "f2fs: avoid race in between read xattr & write xattr" Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
commit 8bbe9fe upstream. Lockdep warns that the use of the hpte_lock in native_hpte_remove() is not safe against an IRQ coming in: ================================ WARNING: inconsistent lock state 6.4.0-rc2-g0c54f4d30ecc #1 Not tainted -------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. qemu-system-ppc/93865 [HC0[0]:SC0[0]:HE1:SE1] takes: c0000000021f5180 (hpte_lock){+.?.}-{0:0}, at: native_lock_hpte+0x8/0xd0 {IN-SOFTIRQ-W} state was registered at: lock_acquire+0x134/0x3f0 native_lock_hpte+0x44/0xd0 native_hpte_insert+0xd4/0x2a0 __hash_page_64K+0x218/0x4f0 hash_page_mm+0x464/0x840 do_hash_fault+0x11c/0x260 data_access_common_virt+0x210/0x220 __ip_select_ident+0x140/0x150 ... net_rx_action+0x3bc/0x440 __do_softirq+0x180/0x534 ... sys_sendmmsg+0x34/0x50 system_call_exception+0x128/0x320 system_call_common+0x160/0x2e4 ... Possible unsafe locking scenario: CPU0 ---- lock(hpte_lock); <Interrupt> lock(hpte_lock); *** DEADLOCK *** ... Call Trace: dump_stack_lvl+0x98/0xe0 (unreliable) print_usage_bug.part.0+0x250/0x278 mark_lock+0xc9c/0xd30 __lock_acquire+0x440/0x1ca0 lock_acquire+0x134/0x3f0 native_lock_hpte+0x44/0xd0 native_hpte_remove+0xb0/0x190 kvmppc_mmu_map_page+0x650/0x698 [kvm_pr] kvmppc_handle_pagefault+0x534/0x6e8 [kvm_pr] kvmppc_handle_exit_pr+0x6d8/0xe90 [kvm_pr] after_sprg3_load+0x80/0x90 [kvm_pr] kvmppc_vcpu_run_pr+0x108/0x270 [kvm_pr] kvmppc_vcpu_run+0x34/0x48 [kvm] kvm_arch_vcpu_ioctl_run+0x340/0x470 [kvm] kvm_vcpu_ioctl+0x338/0x8b8 [kvm] sys_ioctl+0x7c4/0x13e0 system_call_exception+0x128/0x320 system_call_common+0x160/0x2e4 I suspect kvm_pr is the only caller that doesn't already have IRQs disabled, which is why this hasn't been reported previously. Fix it by disabling IRQs in native_hpte_remove(). Fixes: 35159b5 ("powerpc/64s: make HPTE lock and native_tlbie_lock irq-safe") Cc: stable@vger.kernel.org # v6.1+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230517123033.18430-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
commit e4de205 upstream. After commit 45c7e8a ("MIPS: Remove KVM_TE support") we get a NULL pointer dereference when creating a KVM guest: [ 146.243409] Starting KVM with MIPS VZ extensions [ 149.849151] CPU 3 Unable to handle kernel paging request at virtual address 0000000000000300, epc == ffffffffc06356ec, ra == ffffffffc063568c [ 149.849177] Oops[#1]: [ 149.849182] CPU: 3 PID: 2265 Comm: qemu-system-mip Not tainted 6.4.0-rc3+ #1671 [ 149.849188] Hardware name: THTF CX TL630 Series/THTF-LS3A4000-7A1000-ML4A, BIOS KL4.1F.TF.D.166.201225.R 12/25/2020 [ 149.849192] $ 0 : 0000000000000000 000000007400cce0 0000000000400004 ffffffff8119c740 [ 149.849209] $ 4 : 000000007400cce1 000000007400cce1 0000000000000000 0000000000000000 [ 149.849221] $ 8 : 000000240058bb36 ffffffff81421ac0 0000000000000000 0000000000400dc0 [ 149.849233] $12 : 9800000102a07cc8 ffffffff80e40e38 0000000000000001 0000000000400dc0 [ 149.849245] $16 : 0000000000000000 9800000106cd0000 9800000106cd0000 9800000100cce000 [ 149.849257] $20 : ffffffffc0632b28 ffffffffc05b31b0 9800000100ccca00 0000000000400000 [ 149.849269] $24 : 9800000106cd09ce ffffffff802f69d0 [ 149.849281] $28 : 9800000102a04000 9800000102a07cd0 98000001106a8000 ffffffffc063568c [ 149.849293] Hi : 00000335b2111e66 [ 149.849295] Lo : 6668d90061ae0ae9 [ 149.849298] epc : ffffffffc06356ec kvm_vz_vcpu_setup+0xc4/0x328 [kvm] [ 149.849324] ra : ffffffffc063568c kvm_vz_vcpu_setup+0x64/0x328 [kvm] [ 149.849336] Status: 7400cce3 KX SX UX KERNEL EXL IE [ 149.849351] Cause : 1000000c (ExcCode 03) [ 149.849354] BadVA : 0000000000000300 [ 149.849357] PrId : 0014c004 (ICT Loongson-3) [ 149.849360] Modules linked in: kvm nfnetlink_queue nfnetlink_log nfnetlink fuse sha256_generic libsha256 cfg80211 rfkill binfmt_misc vfat fat snd_hda_codec_hdmi input_leds led_class snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_pcm snd_timer snd serio_raw xhci_pci radeon drm_suballoc_helper drm_display_helper xhci_hcd ip_tables x_tables [ 149.849432] Process qemu-system-mip (pid: 2265, threadinfo=00000000ae2982d2, task=0000000038e09ad4, tls=000000ffeba16030) [ 149.849439] Stack : 9800000000000003 9800000100ccca00 9800000100ccc000 ffffffffc062cef4 [ 149.849453] 9800000102a07d18 c89b63a7ab338e00 0000000000000000 ffffffff811a0000 [ 149.849465] 0000000000000000 9800000106cd0000 ffffffff80e59938 98000001106a8920 [ 149.849476] ffffffff80e57f30 ffffffffc062854c ffffffff811a0000 9800000102bf4240 [ 149.849488] ffffffffc05b0000 ffffffff80e3a798 000000ff78000000 000000ff78000010 [ 149.849500] 0000000000000255 98000001021f7de0 98000001023f0078 ffffffff81434000 [ 149.849511] 0000000000000000 0000000000000000 9800000102ae0000 980000025e92ae28 [ 149.849523] 0000000000000000 c89b63a7ab338e00 0000000000000001 ffffffff8119dce0 [ 149.849535] 000000ff78000010 ffffffff804f3d3c 9800000102a07eb0 0000000000000255 [ 149.849546] 0000000000000000 ffffffff8049460c 000000ff78000010 0000000000000255 [ 149.849558] ... [ 149.849565] Call Trace: [ 149.849567] [<ffffffffc06356ec>] kvm_vz_vcpu_setup+0xc4/0x328 [kvm] [ 149.849586] [<ffffffffc062cef4>] kvm_arch_vcpu_create+0x184/0x228 [kvm] [ 149.849605] [<ffffffffc062854c>] kvm_vm_ioctl+0x64c/0xf28 [kvm] [ 149.849623] [<ffffffff805209c0>] sys_ioctl+0xc8/0x118 [ 149.849631] [<ffffffff80219eb0>] syscall_common+0x34/0x58 The root cause is the deletion of kvm_mips_commpage_init() leaves vcpu ->arch.cop0 NULL. So fix it by making cop0 from a pointer to an embedded object. Fixes: 45c7e8a ("MIPS: Remove KVM_TE support") Cc: stable@vger.kernel.org Reported-by: Yu Zhao <yuzhao@google.com> Suggested-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
commit fcced95 upstream. PAGE_ALIGN(x) macro gives the next highest value which is multiple of pagesize. But if x is already page aligned then it simply returns x. So, if x passed is 0 in dax_zero_range() function, that means the length gets passed as 0 to ->iomap_begin(). In ext2 it then calls ext2_get_blocks -> max_blocks as 0 and hits bug_on here in ext2_get_blocks(). BUG_ON(maxblocks == 0); Instead we should be calling dax_truncate_page() here which takes care of it. i.e. it only calls dax_zero_range if the offset is not page/block aligned. This can be easily triggered with following on fsdax mounted pmem device. dd if=/dev/zero of=file count=1 bs=512 truncate -s 0 file [79.525838] EXT2-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk [79.529376] ext2 filesystem being mounted at /mnt1/test supports timestamps until 2038 (0x7fffffff) [93.793207] ------------[ cut here ]------------ [93.795102] kernel BUG at fs/ext2/inode.c:637! [93.796904] invalid opcode: 0000 [#1] PREEMPT SMP PTI [93.798659] CPU: 0 PID: 1192 Comm: truncate Not tainted 6.3.0-rc2-xfstests-00056-g131086faa369 linux-sunxi#139 [93.806459] RIP: 0010:ext2_get_blocks.constprop.0+0x524/0x610 <...> [93.835298] Call Trace: [93.836253] <TASK> [93.837103] ? lock_acquire+0xf8/0x110 [93.838479] ? d_lookup+0x69/0xd0 [93.839779] ext2_iomap_begin+0xa7/0x1c0 [93.841154] iomap_iter+0xc7/0x150 [93.842425] dax_zero_range+0x6e/0xa0 [93.843813] ext2_setsize+0x176/0x1b0 [93.845164] ext2_setattr+0x151/0x200 [93.846467] notify_change+0x341/0x4e0 [93.847805] ? lock_acquire+0xf8/0x110 [93.849143] ? do_truncate+0x74/0xe0 [93.850452] ? do_truncate+0x84/0xe0 [93.851739] do_truncate+0x84/0xe0 [93.852974] do_sys_ftruncate+0x2b4/0x2f0 [93.854404] do_syscall_64+0x3f/0x90 [93.855789] entry_SYSCALL_64_after_hwframe+0x72/0xdc CC: stable@vger.kernel.org Fixes: 2aa3048 ("iomap: switch iomap_zero_range to use iomap_iter") Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <046a58317f29d9603d1068b2bbae47c2332c17ae.1682069716.git.ritesh.list@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
commit 66843b1 upstream. Since commit 096b52f ("perf: RISC-V: throttle perf events") the perf_sample_event_took() function was added to report time spent in overflow interrupts. If the interrupt takes too long, the perf framework will lower the sysctl_perf_event_sample_rate and max_samples_per_tick. When hwc->interrupts is larger than max_samples_per_tick, the hwc->interrupts will be set to MAX_INTERRUPTS, and events will be throttled within the __perf_event_account_interrupt() function. However, the RISC-V PMU driver doesn't call riscv_pmu_stop() to update the PERF_HES_STOPPED flag after perf_event_overflow() in pmu_sbi_ovf_handler() function to avoid throttling. When the perf framework unthrottled the event in the timer interrupt handler, it triggers riscv_pmu_start() function and causes a WARN_ON_ONCE() warning, as shown below: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 240 at drivers/perf/riscv_pmu.c:184 riscv_pmu_start+0x7c/0x8e Modules linked in: CPU: 0 PID: 240 Comm: ls Not tainted 6.4-rc4-g19d0788e9ef2 #1 Hardware name: SiFive (DT) epc : riscv_pmu_start+0x7c/0x8e ra : riscv_pmu_start+0x28/0x8e epc : ffffffff80aef864 ra : ffffffff80aef810 sp : ffff8f80004db6f0 gp : ffffffff81c83750 tp : ffffaf80069f9bc0 t0 : ffff8f80004db6c0 t1 : 0000000000000000 t2 : 000000000000001f s0 : ffff8f80004db720 s1 : ffffaf8008ca1068 a0 : 0000ffffffffffff a1 : 0000000000000000 a2 : 0000000000000001 a3 : 0000000000000870 a4 : 0000000000000000 a5 : 0000000000000000 a6 : 0000000000000840 a7 : 0000000000000030 s2 : 0000000000000000 s3 : ffffaf8005165800 s4 : ffffaf800424da00 s5 : ffffffffffffffff s6 : ffffffff81cc7590 s7 : 0000000000000000 s8 : 0000000000000006 s9 : 0000000000000001 s10: ffffaf807efbc340 s11: ffffaf807efbbf00 t3 : ffffaf8006a16028 t4 : 00000000dbfbb796 t5 : 0000000700000000 t6 : ffffaf8005269870 status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [<ffffffff80aef864>] riscv_pmu_start+0x7c/0x8e [<ffffffff80185b56>] perf_adjust_freq_unthr_context+0x15e/0x174 [<ffffffff80188642>] perf_event_task_tick+0x88/0x9c [<ffffffff800626a8>] scheduler_tick+0xfe/0x27c [<ffffffff800b5640>] update_process_times+0x9a/0xba [<ffffffff800c5bd4>] tick_sched_handle+0x32/0x66 [<ffffffff800c5e0c>] tick_sched_timer+0x64/0xb0 [<ffffffff800b5e50>] __hrtimer_run_queues+0x156/0x2f4 [<ffffffff800b6bdc>] hrtimer_interrupt+0xe2/0x1fe [<ffffffff80acc9e8>] riscv_timer_interrupt+0x38/0x42 [<ffffffff80090a16>] handle_percpu_devid_irq+0x90/0x1d2 [<ffffffff8008a9f4>] generic_handle_domain_irq+0x28/0x36 After referring other PMU drivers like Arm, Loongarch, Csky, and Mips, they don't call *_pmu_stop() to update with PERF_HES_STOPPED flag after perf_event_overflow() function nor do they add PERF_HES_STOPPED flag checking in *_pmu_start() which don't cause this warning. Thus, it's recommended to remove this unnecessary check in riscv_pmu_start() function to prevent this warning. Signed-off-by: Eric Lin <eric.lin@sifive.com> Link: https://lore.kernel.org/r/20230710154328.19574-1-eric.lin@sifive.com Fixes: 096b52f ("perf: RISC-V: throttle perf events") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
commit 5f81018 upstream. While running bpf selftests it's possible to get following fault: general protection fault, probably for non-canonical address \ 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI ... Call Trace: <TASK> fprobe_handler+0xc1/0x270 ? __pfx_bpf_testmod_init+0x10/0x10 ? __pfx_bpf_testmod_init+0x10/0x10 ? bpf_fentry_test1+0x5/0x10 ? bpf_fentry_test1+0x5/0x10 ? bpf_testmod_init+0x22/0x80 ? do_one_initcall+0x63/0x2e0 ? rcu_is_watching+0xd/0x40 ? kmalloc_trace+0xaf/0xc0 ? do_init_module+0x60/0x250 ? __do_sys_finit_module+0xac/0x120 ? do_syscall_64+0x37/0x90 ? entry_SYSCALL_64_after_hwframe+0x72/0xdc </TASK> In unregister_fprobe function we can't release fp->rethook while it's possible there are some of its users still running on another cpu. Moving rethook_free call after fp->ops is unregistered with unregister_ftrace_function call. Link: https://lore.kernel.org/all/20230615115236.3476617-1-jolsa@kernel.org/ Fixes: 5b0ab78 ("fprobe: Add exit_handler support") Cc: stable@vger.kernel.org Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
codekipper
pushed a commit
that referenced
this pull request
Jul 26, 2023
commit 20fce50 upstream. System crash when qla2x00_start_sp(sp) returns error code EGAIN and wake_up gets called for uninitialized wait queue sp->nvme_ls_waitq. qla2xxx [0000:37:00.1]-2121:5: Returning existing qpair of ffff8ae2c0513400 for idx=0 qla2xxx [0000:37:00.1]-700e:5: qla2x00_start_sp failed = 11 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 09/03/2021 Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc] RIP: 0010:__wake_up_common+0x4c/0x190 RSP: 0018:ffff95f3e0cb7cd0 EFLAGS: 00010086 RAX: 0000000000000000 RBX: ffff8b08d3b26328 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8b08d3b26320 RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffffffffffe8 R10: 0000000000000000 R11: ffff95f3e0cb7a60 R12: ffff95f3e0cb7d20 R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8b2fdf6c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000002f1e410002 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: __wake_up_common_lock+0x7c/0xc0 qla_nvme_ls_req+0x355/0x4c0 [qla2xxx] ? __nvme_fc_send_ls_req+0x260/0x380 [nvme_fc] ? nvme_fc_send_ls_req.constprop.42+0x1a/0x45 [nvme_fc] ? nvme_fc_connect_ctrl_work.cold.63+0x1e3/0xa7d [nvme_fc] Remove unused nvme_ls_waitq wait queue. nvme_ls_waitq logic was removed previously in the commits tagged Fixed: below. Fixes: 219d27d ("scsi: qla2xxx: Fix race conditions in the code for aborting SCSI commands") Fixes: 5621b0d ("scsi: qla2xxx: Simpify unregistration of FC-NVMe local/remote ports") Cc: stable@vger.kernel.org Signed-off-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20230615074633.12721-1-njavali@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.