Florian-Fainel…
Commits on Mar 19, 2021
-
MIPS: Add support for CONFIG_DEBUG_VIRTUAL
Provide hooks to intercept bad usages of virt_to_phys() and __pa_symbol() throughout the kernel. To make this possible, we need to rename the current implement of virt_to_phys() into __virt_to_phys_nodebug() and wrap it around depending on CONFIG_DEBUG_VIRTUAL. A similar thing is needed for __pa_symbol() which is now aliased to __phys_addr_symbol() whose implementation is either the direct return of RELOC_HIDE or goes through the debug version. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
-
Merge tag 'drm-fixes-2021-03-19' of git://anongit.freedesktop.org/drm…
…/drm Pull drm fixes from Dave Airlie: "Regular fixes pull, pretty small set of fixes, a couple of i915 and amdgpu, one ttm, one nouveau and one omap. Probably smaller than usual for this time, so we'll see if something pops up next week or if this will continue to stay small. Summary: ttm: - Make ttm_bo_unpin() not wraparound on too many unpins omap: - Fix coccicheck warning in omap amdgpu: - DCN 3.0 gamma fixes - DCN 2.1 corrupt screen fix i915: - Workaround async flip + VT-d frame corruption on HSW/BDW - Fix NMI watchdog crash due to uninitialized OA buffer use on gen12+ nouveau: - workaround oops with bo syncing" * tag 'drm-fixes-2021-03-19' of git://anongit.freedesktop.org/drm/drm: nouveau: Skip unvailable ttm page entries drm/amd/display: Remove MPC gamut remap logic for DCN30 drm/amd/display: Correct algorithm for reversed gamma drm/omap: dsi: fix unsigned expression compared with zero i915/perf: Start hrtimer only if sampling the OA buffer drm/i915: Workaround async flip + VT-d corruption on HSW/BDW drm/amd/display: Copy over soc values before bounding box creation drm/ttm: make ttm_bo_unpin more defensive
-
nouveau: Skip unvailable ttm page entries
Starting with commit f295c8c ("drm/nouveau: fix dma syncing warning with debugging on.") the following oops occures: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 6 PID: 1013 Comm: Xorg.bin Tainted: G E 5.11.0-desktop-rc0+ #2 Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.11 08/01/2018 RIP: 0010:nouveau_bo_sync_for_device+0x40/0xb0 [nouveau] Call Trace: nouveau_bo_validate+0x5d/0x80 [nouveau] nouveau_gem_ioctl_pushbuf+0x662/0x1120 [nouveau] ? nouveau_gem_ioctl_new+0xf0/0xf0 [nouveau] drm_ioctl_kernel+0xa6/0xf0 [drm] drm_ioctl+0x1f4/0x3a0 [drm] ? nouveau_gem_ioctl_new+0xf0/0xf0 [nouveau] nouveau_drm_ioctl+0x50/0xa0 [nouveau] __x64_sys_ioctl+0x7e/0xb0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae ---[ end trace ccfb1e7f4064374f ]--- RIP: 0010:nouveau_bo_sync_for_device+0x40/0xb0 [nouveau] The underlying problem is not introduced by the commit, yet it uncovered the underlying issue. The cited commit relies on valid pages. This is not given for due to some bugs. For now, just warn and work around the issue by just ignoring the bad ttm objects. Below is some debug info gathered while debugging this issue: nouveau 0000:01:00.0: DRM: ttm_dma->num_pages: 2048 nouveau 0000:01:00.0: DRM: ttm_dma->pages is NULL nouveau 0000:01:00.0: DRM: ttm_dma: 00000000e96058e7 nouveau 0000:01:00.0: DRM: ttm_dma->page_flags: nouveau 0000:01:00.0: DRM: ttm_dma: Populated: 1 nouveau 0000:01:00.0: DRM: ttm_dma: No Retry: 0 nouveau 0000:01:00.0: DRM: ttm_dma: SG: 256 nouveau 0000:01:00.0: DRM: ttm_dma: Zero Alloc: 0 nouveau 0000:01:00.0: DRM: ttm_dma: Swapped: 0 Signed-off-by: Tobias Klausmann <tobias.klausmann@freenet.de> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210313222159.3346-1-tobias.klausmann@freenet.de
-
Merge tag 'drm-intel-fixes-2021-03-18' of git://anongit.freedesktop.o…
…rg/drm/drm-intel into drm-fixes drm/i915 fixes for v5.12-rc4: - Workaround async flip + VT-d frame corruption on HSW/BDW - Fix NMI watchdog crash due to uninitialized OA buffer use on gen12+ Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87blbg8y5t.fsf@intel.com
-
Merge tag 'amd-drm-fixes-5.12-2021-03-18' of https://gitlab.freedeskt…
…op.org/agd5f/linux into drm-fixes amdgpu: - DCN 3.0 gamma fixes - DCN 2.1 corrupt screen fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210318042858.3810-1-alexander.deucher@amd.com
-
Merge tag 'drm-misc-fixes-2021-03-18' of git://anongit.freedesktop.or…
…g/drm/drm-misc into drm-fixes drm-misc-fixes for v5.12-rc4: - Make ttm_bo_unpin() not wraparound on too many unpins. - Fix coccicheck warning in omap. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/a0e13bbb-6ba6-ff24-4db8-0e02e605de18@linux.intel.com
Commits on Mar 18, 2021
-
Merge tag 'for-5.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/ke…
…rnel/git/kdave/linux Pull btrfs fixes from David Sterba: "There are still regressions being found and fixed in the zoned mode and subpage code, the rest are fixes for bugs reported by users. Regressions: - subpage block support: - readahead works on the proper block size - fix last page zeroing - zoned mode: - linked list corruption for tree log Fixes: - qgroup leak after falloc failure - tree mod log and backref resolving: - extent buffer cloning race when resolving backrefs - pin deleted leaves with active tree mod log users - drop debugging flag from slab cache" * tag 'for-5.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: always pin deleted leaves when there are active tree mod log users btrfs: fix race when cloning extent buffer during rewind of an old root btrfs: fix slab cache flags for free space tree bitmap btrfs: subpage: make readahead work properly btrfs: subpage: fix wild pointer access during metadata read failure btrfs: zoned: fix linked list corruption after log root tree allocation failure btrfs: fix qgroup data rsv leak caused by falloc failure btrfs: track qgroup released data in own variable in insert_prealloc_file_extent btrfs: fix wrong offset to zero out range beyond i_size -
Merge tag 'vfio-v5.12-rc4' of git://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson: - Fix 32-bit issue with new unmap-all flag (Steve Sistare) - Various Kconfig changes for better coverage (Jason Gunthorpe) - Fix to batch pinning support (Daniel Jordan) * tag 'vfio-v5.12-rc4' of git://github.com/awilliam/linux-vfio: vfio/type1: fix vaddr_get_pfns() return in vfio_pin_page_external() vfio: Depend on MMU ARM: amba: Allow some ARM_AMBA users to compile with COMPILE_TEST vfio-platform: Add COMPILE_TEST to VFIO_PLATFORM vfio: IOMMU_API should be selected vfio/type1: fix unmap all on ILP32
-
Merge tag 'xfs-5.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/x…
…fs-linux Pull xfs fixes from Darrick Wong: "A couple of minor corrections for the new idmapping functionality, and a fix for a theoretical hang that could occur if we decide to abort a mount after dirtying the quota inodes. Summary: - Fix quota accounting on creat() when id mapping is enabled - Actually reclaim dirty quota inodes when mount fails - Typo fixes for documentation - Restrict both bulkstat calls on idmapped/namespaced mounts" * tag 'xfs-5.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: also reject BULKSTAT_SINGLE in a mount user namespace docs: ABI: Fix the spelling oustanding to outstanding in the file sysfs-fs-xfs xfs: force log and push AIL to clear pinned inodes when aborting mount xfs: fix quota accounting when a mount is idmapped
-
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/gi…
…t/mst/vhost Pull virtio fixes from Michael Tsirkin: "Some fixes and cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost-vdpa: set v->config_ctx to NULL if eventfd_ctx_fdget() fails vhost-vdpa: fix use-after-free of v->config_ctx vhost: Fix vhost_vq_reset() vhost_vdpa: fix the missing irq_bypass_unregister_producer() invocation vdpa_sim: Skip typecasting from void* virtio: remove export for virtio_config_{enable, disable} virtio-mmio: Use to_virtio_mmio_device() to simply code vdpa: set the virtqueue num during register -
Merge branch 'iomap-5.12-fixes' of git://git.kernel.org/pub/scm/fs/xf…
…s/xfs-linux Pull iomap fix from Darrick Wong: "A single fix to the iomap code which fixes some drama when someone gives us a {de,ma}liciously fragmented swap file" * 'iomap-5.12-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: Fix negative assignment to unsigned sis->pages in iomap_swapfile_activate -
drm/amd/display: Remove MPC gamut remap logic for DCN30
[Why?] Should only reroute gamut remap to mpc unless 3D LUT is not used and all planes are using the same src->dest. [How?] Remove DCN30 specific logic for rerouting gamut remap to mpc. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1513 Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Dillon Varone <dillon.varone@amd.com> Reviewed-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com> Acked-by: Aric Cyr <Aric.Cyr@amd.com> Acked-by: Solomon Chiu <solomon.chiu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
Dillon Varone authored and Alex Deucher committedMar 18, 2021 -
drm/amd/display: Correct algorithm for reversed gamma
[Why] DCN30 needs to correctly program reversed gamma curve, which DCN20 already has. Also needs to fix a bug that 252-255 values are clipped. [How] Apply two fixes into DCN30. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1513 Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Calvin Hou <Calvin.Hou@amd.com> Reviewed-by: Jun Lei <Jun.Lei@amd.com> Reviewed-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com> Acked-by: Solomon Chiu <solomon.chiu@amd.com> Acked-by: Vladimir Stempen <Vladimir.Stempen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
Calvin Hou authored and Alex Deucher committedMar 18, 2021
Commits on Mar 17, 2021
-
module: remove never implemented MODULE_SUPPORTED_DEVICE
MODULE_SUPPORTED_DEVICE was added in pre-git era and never was implemented. We can safely remove it, because the kernel has grown to have many more reliable mechanisms to determine if device is supported or not. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Merge tag 'mips-fixes_5.12_2' of git://git.kernel.org/pub/scm/linux/k…
…ernel/git/mips/linux Pull MIPS fix from Thomas Bogendoerfer: "Fix for fdt alignment when image is compressed" * tag 'mips-fixes_5.12_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: vmlinux.lds.S: Fix appended dtb not properly aligned
-
Merge tag 'thermal-v5.12-rc4' of git://git.kernel.org/pub/scm/linux/k…
…ernel/git/thermal/linux Pull thermal framework fix from Daniel Lezcano: "Fix NULL pointer access when the cooling device transition stats table failed to allocate due to a big number of states (Manaf Meethalavalappu Pallikunhi)" * tag 'thermal-v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: thermal/core: Add NULL pointer check before using cooling device stats
-
drm/omap: dsi: fix unsigned expression compared with zero
r is "u32" always >= 0,mipi_dsi_create_packet may return little than zero. so r < 0 condition is never accessible. Fixes coccicheck warnings: ./drivers/gpu/drm/omapdrm/dss/dsi.c:2155:5-6: WARNING: Unsigned expression compared with zero: r < 0 Signed-off-by: Junlin Yang <yangjunlin@yulong.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210312071445.1721-1-angkery@163.com
-
i915/perf: Start hrtimer only if sampling the OA buffer
SAMPLE_OA parameter enables sampling of OA buffer and results in a call to init the OA buffer which initializes the OA unit head/tail pointers. The OA_EXPONENT parameter controls the periodicity of the OA reports in the OA buffer and results in starting a hrtimer. Before gen12, all use cases required the use of the OA buffer and i915 enforced this setting when vetting out the parameters passed. In these platforms the hrtimer was enabled if OA_EXPONENT was passed. This worked fine since it was implied that SAMPLE_OA is always passed. With gen12, this changed. Users can use perf without enabling the OA buffer as in OAR use cases. While an OAR use case should ideally not start the hrtimer, we see that passing an OA_EXPONENT parameter will start the hrtimer even though SAMPLE_OA is not specified. This results in an uninitialized OA buffer, so the head/tail pointers used to track the buffer are zero. This itself does not fail, but if we ran a use-case that SAMPLED the OA buffer previously, then the OA_TAIL register is still pointing to an old value. When the timer callback runs, it ends up calculating a wrong/large number of available reports. Since we do a spinlock_irq_save and start processing a large number of reports, NMI watchdog fires and causes a crash. Start the timer only if SAMPLE_OA is specified. v2: - Drop SAMPLE OA check when appending samples (Ashutosh) - Prevent read if OA buffer is not being sampled Fixes: 00a7f0d ("drm/i915/tgl: Add perf support on TGL") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210305210947.58751-1-umesh.nerlige.ramappa@intel.com (cherry picked from commit be0bdd6) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
-
drm/i915: Workaround async flip + VT-d corruption on HSW/BDW
On HSW/BDW with VT-d active the first tile row scanned out after the first async flip of the frame often ends up corrupted. Whether the corruption happens or not depends on the scanline on which the async flip happens, but the behaviour seems very consistent. Ie. the same set of scanlines (which are most scanlines) always show the corruption. And another set of scanlines (far less of them) never shows the corruption. I discovered that disabling the fetch-stride stretching feature cures the corruption. This is some kind of TLB related prefetch thing AFAIK. We already disable it on SNB primary planes due to a documented workaround. The hardware folks indicated that disabling this should be fine, so let's go with that. And while we're here, let's document the relevant bits on all pre-skl platforms. Fixes: 2a636e2 ("drm/i915: Implement async flip for ivb/hsw") Fixes: cda195f ("drm/i915: Implement async flips for bdw") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210220103303.3448-1-ville.syrjala@linux.intel.com Reviewed-by: Karthik B S <karthik.b.s@intel.com> (cherry picked from commit b7a7053) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
-
thermal/core: Add NULL pointer check before using cooling device stats
There is a possible chance that some cooling device stats buffer allocation fails due to very high cooling device max state value. Later cooling device update sysfs can try to access stats data for the same cooling device. It will lead to NULL pointer dereference issue. Add a NULL pointer check before accessing thermal cooling device stats data. It fixes the following bug [ 26.812833] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004 [ 27.122960] Call trace: [ 27.122963] do_raw_spin_lock+0x18/0xe8 [ 27.122966] _raw_spin_lock+0x24/0x30 [ 27.128157] thermal_cooling_device_stats_update+0x24/0x98 [ 27.128162] cur_state_store+0x88/0xb8 [ 27.128166] dev_attr_store+0x40/0x58 [ 27.128169] sysfs_kf_write+0x50/0x68 [ 27.133358] kernfs_fop_write+0x12c/0x1c8 [ 27.133362] __vfs_write+0x54/0x160 [ 27.152297] vfs_write+0xcc/0x188 [ 27.157132] ksys_write+0x78/0x108 [ 27.162050] ksys_write+0xf8/0x108 [ 27.166968] __arm_smccc_hvc+0x158/0x4b0 [ 27.166973] __arm_smccc_hvc+0x9c/0x4b0 [ 27.186005] el0_svc+0x8/0xc Signed-off-by: Manaf Meethalavalappu Pallikunhi <manafm@codeaurora.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/1607367181-24589-1-git-send-email-manafm@codeaurora.org
Commits on Mar 16, 2021
-
MIPS: vmlinux.lds.S: Fix appended dtb not properly aligned
Commit 6654111 ("MIPS: vmlinux.lds.S: align raw appended dtb to 8 bytes") changed the alignment from STRUCT_ALIGNMENT bytes to 8 bytes. The commit's message makes it sound like it was actually done on purpose, but this is not the case. The commit was written when raw appended dtb were not aligned at all. The STRUCT_ALIGN() was added a few days before, in commit 7a05293 ("MIPS: boot/compressed: Copy DTB to aligned address"). The true purpose of the commit was not to align specifically to 8 bytes, but to make sure that the generated vmlinux' size was properly padded to the alignment required for DTBs. While the switch to 8-byte alignment worked for vmlinux-appended dtb blobs, it broke vmlinuz-appended dtb blobs, as the decompress routine moves the blob to a STRUCT_ALIGNMENT aligned address. Fix this by changing the raw appended dtb blob alignment from 8 bytes back to STRUCT_ALIGNMENT bytes in vmlinux.lds.S. Fixes: 6654111 ("MIPS: vmlinux.lds.S: align raw appended dtb to 8 bytes") Cc: Bjørn Mork <bjorn@mork.no> Signed-off-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
-
btrfs: always pin deleted leaves when there are active tree mod log u…
…sers When freeing a tree block we may end up adding its extent back to the free space cache/tree, as long as there are no more references for it, it was created in the current transaction and writeback for it never happened. This is generally fine, however when we have tree mod log operations it can result in inconsistent versions of a btree after unwinding extent buffers with the recorded tree mod log operations. This is because: * We only log operations for nodes (adding and removing key/pointers), for leaves we don't do anything; * This means that we can log a MOD_LOG_KEY_REMOVE_WHILE_FREEING operation for a node that points to a leaf that was deleted; * Before we apply the logged operation to unwind a node, we can have that leaf's extent allocated again, either as a node or as a leaf, and possibly for another btree. This is possible if the leaf was created in the current transaction and writeback for it never started, in which case btrfs_free_tree_block() returns its extent back to the free space cache/tree; * Then, before applying the tree mod log operation, some task allocates the metadata extent just freed before, and uses it either as a leaf or as a node for some btree (can be the same or another one, it does not matter); * After applying the MOD_LOG_KEY_REMOVE_WHILE_FREEING operation we now get the target node with an item pointing to the metadata extent that now has content different from what it had before the leaf was deleted. It might now belong to a different btree and be a node and not a leaf anymore. As a consequence, the results of searches after the unwinding can be unpredictable and produce unexpected results. So make sure we pin extent buffers corresponding to leaves when there are tree mod log users. CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: fix race when cloning extent buffer during rewind of an old root
While resolving backreferences, as part of a logical ino ioctl call or fiemap, we can end up hitting a BUG_ON() when replaying tree mod log operations of a root, triggering a stack trace like the following: ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:1210! invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 19054 Comm: crawl_335 Tainted: G W 5.11.0-2d11c0084b02-misc-next+ torvalds#89 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 RIP: 0010:__tree_mod_log_rewind+0x3b1/0x3c0 Code: 05 48 8d 74 10 (...) RSP: 0018:ffffc90001eb70b8 EFLAGS: 00010297 RAX: 0000000000000000 RBX: ffff88812344e400 RCX: ffffffffb28933b6 RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff88812344e42c RBP: ffffc90001eb7108 R08: 1ffff11020b60a20 R09: ffffed1020b60a20 R10: ffff888105b050f9 R11: ffffed1020b60a1f R12: 00000000000000ee R13: ffff8880195520c0 R14: ffff8881bc958500 R15: ffff88812344e42c FS: 00007fd1955e8700(0000) GS:ffff8881f5600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007efdb7928718 CR3: 000000010103a006 CR4: 0000000000170ee0 Call Trace: btrfs_search_old_slot+0x265/0x10d0 ? lock_acquired+0xbb/0x600 ? btrfs_search_slot+0x1090/0x1090 ? free_extent_buffer.part.61+0xd7/0x140 ? free_extent_buffer+0x13/0x20 resolve_indirect_refs+0x3e9/0xfc0 ? lock_downgrade+0x3d0/0x3d0 ? __kasan_check_read+0x11/0x20 ? add_prelim_ref.part.11+0x150/0x150 ? lock_downgrade+0x3d0/0x3d0 ? __kasan_check_read+0x11/0x20 ? lock_acquired+0xbb/0x600 ? __kasan_check_write+0x14/0x20 ? do_raw_spin_unlock+0xa8/0x140 ? rb_insert_color+0x30/0x360 ? prelim_ref_insert+0x12d/0x430 find_parent_nodes+0x5c3/0x1830 ? resolve_indirect_refs+0xfc0/0xfc0 ? lock_release+0xc8/0x620 ? fs_reclaim_acquire+0x67/0xf0 ? lock_acquire+0xc7/0x510 ? lock_downgrade+0x3d0/0x3d0 ? lockdep_hardirqs_on_prepare+0x160/0x210 ? lock_release+0xc8/0x620 ? fs_reclaim_acquire+0x67/0xf0 ? lock_acquire+0xc7/0x510 ? poison_range+0x38/0x40 ? unpoison_range+0x14/0x40 ? trace_hardirqs_on+0x55/0x120 btrfs_find_all_roots_safe+0x142/0x1e0 ? find_parent_nodes+0x1830/0x1830 ? btrfs_inode_flags_to_xflags+0x50/0x50 iterate_extent_inodes+0x20e/0x580 ? tree_backref_for_extent+0x230/0x230 ? lock_downgrade+0x3d0/0x3d0 ? read_extent_buffer+0xdd/0x110 ? lock_downgrade+0x3d0/0x3d0 ? __kasan_check_read+0x11/0x20 ? lock_acquired+0xbb/0x600 ? __kasan_check_write+0x14/0x20 ? _raw_spin_unlock+0x22/0x30 ? __kasan_check_write+0x14/0x20 iterate_inodes_from_logical+0x129/0x170 ? iterate_inodes_from_logical+0x129/0x170 ? btrfs_inode_flags_to_xflags+0x50/0x50 ? iterate_extent_inodes+0x580/0x580 ? __vmalloc_node+0x92/0xb0 ? init_data_container+0x34/0xb0 ? init_data_container+0x34/0xb0 ? kvmalloc_node+0x60/0x80 btrfs_ioctl_logical_to_ino+0x158/0x230 btrfs_ioctl+0x205e/0x4040 ? __might_sleep+0x71/0xe0 ? btrfs_ioctl_get_supported_features+0x30/0x30 ? getrusage+0x4b6/0x9c0 ? __kasan_check_read+0x11/0x20 ? lock_release+0xc8/0x620 ? __might_fault+0x64/0xd0 ? lock_acquire+0xc7/0x510 ? lock_downgrade+0x3d0/0x3d0 ? lockdep_hardirqs_on_prepare+0x210/0x210 ? lockdep_hardirqs_on_prepare+0x210/0x210 ? __kasan_check_read+0x11/0x20 ? do_vfs_ioctl+0xfc/0x9d0 ? ioctl_file_clone+0xe0/0xe0 ? lock_downgrade+0x3d0/0x3d0 ? lockdep_hardirqs_on_prepare+0x210/0x210 ? __kasan_check_read+0x11/0x20 ? lock_release+0xc8/0x620 ? __task_pid_nr_ns+0xd3/0x250 ? lock_acquire+0xc7/0x510 ? __fget_files+0x160/0x230 ? __fget_light+0xf2/0x110 __x64_sys_ioctl+0xc3/0x100 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fd1976e2427 Code: 00 00 90 48 8b 05 (...) RSP: 002b:00007fd1955e5cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fd1955e5f40 RCX: 00007fd1976e2427 RDX: 00007fd1955e5f48 RSI: 00000000c038943b RDI: 0000000000000004 RBP: 0000000001000000 R08: 0000000000000000 R09: 00007fd1955e6120 R10: 0000557835366b00 R11: 0000000000000246 R12: 0000000000000004 R13: 00007fd1955e5f48 R14: 00007fd1955e5f40 R15: 00007fd1955e5ef8 Modules linked in: ---[ end trace ec8931a1c36e57be ]--- (gdb) l *(__tree_mod_log_rewind+0x3b1) 0xffffffff81893521 is in __tree_mod_log_rewind (fs/btrfs/ctree.c:1210). 1205 * the modification. as we're going backwards, we do the 1206 * opposite of each operation here. 1207 */ 1208 switch (tm->op) { 1209 case MOD_LOG_KEY_REMOVE_WHILE_FREEING: 1210 BUG_ON(tm->slot < n); 1211 fallthrough; 1212 case MOD_LOG_KEY_REMOVE_WHILE_MOVING: 1213 case MOD_LOG_KEY_REMOVE: 1214 btrfs_set_node_key(eb, &tm->key, tm->slot); Here's what happens to hit that BUG_ON(): 1) We have one tree mod log user (through fiemap or the logical ino ioctl), with a sequence number of 1, so we have fs_info->tree_mod_seq == 1; 2) Another task is at ctree.c:balance_level() and we have eb X currently as the root of the tree, and we promote its single child, eb Y, as the new root. Then, at ctree.c:balance_level(), we call: tree_mod_log_insert_root(eb X, eb Y, 1); 3) At tree_mod_log_insert_root() we create tree mod log elements for each slot of eb X, of operation type MOD_LOG_KEY_REMOVE_WHILE_FREEING each with a ->logical pointing to ebX->start. These are placed in an array named tm_list. Lets assume there are N elements (N pointers in eb X); 4) Then, still at tree_mod_log_insert_root(), we create a tree mod log element of operation type MOD_LOG_ROOT_REPLACE, ->logical set to ebY->start, ->old_root.logical set to ebX->start, ->old_root.level set to the level of eb X and ->generation set to the generation of eb X; 5) Then tree_mod_log_insert_root() calls tree_mod_log_free_eb() with tm_list as argument. After that, tree_mod_log_free_eb() calls __tree_mod_log_insert() for each member of tm_list in reverse order, from highest slot in eb X, slot N - 1, to slot 0 of eb X; 6) __tree_mod_log_insert() sets the sequence number of each given tree mod log operation - it increments fs_info->tree_mod_seq and sets fs_info->tree_mod_seq as the sequence number of the given tree mod log operation. This means that for the tm_list created at tree_mod_log_insert_root(), the element corresponding to slot 0 of eb X has the highest sequence number (1 + N), and the element corresponding to the last slot has the lowest sequence number (2); 7) Then, after inserting tm_list's elements into the tree mod log rbtree, the MOD_LOG_ROOT_REPLACE element is inserted, which gets the highest sequence number, which is N + 2; 8) Back to ctree.c:balance_level(), we free eb X by calling btrfs_free_tree_block() on it. Because eb X was created in the current transaction, has no other references and writeback did not happen for it, we add it back to the free space cache/tree; 9) Later some other task T allocates the metadata extent from eb X, since it is marked as free space in the space cache/tree, and uses it as a node for some other btree; 10) The tree mod log user task calls btrfs_search_old_slot(), which calls get_old_root(), and finally that calls __tree_mod_log_oldest_root() with time_seq == 1 and eb_root == eb Y; 11) First iteration of the while loop finds the tree mod log element with sequence number N + 2, for the logical address of eb Y and of type MOD_LOG_ROOT_REPLACE; 12) Because the operation type is MOD_LOG_ROOT_REPLACE, we don't break out of the loop, and set root_logical to point to tm->old_root.logical which corresponds to the logical address of eb X; 13) On the next iteration of the while loop, the call to tree_mod_log_search_oldest() returns the smallest tree mod log element for the logical address of eb X, which has a sequence number of 2, an operation type of MOD_LOG_KEY_REMOVE_WHILE_FREEING and corresponds to the old slot N - 1 of eb X (eb X had N items in it before being freed); 14) We then break out of the while loop and return the tree mod log operation of type MOD_LOG_ROOT_REPLACE (eb Y), and not the one for slot N - 1 of eb X, to get_old_root(); 15) At get_old_root(), we process the MOD_LOG_ROOT_REPLACE operation and set "logical" to the logical address of eb X, which was the old root. We then call tree_mod_log_search() passing it the logical address of eb X and time_seq == 1; 16) Then before calling tree_mod_log_search(), task T adds a key to eb X, which results in adding a tree mod log operation of type MOD_LOG_KEY_ADD to the tree mod log - this is done at ctree.c:insert_ptr() - but after adding the tree mod log operation and before updating the number of items in eb X from 0 to 1... 17) The task at get_old_root() calls tree_mod_log_search() and gets the tree mod log operation of type MOD_LOG_KEY_ADD just added by task T. Then it enters the following if branch: if (old_root && tm && tm->op != MOD_LOG_KEY_REMOVE_WHILE_FREEING) { (...) } (...) Calls read_tree_block() for eb X, which gets a reference on eb X but does not lock it - task T has it locked. Then it clones eb X while it has nritems set to 0 in its header, before task T sets nritems to 1 in eb X's header. From hereupon we use the clone of eb X which no other task has access to; 18) Then we call __tree_mod_log_rewind(), passing it the MOD_LOG_KEY_ADD mod log operation we just got from tree_mod_log_search() in the previous step and the cloned version of eb X; 19) At __tree_mod_log_rewind(), we set the local variable "n" to the number of items set in eb X's clone, which is 0. Then we enter the while loop, and in its first iteration we process the MOD_LOG_KEY_ADD operation, which just decrements "n" from 0 to (u32)-1, since "n" is declared with a type of u32. At the end of this iteration we call rb_next() to find the next tree mod log operation for eb X, that gives us the mod log operation of type MOD_LOG_KEY_REMOVE_WHILE_FREEING, for slot 0, with a sequence number of N + 1 (steps 3 to 6); 20) Then we go back to the top of the while loop and trigger the following BUG_ON(): (...) switch (tm->op) { case MOD_LOG_KEY_REMOVE_WHILE_FREEING: BUG_ON(tm->slot < n); fallthrough; (...) Because "n" has a value of (u32)-1 (4294967295) and tm->slot is 0. Fix this by taking a read lock on the extent buffer before cloning it at ctree.c:get_old_root(). This should be done regardless of the extent buffer having been freed and reused, as a concurrent task might be modifying it (while holding a write lock on it). Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Link: https://lore.kernel.org/linux-btrfs/20210227155037.GN28049@hungrycats.org/ Fixes: 834328a ("Btrfs: tree mod log's old roots could still be part of the tree") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: fix slab cache flags for free space tree bitmap
The free space tree bitmap slab cache is created with SLAB_RED_ZONE but that's a debugging flag and not always enabled. Also the other slabs are created with at least SLAB_MEM_SPREAD that we want as well to average the memory placement cost. Reported-by: Vlastimil Babka <vbabka@suse.cz> Fixes: 3acd485 ("btrfs: fix allocation of free space cache v1 bitmap pages") CC: stable@vger.kernel.org # 5.4+ Signed-off-by: David Sterba <dsterba@suse.com>
-
Merge tag 'fuse-fixes-5.12-rc4' of git://git.kernel.org/pub/scm/linux…
…/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "Fix a deadlock and a couple of other bugs" * tag 'fuse-fixes-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: 32-bit user space ioctl compat for fuse device virtiofs: Fail dax mount if device does not support it fuse: fix live lock in fuse_iget()
-
Merge tag 'nfsd-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/…
…git/cel/linux Pull nfsd fixes from Chuck Lever: "Miscellaneous NFSD fixes for v5.12-rc" * tag 'nfsd-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: svcrdma: Revert "svcrdma: Reduce Receive doorbell rate" NFSD: fix error handling in NFSv4.0 callbacks NFSD: fix dest to src mount in inter-server COPY Revert "nfsd4: a client's own opens needn't prevent delegations" Revert "nfsd4: remove check_conflicting_opens warning" rpc: fix NULL dereference on kmalloc failure sunrpc: fix refcount leak for rpc auth modules NFSD: Repair misuse of sv_lock in 5.10.16-rt30. nfsd: don't abort copies early fs: nfsd: fix kconfig dependency warning for NFSD_V4 svcrdma: disable timeouts on rdma backchannel nfsd: Don't keep looking up unhashed files in the nfsd file cache
-
vfio/type1: fix vaddr_get_pfns() return in vfio_pin_page_external()
vaddr_get_pfns() now returns the positive number of pfns successfully gotten instead of zero. vfio_pin_page_external() might return 1 to vfio_iommu_type1_pin_pages(), which will treat it as an error, if vaddr_get_pfns() is successful but vfio_pin_page_external() doesn't reach vfio_lock_acct(). Fix it up in vfio_pin_page_external(). Found by inspection. Fixes: be16c1f ("vfio/type1: Change success value of vaddr_get_pfn()") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Message-Id: <20210308172452.38864-1-daniel.m.jordan@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
VFIO_IOMMU_TYPE1 does not compile with !MMU: ../drivers/vfio/vfio_iommu_type1.c: In function 'follow_fault_pfn': ../drivers/vfio/vfio_iommu_type1.c:536:22: error: implicit declaration of function 'pte_write'; did you mean 'vfs_write'? [-Werror=implicit-function-declaration] So require it. Suggested-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <0-v1-02cb5500df6e+78-vfio_no_mmu_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
ARM: amba: Allow some ARM_AMBA users to compile with COMPILE_TEST
CONFIG_VFIO_AMBA has a light use of AMBA, adding some inline fallbacks when AMBA is disabled will allow it to be compiled under COMPILE_TEST and make VFIO easier to maintain. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <3-v1-df057e0f92c3+91-vfio_arm_compile_test_jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
vfio-platform: Add COMPILE_TEST to VFIO_PLATFORM
x86 can build platform bus code too, so vfio-platform and all the platform reset implementations compile successfully on x86. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <2-v1-df057e0f92c3+91-vfio_arm_compile_test_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
vfio: IOMMU_API should be selected
As IOMMU_API is a kconfig without a description (eg does not show in the menu) the correct operator is select not 'depends on'. Using 'depends on' for this kind of symbol means VFIO is not selectable unless some other random kconfig has already enabled IOMMU_API for it. Fixes: cba3345 ("vfio: VFIO core") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <1-v1-df057e0f92c3+91-vfio_arm_compile_test_jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
vfio/type1: fix unmap all on ILP32
Some ILP32 architectures support mapping a 32-bit vaddr within a 64-bit iova space. The unmap-all code uses 32-bit SIZE_MAX as an upper bound on the extent of the mappings within iova space, so mappings above 4G cannot be found and unmapped. Use U64_MAX instead, and use u64 for size variables. This also fixes a static analysis bug found by the kernel test robot running smatch for ILP32. Fixes: 0f53afa ("vfio/type1: unmap cleanup") Fixes: c196509 ("vfio/type1: implement unmap all") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Message-Id: <1614281102-230747-1-git-send-email-steven.sistare@oracle.com> Link: https://lore.kernel.org/linux-mm/20210222141043.GW2222@kadam Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
fuse: 32-bit user space ioctl compat for fuse device
With a 64-bit kernel build the FUSE device cannot handle ioctl requests coming from 32-bit user space. This is due to the ioctl command translation that generates different command identifiers that thus cannot be used for direct comparisons without proper manipulation. Explicitly extract type and number from the ioctl command to enable 32-bit user space compatibility on 64-bit kernel builds. Signed-off-by: Alessio Balsini <balsini@android.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
btrfs: subpage: make readahead work properly
In readahead infrastructure, we are using a lot of hard coded PAGE_SHIFT while we're not doing anything specific to PAGE_SIZE. One of the most affected part is the radix tree operation of btrfs_fs_info::reada_tree. If using PAGE_SHIFT, subpage metadata readahead is broken and does no help reading metadata ahead. Fix the problem by using btrfs_fs_info::sectorsize_bits so that readahead could work for subpage. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
btrfs: subpage: fix wild pointer access during metadata read failure
[BUG] When running fstests for btrfs subpage read-write test, it has a very high chance to crash at generic/475 with the following stack: BTRFS warning (device dm-8): direct IO failed ino 510 rw 1,34817 sector 0xcdf0 len 94208 err no 10 Unable to handle kernel paging request at virtual address ffff80001157e7c0 CPU: 2 PID: 687125 Comm: kworker/u12:4 Tainted: G WC 5.12.0-rc2-custom+ #5 Hardware name: Khadas VIM3 (DT) Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs] pc : queued_spin_lock_slowpath+0x1a0/0x390 lr : do_raw_spin_lock+0xc4/0x11c Call trace: queued_spin_lock_slowpath+0x1a0/0x390 _raw_spin_lock+0x68/0x84 btree_readahead_hook+0x38/0xc0 [btrfs] end_bio_extent_readpage+0x504/0x5f4 [btrfs] bio_endio+0x170/0x1a4 end_workqueue_fn+0x3c/0x60 [btrfs] btrfs_work_helper+0x1b0/0x1b4 [btrfs] process_one_work+0x22c/0x430 worker_thread+0x70/0x3a0 kthread+0x13c/0x140 ret_from_fork+0x10/0x30 Code: 910020e0 8b0200c2 f861d884 aa0203e1 (f8246827) [CAUSE] In end_bio_extent_readpage(), if we hit an error during read, we will handle the error differently for data and metadata. For data we queue a repair, while for metadata, we record the error and let the caller choose what to do. But the code is still using page->private to grab extent buffer, which no longer points to extent buffer for subpage metadata pages. Thus this wild pointer access leads to above crash. [FIX] Introduce a helper, find_extent_buffer_readpage(), to grab extent buffer. The difference against find_extent_buffer_nospinlock() is: - Also handles regular sectorsize == PAGE_SIZE case - No extent buffer refs increase/decrease As extent buffer under IO must have non-zero refs, so this is safe Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>