Abhishek-Sahu/…
Commits on Jan 24, 2022
-
vfio/pci: add the support for PCI D3cold state
Currently, if the runtime power management is enabled for vfio-pci device in the guest OS, then guest OS will do the register write for PCI_PM_CTRL register. This write request will be handled in vfio_pm_config_write() where it will do the actual register write of PCI_PM_CTRL register. With this, the maximum D3hot state can be achieved for low power. If we can use the runtime PM framework, then we can achieve the D3cold state which will help in saving maximum power. 1. Since D3cold state can't be achieved by writing PCI standard PM config registers, so this patch adds a new IOCTL which change the PCI device from D3hot to D3cold state and then D3cold to D0 state. 2. The hypervisors can implement virtual ACPI methods. For example, in guest linux OS if PCI device ACPI node has _PR3 and _PR0 power resources with _ON/_OFF method, then guest linux OS makes the _OFF call during D3cold transition and then _ON during D0 transition. The hypervisor can tap these virtual ACPI calls and then do the D3cold related IOCTL in the vfio driver. 3. The vfio driver uses runtime PM framework to achieve the D3cold state. For the D3cold transition, decrement the usage count and during D0 transition increment the usage count. 4. For D3cold, the device current power state should be D3hot. Then during runtime suspend, the pci_platform_power_transition() is required for D3cold state. If the D3cold state is not supported, then the device will still be in D3hot state. But with the runtime PM, the root port can now also go into suspended state. 5. For most of the systems, the D3cold is supported at the root port level. So, when root port will transition to D3cold state, then the vfio PCI device will go from D3hot to D3cold state during its runtime suspend. If root port does not support D3cold, then the root will go into D3hot state. 6. The runtime suspend callback can now happen for 2 cases: there is no user of vfio device and the case where user has initiated D3cold. The 'runtime_suspend_pending' flag can help to distinguish this case. 7. There are cases where guest has put PCI device into D3cold state and then on the host side, user has run lspci or any other command which requires access of the PCI config register. In this case, the kernel runtime PM framework will resume the PCI device internally, read the config space and put the device into D3cold state again. Some PCI device needs the SW involvement before going into D3cold state. For the first D3cold state, the driver running in guest side does the SW side steps. But the second D3cold transition will be without guest driver involvement. So, prevent this second d3cold transition by incrementing the device usage count. This will make the device unnecessary in D0 but it's better than failure. In future, we can some mechanism by which we can forward these wake-up request to guest and then the mentioned case can be handled also. 8. In D3cold, all kind of BAR related access needs to be disabled like D3hot. Additionally, the config space will also be disabled in D3cold state. To prevent access of config space in the D3cold state, increment the runtime PM usage count before doing any config space access. Also, most of the IOCTLs do the config space access, so maintain one safe list and skip the resume only for these safe IOCTLs alone. For other IOCTLs, the runtime PM usage count will be incremented first. 9. Now, runtime suspend/resume callbacks need to get the vdev reference which can be obtained by dev_get_drvdata(). Currently, the dev_set_drvdata() is being set after returning from vfio_pci_core_register_device(). The runtime callbacks can come anytime after enabling runtime PM so dev_set_drvdata() must happen before that. We can move dev_set_drvdata() inside vfio_pci_core_register_device() itself. 10. The vfio device user can close the device after putting the device into runtime suspended state so inside vfio_pci_core_disable(), increment the runtime PM usage count. 11. Runtime PM will be possible only if CONFIG_PM is enabled on the host. So, the IOCTL related code can be put under CONFIG_PM Kconfig. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> -
vfio/pci: Invalidate mmaps and block the access in D3hot power state
According to [PCIe v5 5.3.1.4.1] for D3hot state "Configuration and Message requests are the only TLPs accepted by a Function in the D3Hot state. All other received Requests must be handled as Unsupported Requests, and all received Completions may optionally be handled as Unexpected Completions." Currently, if the vfio PCI device has been put into D3hot state and if user makes non-config related read/write request in D3hot state, these requests will be forwarded to the host and this access may cause issues on a few systems. This patch leverages the memory-disable support added in commit 'abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")' to generate page fault on mmap access and return error for the direct read/write. If the device is D3hot state, then the error needs to be returned for all kinds of BAR related access (memory, IO and ROM). Also, the power related structure fields need to be protected so we can use the same 'memory_lock' to protect these fields also. For the few cases, this 'memory_lock' will be already acquired by callers so introduce a separate function vfio_pci_set_power_state_locked(). The original vfio_pci_set_power_state() now contains the code to do the locking related operations. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> -
vfio/pci: fix memory leak during D3hot to D0 tranistion
If needs_pm_restore is set (PCI device does not have support for no soft reset), then the current PCI state will be saved during D0->D3hot transition and same will be restored back during D3hot->D0 transition. For saving the PCI state locally, pci_store_saved_state() is being used and the pci_load_and_free_saved_state() will free the allocated memory. But for reset related IOCTLs, vfio driver calls PCI reset related API's which will internally change the PCI power state back to D0. So, when the guest resumes, then it will get the current state as D0 and it will skip the call to vfio_pci_set_power_state() for changing the power state to D0 explicitly. In this case, the memory pointed by pm_save will never be freed. Also, in malicious sequence, the state changing to D3hot followed by VFIO_DEVICE_RESET/VFIO_DEVICE_PCI_HOT_RESET can be run in loop and it can cause an OOM situation. This patch stores the power state locally and uses the same for comparing the current power state. For the places where D0 transition can happen, call vfio_pci_set_power_state() to transition to D0 state. Since the vfio power state is still D3hot, so this D0 transition will help in running the logic required from D3hot->D0 transition. Also, to prevent any miss during future development to detect this condition, this patch puts a check and frees the memory after printing warning. This locally saved power state will help in subsequent patches also. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
-
vfio/pci: virtualize PME related registers bits and initialize to zero
If any PME event will be generated by PCI, then it will be mostly handled in the host by the root port PME code. For example, in the case of PCIe, the PME event will be sent to the root port and then the PME interrupt will be generated. This will be handled in drivers/pci/pcie/pme.c at the host side. Inside this, the pci_check_pme_status() will be called where PME_Status and PME_En bits will be cleared. So, the guest OS which is using vfio-pci device will not come to know about this PME event. To handle these PME events inside guests, we need some framework so that if any PME events will happen, then it needs to be forwarded to virtual machine monitor. We can virtualize PME related registers bits and initialize these bits to zero so vfio-pci device user will assume that it is not capable of asserting the PME# signal from any power state. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
-
vfio/pci: register vfio-pci driver with runtime PM framework
Currently, there is very limited power management support available in the upstream vfio-pci driver. If there is no user of vfio-pci device, then the PCI device will be moved into D3Hot state by writing directly into PCI PM registers. This D3Hot state help in saving power but we can achieve zero power consumption if we go into the D3cold state. The D3cold state cannot be possible with native PCI PM. It requires interaction with platform firmware which is system-specific. To go into low power states (including D3cold), the runtime PM framework can be used which internally interacts with PCI and platform firmware and puts the device into the lowest possible D-States. This patch registers vfio-pci driver with the runtime PM framework. 1. The PCI core framework takes care of most of the runtime PM related things. For enabling the runtime PM, the PCI driver needs to decrement the usage count and needs to register the runtime suspend/resume callbacks. For vfio-pci based driver, these callback routines can be stubbed in this patch since the vfio-pci driver is not doing the PCI device initialization. All the config state saving, and PCI power management related things will be done by PCI core framework itself inside its runtime suspend/resume callbacks. 2. Inside pci_reset_bus(), all the devices in bus/slot will be moved out of D0 state. This state change to D0 can happen directly without going through the runtime PM framework. So if runtime PM is enabled, then pm_runtime_resume() makes the runtime state active. Since the PCI device power state is already D0, so it should return early when it tries to change the state with pci_set_power_state(). Then pm_request_idle() can be used which will internally check for device usage count and will move the device again into the low power state. 3. Inside vfio_pci_core_disable(), the device usage count always needs to be decremented which was incremented in vfio_pci_core_enable(). 4. Since the runtime PM framework will provide the same functionality, so directly writing into PCI PM config register can be replaced with the use of runtime PM routines. Also, the use of runtime PM can help us in more power saving. In the systems which do not support D3Cold, With the existing implementation: // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3hot // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D0 With runtime PM: // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3hot // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D3hot So, with runtime PM, the upstream bridge or root port will also go into lower power state which is not possible with existing implementation. In the systems which support D3Cold, // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3hot // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D0 With runtime PM: // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3cold // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D3cold So, with runtime PM, both the PCI device and upstream bridge will go into D3cold state. 5. If 'disable_idle_d3' module parameter is set, then also the runtime PM will be enabled, but in this case, the usage count should not be decremented. 6. vfio_pci_dev_set_try_reset() return value is unused now, so this function return type can be changed to void. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Commits on Dec 21, 2021
-
vfio/iommu_type1: replace kfree with kvfree
Variables allocated by kvzalloc should not be freed by kfree. Because they may be allocated by vmalloc. So we replace kfree with kvfree here. Fixes: d6a4c18 ("vfio iommu: Implementation of ioctl for dirty pages tracking") Signed-off-by: Jiacheng Shi <billsjc@sjtu.edu.cn> Link: https://lore.kernel.org/r/20211212091600.2560-1-billsjc@sjtu.edu.cn Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
vfio/pci: Resolve sparse endian warnings in IGD support
Sparse warns: sparse warnings: (new ones prefixed by >>) >> drivers/vfio/pci/vfio_pci_igd.c:146:21: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned short [addressable] [usertype] val @@ got restricted __le16 [usertype] @@ drivers/vfio/pci/vfio_pci_igd.c:146:21: sparse: expected unsigned short [addressable] [usertype] val drivers/vfio/pci/vfio_pci_igd.c:146:21: sparse: got restricted __le16 [usertype] >> drivers/vfio/pci/vfio_pci_igd.c:161:21: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned int [addressable] [usertype] val @@ got restricted __le32 [usertype] @@ drivers/vfio/pci/vfio_pci_igd.c:161:21: sparse: expected unsigned int [addressable] [usertype] val drivers/vfio/pci/vfio_pci_igd.c:161:21: sparse: got restricted __le32 [usertype] drivers/vfio/pci/vfio_pci_igd.c:176:21: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned short [addressable] [usertype] val @@ got restricted __le16 [usertype] @@ drivers/vfio/pci/vfio_pci_igd.c:176:21: sparse: expected unsigned short [addressable] [usertype] val drivers/vfio/pci/vfio_pci_igd.c:176:21: sparse: got restricted __le16 [usertype] These are due to trying to use an unsigned to store the result of a cpu_to_leXX() conversion. These are small variables, so pointer tricks are wasteful and casting just generates different sparse warnings. Store to and copy results from a separate little endian variable. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/202111290026.O3vehj03-lkp@intel.com/ Link: https://lore.kernel.org/r/163840226123.138003.7668320168896210328.stgit@omen Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Commits on Dec 19, 2021
-
-
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini: "Two small fixes, one of which was being worked around in selftests" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Retry page fault if MMU reload is pending and root has no sp KVM: selftests: vmx_pmu_msrs_test: Drop tests mangling guest visible CPUIDs KVM: x86: Drop guest CPUID check for host initiated writes to MSR_IA32_PERF_CAPABILITIES
-
Merge tag 'block-5.16-2021-12-19' of git://git.kernel.dk/linux-block
Pull block revert from Jens Axboe: "It turns out that the fix for not hammering on the delayed work timer too much caused a performance regression for BFQ, so let's revert the change for now. I've got some ideas on how to fix it appropriately, but they should wait for 5.17" * tag 'block-5.16-2021-12-19' of git://git.kernel.dk/linux-block: Revert "block: reduce kblockd_mod_delayed_work_on() CPU consumption"
-
Merge tag 'irq_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/…
…linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Clear the PCI_MSIX_FLAGS_MASKALL bit too on the error path so that it is restored to its reset state - Mask MSI-X vectors late on the init path in order to handle out-of-spec Marvell NVME devices which apparently look at the MSI-X mask even when MSI-X is disabled * tag 'irq_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error PCI/MSI: Mask MSI-X vectors only on success
-
Merge tag 'timers_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/s…
…cm/linux/kernel/git/tip/tip Pull timer fix from Borislav Petkov: - Make sure the CLOCK_REALTIME to CLOCK_MONOTONIC offset is never positive * tag 'timers_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Really make sure wall_to_monotonic isn't positive
-
Merge tag 'locking_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/…
…scm/linux/kernel/git/tip/tip Pull locking fix from Borislav Petkov: - Fix the rtmutex condition checking when the optimistic spinning of a waiter needs to be terminated * tag 'locking_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner()
-
Merge tag 'core_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm…
…/linux/kernel/git/tip/tip Pull signal handlign fix from Borislav Petkov: - Prevent lock contention on the new sigaltstack lock on the common-case path, when no changes have been made to the alternative signal stack. * tag 'core_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: signal: Skip the altstack update when not needed
-
Merge tag 'mips-fixes_5.16_3' of git://git.kernel.org/pub/scm/linux/k…
…ernel/git/mips/linux Pull MIPS fix from Thomas Bogendoerfer: - only enable pci_remap_iospace() for Ralink devices * tag 'mips-fixes_5.16_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: Only define pci_remap_iospace() for Ralink
-
Merge tag 'powerpc-5.16-4' of git://git.kernel.org/pub/scm/linux/kern…
…el/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Fix a recently introduced oops at boot on 85xx in some configurations. Fix crashes when loading some livepatch modules with STRICT_MODULE_RWX. Thanks to Joe Lawrence, Russell Currey, and Xiaoming Ni" * tag 'powerpc-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/module_64: Fix livepatching for RO modules powerpc/85xx: Fix oops when CONFIG_FSL_PMC=n
-
Merge tag '5.16-rc5-smb3-client-fixes' of git://git.samba.org/sfrench…
…/cifs-2.6 Pull cifs fixes from Steve French: "Two cifs/smb3 fixes, one fscache related, and one mount parsing related for stable" * tag '5.16-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: sanitize multiple delimiters in prepath cifs: ignore resource_id while getting fscache super cookie
-
KVM: x86: Retry page fault if MMU reload is pending and root has no sp
Play nice with a NULL shadow page when checking for an obsolete root in the page fault handler by flagging the page fault as stale if there's no shadow page associated with the root and KVM_REQ_MMU_RELOAD is pending. Invalidating memslots, which is the only case where _all_ roots need to be reloaded, requests all vCPUs to reload their MMUs while holding mmu_lock for lock. The "special" roots, e.g. pae_root when KVM uses PAE paging, are not backed by a shadow page. Running with TDP disabled or with nested NPT explodes spectaculary due to dereferencing a NULL shadow page pointer. Skip the KVM_REQ_MMU_RELOAD check if there is a valid shadow page for the root. Zapping shadow pages in response to guest activity, e.g. when the guest frees a PGD, can trigger KVM_REQ_MMU_RELOAD even if the current vCPU isn't using the affected root. I.e. KVM_REQ_MMU_RELOAD can be seen with a completely valid root shadow page. This is a bit of a moot point as KVM currently unloads all roots on KVM_REQ_MMU_RELOAD, but that will be cleaned up in the future. Fixes: a955cad ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update") Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211209060552.2956723-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: selftests: vmx_pmu_msrs_test: Drop tests mangling guest visible …
…CPUIDs Host initiated writes to MSR_IA32_PERF_CAPABILITIES should not depend on guest visible CPUIDs and (incorrect) KVM logic implementing it is about to change. Also, KVM_SET_CPUID{,2} after KVM_RUN is now forbidden and causes test to fail. Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: feb627e ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211216165213.338923-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> -
KVM: x86: Drop guest CPUID check for host initiated writes to MSR_IA3…
…2_PERF_CAPABILITIES The ability to write to MSR_IA32_PERF_CAPABILITIES from the host should not depend on guest visible CPUID entries, even if just to allow creating/restoring guest MSRs and CPUIDs in any sequence. Fixes: 27461da ("KVM: x86/pmu: Support full width counting") Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211216165213.338923-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Revert "block: reduce kblockd_mod_delayed_work_on() CPU consumption"
This reverts commit cb2ac29. Alex and the kernel test robot report that this causes a significant performance regression with BFQ. I can reproduce that result, so let's revert this one as we're close to -rc6 and we there's no point in trying to rush a fix. Link: https://lore.kernel.org/linux-block/1639853092.524jxfaem2.none@localhost/ Link: https://lore.kernel.org/lkml/20211219141852.GH14057@xsang-OptiPlex-9020/ Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commits on Dec 18, 2021
-
Merge tag 'tty-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel…
…/git/gregkh/tty Pull tty/serial fixes from Greg KH: "Here are two small tty/serial fixes for 5.16-rc6. They include: - n_hdlc fix for syzbot reported problem that you were previously copied on. - 8250_fintek driver fix that resolved a console problem by removing a previous change. Both have been in linux-next with no reported issues" * tag 'tty-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: 8250_fintek: Fix garbled text for console tty: n_hdlc: make n_hdlc_tty_wakeup() asynchronous -
Merge tag 'usb-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel…
…/git/gregkh/usb Pull USB fixes from Greg KH: "Here are a number of small USB driver fixes for reported problems. They include: - dwc2 driver fixes - xhci driver fixes - cdnsp driver fixes - typec driver fix - gadget u_ether driver fix - new quirk additions - usb gadget endpoint calculation fix - usb serial new device ids - revert of a xhci-dbg change that broke early debug booting All changes, except for the revert, have been in linux-next with no reported problems. The revert was from yesterday, and it was reported by the developers affected that it resolved their problem" * tag 'usb-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: Revert "usb: early: convert to readl_poll_timeout_atomic()" usb: typec: tcpm: fix tcpm unregister port but leave a pending timer usb: cdnsp: Fix lack of spin_lock_irqsave/spin_lock_restore USB: NO_LPM quirk Lenovo USB-C to Ethernet Adapher(RTL8153-04) usb: xhci: Extend support for runtime power management for AMD's Yellow carp. usb: dwc2: fix STM ID/VBUS detection startup delay in dwc2_driver_probe USB: gadget: bRequestType is a bitfield, not a enum USB: serial: option: add Telit FN990 compositions USB: serial: cp210x: fix CP2105 GPIO registration usb: cdnsp: Fix incorrect status for control request usb: cdnsp: Fix issue in cdnsp_log_ep trace event usb: cdnsp: Fix incorrect calling of cdnsp_died function usb: xhci-mtk: fix list_del warning when enable list debug usb: gadget: u_ether: fix race in setting MAC address in setup phase
-
Merge tag 'perf-tools-fixes-for-v5.16-2021-12-18' of git://git.kernel…
….org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix segfaults in 'perf inject' related to usage of unopened files - The return value of hashmap__new() should be checked using IS_ERR() * tag 'perf-tools-fixes-for-v5.16-2021-12-18' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf inject: Fix segfault due to perf_data__fd() without open perf inject: Fix segfault due to close without open perf expr: Fix missing check for return value of hashmap__new()
-
perf inject: Fix segfault due to perf_data__fd() without open
The fixed commit attempts to get the output file descriptor even if the file was never opened e.g. $ perf record uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ] $ perf inject -i perf.data --vm-time-correlation=dry-run Segmentation fault (core dumped) $ gdb --quiet perf Reading symbols from perf... (gdb) r inject -i perf.data --vm-time-correlation=dry-run Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. __GI___fileno (fp=0x0) at fileno.c:35 35 fileno.c: No such file or directory. (gdb) bt #0 __GI___fileno (fp=0x0) at fileno.c:35 #1 0x00005621e48dd987 in perf_data__fd (data=0x7fff4c68bd08) at util/data.h:72 #2 perf_data__fd (data=0x7fff4c68bd08) at util/data.h:69 #3 cmd_inject (argc=<optimized out>, argv=0x7fff4c69c1f0) at builtin-inject.c:1017 #4 0x00005621e4936783 in run_builtin (p=0x5621e4ee6878 <commands+600>, argc=4, argv=0x7fff4c69c1f0) at perf.c:313 #5 0x00005621e4897d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365 torvalds#6 run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409 torvalds#7 main (argc=4, argv=0x7fff4c69c1f0) at perf.c:539 (gdb) Fixes: 0ae0389 ("perf tools: Pass a fd to perf_file_header__read_pipe()") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20211213084829.114772-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
perf inject: Fix segfault due to close without open
The fixed commit attempts to close inject.output even if it was never opened e.g. $ perf record uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ] $ perf inject -i perf.data --vm-time-correlation=dry-run Segmentation fault (core dumped) $ gdb --quiet perf Reading symbols from perf... (gdb) r inject -i perf.data --vm-time-correlation=dry-run Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. 0x00007eff8afeef5b in _IO_new_fclose (fp=0x0) at iofclose.c:48 48 iofclose.c: No such file or directory. (gdb) bt #0 0x00007eff8afeef5b in _IO_new_fclose (fp=0x0) at iofclose.c:48 #1 0x0000557fc7b74f92 in perf_data__close (data=data@entry=0x7ffcdafa6578) at util/data.c:376 #2 0x0000557fc7a6b807 in cmd_inject (argc=<optimized out>, argv=<optimized out>) at builtin-inject.c:1085 #3 0x0000557fc7ac4783 in run_builtin (p=0x557fc8074878 <commands+600>, argc=4, argv=0x7ffcdafb6a60) at perf.c:313 #4 0x0000557fc7a25d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365 #5 run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409 torvalds#6 main (argc=4, argv=0x7ffcdafb6a60) at perf.c:539 (gdb) Fixes: 02e6246 ("perf inject: Close inject.output on exit") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20211213084829.114772-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
perf expr: Fix missing check for return value of hashmap__new()
The hashmap__new() function may return ERR_PTR(-ENOMEM) when malloc() fails, add IS_ERR() checking for ctx->ids. Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20211212062504.25841-1-linmq006@gmail.com [ s/kfree()/free()/ and add missing linux/err.h include ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner()
Optimistic spinning needs to be terminated when the spinning waiter is not longer the top waiter on the lock, but the condition is negated. It terminates if the waiter is the top waiter, which is defeating the whole purpose. Fixes: c3123c4 ("locking/rtmutex: Dont dereference waiter lockless") Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211217074207.77425-1-qiang1.zhang@intel.com
-
Merge tag 'libata-5.16-rc6' of git://git.kernel.org/pub/scm/linux/ker…
…nel/git/dlemoal/libata Pull libata fix from Damien Le Moal: "A single fix for this cycle: - Check that ATA16 passthrough commands that do not transfer any data have a DMA direction set to DMA_NONE (From George)" * tag 'libata-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: libata: if T_LENGTH is zero, dma direction should be DMA_NONE -
Merge tag 'zonefs-5.16-rc6' of git://git.kernel.org/pub/scm/linux/ker…
…nel/git/dlemoal/zonefs Pull zonefs fixes from Damien Le Moal: "One fix and one trivial update for rc6: - Add MODULE_ALIAS_FS to get automatic module loading on mount (Naohiro) - Update Damien's email address in the MAINTAINERS file (me)" * tag 'zonefs-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: MAITAINERS: Change zonefs maintainer email address zonefs: add MODULE_ALIAS_FS -
cifs: sanitize multiple delimiters in prepath
mount.cifs can pass a device with multiple delimiters in it. This will cause rename(2) to fail with ENOENT. V2: - Make sanitize_path more readable. - Fix multiple delimiters between UNC and prepath. - Avoid a memory leak if a bad user starts putting a lot of delimiters in the path on purpose. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=2031200 Fixes: 24e0a1e ("cifs: switch to new mount api") Cc: stable@vger.kernel.org # 5.11+ Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Thiago Rafael Becker <trbecker@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com> -
cifs: ignore resource_id while getting fscache super cookie
We have a cyclic dependency between fscache super cookie and root inode cookie. The super cookie relies on tcon->resource_id, which gets populated from the root inode number. However, fetching the root inode initializes inode cookie as a child of super cookie, which is yet to be populated. resource_id is only used as auxdata to check the validity of super cookie. We can completely avoid setting resource_id to remove the circular dependency. Since vol creation time and vol serial numbers are used for auxdata, we should be fine. Additionally, there will be auxiliary data check for each inode cookie as well. Fixes: 5bf91ef ("cifs: wait for tcon resource_id before getting fscache super") CC: David Howells <dhowells@redhat.com> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Commits on Dec 17, 2021
-
timekeeping: Really make sure wall_to_monotonic isn't positive
Even after commit e1d7ba8 ("time: Always make sure wall_to_monotonic isn't positive") it is still possible to make wall_to_monotonic positive by running the following code: int main(void) { struct timespec time; clock_gettime(CLOCK_MONOTONIC, &time); time.tv_nsec = 0; clock_settime(CLOCK_REALTIME, &time); return 0; } The reason is that the second parameter of timespec64_compare(), ts_delta, may be unnormalized because the delta is calculated with an open coded substraction which causes the comparison of tv_sec to yield the wrong result: wall_to_monotonic = { .tv_sec = -10, .tv_nsec = 900000000 } ts_delta = { .tv_sec = -9, .tv_nsec = -900000000 } That makes timespec64_compare() claim that wall_to_monotonic < ts_delta, but actually the result should be wall_to_monotonic > ts_delta. After normalization, the result of timespec64_compare() is correct because the tv_sec comparison is not longer misleading: wall_to_monotonic = { .tv_sec = -10, .tv_nsec = 900000000 } ts_delta = { .tv_sec = -10, .tv_nsec = 100000000 } Use timespec64_sub() to ensure that ts_delta is normalized, which fixes the issue. Fixes: e1d7ba8 ("time: Always make sure wall_to_monotonic isn't positive") Signed-off-by: Yu Liao <liaoyu15@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211213135727.1656662-1-liaoyu15@huawei.com
-
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/g…
…it/jejb/scsi Pull SCSI fix from James Bottomley: "One driver fix: the pm8001 has never actually worked on a system with an IOMMU and this fixes that use case" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: pm8001: Fix phys_to_virt() usage on dma_addr_t
-
Merge tag 'for-5.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/ke…
…rnel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes, almost all error handling one-liners and for stable. - regression fix in directory logging items - regression fix of extent buffer status bits handling after an error - fix memory leak in error handling path in tree-log - fix freeing invalid anon device number when handling errors during subvolume creation - fix warning when freeing leaf after subvolume creation failure - fix missing blkdev put in device scan error handling - fix invalid delayed ref after subvolume creation failure" * tag 'for-5.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix missing blkdev_put() call in btrfs_scan_one_device() btrfs: fix warning when freeing leaf after subvolume creation failure btrfs: fix invalid delayed ref after subvolume creation failure btrfs: check WRITE_ERR when trying to read an extent buffer btrfs: fix missing last dir item offset update when logging directory btrfs: fix double free of anon_dev after failure to create subvolume btrfs: fix memory leak in __add_inode_ref()