Abhishek-Sahu/…

Commits on Jan 24, 2022

  1. vfio/pci: add the support for PCI D3cold state

    Currently, if runtime power management is enabled for a vfio-pci
    device in the guest OS, the guest OS writes to the PCI_PM_CTRL
    register. This write request is handled in vfio_pm_config_write(),
    which performs the actual write to the PCI_PM_CTRL register. With
    this, D3hot is the deepest low-power state that can be achieved. If
    we use the runtime PM framework instead, we can reach the D3cold
    state, which saves the most power.
    
    1. Since the D3cold state can't be achieved by writing the PCI standard
       PM config registers, this patch adds a new IOCTL which changes the
       PCI device from D3hot to D3cold state and then from D3cold to D0 state.
    
    2. The hypervisors can implement virtual ACPI methods. For
       example, in guest linux OS if PCI device ACPI node has _PR3 and _PR0
       power resources with _ON/_OFF method, then guest linux OS makes the
       _OFF call during D3cold transition and then _ON during D0 transition.
       The hypervisor can tap these virtual ACPI calls and then do the D3cold
       related IOCTL in the vfio driver.
    
    3. The vfio driver uses runtime PM framework to achieve the
       D3cold state. For the D3cold transition, decrement the usage count and
       during D0 transition increment the usage count.
    
    4. For D3cold, the device current power state should be D3hot.
       Then during runtime suspend, the pci_platform_power_transition() is
       required for D3cold state. If the D3cold state is not supported, then
       the device will still be in D3hot state. But with the runtime PM, the
       root port can now also go into suspended state.
    
    5. For most systems, D3cold is supported at the root port level.
       So, when the root port transitions to the D3cold state, the
       vfio PCI device goes from D3hot to D3cold during its runtime
       suspend. If the root port does not support D3cold, then the root
       port will go into the D3hot state.
    
    6. The runtime suspend callback can now happen in 2 cases: when there
       is no user of the vfio device, and when the user has initiated
       D3cold. The 'runtime_suspend_pending' flag helps to distinguish
       these cases.
    
    7. There are cases where the guest has put the PCI device into the D3cold
       state and then, on the host side, the user has run lspci or any other
       command which requires access to the PCI config registers. In this case,
       the kernel runtime PM framework will resume the PCI device internally,
       read the config space and put the device into D3cold state again. Some
       PCI devices need software involvement before going into the D3cold state.
       For the first D3cold transition, the driver running on the guest side
       performs the software steps. But the second D3cold transition would
       happen without guest driver involvement. So, prevent this second D3cold
       transition by incrementing the device usage count. This keeps the
       device in D0 unnecessarily, but that is better than a failure. In
       future, we can add some mechanism to forward these wake-up requests to
       the guest so that this case can also be handled.
    
    8. In D3cold, all kinds of BAR related access need to be disabled,
       as in D3hot. Additionally, the config space is inaccessible in the
       D3cold state. To prevent config space access in the D3cold state,
       increment the runtime PM usage count before doing any config space
       access. Also, most of the IOCTLs access the config space, so
       maintain a safe list and skip the resume only for the IOCTLs on
       that list. For all other IOCTLs, the runtime PM usage count will be
       incremented first.
    
    9. Now, runtime suspend/resume callbacks need to get the vdev
       reference which can be obtained by dev_get_drvdata(). Currently,
       dev_set_drvdata() is called after returning from
       vfio_pci_core_register_device(). The runtime callbacks can come
       anytime after enabling runtime PM so dev_set_drvdata() must happen
       before that. We can move dev_set_drvdata() inside
       vfio_pci_core_register_device() itself.
    
    10. The vfio device user can close the device after putting
        the device into runtime suspended state so inside
        vfio_pci_core_disable(), increment the runtime PM usage count.
    
    11. Runtime PM will be possible only if CONFIG_PM is enabled on
        the host. So, the IOCTL related code can be put under CONFIG_PM
        Kconfig.
    
    Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
    Abhishek Sahu authored and intel-lab-lkp committed Jan 24, 2022
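
The usage-count handling described in points 3 and 6 of the commit above can be pictured with a small userspace model. This is only a sketch under stated assumptions: `struct model_dev` and the `model_*` helpers are hypothetical names for illustration, not the vfio-pci or runtime PM API, and the real driver relies on the kernel runtime PM framework rather than a bare counter.

```c
#include <stdbool.h>

/* Hypothetical userspace model; not the vfio-pci or runtime PM API. */
struct model_dev {
    int usage_count;              /* stands in for the runtime PM usage count */
    bool runtime_suspend_pending; /* set while the user has requested D3cold */
};

/* User requests D3cold: drop the reference so the device may suspend. */
static void model_enter_d3cold(struct model_dev *d)
{
    d->runtime_suspend_pending = true;
    d->usage_count--;
}

/* User requests D0: take the reference back, which keeps the device active. */
static void model_exit_d3cold(struct model_dev *d)
{
    d->usage_count++;
    d->runtime_suspend_pending = false;
}

/* The device is allowed to runtime suspend only with no references held. */
static bool model_may_suspend(const struct model_dev *d)
{
    return d->usage_count == 0;
}
```

In this model the flag is what lets a suspend callback tell "user asked for D3cold" apart from "nobody has the device open", since both cases end with a usage count of zero.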
  2. vfio/pci: Invalidate mmaps and block the access in D3hot power state

    According to [PCIe v5 5.3.1.4.1] for D3hot state
    
     "Configuration and Message requests are the only TLPs accepted by a
      Function in the D3Hot state. All other received Requests must be
      handled as Unsupported Requests, and all received Completions may
      optionally be handled as Unexpected Completions."
    
    Currently, if the vfio PCI device has been put into D3hot state and if
    user makes non-config related read/write request in D3hot state, these
    requests will be forwarded to the host and this access may cause
    issues on a few systems.
    
    This patch leverages the memory-disable support added in commit
    'abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on
    disabled memory")' to generate page fault on mmap access and
    return error for the direct read/write. If the device is in D3hot state,
    then the error needs to be returned for all kinds of BAR
    related access (memory, IO and ROM). Also, the power related structure
    fields need to be protected, so we can use the same 'memory_lock' to
    protect these fields. In a few cases, this 'memory_lock' will
    already be held by callers, so introduce a separate function
    vfio_pci_set_power_state_locked(). The original
    vfio_pci_set_power_state() now contains the code to do the locking
    related operations.
    
    Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
    Abhishek Sahu authored and intel-lab-lkp committed Jan 24, 2022
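
The access rule quoted from the PCIe specification in the commit above can be summarized as a small decision function. This is a minimal sketch, assuming a two-state model and `-EIO` as the error code; the enum names and `model_check_access()` are hypothetical and do not appear in the patch.

```c
#include <errno.h>

/* Hypothetical model of the D3hot access rule; names are illustrative. */
enum model_pstate { MODEL_D0, MODEL_D3HOT };
enum model_access { MODEL_ACCESS_CONFIG, MODEL_ACCESS_BAR };

/*
 * Config requests are still accepted in D3hot. Any BAR related access
 * (memory, IO or ROM) is rejected unless the device is in D0.
 * Returns 0 when the access may proceed, or a negative errno (assumed
 * here to be -EIO) when it must be blocked.
 */
static int model_check_access(enum model_pstate state, enum model_access kind)
{
    if (kind == MODEL_ACCESS_CONFIG)
        return 0;

    return state == MODEL_D0 ? 0 : -EIO;
}
```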
  3. vfio/pci: fix memory leak during D3hot to D0 transition

    If needs_pm_restore is set (PCI device does not have support for no
    soft reset), then the current PCI state will be saved during D0->D3hot
    transition and same will be restored back during D3hot->D0 transition.
    For saving the PCI state locally, pci_store_saved_state() is being
    used and the pci_load_and_free_saved_state() will free the allocated
    memory.
    
    But for reset related IOCTLs, vfio driver calls PCI reset related
    API's which will internally change the PCI power state back to D0. So,
    when the guest resumes, then it will get the current state as D0 and it
    will skip the call to vfio_pci_set_power_state() for changing the
    power state to D0 explicitly. In this case, the memory pointed by
    pm_save will never be freed.
    
    Also, in a malicious sequence, the state change to D3hot followed by
    VFIO_DEVICE_RESET/VFIO_DEVICE_PCI_HOT_RESET can be run in a loop and
    can cause an OOM situation. This patch stores the power state locally
    and uses it for comparison with the current power state. In the
    places where a D0 transition can happen, call vfio_pci_set_power_state()
    to transition to the D0 state. Since the vfio power state is still
    D3hot, this D0 transition runs the logic required for the
    D3hot->D0 transition. Also, to catch any case missed during
    future development, this patch adds a check that prints a
    warning and frees the memory.
    
    This locally saved power state will help in subsequent patches
    also.
    
    Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
    Abhishek Sahu authored and intel-lab-lkp committed Jan 24, 2022
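
The leak and its fix, as described in the commit above, can be modeled in userspace. This is a sketch under assumptions: `struct model_vdev`, `model_set_power_state()` and `model_reset()` are hypothetical stand-ins, and a plain heap buffer plays the role of the saved PCI state that `pm_save` points to.

```c
#include <stdlib.h>

enum { MODEL_D0 = 0, MODEL_D3HOT = 3 };

/* Hypothetical model of the driver state; not the real vfio structures. */
struct model_vdev {
    int power_state; /* locally tracked power state */
    void *pm_save;   /* saved state, allocated on D0->D3hot */
};

static void model_set_power_state(struct model_vdev *v, int state)
{
    if (state == v->power_state)
        return;                         /* nothing to do, nothing to leak */

    if (state == MODEL_D3HOT) {
        v->pm_save = calloc(1, 64);     /* save state before leaving D0 */
    } else {
        free(v->pm_save);               /* restore, then release the copy */
        v->pm_save = NULL;
    }
    v->power_state = state;
}

/*
 * A reset puts the hardware back into D0 behind the driver's back, so
 * the model runs the D3hot->D0 bookkeeping explicitly. Because the
 * locally tracked state is still D3hot, the saved copy is freed here.
 */
static void model_reset(struct model_vdev *v)
{
    model_set_power_state(v, MODEL_D0);
}
```

Without the explicit call in `model_reset()`, each D3hot-then-reset cycle would leave one allocation behind, which is the loop the commit message warns about.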
  4. vfio/pci: virtualize PME related registers bits and initialize to zero

    If a PME event is generated by a PCI device, it will mostly be
    handled in the host by the root port PME code. For example, in the case
    of PCIe, the PME event will be sent to the root port and then the PME
    interrupt will be generated. This will be handled in
    drivers/pci/pcie/pme.c at the host side. Inside this, the
    pci_check_pme_status() will be called where PME_Status and PME_En bits
    will be cleared. So, the guest OS which is using vfio-pci device will
    not come to know about this PME event.
    
    To handle these PME events inside guests, we need some framework so
    that any PME event that happens is forwarded to the
    virtual machine monitor. We can virtualize the PME related register bits
    and initialize these bits to zero so that the vfio-pci device user will
    assume that the device is not capable of asserting the PME# signal from
    any power state.
    
    Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
    Abhishek Sahu authored and intel-lab-lkp committed Jan 24, 2022
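
Zeroing the PME related bits, as the commit above describes, amounts to masking two 16-bit registers in the PCI power management capability. The following is a minimal sketch, not the patch itself: the helper names are hypothetical, and the mask values are assumed to mirror PCI_PM_CAP_PME_MASK, PCI_PM_CTRL_PME_ENABLE and PCI_PM_CTRL_PME_STATUS from the kernel's pci_regs.h.

```c
#include <stdint.h>

/* Values assumed to match the kernel's pci_regs.h definitions. */
#define MODEL_PM_CAP_PME_MASK    0xF800u /* PME_Support bits in PMC */
#define MODEL_PM_CTRL_PME_ENABLE 0x0100u /* PME_En bit in PMCSR */
#define MODEL_PM_CTRL_PME_STATUS 0x8000u /* PME_Status bit in PMCSR */

/* PMC value shown to the user: no power state can assert PME#. */
static uint16_t model_virt_pmc(uint16_t hw_pmc)
{
    return (uint16_t)(hw_pmc & ~MODEL_PM_CAP_PME_MASK);
}

/* PMCSR value shown to the user: PME_En and PME_Status read as zero. */
static uint16_t model_virt_pmcsr(uint16_t hw_pmcsr)
{
    return (uint16_t)(hw_pmcsr &
                      ~(MODEL_PM_CTRL_PME_ENABLE | MODEL_PM_CTRL_PME_STATUS));
}
```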
  5. vfio/pci: register vfio-pci driver with runtime PM framework

    Currently, there is very limited power management support
    available in the upstream vfio-pci driver. If there is no user of the
    vfio-pci device, the PCI device is moved into the D3hot state by writing
    directly to the PCI PM registers. The D3hot state helps save power,
    but we can achieve zero power consumption if we go into the D3cold state.
    The D3cold state is not possible with native PCI PM. It requires
    interaction with platform firmware, which is system-specific.
    To go into low power states (including D3cold), the runtime PM framework
    can be used, which internally interacts with PCI and platform firmware and
    puts the device into the lowest possible D-state.
    
    This patch registers vfio-pci driver with the runtime PM framework.
    
    1. The PCI core framework takes care of most of the runtime PM
       related things. For enabling the runtime PM, the PCI driver needs to
       decrement the usage count and needs to register the runtime
       suspend/resume callbacks. For vfio-pci based driver, these callback
       routines can be stubbed in this patch since the vfio-pci driver
       is not doing the PCI device initialization. All the config state
       saving, and PCI power management related things will be done by
       PCI core framework itself inside its runtime suspend/resume callbacks.
    
    2. Inside pci_reset_bus(), all the devices in the bus/slot will be moved
       into the D0 state. This state change to D0 can happen directly without
       going through the runtime PM framework. So if runtime PM is enabled,
       then pm_runtime_resume() makes the runtime state active. Since the PCI
       device power state is already D0, it should return early when it
       tries to change the state with pci_set_power_state(). Then
       pm_request_idle() can be used, which internally checks the
       device usage count and moves the device into the low power
       state again.
    
    3. Inside vfio_pci_core_disable(), the device usage count always needs
       to be decremented which was incremented in vfio_pci_core_enable().
    
    4. Since the runtime PM framework will provide the same functionality,
       so directly writing into PCI PM config register can be replaced with
       the use of runtime PM routines. Also, the use of runtime PM can help
       us in more power saving.
    
       In the systems which do not support D3Cold,
    
       With the existing implementation:
    
       // PCI device
       # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
       D3hot
       // upstream bridge
       # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
       D0
    
       With runtime PM:
    
       // PCI device
       # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
       D3hot
       // upstream bridge
       # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
       D3hot
    
       So, with runtime PM, the upstream bridge or root port will also go
       into lower power state which is not possible with existing
       implementation.
    
       In the systems which support D3Cold,
    
       With the existing implementation:
    
       // PCI device
       # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
       D3hot
       // upstream bridge
       # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
       D0
    
       With runtime PM:
    
       // PCI device
       # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
       D3cold
       // upstream bridge
       # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
       D3cold
    
       So, with runtime PM, both the PCI device and upstream bridge will
       go into D3cold state.
    
    5. If 'disable_idle_d3' module parameter is set, then also the runtime
       PM will be enabled, but in this case, the usage count should not be
       decremented.
    
    6. vfio_pci_dev_set_try_reset() return value is unused now, so this
       function return type can be changed to void.
    
    Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
    Abhishek Sahu authored and intel-lab-lkp committed Jan 24, 2022
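
The `cat .../power_state` checks shown in the commit above can also be done from a program. This is a small sketch assuming the sysfs attribute holds a single short line such as `D3hot`; `read_power_state()` is a hypothetical helper name, and the path is supplied by the caller.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read one line from a sysfs-style attribute file into buf and strip
 * the trailing newline. Returns 0 on success and -1 on any failure.
 */
static int read_power_state(const char *path, char *buf, size_t len)
{
    FILE *fp;

    if (len == 0)
        return -1;

    fp = fopen(path, "r");
    if (!fp)
        return -1;

    if (!fgets(buf, (int)len, fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

A caller would pass a path such as `/sys/bus/pci/devices/0000:01:00.0/power_state` and compare the result against `"D3hot"` or `"D3cold"`.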

Commits on Dec 21, 2021

  1. vfio/iommu_type1: replace kfree with kvfree

    Variables allocated by kvzalloc() should not be freed by kfree(),
    because they may have been allocated by vmalloc().
    So replace kfree() with kvfree() here.
    
    Fixes: d6a4c18 ("vfio iommu: Implementation of ioctl for dirty pages tracking")
    Signed-off-by: Jiacheng Shi <billsjc@sjtu.edu.cn>
    Link: https://lore.kernel.org/r/20211212091600.2560-1-billsjc@sjtu.edu.cn
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    Jiacheng Shi authored and awilliam committed Dec 21, 2021
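
Why the free routine has to match the allocator can be shown with a userspace model. This is a sketch under loud assumptions: the `model_*` names are hypothetical, both "allocators" are backed by `calloc()`, and a fixed size threshold stands in for the kernel's real behavior of trying the slab allocator first and falling back to vmalloc.

```c
#include <stddef.h>
#include <stdlib.h>

#define MODEL_KMALLOC_LIMIT 4096 /* assumed threshold, illustration only */

enum model_src { MODEL_FROM_KMALLOC, MODEL_FROM_VMALLOC };

/* Header recording which allocator produced the buffer. */
union model_hdr {
    enum model_src src;
    max_align_t align; /* keeps the payload suitably aligned */
};

static int model_kfree_calls;
static int model_vfree_calls;

/* Zeroed allocation that may come from either backing allocator. */
static void *model_kvzalloc(size_t size)
{
    union model_hdr *h = calloc(1, sizeof(*h) + size);

    if (!h)
        return NULL;
    h->src = size <= MODEL_KMALLOC_LIMIT ? MODEL_FROM_KMALLOC
                                         : MODEL_FROM_VMALLOC;
    return h + 1;
}

/* Dispatch to the matching free routine, as a kvfree-style helper must. */
static void model_kvfree(void *p)
{
    union model_hdr *h;

    if (!p)
        return;
    h = (union model_hdr *)p - 1;
    if (h->src == MODEL_FROM_VMALLOC)
        model_vfree_calls++;
    else
        model_kfree_calls++;
    free(h);
}
```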
  2. vfio/pci: Resolve sparse endian warnings in IGD support

    Sparse warns:
    
    sparse warnings: (new ones prefixed by >>)
    >> drivers/vfio/pci/vfio_pci_igd.c:146:21: sparse: sparse: incorrect type in assignment (different base types) @@     expected unsigned short [addressable] [usertype] val @@     got restricted __le16 [usertype] @@
       drivers/vfio/pci/vfio_pci_igd.c:146:21: sparse:     expected unsigned short [addressable] [usertype] val
       drivers/vfio/pci/vfio_pci_igd.c:146:21: sparse:     got restricted __le16 [usertype]
    >> drivers/vfio/pci/vfio_pci_igd.c:161:21: sparse: sparse: incorrect type in assignment (different base types) @@     expected unsigned int [addressable] [usertype] val @@     got restricted __le32 [usertype] @@
       drivers/vfio/pci/vfio_pci_igd.c:161:21: sparse:     expected unsigned int [addressable] [usertype] val
       drivers/vfio/pci/vfio_pci_igd.c:161:21: sparse:     got restricted __le32 [usertype]
       drivers/vfio/pci/vfio_pci_igd.c:176:21: sparse: sparse: incorrect type in assignment (different base types) @@     expected unsigned short [addressable] [usertype] val @@     got restricted __le16 [usertype] @@
       drivers/vfio/pci/vfio_pci_igd.c:176:21: sparse:     expected unsigned short [addressable] [usertype] val
       drivers/vfio/pci/vfio_pci_igd.c:176:21: sparse:     got restricted __le16 [usertype]
    
    These are due to trying to use an unsigned to store the result of
    a cpu_to_leXX() conversion.  These are small variables, so pointer
    tricks are wasteful and casting just generates different sparse
    warnings.  Store to and copy results from a separate little endian
    variable.
    
    Reported-by: kernel test robot <lkp@intel.com>
    Link: https://lore.kernel.org/r/202111290026.O3vehj03-lkp@intel.com/
    Link: https://lore.kernel.org/r/163840226123.138003.7668320168896210328.stgit@omen
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    awilliam committed Dec 21, 2021
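
The approach named in the commit above, storing the converted value in a separate little-endian variable and copying from it, can be sketched in portable userspace C. The helper names are hypothetical, and a plain `uint16_t` stands in for the kernel's sparse-annotated `__le16` type.

```c
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for cpu_to_le16(): result has little-endian layout. */
static uint16_t model_cpu_to_le16(uint16_t v)
{
    uint8_t bytes[2] = { (uint8_t)(v & 0xff), (uint8_t)(v >> 8) };
    uint16_t le;

    memcpy(&le, bytes, sizeof(le));
    return le;
}

/*
 * Keep the host-order value and the little-endian value in separate
 * variables, then copy the bytes out from the little-endian one.
 */
static void model_copy_le16(uint8_t *dst, uint16_t host_val)
{
    uint16_t le_val = model_cpu_to_le16(host_val);

    memcpy(dst, &le_val, sizeof(le_val));
}
```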

Commits on Dec 19, 2021

  1. Linux 5.16-rc6

    torvalds committed Dec 19, 2021
  2. Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

    Pull kvm fixes from Paolo Bonzini:
     "Two small fixes, one of which was being worked around in selftests"
    
    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
      KVM: x86: Retry page fault if MMU reload is pending and root has no sp
      KVM: selftests: vmx_pmu_msrs_test: Drop tests mangling guest visible CPUIDs
      KVM: x86: Drop guest CPUID check for host initiated writes to MSR_IA32_PERF_CAPABILITIES
    torvalds committed Dec 19, 2021
  3. Merge tag 'block-5.16-2021-12-19' of git://git.kernel.dk/linux-block

    Pull block revert from Jens Axboe:
     "It turns out that the fix for not hammering on the delayed work timer
      too much caused a performance regression for BFQ, so let's revert the
      change for now.
    
      I've got some ideas on how to fix it appropriately, but they should
      wait for 5.17"
    
    * tag 'block-5.16-2021-12-19' of git://git.kernel.dk/linux-block:
      Revert "block: reduce kblockd_mod_delayed_work_on() CPU consumption"
    torvalds committed Dec 19, 2021
  4. Merge tag 'irq_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/tip/tip
    
    Pull irq fixes from Borislav Petkov:
    
     - Clear the PCI_MSIX_FLAGS_MASKALL bit too on the error path so that it
       is restored to its reset state
    
     - Mask MSI-X vectors late on the init path in order to handle
       out-of-spec Marvell NVME devices which apparently look at the MSI-X
       mask even when MSI-X is disabled
    
    * tag 'irq_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error
      PCI/MSI: Mask MSI-X vectors only on success
    torvalds committed Dec 19, 2021
  5. Merge tag 'timers_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/s…

    …cm/linux/kernel/git/tip/tip
    
    Pull timer fix from Borislav Petkov:
    
     - Make sure the CLOCK_REALTIME to CLOCK_MONOTONIC offset is never
       positive
    
    * tag 'timers_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      timekeeping: Really make sure wall_to_monotonic isn't positive
    torvalds committed Dec 19, 2021
  6. Merge tag 'locking_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/…

    …scm/linux/kernel/git/tip/tip
    
    Pull locking fix from Borislav Petkov:
    
     - Fix the rtmutex condition checking when the optimistic spinning of a
       waiter needs to be terminated
    
    * tag 'locking_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner()
    torvalds committed Dec 19, 2021
  7. Merge tag 'core_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm…

    …/linux/kernel/git/tip/tip
    
    Pull signal handling fix from Borislav Petkov:
    
     - Prevent lock contention on the new sigaltstack lock on the
       common-case path, when no changes have been made to the alternative
       signal stack.
    
    * tag 'core_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      signal: Skip the altstack update when not needed
    torvalds committed Dec 19, 2021
  8. Merge tag 'mips-fixes_5.16_3' of git://git.kernel.org/pub/scm/linux/k…

    …ernel/git/mips/linux
    
    Pull MIPS fix from Thomas Bogendoerfer:
    
     - only enable pci_remap_iospace() for Ralink devices
    
    * tag 'mips-fixes_5.16_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
      MIPS: Only define pci_remap_iospace() for Ralink
    torvalds committed Dec 19, 2021
  9. Merge tag 'powerpc-5.16-4' of git://git.kernel.org/pub/scm/linux/kern…

    …el/git/powerpc/linux
    
    Pull powerpc fixes from Michael Ellerman:
     "Fix a recently introduced oops at boot on 85xx in some configurations.
    
      Fix crashes when loading some livepatch modules with
      STRICT_MODULE_RWX.
    
      Thanks to Joe Lawrence, Russell Currey, and Xiaoming Ni"
    
    * tag 'powerpc-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
      powerpc/module_64: Fix livepatching for RO modules
      powerpc/85xx: Fix oops when CONFIG_FSL_PMC=n
    torvalds committed Dec 19, 2021
  10. Merge tag '5.16-rc5-smb3-client-fixes' of git://git.samba.org/sfrench…

    …/cifs-2.6
    
    Pull cifs fixes from Steve French:
     "Two cifs/smb3 fixes, one fscache related, and one mount parsing
      related for stable"
    
    * tag '5.16-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
      cifs: sanitize multiple delimiters in prepath
      cifs: ignore resource_id while getting fscache super cookie
    torvalds committed Dec 19, 2021
  11. KVM: x86: Retry page fault if MMU reload is pending and root has no sp

    Play nice with a NULL shadow page when checking for an obsolete root in
    the page fault handler by flagging the page fault as stale if there's no
    shadow page associated with the root and KVM_REQ_MMU_RELOAD is pending.
    Invalidating memslots, which is the only case where _all_ roots need to
    be reloaded, requests all vCPUs to reload their MMUs while holding
    mmu_lock for lock.
    
    The "special" roots, e.g. pae_root when KVM uses PAE paging, are not
    backed by a shadow page.  Running with TDP disabled or with nested NPT
    explodes spectacularly due to dereferencing a NULL shadow page pointer.
    
    Skip the KVM_REQ_MMU_RELOAD check if there is a valid shadow page for the
    root.  Zapping shadow pages in response to guest activity, e.g. when the
    guest frees a PGD, can trigger KVM_REQ_MMU_RELOAD even if the current
    vCPU isn't using the affected root.  I.e. KVM_REQ_MMU_RELOAD can be seen
    with a completely valid root shadow page.  This is a bit of a moot point
    as KVM currently unloads all roots on KVM_REQ_MMU_RELOAD, but that will
    be cleaned up in the future.
    
    Fixes: a955cad ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
    Cc: stable@vger.kernel.org
    Cc: Maxim Levitsky <mlevitsk@redhat.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20211209060552.2956723-2-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    sean-jc authored and bonzini committed Dec 19, 2021
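
The staleness rule described in the commit above can be reduced to a small predicate. This is a simplified sketch, not the KVM code: `struct model_sp` and `model_root_is_stale()` are hypothetical, and the model ignores the other conditions the real page fault handler checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a root's shadow page. */
struct model_sp {
    bool obsolete;
};

/*
 * A root backed by a shadow page is stale only if that page is obsolete;
 * a pending reload request is not consulted in that case. A "special"
 * root with no shadow page (sp == NULL) is treated as stale whenever a
 * reload request is pending, and sp is never dereferenced when NULL.
 */
static bool model_root_is_stale(const struct model_sp *sp, bool reload_pending)
{
    if (sp)
        return sp->obsolete;

    return reload_pending;
}
```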
  12. KVM: selftests: vmx_pmu_msrs_test: Drop tests mangling guest visible …

    …CPUIDs
    
    Host initiated writes to MSR_IA32_PERF_CAPABILITIES should not depend
    on guest visible CPUIDs and (incorrect) KVM logic implementing it is
    about to change. Also, KVM_SET_CPUID{,2} after KVM_RUN is now forbidden
    and causes the test to fail.
    
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Fixes: feb627e ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN")
    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Message-Id: <20211216165213.338923-2-vkuznets@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    vittyvk authored and bonzini committed Dec 19, 2021
  13. KVM: x86: Drop guest CPUID check for host initiated writes to MSR_IA3…

    …2_PERF_CAPABILITIES
    
    The ability to write to MSR_IA32_PERF_CAPABILITIES from the host should
    not depend on guest visible CPUID entries, even if just to allow
    creating/restoring guest MSRs and CPUIDs in any sequence.
    
    Fixes: 27461da ("KVM: x86/pmu: Support full width counting")
    Suggested-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Message-Id: <20211216165213.338923-3-vkuznets@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    vittyvk authored and bonzini committed Dec 19, 2021
  14. Revert "block: reduce kblockd_mod_delayed_work_on() CPU consumption"

    This reverts commit cb2ac29.
    
    Alex and the kernel test robot report that this causes a significant
    performance regression with BFQ. I can reproduce that result, so let's
    revert this one as we're close to -rc6 and there's no point in trying
    to rush a fix.
    
    Link: https://lore.kernel.org/linux-block/1639853092.524jxfaem2.none@localhost/
    Link: https://lore.kernel.org/lkml/20211219141852.GH14057@xsang-OptiPlex-9020/
    Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Dec 19, 2021

Commits on Dec 18, 2021

  1. Merge tag 'tty-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/gregkh/tty
    
    Pull tty/serial fixes from Greg KH:
     "Here are two small tty/serial fixes for 5.16-rc6.  They include:
    
       - n_hdlc fix for syzbot reported problem that you were previously
         copied on.
    
       - 8250_fintek driver fix that resolved a console problem by removing
         a previous change.
    
      Both have been in linux-next with no reported issues"
    
    * tag 'tty-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
      serial: 8250_fintek: Fix garbled text for console
      tty: n_hdlc: make n_hdlc_tty_wakeup() asynchronous
    torvalds committed Dec 18, 2021
  2. Merge tag 'usb-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/gregkh/usb
    
    Pull USB fixes from Greg KH:
     "Here are a number of small USB driver fixes for reported problems.
      They include:
    
       - dwc2 driver fixes
    
       - xhci driver fixes
    
       - cdnsp driver fixes
    
       - typec driver fix
    
       - gadget u_ether driver fix
    
       - new quirk additions
    
       - usb gadget endpoint calculation fix
    
       - usb serial new device ids
    
       - revert of a xhci-dbg change that broke early debug booting
    
      All changes, except for the revert, have been in linux-next with no
      reported problems. The revert was from yesterday, and it was reported
      by the developers affected that it resolved their problem"
    
    * tag 'usb-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
      Revert "usb: early: convert to readl_poll_timeout_atomic()"
      usb: typec: tcpm: fix tcpm unregister port but leave a pending timer
      usb: cdnsp: Fix lack of spin_lock_irqsave/spin_lock_restore
      USB: NO_LPM quirk Lenovo USB-C to Ethernet Adapher(RTL8153-04)
      usb: xhci: Extend support for runtime power management for AMD's Yellow carp.
      usb: dwc2: fix STM ID/VBUS detection startup delay in dwc2_driver_probe
      USB: gadget: bRequestType is a bitfield, not a enum
      USB: serial: option: add Telit FN990 compositions
      USB: serial: cp210x: fix CP2105 GPIO registration
      usb: cdnsp: Fix incorrect status for control request
      usb: cdnsp: Fix issue in cdnsp_log_ep trace event
      usb: cdnsp: Fix incorrect calling of cdnsp_died function
      usb: xhci-mtk: fix list_del warning when enable list debug
      usb: gadget: u_ether: fix race in setting MAC address in setup phase
    torvalds committed Dec 18, 2021
  3. Merge tag 'perf-tools-fixes-for-v5.16-2021-12-18' of git://git.kernel…

    ….org/pub/scm/linux/kernel/git/acme/linux
    
    Pull perf tools fixes from Arnaldo Carvalho de Melo:
    
     - Fix segfaults in 'perf inject' related to usage of unopened files
    
     - The return value of hashmap__new() should be checked using IS_ERR()
    
    * tag 'perf-tools-fixes-for-v5.16-2021-12-18' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
      perf inject: Fix segfault due to perf_data__fd() without open
      perf inject: Fix segfault due to close without open
      perf expr: Fix missing check for return value of hashmap__new()
    torvalds committed Dec 18, 2021
  4. perf inject: Fix segfault due to perf_data__fd() without open

    The fixed commit attempts to get the output file descriptor even if the
    file was never opened e.g.
    
      $ perf record uname
      Linux
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ]
      $ perf inject -i perf.data --vm-time-correlation=dry-run
      Segmentation fault (core dumped)
      $ gdb --quiet perf
      Reading symbols from perf...
      (gdb) r inject -i perf.data --vm-time-correlation=dry-run
      Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run
      [Thread debugging using libthread_db enabled]
      Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    
      Program received signal SIGSEGV, Segmentation fault.
      __GI___fileno (fp=0x0) at fileno.c:35
      35      fileno.c: No such file or directory.
      (gdb) bt
      #0  __GI___fileno (fp=0x0) at fileno.c:35
      #1  0x00005621e48dd987 in perf_data__fd (data=0x7fff4c68bd08) at util/data.h:72
      #2  perf_data__fd (data=0x7fff4c68bd08) at util/data.h:69
      #3  cmd_inject (argc=<optimized out>, argv=0x7fff4c69c1f0) at builtin-inject.c:1017
      #4  0x00005621e4936783 in run_builtin (p=0x5621e4ee6878 <commands+600>, argc=4, argv=0x7fff4c69c1f0) at perf.c:313
      #5  0x00005621e4897d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365
      #6  run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409
      #7  main (argc=4, argv=0x7fff4c69c1f0) at perf.c:539
      (gdb)
    
    Fixes: 0ae0389 ("perf tools: Pass a fd to perf_file_header__read_pipe()")
    Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
    Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Riccardo Mancini <rickyman7@gmail.com>
    Cc: stable@vger.kernel.org
    Link: http://lore.kernel.org/lkml/20211213084829.114772-3-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    ahunter6 authored and Arnaldo Carvalho de Melo committed Dec 18, 2021
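
The backtrace in the commit above ends in `fileno()` being called with a NULL stream. One defensive pattern, shown here only as a sketch and not as the actual perf fix, is to check the stream before asking for its descriptor; `model_data_fd()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L /* for fileno() */
#include <stdio.h>

/* Return the descriptor behind fp, or -1 if the file was never opened. */
static int model_data_fd(FILE *fp)
{
    return fp ? fileno(fp) : -1;
}
```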
  5. perf inject: Fix segfault due to close without open

    The fixed commit attempts to close inject.output even if it was never
    opened e.g.
    
      $ perf record uname
      Linux
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ]
      $ perf inject -i perf.data --vm-time-correlation=dry-run
      Segmentation fault (core dumped)
      $ gdb --quiet perf
      Reading symbols from perf...
      (gdb) r inject -i perf.data --vm-time-correlation=dry-run
      Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run
      [Thread debugging using libthread_db enabled]
      Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    
      Program received signal SIGSEGV, Segmentation fault.
      0x00007eff8afeef5b in _IO_new_fclose (fp=0x0) at iofclose.c:48
      48      iofclose.c: No such file or directory.
      (gdb) bt
      #0  0x00007eff8afeef5b in _IO_new_fclose (fp=0x0) at iofclose.c:48
      #1  0x0000557fc7b74f92 in perf_data__close (data=data@entry=0x7ffcdafa6578) at util/data.c:376
      #2  0x0000557fc7a6b807 in cmd_inject (argc=<optimized out>, argv=<optimized out>) at builtin-inject.c:1085
      #3  0x0000557fc7ac4783 in run_builtin (p=0x557fc8074878 <commands+600>, argc=4, argv=0x7ffcdafb6a60) at perf.c:313
      #4  0x0000557fc7a25d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365
      #5  run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409
      #6  main (argc=4, argv=0x7ffcdafb6a60) at perf.c:539
      (gdb)
    
    Fixes: 02e6246 ("perf inject: Close inject.output on exit")
    Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
    Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Riccardo Mancini <rickyman7@gmail.com>
    Cc: stable@vger.kernel.org
    Link: http://lore.kernel.org/lkml/20211213084829.114772-2-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    ahunter6 authored and Arnaldo Carvalho de Melo committed Dec 18, 2021
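
The failure in the backtrace above, fclose() reached with fp=0x0, can be avoided by tracking whether the output was ever opened. The following is a minimal sketch of that guard, not the actual perf patch: `struct output`, `output_open()` and `output_close()` are hypothetical names, and `tmpfile()` merely stands in for opening the real output file.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an output handle; this is not perf's struct perf_data. */
struct output {
    FILE *fp;
    bool opened;
};

/* Open a temporary file as the output. Returns 0 on success, -1 on failure. */
static int output_open(struct output *out)
{
    out->fp = tmpfile();
    out->opened = out->fp != NULL;
    return out->opened ? 0 : -1;
}

/*
 * Close the output only if it was opened. Calling fclose(NULL) is undefined
 * behaviour, so a never-opened (or already closed) handle is a no-op here.
 */
static int output_close(struct output *out)
{
    int ret;

    if (!out->opened)
        return 0;

    ret = fclose(out->fp);
    out->fp = NULL;
    out->opened = false;
    return ret;
}
```

With this shape, an exit path can call `output_close()` unconditionally, including on code paths such as a dry run where no output file was created.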
  6. perf expr: Fix missing check for return value of hashmap__new()

    The hashmap__new() function may return ERR_PTR(-ENOMEM) when malloc()
    fails. Add an IS_ERR() check for ctx->ids.
    
    Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Ian Rogers <irogers@google.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Link: http://lore.kernel.org/lkml/20211212062504.25841-1-linmq006@gmail.com
    [ s/kfree()/free()/ and add missing linux/err.h include ]
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Yuuoniy authored and Arnaldo Carvalho de Melo committed Dec 18, 2021
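
A minimal userspace sketch of the error-pointer convention this fix relies on. The helpers below imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() from linux/err.h in simplified form; `table_new()`, `ctx_new()` and the `simulate_oom` flag are hypothetical stand-ins for hashmap__new() and its caller, not the perf code itself.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Simplified imitations of the kernel helpers: small negative errno
 * values are encoded in the topmost page of the address space. */
static inline void *ERR_PTR(intptr_t error) { return (void *)error; }
static inline intptr_t PTR_ERR(const void *ptr) { return (intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct table { size_t cap; };        /* hypothetical container */
struct ctx { struct table *ids; };   /* hypothetical owner of the container */

/* Returns a valid table, or ERR_PTR(-ENOMEM) (never NULL) on failure. */
static struct table *table_new(int simulate_oom)
{
    struct table *t = simulate_oom ? NULL : malloc(sizeof(*t));

    if (!t)
        return ERR_PTR(-ENOMEM);
    t->cap = 0;
    return t;
}

/* The caller must test with IS_ERR(); a plain NULL check would miss the error. */
static struct ctx *ctx_new(int simulate_oom)
{
    struct ctx *ctx = malloc(sizeof(*ctx));

    if (!ctx)
        return NULL;

    ctx->ids = table_new(simulate_oom);
    if (IS_ERR(ctx->ids)) {
        free(ctx);
        return NULL;
    }
    return ctx;
}
```

The point of the sketch is that an error pointer is non-NULL, so code that only checks `if (!ctx->ids)` would go on to dereference an invalid address.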
  7. locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner()

    Optimistic spinning needs to be terminated when the spinning waiter is no
    longer the top waiter on the lock, but the condition is negated. It
    terminates if the waiter is the top waiter, which defeats the whole
    purpose.
    
    Fixes: c3123c4 ("locking/rtmutex: Dont dereference waiter lockless")
    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20211217074207.77425-1-qiang1.zhang@intel.com
    qiangzh3 authored and Thomas Gleixner committed Dec 18, 2021
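
The corrected termination logic can be sketched as a plain predicate. This is an illustration under assumptions, not the kernel's rtmutex_spin_on_owner(): `struct spin_state` and `must_stop_spinning()` are hypothetical, and the owner-on-CPU and need-resched fields only stand in for the other exit conditions such a spin loop typically has.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the facts a spinning waiter would re-check. */
struct spin_state {
    bool owner_on_cpu;   /* lock owner is currently running */
    bool need_resched;   /* the spinner itself should yield the CPU */
    bool waiter_is_top;  /* spinner is still the top waiter on the lock */
};

/*
 * Spinning must stop as soon as the waiter is NOT the top waiter any more.
 * The bug described above corresponds to dropping the '!' in front of
 * waiter_is_top, which stops the top waiter and lets everyone else spin.
 */
static bool must_stop_spinning(const struct spin_state *s)
{
    return !s->owner_on_cpu || s->need_resched || !s->waiter_is_top;
}
```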
  8. Merge tag 'libata-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
    
    Pull libata fix from Damien Le Moal:
     "A single fix for this cycle:
    
       - Check that ATA16 passthrough commands that do not transfer any data
         have a DMA direction set to DMA_NONE (From George)"
    
    * tag 'libata-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
      libata: if T_LENGTH is zero, dma direction should be DMA_NONE
    torvalds committed Dec 18, 2021
  9. Merge tag 'zonefs-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs
    
    Pull zonefs fixes from Damien Le Moal:
     "One fix and one trivial update for rc6:
    
       - Add MODULE_ALIAS_FS to get automatic module loading on mount
         (Naohiro)
    
       - Update Damien's email address in the MAINTAINERS file (me)"
    
    * tag 'zonefs-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
      MAITAINERS: Change zonefs maintainer email address
      zonefs: add MODULE_ALIAS_FS
    torvalds committed Dec 18, 2021
  10. cifs: sanitize multiple delimiters in prepath

    mount.cifs can pass a device with multiple delimiters in it. This will
    cause rename(2) to fail with ENOENT.
    
    V2:
      - Make sanitize_path more readable.
      - Fix multiple delimiters between UNC and prepath.
      - Avoid a memory leak if a bad user starts putting a lot of delimiters
        in the path on purpose.
    
    BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=2031200
    Fixes: 24e0a1e ("cifs: switch to new mount api")
    Cc: stable@vger.kernel.org # 5.11+
    Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
    Signed-off-by: Thiago Rafael Becker <trbecker@gmail.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    trbecker authored and Steve French committed Dec 18, 2021
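
Collapsing runs of a delimiter can be done in place with two cursors. The sketch below is a simplified illustration, not the kernel's sanitize_path(): `collapse_delims()` is a hypothetical helper that only merges consecutive delimiters and leaves leading and trailing ones alone.

```c
#include <stddef.h>

/*
 * Collapse every run of 'delim' in 'path' into a single delimiter.
 * Works in place, so no allocation is needed and nothing can leak,
 * however many delimiters the input contains.
 */
static char *collapse_delims(char *path, char delim)
{
    char *src = path;
    char *dst = path;

    while (*src) {
        *dst++ = *src;              /* keep the current character */
        if (*src == delim) {
            while (*src == delim)   /* skip the rest of the run */
                src++;
        } else {
            src++;
        }
    }
    *dst = '\0';
    return path;
}
```

For example, `share//dir///file` with `/` as the delimiter becomes `share/dir/file`.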
  11. cifs: ignore resource_id while getting fscache super cookie

    We have a cyclic dependency between fscache super cookie
    and root inode cookie. The super cookie relies on
    tcon->resource_id, which gets populated from the root inode
    number. However, fetching the root inode initializes inode
    cookie as a child of super cookie, which is yet to be populated.
    
    resource_id is only used as auxdata to check the validity of
    super cookie. We can completely avoid setting resource_id to
    remove the circular dependency. Since vol creation time and
    vol serial numbers are used for auxdata, we should be fine.
    Additionally, there will be auxiliary data check for each
    inode cookie as well.
    
    Fixes: 5bf91ef ("cifs: wait for tcon resource_id before getting fscache super")
    CC: David Howells <dhowells@redhat.com>
    Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    sprasad-microsoft authored and Steve French committed Dec 18, 2021

Commits on Dec 17, 2021

  1. timekeeping: Really make sure wall_to_monotonic isn't positive

    Even after commit e1d7ba8 ("time: Always make sure wall_to_monotonic
    isn't positive") it is still possible to make wall_to_monotonic positive
    by running the following code:
    
        int main(void)
        {
            struct timespec time;
    
            clock_gettime(CLOCK_MONOTONIC, &time);
            time.tv_nsec = 0;
            clock_settime(CLOCK_REALTIME, &time);
            return 0;
        }
    
    The reason is that the second parameter of timespec64_compare(), ts_delta,
    may be unnormalized because the delta is calculated with an open-coded
    subtraction, which causes the comparison of tv_sec to yield the wrong
    result:
    
      wall_to_monotonic = { .tv_sec = -10, .tv_nsec =  900000000 }
      ts_delta          = { .tv_sec =  -9, .tv_nsec = -900000000 }
    
    That makes timespec64_compare() claim that wall_to_monotonic < ts_delta,
    but actually the result should be wall_to_monotonic > ts_delta.
    
    After normalization, the result of timespec64_compare() is correct because
    the tv_sec comparison is no longer misleading:
    
      wall_to_monotonic = { .tv_sec = -10, .tv_nsec =  900000000 }
      ts_delta          = { .tv_sec = -10, .tv_nsec =  100000000 }
    
    Use timespec64_sub() to ensure that ts_delta is normalized, which fixes the
    issue.
    
    Fixes: e1d7ba8 ("time: Always make sure wall_to_monotonic isn't positive")
    Signed-off-by: Yu Liao <liaoyu15@huawei.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20211213135727.1656662-1-liaoyu15@huawei.com
    yuliao0214 authored and Thomas Gleixner committed Dec 17, 2021
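
The effect of normalization on the comparison can be reproduced in userspace with the numbers from the commit message. This is a sketch under assumptions: `struct ts`, `ts_compare()`, `ts_sub_raw()` and `ts_sub()` are hypothetical stand-ins for the kernel's timespec64 helpers, and `ts_sub()` assumes both inputs are already normalized.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for struct timespec64. */
struct ts {
    int64_t sec;
    long nsec;
};

/* Compare seconds first, then nanoseconds; only valid for normalized values. */
static int ts_compare(struct ts a, struct ts b)
{
    if (a.sec != b.sec)
        return a.sec < b.sec ? -1 : 1;
    if (a.nsec != b.nsec)
        return a.nsec < b.nsec ? -1 : 1;
    return 0;
}

/* Open-coded subtraction: may leave nsec negative (unnormalized). */
static struct ts ts_sub_raw(struct ts a, struct ts b)
{
    struct ts d = { a.sec - b.sec, a.nsec - b.nsec };
    return d;
}

/* Subtraction followed by normalization so that 0 <= nsec < NSEC_PER_SEC. */
static struct ts ts_sub(struct ts a, struct ts b)
{
    struct ts d = ts_sub_raw(a, b);

    if (d.nsec < 0) {
        d.sec -= 1;
        d.nsec += NSEC_PER_SEC;
    }
    return d;
}
```

Subtracting { 9, 900000000 } from { 0, 0 } gives { -9, -900000000 } with `ts_sub_raw()` and { -10, 100000000 } with `ts_sub()`. Both represent -9.9 s, but only the normalized form compares correctly against { -10, 900000000 }, which is -9.1 s.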
  2. Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
    
    Pull SCSI fix from James Bottomley:
     "One driver fix: the pm8001 has never actually worked on a system with
      an IOMMU and this fixes that use case"
    
    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
      scsi: pm8001: Fix phys_to_virt() usage on dma_addr_t
    torvalds committed Dec 17, 2021
  3. Merge tag 'for-5.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
    
    Pull btrfs fixes from David Sterba:
     "A few more fixes, almost all error handling one-liners and for stable.
    
       - regression fix in directory logging items
    
       - regression fix of extent buffer status bits handling after an error
    
       - fix memory leak in error handling path in tree-log
    
       - fix freeing invalid anon device number when handling errors during
         subvolume creation
    
       - fix warning when freeing leaf after subvolume creation failure
    
       - fix missing blkdev put in device scan error handling
    
       - fix invalid delayed ref after subvolume creation failure"
    
    * tag 'for-5.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
      btrfs: fix missing blkdev_put() call in btrfs_scan_one_device()
      btrfs: fix warning when freeing leaf after subvolume creation failure
      btrfs: fix invalid delayed ref after subvolume creation failure
      btrfs: check WRITE_ERR when trying to read an extent buffer
      btrfs: fix missing last dir item offset update when logging directory
      btrfs: fix double free of anon_dev after failure to create subvolume
      btrfs: fix memory leak in __add_inode_ref()
    torvalds committed Dec 17, 2021