Ming-Lei/genir…

Commits on Aug 18, 2021

  1. blk-mq: build default queue map via group_cpus_evenly()

    The default queue mapping built by blk_mq_map_queues() doesn't take NUMA
    topology into account, so the resulting mapping is poor: CPUs belonging
    to different NUMA nodes can be assigned to the same queue. IOPS is
    observed to drop by ~30% when running two jobs on the same hctx of
    null_blk from two CPUs belonging to two different NUMA nodes, compared
    with running them from the same NUMA node.
    
    Address the issue by reusing group_cpus_evenly(), which groups CPUs
    according to CPU/NUMA locality.
    
    Lots of drivers may benefit from the change, such as nvme pci poll,
    nvme tcp, ...
    
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Ming Lei authored and intel-lab-lkp committed Aug 18, 2021
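
The NUMA-aware mapping described in the blk-mq commit above can be illustrated with a minimal userspace sketch. It is not the kernel's group_cpus_evenly() algorithm; it only shows the property the commit asks for, namely that a queue never spans NUMA nodes. The function name `map_queues_numa`, the `cpu_node` array, and the requirement that there are at least as many queues as nodes are assumptions made for this example.

```c
#include <stddef.h>

/*
 * Hypothetical helper: fill queue_of[cpu] for every CPU.
 * cpu_node[cpu] gives the NUMA node of each CPU (0..nr_nodes-1).
 * Assumes nr_queues >= nr_nodes so every node owns at least one queue.
 * Returns 0 on success, -1 on invalid arguments.
 */
int map_queues_numa(const int *cpu_node, size_t nr_cpus,
                    unsigned int nr_nodes, unsigned int nr_queues,
                    unsigned int *queue_of)
{
    unsigned int first = 0; /* first queue owned by the current node */

    if (nr_nodes == 0 || nr_queues < nr_nodes)
        return -1;

    for (unsigned int node = 0; node < nr_nodes; node++) {
        /* Split queues across nodes; early nodes absorb the remainder. */
        unsigned int share = nr_queues / nr_nodes +
                             (node < nr_queues % nr_nodes ? 1 : 0);
        unsigned int next = 0;

        /* Spread this node's CPUs round-robin over its own queues only. */
        for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
            if (cpu_node[cpu] != (int)node)
                continue;
            queue_of[cpu] = first + next;
            next = (next + 1) % share;
        }
        first += share;
    }
    return 0;
}
```

With CPUs interleaved across two nodes and two queues, every CPU of node 0 lands on queue 0 and every CPU of node 1 on queue 1, so no queue is shared across nodes.
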
  2. lib/group_cpus: allow to group cpus in case of !CONFIG_SMP

    Allow group_cpus_evenly() to be called when CONFIG_SMP is disabled by
    simply assigning all CPUs to the first group.
    
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Ming Lei authored and intel-lab-lkp committed Aug 18, 2021
  3. genirq/affinity: move group_cpus_evenly() into lib/

    group_cpus_evenly() has become a generic helper that other subsystems
    can use, so move it into lib/.
    
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Ming Lei authored and intel-lab-lkp committed Aug 18, 2021
  4. genirq/affinity: rename irq_build_affinity_masks as group_cpus_evenly

    Map irq vector into group, so we can abstract the algorithm for generic
    use case.
    
    Rename irq_build_affinity_masks as group_cpus_evenly, so we can reuse
    the API for blk-mq to make default queue mapping.
    
    No functional change, just rename vector as group.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Ming Lei authored and intel-lab-lkp committed Aug 18, 2021
  5. genirq/affinity: don't pass irq_affinity_desc array to irq_build_affinity_masks

    Prepare for abstracting irq_build_affinity_masks() into a public helper
    that assigns all CPUs evenly into several groups. Don't pass the
    irq_affinity_desc array to irq_build_affinity_masks(); instead return
    a cpumask array, storing each assigned group in one element of the
    array.
    
    This helps us provide a generic interface for grouping all CPUs evenly
    from the NUMA and CPU locality viewpoint. The cost is one extra
    allocation in irq_build_affinity_masks(), which should be fine since
    it is done via GFP_KERNEL and irq_build_affinity_masks() is called
    rarely.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Ming Lei authored and intel-lab-lkp committed Aug 18, 2021
  6. genirq/affinity: pass affinity managed mask array to irq_build_affinity_masks

    Pass the affinity managed mask array to irq_build_affinity_masks() so that
    the index of the first affinity managed vector is always zero, which lets
    us simplify the implementation a bit.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Ming Lei authored and intel-lab-lkp committed Aug 18, 2021
  7. genirq/affinity: remove the 'firstvec' parameter from irq_build_affinity_masks

    The 'firstvec' parameter always has the same value as the 'startvec'
    parameter, so use 'startvec' directly inside irq_build_affinity_masks().
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Ming Lei authored and intel-lab-lkp committed Aug 18, 2021

Commits on Aug 13, 2021

  1. genirq: Fix kernel doc indentation

    Fixes: 61377ec ("genirq: Clarify documentation for request_threaded_irq()")
    Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Thomas Gleixner committed Aug 13, 2021

Commits on Aug 11, 2021

  1. genirq: Fix kernel-doc warnings in pm.c, msi.c and ipi.c

    Fix all kernel-doc warnings in these 3 files and do some simple editing
    (capitalize acronyms, capitalize Linux).
    
    kernel/irq/pm.c:235: warning: expecting prototype for irq_pm_syscore_ops(). Prototype was for irq_pm_syscore_resume() instead
    kernel/irq/msi.c:530: warning: expecting prototype for __msi_domain_free_irqs(). Prototype was for msi_domain_free_irqs() instead
    kernel/irq/msi.c:31: warning: No description found for return value of 'alloc_msi_entry'
    kernel/irq/msi.c:103: warning: No description found for return value of 'msi_domain_set_affinity'
    kernel/irq/msi.c:288: warning: No description found for return value of 'msi_create_irq_domain'
    kernel/irq/msi.c:499: warning: No description found for return value of 'msi_domain_alloc_irqs'
    kernel/irq/msi.c:545: warning: No description found for return value of 'msi_get_domain_info'
    kernel/irq/ipi.c:264: warning: expecting prototype for ipi_send_mask(). Prototype was for __ipi_send_mask() instead
    kernel/irq/ipi.c:25: warning: No description found for return value of 'irq_reserve_ipi'
    kernel/irq/ipi.c:116: warning: No description found for return value of 'irq_destroy_ipi'
    kernel/irq/ipi.c:163: warning: No description found for return value of 'ipi_get_hwirq'
    kernel/irq/ipi.c:222: warning: No description found for return value of '__ipi_send_single'
    kernel/irq/ipi.c:308: warning: No description found for return value of 'ipi_send_single'
    kernel/irq/ipi.c:329: warning: No description found for return value of 'ipi_send_mask'
    
    Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/20210810234835.12547-1-rdunlap@infradead.org
    rddunlap authored and Thomas Gleixner committed Aug 11, 2021
  2. genirq/timings: Fix error return code in irq_timings_test_irqs()

    Return a negative error code from the error handling case instead of 0, as
    done elsewhere in this function.
    
    Fixes: f52da98 ("genirq/timings: Add selftest for irqs circular buffer")
    Reported-by: Hulk Robot <hulkci@huawei.com>
    Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/20210811093333.2376-1-thunder.leizhen@huawei.com
    Zhen Lei authored and Thomas Gleixner committed Aug 11, 2021
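
The class of bug fixed by the genirq/timings commit above (an error path that leaves through a shared exit label while the return variable still holds 0) can be shown with a small userspace sketch. This is not the code of irq_timings_test_irqs(); the function `selftest_count` and its parameters are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical self-test: fills a scratch buffer and checks that the
 * number of stored samples matches what the caller expects.
 */
int selftest_count(size_t nr_samples, size_t expected)
{
    int ret = 0;
    int *buf = calloc(nr_samples ? nr_samples : 1, sizeof(*buf));

    if (!buf)
        return -ENOMEM;

    for (size_t i = 0; i < nr_samples; i++)
        buf[i] = (int)i;

    if (nr_samples != expected) {
        /* Set the error code before leaving; otherwise ret stays 0. */
        ret = -EINVAL;
        goto out;
    }

    /* Further checks would go here. */
out:
    free(buf);
    return ret;
}
```

Dropping the `ret = -EINVAL` assignment would make the mismatch case report success, which is the kind of silent failure the commit describes.
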

Commits on Aug 10, 2021

  1. genirq/matrix: Fix kernel doc warnings for irq_matrix_alloc_managed()

    Describe the arguments correctly.
    
    Fixes the following W=1 kernel build warning(s):
    
    kernel/irq/matrix.c:287: warning: Function parameter or
     member 'msk' not described in 'irq_matrix_alloc_managed'
    kernel/irq/matrix.c:287: warning: Function parameter or
     member 'mapped_cpu' not described in 'irq_matrix_alloc_managed'
    kernel/irq/matrix.c:287: warning: Excess function
     parameter 'cpu' description in 'irq_matrix_alloc_managed'
    
    Signed-off-by: Baokun Li <libaokun1@huawei.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/20210605063413.684085-1-libaokun1@huawei.com
    Baokun Li authored and Thomas Gleixner committed Aug 10, 2021
  2. genirq: Change force_irqthreads to a static key

    With CONFIG_IRQ_FORCED_THREADING=y, testing the boolean force_irqthreads
    could incur a cache line miss in invoke_softirq() and other places.
    
    Replace the test with a static key to avoid the potential cache miss.
    
    [ tglx: Dropped the IDE part, removed the export and updated blk-mq ]
    
    Suggested-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Tanner Love <tannerlove@google.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Link: https://lore.kernel.org/r/20210602180338.3324213-1-tannerlove.kernel@gmail.com
    tannerlove authored and Thomas Gleixner committed Aug 10, 2021
  3. genirq/generic_chip: Use struct_size() in kzalloc()

    Make use of the struct_size() helper instead of an open-coded version,
    in order to avoid any potential type mistakes or integer overflows
    that, in the worst scenario, could lead to heap overflows.
    
    This code was detected with the help of Coccinelle, and audited and
    fixed manually.
    
    Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/20210513212729.GA214145@embeddedor
    GustavoARSilva authored and Thomas Gleixner committed Aug 10, 2021
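
A rough userspace analogue of what the struct_size() commit above guards against: sizing a structure with a flexible array member without letting the multiplication or addition wrap around. The structure `chip_group` and both functions are hypothetical, and the sketch relies on the GCC/Clang `__builtin_*_overflow` builtins rather than on the kernel helper itself.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct chip_group {              /* hypothetical flexible-array structure */
    unsigned int nr_chips;
    void *chips[];
};

/* Returns the allocation size in bytes, or SIZE_MAX if it would overflow. */
size_t chip_group_size(size_t nr_chips)
{
    size_t bytes;

    if (__builtin_mul_overflow(nr_chips, sizeof(void *), &bytes) ||
        __builtin_add_overflow(bytes, sizeof(struct chip_group), &bytes))
        return SIZE_MAX;
    return bytes;
}

/* Allocates a zeroed group, refusing sizes that cannot be represented. */
struct chip_group *chip_group_alloc(size_t nr_chips)
{
    size_t bytes = chip_group_size(nr_chips);
    struct chip_group *g;

    if (bytes == SIZE_MAX || nr_chips > UINT_MAX)
        return NULL;

    g = calloc(1, bytes);
    if (g)
        g->nr_chips = (unsigned int)nr_chips;
    return g;
}
```

An open-coded `sizeof(*g) + nr_chips * sizeof(void *)` would silently wrap for a huge `nr_chips` and yield an undersized buffer; the saturating variant turns that case into an allocation failure.
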
  4. genirq: Clarify documentation for request_threaded_irq()

    Clarify wording and document commonly used IRQF_ONESHOT flag.
    
    Signed-off-by: Joel Savitz <jsavitz@redhat.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/20210731050740.444454-1-jsavitz@redhat.com
    theyoyojo authored and Thomas Gleixner committed Aug 10, 2021
  5. genirq/affinity: Replace deprecated CPU-hotplug functions.

    The functions get_online_cpus() and put_online_cpus() have been
    deprecated during the CPU hotplug rework. They map directly to
    cpus_read_lock() and cpus_read_unlock().
    
    Replace deprecated CPU-hotplug functions with the official version.
    The behavior remains unchanged.
    
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/20210803141621.780504-26-bigeasy@linutronix.de
    Sebastian Andrzej Siewior authored and Thomas Gleixner committed Aug 10, 2021
  6. PCI/MSI: Use new mask/unmask functions

    Switch the PCI/MSI core to use the new mask/unmask functions. No functional
    change.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20210729222543.311207034@linutronix.de
    Thomas Gleixner committed Aug 10, 2021
  7. PCI/MSI: Provide a new set of mask and unmask functions

    The existing mask/unmask functions are convoluted and generate suboptimal
    assembly code.
    
    Provide a new set of functions which will be used in later patches to
    replace the existing ones.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/875ywetozb.ffs@tglx
    Thomas Gleixner committed Aug 10, 2021
  8. PCI/MSI: Cleanup msi_mask()

    msi_mask() is calculating the possible mask bits for MSI per vector
    masking.
    
    Rename it to msi_multi_mask() and hand the MSI descriptor pointer into it
    to simplify the call sites.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20210729222543.203905260@linutronix.de
    Thomas Gleixner committed Aug 10, 2021
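
As an illustration of "the possible mask bits for MSI per vector masking" in the msi_mask() cleanup above, the following sketch derives the mask from the number of vectors a device supports. It assumes the capability is stored as log2 of the vector count, as in the Multiple Message Capable field of the MSI capability; the struct and function names are hypothetical stand-ins, not the kernel definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant MSI descriptor field. */
struct msi_desc_sketch {
    uint8_t multi_cap;   /* log2 of the number of supported vectors */
};

/* Returns one mask bit per vector the device can signal. */
uint32_t msi_multi_mask_sketch(const struct msi_desc_sketch *desc)
{
    /* 32 vectors use every bit; shifting by 32 would be undefined. */
    if (desc->multi_cap >= 5)
        return UINT32_MAX;
    return (UINT32_C(1) << (1u << desc->multi_cap)) - 1;
}
```

Passing the descriptor rather than a raw number keeps call sites short, which is the simplification the commit message mentions.
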
  9. PCI/MSI: Deobfuscate virtual MSI-X

    Handling of virtual MSI-X is obfuscated by letting pci_msix_desc_addr()
    return NULL and checking the pointer.
    
    Just use msi_desc::msi_attrib.is_virtual at the call sites and get rid of
    that pointer check.
    
    No functional change.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20210729222543.151522318@linutronix.de
    Thomas Gleixner committed Aug 10, 2021
  10. PCI/MSI: Consolidate error handling in msi_capability_init()

    Three error exits that do exactly the same thing call for a common error exit point.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20210729222543.098828720@linutronix.de
    Thomas Gleixner committed Aug 10, 2021
  11. PCI/MSI: Rename msi_desc::masked

    msi_desc::masked is a misnomer. For MSI it's used to cache the MSI mask
    bits when the device supports per vector masking. For MSI-X it's used to
    cache the content of the vector control word which contains the mask bit
    for the vector.
    
    Replace it with a union of msi_mask and msix_ctrl to make the purpose clear
    and fix up the usage sites.
    
    No functional change.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20210729222543.045993608@linutronix.de
    Thomas Gleixner committed Aug 10, 2021
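
The union described in the msi_desc::masked rename above might look roughly like the sketch below. Only the two member names `msi_mask` and `msix_ctrl` come from the commit message; the surrounding struct, the `is_msix` flag, and the helper are hypothetical, and the mask bit is assumed to be bit 0 of the MSI-X vector control word.

```c
#include <stdint.h>

#define MSIX_CTRL_MASKBIT 0x1u   /* assumed: bit 0 of the vector control word */

struct msi_desc_sketch {
    int is_msix;                 /* hypothetical discriminator */
    union {
        uint32_t msi_mask;       /* MSI: cached per-vector mask bits */
        uint32_t msix_ctrl;      /* MSI-X: cached vector control word */
    };
};

/* Returns non-zero if the given vector is currently masked in the cache. */
int desc_vector_masked(const struct msi_desc_sketch *desc, unsigned int vec)
{
    if (desc->is_msix)
        return (desc->msix_ctrl & MSIX_CTRL_MASKBIT) != 0;
    return vec < 32 && ((desc->msi_mask >> vec) & 1u) != 0;
}
```

Naming the two interpretations separately makes each use site state which of the two meanings it relies on.
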
  12. PCI/MSI: Simplify msi_verify_entries()

    Looping over all entries is pointless when 64-bit addressing mode is
    enabled.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Acked-by: Bjorn Helgaas <bhelgaas@google.com>
    Link: https://lore.kernel.org/r/20210729222542.992849326@linutronix.de
    Thomas Gleixner committed Aug 10, 2021
  13. s390/pci: Do not mask MSI[-X] entries on teardown

    The PCI core already ensures that the MSI[-X] state is correct when MSI[-X]
    is disabled. For MSI the reset state is all entries unmasked and for MSI-X
    all vectors are masked.
    
    S390 masks all MSI entries and masks the already masked MSI-X entries
    again. Remove this and leave the device in the correct state.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>
    Link: https://lore.kernel.org/r/20210729222542.939798136@linutronix.de
    Thomas Gleixner committed Aug 10, 2021
  14. Merge branch 'irq/urgent' into irq/core

    to pick up fixes on which further changes depend.
    Thomas Gleixner committed Aug 10, 2021
  15. x86/msi: Force affinity setup before startup

    The X86 MSI mechanism cannot handle interrupt affinity changes safely after
    startup other than from an interrupt handler, unless interrupt remapping is
    enabled. The startup sequence in the generic interrupt code violates that
    assumption.
    
    Mark the irq chips with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that
    the default interrupt setting happens before the interrupt is started up
    for the first time.
    
    While the interrupt remapping MSI chip does not require this, there is no
    point in treating it differently as this might spare an interrupt to a CPU
    which is not in the default affinity mask.
    
    For the non-remapping case, go to the direct write path when the interrupt
    is not yet started, similar to the not yet activated case.
    
    Fixes: 1840475 ("genirq: Expose default irq affinity mask (take 3)")
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20210729222542.886722080@linutronix.de
    Thomas Gleixner committed Aug 10, 2021
  16. x86/ioapic: Force affinity setup before startup

    The IO/APIC cannot handle interrupt affinity changes safely after startup
    other than from an interrupt handler. The startup sequence in the generic
    interrupt code violates that assumption.
    
    Mark the irq chip with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that
    the default interrupt setting happens before the interrupt is started up
    for the first time.
    
    Fixes: 1840475 ("genirq: Expose default irq affinity mask (take 3)")
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20210729222542.832143400@linutronix.de
    Thomas Gleixner committed Aug 10, 2021
  17. genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP

    X86 IO/APIC and MSI interrupts (when used without interrupt remapping)
    require that the affinity setup on startup is done before the interrupt is
    enabled for the first time as the non-remapped operation mode cannot safely
    migrate enabled interrupts from arbitrary contexts. Provide a new irq chip
    flag which allows affected hardware to request this.
    
    This has to be opt-in because there have been reports in the past that some
    interrupt chips cannot handle affinity setting before startup.
    
    Fixes: 1840475 ("genirq: Expose default irq affinity mask (take 3)")
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20210729222542.779791738@linutronix.de
    Thomas Gleixner committed Aug 10, 2021
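
The ordering change behind IRQCHIP_AFFINITY_PRE_STARTUP in the three commits above can be sketched as a tiny model that records which step runs first. The flag value, the function name, and the use of the letters 'A' (affinity setup) and 'S' (startup) are assumptions for illustration; the real logic lives in the generic interrupt startup code.

```c
/* Hypothetical flag value; the kernel defines its own bit for this. */
#define CHIP_AFFINITY_PRE_STARTUP 0x1u

/*
 * Records the order of affinity setup ('A') and interrupt startup ('S')
 * into out, which must hold at least 3 bytes including the terminator.
 */
void irq_startup_order(unsigned int chip_flags, char out[3])
{
    int n = 0;

    /* Opt-in chips get their affinity programmed before first enable. */
    if (chip_flags & CHIP_AFFINITY_PRE_STARTUP)
        out[n++] = 'A';

    out[n++] = 'S';

    /* Everyone else keeps the historical order: start up, then set affinity. */
    if (!(chip_flags & CHIP_AFFINITY_PRE_STARTUP))
        out[n++] = 'A';

    out[n] = '\0';
}
```

Keeping the default order unchanged reflects the opt-in requirement stated in the commit: some interrupt chips reportedly cannot handle affinity setting before startup.
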
  18. PCI/MSI: Protect msi_desc::masked for multi-MSI

    Multi-MSI uses a single MSI descriptor and there is a single mask register
    when the device supports per vector masking. To avoid reading back the mask
    register the value is cached in the MSI descriptor and updates are done by
    clearing and setting bits in the cache and writing it to the device.
    
    But nothing protects msi_desc::masked and the mask register from being
    modified concurrently on two different CPUs for two different Linux
    interrupts which belong to the same multi-MSI descriptor.
    
    Add a lock to struct device and protect any operation on the mask and the
    mask register with it.
    
    This makes the update of msi_desc::masked unconditional, but there is no
    place which requires a modification of the hardware register without
    updating the masked cache.
    
    msi_mask_irq() is now an empty wrapper which will be cleaned up in follow
    up changes.
    
    The problem goes way back to the initial support of multi-MSI, but picking
    the commit which introduced the mask cache is a valid cut off point
    (2.6.30).
    
    Fixes: f2440d9 ("PCI MSI: Refactor interrupt masking code")
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20210729222542.726833414@linutronix.de
    Thomas Gleixner committed Aug 10, 2021
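
The race described in the multi-MSI commit above is a read-modify-write on a shared cached value. A minimal userspace sketch of the fix, using a C11 `atomic_flag` as a stand-in for the lock added to struct device, might look as follows. All names are hypothetical and the "hardware register" is just a struct field.

```c
#include <stdatomic.h>
#include <stdint.h>

struct msi_dev_sketch {
    atomic_flag lock;          /* stands in for the per-device lock */
    uint32_t cached_mask;      /* software copy of the mask register */
    volatile uint32_t hw_mask; /* simulated device mask register */
};

void msi_dev_init(struct msi_dev_sketch *dev)
{
    atomic_flag_clear(&dev->lock);
    dev->cached_mask = 0;
    dev->hw_mask = 0;
}

/* Clears and sets bits in the cache, then writes the result to the device. */
void msi_update_mask(struct msi_dev_sketch *dev, uint32_t clear, uint32_t set)
{
    while (atomic_flag_test_and_set_explicit(&dev->lock, memory_order_acquire))
        ; /* spin until the lock is free */

    dev->cached_mask &= ~clear;
    dev->cached_mask |= set;
    dev->hw_mask = dev->cached_mask;

    atomic_flag_clear_explicit(&dev->lock, memory_order_release);
}
```

Because both the cache update and the register write happen under the same lock, two CPUs masking different vectors of one multi-MSI descriptor can no longer overwrite each other's bits.
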
  19. PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown()

    No point in using the raw write function from shutdown. Preparatory change
    to introduce proper serialization for the msi_desc::masked cache.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20210729222542.674391354@linutronix.de
    Thomas Gleixner committed Aug 10, 2021
  20. PCI/MSI: Correct misleading comments

    The comments about preserving the cached state in pci_msi[x]_shutdown() are
    misleading as the MSI descriptors are freed right after those functions
    return. So there is nothing to restore. Preparatory change.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20210729222542.621609423@linutronix.de
    Thomas Gleixner committed Aug 10, 2021
  21. PCI/MSI: Do not set invalid bits in MSI mask

    msi_mask_irq() takes a mask and a flags argument. The mask argument is used
    to mask out bits from the cached mask and the flags argument to set bits.
    
    Some places invoke it with a flags argument which sets bits which are not
    used by the device, i.e. when the device supports up to 8 vectors a full
    unmask in some places sets the mask to 0xFFFFFF00. While devices probably
    do not care, it's still bad practice.
    
    Fixes: 7ba1930 ("PCI MSI: Unmask MSI if setup failed")
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20210729222542.568173099@linutronix.de
    Thomas Gleixner committed Aug 10, 2021
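
The 0xFFFFFF00 case from the commit above can be reproduced with a few lines of arithmetic. The sketch below shows one illustrative way to keep unsupported bits out of the cached mask, by clamping the bits to set against the valid mask; this is an assumption for the example and not necessarily how the commit itself changes the call sites.

```c
#include <stdint.h>

/*
 * Returns the new cached mask: bits in clear are removed, bits in set are
 * added, but only if the device implements them (valid).
 */
uint32_t msi_apply_mask(uint32_t cached, uint32_t valid,
                        uint32_t clear, uint32_t set)
{
    cached &= ~clear;
    cached |= set & valid;   /* never set bits the device does not have */
    return cached;
}
```

For a device with 8 vectors (`valid == 0xFF`), a "full unmask" that clears `0xFF` and passes `~0xFF` as the bits to set would produce 0xFFFFFF00 without the clamp and 0 with it.
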
  22. PCI/MSI: Enforce MSI[X] entry updates to be visible

    Nothing enforces the posted writes to be visible when the function
    returns. Flush them even if the flush might be redundant when the entry is
    masked already as the unmask will flush as well. This is either setup or a
    rare affinity change event so the extra flush is not the end of the world.
    
    While this is more of a theoretical issue, the logic in the X86-specific
    msi_set_affinity() function in particular relies on the assumption that
    the update has reached the hardware when the function returns.
    
    Again, as this never has been enforced the Fixes tag refers to a commit in:
       git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
    
    Fixes: f036d4e ("[PATCH] ia32 Message Signalled Interrupt support")
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Acked-by: Bjorn Helgaas <bhelgaas@google.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20210729222542.515188147@linutronix.de
    Thomas Gleixner committed Aug 10, 2021
  23. PCI/MSI: Enforce that MSI-X table entry is masked for update

    The specification (PCIe r5.0, sec 6.1.4.5) states:
    
        For MSI-X, a function is permitted to cache Address and Data values
        from unmasked MSI-X Table entries. However, anytime software unmasks a
        currently masked MSI-X Table entry either by clearing its Mask bit or
        by clearing the Function Mask bit, the function must update any Address
        or Data values that it cached from that entry. If software changes the
        Address or Data value of an entry while the entry is unmasked, the
        result is undefined.
    
    The Linux kernel's MSI-X support never enforced that the entry is masked
    before the entry is modified hence the Fixes tag refers to a commit in:
          git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
    
    Enforce the entry to be masked across the update.
    
    There is no point in enforcing this to be handled at all possible call
    sites as this is just pointless code duplication and the common update
    function is the obvious place to enforce this.
    
    Fixes: f036d4e ("[PATCH] ia32 Message Signalled Interrupt support")
    Reported-by: Kevin Tian <kevin.tian@intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Acked-by: Bjorn Helgaas <bhelgaas@google.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20210729222542.462096385@linutronix.de
    Thomas Gleixner committed Aug 10, 2021
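
The rule quoted from the specification in the commit above (never change Address or Data while the entry is unmasked) can be sketched against a simulated table entry. The struct layout, the function names, and the `unmasked_writes` counter are hypothetical; the counter exists only so the example can check that no write happened while the entry was unmasked. The mask bit is assumed to be bit 0 of the vector control word.

```c
#include <stdint.h>

#define MSIX_CTRL_MASKBIT 0x1u   /* assumed: bit 0 of the vector control word */

struct msix_entry_sketch {       /* simulated MSI-X table entry */
    uint32_t addr_lo, addr_hi, data, ctrl;
    unsigned int unmasked_writes; /* instrumentation only */
};

/* Writes one field and counts the write if the entry is unmasked. */
static void entry_write(struct msix_entry_sketch *e, uint32_t *field,
                        uint32_t val)
{
    if (!(e->ctrl & MSIX_CTRL_MASKBIT))
        e->unmasked_writes++;
    *field = val;
}

/* Updates the message with the entry masked, then restores the old state. */
void msix_write_msg_sketch(struct msix_entry_sketch *e, uint32_t addr_lo,
                           uint32_t addr_hi, uint32_t data)
{
    uint32_t ctrl = e->ctrl;
    int was_unmasked = !(ctrl & MSIX_CTRL_MASKBIT);

    if (was_unmasked)
        e->ctrl = ctrl | MSIX_CTRL_MASKBIT;

    entry_write(e, &e->addr_lo, addr_lo);
    entry_write(e, &e->addr_hi, addr_hi);
    entry_write(e, &e->data, data);

    if (was_unmasked)
        e->ctrl = ctrl;
}
```

Doing the mask and restore inside the common update function mirrors the commit's argument that enforcing it at every call site would only duplicate code.
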
  24. PCI/MSI: Mask all unused MSI-X entries

    When MSI-X is enabled the ordering of calls is:
    
      msix_map_region();
      msix_setup_entries();
      pci_msi_setup_msi_irqs();
      msix_program_entries();
    
    This has a few interesting issues:
    
     1) msix_setup_entries() allocates the MSI descriptors and initializes them
        except for the msi_desc::masked member which is left zero initialized.
    
     2) pci_msi_setup_msi_irqs() allocates the interrupt descriptors and sets
        up the MSI interrupts which ends up in pci_write_msi_msg() unless the
        interrupt chip provides its own irq_write_msi_msg() function.
    
     3) msix_program_entries() does not do what the name suggests. It solely
        updates the entries array (if not NULL) and initializes the masked
        member for each MSI descriptor by reading the hardware state and then
        masks the entry.
    
    Obviously this has some issues:
    
     1) The uninitialized masked member of msi_desc prevents the enforcement
        of masking the entry in pci_write_msi_msg() depending on the cached
        masked bit. Aside from that, half-initialized data is a no-no in general.
    
     2) msix_program_entries() only ensures that the actually allocated entries
        are masked. This is wrong as experimentation with crash testing and
        crash kernel kexec has shown.
    
        This limited testing unearthed that when the production kernel had more
        entries in use and unmasked when it crashed and the crash kernel
        allocated a smaller amount of entries, then a full scan of all entries
        found unmasked entries which were in use in the production kernel.
    
        This is obviously a device or emulation issue as the device reset
        should mask all MSI-X table entries, but obviously that's just part
        of the paper specification.
    
    Cure this by:
    
     1) Masking all table entries in hardware
     2) Initializing msi_desc::masked in msix_setup_entries()
     3) Removing the mask dance in msix_program_entries()
     4) Renaming msix_program_entries() to msix_update_entries() to
        reflect the purpose of that function.
    
    As the masking of unused entries has never been done the Fixes tag refers
    to a commit in:
       git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
    
    Fixes: f036d4e ("[PATCH] ia32 Message Signalled Interrupt support")
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Acked-by: Bjorn Helgaas <bhelgaas@google.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20210729222542.403833459@linutronix.de
    Thomas Gleixner committed Aug 10, 2021
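
The first cure listed in the commit above, masking every table entry rather than only the allocated ones, can be sketched on a simulated table. The struct and function names are hypothetical and the mask bit is assumed to be bit 0 of the vector control word.

```c
#include <stddef.h>
#include <stdint.h>

#define MSIX_CTRL_MASKBIT 0x1u   /* assumed: bit 0 of the vector control word */

struct msix_entry_sketch {       /* simulated MSI-X table entry */
    uint32_t addr_lo, addr_hi, data, ctrl;
};

/* Masks the first count entries of the table. */
void msix_mask_entries(struct msix_entry_sketch *table, size_t count)
{
    for (size_t i = 0; i < count; i++)
        table[i].ctrl |= MSIX_CTRL_MASKBIT;
}

/* Counts entries that are still unmasked across the whole table. */
size_t msix_count_unmasked(const struct msix_entry_sketch *table,
                           size_t table_size)
{
    size_t n = 0;

    for (size_t i = 0; i < table_size; i++)
        if (!(table[i].ctrl & MSIX_CTRL_MASKBIT))
            n++;
    return n;
}
```

Calling `msix_mask_entries()` with only the number of allocated entries reproduces the kexec scenario from the commit message: entries left unmasked by the previous kernel stay unmasked. Passing the full table size closes that gap.
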
  25. PCI/MSI: Enable and mask MSI-X early

    The ordering of MSI-X enable in hardware is dysfunctional:
    
     1) MSI-X is disabled in the control register
     2) Various setup functions
     3) pci_msi_setup_msi_irqs() is invoked which ends up accessing
        the MSI-X table entries
     4) MSI-X is enabled and masked in the control register with the
        comment that enabling is required for some hardware to access
        the MSI-X table
    
    Step #4 obviously contradicts #3. The history of this is an issue with the
    NIU hardware. When #4 was introduced the table access actually happened in
    msix_program_entries() which was invoked after enabling and masking MSI-X.
    
    This was changed in commit d71d643 ("PCI/MSI: Kill redundant call of
    irq_set_msi_desc() for MSI-X interrupts") which removed the table write
    from msix_program_entries().
    
    Interestingly enough nobody noticed and either NIU still works or it did
    not get any testing with a kernel 3.19 or later.
    
    Nevertheless this is inconsistent and there is no reason why MSI-X can't be
    enabled and masked in the control register early on, i.e. move step #4
    above to step #1. This preserves the NIU workaround and has no side effects
    on other hardware.
    
    Fixes: d71d643 ("PCI/MSI: Kill redundant call of irq_set_msi_desc() for MSI-X interrupts")
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Ashok Raj <ashok.raj@intel.com>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Acked-by: Bjorn Helgaas <bhelgaas@google.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20210729222542.344136412@linutronix.de
    Thomas Gleixner committed Aug 10, 2021
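
The reordering proposed in the commit above (move step #4 to step #1) can be modelled with a simulated control register. The bit values follow the MSI-X Message Control layout (bit 15 enable, bit 14 function mask); the struct, the function names, and the `writes_while_disabled` counter are hypothetical and exist only to make the ordering checkable.

```c
#include <stdint.h>

#define MSIX_FLAGS_ENABLE  0x8000u  /* MSI-X enable, bit 15 */
#define MSIX_FLAGS_MASKALL 0x4000u  /* function mask, bit 14 */

struct msix_dev_sketch {
    uint16_t control;                   /* simulated Message Control register */
    unsigned int writes_while_disabled; /* instrumentation only */
};

/* Simulates a table access and records it if MSI-X is not yet enabled. */
static void table_access(struct msix_dev_sketch *dev)
{
    if (!(dev->control & MSIX_FLAGS_ENABLE))
        dev->writes_while_disabled++;
}

void msix_enable_sketch(struct msix_dev_sketch *dev, unsigned int nvec)
{
    /* Step 1: enable MSI-X with all vectors masked before any table access. */
    dev->control |= MSIX_FLAGS_ENABLE | MSIX_FLAGS_MASKALL;

    /* Setup steps that end up touching the MSI-X table. */
    for (unsigned int i = 0; i < nvec; i++)
        table_access(dev);

    /* Last step: drop the function mask; per-vector mask bits still apply. */
    dev->control &= (uint16_t)~MSIX_FLAGS_MASKALL;
}
```

With the enable moved to the front, hardware that requires MSI-X to be enabled for table access (the NIU case in the commit message) never sees a table access while disabled.
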