Tony-Luck/x86-…

Commits on Oct 11, 2021

  1. x86/sgx: Add check for SGX pages to ghes_do_memory_failure()

    SGX EPC pages do not have a "struct page" associated with them so the
    pfn_valid() sanity check fails and results in a warning message to
    the console.
    
    Add an additional check to skip the warning if the address of the error
    is in an SGX EPC page.
    
    Tested-by: Reinette Chatre <reinette.chatre@intel.com>
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    aegl authored and intel-lab-lkp committed Oct 11, 2021
  2. x86/sgx: Add hook to error injection address validation

    SGX reserved memory does not appear in the standard address maps.
    
    Add hook to call into the SGX code to check if an address is located
    in SGX memory.
    
    There are other challenges in injecting errors into SGX. Update the
    documentation with a sequence of operations to inject.
    
    Tested-by: Reinette Chatre <reinette.chatre@intel.com>
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    aegl authored and intel-lab-lkp committed Oct 11, 2021
  3. x86/sgx: Hook arch_memory_failure() into mainline code

    Add a call inside memory_failure() to call the arch specific code
    to check if the address is an SGX EPC page and handle it.
    
    Note the SGX EPC pages do not have a "struct page" entry, so the hook
    goes in at the same point as the device mapping hook.
    
    Pull the call to acquire the mutex earlier so the SGX errors are also
    protected.
    
    Make set_mce_nospec() skip SGX pages when trying to adjust
    the 1:1 map.
    
    Tested-by: Reinette Chatre <reinette.chatre@intel.com>
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    aegl authored and intel-lab-lkp committed Oct 11, 2021
  4. x86/sgx: Add SGX infrastructure to recover from poison

    Provide a recovery function sgx_memory_failure(). If the poison was
    consumed synchronously then send a SIGBUS. Note that the virtual
    address of the access is not included with the SIGBUS as is the case
    for poison outside of SGX enclaves. This doesn't matter as addresses
    of code/data inside an enclave are of little to no use to code executing
    outside the (now dead) enclave.
    
    Poison found in a free page results in the page being moved from the
    free list to the poison page list.
    
    Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
    Tested-by: Reinette Chatre <reinette.chatre@intel.com>
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    aegl authored and intel-lab-lkp committed Oct 11, 2021
  5. x86/sgx: Initial poison handling for dirty and free pages

    A memory controller patrol scrubber can report poison in a page
    that isn't currently being used.
    
    Add "poison" field in the sgx_epc_page that can be set for an
    sgx_epc_page. Check for it:
    1) When sanitizing dirty pages
    2) When freeing epc pages
    
    Poison is a new field separated from flags to avoid having to make
    all updates to flags atomic, or integrate poison state changes into
    some other locking scheme to protect flags.
    
    In both cases place the poisoned page on a list of poisoned epc pages
    to make sure it will not be reallocated.
    
    Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
    Tested-by: Reinette Chatre <reinette.chatre@intel.com>
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    aegl authored and intel-lab-lkp committed Oct 11, 2021
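
The free-path check described above can be sketched in user space. This is a minimal illustration, not the kernel code: `demo_epc_page`, `demo_list`, and `demo_free_page()` are hypothetical names, the lists are simple singly linked stacks, and all locking is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sgx_epc_page. */
struct demo_epc_page {
	bool poison;                 /* kept separate from flags, as in the commit */
	struct demo_epc_page *next;
};

/* Hypothetical minimal list: a singly linked stack with a counter. */
struct demo_list {
	struct demo_epc_page *head;
	size_t count;
};

static void demo_list_push(struct demo_list *list, struct demo_epc_page *page)
{
	page->next = list->head;
	list->head = page;
	list->count++;
}

/*
 * Free a page. A poisoned page is diverted to the poison list so that it
 * can never be handed out again. Returns true if the page reached the
 * free list.
 */
bool demo_free_page(struct demo_list *free_list,
		    struct demo_list *poison_list,
		    struct demo_epc_page *page)
{
	if (page->poison) {
		demo_list_push(poison_list, page);
		return false;
	}

	demo_list_push(free_list, page);
	return true;
}
```

The same test would apply when sanitizing dirty pages: check the poison field first, and only then let the page continue toward the free list.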
  6. x86/sgx: Add infrastructure to identify SGX EPC pages

    X86 machine check architecture reports a physical address when there
    is a memory error. Handling that error requires a method to determine
    whether the physical address reported is in any of the areas reserved
    for EPC pages by BIOS.
    
    SGX EPC pages do not have Linux "struct page" associated with them.
    
    Keep track of the mapping from ranges of EPC pages to the sections
    that contain them using an xarray.
    
    Create a function arch_is_platform_page() that simply reports whether an
    address is an EPC page for use elsewhere in the kernel. The ACPI error
    injection code needs this function and is typically built as a module,
    so export it.
    
    Note that arch_is_platform_page() will be slower than other similar
    "what type is this page" functions that can simply check bits in the
    "struct page".  If there is some future performance critical user of
    this function it may need to be implemented in a more efficient way.
    
    Note also that the current implementation of xarray allocates a few
    hundred kilobytes for this usage on a system with 4GB of SGX EPC memory
    configured. This isn't ideal, but worth it for the code simplicity.
    
    Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
    Tested-by: Reinette Chatre <reinette.chatre@intel.com>
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    aegl authored and intel-lab-lkp committed Oct 11, 2021
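
The "is this physical address an EPC page" lookup can be illustrated with a small user-space sketch. Note the substitution: the commit uses an xarray, while this sketch uses a plain array of ranges with a linear scan. The section addresses and sizes are made up, and `demo_epc_section`, `demo_find_section()`, and `demo_is_platform_page()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one EPC section reserved by BIOS. */
struct demo_epc_section {
	uint64_t phys_addr;
	uint64_t size;
};

/* Made-up example ranges; real values come from the platform. */
static const struct demo_epc_section demo_sections[] = {
	{ 0x100000000ULL, 0x4000000ULL },
	{ 0x200000000ULL, 0x2000000ULL },
};

/* Return the section containing paddr, or NULL if it is not EPC memory. */
const struct demo_epc_section *demo_find_section(uint64_t paddr)
{
	size_t i;

	for (i = 0; i < sizeof(demo_sections) / sizeof(demo_sections[0]); i++) {
		const struct demo_epc_section *s = &demo_sections[i];

		if (paddr >= s->phys_addr && paddr - s->phys_addr < s->size)
			return s;
	}

	return NULL;
}

/* Simple yes/no wrapper, in the spirit of arch_is_platform_page(). */
bool demo_is_platform_page(uint64_t paddr)
{
	return demo_find_section(paddr) != NULL;
}
```

Either way, the lookup has to search a side structure rather than test a bit in a "struct page", which is the cost the commit message points out.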
  7. x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages

    SGX EPC pages go through the following life cycle:
    
            DIRTY ---> FREE ---> IN-USE --\
                        ^                 |
                        \-----------------/
    
    Recovery action for poison for a DIRTY or FREE page is simple. Just
    make sure never to allocate the page. IN-USE pages need some extra
    handling.
    
    Add a new flag bit SGX_EPC_PAGE_IN_USE that is set when a page
    is allocated and cleared when the page is freed.
    
    Notes:
    
    1) These transitions are made while holding the node->lock so that
       future code that checks the flags while holding the node->lock
       can be sure that if the SGX_EPC_PAGE_IN_USE bit is clear, then the
       page is on the free list.
    
    2) Initially while the pages are on the dirty list the
       SGX_EPC_PAGE_IN_USE bit is set.
    
    Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
    Tested-by: Reinette Chatre <reinette.chatre@intel.com>
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    aegl authored and intel-lab-lkp committed Oct 11, 2021
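
The life cycle above can be modelled in a few lines. This is a sketch under assumptions: the flag value, `demo_epc_page`, and the helper names are hypothetical, and the node->lock that the commit requires around each transition is omitted. It follows the stated rule that the bit is set on allocation and cleared on free, and that pages start out (dirty) with the bit set.

```c
#include <stdbool.h>

/* Hypothetical flag value; the real bit position is not given here. */
#define DEMO_EPC_PAGE_IN_USE 0x1u

struct demo_epc_page {
	unsigned int flags;
};

/* DIRTY: pages start with the in-use bit set, so they are never "free". */
void demo_page_init_dirty(struct demo_epc_page *page)
{
	page->flags = DEMO_EPC_PAGE_IN_USE;
}

/* DIRTY or IN-USE -> FREE: clear the bit when the page joins the free list. */
void demo_page_free(struct demo_epc_page *page)
{
	page->flags &= ~DEMO_EPC_PAGE_IN_USE;
}

/* FREE -> IN-USE: set the bit when the page is allocated. */
void demo_page_alloc(struct demo_epc_page *page)
{
	page->flags |= DEMO_EPC_PAGE_IN_USE;
}

/* With the lock held, a clear bit means the page is on the free list. */
bool demo_page_on_free_list(const struct demo_epc_page *page)
{
	return !(page->flags & DEMO_EPC_PAGE_IN_USE);
}
```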

Commits on Oct 7, 2021

  1. Merge branches 'thermal-int340x' and 'thermal-powerclamp' into linux-next

    * thermal-int340x:
      thermal: int340x: delete bogus length check
    
    * thermal-powerclamp:
      thermal: intel_powerclamp: Use bitmap_zalloc/bitmap_free when applicable
    rafaeljw committed Oct 7, 2021
  2. Merge branch 'pnp' into linux-next

    * pnp:
      PNP: system.c: unmark a comment as being kernel-doc
    rafaeljw committed Oct 7, 2021
  3. Merge branches 'acpica' and 'acpi-misc' into linux-next

    * acpica:
      ACPICA: Update version to 20210930
      ACPICA: iASL table disassembler: Added disassembly support for the NHLT ACPI table
      ACPICA: ACPI 6.4 SRAT: add Generic Port Affinity type
      ACPICA: Add support for Windows 2020 _OSI string
      ACPICA: Avoid evaluating methods too early during system resume
    
    * acpi-misc:
      ACPI: Update information in MAINTAINERS
    rafaeljw committed Oct 7, 2021
  4. Merge branch 'acpi-pci-fixes' into linux-next

    * acpi-pci-fixes:
      PCI: ACPI: Check parent pointer in acpi_pci_find_companion()
    rafaeljw committed Oct 7, 2021
  5. PCI: ACPI: Check parent pointer in acpi_pci_find_companion()

    If acpi_pci_find_companion() is called for a device whose parent
    pointer is NULL, it will crash when attempting to get the ACPI
    companion of the parent due to a NULL pointer dereference in
    the ACPI_COMPANION() macro.
    
    This was not a problem before commit 375553a ("PCI: Setup ACPI
    fwnode early and at the same time with OF") that made pci_setup_device()
    call pci_set_acpi_fwnode() and so it allowed devices with NULL parent
    pointers to be passed to acpi_pci_find_companion() which is the case
    in pci_iov_add_virtfn(), for instance.
    
    Fix this issue by making acpi_pci_find_companion() check the device's
    parent pointer upfront and bail out if it is NULL.
    
    While pci_iov_add_virtfn() can be changed to set the device's parent
    pointer before calling pci_setup_device() for it, checking pointers
    against NULL before dereferencing them is prudent anyway and looking
    for ACPI companions of virtual functions isn't really useful.
    
    Fixes: 375553a ("PCI: Setup ACPI fwnode early and at the same time with OF")
    Link: https://lore.kernel.org/linux-acpi/8e4bbd5c59de31db71f718556654c0aa077df03d.camel@linux.ibm.com/
    Reported-by: Niklas Schnelle <schnelle@linux.ibm.com>
    Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Acked-by: Bjorn Helgaas <bhelgaas@google.com>
    rafaeljw committed Oct 7, 2021
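
The shape of the fix, checking the parent pointer before dereferencing it, can be sketched as follows. The structures and `demo_find_companion()` are hypothetical and heavily simplified; the real function looks up the ACPI companion differently, and only the early bail-out is taken from the commit message.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct demo_acpi_device {
	int id;
};

struct demo_device {
	struct demo_device *parent;
	struct demo_acpi_device *companion;
};

/*
 * Bail out up front when the device has no parent, instead of
 * dereferencing dev->parent and crashing.
 */
struct demo_acpi_device *demo_find_companion(const struct demo_device *dev)
{
	if (!dev->parent)
		return NULL;

	/* Simplification: just hand back the parent's companion. */
	return dev->parent->companion;
}
```

A caller such as a virtual-function setup path that has not yet set the parent pointer would get NULL back rather than triggering a NULL pointer dereference.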

Commits on Oct 6, 2021

  1. Merge branch 'pm-pci' into linux-next

    * pm-pci:
      PCI: PM: Do not call platform_pci_power_manageable() unnecessarily
      PCI: PM: Make pci_choose_state() call pci_target_state()
      PCI: PM: Rearrange pci_target_state()
    rafaeljw committed Oct 6, 2021

Commits on Oct 5, 2021

  1. thermal: int340x: delete bogus length check

    This check has a signedness bug and does not work.  If "length" is
    larger than "PAGE_SIZE" then "PAGE_SIZE - length" is not negative
    but instead it is a large unsigned value.  Fortunately, Takashi Iwai
    changed this code to use scnprintf() instead of snprintf() so now
    "length" is never larger than "PAGE_SIZE - 1" and the check can be
    removed.
    
    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    error27 authored and rafaeljw committed Oct 5, 2021
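
The signedness problem is easy to reproduce outside the kernel. The sketch below assumes the removed check had the shape `PAGE_SIZE - length > 0` with an `int` length, which is an assumption based on the description; `DEMO_PAGE_SIZE` and both helper names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for PAGE_SIZE, unsigned long as in the kernel. */
#define DEMO_PAGE_SIZE 4096UL

/*
 * Buggy pattern: length is converted to unsigned long, so when
 * length > DEMO_PAGE_SIZE the subtraction wraps to a huge positive
 * value and the "room left" test still passes.
 */
bool demo_has_room_buggy(int length)
{
	return DEMO_PAGE_SIZE - length > 0;
}

/* Comparing instead of subtracting avoids the wraparound. */
bool demo_has_room_fixed(int length)
{
	return length >= 0 && (unsigned long)length < DEMO_PAGE_SIZE;
}
```

With a length of 5000, the buggy helper reports that room is left while the comparison-based helper does not. Once the length is guaranteed to stay below the page size, as the commit explains, neither check is needed.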
  2. thermal: intel_powerclamp: Use bitmap_zalloc/bitmap_free when applicable

    'cpu_clamping_mask' is a bitmap. So use 'bitmap_zalloc()' and
    'bitmap_free()' to simplify the code, improve its semantics and
    avoid some open-coded arithmetic in allocator arguments.
    
    Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    tititiou36 authored and rafaeljw committed Oct 5, 2021
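
To show what "open-coded arithmetic in allocator arguments" refers to, here is a user-space approximation of a zeroing bitmap allocator. The `demo_` names are hypothetical, and `calloc()`/`free()` stand in for the kernel allocator; the point is only that the bits-to-longs rounding lives in one helper instead of at every call site.

```c
#include <limits.h>
#include <stdlib.h>

/* Number of bits in one unsigned long on this platform. */
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Round a bit count up to the number of unsigned longs needed to hold it. */
#define DEMO_BITS_TO_LONGS(nbits) \
	(((nbits) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Allocate a zero-filled bitmap able to hold nbits bits. */
unsigned long *demo_bitmap_zalloc(size_t nbits)
{
	return calloc(DEMO_BITS_TO_LONGS(nbits), sizeof(unsigned long));
}

/* Release a bitmap obtained from demo_bitmap_zalloc(). */
void demo_bitmap_free(unsigned long *bitmap)
{
	free(bitmap);
}
```

A caller then writes `demo_bitmap_zalloc(nbits)` rather than computing the word count and byte size by hand.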
  3. PNP: system.c: unmark a comment as being kernel-doc

    Fix a documentation build warning caused by the comment not being
    in kernel-doc format:
    
    system.c:110: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
     * Reserve motherboard resources after PCI claim BARs,
    
    Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    rddunlap authored and rafaeljw committed Oct 5, 2021
  4. ACPICA: Update version to 20210930

    ACPICA commit e01cc6b3d12b5f73f44d46fa15a7f569c793b328
    
    Version 20210930.
    
    Link: acpica/acpica@e01cc6b3
    Signed-off-by: Bob Moore <robert.moore@intel.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    acpibob authored and rafaeljw committed Oct 5, 2021
  5. ACPICA: iASL table disassembler: Added disassembly support for the NHLT ACPI table

    ACPICA commit 94abe858583de24a425b37cb8e62d56c65c4f3cf
    
    Note: Vendor-defined microphone arrays and SNR extensions are not
    supported at this time -- mostly due to a lack of example tables.
    
    Actual compiler support for NHLT is forthcoming.
    
    Link: acpica/acpica@94abe858
    Signed-off-by: Bob Moore <robert.moore@intel.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    acpibob authored and rafaeljw committed Oct 5, 2021
  6. ACPICA: ACPI 6.4 SRAT: add Generic Port Affinity type

    ACPICA commit 777e11b73e60f0eb606cf20142ef634702b09ba1
    
    Add a new subtable type for SRAT Generic Port Affinity.
    It uses the same subtable structure as the existing Generic
    Initiator Affinity type.
    
    Link: acpica/acpica@777e11b7
    Signed-off-by: Alison Schofield <alison.schofield@intel.com>
    Signed-off-by: Bob Moore <robert.moore@intel.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    AlisonSchofield authored and rafaeljw committed Oct 5, 2021
  7. ACPICA: Add support for Windows 2020 _OSI string

    ACPICA commit 2dc55de56d2deac30af0b484dd1d65607eb33a9c
    
    Link: https://github.com/microsoft_docs/windows-driver-docs/commit/5164e24985e78ef4870d7a5801a5336104f36366
    Link: acpica/acpica@2dc55de5
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Bob Moore <robert.moore@intel.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    superm1 authored and rafaeljw committed Oct 5, 2021
  8. ACPICA: Avoid evaluating methods too early during system resume

    ACPICA commit 0762982923f95eb652cf7ded27356b247c9774de
    
    During wakeup from system-wide sleep states, acpi_get_sleep_type_data()
    is called and it tries to get memory from the slab allocator in order
    to evaluate a control method, but if KFENCE is enabled in the kernel,
    the memory allocation attempt causes an IRQ work to be queued and a
    self-IPI to be sent to the CPU running the code which requires the
    memory controller to be ready, so if that happens too early in the
    wakeup path, it doesn't work.
    
    Prevent that from taking place by calling acpi_get_sleep_type_data()
    for S0 upfront, when preparing to enter a given sleep state, and
    saving the data obtained by it for later use during system wakeup.
    
    BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=214271
    Reported-by: Reik Keutterling <spielkind@gmail.com>
    Tested-by: Reik Keutterling <spielkind@gmail.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    rafaeljw committed Oct 5, 2021
  9. ACPI: Update information in MAINTAINERS

    Because Rui is now going to focus on work that is not related to the
    maintenance of kernel code, drop the MAINTAINERS records for the
    ACPI fan and video drivers that will be maintained by Rafael along
    with the rest of the ACPI subsystem.
    
    While at it, change the information regarding Len Brown's role in
    the ACPI subsystem to "reviewer" to reflect the current status.
    
    Signed-off-by: Rafael J. Wysocki <rafael@kernel.org>
    Reviewed-by: Len Brown <len.brown@intel.com>
    rafaeljw committed Oct 5, 2021
  10. PCI: PM: Do not call platform_pci_power_manageable() unnecessarily

    Drop two invocations of platform_pci_power_manageable() that are not
    necessary, because the functions called when it returns 'true' do the
    requisite "power manageable" checks themselves.
    
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Tested-by: Ferry Toth <fntoth@gmail.com>
    rafaeljw committed Oct 5, 2021
  11. PCI: PM: Make pci_choose_state() call pci_target_state()

    The pci_choose_state() and pci_target_state() implementations are
    somewhat divergent without a good reason, because they are used
    for similar purposes.
    
    Change the pci_choose_state() implementation to use pci_target_state()
    internally except for transitions to the working state of the system
    in which case it is expected to return D0.
    
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Tested-by: Ferry Toth <fntoth@gmail.com>
    rafaeljw committed Oct 5, 2021
  12. PCI: PM: Rearrange pci_target_state()

    Make pci_target_state() return D3cold or D0 without checking PME
    support if the current power state of the device is D3cold or if it
    does not support the standard PCI PM, respectively.
    
    Next, drop the target_state local variable that has become redundant
    from it.
    
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Tested-by: Ferry Toth <fntoth@gmail.com>
    rafaeljw committed Oct 5, 2021
  13. Merge branch 'pm-cpufreq' into linux-next

    * pm-cpufreq:
      cpufreq: intel_pstate: Process HWP Guaranteed change notification
    rafaeljw committed Oct 5, 2021
  14. cpufreq: intel_pstate: Process HWP Guaranteed change notification

    The HWP guaranteed ratio can change in response to changes in power
    and thermal limits. For example, when the Intel Speed Select
    performance profile is changed or the TDP changes, the hardware can
    send notifications. The guaranteed ratio may be increased. This
    creates an issue when turbo is disabled, as the old limits set in
    MSR_HWP_REQUEST are still lower and the hardware will clip to the
    older limits.
    
    This change enables the HWP interrupt and processes HWP interrupts.
    When the guaranteed ratio changes, it calls cpufreq_update_policy() so
    that the driver callbacks are called to update to the new HWP limits.
    This update is done from delayed work scheduled with a 10 ms delay to
    avoid frequent updates.
    
    Although the scope of IA32_HWP_INTERRUPT is per logical CPU, on some
    platforms the interrupt is generated on all CPUs. This is particularly
    a problem during initialization, when the driver has not yet allocated
    data for other CPUs. So this change uses a cpumask of enabled CPUs and
    processes interrupts on those CPUs only.
    
    When the cpufreq offline() or suspend() callback is called, the HWP
    interrupt is disabled on those CPUs and any pending work item is
    canceled.
    
    A spin lock is used to protect data and processing shared with the
    interrupt handler. The READ_ONCE() and WRITE_ONCE() macros are used to
    designate shared data, even though the spin lock acts as an
    optimization barrier here.
    
    Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
    Tested-by: pablomh@gmail.com
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    spandruvada authored and rafaeljw committed Oct 5, 2021
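
The "only process interrupts on enabled CPUs" idea can be sketched with a plain bitmask. This is an illustration only: a 64-bit word stands in for the kernel cpumask (so CPU numbers must be below 64), the `demo_` names are hypothetical, and the spin lock and delayed work from the commit are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit N set means the driver has set up its data for CPU N. */
static uint64_t demo_hwp_enabled_mask;

/* Called once per-CPU data exists and the interrupt has been enabled. */
void demo_hwp_enable(unsigned int cpu)
{
	demo_hwp_enabled_mask |= 1ULL << cpu;
}

/* Called from the offline/suspend path. */
void demo_hwp_disable(unsigned int cpu)
{
	demo_hwp_enabled_mask &= ~(1ULL << cpu);
}

/*
 * Interrupt entry point. Returns true if the notification was accepted,
 * false if it arrived on a CPU the driver has not enabled and was ignored.
 */
bool demo_hwp_interrupt(unsigned int cpu)
{
	if (!(demo_hwp_enabled_mask & (1ULL << cpu)))
		return false;

	/* A real driver would schedule its delayed work here. */
	return true;
}
```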
  15. Merge branches 'pm-sleep', 'pm-pci' and 'pm-cpuidle' into linux-next

    * pm-sleep:
      PM: hibernate: Remove blk_status_to_errno in hib_wait_io
      PM: sleep: Do not assume that "mem" is always present
    
    * pm-pci:
      PCI: PM: Simplify acpi_pci_power_manageable()
      PCI: PM: Drop struct pci_platform_pm_ops
      PCI: ACPI: PM: Do not use pci_platform_pm_ops for ACPI
      PCI: PM: Do not use pci_platform_pm_ops for Intel MID PM
    
    * pm-cpuidle:
      cpuidle: Fix kobject memory leaks in error paths
      intel_idle: enable interrupts before C1 on Xeons
    rafaeljw committed Oct 5, 2021
  16. Merge branches 'acpi-pci', 'acpi-pnp', 'acpi-docs', 'acpi-misc' and 'acpi-processor' into linux-next

    * acpi-pci:
      ACPI: glue: Look for ACPI bus type only if ACPI companion is not known
      ACPI: glue: Drop cleanup callback from struct acpi_bus_type
      PCI: ACPI: Drop acpi_pci_bus
    
    * acpi-pnp:
      ACPI: PNP: remove duplicated BRI0A49 and BDP3336 entries
    
    * acpi-docs:
      Documentation: ACPI: Fix spelling mistake "Millenium" -> "Millennium"
    
    * acpi-misc:
      ACPI: Kconfig: Fix a typo in Kconfig
    
    * acpi-processor:
      ACPI: processor idle: Allow playing dead in C3 state
    rafaeljw committed Oct 5, 2021
  17. Merge branches 'acpi-x86' and 'acpi-resources' into linux-next

    * acpi-x86:
      x86: ACPI: cstate: Optimize C3 entry on AMD CPUs
      x86/ACPI: Don't add CPUs that are not online capable
      ACPICA: Add support for MADT online enabled bit
    
    * acpi-resources:
      ACPI: resources: Add DMI-based legacy IRQ override quirk
    rafaeljw committed Oct 5, 2021
  18. PCI: PM: Simplify acpi_pci_power_manageable()

    Make acpi_pci_power_manageable() more straightforward.
    
    No functional impact.
    
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Tested-by: Ferry Toth <fntoth@gmail.com>
    rafaeljw committed Oct 5, 2021
  19. PCI: PM: Drop struct pci_platform_pm_ops

    After previous changes there are no more users of struct
    pci_platform_pm_ops in the tree, so drop it along with all of the
    remaining related code.
    
    No functional impact.
    
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Tested-by: Ferry Toth <fntoth@gmail.com>
    rafaeljw committed Oct 5, 2021

Commits on Oct 3, 2021

  1. Linux 5.15-rc4

    torvalds committed Oct 3, 2021
  2. elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings

    In commit b212921 ("elf: don't use MAP_FIXED_NOREPLACE for elf
    executable mappings") we still leave MAP_FIXED_NOREPLACE in place for
    load_elf_interp.
    
    Unfortunately, this will cause the kernel to fail to start with:
    
        1 (init): Uhuuh, elf segment at 00003ffff7ffd000 requested but the memory is mapped already
        Failed to execute /init (error -17)
    
    The reason is that the elf interpreter (ld.so) has overlapping segments.
    
      readelf -l ld-2.31.so
      Program Headers:
        Type           Offset             VirtAddr           PhysAddr
                       FileSiz            MemSiz              Flags  Align
        LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                       0x000000000002c94c 0x000000000002c94c  R E    0x10000
        LOAD           0x000000000002dae0 0x000000000003dae0 0x000000000003dae0
                       0x00000000000021e8 0x0000000000002320  RW     0x10000
        LOAD           0x000000000002fe00 0x000000000003fe00 0x000000000003fe00
                       0x00000000000011ac 0x0000000000001328  RW     0x10000
    
    The reason for this problem is the same as described in commit
    ad55eac ("elf: enforce MAP_FIXED on overlaying elf segments").
    
    Not only executable binaries but also elf interpreters (e.g. ld.so) can
    have overlapping elf segments, so we had better drop
    MAP_FIXED_NOREPLACE and go back to MAP_FIXED in load_elf_interp.
    
    Fixes: 4ed2863 ("fs, elf: drop MAP_FIXED usage from elf_map")
    Cc: <stable@vger.kernel.org> # v4.19
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Chen Jingwen authored and torvalds committed Oct 3, 2021
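
The overlap can be checked by hand from the readelf output above. The sketch below rounds each LOAD segment out to page boundaries and tests whether two mappings intersect. It is a simplification: it uses MemSiz for the extent, the page size is a parameter because the commit does not state it, and the `demo_` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced view of a PT_LOAD program header. */
struct demo_segment {
	uint64_t vaddr;
	uint64_t memsz;
};

/* Round an address down to the start of its page (page_size: power of 2). */
uint64_t demo_page_start(uint64_t addr, uint64_t page_size)
{
	return addr & ~(page_size - 1);
}

/* Round an address up to the next page boundary. */
uint64_t demo_page_end(uint64_t addr, uint64_t page_size)
{
	return (addr + page_size - 1) & ~(page_size - 1);
}

/* True if the page-aligned mappings of the two segments share any page. */
bool demo_mappings_overlap(const struct demo_segment *a,
			   const struct demo_segment *b,
			   uint64_t page_size)
{
	uint64_t a_start = demo_page_start(a->vaddr, page_size);
	uint64_t a_end = demo_page_end(a->vaddr + a->memsz, page_size);
	uint64_t b_start = demo_page_start(b->vaddr, page_size);
	uint64_t b_end = demo_page_end(b->vaddr + b->memsz, page_size);

	return a_start < b_end && b_start < a_end;
}
```

For the second and third LOAD segments (0x3dae0 + 0x2320 and 0x3fe00 + 0x1328), the rounded ranges share a page with both 4 KiB and 64 KiB pages, which is why a mapping that refuses to replace an existing one fails with -EEXIST (error -17).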
  3. Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

    Pull ext4 fixes from Ted Ts'o:
     "Fix a number of ext4 bugs in fast_commit, inline data, and delayed
      allocation.
    
      Also fix error handling code paths in ext4_dx_readdir() and
      ext4_fill_super().
    
      Finally, avoid grabbing a journal head in the delayed allocation
      write in the common cases where we are overwriting a pre-existing
      block or appending to an inode"
    
    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
      ext4: recheck buffer uptodate bit under buffer lock
      ext4: fix potential infinite loop in ext4_dx_readdir()
      ext4: flush s_error_work before journal destroy in ext4_fill_super
      ext4: fix loff_t overflow in ext4_max_bitmap_size()
      ext4: fix reserved space counter leakage
      ext4: limit the number of blocks in one ADD_RANGE TLV
      ext4: enforce buffer head state assertion in ext4_da_map_blocks
      ext4: remove extent cache entries when truncating inline data
      ext4: drop unnecessary journal handle in delalloc write
      ext4: factor out write end code of inline file
      ext4: correct the error path of ext4_write_inline_data_end()
      ext4: check and update i_disksize properly
      ext4: add error checking to ext4_ext_replay_set_iblocks()
    torvalds committed Oct 3, 2021