Tony-Luck/x86-…
Commits on Oct 11, 2021
-
x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
SGX EPC pages do not have a "struct page" associated with them so the pfn_valid() sanity check fails and results in a warning message to the console. Add an additional check to skip the warning if the address of the error is in an SGX EPC page. Tested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
-
x86/sgx: Add hook to error injection address validation
SGX reserved memory does not appear in the standard address maps. Add hook to call into the SGX code to check if an address is located in SGX memory. There are other challenges in injecting errors into SGX. Update the documentation with a sequence of operations to inject. Tested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
-
x86/sgx: Hook arch_memory_failure() into mainline code
Add a call inside memory_failure() to call the arch specific code to check if the address is an SGX EPC page and handle it. Note the SGX EPC pages do not have a "struct page" entry, so the hook goes in at the same point as the device mapping hook. Pull the call to acquire the mutex earlier so the SGX errors are also protected. Make set_mce_nospec() skip SGX pages when trying to adjust the 1:1 map. Tested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
-
x86/sgx: Add SGX infrastructure to recover from poison
Provide a recovery function sgx_memory_failure(). If the poison was consumed synchronously then send a SIGBUS. Note that the virtual address of the access is not included with the SIGBUS as is the case for poison outside of SGX enclaves. This doesn't matter as addresses of code/data inside an enclave is of little to no use to code executing outside the (now dead) enclave. Poison found in a free page results in the page being moved from the free list to the poison page list. Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
-
x86/sgx: Initial poison handling for dirty and free pages
A memory controller patrol scrubber can report poison in a page that isn't currently being used. Add "poison" field in the sgx_epc_page that can be set for an sgx_epc_page. Check for it: 1) When sanitizing dirty pages 2) When freeing epc pages Poison is a new field separated from flags to avoid having to make all updates to flags atomic, or integrate poison state changes into some other locking scheme to protect flags. In both cases place the poisoned page on a list of poisoned epc pages to make sure it will not be reallocated. Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
-
x86/sgx: Add infrastructure to identify SGX EPC pages
X86 machine check architecture reports a physical address when there is a memory error. Handling that error requires a method to determine whether the physical address reported is in any of the areas reserved for EPC pages by BIOS. SGX EPC pages do not have Linux "struct page" associated with them. Keep track of the mapping from ranges of EPC pages to the sections that contain them using an xarray. Create a function arch_is_platform_page() that simply reports whether an address is an EPC page for use elsewhere in the kernel. The ACPI error injection code needs this function and is typically built as a module, so export it. Note that arch_is_platform_page() will be slower than other similar "what type is this page" functions that can simply check bits in the "struct page". If there is some future performance critical user of this function it may need to be implemented in a more efficient way. Note also that the current implementation of xarray allocates a few hundred kilobytes for this usage on a system with 4GB of SGX EPC memory configured. This isn't ideal, but worth it for the code simplicity. Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
-
x86/sgx: Add new sgx_epc_page flag bit to mark in-use pages
SGX EPC pages go through the following life cycle: DIRTY ---> FREE ---> IN-USE --\ ^ | \-----------------/ Recovery action for poison for a DIRTY or FREE page is simple. Just make sure never to allocate the page. IN-USE pages need some extra handling. Add a new flag bit SGX_EPC_PAGE_IN_USE that is set when a page is allocated and cleared when the page is freed. Notes: 1) These transitions are made while holding the node->lock so that future code that checks the flags while holding the node->lock can be sure that if the SGX_EPC_PAGE_IN_USE bit is set, then the page is on the free list. 2) Initially while the pages are on the dirty list the SGX_EPC_PAGE_IN_USE bit is set. Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
Commits on Oct 7, 2021
-
Merge branches 'thermal-int340x' and 'thermal-powerclamp' into linux-…
…next * thermal-int340x: thermal: int340x: delete bogus length check * thermal-powerclamp: thermal: intel_powerclamp: Use bitmap_zalloc/bitmap_free when applicable
-
Merge branch 'pnp' into linux-next
* pnp: PNP: system.c: unmark a comment as being kernel-doc
-
Merge branches 'acpica' and 'acpi-misc' into linux-next
* acpica: ACPICA: Update version to 20210930 ACPICA: iASL table disassembler: Added disassembly support for the NHLT ACPI table ACPICA: ACPI 6.4 SRAT: add Generic Port Affinity type ACPICA: Add support for Windows 2020 _OSI string ACPICA: Avoid evaluating methods too early during system resume * acpi-misc: ACPI: Update information in MAINTAINERS
-
Merge branch 'acpi-pci-fixes' into linux-next
* acpi-pci-fixes: PCI: ACPI: Check parent pointer in acpi_pci_find_companion()
-
PCI: ACPI: Check parent pointer in acpi_pci_find_companion()
If acpi_pci_find_companion() is called for a device whose parent pointer is NULL, it will crash when attempting to get the ACPI companion of the parent due to a NULL pointer dereference in the ACPI_COMPANION() macro. This was not a problem before commit 375553a ("PCI: Setup ACPI fwnode early and at the same time with OF") that made pci_setup_device() call pci_set_acpi_fwnode() and so it allowed devices with NULL parent pointers to be passed to acpi_pci_find_companion() which is the case in pci_iov_add_virtfn(), for instance. Fix this issue by making acpi_pci_find_companion() check the device's parent pointer upfront and bail out if it is NULL. While pci_iov_add_virtfn() can be changed to set the device's parent pointer before calling pci_setup_device() for it, checking pointers against NULL before dereferencing them is prudent anyway and looking for ACPI companions of virtual functions isn't really useful. Fixes: 375553a ("PCI: Setup ACPI fwnode early and at the same time with OF") Link: https://lore.kernel.org/linux-acpi/8e4bbd5c59de31db71f718556654c0aa077df03d.camel@linux.ibm.com/ Reported-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Commits on Oct 6, 2021
-
Merge branch 'pm-pci' into linux-next
* pm-pci: PCI: PM: Do not call platform_pci_power_manageable() unnecessarily PCI: PM: Make pci_choose_state() call pci_target_state() PCI: PM: Rearrange pci_target_state()
Commits on Oct 5, 2021
-
thermal: int340x: delete bogus length check
This check has a signedness bug and does not work. If "length" is larger than "PAGE_SIZE" then "PAGE_SIZE - length" is not negative but instead it is a large unsigned value. Fortunately, Takashi Iwai changed this code to use scnprint() instead of snprintf() so now "length" is never larger than "PAGE_SIZE - 1" and the check can be removed. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
thermal: intel_powerclamp: Use bitmap_zalloc/bitmap_free when applicable
'cpu_clamping_mask' is a bitmap. So use 'bitmap_zalloc()' and 'bitmap_free()' to simplify code, improve the semantic of the code and avoid some open-coded arithmetic in allocator arguments. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
PNP: system.c: unmark a comment as being kernel-doc
Fix a documentation build warning caused by the comment not being in kernel-doc format: system.c:110: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Reserve motherboard resources after PCI claim BARs, Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
ACPICA: Update version to 20210930
ACPICA commit e01cc6b3d12b5f73f44d46fa15a7f569c793b328 Version 20210930. Link: acpica/acpica@e01cc6b3 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
ACPICA: iASL table disassembler: Added disassembly support for the NH…
…LT ACPI table ACPICA commit 94abe858583de24a425b37cb8e62d56c65c4f3cf Note: support for Vendor-defined microphone arrays and SNR extensions are not supported at this time -- mostly due to a lack of example tables. Actual compiler support for NHLT is forthcoming. Link: acpica/acpica@94abe858 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
ACPICA: ACPI 6.4 SRAT: add Generic Port Affinity type
ACPICA commit 777e11b73e60f0eb606cf20142ef634702b09ba1 Add a new subtable type for SRAT Generic Port Affinity. It uses the same subtable structure as the existing Generic Initiator Affinity type. Link: acpica/acpica@777e11b7 Signed-off-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
ACPICA: Add support for Windows 2020 _OSI string
ACPICA commit 2dc55de56d2deac30af0b484dd1d65607eb33a9c Link: https://github.com/microsoft_docs/windows-driver-docs/commit/5164e24985e78ef4870d7a5801a5336104f36366 Link: acpica/acpica@2dc55de5 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
ACPICA: Avoid evaluating methods too early during system resume
ACPICA commit 0762982923f95eb652cf7ded27356b247c9774de During wakeup from system-wide sleep states, acpi_get_sleep_type_data() is called and it tries to get memory from the slab allocator in order to evaluate a control method, but if KFENCE is enabled in the kernel, the memory allocation attempt causes an IRQ work to be queued and a self-IPI to be sent to the CPU running the code which requires the memory controller to be ready, so if that happens too early in the wakeup path, it doesn't work. Prevent that from taking place by calling acpi_get_sleep_type_data() for S0 upfront, when preparing to enter a given sleep state, and saving the data obtained by it for later use during system wakeup. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=214271 Reported-by: Reik Keutterling <spielkind@gmail.com> Tested-by: Reik Keutterling <spielkind@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
ACPI: Update information in MAINTAINERS
Because Rui is now going to focus on work that is not related to the maintenance of kernel code, drop the MAINTAINERS records for the ACPI fan and video drivers that will be maintained by Rafael along with the rest of the ACPI subsystem. While at it, change the information regarding the Len Brown's role in the ACPI subsystem to "reviewer" to reflect the current status. Signed-off-by: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Len Brown <len.brown@intel.com>
-
PCI: PM: Do not call platform_pci_power_manageable() unnecessarily
Drop two invocations of platform_pci_power_manageable() that are not necessary, because the functions called when it returns 'true' do the requisite "power manageable" checks themselves. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Ferry Toth <fntoth@gmail.com>
-
PCI: PM: Make pci_choose_state() call pci_target_state()
The pci_choose_state() and pci_target_state() implementations are somewhat divergent without a good reason, because they are used for similar purposes. Change the pci_choose_state() implementation to use pci_target_state() internally except for transitions to the working state of the system in which case it is expected to return D0. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Ferry Toth <fntoth@gmail.com>
-
PCI: PM: Rearrange pci_target_state()
Make pci_target_state() return D3cold or D0 without checking PME support if the current power state of the device is D3cold or if it does not support the standard PCI PM, respectively. Next, drop the tergat_state local variable that has become redundant from it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Ferry Toth <fntoth@gmail.com>
-
Merge branch 'pm-cpufreq' into linux-next
* pm-cpufreq: cpufreq: intel_pstate: Process HWP Guaranteed change notification
-
cpufreq: intel_pstate: Process HWP Guaranteed change notification
It is possible that HWP guaranteed ratio is changed in response to change in power and thermal limits. For example when Intel Speed Select performance profile is changed or there is change in TDP, hardware can send notifications. It is possible that the guaranteed ratio is increased. This creates an issue when turbo is disabled, as the old limits set in MSR_HWP_REQUEST are still lower and hardware will clip to older limits. This change enables HWP interrupt and process HWP interrupts. When guaranteed is changed, calls cpufreq_update_policy() so that driver callbacks are called to update to new HWP limits. This callback is called from a delayed workqueue of 10ms to avoid frequent updates. Although the scope of IA32_HWP_INTERRUPT is per logical cpu, on some plaforms interrupt is generated on all CPUs. This is particularly a problem during initialization, when the driver didn't allocated data for other CPUs. So this change uses a cpumask of enabled CPUs and process interrupts on those CPUs only. When the cpufreq offline() or suspend() callback is called, HWP interrupt is disabled on those CPUs and also cancels any pending work item. Spin lock is used to protect data and processing shared with interrupt handler. Here READ_ONCE(), WRITE_ONCE() macros are used to designate shared data, even though spin lock act as an optimization barrier here. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Tested-by: pablomh@gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
Merge branches 'pm-sleep', 'pm-pci' and 'pm-cpuidle' into linux-next
* pm-sleep: PM: hibernate: Remove blk_status_to_errno in hib_wait_io PM: sleep: Do not assume that "mem" is always present * pm-pci: PCI: PM: Simplify acpi_pci_power_manageable() PCI: PM: Drop struct pci_platform_pm_ops PCI: ACPI: PM: Do not use pci_platform_pm_ops for ACPI PCI: PM: Do not use pci_platform_pm_ops for Intel MID PM * pm-cpuidle: cpuidle: Fix kobject memory leaks in error paths intel_idle: enable interrupts before C1 on Xeons
-
Merge branches 'acpi-pci', 'acpi-pnp', 'acpi-docs', 'acpi-misc' and '…
…acpi-processor' into linux-next * acpi-pci: ACPI: glue: Look for ACPI bus type only if ACPI companion is not known ACPI: glue: Drop cleanup callback from struct acpi_bus_type PCI: ACPI: Drop acpi_pci_bus * acpi-pnp: ACPI: PNP: remove duplicated BRI0A49 and BDP3336 entries * acpi-docs: Documentation: ACPI: Fix spelling mistake "Millenium" -> "Millennium" * acpi-misc: ACPI: Kconfig: Fix a typo in Kconfig * acpi-processor: ACPI: processor idle: Allow playing dead in C3 state
-
Merge branches 'acpi-x86' and 'acpi-resources' into linux-next
* acpi-x86: x86: ACPI: cstate: Optimize C3 entry on AMD CPUs x86/ACPI: Don't add CPUs that are not online capable ACPICA: Add support for MADT online enabled bit * acpi-resources: ACPI: resources: Add DMI-based legacy IRQ override quirk
-
PCI: PM: Simplify acpi_pci_power_manageable()
Make acpi_pci_power_manageable() more straightforward. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Ferry Toth <fntoth@gmail.com>
-
PCI: PM: Drop struct pci_platform_pm_ops
After previous changes there are no more users of struct pci_platform_pm_ops in the tree, so drop it along with all of the remaining related code. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Ferry Toth <fntoth@gmail.com>
Commits on Oct 3, 2021
-
-
elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings
In commit b212921 ("elf: don't use MAP_FIXED_NOREPLACE for elf executable mappings") we still leave MAP_FIXED_NOREPLACE in place for load_elf_interp. Unfortunately, this will cause kernel to fail to start with: 1 (init): Uhuuh, elf segment at 00003ffff7ffd000 requested but the memory is mapped already Failed to execute /init (error -17) The reason is that the elf interpreter (ld.so) has overlapping segments. readelf -l ld-2.31.so Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x000000000002c94c 0x000000000002c94c R E 0x10000 LOAD 0x000000000002dae0 0x000000000003dae0 0x000000000003dae0 0x00000000000021e8 0x0000000000002320 RW 0x10000 LOAD 0x000000000002fe00 0x000000000003fe00 0x000000000003fe00 0x00000000000011ac 0x0000000000001328 RW 0x10000 The reason for this problem is the same as described in commit ad55eac ("elf: enforce MAP_FIXED on overlaying elf segments"). Not only executable binaries, elf interpreters (e.g. ld.so) can have overlapping elf segments, so we better drop MAP_FIXED_NOREPLACE and go back to MAP_FIXED in load_elf_interp. Fixes: 4ed2863 ("fs, elf: drop MAP_FIXED usage from elf_map") Cc: <stable@vger.kernel.org> # v4.19 Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/lin…
…ux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix a number of ext4 bugs in fast_commit, inline data, and delayed allocation. Also fix error handling code paths in ext4_dx_readdir() and ext4_fill_super(). Finally, avoid a grabbing a journal head in the delayed allocation write in the common cases where we are overwriting a pre-existing block or appending to an inode" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: recheck buffer uptodate bit under buffer lock ext4: fix potential infinite loop in ext4_dx_readdir() ext4: flush s_error_work before journal destroy in ext4_fill_super ext4: fix loff_t overflow in ext4_max_bitmap_size() ext4: fix reserved space counter leakage ext4: limit the number of blocks in one ADD_RANGE TLV ext4: enforce buffer head state assertion in ext4_da_map_blocks ext4: remove extent cache entries when truncating inline data ext4: drop unnecessary journal handle in delalloc write ext4: factor out write end code of inline file ext4: correct the error path of ext4_write_inline_data_end() ext4: check and update i_disksize properly ext4: add error checking to ext4_ext_replay_set_iblocks()