Anshuman-Khand…
Commits on Jan 24, 2022
-
perf: Capture branch privilege information
Platforms like arm64 could capture privilege level information for all the branch records. Hence this adds a new element in the struct branch_entry to record the privilege level information, which could be requested through a new event.attr.branch_sample_type flag PERF_SAMPLE_BRANCH_PRIV_SAVE. While here, update the BRBE driver as required. Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-perf-users@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
-
perf: Expand perf_branch_entry.type
The current perf_branch_entry.type is a 4-bit field, just enough to accommodate 16 generic branch types. This is insufficient for platforms like arm64, which have many more branch types. Expand this field to 6 bits, so that it can hold 64 generic branch types. This also adds more generic branch types and updates the BRBE driver as required. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-perf-users@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
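The capacity argument can be checked with a small userspace sketch. The two structs below are hypothetical stand-ins, not the kernel's struct perf_branch_entry; only the 4-bit and 6-bit widths come from the commit message.

```c
#include <assert.h>

/* Hypothetical stand-ins; only the field widths are taken from the text. */
struct sketch_entry_type4 { unsigned int type : 4; };
struct sketch_entry_type6 { unsigned int type : 6; };

/* Number of distinct values an n-bit field can hold. */
static inline unsigned int sketch_type_capacity(unsigned int bits)
{
	return 1u << bits;
}

/* Store a branch type in the 4-bit layout and read it back. */
static inline unsigned int sketch_roundtrip4(unsigned int t)
{
	struct sketch_entry_type4 e = { 0 };

	e.type = t;	/* values above 15 lose their high bits here */
	return e.type;
}

/* Same round trip through the 6-bit layout. */
static inline unsigned int sketch_roundtrip6(unsigned int t)
{
	struct sketch_entry_type6 e = { 0 };

	e.type = t;	/* values up to 63 survive */
	return e.type;
}
```

A type number such as 20 no longer fits in the narrow field but survives in the wide one, which is the motivation the message gives for the change.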
-
perf: Add more generic branch types
This expands the generic branch type classification by adding some more entries that can still be represented with the existing 4-bit 'type' field. While here, this also updates the x86 implementation with these new branch types. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-perf-users@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
-
arm64/perf: Enable branch stack sampling
Now that all the required pieces are already in place, just enable the perf branch stack sampling support on arm64 platform, by removing the gate which blocks it in armpmu_event_init(). Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
-
This adds a BRBE driver which implements all the required helper functions for struct arm_pmu. The following functions are defined by this driver; they configure, enable, capture, reset and disable the BRBE buffer HW as and when requested via the perf branch stack sampling framework.
  - arm64_pmu_brbe_filter()
  - arm64_pmu_brbe_enable()
  - arm64_pmu_brbe_disable()
  - arm64_pmu_brbe_read()
  - arm64_pmu_brbe_probe()
  - arm64_pmu_brbe_reset()
  - arm64_pmu_brbe_supported()
Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-perf-users@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
-
arm64/perf: Drive BRBE from perf event states
Branch stack sampling rides along the normal perf event and all the branch records get captured during the PMU interrupt. This just changes perf event handling on the arm64 platform to accommodate required BRBE operations that will enable branch stack sampling support. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: linux-perf-users@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
-
arm64/perf: Detect support for BRBE
CPU-specific BRBE entries, cycle count and format support get detected during PMU init. This information gets saved in the per-cpu struct pmu_hw_events, which later helps in operating BRBE during a perf event context. Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
-
arm64/perf: Update struct pmu_hw_events for BRBE
BRBE-related context and data for a single perf event instance will be tracked in struct pmu_hw_events. Hence update the structure to accommodate the required BRBE details. Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
-
arm64/perf: Update struct arm_pmu for BRBE
This updates struct arm_pmu to include all required helpers that will drive BRBE functionality for a given PMU implementation. These are the following.
  - brbe_filter   : Convert perf event filters into BRBE HW filters
  - brbe_probe    : Probe BRBE HW and capture its attributes
  - brbe_enable   : Enable BRBE HW with a given config
  - brbe_disable  : Disable BRBE HW
  - brbe_read     : Read BRBE buffer for captured branch records
  - brbe_reset    : Reset BRBE buffer
  - brbe_supported: Whether BRBE is supported or not
A BRBE driver implementation needs to provide these functionalities. Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-perf-users@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
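A helper table of this kind is usually a struct of function pointers that a backend fills in. The sketch below is a userspace illustration only: the member names follow the helper list in the message, while the signatures, the `sketch_`/`toy_` names and the subset of helpers shown are assumptions, not the actual struct arm_pmu layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical ops table. Member names follow the commit message;
 * the signatures are assumptions made for this sketch.
 */
struct sketch_brbe_ops {
	bool (*brbe_supported)(void);
	void (*brbe_enable)(unsigned long config);
	void (*brbe_disable)(void);
	void (*brbe_reset)(void);
};

/* Toy backend that only tracks state in memory. */
static bool toy_enabled;
static unsigned long toy_config;

static bool toy_supported(void) { return true; }
static void toy_enable(unsigned long config) { toy_config = config; toy_enabled = true; }
static void toy_disable(void) { toy_enabled = false; }
static void toy_reset(void) { toy_config = 0; }

static const struct sketch_brbe_ops toy_ops = {
	.brbe_supported	= toy_supported,
	.brbe_enable	= toy_enable,
	.brbe_disable	= toy_disable,
	.brbe_reset	= toy_reset,
};

/* Caller side: only touch the buffer when the backend reports support. */
static inline bool sketch_start_sampling(const struct sketch_brbe_ops *ops,
					 unsigned long config)
{
	if (!ops || !ops->brbe_supported || !ops->brbe_supported())
		return false;

	ops->brbe_reset();
	ops->brbe_enable(config);
	return true;
}
```

Keeping the callbacks behind one table lets generic PMU code stay unaware of how a particular implementation programs its buffer.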
-
arm64/perf: Add register definitions for BRBE
This adds BRBE-related register definitions and the associated field macros. These will be used subsequently in a BRBE driver which is being added later on. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
-
perf: Consolidate branch sample filter helpers
Besides the branch type filtering requests, 'event.attr.branch_sample_type' also contains various flags indicating which additional information should be captured, along with the base branch record. These flags help configure the underlying hardware, and capture the branch records appropriately when required e.g after PMU interrupt. But first, this moves an existing helper perf_sample_save_hw_index() into the header before adding some more helpers for other branch sample filter flags. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: linux-perf-users@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
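Helpers of the kind described here are typically one-line predicates over the flag word. The following is a minimal userspace sketch: the `sketch_` types and the bit positions are stand-ins chosen for illustration, and the real flag values live in the perf UAPI header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit positions for this sketch, not the UAPI values. */
#define SKETCH_BRANCH_HW_INDEX	(1ULL << 0)
#define SKETCH_BRANCH_NO_CYCLES	(1ULL << 1)

/* Cut-down stand-ins for the event and its attributes. */
struct sketch_attr { uint64_t branch_sample_type; };
struct sketch_event { struct sketch_attr attr; };

/* True when the hardware index should be saved with the branch stack. */
static inline bool sketch_save_hw_index(const struct sketch_event *event)
{
	return (event->attr.branch_sample_type & SKETCH_BRANCH_HW_INDEX) != 0;
}

/* True unless the user asked for cycle counts to be left out. */
static inline bool sketch_save_cycles(const struct sketch_event *event)
{
	return (event->attr.branch_sample_type & SKETCH_BRANCH_NO_CYCLES) == 0;
}
```

Placing such predicates in a shared header lets each PMU driver ask the same question in the same way instead of open-coding the mask test.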
Commits on Jan 20, 2022
-
arm64: mm: apply __ro_after_init to memory_limit
This variable is only set during initialization, so mark with __ro_after_init. Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20211215064559.2843555-1-peng.fan@oss.nxp.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
arm64: atomics: lse: Dereference matching size
When building with -Warray-bounds, the following warning is generated:
  In file included from ./arch/arm64/include/asm/lse.h:16,
                   from ./arch/arm64/include/asm/cmpxchg.h:14,
                   from ./arch/arm64/include/asm/atomic.h:16,
                   from ./include/linux/atomic.h:7,
                   from ./include/asm-generic/bitops/atomic.h:5,
                   from ./arch/arm64/include/asm/bitops.h:25,
                   from ./include/linux/bitops.h:33,
                   from ./include/linux/kernel.h:22,
                   from kernel/printk/printk.c:22:
  ./arch/arm64/include/asm/atomic_lse.h:247:9: warning: array subscript 'long unsigned int[0]' is partly outside array bounds of 'atomic_t[1]' [-Warray-bounds]
    247 | asm volatile( \
        | ^~~
  ./arch/arm64/include/asm/atomic_lse.h:266:1: note: in expansion of macro '__CMPXCHG_CASE'
    266 | __CMPXCHG_CASE(w, , acq_, 32, a, "memory")
        | ^~~~~~~~~~~~~~
  kernel/printk/printk.c:3606:17: note: while referencing 'printk_cpulock_owner'
   3606 | static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1);
        | ^~~~~~~~~~~~~~~~~~~~
This is due to the compiler seeing an unsigned long * cast against something (atomic_t) that is int sized. Replace the cast with the matching size cast. This results in no change in binary output. Note that __ll_sc__cmpxchg_case_##name##sz already uses the same constraint: [v] "+Q" (*(u##sz *)ptr Which is why only the LSE form needs updating and not the LL/SC form, so this change is unlikely to be problematic. Cc: Will Deacon <will@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: linux-arm-kernel@lists.infradead.org Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220112202259.3950286-1-keescook@chromium.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
asm-generic: Add missing brackets for io_stop_wc macro
After using io_stop_wc(), drivers report the following compile error when compiled on X86. drivers/net/ethernet/hisilicon/hns3/hns3_enet.c: In function ‘hns3_tx_push_bd’: drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:2058:12: error: expected ‘;’ before ‘(’ token io_stop_wc(); ^ This is because I forgot to add the brackets after the io_stop_wc macro. So let's add the missing brackets. Fixes: d5624bb ("asm-generic: introduce io_stop_wc() and add implementation for ARM64") Reported-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Link: https://lore.kernel.org/r/20220114105857.126300-1-wangxiongfeng2@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
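The error follows from how the preprocessor treats the two macro forms. If the fallback is an object-like macro (no parentheses after the name), the call `io_stop_wc();` expands to `do { } while (0)();`, which produces the "expected ';' before '(' token" error quoted in the message. A minimal sketch of the function-like form, with a hypothetical caller named `sketch_push_descriptor`:

```c
#include <assert.h>

/*
 * No-op fallback in the style the commit describes. The "()" makes this a
 * function-like macro, so "io_stop_wc();" expands to a complete statement.
 */
#ifndef io_stop_wc
#define io_stop_wc() do { } while (0)
#endif

/* Hypothetical caller: write a value, then stop write combining. */
static inline int sketch_push_descriptor(int *slot, int value)
{
	*slot = value;
	io_stop_wc();	/* expands to: do { } while (0); */
	return *slot;
}
```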
Commits on Jan 5, 2022
-
Merge branches 'for-next/misc', 'for-next/cache-ops-dzp', 'for-next/s…
…tacktrace', 'for-next/xor-neon', 'for-next/kasan', 'for-next/armv8_7-fp', 'for-next/atomics', 'for-next/bti', 'for-next/sve', 'for-next/kselftest' and 'for-next/kcsan', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf: (32 commits)
  arm64: perf: Don't register user access sysctl handler multiple times
  drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
  perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
  arm64: perf: Support new DT compatibles
  arm64: perf: Simplify registration boilerplate
  arm64: perf: Support Denver and Carmel PMUs
  drivers/perf: hisi: Add driver for HiSilicon PCIe PMU
  docs: perf: Add description for HiSilicon PCIe PMU driver
  dt-bindings: perf: Add YAML schemas for Marvell CN10K LLC-TAD pmu bindings
  drivers: perf: Add LLC-TAD perf counter support
  perf/smmuv3: Synthesize IIDR from CoreSight ID registers
  perf/smmuv3: Add devicetree support
  dt-bindings: Add Arm SMMUv3 PMCG binding
  perf/arm-cmn: Add debugfs topology info
  perf/arm-cmn: Add CI-700 Support
  dt-bindings: perf: arm-cmn: Add CI-700
  perf/arm-cmn: Support new IP features
  perf/arm-cmn: Demarcate CMN-600 specifics
  perf/arm-cmn: Move group validation data off-stack
  perf/arm-cmn: Optimise DTC counter accesses
  ...
* for-next/misc:
  : Miscellaneous patches
  arm64: Use correct method to calculate nomap region boundaries
  arm64: Drop outdated links in comments
  arm64: errata: Fix exec handling in erratum 1418040 workaround
  arm64: Unhash early pointer print plus improve comment
  asm-generic: introduce io_stop_wc() and add implementation for ARM64
  arm64: remove __dma_*_area() aliases
  docs/arm64: delete a space from tagged-address-abi
  arm64/fp: Add comments documenting the usage of state restore functions
  arm64: mm: Use asid feature macro for cheanup
  arm64: mm: Rename asid2idx() to ctxid2asid()
  arm64: kexec: reduce calls to page_address()
  arm64: extable: remove unused ex_handler_t definition
  arm64: entry: Use SDEI event constants
  arm64: Simplify checking for populated DT
  arm64/kvm: Fix bitrotted comment for SVE handling in handle_exit.c
* for-next/cache-ops-dzp:
  : Avoid DC instructions when DCZID_EL0.DZP == 1
  arm64: mte: DC {GVA,GZVA} shouldn't be used when DCZID_EL0.DZP == 1
  arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1
* for-next/stacktrace:
  : Unify the arm64 unwind code
  arm64: Make some stacktrace functions private
  arm64: Make dump_backtrace() use arch_stack_walk()
  arm64: Make profile_pc() use arch_stack_walk()
  arm64: Make return_address() use arch_stack_walk()
  arm64: Make __get_wchan() use arch_stack_walk()
  arm64: Make perf_callchain_kernel() use arch_stack_walk()
  arm64: Mark __switch_to() as __sched
  arm64: Add comment for stack_info::kr_cur
  arch: Make ARCH_STACKWALK independent of STACKTRACE
* for-next/xor-neon:
  : Use SHA3 instructions to speed up XOR
  arm64/xor: use EOR3 instructions when available
* for-next/kasan:
  : Log potential KASAN shadow aliases
  arm64: mm: log potential KASAN shadow alias
  arm64: mm: use die_kernel_fault() in do_mem_abort()
* for-next/armv8_7-fp:
  : Add HWCAPS for ARMv8.7 FEAT_AFP and FEAT_RPRES
  arm64: cpufeature: add HWCAP for FEAT_RPRES
  arm64: add ID_AA64ISAR2_EL1 sys register
  arm64: cpufeature: add HWCAP for FEAT_AFP
* for-next/atomics:
  : arm64 atomics clean-ups and codegen improvements
  arm64: atomics: lse: define RETURN ops in terms of FETCH ops
  arm64: atomics: lse: improve constraints for simple ops
  arm64: atomics: lse: define ANDs in terms of ANDNOTs
  arm64: atomics lse: define SUBs in terms of ADDs
  arm64: atomics: format whitespace consistently
* for-next/bti:
  : BTI clean-ups
  arm64: Ensure that the 'bti' macro is defined where linkage.h is included
  arm64: Use BTI C directly and unconditionally
  arm64: Unconditionally override SYM_FUNC macros
  arm64: Add macro version of the BTI instruction
  arm64: ftrace: add missing BTIs
  arm64: kexec: use __pa_symbol(empty_zero_page)
  arm64: update PAC description for kernel
* for-next/sve:
  : SVE code clean-ups and refactoring in preparation of Scalable Matrix Extensions
  arm64/sve: Minor clarification of ABI documentation
  arm64/sve: Generalise vector length configuration prctl() for SME
  arm64/sve: Make sysctl interface for SVE reusable by SME
* for-next/kselftest:
  : arm64 kselftest additions
  kselftest/arm64: Add pidbench for floating point syscall cases
  kselftest/arm64: Add a test program to exercise the syscall ABI
  kselftest/arm64: Allow signal tests to trigger from a function
  kselftest/arm64: Parameterise ptrace vector length information
* for-next/kcsan:
  : Enable KCSAN for arm64
  arm64: Enable KCSAN
-
arm64: Use correct method to calculate nomap region boundaries
Nomap regions are treated as "reserved". When region boundaries are not page aligned, we usually increase the "reserved" regions rather than decrease them. So, we should use memblock_region_reserved_base_pfn()/memblock_region_reserved_end_pfn() instead of memblock_region_memory_base_pfn()/memblock_region_memory_end_pfn() to calculate boundaries. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Link: https://lore.kernel.org/r/20211022070646.41923-1-chenhuacai@loongson.cn Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
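The difference between the two helper families is only the rounding direction at each end. The sketch below reimplements both in userspace to make that visible; the `sketch_` names and the 4 KiB page size are assumptions for illustration, and the region values are made up.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE	(1ULL << SKETCH_PAGE_SHIFT)

struct sketch_region { uint64_t base, size; };

/* Round an address down or up to a page frame number. */
static inline uint64_t sketch_pfn_down(uint64_t addr)
{
	return addr >> SKETCH_PAGE_SHIFT;
}

static inline uint64_t sketch_pfn_up(uint64_t addr)
{
	return (addr + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* "Memory" view: shrink to the pages that are fully covered. */
static inline uint64_t sketch_memory_base_pfn(const struct sketch_region *r)
{
	return sketch_pfn_up(r->base);
}

static inline uint64_t sketch_memory_end_pfn(const struct sketch_region *r)
{
	return sketch_pfn_down(r->base + r->size);
}

/* "Reserved" view: grow to every page that is touched at all. */
static inline uint64_t sketch_reserved_base_pfn(const struct sketch_region *r)
{
	return sketch_pfn_down(r->base);
}

static inline uint64_t sketch_reserved_end_pfn(const struct sketch_region *r)
{
	return sketch_pfn_up(r->base + r->size);
}
```

For a region covering 0x1800 to 0x3800, the memory view yields PFNs [2, 3) while the reserved view yields [1, 4), so the partially covered pages at both ends stay inside the nomap range.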
-
arm64: Drop outdated links in comments
As started by commit 05a5f51 ("Documentation: Replace lkml.org links with lore"), an effort was made to replace lkml.org links with lore to better use a single source that's more likely to stay available long-term. However, it seems these links don't offer much value here, so just remove them entirely. Cc: Joe Perches <joe@perches.com> Suggested-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/lkml/20210211100213.GA29813@willie-the-truck/ Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20211215191835.1420010-1-keescook@chromium.org [catalin.marinas@arm.com: removed the arch/arm changes] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commits on Jan 4, 2022
-
arm64: perf: Don't register user access sysctl handler multiple times
Commit e201260 ("arm64: perf: Add userspace counter access disable switch") introduced a new 'perf_user_access' sysctl file to enable and disable direct userspace access to the PMU counters. Sadly, Geert reports that on his big.LITTLE SoC ('Renesas Salvator-XS w/ R-Car H3'), the file is created for each PMU type probed, resulting in a splat during boot:
  | hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available
  | sysctl duplicate entry: /kernel//perf_user_access
  | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-rc3-arm64-renesas-00003-ge2012600810c #1420
  | Hardware name: Renesas Salvator-X 2nd version board based on r8a77951 (DT)
  | Call trace:
  | dump_backtrace+0x0/0x190
  | show_stack+0x14/0x20
  | dump_stack_lvl+0x88/0xb0
  | dump_stack+0x14/0x2c
  | __register_sysctl_table+0x384/0x818
  | register_sysctl+0x20/0x28
  | armv8_pmu_init.constprop.0+0x118/0x150
  | armv8_a57_pmu_init+0x1c/0x28
  | arm_pmu_device_probe+0x1b4/0x558
  | armv8_pmu_device_probe+0x18/0x20
  | platform_probe+0x64/0xd0
  | hw perfevents: enabled with armv8_cortex_a57 PMU driver, 7 counters available
Introduce a state variable to track creation of the sysctl file and ensure that it is only created once. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: e201260 ("arm64: perf: Add userspace counter access disable switch") Link: https://lore.kernel.org/r/CAMuHMdVcDxR9sGzc5pcnORiotonERBgc6dsXZXMd6wTvLGA9iw@mail.gmail.com Signed-off-by: Will Deacon <will@kernel.org>
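The "state variable" approach can be sketched in a few lines of userspace C. All names here are hypothetical stand-ins, the registration step is simulated with a counter, and the sketch ignores concurrency between probes.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how many times the simulated registration actually ran. */
static int sketch_registrations;

/* Hypothetical stand-in for creating the sysctl file. */
static void sketch_register_sysctl(void)
{
	sketch_registrations++;
}

/* Called once per probed PMU type; the flag limits registration to one call. */
static inline void sketch_pmu_init(void)
{
	static bool sysctl_registered;

	if (sysctl_registered)
		return;

	sketch_register_sysctl();
	sysctl_registered = true;
}
```

On a system with two PMU types the init path runs twice, but the file is created only on the first pass.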
-
drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
The devm_ioremap() function does not return error pointers. It returns NULL. Fixes: 036a758 ("drivers: perf: Add LLC-TAD perf counter support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20211217145907.GA16611@kili Signed-off-by: Will Deacon <will@kernel.org>
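The bug class is easy to reproduce outside the kernel: an error-pointer test only matches addresses in the last 4095 bytes of the address space, so it never fires for NULL. The sketch below uses a userspace copy of that test and a hypothetical mapping helper, `sketch_map()`, which reports failure with NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Userspace copy of the error-pointer test: true only for the top 4095 addresses. */
static inline bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical mapping helper that reports failure with NULL. */
static inline void *sketch_map(bool fail)
{
	static char window[16];

	return fail ? NULL : window;
}

/* Correct check for a NULL-returning helper. */
static inline int sketch_probe(bool fail)
{
	void *base = sketch_map(fail);

	if (!base)		/* sketch_is_err(base) would miss this case */
		return -ENOMEM;
	return 0;
}
```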
-
perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
The kbuild robot reports that building the SMMUv3 PMU driver with CONFIG_OF=n results in a warning for W=1 builds: >> drivers/perf/arm_smmuv3_pmu.c:889:34: warning: unused variable 'smmu_pmu_of_match' [-Wunused-const-variable] static const struct of_device_id smmu_pmu_of_match[] = { ^ Guard the match table with #ifdef CONFIG_OF. Link: https://lore.kernel.org/r/202201041700.01KZEzhb-lkp@intel.com Fixes: 3f7be43 ("perf/smmuv3: Add devicetree support") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Will Deacon <will@kernel.org>
Commits on Dec 22, 2021
-
arm64: errata: Fix exec handling in erratum 1418040 workaround
The erratum 1418040 workaround enables CNTVCT_EL1 access trapping in EL0 when executing compat threads. The workaround is applied when switching between tasks, but the need for the workaround could also change at an exec(), when a non-compat task execs a compat binary or vice versa. Apply the workaround in arch_setup_new_exec(). This leaves a small window of time between SET_PERSONALITY and arch_setup_new_exec where preemption could occur and confuse the old workaround logic that compares TIF_32BIT between prev and next. Instead, we can just read cntkctl to make sure it's in the state that the next task needs. I measured cntkctl read time to be about the same as a mov from a general-purpose register on N1. Update the workaround logic to examine the current value of cntkctl instead of the previous task's compat state. Fixes: d49f7d7 ("arm64: Move handling of erratum 1418040 into C code") Cc: <stable@vger.kernel.org> # 5.9.x Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211220234114.3926-1-scott@os.amperecomputing.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
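The idea of deciding from the current register value rather than from the previous task's state can be sketched with a simulated register. Everything below is an illustration under assumptions: the register is a plain variable, the bit position is a stand-in, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_VCT_ACCESS_EN	(1u << 1)	/* stand-in bit position */

static uint32_t sketch_cntkctl = SKETCH_VCT_ACCESS_EN;	/* simulated register */
static int sketch_writes;				/* counts simulated writes */

/* Read-modify-write that skips the write when nothing would change. */
static inline void sketch_clear_set(uint32_t clear, uint32_t set)
{
	uint32_t old = sketch_cntkctl;
	uint32_t val = (old & ~clear) | set;

	if (val != old) {
		sketch_cntkctl = val;
		sketch_writes++;
	}
}

/* Trap EL0 counter reads for compat tasks, allow them otherwise. */
static inline void sketch_apply_workaround(bool next_is_compat)
{
	if (next_is_compat)
		sketch_clear_set(SKETCH_VCT_ACCESS_EN, 0);
	else
		sketch_clear_set(0, SKETCH_VCT_ACCESS_EN);
}
```

Because the decision depends only on what the next task needs and what the register currently holds, it gives the same result whether it runs at a task switch or at exec time.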
-
arm64: Unhash early pointer print plus improve comment
When facing a really early issue in DT parsing, we currently print a message that shows both the physical and virtual address of the FDT. The printk pointer modifier for the virtual address shows a hashed address there unless the user provides the "no_hash_pointers" parameter on the command line. The situation in which this message shows up is a bit more serious though: the boot process is broken and nothing can be done (even an oops is too much for this early stage), so we have this message as a last resort to help debug bootloader issues, for example. Hence, change that to "%px" in order to make debugging easy; there's not much information leak risk in such an early boot failure. Also, improve the commenting on that function a bit, given that if the kernel fails there, it just hangs forever in a cpu_relax() loop. The reason we cannot BUG/panic is that it is too early to do so; thanks to Mark Brown for pointing that out on IRC and thanks to Robin Murphy for the good pointer hash discussion on the mailing list. Cc: Mark Brown <broonie@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20211221155230.1532850-1-gpiccoli@igalia.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
asm-generic: introduce io_stop_wc() and add implementation for ARM64
For memory accesses with write-combining attributes (e.g. those returned by ioremap_wc()), the CPU may wait for prior accesses to be merged with subsequent ones. But in some situations, such a wait is bad for performance. We introduce io_stop_wc() to prevent the merging of write-combining memory accesses before this macro with those after it. We add an implementation for ARM64 using the DGH instruction and provide a NOP implementation for other architectures. Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Suggested-by: Will Deacon <will@kernel.org> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20211221035556.60346-1-wangxiongfeng2@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commits on Dec 17, 2021
-
arm64: Ensure that the 'bti' macro is defined where linkage.h is incl…
…uded Not all .S files include asm/assembler.h, however the SYM_FUNC_* definitions invoke the 'bti' macro. Include asm/assembler.h in asm/linkage.h. Fixes: 9be34be ("arm64: Add macro version of the BTI instruction") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commits on Dec 15, 2021
-
arm64: remove __dma_*_area() aliases
The __dma_inv_area() and __dma_clean_area() aliases make cache.S harder to navigate, but don't gain us anything in practice. For clarity, let's remove them along with their redundant comments. The only users are __dma_map_area() and __dma_unmap_area(), which need to be position independent, and can call __pi_dcache_inval_poc() and __pi_dcache_clean_poc() directly. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20211206124715.4101571-4-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commits on Dec 14, 2021
-
docs/arm64: delete a space from tagged-address-abi
Commit e71e2ac ("userfaultfd: do not untag user pointers") introduced a warning: linux/Documentation/arm64/tagged-address-abi.rst:52: WARNING: Unexpected indentation. Let's fix it. Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/20211209091922.560979-1-siyanteng@loongson.cn Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
arm64: Enable KCSAN
This patch enables KCSAN for arm64, with updates to build rules to not use KCSAN for several incompatible compilation units. Recent GCC versions (at least GCC 10) made outline-atomics the default option (unlike Clang), which causes linker errors for kernel/kcsan/core.o. Disable the out-of-line atomics with no-outline-atomics to fix the linker errors. Meanwhile, as Mark said [1], some latent issues need to be fixed that aren't just a KCSAN problem, so we make KCSAN depend on EXPERT for now. Tested selftest and kcsan_test (built with GCC 11 and Clang 13), and all passed. [1] https://lkml.kernel.org/r/YadiUPpJ0gADbiHQ@FVFF77S0Q05N Acked-by: Marco Elver <elver@google.com> # kernel/kcsan Tested-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Link: https://lore.kernel.org/r/20211211131734.126874-1-wangkefeng.wang@huawei.com [catalin.marinas@arm.com: added comment to justify EXPERT] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
kselftest/arm64: Add pidbench for floating point syscall cases
Since it's likely to be useful for performance work with SVE let's have a pidbench that gives us some numbers for consideration. In order to ensure that we test exactly the scenario we want this is written in assembly - if system libraries use SVE this would stop us exercising the case where the process has never used SVE. We exercise three cases: - Never having used SVE. - Having used SVE once. - Using SVE after each syscall. by spinning running getpid() for a fixed number of iterations with the time measured using CNTVCT_EL0 reported on the console. This is obviously a totally unrealistic benchmark which will show the extremes of any performance variation but equally given the potential gotchas with use of FP instructions by system libraries it's good to have some concrete code shared to make it easier to compare notes on results. Testing over multiple SVE vector lengths will need to be done with vlset currently, the test could be extended to iterate over all of them if desired. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211202165107.1075259-1-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
arm64/fp: Add comments documenting the usage of state restore functions
Add comments to help people figure out when fpsimd_bind_state_to_cpu() and fpsimd_update_current_state() are used. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211207163250.1373542-1-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
kselftest/arm64: Add a test program to exercise the syscall ABI
Currently we don't have any coverage of the syscall ABI so let's add a very dumb test program which sets up register patterns, does a syscall and then checks that the register state after the syscall matches what we expect. The program is written in an extremely simplistic fashion with the goal of making it easy to verify that it's doing what it thinks it's doing; it is not a model of how one should write actual code. Currently we validate the general purpose, FPSIMD and SVE registers. There are other things that could be covered like FPCR and flags registers, these can be covered incrementally - my main focus at the minute is covering the ABI for the SVE registers. The program repeats the tests for all possible SVE vector lengths in case some vector length specific optimisation causes issues, as well as testing FPSIMD only. It tries two syscalls, getpid() and sched_yield(), in an effort to cover both immediate return to userspace and scheduling another task though there are no guarantees which cases will be hit. A new test directory "abi" is added to hold the test, it doesn't seem to fit well into any of the existing directories. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211210184133.320748-7-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
kselftest/arm64: Allow signal tests to trigger from a function
Currently we have the facility to specify custom code to trigger a signal but none of the tests use it and for some reason the framework requires us to also specify a signal to send as a trigger in order to make use of a custom trigger. This doesn't seem to make much sense, instead allow the use of a custom trigger function without specifying a signal to inject. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211210184133.320748-6-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
kselftest/arm64: Parameterise ptrace vector length information
SME introduces a new mode called streaming mode in which the SVE registers have a different vector length. Since the ptrace interface for this is based on the existing SVE interface prepare for supporting this by moving the regset specific configuration into struct and passing that around, allowing these tests to be reused for streaming mode. As we will also have to verify the interoperation of the SVE and streaming SVE regsets don't just iterate over an array. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211210184133.320748-5-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
arm64/sve: Minor clarification of ABI documentation
As suggested by Luis for the SME version of this explicitly say that the vector length should be extracted from the return value of a set vector length prctl() with a bitwise and rather than just any old and. Suggested-by: Luis Machado <Luis.Machado@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211210184133.320748-4-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
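As a concrete illustration of the bitwise AND: the return value of a set-vector-length prctl() carries flag bits above the length, so the length has to be masked out. The sketch below defines the mask and flag locally so that it builds anywhere; the values mirror those in the Linux prctl UAPI header, and the helper name is hypothetical.

```c
#include <assert.h>

/* Local copies so the sketch builds anywhere; guarded in case a header defines them. */
#ifndef PR_SVE_VL_LEN_MASK
#define PR_SVE_VL_LEN_MASK	0xffff
#endif
#ifndef PR_SVE_VL_INHERIT
#define PR_SVE_VL_INHERIT	(1 << 17)
#endif

/*
 * Hypothetical helper: extract the vector length in bytes from a
 * set-vector-length prctl() return value, or -1 if the call failed.
 */
static inline int sketch_vl_from_prctl_ret(int ret)
{
	if (ret < 0)
		return -1;

	return ret & PR_SVE_VL_LEN_MASK;	/* bitwise AND drops the flag bits */
}
```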
-
arm64/sve: Generalise vector length configuration prctl() for SME
In preparation for adding SME support update the bulk of the implementation for the vector length configuration prctl() calls to be independent of vector type. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211210184133.320748-3-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
arm64/sve: Make sysctl interface for SVE reusable by SME
The vector length configuration for SME is very similar to that for SVE so in order to allow reuse refactor the SVE configuration so that it takes the vector type from the struct ctl_table. Since there's no dedicated space for this we repurpose the extra1 field to store the vector type, this is otherwise unused for integer sysctls. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211210184133.320748-2-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
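The reuse pattern described here, one handler shared by several tables with a spare pointer field selecting the vector type, can be sketched as follows. The struct is a cut-down stand-in for struct ctl_table, and the enum, the default values and the way the type is encoded behind extra1 are assumptions for illustration rather than the kernel's actual layout.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_vec_type { SKETCH_VEC_SVE, SKETCH_VEC_SME, SKETCH_VEC_MAX };

/* Cut-down stand-in for struct ctl_table. */
struct sketch_ctl_table {
	const char *procname;
	void *extra1;	/* repurposed: points at the vector type */
};

/* Made-up per-type default vector lengths, in bytes. */
static int sketch_default_vl[SKETCH_VEC_MAX] = { 16, 32 };

static enum sketch_vec_type sketch_types[SKETCH_VEC_MAX] = {
	SKETCH_VEC_SVE, SKETCH_VEC_SME,
};

/* One handler serves every table; extra1 says which vector type is meant. */
static inline int sketch_read_default_vl(const struct sketch_ctl_table *table)
{
	const enum sketch_vec_type *type = table->extra1;

	return sketch_default_vl[*type];
}
```

A second table entry then only differs in its name and in what extra1 points at, which is what makes the handler reusable.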