Skip to content

Commit 398edaa

Browse files
mrutland-armwilldeacon
authored andcommitted
arm64/fpsimd: Do not discard modified SVE state
Historically SVE state was discarded deterministically early in the syscall entry path, before ptrace is notified of syscall entry. This permitted ptrace to modify SVE state before and after the "real" syscall logic was executed, with the modified state being retained. This behaviour was changed by commit: 8c845e2 ("arm64/sve: Leave SVE enabled on syscall if we don't context switch") That commit was intended to speed up workloads that used SVE by opportunistically leaving SVE enabled when returning from a syscall. The syscall entry logic was modified to truncate the SVE state without disabling userspace access to SVE, and fpsimd_save_user_state() was modified to discard userspace SVE state whenever in_syscall(current_pt_regs()) is true, i.e. when current_pt_regs()->syscallno != NO_SYSCALL. Leaving SVE enabled opportunistically resulted in a couple of changes to userspace visible behaviour which weren't described at the time, but are logical consequences of opportunistically leaving SVE enabled: * Signal handlers can observe the type of saved state in the signal's sve_context record. When the kernel only tracks FPSIMD state, the 'vq' field is 0 and there is no space allocated for register contents. When the kernel tracks SVE state, the 'vq' field is non-zero and the register contents are saved into the record. As a result of the above commit, 'vq' (and the presence of SVE register state) is non-deterministically zero or non-zero for a period of time after a syscall. The effective register state is still deterministic. Hopefully no-one relies on this being deterministic. In general, handlers for asynchronous events cannot expect a deterministic state. * Similarly to signal handlers, ptrace requests can observe the type of saved state in the NT_ARM_SVE and NT_ARM_SSVE regsets, as this is exposed in the header flags. As a result of the above commit, this is now in a non-deterministic state after a syscall. The effective register state is still deterministic. Hopefully no-one relies on this being deterministic. In general, debuggers would have to handle this changing at arbitrary points during program flow. Discarding the SVE state within fpsimd_save_user_state() resulted in other changes to userspace visible behaviour which are not desirable: * A ptrace tracer can modify (or create) a tracee's SVE state at syscall entry or syscall exit. As a result of the above commit, the tracee's SVE state can be discarded non-deterministically after modification, rather than being retained as it previously was. Note that for co-operative tracer/tracee pairs, the tracer may (re)initialise the tracee's state arbitrarily after the tracee sends itself an initial SIGSTOP via a syscall, so this affects realistic design patterns. * The current_pt_regs()->syscallno field can be modified via ptrace, and can be altered even when the tracee is not really in a syscall, causing non-deterministic discarding to occur in situations where this was not previously possible. Further, using current_pt_regs()->syscallno in this way is unsound: * There are data races between readers and writers of the current_pt_regs()->syscallno field. The current_pt_regs()->syscallno field is written in interruptible task context using plain C accesses, and is read in irq/softirq context using plain C accesses. These accesses are subject to data races, with the usual concerns with tearing, etc. * Writes to current_pt_regs()->syscallno are subject to compiler reordering. As current_pt_regs()->syscallno is written with plain C accesses, the compiler is free to move those writes arbitrarily relative to anything which doesn't access the same memory location. In theory this could break signal return, where prior to restoring the SVE state, restore_sigframe() calls forget_syscall(). If the write were hoisted after restore of some SVE state, that state could be discarded unexpectedly. In practice that reordering cannot happen in the absence of LTO (as cross compilation-unit function calls happen prevent this reordering), and that reordering appears to be unlikely in the presence of LTO. Additionally, since commit: f130ac0 ("arm64: syscall: unmask DAIF earlier for SVCs") ... DAIF is unmasked before el0_svc_common() sets regs->syscallno to the real syscall number. Consequently state may be saved in SVE format prior to this point. Considering all of the above, current_pt_regs()->syscallno should not be used to infer whether the SVE state can be discarded. Luckily we can instead use cpu_fp_state::to_save to track when it is safe to discard the SVE state: * At syscall entry, after the live SVE register state is truncated, set cpu_fp_state::to_save to FP_STATE_FPSIMD to indicate that only the FPSIMD portion is live and needs to be saved. * At syscall exit, once the task's state is guaranteed to be live, set cpu_fp_state::to_save to FP_STATE_CURRENT to indicate that TIF_SVE must be considered to determine which state needs to be saved. * Whenever state is modified, it must be saved+flushed prior to manipulation. The state will be truncated if necessary when it is saved, and reloading the state will set fp_state::to_save to FP_STATE_CURRENT, preventing subsequent discarding. This permits SVE state to be discarded *only* when it is known to have been truncated (and the non-FPSIMD portions must be zero), and ensures that SVE state is retained after it is explicitly modified. For backporting, note that this fix depends on the following commits: * b248280 ("arm64/sme: Optimise SME exit on syscall entry") * f130ac0 ("arm64: syscall: unmask DAIF earlier for SVCs") * 929fa99 ("arm64/fpsimd: signal: Always save+flush state early") Fixes: 8c845e2 ("arm64/sve: Leave SVE enabled on syscall if we don't context switch") Fixes: f130ac0 ("arm64: syscall: unmask DAIF earlier for SVCs") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
1 parent f699c66 commit 398edaa

File tree

3 files changed

+47
-17
lines changed

3 files changed

+47
-17
lines changed

arch/arm64/include/asm/fpsimd.h

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@
66
#define __ASM_FP_H
77

88
#include <asm/errno.h>
9+
#include <asm/percpu.h>
910
#include <asm/ptrace.h>
1011
#include <asm/processor.h>
1112
#include <asm/sigcontext.h>
@@ -92,6 +93,8 @@ struct cpu_fp_state {
9293
enum fp_type to_save;
9394
};
9495

96+
DECLARE_PER_CPU(struct cpu_fp_state, fpsimd_last_state);
97+
9598
extern void fpsimd_bind_state_to_cpu(struct cpu_fp_state *fp_state);
9699

97100
extern void fpsimd_flush_task_state(struct task_struct *target);

arch/arm64/kernel/entry-common.c

Lines changed: 35 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -393,20 +393,16 @@ static bool cortex_a76_erratum_1463225_debug_handler(struct pt_regs *regs)
393393
* As per the ABI exit SME streaming mode and clear the SVE state not
394394
* shared with FPSIMD on syscall entry.
395395
*/
396-
static inline void fp_user_discard(void)
396+
static inline void fpsimd_syscall_enter(void)
397397
{
398-
/*
399-
* If SME is active then exit streaming mode. If ZA is active
400-
* then flush the SVE registers but leave userspace access to
401-
* both SVE and SME enabled, otherwise disable SME for the
402-
* task and fall through to disabling SVE too. This means
403-
* that after a syscall we never have any streaming mode
404-
* register state to track, if this changes the KVM code will
405-
* need updating.
406-
*/
398+
/* Ensure PSTATE.SM is clear, but leave PSTATE.ZA as-is. */
407399
if (system_supports_sme())
408400
sme_smstop_sm();
409401

402+
/*
403+
* The CPU is not in streaming mode. If non-streaming SVE is not
404+
* supported, there is no SVE state that needs to be discarded.
405+
*/
410406
if (!system_supports_sve())
411407
return;
412408

@@ -416,6 +412,33 @@ static inline void fp_user_discard(void)
416412
sve_vq_minus_one = sve_vq_from_vl(task_get_sve_vl(current)) - 1;
417413
sve_flush_live(true, sve_vq_minus_one);
418414
}
415+
416+
/*
417+
* Any live non-FPSIMD SVE state has been zeroed. Allow
418+
* fpsimd_save_user_state() to lazily discard SVE state until either
419+
* the live state is unbound or fpsimd_syscall_exit() is called.
420+
*/
421+
__this_cpu_write(fpsimd_last_state.to_save, FP_STATE_FPSIMD);
422+
}
423+
424+
static __always_inline void fpsimd_syscall_exit(void)
425+
{
426+
if (!system_supports_sve())
427+
return;
428+
429+
/*
430+
* The current task's user FPSIMD/SVE/SME state is now bound to this
431+
* CPU. The fpsimd_last_state.to_save value is either:
432+
*
433+
* - FP_STATE_FPSIMD, if the state has not been reloaded on this CPU
434+
* since fpsimd_syscall_enter().
435+
*
436+
* - FP_STATE_CURRENT, if the state has been reloaded on this CPU at
437+
* any point.
438+
*
439+
* Reset this to FP_STATE_CURRENT to stop lazy discarding.
440+
*/
441+
__this_cpu_write(fpsimd_last_state.to_save, FP_STATE_CURRENT);
419442
}
420443

421444
UNHANDLED(el1t, 64, sync)
@@ -739,10 +762,11 @@ static void noinstr el0_svc(struct pt_regs *regs)
739762
{
740763
enter_from_user_mode(regs);
741764
cortex_a76_erratum_1463225_svc_handler();
742-
fp_user_discard();
765+
fpsimd_syscall_enter();
743766
local_daif_restore(DAIF_PROCCTX);
744767
do_el0_svc(regs);
745768
exit_to_user_mode(regs);
769+
fpsimd_syscall_exit();
746770
}
747771

748772
static void noinstr el0_fpac(struct pt_regs *regs, unsigned long esr)

arch/arm64/kernel/fpsimd.c

Lines changed: 9 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -119,7 +119,7 @@
119119
* whatever is in the FPSIMD registers is not saved to memory, but discarded.
120120
*/
121121

122-
static DEFINE_PER_CPU(struct cpu_fp_state, fpsimd_last_state);
122+
DEFINE_PER_CPU(struct cpu_fp_state, fpsimd_last_state);
123123

124124
__ro_after_init struct vl_info vl_info[ARM64_VEC_MAX] = {
125125
#ifdef CONFIG_ARM64_SVE
@@ -451,12 +451,15 @@ static void fpsimd_save_user_state(void)
451451
*(last->fpmr) = read_sysreg_s(SYS_FPMR);
452452

453453
/*
454-
* If a task is in a syscall the ABI allows us to only
455-
* preserve the state shared with FPSIMD so don't bother
456-
* saving the full SVE state in that case.
454+
* Save SVE state if it is live.
455+
*
456+
* The syscall ABI discards live SVE state at syscall entry. When
457+
* entering a syscall, fpsimd_syscall_enter() sets to_save to
458+
* FP_STATE_FPSIMD to allow the SVE state to be lazily discarded until
459+
* either new SVE state is loaded+bound or fpsimd_syscall_exit() is
460+
* called prior to a return to userspace.
457461
*/
458-
if ((last->to_save == FP_STATE_CURRENT && test_thread_flag(TIF_SVE) &&
459-
!in_syscall(current_pt_regs())) ||
462+
if ((last->to_save == FP_STATE_CURRENT && test_thread_flag(TIF_SVE)) ||
460463
last->to_save == FP_STATE_SVE) {
461464
save_sve_regs = true;
462465
save_ffr = true;

0 commit comments

Comments
 (0)