Skip to content

B6 opening: T-022 security review + ADR-0037 + T-023 (EL0 entry context)#37

Merged
cemililik merged 7 commits into
mainfrom
b6-el0-entry-context
May 31, 2026
Merged

B6 opening: T-022 security review + ADR-0037 + T-023 (EL0 entry context)#37
cemililik merged 7 commits into
mainfrom
b6-el0-entry-context

Conversation

@cemililik
Copy link
Copy Markdown
Collaborator

@cemililik cemililik commented May 31, 2026

Opens Milestone B6 (first userspace "hello") on the merged high-half regime. Lands the gating T-022 security review, the EL0 entry-context decision (ADR-0037), and its dormant implementation (T-023) — closing T-021 carry-forward gate #2. No runnable EL0 task yet; the mechanism is dormant (syscall_entry stays unreachable from EL0 while EPD0 = 1), so the QEMU trace is byte-stable.

Commits

9a51b73 T-022 high-half migration security review — Approve (8 axes, no live finding; discharges the "awaiting security review" flag)
00d342e / 546febc ADR-0037 — EL0 entry context (Propose / Accept, per write-adr §10)
bbc42e8 T-023ContextSwitch::init_user_context + the BSP enter_el0 ERET trampoline + Scheduler::add_user_task (+ review-round-1 findings)
771ebd1 T-023 review-round 2FPCR/FPSR/TPIDR scrub, offset_of! layout asserts, user_sp + AS-active guards
12095c6 T-023 review-round 3 — three doc-only fixes from a multi-agent adversarial review

Design (ADR-0037)

A userspace task reuses the cooperative context switch. init_user_context seeds the existing Aarch64TaskContext (x19=user_entry, x20=user_sp, lr=enter_el0, sp=kernel_stack_top) — no struct/layout change (D2), so context_switch_asm's offsets are untouched. The unchanged switch restores those and rets into enter_el0, which sets SP_EL0/ELR_EL1/SPSR_EL1 (EL0t + DAIF-masked), scrubs the EL0-visible register file (x0x30 + v0v31 + FPCR/FPSR/TPIDR_EL0/TPIDRRO_EL0), and ERETs. Gate #2 (SP_EL1) is closed by construction — the task enters EL0 from its own kernel context, so a later +0x400 trap lands on that kernel stack. New UNSAFE-2026-0032 (second-reviewer-flagged).

Security

The enter_el0 register scrub is load-bearing: the cooperative switch restores only callee-saved regs, so without it the first EL0 instruction would read kernel pointers (x0/x1/x8), a kernel code address (x30), and caller-saved SIMD — a disclosure orthogonal to ADR-0033's AP/UXN memory isolation. The scrub also zeroes FPCR/FPSR/TPIDR_EL0/TPIDRRO_EL0 (UNKNOWN-reset on real HW, e.g. Pi 4) for a defined EL0 environment.

Review

Three review passes total. A final multi-agent adversarial review (5 lenses × per-finding skeptic verification) found no code/security/correctness/asm defect — the scrub, the offset_of! asserts, the SPSR register form, the dormancy/gate-ordering, all four ContextSwitch implementors, and the host tests were verified correct. The only findings were three doc-only consistency fixes (already applied in 12095c6). T-023's own formal code security review (UNSAFE-2026-0032) remains a DoD item before a real EL0 task is wired in B6.

Gates

cargo fmt · host + kernel clippy -D warnings · 342 host tests (+2) · kernel build (the scrub asm + offset_of! asserts assemble) · QEMU smoke byte-stable + fault-clean (mechanism dormant) · cargo +nightly miri test --workspace --exclude tyrne-bsp-qemu-virt 0 UB.

Not in scope (next B6 threads)

task_create_from_image → the security-critical gate #1 (per-task user-VA→kernel-VA translation) + gate #3tyrne-user + userland/hello + build pipeline → wire-up + EL0 +0x400 smoke → closure (= Phase B retrospective).

🤖 Generated with Claude Code

Summary by Sourcery

Introduce a dormant EL0 entry mechanism that reuses the existing Aarch64 cooperative context switch, including a new enter_el0 trampoline and scheduler API for seeding userspace tasks, alongside updated safety assertions and documentation for ADR-0037 and the T-022 security review.

New Features:

  • Add ContextSwitch::init_user_context and Scheduler::add_user_task to support userspace (EL0) tasks seeded via the cooperative context switch.
  • Introduce the enter_el0 naked assembly trampoline to drop freshly created tasks from EL1 to EL0 with a scrubbed register file while preserving the high-half regime.

Enhancements:

  • Harden Aarch64TaskContext layout guarantees with offset assertions to protect context_switch_asm and the EL0 hand-off.
  • Extend test-hal fakes and scheduler tests to cover the new EL0 user-context path and ensure ready-queue integration.
  • Update HAL, scheduler, architecture, roadmap, ADR, and unsafe-log documentation to describe the EL0 entry context design, its security review, and milestone B6 status.

Documentation:

  • Add ADR-0037 describing the EL0 entry context and enter_el0 design, plus detailed security-review and task-analysis documents for T-022 and T-023.
  • Update roadmap and architecture docs to reflect B6 opening, the new EL0 entry mechanism, and its gating relationship to future userspace work.

Tests:

  • Add tests exercising ContextSwitch::init_user_context via the FakeContextSwitch and Scheduler::add_user_task, ensuring correct seeding of userspace contexts without executing EL0.

Summary by CodeRabbit

  • New Features

    • Added userspace task support with a one-shot EL0 entry trampoline and an API to initialize user contexts for safe first-entry.
    • Ensures EL0-visible registers and SIMD/FP state are scrubbed before handing control to user code.
  • Documentation

    • Added ADR, audit, roadmap, task, and architecture docs describing EL0 entry model and contracts.
  • Tests

    • Extended host test doubles and unit tests to cover userspace context initialization and scheduler integration.

cemililik and others added 6 commits May 31, 2026 11:07
Standalone, time-separated (2026-05-31) security pass over the T-022 /
ADR-0033 high-half kernel migration, discharging the "awaiting explicit
security review" flag current.md carried for UNSAFE-2026-0031.

Verdict: Approve — eight axes pass (cryptography N/A), no live finding,
no fix required. The migration is a structural kernel/user isolation
*strengthening*: the kernel becomes absent from the TTBR0_EL1 regime an
EL0 task runs on (removing the Meltdown substrate, replacing a standing
per-descriptor AP invariant with structural absence). UNSAFE-2026-0031 +
the 0022/0023/0024 T-022 Amendments are policy-conformant and
second-reviewer-signed; the migration is smoke-verified fault-clean. The
review-round-3 hardenings (fail-fast assert! range guards + regime-correct
panic-handler UART alias) are verified sound and cannot misfire in v1
(incl. the no-double-panic property on the panic path).

Forward-flagged (non-blocking, pre-B6): the three T-021 carry-forward
gates (gate #1 — per-task console_write window + per-page user-VA->kernel-VA
translation — is the single most important pre-B6 gate) and ADR-0034
(kernel-image section W^X; privileged-side gap, EL0-unreachable).

Refs: T-022, ADR-0033
Security-Review: @cemililik (+ Claude Opus 4.8 (1M context) agent)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sibling ADR to ADR-0033 (which scoped itself to the migration and named the
EL0 context a separate B6 task). Settles how an EL0 task's first entry is
expressed and how it integrates with the cooperative scheduler.

Decision (Option 1): reuse the existing Aarch64TaskContext + a one-shot
enter_el0 ERET trampoline; carry (user_entry, user_sp) in the x19/x20
callee-saved slots the cooperative switch already restores; lr = enter_el0,
sp = kernel stack. SP_EL1 (gate #2) is closed by construction — the task
enters EL0 from its own kernel context, so a later +0x400 trap lands on that
kernel stack. D2 settled: neither add EL0 fields to the kernel-object Task
NOR a separate TaskContext struct — the 168-byte Aarch64TaskContext layout
(and the security-critical context_switch_asm offsets) stay UNCHANGED, the
exact drift the T-022 security review flagged. v1 simplification:
SPSR_EL1 = 0x3C0 (EL0t, DAIF masked — cooperative, no preemption).

Opens T-023 (Draft) in the same commit per ADR-0025 §Rule 1. Introduces
UNSAFE-2026-0032 (the enter_el0 asm; security-sensitive → second-reviewer).

Refs: ADR-0037, T-023
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Careful re-read (write-adr §10) passed in a separate commit from Propose:
forward-references grounded (T-023 exists; downstream consumers
deliberately absent as consumers-not-dependencies, per ADR-0033's pattern),
dependency chain complete, negative consequences carry real mitigations,
the §Simulation table walks the EL0 first-entry + trap state machine.

Refs: ADR-0037
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…trampoline + per-task SP_EL1)

Implements ADR-0037 (the B6 EL0 entry-context decision) — the dormant
mechanism a userspace task needs to drop to EL0 on first dispatch. Closes
T-021 carry-forward gate #2 (SP_EL1). No runnable EL0 task yet (that is the
B6 wire-up, after gate #1 / gate #3); the mechanism is dormant and the QEMU
trace is byte-stable.

What lands:
- HAL: `ContextSwitch::init_user_context(ctx, user_entry, user_sp,
  kernel_stack_top)` — additive sibling of `init_context`; its `# Safety`
  states the EL0-trap kernel-stack size contract (must hold the ~272-byte
  trap frame + handler call tree per +0x400, stronger than init_context's).
- BSP: `QemuVirtCpu::init_user_context` seeds the *existing* Aarch64TaskContext
  (x19=user_entry, x20=user_sp, lr=enter_el0, sp=kernel_stack_top) — no
  struct/layout change, so context_switch_asm offsets are untouched (D2) —
  plus the `enter_el0` `#[unsafe(naked)]` ERET trampoline.
- Scheduler: `Scheduler::add_user_task` (mirrors add_task, calls
  init_user_context; debug_asserts kernel_stack_top non-null + 16-aligned —
  gate #2 belt-and-braces).
- test-hal: FakeContextSwitch::init_user_context + FakeTaskContext
  {is_user,user_sp}; host tests for the test-hal recorder and the scheduler
  route. All four ContextSwitch implementors updated (BSP + test-hal +
  scheduler's FakeCpu/ResetQueuesCpu).
- Audit: new UNSAFE-2026-0032 (the enter_el0 ERET asm; second-reviewer-flagged).

2026-05-31 review-round (1 HIGH + 3 Medium + 4 nits) folded in BEFORE this
commit:
- HIGH — enter_el0 now SCRUBS the register file (x0–x30 + v0–v31) before
  ERET. The cooperative switch restores only callee-saved regs, so without
  the scrub the first EL0 instruction would observe kernel pointers/state in
  x0/x1/x8 (context ptrs / kernel stack), x30 (kernel code addr), and the
  caller-saved SIMD — a disclosure orthogonal to ADR-0033's AP/UXN memory
  isolation. Recorded as a load-bearing invariant (UNSAFE-2026-0032) and in
  ADR-0037 §Revision notes (CLAUDE.md #1).
- SPSR_EL1 written via the register form (mov x8,#0x3c0; msr spsr_el1,x8) —
  the immediate MSR form is PSTATE-fields-only.
- gate #2 reconciled (by-construction + the debug_assert); EL0-trap
  stack-size contract documented; test-scope claim narrowed (the host test
  pins the scheduler route via the fake — the real QemuVirtCpu slot write +
  the scrub are audit + kernel-build verified, the BSP being bare-metal).

Docs synced: ADR-0037 §Revision notes (rider, append-only per ADR-0025 §Rule
2), T-023 (AC + review-history + row-to-verification mapping), hal.md +
scheduler.md (ContextSwitch surface gains init_user_context / add_user_task),
current.md (B6-opening banner + active-task/working-branch).

Gates green: cargo fmt; host + kernel clippy -D warnings; 342 host tests
(+2: test-hal init_user_context recorder, scheduler add_user_task route);
kernel build (the 67-instruction scrub asm assembles); QEMU smoke
byte-stable + fault-clean (mechanism dormant); cargo +nightly miri test
--workspace --exclude tyrne-bsp-qemu-virt clean (0 UB).

Security-relevant (the EL1→EL0 first-entry trust-boundary primitive); the
ADR + task design review-round is folded in, the formal code security pass
on UNSAFE-2026-0032 remains a DoD item before a real EL0 task is wired.

Refs: T-023, ADR-0037, UNSAFE-2026-0032
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sserts, user_sp + AS-active guards

Folds in the 2026-05-31 second review-round against the committed
implementation (verdict: Approve-with-nits). 2 Low + 4 nits, all verified
against the code and addressed.

- Low-1 — enter_el0's scrub omitted the FP-control/status + EL0 thread-ID
  register classes. It zeroed x0–x30 + v0–v31 but NOT FPCR/FPSR/TPIDR_EL0/
  TPIDRRO_EL0 — all EL0-readable, with architecturally-UNKNOWN reset values on
  real HW (Pi 4), so "clean, defined register file" was over-claimed off-QEMU
  (FPCR governs EL0 rounding/FZ/default-NaN; Linux zeroes FPCR/FPSR on exec).
  enter_el0 now also `msr {fpcr,fpsr,tpidr_el0,tpidrro_el0}, xzr`. No current
  secret leak (kernel does no FP arithmetic, never writes TPIDR), but the EL0
  environment is now defined on real HW, not just QEMU's zeroed reset.
- Low-2 — the implicit x19/x20/lr/sp hand-off was pinned only by
  size_of == 168; a layout-preserving field reorder (e.g. fp/lr swap) would
  silently corrupt both context_switch_asm and the enter_el0 hand-off. Added
  offset_of! asserts for all five offsets (0/80/88/96/104) — hardens the
  pre-existing context_switch_asm coupling too.
- Nit — add_user_task now debug_asserts user_sp 16-byte alignment (it becomes
  SP_EL0; SCTLR_EL1.SA0 faults a misaligned EL0 stack), symmetric with the
  kernel-stack assert.
- Nit — the AS-must-be-active obligation (TTBR0 installed + EPD0 cleared at
  first dispatch) is now stated on the caller-facing # Safety of both
  init_user_context and add_user_task (it previously lived only on enter_el0's
  doc, which the future task_create_from_image author would not read).
- Nit (process honesty) — the review-round-1 rider shipped WITH bbc42e8, after
  ADR-0037's Accept (546febc): the careful-re-read missed the scrub HIGH;
  recorded in ADR-0037 §Revision notes per ADR-0025 §Rule 2.
- Nit — "67-instruction scrub" wording dropped (it is 63 register-zeros + 4
  special-register writes); docs reworded.

Docs synced: UNSAFE-2026-0032 (scrub scope), ADR-0037 §Revision notes (second
rider), T-023 (AC + review-history), hal.md, current.md.

Gates green: cargo fmt; host + kernel clippy -D warnings; 342 host tests;
kernel build (offset_of! asserts + the new MSRs assemble); QEMU smoke
byte-stable + fault-clean (mechanism still dormant); cargo +nightly miri test
--workspace --exclude tyrne-bsp-qemu-virt clean (0 UB).

Refs: T-023, ADR-0037, UNSAFE-2026-0032
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…oc drift, T-023 status/ACs

A multi-agent adversarial review of the whole T-023 arc (b6549d7..HEAD; 5
lenses × per-finding skeptic verification) found NO code/security/correctness/
asm defect — the register scrub (incl FPCR/FPSR/TPIDR), the offset_of!
asserts, the SPSR register form, the dormancy + gate ordering, all four
ContextSwitch implementors, and the host tests were verified correct. The
only confirmed findings were three doc-only consistency defects, all fixed:

- The T-022 security review (2026-05-31) cited ADR-0033's Accept as `db392c1`,
  which does not exist (transposed digit) — corrected to `db892c1`.
- `FakeContextSwitch::init_count()`'s rustdoc said "init_context calls" but the
  shared counter is now also incremented by `init_user_context`; doc broadened
  to match (mirrors the already-updated `FakeTaskContext.initialized` field doc).
- T-023 was still `Status: Draft` with all nine ACs unchecked despite being
  implemented + gates-green; moved to "In Review (… awaiting explicit security
  review …)" per the T-022 house convention and checked the nine satisfied ACs.

Gates re-confirmed: cargo fmt, host clippy -D warnings, 342 host tests (kernel
binary unchanged → the prior byte-stable QEMU smoke + Miri-0-UB still hold).

Refs: T-023, ADR-0037
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sourcery-ai
Copy link
Copy Markdown

sourcery-ai Bot commented May 31, 2026

Reviewer's Guide

Implements the EL0 first-entry mechanism for B6 by extending the HAL context-switch trait with a userspace initializer, adding a one-shot AArch64 enter_el0 trampoline that drops a task from EL1 to EL0 while fully scrubbing the EL0-visible register file, and wiring scheduler and test infrastructure to seed and verify user tasks, all while keeping the existing Aarch64TaskContext layout and syscall path byte-stable (mechanism dormant).

Sequence diagram for EL0 first-entry via enter_el0 trampoline

sequenceDiagram
    participant Scheduler
    participant QemuVirtCpu
    participant context_switch_asm
    participant enter_el0
    participant EL0_task

    Scheduler->>QemuVirtCpu: init_user_context(ctx, user_entry, user_sp, kernel_stack_top)
    Note right of QemuVirtCpu: Seeds Aarch64TaskContext
    QemuVirtCpu-->>Scheduler: ctx.x19_x28[0..1], ctx.lr, ctx.sp set

    Scheduler->>QemuVirtCpu: context_switch(current_ctx, next_ctx)
    QemuVirtCpu->>context_switch_asm: context_switch_asm(current_ctx, next_ctx)
    context_switch_asm-->>enter_el0: ret (x19=user_entry, x20=user_sp, sp=kernel_stack_top)

    enter_el0->>enter_el0: msr sp_el0, x20
    enter_el0->>enter_el0: msr elr_el1, x19
    enter_el0->>enter_el0: msr spsr_el1, x8 (0x3c0)
    enter_el0->>enter_el0: [scrub x0-x30, v0-v31, FPCR/FPSR/TPIDR]

    enter_el0-->>EL0_task: eret to user_entry (SP=SP_EL0)
Loading

File-Level Changes

Change Details Files
Add EL0-aware context-switch API and BSP implementation without changing the existing task-context layout.
  • Extend the ContextSwitch HAL trait with an unsafe init_user_context method that seeds a context for an EL0 first entry, with detailed safety and stack-size contracts.
  • Implement QemuVirtCpu::init_user_context to reuse Aarch64TaskContext fields (x19/x20/lr/sp) for EL0 entry parameters while preserving the 168-byte layout.
  • Introduce enter_el0 as a naked AArch64 asm trampoline that installs SP_EL0/ELR_EL1/SPSR_EL1, scrubs GPRs/SIMD/FP/TLS registers, and ERETs into EL0 exactly once per task.
  • Harden Aarch64TaskContext/asm coupling by adding offset_of! asserts for all fields used by context_switch_asm and enter_el0.
hal/src/context_switch.rs
bsp-qemu-virt/src/cpu.rs
Enable the scheduler to register userspace (EL0) tasks and seed their contexts via the new HAL API, plus adapt tests and fakes.
  • Add Scheduler::add_user_task to register EL0 tasks, call init_user_context, and enqueue them while enforcing kernel/user stack alignment and address-space-activation preconditions.
  • Update the test HAL FakeContextSwitch and FakeCpu to implement and record init_user_context, including user vs kernel task markers and separate user stack pointers.
  • Add new scheduler and context-switch tests that assert EL0 contexts are seeded correctly, tasks are enqueued as Ready, and the new init path increments the shared init counter.
  • Extend mock ContextSwitch implementations used in other tests with no-op init_user_context stubs to keep trait bounds satisfied.
kernel/src/sched/mod.rs
test-hal/src/context_switch.rs
Document the EL0 entry context design, unsafe audit, and roadmap status, and link them into existing architecture docs.
  • Add ADR-0037 describing the EL0 entry-context design (reuse of Aarch64TaskContext, enter_el0 trampoline, SPSR settings, and security invariants) with detailed simulation and revision notes.
  • Record UNSAFE-2026-0032 in the unsafe log, documenting enter_el0’s behaviour, invariants (including register scrubbing), and review history.
  • Update roadmap/current, architecture (scheduler and HAL) docs, and decisions index to reflect B6 opening, T-022 security review approval, T-023 implementation, and the new EL0 entry mechanism.
  • Add a T-023 task-analysis document and a T-022 high-half migration security-review document that tie this work into the broader phase-B plan and gate structure.
docs/decisions/0037-el0-entry-context.md
docs/audits/unsafe-log.md
docs/roadmap/current.md
docs/architecture/scheduler.md
docs/architecture/hal.md
docs/decisions/README.md
docs/analysis/reviews/security-reviews/2026-05-31-T-022-high-half-migration.md
docs/analysis/tasks/phase-b/T-023-el0-entry-context.md

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 31, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c23ffadd-60c2-4d15-8fd1-f25297a31b82

📥 Commits

Reviewing files that changed from the base of the PR and between 12095c6 and 9b36967.

📒 Files selected for processing (1)
  • test-hal/src/context_switch.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • test-hal/src/context_switch.rs

📝 Walkthrough

Walkthrough

Implements EL0 first-entry via a new HAL method ContextSwitch::init_user_context, BSP enter_el0 ERET trampoline with EL0 register-file scrubbing and layout pinning, scheduler add_user_task to seed/enqueue userspace tasks, and test/docs/audit changes to validate and record the mechanism.

Changes

EL0 Entry Context Implementation

Layer / File(s) Summary
HAL trait and design decision
hal/src/context_switch.rs, docs/decisions/0037-el0-entry-context.md
ContextSwitch::init_user_context added; ADR-0037 documents encoding user entry VA/stack into restored callee-saved slots, one-shot enter_el0 trampoline semantics, and verification mapping.
BSP EL0 trampoline and context initialization
bsp-qemu-virt/src/cpu.rs
Compile-time assertions pin Aarch64TaskContext size/offsets. enter_el0 naked trampoline installs SP_EL0/ELR_EL1/SPSR_EL1, zeroes EL0-visible GPRs and SIMD/FP/TLS state, and ERETs. QemuVirtCpu::init_user_context seeds user entry/sp into callee-saved slots, sets lr to enter_el0, and sp to kernel-stack top.
Scheduler EL0 task registration
kernel/src/sched/mod.rs, docs/architecture/scheduler.md
Adds Scheduler::add_user_task unsafe API that calls cpu.init_user_context, asserts kernel-stack/user-sp invariants, enqueues the task, and updates scheduler state; docs updated to describe add_user_task and the expanded ContextSwitch responsibilities.
Test infrastructure updates
test-hal/src/context_switch.rs, kernel/src/sched/mod.rs (tests/doubles)
FakeTaskContext gains is_user and user_sp. FakeContextSwitch::init_user_context added; test doubles updated (FakeCpu, ResetQueuesCpu) and unit tests validate EL0-context seeding and ready enqueue behavior.

Documentation, Planning, and Audit

Layer / File(s) Summary
Task documentation and implementation status
docs/analysis/tasks/phase-b/T-023-el0-entry-context.md
T-023 documents EL0 entry-context as dormant, acceptance criteria (HAL, BSP trampoline, scheduler API, register scrubbing), host-test scope, and required audits.
Security audit and prior high-half migration review
docs/audits/unsafe-log.md, docs/analysis/reviews/security-reviews/2026-05-31-T-022-high-half-migration.md
UNSAFE-2026-0032 records the enter_el0 asm phases and invariants; T-022 security review recorded as prerequisite accepted.
Architecture docs and roadmap
docs/architecture/hal.md, docs/architecture/scheduler.md, docs/decisions/README.md, docs/roadmap/current.md
Adds init_user_context description, documents add_user_task, inserts ADR-0037 index entry, and updates roadmap to mark B6 opening and T-023 dormant implementation.

Sequence Diagram

sequenceDiagram
  participant Scheduler
  participant CPU as CPU/ContextSwitch
  participant TaskContext
  participant ContextSwitch as context_switch_asm
  participant Trampoline as enter_el0
  participant EL0 as EL0 User Task
  
  Scheduler->>CPU: init_user_context(entry, user_sp, kernel_stack_top)
  CPU->>TaskContext: seed x19=entry, x20=user_sp, lr=enter_el0, sp=kernel_stack_top
  Scheduler->>Scheduler: enqueue task ready
  Scheduler->>ContextSwitch: restore from TaskContext on dispatch
  ContextSwitch->>Trampoline: RET (lr=enter_el0)
  Trampoline->>Trampoline: MSR SP_EL0, user_sp (from x20)
  Trampoline->>Trampoline: MSR ELR_EL1, entry (from x19)
  Trampoline->>Trampoline: MSR SPSR_EL1, 0x3C0 (EL0t, DAIF masked)
  Trampoline->>Trampoline: zero x0-x30, v0-v31, FPCR, FPSR, TPIDR_EL0
  Trampoline->>EL0: ERET
  EL0->>EL0: execute user code
Loading

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly Related PRs

  • HodeTech/Tyrne#4: Baseline Aarch64TaskContext layout and context_switch_asm/QemuVirtCpu context initialization that this change extends for EL0 entry.
  • HodeTech/Tyrne#15: Prior layout assertions and context-switch invariants related to the same Aarch64TaskContext pinning relied upon here.

Poem

🐰 I tuck the registers neat and low,

then hop the kernel’s high-to-low,
With scrubbed-up regs and SPSR bright,
I ERET you to userspace light—
Safe hops, no leaks, sleep tight.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately summarizes the main changesets: B6 milestone opening with T-022 security review completion, ADR-0037 decision documentation, and T-023 EL0 entry context implementation.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch b6-el0-entry-context

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the foundational EL0 (userspace) entry context mechanism, adding the init_user_context HAL trait method, the enter_el0 naked assembly trampoline, and the Scheduler::add_user_task method. A critical security issue was identified in the enter_el0 trampoline where MSR instructions use xzr to zero the FP and thread-ID registers. In A64, xzr is encoded as 31 which represents SP for MSR instructions, potentially leaking the kernel stack pointer to userspace. It is recommended to use a zeroed general-purpose register like x0 instead.

Comment thread bsp-qemu-virt/src/cpu.rs
Comment on lines +559 to +562
"msr fpcr, xzr",
"msr fpsr, xzr",
"msr tpidr_el0, xzr",
"msr tpidrro_el0, xzr",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

In the A64 instruction set, the MSR (register) instruction does not support the zero register (XZR) as a source operand. The Rt register field value of 31 represents the stack pointer (SP) for MSR and MRS instructions.\n\nDepending on the toolchain/assembler version, using xzr here will either:\n1. Cause a compilation/assembly failure (e.g., LLVM/Clang or GCC gas rejecting xzr as an invalid operand for MSR).\n2. Or, if accepted, assemble as msr <reg>, sp, which writes the active kernel stack pointer (SP_EL1) into fpcr, fpsr, tpidr_el0, and tpidrro_el0. Since tpidr_el0 and tpidrro_el0 are readable by EL0 (userspace), this creates a critical security vulnerability by leaking the kernel stack pointer directly to userspace, defeating stack protection and KASLR.\n\nTo ensure correctness, portability, and security, use a general-purpose register that has already been zeroed (such as x0, which is zeroed on line 491) instead of xzr.

Suggested change
"msr fpcr, xzr",
"msr fpsr, xzr",
"msr tpidr_el0, xzr",
"msr tpidrro_el0, xzr",
"msr fpcr, x0",\n "msr fpsr, x0",\n "msr tpidr_el0, x0",\n "msr tpidrro_el0, x0",

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified against the build + the ISA, and respectfully this does not apply — no change made.

For MSR/MRS, Rt = 31 is XZR, not SP: MSR (register) reads its source through the X[t] (zero-register) accessor, so msr <sysreg>, xzr writes 0. The "Rt = 31 ⇒ SP" rule applies to load/store base registers and the SP-form of arithmetic — not to MSR/MRS.

Confirmed three independent ways:

  1. The assembler accepts it. cargo kernel-build assembles msr {fpcr,fpsr,tpidr_el0,tpidrro_el0}, xzr with no error — so the "rejected operand" failure mode does not occur.
  2. It encodes as XZR, not SP. The kernel already ships msr ttbr0_el1, xzr (the TTBR0-free in kernel_main_high); rust-objdump disassembles it as msr TTBR0_EL1, xzr (d518201f, Rt = 31) — rendered xzr, not sp. The four FP/thread-ID writes here use the identical MSR-register source encoding.
  3. Behaviourally proven. That msr ttbr0_el1, xzr is smoke-clean: TTBR0_EL1 becomes 0 (correctly freeing the low half). If Rt = 31 meant SP, TTBR0 would hold a stack pointer and the kernel would translation-fault on the next low-half access — it does not (-d int,unimp shows zero Translation/Permission faults).

So these instructions zero FPCR/FPSR/TPIDR_EL0/TPIDRRO_EL0 (the intended register scrub) with no SP leak. Rationale also recorded in commit 9b36967.

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • The enter_el0 naked_asm block manually zeros all GPRs and SIMD registers one by one; consider factoring this into a small local macro or .irp-style pattern so future changes don’t accidentally miss a register or introduce inconsistencies while still keeping the scrub logic explicit and auditable.
  • ContextSwitch::init_user_context is a new required unsafe method on the trait; you might want to provide a default implementation (e.g. unimplemented!() or delegating to init_context) so new BSPs or test HALs fail loudly rather than silently omitting EL0 bring-up when they add a ContextSwitch impl and forget to wire this method.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `enter_el0` naked_asm block manually zeros all GPRs and SIMD registers one by one; consider factoring this into a small local macro or `.irp`-style pattern so future changes don’t accidentally miss a register or introduce inconsistencies while still keeping the scrub logic explicit and auditable.
- `ContextSwitch::init_user_context` is a new required unsafe method on the trait; you might want to provide a default implementation (e.g. `unimplemented!()` or delegating to `init_context`) so new BSPs or test HALs fail loudly rather than silently omitting EL0 bring-up when they add a ContextSwitch impl and forget to wire this method.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test-hal/src/context_switch.rs`:
- Around line 189-202: When re-seeding a reused TaskContext via the kernel path,
clear any leftover user-path markers so stale values from init_user_context
don't persist: update init_context to explicitly set ctx.is_user = false and
ctx.user_sp = 0 (and if applicable clear any user-specific entry_addr/user-only
fields) while still setting ctx.initialized, ctx.entry_addr/stack_top for the
kernel, and preserving the locked().init_count increment; this ensures
init_context fully overwrites any prior user state left by init_user_context.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f6060822-2713-4e9b-9a85-9b128e760976

📥 Commits

Reviewing files that changed from the base of the PR and between b6549d7 and 12095c6.

📒 Files selected for processing (12)
  • bsp-qemu-virt/src/cpu.rs
  • docs/analysis/reviews/security-reviews/2026-05-31-T-022-high-half-migration.md
  • docs/analysis/tasks/phase-b/T-023-el0-entry-context.md
  • docs/architecture/hal.md
  • docs/architecture/scheduler.md
  • docs/audits/unsafe-log.md
  • docs/decisions/0037-el0-entry-context.md
  • docs/decisions/README.md
  • docs/roadmap/current.md
  • hal/src/context_switch.rs
  • kernel/src/sched/mod.rs
  • test-hal/src/context_switch.rs

Comment thread test-hal/src/context_switch.rs
…markers on slot reuse

FakeContextSwitch::init_context (the kernel path) seeded initialized/
entry_addr/stack_top but left is_user / user_sp untouched, so a context slot
re-seeded user-task -> kernel-task would report stale is_user=true / a stale
user_sp. init_context now fully re-seeds a kernel context: clears is_user and
user_sp. Adds a regression test (init_context_clears_prior_user_markers_on_reuse).
Test-double-only correctness; no runtime/kernel effect.

Skipped (with reason) from the same review:
- "msr {fpcr,fpsr,tpidr_el0,tpidrro_el0}, xzr leaks SP" — INVALID. For MSR/MRS,
  Rt=31 is XZR (the X[]/ZR accessor), not SP (the 31=SP rule is for load/store
  base + SP-form arithmetic). Verified: kernel-build assembles these cleanly,
  and the shipped `msr ttbr0_el1, xzr` disassembles as `msr TTBR0_EL1, xzr`
  (d518201f) and is smoke-clean (TTBR0 -> 0, would fault if SP). No change.
- "factor the enter_el0 scrub into a .irp/macro" — SKIPPED. The explicit
  per-register form is deliberately maximally auditable for a security-critical
  scrub (every scrubbed register visible); the register set is fixed; no defect.
- "give init_user_context a default impl (unimplemented!/delegate)" — SKIPPED.
  A required trait method already fails loudest (compile error) if an
  implementor omits it; unimplemented! would defer that to a runtime panic
  (against the kernel's panic-free discipline + kernel-clippy denies panic),
  and delegating to init_context would silently produce a broken EL0 entry.

Gates: cargo fmt; host clippy -D warnings; 343 host tests (+1); Miri 0 UB
(kernel binary unchanged -> prior byte-stable QEMU smoke holds).

Refs: T-023
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cemililik
Copy link
Copy Markdown
Collaborator Author

Responding to @sourcery-ai's two high-level points — both considered and intentionally not actioned (rationale also recorded in commit 9b36967):

1. Factor the enter_el0 GPR/SIMD scrub into a .irp/macro. Skipped, deliberately. The explicit per-register form is kept for maximum auditability of a security-critical scrub: every scrubbed register — x0x30, v0v31, and FPCR/FPSR/TPIDR_EL0/TPIDRRO_EL0 — is visible at the call site so a reviewer can confirm coverage by inspection. A .irp loop trades that for compactness; the "accidentally miss a register" risk it guards against is effectively nil here because the AArch64 register set is fixed (it won't grow). No correctness gain, so per the minimum-change + security-first posture (CLAUDE.md #1) it stays explicit.

2. Give ContextSwitch::init_user_context a default impl (unimplemented!() or delegate to init_context). Skipped — a default would regress safety here. As a required trait method, an implementor that forgets it gets a compile error ("not all trait items implemented") — the loudest, earliest failure. A default unimplemented!() would defer that to a runtime panic (against the kernel's panic-free discipline; kernel-clippy denies clippy::panic/unwrap_used/expect_used), and a default delegating to init_context would silently produce a broken EL0 entry (an EL1 kernel-fn entry instead of the ERET-to-EL0 drop) — exactly the "silently omitting EL0 bring-up" the suggestion aims to prevent, only worse. So the no-default required method is the safer design.

(For completeness: @coderabbitai's inline comment on test-hal/src/context_switch.rs — reset the kernel-path markers on slot reuse — was actioned, in commit 9b36967, with a regression test. @gemini-code-assist's critical inline comment on the msr …, xzr scrub is answered in-thread: Rt = 31 is XZR for MSR/MRS, not SP — verified by disassembly + the shipped msr ttbr0_el1, xzr.)

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 31, 2026

Tip

For best results, initiate chat on the files or code changes.

@cemililik: Thanks for the thorough write-up — the rationale on both counts is well-reasoned and recorded.

On the explicit scrub: Agreed. For a one-time, security-auditable register wipe of a fixed ISA register set, the verbosity is the feature. A .irp loop would make a future auditor count iterations rather than scan a flat list.

On the required trait method (no default): The compile-error-vs-silent-wrong-EL1-entry argument is exactly right. A default that delegates to init_context would be the worst outcome — it compiles, runs, and does the wrong thing quietly. Required method with no default is the correct design here.

Glad the slot-reuse fix (is_user/user_sp clear + regression test) landed in 9b36967.


✏️ Learnings added
Learnt from: cemililik
URL: https://github.com/HodeTech/Tyrne/pull/37

Timestamp: 2026-05-31T10:23:19.738Z
Learning: In the Tyrne kernel (bsp-qemu-virt/src/cpu.rs), the `enter_el0` EL0 trampoline register scrub (x0–x30, v0–v31, FPCR/FPSR/TPIDR_EL0/TPIDRRO_EL0) is intentionally kept as explicit per-register inline assembly rather than being factored into a `.irp` loop or macro. This is a deliberate security-auditability decision: every scrubbed register is visible at the call site for reviewer confirmation. The AArch64 register set is fixed, so there is no compactness benefit that would justify reducing auditability.

Learnt from: cemililik
URL: https://github.com/HodeTech/Tyrne/pull/37

Timestamp: 2026-05-31T10:23:19.738Z
Learning: In the Tyrne kernel (hal/src/context_switch.rs), `ContextSwitch::init_user_context` intentionally has no default implementation. A required method forces a compile-time error if a BSP implementor forgets it. A default `unimplemented!()` would defer to a runtime panic (violating the kernel's panic-free discipline enforced by kernel-clippy); a default delegating to `init_context` would silently produce a broken EL0 entry (EL1 kernel-function path instead of ERET-to-EL0). The no-default required method is the correct, safer design.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@cemililik cemililik merged commit eb3125f into main May 31, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant