Skip to content

Development#3

Merged
cemililik merged 6 commits into
mainfrom
development
Apr 21, 2026
Merged

Development#3
cemililik merged 6 commits into
mainfrom
development

Conversation

@cemililik
Copy link
Copy Markdown
Collaborator

@cemililik cemililik commented Apr 21, 2026

Summary by Sourcery

Introduce the initial IPC subsystem with synchronous endpoint-based send/receive/notify operations, including capability transfer, and update documentation and roadmap to mark the kernel-object milestone complete and define the new IPC milestone.

New Features:

  • Add ipc module providing synchronous endpoint-based send/recv and notification primitives over kernel objects with fixed-size messages and single-capability transfer.
  • Extend capability rights with SEND, RECV, and NOTIFY bits to gate IPC operations.
  • Expose capability-table helpers to take a capability out of a table and query table fullness for use by IPC.

Enhancements:

  • Expose SlotId indices for internal IPC use when indexing parallel state arrays.

Documentation:

  • Add ADR-0017 describing the IPC primitive set and its rationale.
  • Document the new T-003 IPC primitives task and wire it into the Phase A task list and roadmap.
  • Update existing task and phase documentation to mark kernel-object storage (T-002, A3) as done and to describe the A4 IPC milestone and its relationship to previous work.
  • Extend the kernel crate top-level docs to reference the new IPC subsystem.

Tests:

  • Add host tests covering IPC send/recv ordering, capability transfer, queue-full behaviour, rights enforcement, and notification bit semantics.

Summary by CodeRabbit

  • New Features

    • Implemented IPC (Inter-Process Communication) primitives: send, recv, and notify operations for synchronous message passing between kernel objects with capability-gated access.
    • Expanded capability rights system with new IPC operation permissions.
  • Documentation

    • Advanced Phase A roadmap from kernel objects (completed) to IPC primitives (in progress).
    • Added architectural decision documentation for IPC primitive design and constraints.

cemililik and others added 5 commits April 21, 2026 08:57
T-002 PR merged to main; all acceptance criteria and DoD items met.
Milestone A3 (Kernel objects) closed — covered by T-002 in one task.

T-003 (IPC primitives, Milestone A4) opened at Draft status. Blocked on
ADR-0017 (IPC primitive set); propose ADR-0017 to advance to Ready.
ADR-0018 (Badge scheme) must also be Accepted or explicitly deferred
before A4 closes.

Refs: ADR-0016, ADR-0017
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Settles the A4 IPC interface: pure rendezvous (send + recv + notify),
fixed-size 4-word Message struct, up to one capability per message.
reply_recv fastpath and badge scheme explicitly deferred to ADR-0018.

Refs: ADR-0017
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ADR-0017 (IPC primitive set) accepted: send + recv + notify, fixed-size
4-word Message struct, up to one capability per message atomically.
reply_recv fastpath and badge scheme deferred to ADR-0018.

T-003 advances from Draft → Ready → In Progress; implementation begins
on development branch.

Refs: ADR-0017
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implements the three IPC operations settled in ADR-0017:

- ipc_send: rendezvous send on an Endpoint; delivers to a waiting
  receiver or enqueues the sender (SendPending state).
- ipc_recv: rendezvous receive; drains a waiting sender or registers
  the caller as a waiting receiver (RecvWaiting state).
- ipc_notify: non-blocking bit-OR into a Notification word.

Capability transfer is atomic: the sender's cap is extracted via
cap_take, held in the endpoint's waiter state, and installed into the
receiver's table via insert_root on delivery. Pre-flight checks prevent
partial transfer.

Supporting changes:
- cap::rights: add SEND, RECV, NOTIFY bits; update KNOWN_BITS.
- cap::table: add cap_take (move-out for IPC transfer) and is_full
  (pre-flight check for receiver table capacity).
- obj::arena: remove #[cfg(test)] from SlotId::index so the IPC layer
  can index into IpcQueues by raw slot index.
- lib.rs: declare pub mod ipc; add T-003 doc reference.

55 host tests pass (44 previous + 11 new IPC tests).
fmt, host-clippy, and kernel-clippy all clean.

Refs: ADR-0017

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Marks T-003 IPC primitives as In Review:
- All acceptance criteria and DoD checkboxes ticked.
- current.md and phase-a.md updated to reflect In Review status.
- ADR-0018 (badge scheme) noted as deferred per ADR-0017.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sourcery-ai
Copy link
Copy Markdown

sourcery-ai Bot commented Apr 21, 2026

Reviewer's Guide

Implements the initial IPC subsystem (A4/T-003) with synchronous send/recv/notify primitives over endpoints and notifications, adds capability rights to gate IPC operations, extends the capability table and object arena APIs to support atomic capability transfer and IPC bookkeeping, introduces the IpcQueues state machine, and updates ADRs and roadmap/docs to reflect the new milestone and task status.

Sequence diagram for ipc_send and ipc_recv with capability transfer

sequenceDiagram
    actor Sender
    participant SenderTable as CapabilityTable_sender
    participant Receiver as Receiver
    participant ReceiverTable as CapabilityTable_receiver
    participant EndpointArena
    participant IpcQueues

    Sender->>SenderTable: lookup(ep_cap)
    SenderTable-->>Sender: Capability (CapObject::Endpoint, rights)
    Sender->>SenderTable: lookup(transfer_cap?)
    SenderTable-->>Sender: Capability or error
    Sender->>EndpointArena: get(EndpointHandle.slot())
    EndpointArena-->>Sender: Endpoint or error
    Sender->>IpcQueues: state_of(EndpointHandle)
    IpcQueues-->>Sender: EndpointState

    alt EndpointState is RecvWaiting
        Sender->>SenderTable: cap_take(transfer_cap?)
        SenderTable-->>Sender: Capability or None
        Sender->>IpcQueues: set_state(RecvComplete{msg,cap})
        Sender-->>Sender: SendOutcome::Delivered
    else EndpointState is Idle
        Sender->>SenderTable: cap_take(transfer_cap?)
        SenderTable-->>Sender: Capability or None
        Sender->>IpcQueues: set_state(SendPending{msg,cap})
        Sender-->>Sender: SendOutcome::Enqueued
    else EndpointState is SendPending or RecvComplete
        Sender-->>Sender: IpcError::QueueFull
    end

    %% Receiver path
    Receiver->>ReceiverTable: lookup(ep_cap)
    ReceiverTable-->>Receiver: Capability (CapObject::Endpoint, rights)
    Receiver->>EndpointArena: get(EndpointHandle.slot())
    EndpointArena-->>Receiver: Endpoint or error

    Receiver->>IpcQueues: peek_state(EndpointHandle)
    IpcQueues-->>Receiver: EndpointState

    alt state has cap and ReceiverTable.is_full()
        Receiver-->>Receiver: IpcError::ReceiverTableFull
    else state is SendPending or RecvComplete
        Receiver->>IpcQueues: take_state() -> {msg,cap}
        Receiver->>ReceiverTable: insert_root(cap?)
        ReceiverTable-->>Receiver: CapHandle or None
        Receiver-->>Receiver: RecvOutcome::Received{msg,cap}
    else state is Idle
        Receiver->>IpcQueues: set_state(RecvWaiting)
        Receiver-->>Receiver: RecvOutcome::Pending
    else state is RecvWaiting
        Receiver-->>Receiver: IpcError::QueueFull
    end
Loading

Class diagram for new IPC subsystem and related capability types

classDiagram
    class Message {
        +u64 label
        +u64[3] params
        +Message()
    }

    class IpcError {
        <<enum>>
        +InvalidCapability
        +QueueFull
        +InvalidTransferCap
        +ReceiverTableFull
    }

    class SendOutcome {
        <<enum>>
        +Delivered
        +Enqueued
    }

    class RecvOutcome {
        <<enum>>
        +Received(msg: Message, cap: Option_CapHandle_)
        +Pending
    }

    class EndpointState {
        <<enum>>
        +Idle
        +SendPending(msg: Message, cap: Option_Capability_)
        +RecvWaiting
        +RecvComplete(msg: Message, cap: Option_Capability_)
    }

    class IpcQueues {
        -EndpointState[ENDPOINT_ARENA_CAPACITY] states
        +IpcQueues new()
        +EndpointState state_of(handle: EndpointHandle)
        +EndpointState peek_state(handle: EndpointHandle)
    }

    class CapabilityTable {
        +Capability lookup(handle: CapHandle) Result
        +CapHandle insert_root(cap: Capability) Result
        +Capability cap_take(handle: CapHandle) Result
        +bool is_full()
    }

    class Capability {
        +CapRights rights()
        +CapObject object()
    }

    class CapRights {
        <<bitflags>>
        +DUPLICATE
        +DERIVE
        +REVOKE
        +TRANSFER
        +SEND
        +RECV
        +NOTIFY
        +KNOWN_BITS
        +bool contains(other: CapRights)
    }

    class CapObject {
        <<enum>>
        +Endpoint(handle: EndpointHandle)
        +Notification(handle: NotificationHandle)
    }

    class EndpointArena {
        +Endpoint get(slot: SlotId) Option
        +Endpoint get_mut(slot: SlotId) Option
    }

    class NotificationArena {
        +Notification get(slot: SlotId) Option
        +Notification get_mut(slot: SlotId) Option
    }

    class EndpointHandle {
        +SlotId slot()
    }

    class NotificationHandle {
        +SlotId slot()
    }

    class SlotId {
        +u16 index()
    }

    class Endpoint {
        +Endpoint new(id: u64)
    }

    class Notification {
        +Notification new(bits: u64)
        +void set(bits: u64)
        +u64 word()
    }

    %% Top-level IPC functions
    class ipc_mod {
        +IpcError ipc_send(ep_arena: EndpointArena, queues: IpcQueues, ep_cap: CapHandle, caller_table: CapabilityTable, msg: Message, transfer: Option_CapHandle_)
        +IpcError ipc_recv(ep_arena: EndpointArena, queues: IpcQueues, ep_cap: CapHandle, caller_table: CapabilityTable)
        +IpcError ipc_notify(notif_arena: NotificationArena, notif_cap: CapHandle, caller_table: CapabilityTable, bits: u64)
    }

    %% Relationships
    ipc_mod --> Message
    ipc_mod --> IpcError
    ipc_mod --> SendOutcome
    ipc_mod --> RecvOutcome
    ipc_mod --> IpcQueues
    ipc_mod --> EndpointArena
    ipc_mod --> NotificationArena
    ipc_mod --> CapabilityTable

    IpcQueues --> EndpointState
    IpcQueues "1" o-- "*" EndpointState : states

    CapabilityTable --> Capability
    Capability --> CapRights
    Capability --> CapObject

    CapObject --> EndpointHandle
    CapObject --> NotificationHandle

    EndpointHandle --> SlotId
    NotificationHandle --> SlotId

    EndpointArena --> EndpointHandle
    EndpointArena --> Endpoint

    NotificationArena --> NotificationHandle
    NotificationArena --> Notification

    SlotId --> u16

    RecvOutcome --> Message
    RecvOutcome --> CapHandle

    EndpointState --> Message
    EndpointState --> Capability
Loading

State diagram for EndpointState IPC waiter lifecycle

stateDiagram-v2
    [*] --> Idle

    Idle --> SendPending: ipc_send(no_receiver)
    Idle --> RecvWaiting: ipc_recv(no_sender)

    RecvWaiting --> RecvComplete: ipc_send(receiver_waiting)

    SendPending --> Idle: ipc_recv(deliver_msg_and_cap)
    RecvComplete --> Idle: ipc_recv(collect_delivery)

    SendPending --> SendPending: ipc_send_when_SendPending / IpcError_QueueFull
    RecvComplete --> RecvComplete: ipc_send_when_RecvComplete / IpcError_QueueFull
    RecvWaiting --> RecvWaiting: ipc_recv_when_RecvWaiting / IpcError_QueueFull

    Idle --> Idle: invalid_capability / IpcError_InvalidCapability
    SendPending --> SendPending: receiver_table_full / IpcError_ReceiverTableFull
    RecvComplete --> RecvComplete: receiver_table_full / IpcError_ReceiverTableFull
Loading

File-Level Changes

Change Details Files
Add IPC subsystem with rendezvous send/recv/notify semantics, waiter queues, and atomic single-capability transfer between tasks
  • Introduce kernel::ipc module implementing Message, IpcError, SendOutcome, RecvOutcome, and public ipc_send/ipc_recv/ipc_notify functions per ADR-0017
  • Implement IpcQueues and EndpointState state machine to track per-endpoint sender/receiver waiters and in-flight messages without embedding capability state into Endpoint objects
  • Use CapabilityTable::cap_take and insert_root to move at most one Capability per message from sender to receiver tables atomically, with preflight checks for handle validity and receiver table capacity
  • Enforce CapRights-based access control (SEND/RECV/NOTIFY) and endpoint/notification kind validation in IPC helpers, returning structured IpcError variants for invalid caps, queue-full, invalid transfer caps, and receiver-table-full cases
  • Add an extensive suite of host tests covering sender-first/receiver-first rendezvous, capability transfer, queue-full behavior, rights enforcement, notify bit semantics, and blocked-sender delivery
kernel/src/ipc/mod.rs
Extend the capability and object storage layers to support IPC integration and capacity queries
  • Add CapabilityTable::cap_take to remove a capability by handle, unlink it from the tree, recycle the slot with generation bump, and return the owned Capability value while enforcing no-children and valid-handle checks
  • Add CapabilityTable::is_full to report when no free slots remain, allowing ipc_recv to preflight capability transfer into the receiver table
  • Relax SlotId::index visibility/usage so IPC can index parallel state arrays by slot index while keeping it crate-private and returning a raw u16 index
kernel/src/cap/table.rs
kernel/src/obj/arena.rs
Introduce new capability rights for IPC operations and wire them into the documented rights set
  • Define CapRights::TRANSFER as the ability to include a capability in an IPC transfer and add SEND, RECV, and NOTIFY rights bits for endpoint send/recv and notification notify operations
  • Expand CapRights::KNOWN_BITS to include the new IPC-related bits so from_raw correctly masks unknown rights while allowing SEND/RECV/NOTIFY
kernel/src/cap/rights.rs
Document the IPC design (ADR-0017), add the T-003 task description, and update roadmap/status docs for milestones and tasks
  • Add ADR-0017 describing the chosen IPC primitive set (send/recv/notify), fixed-size Message layout, single-capability transfer semantics, blocking model with depth-1 waiter queues, and deferral of reply_recv and badge scheme
  • Create T-003 IPC primitives task doc defining scope, acceptance criteria, host-test expectations, and Phase-A/A4 context and dependencies
  • Mark T-002 kernel object storage task and A3 milestone as Done, add T-003 as In Review in the Phase-A task list and roadmap overviews, and update current.md to make A4/T-003 the active milestone/task with notes on ADR-0017 and IPC tests
  • Update the T-002 task doc status, acceptance criteria checkboxes, definition-of-done items, and review history to reflect merged implementation, applied review findings, and A3 closure
  • Extend kernel crate top-level docs and ADR index to reference the new ipc module and ADR-0017
docs/decisions/0017-ipc-primitive-set.md
docs/analysis/tasks/phase-a/T-003-ipc-primitives.md
docs/analysis/tasks/phase-a/T-002-kernel-object-storage.md
docs/analysis/tasks/phase-a/README.md
docs/roadmap/current.md
docs/roadmap/phases/phase-a.md
docs/decisions/README.md
kernel/src/lib.rs

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 21, 2026

Warning

Rate limit exceeded

@cemililik has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 30 minutes and 47 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 30 minutes and 47 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cee25e9e-e8f8-4fe5-bae7-2e05b1b4c207

📥 Commits

Reviewing files that changed from the base of the PR and between 1bb879c and 028d81f.

📒 Files selected for processing (6)
  • docs/analysis/tasks/phase-a/T-003-ipc-primitives.md
  • docs/decisions/0017-ipc-primitive-set.md
  • docs/roadmap/current.md
  • kernel/src/cap/table.rs
  • kernel/src/ipc/mod.rs
  • kernel/src/obj/arena.rs
📝 Walkthrough

Walkthrough

The PR advances the project from Milestone A3 (Kernel objects, now marked Done) to Milestone A4 (IPC primitives, In Review). It adds three new capability rights for IPC operations, implements synchronous send/recv rendezvous and non-blocking notify primitives over Endpoint and Notification kernel objects, extends the capability system with capability transfer operations, and documents the complete IPC specification in ADR-0017.

Changes

Cohort / File(s) Summary
Documentation & Roadmap
docs/analysis/tasks/phase-a/README.md, docs/analysis/tasks/phase-a/T-002-kernel-object-storage.md, docs/analysis/tasks/phase-a/T-003-ipc-primitives.md, docs/decisions/0017-ipc-primitive-set.md, docs/decisions/README.md, docs/roadmap/current.md, docs/roadmap/phases/phase-a.md
Milestone A3 marked complete with T-002 transitioned to Done; new task T-003 (IPC primitives) added with In Review status; new ADR-0017 specifies minimal IPC API with fixed-size messages, one optional capability per message, bounded waiter queues, and non-blocking notify; roadmap updated to track A4 as active milestone.
Capability System Extensions
kernel/src/cap/rights.rs, kernel/src/cap/table.rs
Added three new IPC-related capability rights (SEND, RECV, NOTIFY at bits 4–6) with updated KNOWN_BITS mask; added cap_take() method to remove and transfer capability ownership to caller; added is_full() query method.
IPC Primitives Implementation
kernel/src/ipc/mod.rs
New IPC subsystem implementing synchronous ipc_send and ipc_recv rendezvous with state machine (Idle, SendPending, RecvWaiting, RecvComplete), non-blocking ipc_notify for bit manipulation, fixed-size Message payload (label + 3 params), and atomic capability transfer via cap_take/insert_root at delivery time; includes error types and comprehensive test coverage.
Module Export & Arena API
kernel/src/lib.rs, kernel/src/obj/arena.rs
Added public ipc module declaration and crate documentation; exposed SlotId::index() method (previously test-only) for IPC layer indexing into parallel state arrays, changing visibility from #[cfg(test)] to always available.

Sequence Diagram(s)

sequenceDiagram
    actor Caller1 as Sender Task
    participant CTable1 as Sender<br/>CapabilityTable
    participant IPC as IpcQueues<br/>EndpointState
    participant CTable2 as Receiver<br/>CapabilityTable
    actor Caller2 as Receiver Task

    Caller1->>CTable1: cap_take(transfer_cap)?
    alt transfer valid
        CTable1-->>Caller1: Capability
    else invalid/stale
        CTable1-->>Caller1: CapError
        Caller1->>IPC: ipc_send() fails
    end

    Caller1->>IPC: ipc_send(ep_cap, msg, transfer?)
    alt RecvWaiting state
        IPC->>CTable2: insert_root(capability)
        CTable2-->>IPC: CapHandle
        IPC-->>Caller1: Delivered + CapHandle
        IPC-->>Caller2: msg + CapHandle ready
    else Idle state
        IPC-->>Caller1: Enqueued
        Caller2->>IPC: ipc_recv(ep_cap)
        IPC->>CTable2: insert_root(capability)
        CTable2-->>IPC: CapHandle
        IPC-->>Caller2: Received + CapHandle
    else QueueFull
        IPC-->>Caller1: QueueFull error
    end
Loading
sequenceDiagram
    actor Caller as Task
    participant CTable as CapabilityTable
    participant NotifObj as Notification<br/>Object
    participant IPC as ipc_notify

    Caller->>IPC: ipc_notify(notif_cap, bits=0x05)
    IPC->>NotifObj: validate notif_cap & NOTIFY right
    alt valid capability
        NotifObj->>NotifObj: word \|= bits (with saturation)
        NotifObj-->>IPC: success
        IPC-->>Caller: Ok(())
    else invalid/stale/wrong-kind
        IPC-->>Caller: InvalidCapability error
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Development #1: Extends the capability subsystem foundation introduced in the same PR—adds IPC-specific rights (SEND, RECV, NOTIFY) in kernel/src/cap/rights.rs and new CapabilityTable methods (cap_take, is_full) in kernel/src/cap/table.rs for capability transfer during IPC operations.
  • Development #2: Modifies kernel object and capability subsystems (including kernel/src/obj/arena.rs and capability-table changes), providing the object-arena and capability-table foundations upon which this PR's IPC implementation depends.

Poem

🐰 From A3's depths, the objects rest in place,
Now A4 bounces in at quickened pace—
Send, recv, and notify dance in the queue,
Capabilities leap 'cross tables, both old and new!
The IPC hops forward, state machines align,
Milestone A4 sparkles—we're right on time!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Development' is vague and generic, providing no meaningful information about the changeset's primary objectives or content. Use a descriptive title that summarizes the main change, such as 'Add IPC primitives implementation (T-003) and capability rights' or 'Implement Phase A4 IPC subsystem and ADR-0017'.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch development

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="kernel/src/ipc/mod.rs" line_range="204-206" />
<code_context>
+    let ep_handle = validate_ep_cap(caller_table, ep_cap, CapRights::SEND)?;
+
+    // Pre-flight: validate the transfer cap before touching endpoint state.
+    if let Some(xfer) = transfer {
+        caller_table
+            .lookup(xfer)
+            .map_err(|_| IpcError::InvalidTransferCap)?;
+    }
</code_context>
<issue_to_address>
**issue (bug_risk):** TRANSFER right is defined but not enforced on the transferred capability

`CapRights::TRANSFER` is documented as controlling whether a capability may be sent over IPC, but here we only check that the handle is live via `lookup` and never enforce that the capability has the `TRANSFER` right. As written, any capability the caller holds can be transferred. If the intent is to gate IPC transfer on this right, add a check like `cap.rights().contains(CapRights::TRANSFER)` in the transfer pre-flight and reject otherwise.
</issue_to_address>

### Comment 2
<location path="docs/decisions/0017-ipc-primitive-set.md" line_range="196" />
<code_context>
+
+### Option H — Multiple capabilities per message
+
+- Pro: more expressive; covers protocols that handoff a set of capabilities atomically.
+- Con: no A4/A6 scenario requires it; adding a loop and array before there is a test case is premature complexity.
+
</code_context>
<issue_to_address>
**nitpick (typo):** Use the correct verbal form "hand off" instead of "handoff" here.

Here it’s used as a verb, so it should be written as two words: change "handoff" to "hand off".

```suggestion
- Pro: more expressive; covers protocols that hand off a set of capabilities atomically.
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread kernel/src/ipc/mod.rs
Comment thread docs/decisions/0017-ipc-primitive-set.md Outdated
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the IPC primitives for Milestone A4, including ipc_send, ipc_recv, and ipc_notify, as defined in the new ADR-0017. The changes introduce the ipc module, add specific capability rights for IPC operations, and implement atomic capability transfer mechanisms within the CapabilityTable. Review feedback highlights a critical logic error in IpcQueues regarding the lack of generation tracking, which could lead to stale state reuse across different endpoint lifecycles. Additionally, the implementation of ipc_send is flagged for potential state corruption during fallible transfers, and there is a recommendation to refactor duplicated validation logic in ipc_notify into a helper function.

Comment thread kernel/src/ipc/mod.rs
Comment on lines +148 to +150
pub struct IpcQueues {
states: [EndpointState; ENDPOINT_ARENA_CAPACITY],
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

The IpcQueues structure currently only stores the EndpointState for each slot, which is a critical logic error. It does not account for the lifecycle of the endpoints. Since IpcQueues is indexed by the raw slot index, it can retain stale state (like RecvWaiting) from a previously destroyed endpoint that occupied the same slot. If a new endpoint is created in that slot, it will inherit this state, potentially allowing a task to receive a message intended for a different (now destroyed) endpoint. To fix this, IpcQueues should store the generation for each slot and verify it in state_of and peek_state, resetting to Idle if they differ. This will also require making SlotId::generation accessible to the crate by removing its #[cfg(test)] restriction in kernel/src/obj/arena.rs.

Comment thread kernel/src/ipc/mod.rs
Comment on lines +215 to +236
let state = queues.state_of(ep_handle);
let old = core::mem::replace(state, EndpointState::Idle);
match old {
EndpointState::RecvWaiting => {
// A receiver is waiting. Extract the cap and transition to
// RecvComplete so the receiver can pick up the result.
let owned = take_cap_if_some(caller_table, transfer)?;
*state = EndpointState::RecvComplete { msg, cap: owned };
Ok(SendOutcome::Delivered)
}
EndpointState::Idle => {
// No receiver. Store the message in the endpoint queue.
let owned = take_cap_if_some(caller_table, transfer)?;
*state = EndpointState::SendPending { msg, cap: owned };
Ok(SendOutcome::Enqueued)
}
occupied @ (EndpointState::SendPending { .. } | EndpointState::RecvComplete { .. }) => {
// Restore the original state unchanged.
*state = occupied;
Err(IpcError::QueueFull)
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This implementation of ipc_send is vulnerable to state corruption. By calling core::mem::replace(state, EndpointState::Idle) at line 216, the original state is moved out. If take_cap_if_some subsequently fails (for example, if the transfer capability has children, which is not checked in the pre-flight), the function returns an error but the endpoint state remains Idle. If the original state was RecvWaiting, the registered receiver is lost, leading to kernel state inconsistency. A safer approach is to check the state using peek_state first, perform the fallible capability transfer, and only then commit the state change.

    if matches!(queues.peek_state(ep_handle), EndpointState::SendPending { .. } | EndpointState::RecvComplete { .. }) {
        return Err(IpcError::QueueFull);
    }

    let owned = take_cap_if_some(caller_table, transfer)?;

    let state = queues.state_of(ep_handle);
    let old = core::mem::replace(state, EndpointState::Idle);
    match old {
        EndpointState::RecvWaiting => {
            *state = EndpointState::RecvComplete { msg, cap: owned };
            Ok(SendOutcome::Delivered)
        }
        EndpointState::Idle => {
            *state = EndpointState::SendPending { msg, cap: owned };
            Ok(SendOutcome::Enqueued)
        }
        _ => unreachable!("Occupied states handled by pre-flight check"),
    }

Comment thread kernel/src/ipc/mod.rs Outdated
Comment on lines +315 to +326
let notif_handle = {
let cap = caller_table
.lookup(notif_cap)
.map_err(|_| IpcError::InvalidCapability)?;
if !cap.rights().contains(CapRights::NOTIFY) {
return Err(IpcError::InvalidCapability);
}
match cap.object() {
CapObject::Notification(h) => h,
_ => return Err(IpcError::InvalidCapability),
}
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The capability validation logic in ipc_notify is duplicated from the pattern used in ipc_send and ipc_recv. To improve maintainability and reduce the risk of inconsistent validation, this should be extracted into a helper function similar to validate_ep_cap.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
kernel/src/cap/table.rs (2)

444-466: Consider reusing free_slot to avoid duplicating free-list bookkeeping.

After entry.take(), the remaining three lines replicate exactly what free_slot at Line 542 does (entry is already None, so free_slot's slot.entry = None is idempotent). Delegating keeps the free-list invariants in one place — if free_slot ever changes (e.g., to zero out entry more carefully, or to assert on generation overflow), cap_take will stay consistent automatically.

♻️ Proposed refactor
         self.unlink_from_siblings(index)?;
-        // Take the SlotEntry out of the slot before bumping the generation.
         let entry = self.slots[index as usize]
             .entry
             .take()
             .ok_or(CapError::InvalidHandle)?;
-        // Free-slot bookkeeping (mirrors free_slot, but entry is already None).
-        let old_head = self.free_head;
-        self.free_head = Some(index);
-        self.slots[index as usize].generation =
-            self.slots[index as usize].generation.wrapping_add(1);
-        self.slots[index as usize].next_free = old_head;
+        self.free_slot(index);
         Ok(entry.capability)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@kernel/src/cap/table.rs` around lines 444 - 466, The cap_take implementation
duplicates the free-list bookkeeping that free_slot already encapsulates; after
taking the entry (the entry is already None), remove the manual updates to
free_head/generation/next_free and instead call self.free_slot(index) (or the
appropriate free_slot method) to perform the slot recycling; keep the prior
unlink_from_siblings(index)? call and return entry.capability as before, and
ensure free_slot tolerates being called when slot.entry is already None (it
should be idempotent).

444-466: Missing unit tests for cap_take and is_full.

The new public API surface has no direct test coverage in this file's tests module. Given cap_take's invariants overlap with but diverge from cap_drop (generation bump + returns Capability, same HasChildren/InvalidHandle refusals, interaction with sibling unlinking, interaction with is_full), please add tests covering at minimum:

  • cap_take on a leaf returns the expected Capability and invalidates the old handle.
  • cap_take on an interior node returns HasChildren without freeing.
  • cap_take on a stale handle returns InvalidHandle.
  • cap_take on a middle sibling preserves outer siblings (mirrors drop_middle_sibling_preserves_list_integrity).
  • After cap_take the slot is reusable with a bumped generation.
  • is_full transitions: empty → not full; saturated → full; after cap_take/cap_drop → not full.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@kernel/src/cap/table.rs` around lines 444 - 466, Add unit tests to the
existing tests module exercising cap_take and is_full: write tests that (1) call
cap_take on a leaf slot and assert it returns the expected Capability and that
resolve_handle/using the old CapHandle returns CapError::InvalidHandle; (2) call
cap_take on a node with children and assert it returns CapError::HasChildren and
does not free or change sibling links; (3) call cap_take with a stale handle
(old generation) and assert CapError::InvalidHandle; (4) mirror
drop_middle_sibling_preserves_list_integrity by taking a middle sibling with
cap_take and assert outer siblings remain linked; (5) after cap_take verify the
slot is reusable and its generation was bumped (allocate a new cap into the same
index and ensure the new handle differs); and (6) add an is_full transition
test: start with empty table → assert !is_full, fill to capacity → assert
is_full, then perform cap_take and cap_drop and assert is_full becomes false.
Use the existing CapTable helpers and existing test patterns (e.g.
drop_middle_sibling_preserves_list_integrity) and assert specific CapError
variants and generation/handle changes for identification.
kernel/src/ipc/mod.rs (2)

526-543: Minor: unused intermediate cap in send_transfers_cap_atomically.

The block at lines 534-539 installs a cap whose handle is bound to ch and immediately discarded (let (_, xfer_ep_handle) = { …; (ch, h) };). Only xfer_ep_handle is subsequently used — the first sender_table.insert_root(c).unwrap() just leaves an orphan cap in the sender table that has nothing to do with what the test asserts. Tightens the test to only create what it uses:

♻️ Suggested simplification
-        // Give sender a second endpoint cap to transfer.
-        let (_, xfer_ep_handle) = {
-            let h = create_endpoint(&mut ep_arena, Endpoint::new(1)).unwrap();
-            let c = Capability::new(all_ep_rights(), CapObject::Endpoint(h));
-            let ch = sender_table.insert_root(c).unwrap();
-            (ch, h)
-        };
+        // Give sender a second endpoint to reference via a transfer cap.
+        let xfer_ep_handle = create_endpoint(&mut ep_arena, Endpoint::new(1)).unwrap();
         let xfer_cap_h = {
             let c = Capability::new(all_task_rights(), CapObject::Endpoint(xfer_ep_handle));
             sender_table.insert_root(c).unwrap()
         };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@kernel/src/ipc/mod.rs` around lines 526 - 543, The test
send_transfers_cap_atomically creates an unused intermediate capability when
constructing (ch, h) and discarding ch; remove that orphan insertion and only
create the endpoint handle and the single capability you actually use
(xfer_ep_handle -> xfer_cap_h). Concretely, stop calling
sender_table.insert_root(...) for the discarded ch returned from the block that
calls create_endpoint, instead just obtain the endpoint handle from
create_endpoint (create_endpoint -> xfer_ep_handle) and then insert only the
Capability::new(all_task_rights(), CapObject::Endpoint(xfer_ep_handle)) into
sender_table to produce xfer_cap_h.

269-297: Pre-flight check in ipc_recv relies on the sender's cap having been installable — worth a brief rationale comment.

The pending_has_cap && caller_table.is_full() guard (lines 271-278) is what keeps the SendPending { msg, cap }install_cap_if_some path from losing the message+cap when the receiver's table is full: without the guard, line 281's mem::replace(state, Idle) would commit before insert_root is attempted. The invariant that "install_cap_if_some cannot fail here" is subtle and load-bearing — a short comment would help future readers (and anyone touching install_cap_if_some error variants) avoid regressing it.

📝 Suggested comment
     let state = queues.state_of(ep_handle);
     let old = core::mem::replace(state, EndpointState::Idle);
     match old {
         EndpointState::SendPending { msg, cap } | EndpointState::RecvComplete { msg, cap } => {
-            // Deliver the message. Install cap (if any) into the receiver's table.
+            // Deliver the message. The pre-flight `is_full` check above
+            // guarantees `insert_root` has a free slot for `cap`, so
+            // `install_cap_if_some` cannot fail here — the state commit to
+            // `Idle` is therefore safe.
             let xfer = install_cap_if_some(caller_table, cap)?;
             Ok(RecvOutcome::Received { msg, cap: xfer })
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@kernel/src/ipc/mod.rs` around lines 269 - 297, Add a short rationale comment
in ipc_recv just above the pre-flight guard that checks pending_has_cap &&
caller_table.is_full(), explaining that this pre-flight ensures
install_cap_if_some(caller_table, cap) cannot fail when we later replace the
endpoint state with EndpointState::Idle (via core::mem::replace(state,
EndpointState::Idle)), so the SendPending/RecvComplete -> install_cap_if_some
path won't lose the message+cap; also note that if install_cap_if_some's error
variants or caller_table capacity semantics change, this invariant must be
revisited.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/analysis/tasks/phase-a/T-003-ipc-primitives.md`:
- Around line 83-84: Remove the stale "(to be written before code lands)" note
from the ADR-0017 link and mark ADR-0017 as Accepted in the document
(referencing ADR-0017 / docs/decisions/0017-ipc-primitive-set.md), and for
ADR-0018 either remove the dead link or replace it with a deferral note that
points readers to ADR-0017's "Open questions" section (reference ADR-0018 /
docs/decisions/0018-badge-scheme.md) until ADR-0018 is authored; ensure the
"Acceptance criteria" section remains consistent with ADR-0017's Accepted
status.

In `@docs/roadmap/current.md`:
- Around line 9-14: The "Active task: T-003 — IPC primitives" is marked In
Review but the "Next review trigger" still says it triggers when T-003 reaches
In Review; reconcile by either (A) change the active task status in the "Active
task: T-003 — IPC primitives" bullet from "In Review" back to the correct
pre-review state (e.g., "In Progress") if it hasn't actually entered review, or
(B) update the "Next review trigger" bullet to reference the appropriate next
event now that T-003 is already In Review (e.g., "T-003 merge/PR merge" or the
A6 closure per phase-a.md). Ensure you edit the two bullets referencing "T-003 —
IPC primitives" and "Next review trigger" so they are consistent and point to
the actual next pending review (use explicit wording like "code + security
review on merge" or "A6 business review") and keep date/status text consistent.

In `@kernel/src/ipc/mod.rs`:
- Around line 203-237: ipc_send currently replaces the endpoint state with
EndpointState::Idle before attempting to take the transfer cap, so if
take_cap_if_some (which calls cap_take and can fail with CapError::HasChildren)
fails the prior RecvWaiting/SendPending state is lost; fix by performing the
non-destructive cap check or the actual cap_take before mutating the endpoint
state: either (a) extend the pre-flight to reject caps with children by calling
the same validation used by cap_take (or exposing a non-destructive has-children
check) on the transfer cap referenced by caller_table.lookup(xfer), or (b) move
the call to take_cap_if_some/cap_take so it happens before
core::mem::replace(state, EndpointState::Idle) and ensure on QueueFull branch
you return or restore the cap to the caller (or surface a new error) so no
receiver state (EndpointState::RecvWaiting) is overwritten; update ipc_send and
the helper take_cap_if_some accordingly and add the suggested regression test to
ensure RecvWaiting is preserved on InvalidTransferCap.

---

Nitpick comments:
In `@kernel/src/cap/table.rs`:
- Around line 444-466: The cap_take implementation duplicates the free-list
bookkeeping that free_slot already encapsulates; after taking the entry (the
entry is already None), remove the manual updates to
free_head/generation/next_free and instead call self.free_slot(index) (or the
appropriate free_slot method) to perform the slot recycling; keep the prior
unlink_from_siblings(index)? call and return entry.capability as before, and
ensure free_slot tolerates being called when slot.entry is already None (it
should be idempotent).
- Around line 444-466: Add unit tests to the existing tests module exercising
cap_take and is_full: write tests that (1) call cap_take on a leaf slot and
assert it returns the expected Capability and that resolve_handle/using the old
CapHandle returns CapError::InvalidHandle; (2) call cap_take on a node with
children and assert it returns CapError::HasChildren and does not free or change
sibling links; (3) call cap_take with a stale handle (old generation) and assert
CapError::InvalidHandle; (4) mirror drop_middle_sibling_preserves_list_integrity
by taking a middle sibling with cap_take and assert outer siblings remain
linked; (5) after cap_take verify the slot is reusable and its generation was
bumped (allocate a new cap into the same index and ensure the new handle
differs); and (6) add an is_full transition test: start with empty table →
assert !is_full, fill to capacity → assert is_full, then perform cap_take and
cap_drop and assert is_full becomes false. Use the existing CapTable helpers and
existing test patterns (e.g. drop_middle_sibling_preserves_list_integrity) and
assert specific CapError variants and generation/handle changes for
identification.

In `@kernel/src/ipc/mod.rs`:
- Around line 526-543: The test send_transfers_cap_atomically creates an unused
intermediate capability when constructing (ch, h) and discarding ch; remove that
orphan insertion and only create the endpoint handle and the single capability
you actually use (xfer_ep_handle -> xfer_cap_h). Concretely, stop calling
sender_table.insert_root(...) for the discarded ch returned from the block that
calls create_endpoint, instead just obtain the endpoint handle from
create_endpoint (create_endpoint -> xfer_ep_handle) and then insert only the
Capability::new(all_task_rights(), CapObject::Endpoint(xfer_ep_handle)) into
sender_table to produce xfer_cap_h.
- Around line 269-297: Add a short rationale comment in ipc_recv just above the
pre-flight guard that checks pending_has_cap && caller_table.is_full(),
explaining that this pre-flight ensures install_cap_if_some(caller_table, cap)
cannot fail when we later replace the endpoint state with EndpointState::Idle
(via core::mem::replace(state, EndpointState::Idle)), so the
SendPending/RecvComplete -> install_cap_if_some path won't lose the message+cap;
also note that if install_cap_if_some's error variants or caller_table capacity
semantics change, this invariant must be revisited.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bea7121b-136f-4bdb-8ee8-f715238302dc

📥 Commits

Reviewing files that changed from the base of the PR and between 3068c71 and 1bb879c.

📒 Files selected for processing (12)
  • docs/analysis/tasks/phase-a/README.md
  • docs/analysis/tasks/phase-a/T-002-kernel-object-storage.md
  • docs/analysis/tasks/phase-a/T-003-ipc-primitives.md
  • docs/decisions/0017-ipc-primitive-set.md
  • docs/decisions/README.md
  • docs/roadmap/current.md
  • docs/roadmap/phases/phase-a.md
  • kernel/src/cap/rights.rs
  • kernel/src/cap/table.rs
  • kernel/src/ipc/mod.rs
  • kernel/src/lib.rs
  • kernel/src/obj/arena.rs

Comment thread docs/analysis/tasks/phase-a/T-003-ipc-primitives.md Outdated
Comment thread docs/roadmap/current.md Outdated
Comment on lines +9 to +14
- **Active task:** [T-003IPC primitives](../analysis/tasks/phase-a/T-003-ipc-primitives.md) (status: **In Review**).
- **Working branch:** `development`.
- **Last completed milestone:** A2Capability table foundation, on 2026-04-21 (PR #1 merged to `main`).
- **Last completed task:** [T-001Capability table foundation](../analysis/tasks/phase-a/T-001-capability-table-foundation.md) — `Done` 2026-04-21.
- **Last completed milestone:** A3Kernel objects, on 2026-04-21 (PR merged to `main`).
- **Last completed task:** [T-002Kernel object storage foundation](../analysis/tasks/phase-a/T-002-kernel-object-storage.md) — `Done` 2026-04-21.
- **Last review:** [A2 completion business review](../analysis/reviews/business-reviews/2026-04-21-A2-completion.md) — 2026-04-21.
- **Next review trigger:** code + security review of T-002's implementation when it reaches `In Review`; business review waits for A6 per [phase-a.md closure](phases/phase-a.md).
- **Next review trigger:** code + security review of T-003's implementation when it reaches `In Review`; business review waits for A6 per [phase-a.md closure](phases/phase-a.md).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Minor inconsistency: "next review trigger" condition already satisfied.

Line 9 states T-003's status is already In Review, but Line 14 says the next review trigger is T-003 reaching In Review. Either the trigger has already fired (and this bullet should be updated to point at the next pending review, e.g., A6 closure or T-003 merge) or T-003 hasn't actually entered review yet and Line 9's status is premature. Please reconcile.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/roadmap/current.md` around lines 9 - 14, The "Active task: T-003 — IPC
primitives" is marked In Review but the "Next review trigger" still says it
triggers when T-003 reaches In Review; reconcile by either (A) change the active
task status in the "Active task: T-003 — IPC primitives" bullet from "In Review"
back to the correct pre-review state (e.g., "In Progress") if it hasn't actually
entered review, or (B) update the "Next review trigger" bullet to reference the
appropriate next event now that T-003 is already In Review (e.g., "T-003
merge/PR merge" or the A6 closure per phase-a.md). Ensure you edit the two
bullets referencing "T-003 — IPC primitives" and "Next review trigger" so they
are consistent and point to the actual next pending review (use explicit wording
like "code + security review on merge" or "A6 business review") and keep
date/status text consistent.

Comment thread kernel/src/ipc/mod.rs
Bugs:
- ipc_send: move take_cap_if_some before core::mem::replace so that a
  failing cap_take (e.g. HasChildren) leaves endpoint state unchanged;
  RecvWaiting is now preserved on InvalidTransferCap. A queue-full pre-
  check via peek_state avoids any cap manipulation on that path.
- ipc_send: enforce CapRights::TRANSFER on the capability being sent;
  previously any live capability could be transferred regardless of rights.
- IpcQueues: track arena generation per slot; state_of/peek_state now
  reset the slot to Idle when the handle's generation differs, preventing
  a newly allocated endpoint from inheriting stale waiter state left by
  a destroyed predecessor in the same slot.
- SlotId::generation: remove #[cfg(test)] so the IPC generation-tracking
  logic can call it in production builds.

Nitpicks:
- cap_take: delegate free-list bookkeeping to free_slot instead of
  duplicating it inline.
- cap/table: add six cap_take and is_full unit tests.
- ipc_notify: extract validate_notif_cap helper (mirrors validate_ep_cap).
- ipc_recv: add invariant comment on the receiver-table-full pre-flight.
- tests: remove orphan inserted-but-unused caps in send_transfers_cap_*
  tests; add regression tests for the three bugs fixed above.

Docs:
- T-003-ipc-primitives.md: remove stale "(to be written before code
  lands)" note on ADR-0017; update ADR-0018 link to deferral note.
- current.md: reconcile "Next review trigger" now that T-003 is In Review.
- 0017-ipc-primitive-set.md: fix verb form "handoff" → "hand off".

64 host tests pass; fmt, host-clippy, kernel-clippy all clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@cemililik
Copy link
Copy Markdown
Collaborator Author

@coderabbirai review
@sourcery-ai review

@cemililik cemililik merged commit 7378cb6 into main Apr 21, 2026
2 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request Apr 21, 2026
cemililik added a commit that referenced this pull request Apr 21, 2026
Integrate every follow-up from the 2026-04-21 Phase-A code and security
reviews into phase-b.md. No review item remains in an ad-hoc ledger;
each is mapped to a specific B0..B6 milestone, and every decision that
must close during Phase B is 🚩-flagged in its milestone and collected
under the plan's "Open questions" section.

New milestone **B0 — Phase A exit hygiene** prepends the original
EL-drop pipeline and absorbs:

- Security-review blockers #1 / #2 / #3 — UNSAFE-2026-0012 raw-pointer
  refactor, cross-table revocation policy, scheduler deadlock panic →
  idle task + typed error — as ADR-0021 / ADR-0022 / ADR-0023.
- Code-review follow-ups: architecture docs for kernel-objects / IPC /
  scheduler; missing tests (ReceiverTableFull, slot-reuse with pending
  transfer cap, returns-Err replacements for should-panic tests);
  const {assert!(N>0)} type-level invariants on SchedQueue and
  CapabilityTable; debug_assert → typed-error hardening.
- Non-blocking tracking items, placed in their natural milestone:
  generation wrap (B2 flag), fault containment scope (B5 flag →
  Phase E for full supervisor), Capability::Debug redaction (B5 before
  console_write syscall), write_bytes TX timeout (B6 flag), QEMU-smoke
  CI gate (B6 flag), cargo-vet init (B6 flag), TaskArena local → global
  StaticCell migration (bundled with T-006), boot.s early DAIFSet (B1
  BSP checklist), TPIDR_EL0 save-set watch (TLS trigger, no decision
  yet).

ADR numbers shifted: the pre-review 0021..0026 (EL-drop, MMU, AS,
loader, syscall ABI, initial syscalls) become 0024..0029 so the
B0 blockers can claim 0021..0023. Final numbers are assigned when the
ADR is actually written per ADR-0013.

current.md updated: active phase B, active milestone B0, first task to
open is T-006 (raw-pointer scheduler API refactor).

Also: docs/analysis/temp/ added to .gitignore as a local-only transient
workbook directory (follow-up ledgers, drafts, scratch notes that do not
belong in git history).

Refs: ADR-0013
Code-Review: docs/analysis/reviews/code-reviews/2026-04-21-umbrix-to-phase-a.md
Security-Review: docs/analysis/reviews/security-reviews/2026-04-21-umbrix-to-phase-a.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cemililik added a commit that referenced this pull request Apr 27, 2026
Verified each open item from the second review pass. All applied; the
deferred Q1 settled with a strict-§3 discipline call.

Q1 (HCR_EL2 Amendment) — strict discipline picked over carve-out
The first review-round (commit 9a8e312) added the HCR_EL2 literal-
write rationale to UNSAFE-2026-0017's §Invariants relied on as an
in-place body edit. The second review-round flagged this as
asymmetric against V13/V14 (which went via Amendment) and offered
two paths: (1) codify a same-PR-mutable carve-out in
unsafe-policy.md §3, or (2) convert V7 to an Amendment.

Picked Path 2 (strict). Reasoning, in detail:

- The asymmetry within the PR is the actual problem; Path 1 does
  not fix it because V13/V14 went via Amendment despite being same-
  PR. The carve-out would need a sub-rule "additions in-place,
  corrections via Amendment" — a gray-zone rule that just opens
  more gray zones.
- Precedent: UNSAFE-2026-0006's 2026-04-23 Amendment for the post-
  T-009 QemuVirtCpu struct shape is exactly this pattern — adding
  to an existing entry's content via Amendment. Strict §3 reading
  matches the precedent.
- The §3 text "must not be rewritten once committed" is unambiguous
  when read literally. The strict reading is what the rule says.
- "Merge boundary" as the locking event introduces a second
  ambiguity (PR merge ≠ commit, and not observable from inside the
  audit log). Strict introducing-commit boundary is auditable.

Mechanics: removed the HCR_EL2 paragraph from §Invariants relied on
(reverting to f289d4d's introducing-commit shape); added a new
Amendment block at the end of UNSAFE-2026-0017 with the same
content + a "Discipline note for future readers" paragraph
documenting the introducing-commit boundary so the next PR does
not stumble into the same trap. The existing
2026-04-27-second-review-round Amendment (GAS halt-loop syntax
correction) stays as-is, now placed after the HCR_EL2 Amendment in
chronological order (first review-round commit 9a8e312 →
HCR_EL2; second review-round commit 39dd978 → GAS-syntax).

Orta #1 — T-008 + T-013 task-file checkbox flips
Same discipline T-011 received in 9a8e312, applied to T-008 and
T-013. Every delivered AC + DoD item flipped to [x] with
parenthetical delivery notes. T-013's two QEMU smoke items (AC
"Tests — QEMU smoke at default config" / "QEMU smoke at -machine
virtualization=on" + DoD's QEMU smoke line) are *intentionally*
left at [ ] — settled item 8 from the second review pass: those
gates are deferred to the maintainer / CI runner and the PR test
plan checklists them explicitly. T-008's "Cross-references go both
ways" DoD item is now [x] thanks to Düşük #3 below.

Düşük #1 — boot.s:21 leading-comment typo
Same wfe; b .- malformed token that V13 corrected via Amendment in
UNSAFE-2026-0017 also appeared in bsp-qemu-virt/src/boot.s line 21
(the leading comment block, not the actual asm — the asm uses the
named-label form correctly). In-source comment, not audit-log body;
in-place fine. Replaced with the named-label loop description so
the comment matches the asm.

Düşük #2 — coverage rerun report follow-up note
The 2026-04-27 coverage report's tables describe T-011's
761af95-tip state. The second review-round's drop_first_child_…
test fix (in 9a8e312) bumped cap/table.rs regions from 97.46 % →
97.60 % (+0.14 pp), with a corresponding +0.04 pp on the workspace
total (96.33 % → 96.37 %). The original tables are intentionally
not rewritten — the report describes T-011's tip; the post-fix
delta lives in a follow-up note appended to the report. Both AC
gates (sched ≥ 90 %, workspace ≥ 96 %) remain comfortably met.

Düşük #3 — bidirectional ADR ↔ architecture-doc cross-references
T-008's DoD line 88 ("Cross-references go both ways") was deferred
("ADRs cited from architecture docs are the same ADRs whose
§References sections cite the new architecture docs — or will
cite, in their next rider"). Closed in this commit with five
short §Revision notes riders pointing back at the architecture
docs:

- ADR-0010 (Timer trait) → architecture/hal.md Timer subsection.
- ADR-0017 (IPC primitive set) → architecture/ipc.md.
- ADR-0019 (Scheduler shape) → architecture/scheduler.md.
- ADR-0020 (ContextSwitch + Cpu v2) → architecture/scheduler.md.
- ADR-0022 (Idle task + typed Deadlock) → architecture/scheduler.md.

ADR-0021 already had a direct architecture/scheduler.md citation
(observed by the second review pass), so it does not need a rider.

Each rider is one short paragraph dated 2026-04-27, framed
"pointer to architecture doc" — same shape as ADR-0013's
2026-04-27 pointer-to-ADR-0025 rider. ADR-0019's rider also notes
that its previously-open "Idle task" question was settled by
ADR-0022.

Verification: cargo fmt clean; host-clippy / kernel-clippy /
kernel-build clean; host-test 143 / 143 green.

Refs: ADR-0010, ADR-0017, ADR-0019, ADR-0020, ADR-0022, ADR-0024
Audit: UNSAFE-2026-0017

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cemililik added a commit that referenced this pull request Apr 28, 2026
Two Düşük findings from the T-012-arc approval review:

1. `bsp-qemu-virt/src/main.rs::idle_entry` — function-level doc-comment
   was stale: still described the body as `spin_loop` + `yield_now`
   ("not `wfi` + `yield_now` — until a timer IRQ source lands (T-009)")
   and projected `wait_for_interrupt` as future work. The body itself
   landed in commit b4ed68c using `cpu.wait_for_interrupt()`; only the
   doc-comment lagged. The PR-#9 post-fix-sweep DoD rule
   (code-review.md item 7 — `git grep -F` after a behaviour change)
   would have caught this. Rewritten to describe the now-current
   behaviour: WFI + yield, ADR-0022 first-rider Sub-rider closed,
   v1's IPC demo never reaching idle in practice (so WFI is
   structurally unreachable in the demo path).

2. `docs/decisions/0021-raw-pointer-scheduler-ipc-bridge.md:123` —
   Amendment header read "2026-04-27" but the cited commit `28c5ce9`
   landed 2026-04-28; the matching UNSAFE-2026-0014 Amendment in
   `docs/audits/unsafe-log.md:225` already had the correct
   2026-04-28 date. One-character fix bringing the two Amendments
   into agreement on when the discipline-extension landed.

Bilgi #3 (TrapFrame `_reserved` never written by trampoline) and
Bilgi #4 (`compiler_fence` on spurious-acknowledge path is defensive)
are noted but not actioned — observation-grade only.

Gates: cargo fmt / host-clippy / kernel-clippy / kernel-build clean.
148/148 host tests unchanged (no code paths touched).

Refs: T-012, ADR-0021 (Amendment), ADR-0022 (Sub-rider closure)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cemililik added a commit that referenced this pull request Apr 28, 2026
Code (5 fixes):

1. `bsp-qemu-virt/src/exceptions.rs` — irq_entry and panic_entry are now
   `unsafe extern "C" fn` (was `extern "C" fn`). Both have caller-side
   safety preconditions (frame validity for irq_entry; class validity
   for panic_entry) already documented in `# Safety` doc-comments;
   making them `unsafe fn` enforces the contract at the type level for
   any future Rust caller. The asm trampolines `bl` them unchanged.

2. `bsp-qemu-virt/src/gic.rs` — enable/disable now assert
   `irq.0 < GIC_MAX_IRQ` (1020) per ARM IHI 0048B §4.3.2 before
   computing distributor offsets. Without the check, an out-of-range
   IrqNumber would compute an MMIO offset outside the distributor
   window even with saturating arithmetic — writing to reserved or
   wrong-window addresses. The IrqController trait returns no Result,
   so panic is the right mechanism for kernel-internal contract
   violations. SAFETY comments updated to cite the range invariant.

3. `hal/src/timer.rs` — ns_to_ticks switched from floor to ceiling
   division (via `u128::div_ceil`). ADR-0010 §Decision outcome
   specifies `arm_deadline` semantics as "When `now_ns()` reaches or
   exceeds `deadline_ns`, the hardware timer IRQ fires"; flooring
   could arm the comparator at a tick whose time-equivalent is
   sub-tick *before* deadline_ns, violating the contract. Ceiling
   guarantees the comparator's tick-equivalent is ≥ deadline_ns. New
   `# Rounding` doc section explains the choice. Existing tests
   (round-trip at 62.5 MHz divisor frequency, one-second-yields-
   frequency at multiple frequencies, zero-ns, panic-on-zero-frequency)
   all still pass — they exercise frequencies that divide evenly into
   1e9 ns/s, where ceiling and floor agree.

4. `hal/src/timer.rs::ns_to_ticks_saturates_at_u64_max` — added two
   over-boundary cases (`freq = 1_000_000_001` and `freq = u64::MAX`)
   so the saturation *branch* (not just the boundary) is exercised.
   The original test hit only the exact boundary
   (`huge_ns * 1e9 = u64::MAX * 1e9` divides exactly).

5. `bsp-qemu-virt/src/exceptions.rs::TrapFrame` — no code change;
   the doc-comment's "Padding" claim about `_reserved` was already
   correct. Reviewer's Bilgi #3 was an observation, not a fix
   request.

Documentation (8 fixes):

6. `.claude/skills/conduct-approval-review/SKILL.md` — Turkish quote
   in line 30 translated to English ("kabul edilen bulguların tamamı
   düzeltildi mi?" → "Have all accepted findings been fixed?");
   severity labels in the output-format template translated
   (`Yüksek`/`Orta`/`Düşük` → `High`/`Medium`/`Low`); the line-53
   "Yüksek finding" prose translated; the example fenced code block
   gained a `markdown` language tag. CLAUDE.md §3 mandates English in
   all committed artefacts.

7. `docs/analysis/reviews/business-reviews/2026-04-27-B0-closure.md` —
   "Five `unsafe` regions" → "Seven" to match the actual table count
   (UNSAFE-2026-0012 through 0018 inclusive = 7 entries).

8. `docs/analysis/tasks/phase-b/README.md` — T-012 row status changed
   from `In Progress (design-first 2026-04-28)` to
   `In Review (2026-04-28)` to match the task file's header.

9. `docs/architecture/exceptions.md` — IRQ dispatch flow Mermaid
   diagram updated to reflect shipped behaviour: timer-IRQ branch
   now shows "msr CNTV_CTL_EL0 → gic.end_of_interrupt → return"
   (ack-and-ignore for v1) instead of the placeholder
   sched::on_timer_irq call. Spurious-INTID arm added (returns
   without EOI per GICv2 architecture spec). Saved-register list
   corrected: trampoline saves `x0..x18, x30` (not `x0..x29`); the
   prose now explicitly states that `x19..x29` are AAPCS64
   callee-saved and preserved by the Rust handler, not pushed by
   the trampoline.

10. `docs/decisions/0021-raw-pointer-scheduler-ipc-bridge.md` — the
    Amendment paragraph that said "When the first scheduler-touching
    IRQ handler lands, ... UNSAFE-2026-0014 ... gains an Amendment"
    reworded to reflect that the Amendment **already exists** (commit
    `28c5ce9`) and v1's body has no live citation yet because
    `irq_entry` is ack-and-ignore. Future scheduler-touching arcs
    add a follow-up Amendment recording the activation.

11. `docs/roadmap/current.md` — pre-T-011 baseline coverage numbers
    reconciled with the canonical B0 closure retro: sched/mod.rs was
    `83.93%` → `84.16%`; workspace was `96.33%` → `96.37%`. The post-
    T-011 numbers (93.97% / 96.37%) were already correct.

12. `docs/roadmap/phases/phase-b.md` — "T-012 closed" → "T-012
    delivered (pending verification)". The bullet's existing
    parenthetical already explained that closure waits on QEMU
    verification; the lead label is now consistent with that.

13. `docs/analysis/tasks/phase-b/T-012-exception-and-irq-infrastructure.md`
    — Dependencies field: T-009 status `In Review` → `Done` (closed
    2026-04-27 via PR #9); ADR-0024 mention sharpened to record its
    Accept date (2026-04-27).

Bilgi findings #3 (`TrapFrame._reserved` never written) and #4
(spurious-acknowledge `compiler_fence` is defensive-not-load-bearing)
intentionally not actioned per their "no action today" tags.

Gates: cargo fmt / host-clippy / kernel-clippy / kernel-build all
clean. host-test 148 / 148 unchanged (24 hal + 90 kernel + 34
test-hal); the new ns_to_ticks_saturates_at_u64_max assertions
extend an existing test rather than add a new one.

Refs: T-012, ADR-0010 (rounding clarification rider candidate),
ADR-0021 (Amendment reword), ADR-0024 (precondition language)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cemililik added a commit that referenced this pull request May 8, 2026
… 2026-05-08 + 2026-05-07 follow-ups

Closes the 12 remaining items flagged by the 2026-05-08 multi-axis
review's §Follow-up backlog (3 Track-2 Majors, 1 Track-3 Major fix
already closed in `59c08e9`, 6 Track-3 / Track-2 Minors, 2 Track-4
Minors, 2 Track-2 Nits) plus the carry-forward 2026-05-07 Track-H
NIT-1.

**Track 2 Majors (forward-flagged → ADR-0027 / phase-b ledger riders):**
- M1 (escape-hatch doc): ADR-0027 §Decision outcome (c) gains a bullet
  documenting `mem::forget` / `ManuallyDrop` / `let _ = ...` as
  deliberate-but-rare escape hatches, mirroring the
  `x86_64::structures::paging::MapperFlush` precedent.
- M2 (MMU-instance binding): ADR-0027 §Decision outcome (c) gains a
  bullet noting `MapperFlush::flush(self, mmu: &impl Mmu)` accepts
  any `Mmu`; multi-`Mmu` deployments (B3+ per-task `AddressSpace`,
  Phase C multi-CPU) will need a stronger token type. Out of scope
  for v1.
- M3 (ADR-0034 placeholder): ADR-0027 §Decision outcome adds an
  "ADR-0034 (kernel-image section permissions) placeholder" block
  alongside ADR-0033, and `phase-b.md` ADR ledger gains rows for
  both ADR-0033 and ADR-0034 with named-but-unallocated discipline.

**Track 3 Minors (governance / wording polish):**
- m1 + m2 (current.md L52): drop "Phase-2" prefix on "§Simulation
  table" (the table walks Steps 0–4, not a "Phase 2"); "Accept will
  be" → "Accept landed as" + actual commit SHA `bb0a6ba`.
- m3 (commit-style.md PR-numbering rider): new §"PR-number references
  in committed artefacts" subsection naming the recurrence (PR #18 +
  PR #20 each had a one-commit PR-number fix-up) and codifying three
  acceptable disciplines (defer banner authoring; reference branch
  slug; or use commit SHA).
- n1 (framing alignment): "first to apply Simulation forward" wording
  in current.md (banner + Active decisions row) and phase-b.md §B2
  status block aligned to the precise "first non-recovery-primitive
  state-machine ADR drafted under §Simulation" phrasing — ADR-0032's
  Propose did land with a table; the prior framing was technically
  defensible only under a narrow reading of "retro-extracted".

**Track 2 Nits (substance riders in ADR-0027):**
- #4 (DSB ISH vs DSB NSH rationale): §Simulation gains a rationale
  paragraph after the table — `ISH` is forward-compatible with the
  eventual SMP boot, sub-microsecond cost on single-core, matches
  Linux aarch64 `arch/arm64/mm/proc.S` for the same reason.
- #5 (TCR_EL1.AS wording tighten): line 59 reworded — `AS = 0`
  selects 8-bit ASID *size*, not "the ASID value is 0" (the value
  is `TTBR0_EL1.ASID = 0` and is what's "globally used in v1").
- #7 (line 17 §-citation precision) + Track-3 n1: "first
  non-recovery-primitive state-machine" framing now precise.
- #8 (memory-management.md L88 page-table descriptor cosmetic):
  ASCII bit-field diagram redrawn to match L2 block-descriptor
  reality (OutputAddress[47:21], not [47:12]); explanatory note
  added for L1-block / L3-page variants.

**Track 4 Minors (perf-harness.sh):**
- #1 (Ctrl-C cleanup trap): new `cleanup_in_flight` shell function +
  `trap '...' EXIT INT TERM` that kills any in-flight QEMU + watchdog
  PIDs tracked in `CURRENT_CMD_PID` / `CURRENT_WATCHDOG_PID` shell
  globals. `run_with_timeout` updates the globals at every call so
  the trap addresses whichever pair is currently in flight; clears
  them at every clean exit so the trap is a no-op outside iterations.
- #2 (p99 small-N reporting hygiene): generated baseline report
  Methodology section gains a "**Note on p99 at small `n`**" paragraph
  explaining that under nearest-rank `p99 == max` for `n < 100` and
  callers should not over-read it as a tail-latency signal until
  `n >= 100`.

**Track 4 Nit #3 (read_stats refactor):** **Already closed by PR #21
review-round commit `ef30b5c`** — the 8 `echo | awk` parses became a
single `while read` loop. Verified at HEAD (`grep -c "while read -r
key val" tools/perf-harness.sh` → 1 hit).

**2026-05-07 Track-H NIT-1 (Pending Amendment closure-path indexing):**
UNSAFE-2026-0019 / 0020 / 0021 each gain a 2026-05-08 "closure-path
indexed" Amendment naming the canonical clearance trigger (B5
Milestone, ADR-0030 entry-point, deadline-arming syscall) explicitly,
so a future reader of `unsafe-log.md` alone has the full picture
without leaving the file. No semantic change; co-locates information
that was previously distributed across `phase-b.md` cross-references.

Verification gates re-run on the integration branch:
- `cargo fmt --all -- --check` clean
- `cargo host-test` 159/159 (25 + 100 + 34)
- `cargo host-clippy` clean (-D warnings)
- `cargo kernel-clippy` clean
- `cargo kernel-build` clean
- `tools/perf-harness.sh --iterations=3` runs end-to-end with the new
  trap + Methodology note + while-read parsing intact

This commit + the prior 2026-05-08 review's recommendations close all
follow-ups identified by both 2026-05-07 and 2026-05-08 multi-axis
reviews. Forward-flagged items (10/12/13 from 2026-05-07) and any
review-round bot input on this integration branch remain the only
open items.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cemililik added a commit that referenced this pull request May 31, 2026
…xt) (#37)

* docs(security): T-022 high-half migration security review — Approve

Standalone, time-separated (2026-05-31) security pass over the T-022 /
ADR-0033 high-half kernel migration, discharging the "awaiting explicit
security review" flag current.md carried for UNSAFE-2026-0031.

Verdict: Approve — eight axes pass (cryptography N/A), no live finding,
no fix required. The migration is a structural kernel/user isolation
*strengthening*: the kernel becomes absent from the TTBR0_EL1 regime an
EL0 task runs on (removing the Meltdown substrate, replacing a standing
per-descriptor AP invariant with structural absence). UNSAFE-2026-0031 +
the 0022/0023/0024 T-022 Amendments are policy-conformant and
second-reviewer-signed; the migration is smoke-verified fault-clean. The
review-round-3 hardenings (fail-fast assert! range guards + regime-correct
panic-handler UART alias) are verified sound and cannot misfire in v1
(incl. the no-double-panic property on the panic path).

Forward-flagged (non-blocking, pre-B6): the three T-021 carry-forward
gates (gate #1 — per-task console_write window + per-page user-VA->kernel-VA
translation — is the single most important pre-B6 gate) and ADR-0034
(kernel-image section W^X; privileged-side gap, EL0-unreachable).

Refs: T-022, ADR-0033
Security-Review: @cemililik (+ Claude Opus 4.8 (1M context) agent)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(adr): propose ADR-0037 — EL0 entry context

Sibling ADR to ADR-0033 (which scoped itself to the migration and named the
EL0 context a separate B6 task). Settles how an EL0 task's first entry is
expressed and how it integrates with the cooperative scheduler.

Decision (Option 1): reuse the existing Aarch64TaskContext + a one-shot
enter_el0 ERET trampoline; carry (user_entry, user_sp) in the x19/x20
callee-saved slots the cooperative switch already restores; lr = enter_el0,
sp = kernel stack. SP_EL1 (gate #2) is closed by construction — the task
enters EL0 from its own kernel context, so a later +0x400 trap lands on that
kernel stack. D2 settled: neither add EL0 fields to the kernel-object Task
NOR a separate TaskContext struct — the 168-byte Aarch64TaskContext layout
(and the security-critical context_switch_asm offsets) stay UNCHANGED, the
exact drift the T-022 security review flagged. v1 simplification:
SPSR_EL1 = 0x3C0 (EL0t, DAIF masked — cooperative, no preemption).

Opens T-023 (Draft) in the same commit per ADR-0025 §Rule 1. Introduces
UNSAFE-2026-0032 (the enter_el0 asm; security-sensitive → second-reviewer).

Refs: ADR-0037, T-023
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(adr): Accept ADR-0037 — EL0 entry context

Careful re-read (write-adr §10) passed in a separate commit from Propose:
forward-references grounded (T-023 exists; downstream consumers
deliberately absent as consumers-not-dependencies, per ADR-0033's pattern),
dependency chain complete, negative consequences carry real mitigations,
the §Simulation table walks the EL0 first-entry + trap state machine.

Refs: ADR-0037
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(task): T-023 — EL0 entry context (init_user_context + enter_el0 trampoline + per-task SP_EL1)

Implements ADR-0037 (the B6 EL0 entry-context decision) — the dormant
mechanism a userspace task needs to drop to EL0 on first dispatch. Closes
T-021 carry-forward gate #2 (SP_EL1). No runnable EL0 task yet (that is the
B6 wire-up, after gate #1 / gate #3); the mechanism is dormant and the QEMU
trace is byte-stable.

What lands:
- HAL: `ContextSwitch::init_user_context(ctx, user_entry, user_sp,
  kernel_stack_top)` — additive sibling of `init_context`; its `# Safety`
  states the EL0-trap kernel-stack size contract (must hold the ~272-byte
  trap frame + handler call tree per +0x400, stronger than init_context's).
- BSP: `QemuVirtCpu::init_user_context` seeds the *existing* Aarch64TaskContext
  (x19=user_entry, x20=user_sp, lr=enter_el0, sp=kernel_stack_top) — no
  struct/layout change, so context_switch_asm offsets are untouched (D2) —
  plus the `enter_el0` `#[unsafe(naked)]` ERET trampoline.
- Scheduler: `Scheduler::add_user_task` (mirrors add_task, calls
  init_user_context; debug_asserts kernel_stack_top non-null + 16-aligned —
  gate #2 belt-and-braces).
- test-hal: FakeContextSwitch::init_user_context + FakeTaskContext
  {is_user,user_sp}; host tests for the test-hal recorder and the scheduler
  route. All four ContextSwitch implementors updated (BSP + test-hal +
  scheduler's FakeCpu/ResetQueuesCpu).
- Audit: new UNSAFE-2026-0032 (the enter_el0 ERET asm; second-reviewer-flagged).

2026-05-31 review-round (1 HIGH + 3 Medium + 4 nits) folded in BEFORE this
commit:
- HIGH — enter_el0 now SCRUBS the register file (x0–x30 + v0–v31) before
  ERET. The cooperative switch restores only callee-saved regs, so without
  the scrub the first EL0 instruction would observe kernel pointers/state in
  x0/x1/x8 (context ptrs / kernel stack), x30 (kernel code addr), and the
  caller-saved SIMD — a disclosure orthogonal to ADR-0033's AP/UXN memory
  isolation. Recorded as a load-bearing invariant (UNSAFE-2026-0032) and in
  ADR-0037 §Revision notes (CLAUDE.md #1).
- SPSR_EL1 written via the register form (mov x8,#0x3c0; msr spsr_el1,x8) —
  the immediate MSR form is PSTATE-fields-only.
- gate #2 reconciled (by-construction + the debug_assert); EL0-trap
  stack-size contract documented; test-scope claim narrowed (the host test
  pins the scheduler route via the fake — the real QemuVirtCpu slot write +
  the scrub are audit + kernel-build verified, the BSP being bare-metal).

Docs synced: ADR-0037 §Revision notes (rider, append-only per ADR-0025 §Rule
2), T-023 (AC + review-history + row-to-verification mapping), hal.md +
scheduler.md (ContextSwitch surface gains init_user_context / add_user_task),
current.md (B6-opening banner + active-task/working-branch).

Gates green: cargo fmt; host + kernel clippy -D warnings; 342 host tests
(+2: test-hal init_user_context recorder, scheduler add_user_task route);
kernel build (the 67-instruction scrub asm assembles); QEMU smoke
byte-stable + fault-clean (mechanism dormant); cargo +nightly miri test
--workspace --exclude tyrne-bsp-qemu-virt clean (0 UB).

Security-relevant (the EL1→EL0 first-entry trust-boundary primitive); the
ADR + task design review-round is folded in, the formal code security pass
on UNSAFE-2026-0032 remains a DoD item before a real EL0 task is wired.

Refs: T-023, ADR-0037, UNSAFE-2026-0032
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(task): T-023 review-round 2 — FPCR/FPSR/TPIDR scrub, offset_of! asserts, user_sp + AS-active guards

Folds in the 2026-05-31 second review-round against the committed
implementation (verdict: Approve-with-nits). 2 Low + 4 nits, all verified
against the code and addressed.

- Low-1 — enter_el0's scrub omitted the FP-control/status + EL0 thread-ID
  register classes. It zeroed x0–x30 + v0–v31 but NOT FPCR/FPSR/TPIDR_EL0/
  TPIDRRO_EL0 — all EL0-readable, with architecturally-UNKNOWN reset values on
  real HW (Pi 4), so "clean, defined register file" was over-claimed off-QEMU
  (FPCR governs EL0 rounding/FZ/default-NaN; Linux zeroes FPCR/FPSR on exec).
  enter_el0 now also `msr {fpcr,fpsr,tpidr_el0,tpidrro_el0}, xzr`. No current
  secret leak (kernel does no FP arithmetic, never writes TPIDR), but the EL0
  environment is now defined on real HW, not just QEMU's zeroed reset.
- Low-2 — the implicit x19/x20/lr/sp hand-off was pinned only by
  size_of == 168; a layout-preserving field reorder (e.g. fp/lr swap) would
  silently corrupt both context_switch_asm and the enter_el0 hand-off. Added
  offset_of! asserts for all five offsets (0/80/88/96/104) — hardens the
  pre-existing context_switch_asm coupling too.
- Nit — add_user_task now debug_asserts user_sp 16-byte alignment (it becomes
  SP_EL0; SCTLR_EL1.SA0 faults a misaligned EL0 stack), symmetric with the
  kernel-stack assert.
- Nit — the AS-must-be-active obligation (TTBR0 installed + EPD0 cleared at
  first dispatch) is now stated on the caller-facing # Safety of both
  init_user_context and add_user_task (it previously lived only on enter_el0's
  doc, which the future task_create_from_image author would not read).
- Nit (process honesty) — the review-round-1 rider shipped WITH bbc42e8, after
  ADR-0037's Accept (546febc): the careful-re-read missed the scrub HIGH;
  recorded in ADR-0037 §Revision notes per ADR-0025 §Rule 2.
- Nit — "67-instruction scrub" wording dropped (it is 63 register-zeros + 4
  special-register writes); docs reworded.

Docs synced: UNSAFE-2026-0032 (scrub scope), ADR-0037 §Revision notes (second
rider), T-023 (AC + review-history), hal.md, current.md.

Gates green: cargo fmt; host + kernel clippy -D warnings; 342 host tests;
kernel build (offset_of! asserts + the new MSRs assemble); QEMU smoke
byte-stable + fault-clean (mechanism still dormant); cargo +nightly miri test
--workspace --exclude tyrne-bsp-qemu-virt clean (0 UB).

Refs: T-023, ADR-0037, UNSAFE-2026-0032
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(task): T-023 review-round 3 — fix non-existent SHA, init_count doc drift, T-023 status/ACs

A multi-agent adversarial review of the whole T-023 arc (b6549d7..HEAD; 5
lenses × per-finding skeptic verification) found NO code/security/correctness/
asm defect — the register scrub (incl FPCR/FPSR/TPIDR), the offset_of!
asserts, the SPSR register form, the dormancy + gate ordering, all four
ContextSwitch implementors, and the host tests were verified correct. The
only confirmed findings were three doc-only consistency defects, all fixed:

- The T-022 security review (2026-05-31) cited ADR-0033's Accept as `db392c1`,
  which does not exist (transposed digit) — corrected to `db892c1`.
- `FakeContextSwitch::init_count()`'s rustdoc said "init_context calls" but the
  shared counter is now also incremented by `init_user_context`; doc broadened
  to match (mirrors the already-updated `FakeTaskContext.initialized` field doc).
- T-023 was still `Status: Draft` with all nine ACs unchecked despite being
  implemented + gates-green; moved to "In Review (… awaiting explicit security
  review …)" per the T-022 house convention and checked the nine satisfied ACs.

Gates re-confirmed: cargo fmt, host clippy -D warnings, 342 host tests (kernel
binary unchanged → the prior byte-stable QEMU smoke + Miri-0-UB still hold).

Refs: T-023, ADR-0037
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(test-hal): T-023 review-round 4 — init_context clears prior user markers on slot reuse

FakeContextSwitch::init_context (the kernel path) seeded initialized/
entry_addr/stack_top but left is_user / user_sp untouched, so a context slot
re-seeded user-task -> kernel-task would report stale is_user=true / a stale
user_sp. init_context now fully re-seeds a kernel context: clears is_user and
user_sp. Adds a regression test (init_context_clears_prior_user_markers_on_reuse).
Test-double-only correctness; no runtime/kernel effect.

Skipped (with reason) from the same review:
- "msr {fpcr,fpsr,tpidr_el0,tpidrro_el0}, xzr leaks SP" — INVALID. For MSR/MRS,
  Rt=31 is XZR (the X[]/ZR accessor), not SP (the 31=SP rule is for load/store
  base + SP-form arithmetic). Verified: kernel-build assembles these cleanly,
  and the shipped `msr ttbr0_el1, xzr` disassembles as `msr TTBR0_EL1, xzr`
  (d518201f) and is smoke-clean (TTBR0 -> 0, would fault if SP). No change.
- "factor the enter_el0 scrub into a .irp/macro" — SKIPPED. The explicit
  per-register form is deliberately maximally auditable for a security-critical
  scrub (every scrubbed register visible); the register set is fixed; no defect.
- "give init_user_context a default impl (unimplemented!/delegate)" — SKIPPED.
  A required trait method already fails loudest (compile error) if an
  implementor omits it; unimplemented! would defer that to a runtime panic
  (against the kernel's panic-free discipline + kernel-clippy denies panic),
  and delegating to init_context would silently produce a broken EL0 entry.

Gates: cargo fmt; host clippy -D warnings; 343 host tests (+1); Miri 0 UB
(kernel binary unchanged -> prior byte-stable QEMU smoke holds).

Refs: T-023
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant