Development#3
Conversation
T-002 PR merged to main; all acceptance criteria and DoD items met. Milestone A3 (Kernel objects) closed — covered by T-002 in one task. T-003 (IPC primitives, Milestone A4) opened at Draft status. Blocked on ADR-0017 (IPC primitive set); propose ADR-0017 to advance to Ready. ADR-0018 (Badge scheme) must also be Accepted or explicitly deferred before A4 closes. Refs: ADR-0016, ADR-0017 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Settles the A4 IPC interface: pure rendezvous (send + recv + notify), fixed-size 4-word Message struct, up to one capability per message. reply_recv fastpath and badge scheme explicitly deferred to ADR-0018. Refs: ADR-0017 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ADR-0017 (IPC primitive set) accepted: send + recv + notify, fixed-size 4-word Message struct, up to one capability per message atomically. reply_recv fastpath and badge scheme deferred to ADR-0018. T-003 advances from Draft → Ready → In Progress; implementation begins on development branch. Refs: ADR-0017 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implements the three IPC operations settled in ADR-0017: - ipc_send: rendezvous send on an Endpoint; delivers to a waiting receiver or enqueues the sender (SendPending state). - ipc_recv: rendezvous receive; drains a waiting sender or registers the caller as a waiting receiver (RecvWaiting state). - ipc_notify: non-blocking bit-OR into a Notification word. Capability transfer is atomic: the sender's cap is extracted via cap_take, held in the endpoint's waiter state, and installed into the receiver's table via insert_root on delivery. Pre-flight checks prevent partial transfer. Supporting changes: - cap::rights: add SEND, RECV, NOTIFY bits; update KNOWN_BITS. - cap::table: add cap_take (move-out for IPC transfer) and is_full (pre-flight check for receiver table capacity). - obj::arena: remove #[cfg(test)] from SlotId::index so the IPC layer can index into IpcQueues by raw slot index. - lib.rs: declare pub mod ipc; add T-003 doc reference. 55 host tests pass (44 previous + 11 new IPC tests). fmt, host-clippy, and kernel-clippy all clean. Refs: ADR-0017 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Marks T-003 IPC primitives as In Review: - All acceptance criteria and DoD checkboxes ticked. - current.md and phase-a.md updated to reflect In Review status. - ADR-0018 (badge scheme) noted as deferred per ADR-0017. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reviewer's GuideImplements the initial IPC subsystem (A4/T-003) with synchronous send/recv/notify primitives over endpoints and notifications, adds capability rights to gate IPC operations, extends the capability table and object arena APIs to support atomic capability transfer and IPC bookkeeping, introduces the IpcQueues state machine, and updates ADRs and roadmap/docs to reflect the new milestone and task status. Sequence diagram for ipc_send and ipc_recv with capability transfersequenceDiagram
actor Sender
participant SenderTable as CapabilityTable_sender
participant Receiver as Receiver
participant ReceiverTable as CapabilityTable_receiver
participant EndpointArena
participant IpcQueues
Sender->>SenderTable: lookup(ep_cap)
SenderTable-->>Sender: Capability (CapObject::Endpoint, rights)
Sender->>SenderTable: lookup(transfer_cap?)
SenderTable-->>Sender: Capability or error
Sender->>EndpointArena: get(EndpointHandle.slot())
EndpointArena-->>Sender: Endpoint or error
Sender->>IpcQueues: state_of(EndpointHandle)
IpcQueues-->>Sender: EndpointState
alt EndpointState is RecvWaiting
Sender->>SenderTable: cap_take(transfer_cap?)
SenderTable-->>Sender: Capability or None
Sender->>IpcQueues: set_state(RecvComplete{msg,cap})
Sender-->>Sender: SendOutcome::Delivered
else EndpointState is Idle
Sender->>SenderTable: cap_take(transfer_cap?)
SenderTable-->>Sender: Capability or None
Sender->>IpcQueues: set_state(SendPending{msg,cap})
Sender-->>Sender: SendOutcome::Enqueued
else EndpointState is SendPending or RecvComplete
Sender-->>Sender: IpcError::QueueFull
end
%% Receiver path
Receiver->>ReceiverTable: lookup(ep_cap)
ReceiverTable-->>Receiver: Capability (CapObject::Endpoint, rights)
Receiver->>EndpointArena: get(EndpointHandle.slot())
EndpointArena-->>Receiver: Endpoint or error
Receiver->>IpcQueues: peek_state(EndpointHandle)
IpcQueues-->>Receiver: EndpointState
alt state has cap and ReceiverTable.is_full()
Receiver-->>Receiver: IpcError::ReceiverTableFull
else state is SendPending or RecvComplete
Receiver->>IpcQueues: take_state() -> {msg,cap}
Receiver->>ReceiverTable: insert_root(cap?)
ReceiverTable-->>Receiver: CapHandle or None
Receiver-->>Receiver: RecvOutcome::Received{msg,cap}
else state is Idle
Receiver->>IpcQueues: set_state(RecvWaiting)
Receiver-->>Receiver: RecvOutcome::Pending
else state is RecvWaiting
Receiver-->>Receiver: IpcError::QueueFull
end
Class diagram for new IPC subsystem and related capability typesclassDiagram
class Message {
+u64 label
+u64[3] params
+Message()
}
class IpcError {
<<enum>>
+InvalidCapability
+QueueFull
+InvalidTransferCap
+ReceiverTableFull
}
class SendOutcome {
<<enum>>
+Delivered
+Enqueued
}
class RecvOutcome {
<<enum>>
+Received(msg: Message, cap: Option_CapHandle_)
+Pending
}
class EndpointState {
<<enum>>
+Idle
+SendPending(msg: Message, cap: Option_Capability_)
+RecvWaiting
+RecvComplete(msg: Message, cap: Option_Capability_)
}
class IpcQueues {
-EndpointState[ENDPOINT_ARENA_CAPACITY] states
+IpcQueues new()
+EndpointState state_of(handle: EndpointHandle)
+EndpointState peek_state(handle: EndpointHandle)
}
class CapabilityTable {
+Capability lookup(handle: CapHandle) Result
+CapHandle insert_root(cap: Capability) Result
+Capability cap_take(handle: CapHandle) Result
+bool is_full()
}
class Capability {
+CapRights rights()
+CapObject object()
}
class CapRights {
<<bitflags>>
+DUPLICATE
+DERIVE
+REVOKE
+TRANSFER
+SEND
+RECV
+NOTIFY
+KNOWN_BITS
+bool contains(other: CapRights)
}
class CapObject {
<<enum>>
+Endpoint(handle: EndpointHandle)
+Notification(handle: NotificationHandle)
}
class EndpointArena {
+Endpoint get(slot: SlotId) Option
+Endpoint get_mut(slot: SlotId) Option
}
class NotificationArena {
+Notification get(slot: SlotId) Option
+Notification get_mut(slot: SlotId) Option
}
class EndpointHandle {
+SlotId slot()
}
class NotificationHandle {
+SlotId slot()
}
class SlotId {
+u16 index()
}
class Endpoint {
+Endpoint new(id: u64)
}
class Notification {
+Notification new(bits: u64)
+void set(bits: u64)
+u64 word()
}
%% Top-level IPC functions
class ipc_mod {
+IpcError ipc_send(ep_arena: EndpointArena, queues: IpcQueues, ep_cap: CapHandle, caller_table: CapabilityTable, msg: Message, transfer: Option_CapHandle_)
+IpcError ipc_recv(ep_arena: EndpointArena, queues: IpcQueues, ep_cap: CapHandle, caller_table: CapabilityTable)
+IpcError ipc_notify(notif_arena: NotificationArena, notif_cap: CapHandle, caller_table: CapabilityTable, bits: u64)
}
%% Relationships
ipc_mod --> Message
ipc_mod --> IpcError
ipc_mod --> SendOutcome
ipc_mod --> RecvOutcome
ipc_mod --> IpcQueues
ipc_mod --> EndpointArena
ipc_mod --> NotificationArena
ipc_mod --> CapabilityTable
IpcQueues --> EndpointState
IpcQueues "1" o-- "*" EndpointState : states
CapabilityTable --> Capability
Capability --> CapRights
Capability --> CapObject
CapObject --> EndpointHandle
CapObject --> NotificationHandle
EndpointHandle --> SlotId
NotificationHandle --> SlotId
EndpointArena --> EndpointHandle
EndpointArena --> Endpoint
NotificationArena --> NotificationHandle
NotificationArena --> Notification
SlotId --> u16
RecvOutcome --> Message
RecvOutcome --> CapHandle
EndpointState --> Message
EndpointState --> Capability
State diagram for EndpointState IPC waiter lifecyclestateDiagram-v2
[*] --> Idle
Idle --> SendPending: ipc_send(no_receiver)
Idle --> RecvWaiting: ipc_recv(no_sender)
RecvWaiting --> RecvComplete: ipc_send(receiver_waiting)
SendPending --> Idle: ipc_recv(deliver_msg_and_cap)
RecvComplete --> Idle: ipc_recv(collect_delivery)
SendPending --> SendPending: ipc_send_when_SendPending / IpcError_QueueFull
RecvComplete --> RecvComplete: ipc_send_when_RecvComplete / IpcError_QueueFull
RecvWaiting --> RecvWaiting: ipc_recv_when_RecvWaiting / IpcError_QueueFull
Idle --> Idle: invalid_capability / IpcError_InvalidCapability
SendPending --> SendPending: receiver_table_full / IpcError_ReceiverTableFull
RecvComplete --> RecvComplete: receiver_table_full / IpcError_ReceiverTableFull
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 30 minutes and 47 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (6)
📝 WalkthroughWalkthroughThe PR advances the project from Milestone A3 (Kernel objects, now marked Done) to Milestone A4 (IPC primitives, In Review). It adds three new capability rights for IPC operations, implements synchronous send/recv rendezvous and non-blocking notify primitives over Endpoint and Notification kernel objects, extends the capability system with capability transfer operations, and documents the complete IPC specification in ADR-0017. Changes
Sequence Diagram(s)sequenceDiagram
actor Caller1 as Sender Task
participant CTable1 as Sender<br/>CapabilityTable
participant IPC as IpcQueues<br/>EndpointState
participant CTable2 as Receiver<br/>CapabilityTable
actor Caller2 as Receiver Task
Caller1->>CTable1: cap_take(transfer_cap)?
alt transfer valid
CTable1-->>Caller1: Capability
else invalid/stale
CTable1-->>Caller1: CapError
Caller1->>IPC: ipc_send() fails
end
Caller1->>IPC: ipc_send(ep_cap, msg, transfer?)
alt RecvWaiting state
IPC->>CTable2: insert_root(capability)
CTable2-->>IPC: CapHandle
IPC-->>Caller1: Delivered + CapHandle
IPC-->>Caller2: msg + CapHandle ready
else Idle state
IPC-->>Caller1: Enqueued
Caller2->>IPC: ipc_recv(ep_cap)
IPC->>CTable2: insert_root(capability)
CTable2-->>IPC: CapHandle
IPC-->>Caller2: Received + CapHandle
else QueueFull
IPC-->>Caller1: QueueFull error
end
sequenceDiagram
actor Caller as Task
participant CTable as CapabilityTable
participant NotifObj as Notification<br/>Object
participant IPC as ipc_notify
Caller->>IPC: ipc_notify(notif_cap, bits=0x05)
IPC->>NotifObj: validate notif_cap & NOTIFY right
alt valid capability
NotifObj->>NotifObj: word \|= bits (with saturation)
NotifObj-->>IPC: success
IPC-->>Caller: Ok(())
else invalid/stale/wrong-kind
IPC-->>Caller: InvalidCapability error
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Hey - I've found 2 issues
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="kernel/src/ipc/mod.rs" line_range="204-206" />
<code_context>
+ let ep_handle = validate_ep_cap(caller_table, ep_cap, CapRights::SEND)?;
+
+ // Pre-flight: validate the transfer cap before touching endpoint state.
+ if let Some(xfer) = transfer {
+ caller_table
+ .lookup(xfer)
+ .map_err(|_| IpcError::InvalidTransferCap)?;
+ }
</code_context>
<issue_to_address>
**issue (bug_risk):** TRANSFER right is defined but not enforced on the transferred capability
`CapRights::TRANSFER` is documented as controlling whether a capability may be sent over IPC, but here we only check that the handle is live via `lookup` and never enforce that the capability has the `TRANSFER` right. As written, any capability the caller holds can be transferred. If the intent is to gate IPC transfer on this right, add a check like `cap.rights().contains(CapRights::TRANSFER)` in the transfer pre-flight and reject otherwise.
</issue_to_address>
### Comment 2
<location path="docs/decisions/0017-ipc-primitive-set.md" line_range="196" />
<code_context>
+
+### Option H — Multiple capabilities per message
+
+- Pro: more expressive; covers protocols that handoff a set of capabilities atomically.
+- Con: no A4/A6 scenario requires it; adding a loop and array before there is a test case is premature complexity.
+
</code_context>
<issue_to_address>
**nitpick (typo):** Use the correct verbal form "hand off" instead of "handoff" here.
Here it’s used as a verb, so it should be written as two words: change "handoff" to "hand off".
```suggestion
- Pro: more expressive; covers protocols that hand off a set of capabilities atomically.
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Code Review
This pull request implements the IPC primitives for Milestone A4, including ipc_send, ipc_recv, and ipc_notify, as defined in the new ADR-0017. The changes introduce the ipc module, add specific capability rights for IPC operations, and implement atomic capability transfer mechanisms within the CapabilityTable. Review feedback highlights a critical logic error in IpcQueues regarding the lack of generation tracking, which could lead to stale state reuse across different endpoint lifecycles. Additionally, the implementation of ipc_send is flagged for potential state corruption during fallible transfers, and there is a recommendation to refactor duplicated validation logic in ipc_notify into a helper function.
| pub struct IpcQueues { | ||
| states: [EndpointState; ENDPOINT_ARENA_CAPACITY], | ||
| } |
There was a problem hiding this comment.
The IpcQueues structure currently only stores the EndpointState for each slot, which is a critical logic error. It does not account for the lifecycle of the endpoints. Since IpcQueues is indexed by the raw slot index, it can retain stale state (like RecvWaiting) from a previously destroyed endpoint that occupied the same slot. If a new endpoint is created in that slot, it will inherit this state, potentially allowing a task to receive a message intended for a different (now destroyed) endpoint. To fix this, IpcQueues should store the generation for each slot and verify it in state_of and peek_state, resetting to Idle if they differ. This will also require making SlotId::generation accessible to the crate by removing its #[cfg(test)] restriction in kernel/src/obj/arena.rs.
| let state = queues.state_of(ep_handle); | ||
| let old = core::mem::replace(state, EndpointState::Idle); | ||
| match old { | ||
| EndpointState::RecvWaiting => { | ||
| // A receiver is waiting. Extract the cap and transition to | ||
| // RecvComplete so the receiver can pick up the result. | ||
| let owned = take_cap_if_some(caller_table, transfer)?; | ||
| *state = EndpointState::RecvComplete { msg, cap: owned }; | ||
| Ok(SendOutcome::Delivered) | ||
| } | ||
| EndpointState::Idle => { | ||
| // No receiver. Store the message in the endpoint queue. | ||
| let owned = take_cap_if_some(caller_table, transfer)?; | ||
| *state = EndpointState::SendPending { msg, cap: owned }; | ||
| Ok(SendOutcome::Enqueued) | ||
| } | ||
| occupied @ (EndpointState::SendPending { .. } | EndpointState::RecvComplete { .. }) => { | ||
| // Restore the original state unchanged. | ||
| *state = occupied; | ||
| Err(IpcError::QueueFull) | ||
| } | ||
| } |
There was a problem hiding this comment.
This implementation of ipc_send is vulnerable to state corruption. By calling core::mem::replace(state, EndpointState::Idle) at line 216, the original state is moved out. If take_cap_if_some subsequently fails (for example, if the transfer capability has children, which is not checked in the pre-flight), the function returns an error but the endpoint state remains Idle. If the original state was RecvWaiting, the registered receiver is lost, leading to kernel state inconsistency. A safer approach is to check the state using peek_state first, perform the fallible capability transfer, and only then commit the state change.
if matches!(queues.peek_state(ep_handle), EndpointState::SendPending { .. } | EndpointState::RecvComplete { .. }) {
return Err(IpcError::QueueFull);
}
let owned = take_cap_if_some(caller_table, transfer)?;
let state = queues.state_of(ep_handle);
let old = core::mem::replace(state, EndpointState::Idle);
match old {
EndpointState::RecvWaiting => {
*state = EndpointState::RecvComplete { msg, cap: owned };
Ok(SendOutcome::Delivered)
}
EndpointState::Idle => {
*state = EndpointState::SendPending { msg, cap: owned };
Ok(SendOutcome::Enqueued)
}
_ => unreachable!("Occupied states handled by pre-flight check"),
}| let notif_handle = { | ||
| let cap = caller_table | ||
| .lookup(notif_cap) | ||
| .map_err(|_| IpcError::InvalidCapability)?; | ||
| if !cap.rights().contains(CapRights::NOTIFY) { | ||
| return Err(IpcError::InvalidCapability); | ||
| } | ||
| match cap.object() { | ||
| CapObject::Notification(h) => h, | ||
| _ => return Err(IpcError::InvalidCapability), | ||
| } | ||
| }; |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
kernel/src/cap/table.rs (2)
444-466: Consider reusingfree_slotto avoid duplicating free-list bookkeeping.After
entry.take(), the remaining three lines replicate exactly whatfree_slotat Line 542 does (entry is alreadyNone, sofree_slot'sslot.entry = Noneis idempotent). Delegating keeps the free-list invariants in one place — iffree_slotever changes (e.g., to zero outentrymore carefully, or to assert on generation overflow),cap_takewill stay consistent automatically.♻️ Proposed refactor
self.unlink_from_siblings(index)?; - // Take the SlotEntry out of the slot before bumping the generation. let entry = self.slots[index as usize] .entry .take() .ok_or(CapError::InvalidHandle)?; - // Free-slot bookkeeping (mirrors free_slot, but entry is already None). - let old_head = self.free_head; - self.free_head = Some(index); - self.slots[index as usize].generation = - self.slots[index as usize].generation.wrapping_add(1); - self.slots[index as usize].next_free = old_head; + self.free_slot(index); Ok(entry.capability)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@kernel/src/cap/table.rs` around lines 444 - 466, The cap_take implementation duplicates the free-list bookkeeping that free_slot already encapsulates; after taking the entry (the entry is already None), remove the manual updates to free_head/generation/next_free and instead call self.free_slot(index) (or the appropriate free_slot method) to perform the slot recycling; keep the prior unlink_from_siblings(index)? call and return entry.capability as before, and ensure free_slot tolerates being called when slot.entry is already None (it should be idempotent).
444-466: Missing unit tests forcap_takeandis_full.The new public API surface has no direct test coverage in this file's
testsmodule. Givencap_take's invariants overlap with but diverge fromcap_drop(generation bump + returnsCapability, sameHasChildren/InvalidHandlerefusals, interaction with sibling unlinking, interaction withis_full), please add tests covering at minimum:
cap_takeon a leaf returns the expectedCapabilityand invalidates the old handle.cap_takeon an interior node returnsHasChildrenwithout freeing.cap_takeon a stale handle returnsInvalidHandle.cap_takeon a middle sibling preserves outer siblings (mirrorsdrop_middle_sibling_preserves_list_integrity).- After
cap_takethe slot is reusable with a bumped generation.is_fulltransitions: empty → not full; saturated → full; aftercap_take/cap_drop→ not full.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@kernel/src/cap/table.rs` around lines 444 - 466, Add unit tests to the existing tests module exercising cap_take and is_full: write tests that (1) call cap_take on a leaf slot and assert it returns the expected Capability and that resolve_handle/using the old CapHandle returns CapError::InvalidHandle; (2) call cap_take on a node with children and assert it returns CapError::HasChildren and does not free or change sibling links; (3) call cap_take with a stale handle (old generation) and assert CapError::InvalidHandle; (4) mirror drop_middle_sibling_preserves_list_integrity by taking a middle sibling with cap_take and assert outer siblings remain linked; (5) after cap_take verify the slot is reusable and its generation was bumped (allocate a new cap into the same index and ensure the new handle differs); and (6) add an is_full transition test: start with empty table → assert !is_full, fill to capacity → assert is_full, then perform cap_take and cap_drop and assert is_full becomes false. Use the existing CapTable helpers and existing test patterns (e.g. drop_middle_sibling_preserves_list_integrity) and assert specific CapError variants and generation/handle changes for identification.kernel/src/ipc/mod.rs (2)
526-543: Minor: unused intermediate cap insend_transfers_cap_atomically.The block at lines 534-539 installs a cap whose handle is bound to
chand immediately discarded (let (_, xfer_ep_handle) = { …; (ch, h) };). Onlyxfer_ep_handleis subsequently used — the firstsender_table.insert_root(c).unwrap()just leaves an orphan cap in the sender table that has nothing to do with what the test asserts. Tightens the test to only create what it uses:♻️ Suggested simplification
- // Give sender a second endpoint cap to transfer. - let (_, xfer_ep_handle) = { - let h = create_endpoint(&mut ep_arena, Endpoint::new(1)).unwrap(); - let c = Capability::new(all_ep_rights(), CapObject::Endpoint(h)); - let ch = sender_table.insert_root(c).unwrap(); - (ch, h) - }; + // Give sender a second endpoint to reference via a transfer cap. + let xfer_ep_handle = create_endpoint(&mut ep_arena, Endpoint::new(1)).unwrap(); let xfer_cap_h = { let c = Capability::new(all_task_rights(), CapObject::Endpoint(xfer_ep_handle)); sender_table.insert_root(c).unwrap() };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@kernel/src/ipc/mod.rs` around lines 526 - 543, The test send_transfers_cap_atomically creates an unused intermediate capability when constructing (ch, h) and discarding ch; remove that orphan insertion and only create the endpoint handle and the single capability you actually use (xfer_ep_handle -> xfer_cap_h). Concretely, stop calling sender_table.insert_root(...) for the discarded ch returned from the block that calls create_endpoint, instead just obtain the endpoint handle from create_endpoint (create_endpoint -> xfer_ep_handle) and then insert only the Capability::new(all_task_rights(), CapObject::Endpoint(xfer_ep_handle)) into sender_table to produce xfer_cap_h.
269-297: Pre-flight check inipc_recvrelies on the sender's cap having been installable — worth a brief rationale comment.The
pending_has_cap && caller_table.is_full()guard (lines 271-278) is what keeps theSendPending { msg, cap }→install_cap_if_somepath from losing the message+cap when the receiver's table is full: without the guard, line 281'smem::replace(state, Idle)would commit beforeinsert_rootis attempted. The invariant that "install_cap_if_somecannot fail here" is subtle and load-bearing — a short comment would help future readers (and anyone touchinginstall_cap_if_someerror variants) avoid regressing it.📝 Suggested comment
let state = queues.state_of(ep_handle); let old = core::mem::replace(state, EndpointState::Idle); match old { EndpointState::SendPending { msg, cap } | EndpointState::RecvComplete { msg, cap } => { - // Deliver the message. Install cap (if any) into the receiver's table. + // Deliver the message. The pre-flight `is_full` check above + // guarantees `insert_root` has a free slot for `cap`, so + // `install_cap_if_some` cannot fail here — the state commit to + // `Idle` is therefore safe. let xfer = install_cap_if_some(caller_table, cap)?; Ok(RecvOutcome::Received { msg, cap: xfer }) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@kernel/src/ipc/mod.rs` around lines 269 - 297, Add a short rationale comment in ipc_recv just above the pre-flight guard that checks pending_has_cap && caller_table.is_full(), explaining that this pre-flight ensures install_cap_if_some(caller_table, cap) cannot fail when we later replace the endpoint state with EndpointState::Idle (via core::mem::replace(state, EndpointState::Idle)), so the SendPending/RecvComplete -> install_cap_if_some path won't lose the message+cap; also note that if install_cap_if_some's error variants or caller_table capacity semantics change, this invariant must be revisited.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/analysis/tasks/phase-a/T-003-ipc-primitives.md`:
- Around line 83-84: Remove the stale "(to be written before code lands)" note
from the ADR-0017 link and mark ADR-0017 as Accepted in the document
(referencing ADR-0017 / docs/decisions/0017-ipc-primitive-set.md), and for
ADR-0018 either remove the dead link or replace it with a deferral note that
points readers to ADR-0017's "Open questions" section (reference ADR-0018 /
docs/decisions/0018-badge-scheme.md) until ADR-0018 is authored; ensure the
"Acceptance criteria" section remains consistent with ADR-0017's Accepted
status.
In `@docs/roadmap/current.md`:
- Around line 9-14: The "Active task: T-003 — IPC primitives" is marked In
Review but the "Next review trigger" still says it triggers when T-003 reaches
In Review; reconcile by either (A) change the active task status in the "Active
task: T-003 — IPC primitives" bullet from "In Review" back to the correct
pre-review state (e.g., "In Progress") if it hasn't actually entered review, or
(B) update the "Next review trigger" bullet to reference the appropriate next
event now that T-003 is already In Review (e.g., "T-003 merge/PR merge" or the
A6 closure per phase-a.md). Ensure you edit the two bullets referencing "T-003 —
IPC primitives" and "Next review trigger" so they are consistent and point to
the actual next pending review (use explicit wording like "code + security
review on merge" or "A6 business review") and keep date/status text consistent.
In `@kernel/src/ipc/mod.rs`:
- Around line 203-237: ipc_send currently replaces the endpoint state with
EndpointState::Idle before attempting to take the transfer cap, so if
take_cap_if_some (which calls cap_take and can fail with CapError::HasChildren)
fails the prior RecvWaiting/SendPending state is lost; fix by performing the
non-destructive cap check or the actual cap_take before mutating the endpoint
state: either (a) extend the pre-flight to reject caps with children by calling
the same validation used by cap_take (or exposing a non-destructive has-children
check) on the transfer cap referenced by caller_table.lookup(xfer), or (b) move
the call to take_cap_if_some/cap_take so it happens before
core::mem::replace(state, EndpointState::Idle) and ensure on QueueFull branch
you return or restore the cap to the caller (or surface a new error) so no
receiver state (EndpointState::RecvWaiting) is overwritten; update ipc_send and
the helper take_cap_if_some accordingly and add the suggested regression test to
ensure RecvWaiting is preserved on InvalidTransferCap.
---
Nitpick comments:
In `@kernel/src/cap/table.rs`:
- Around line 444-466: The cap_take implementation duplicates the free-list
bookkeeping that free_slot already encapsulates; after taking the entry (the
entry is already None), remove the manual updates to
free_head/generation/next_free and instead call self.free_slot(index) (or the
appropriate free_slot method) to perform the slot recycling; keep the prior
unlink_from_siblings(index)? call and return entry.capability as before, and
ensure free_slot tolerates being called when slot.entry is already None (it
should be idempotent).
- Around line 444-466: Add unit tests to the existing tests module exercising
cap_take and is_full: write tests that (1) call cap_take on a leaf slot and
assert it returns the expected Capability and that resolve_handle/using the old
CapHandle returns CapError::InvalidHandle; (2) call cap_take on a node with
children and assert it returns CapError::HasChildren and does not free or change
sibling links; (3) call cap_take with a stale handle (old generation) and assert
CapError::InvalidHandle; (4) mirror drop_middle_sibling_preserves_list_integrity
by taking a middle sibling with cap_take and assert outer siblings remain
linked; (5) after cap_take verify the slot is reusable and its generation was
bumped (allocate a new cap into the same index and ensure the new handle
differs); and (6) add an is_full transition test: start with empty table →
assert !is_full, fill to capacity → assert is_full, then perform cap_take and
cap_drop and assert is_full becomes false. Use the existing CapTable helpers and
existing test patterns (e.g. drop_middle_sibling_preserves_list_integrity) and
assert specific CapError variants and generation/handle changes for
identification.
In `@kernel/src/ipc/mod.rs`:
- Around line 526-543: The test send_transfers_cap_atomically creates an unused
intermediate capability when constructing (ch, h) and discarding ch; remove that
orphan insertion and only create the endpoint handle and the single capability
you actually use (xfer_ep_handle -> xfer_cap_h). Concretely, stop calling
sender_table.insert_root(...) for the discarded ch returned from the block that
calls create_endpoint, instead just obtain the endpoint handle from
create_endpoint (create_endpoint -> xfer_ep_handle) and then insert only the
Capability::new(all_task_rights(), CapObject::Endpoint(xfer_ep_handle)) into
sender_table to produce xfer_cap_h.
- Around line 269-297: Add a short rationale comment in ipc_recv just above the
pre-flight guard that checks pending_has_cap && caller_table.is_full(),
explaining that this pre-flight ensures install_cap_if_some(caller_table, cap)
cannot fail when we later replace the endpoint state with EndpointState::Idle
(via core::mem::replace(state, EndpointState::Idle)), so the
SendPending/RecvComplete -> install_cap_if_some path won't lose the message+cap;
also note that if install_cap_if_some's error variants or caller_table capacity
semantics change, this invariant must be revisited.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bea7121b-136f-4bdb-8ee8-f715238302dc
📒 Files selected for processing (12)
docs/analysis/tasks/phase-a/README.mddocs/analysis/tasks/phase-a/T-002-kernel-object-storage.mddocs/analysis/tasks/phase-a/T-003-ipc-primitives.mddocs/decisions/0017-ipc-primitive-set.mddocs/decisions/README.mddocs/roadmap/current.mddocs/roadmap/phases/phase-a.mdkernel/src/cap/rights.rskernel/src/cap/table.rskernel/src/ipc/mod.rskernel/src/lib.rskernel/src/obj/arena.rs
| - **Active task:** [T-003 — IPC primitives](../analysis/tasks/phase-a/T-003-ipc-primitives.md) (status: **In Review**). | ||
| - **Working branch:** `development`. | ||
| - **Last completed milestone:** A2 — Capability table foundation, on 2026-04-21 (PR #1 merged to `main`). | ||
| - **Last completed task:** [T-001 — Capability table foundation](../analysis/tasks/phase-a/T-001-capability-table-foundation.md) — `Done` 2026-04-21. | ||
| - **Last completed milestone:** A3 — Kernel objects, on 2026-04-21 (PR merged to `main`). | ||
| - **Last completed task:** [T-002 — Kernel object storage foundation](../analysis/tasks/phase-a/T-002-kernel-object-storage.md) — `Done` 2026-04-21. | ||
| - **Last review:** [A2 completion business review](../analysis/reviews/business-reviews/2026-04-21-A2-completion.md) — 2026-04-21. | ||
| - **Next review trigger:** code + security review of T-002's implementation when it reaches `In Review`; business review waits for A6 per [phase-a.md closure](phases/phase-a.md). | ||
| - **Next review trigger:** code + security review of T-003's implementation when it reaches `In Review`; business review waits for A6 per [phase-a.md closure](phases/phase-a.md). |
There was a problem hiding this comment.
Minor inconsistency: "next review trigger" condition already satisfied.
Line 9 states T-003's status is already In Review, but Line 14 says the next review trigger is T-003 reaching In Review. Either the trigger has already fired (and this bullet should be updated to point at the next pending review, e.g., A6 closure or T-003 merge) or T-003 hasn't actually entered review yet and Line 9's status is premature. Please reconcile.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/roadmap/current.md` around lines 9 - 14, The "Active task: T-003 — IPC
primitives" is marked In Review but the "Next review trigger" still says it
triggers when T-003 reaches In Review; reconcile by either (A) change the active
task status in the "Active task: T-003 — IPC primitives" bullet from "In Review"
back to the correct pre-review state (e.g., "In Progress") if it hasn't actually
entered review, or (B) update the "Next review trigger" bullet to reference the
appropriate next event now that T-003 is already In Review (e.g., "T-003
merge/PR merge" or the A6 closure per phase-a.md). Ensure you edit the two
bullets referencing "T-003 — IPC primitives" and "Next review trigger" so they
are consistent and point to the actual next pending review (use explicit wording
like "code + security review on merge" or "A6 business review") and keep
date/status text consistent.
Bugs: - ipc_send: move take_cap_if_some before core::mem::replace so that a failing cap_take (e.g. HasChildren) leaves endpoint state unchanged; RecvWaiting is now preserved on InvalidTransferCap. A queue-full pre- check via peek_state avoids any cap manipulation on that path. - ipc_send: enforce CapRights::TRANSFER on the capability being sent; previously any live capability could be transferred regardless of rights. - IpcQueues: track arena generation per slot; state_of/peek_state now reset the slot to Idle when the handle's generation differs, preventing a newly allocated endpoint from inheriting stale waiter state left by a destroyed predecessor in the same slot. - SlotId::generation: remove #[cfg(test)] so the IPC generation-tracking logic can call it in production builds. Nitpicks: - cap_take: delegate free-list bookkeeping to free_slot instead of duplicating it inline. - cap/table: add six cap_take and is_full unit tests. - ipc_notify: extract validate_notif_cap helper (mirrors validate_ep_cap). - ipc_recv: add invariant comment on the receiver-table-full pre-flight. - tests: remove orphan inserted-but-unused caps in send_transfers_cap_* tests; add regression tests for the three bugs fixed above. Docs: - T-003-ipc-primitives.md: remove stale "(to be written before code lands)" note on ADR-0017; update ADR-0018 link to deferral note. - current.md: reconcile "Next review trigger" now that T-003 is In Review. - 0017-ipc-primitive-set.md: fix verb form "handoff" → "hand off". 64 host tests pass; fmt, host-clippy, kernel-clippy all clean. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@coderabbirai review |
Integrate every follow-up from the 2026-04-21 Phase-A code and security reviews into phase-b.md. No review item remains in an ad-hoc ledger; each is mapped to a specific B0..B6 milestone, and every decision that must close during Phase B is 🚩-flagged in its milestone and collected under the plan's "Open questions" section. New milestone **B0 — Phase A exit hygiene** prepends the original EL-drop pipeline and absorbs: - Security-review blockers #1 / #2 / #3 — UNSAFE-2026-0012 raw-pointer refactor, cross-table revocation policy, scheduler deadlock panic → idle task + typed error — as ADR-0021 / ADR-0022 / ADR-0023. - Code-review follow-ups: architecture docs for kernel-objects / IPC / scheduler; missing tests (ReceiverTableFull, slot-reuse with pending transfer cap, returns-Err replacements for should-panic tests); const {assert!(N>0)} type-level invariants on SchedQueue and CapabilityTable; debug_assert → typed-error hardening. - Non-blocking tracking items, placed in their natural milestone: generation wrap (B2 flag), fault containment scope (B5 flag → Phase E for full supervisor), Capability::Debug redaction (B5 before console_write syscall), write_bytes TX timeout (B6 flag), QEMU-smoke CI gate (B6 flag), cargo-vet init (B6 flag), TaskArena local → global StaticCell migration (bundled with T-006), boot.s early DAIFSet (B1 BSP checklist), TPIDR_EL0 save-set watch (TLS trigger, no decision yet). ADR numbers shifted: the pre-review 0021..0026 (EL-drop, MMU, AS, loader, syscall ABI, initial syscalls) become 0024..0029 so the B0 blockers can claim 0021..0023. Final numbers are assigned when the ADR is actually written per ADR-0013. current.md updated: active phase B, active milestone B0, first task to open is T-006 (raw-pointer scheduler API refactor). Also: docs/analysis/temp/ added to .gitignore as a local-only transient workbook directory (follow-up ledgers, drafts, scratch notes that do not belong in git history). Refs: ADR-0013 Code-Review: docs/analysis/reviews/code-reviews/2026-04-21-umbrix-to-phase-a.md Security-Review: docs/analysis/reviews/security-reviews/2026-04-21-umbrix-to-phase-a.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Verified each open item from the second review pass. All applied; the deferred Q1 settled with a strict-§3 discipline call. Q1 (HCR_EL2 Amendment) — strict discipline picked over carve-out The first review-round (commit 9a8e312) added the HCR_EL2 literal- write rationale to UNSAFE-2026-0017's §Invariants relied on as an in-place body edit. The second review-round flagged this as asymmetric against V13/V14 (which went via Amendment) and offered two paths: (1) codify a same-PR-mutable carve-out in unsafe-policy.md §3, or (2) convert V7 to an Amendment. Picked Path 2 (strict). Reasoning, in detail: - The asymmetry within the PR is the actual problem; Path 1 does not fix it because V13/V14 went via Amendment despite being same- PR. The carve-out would need a sub-rule "additions in-place, corrections via Amendment" — a gray-zone rule that just opens more gray zones. - Precedent: UNSAFE-2026-0006's 2026-04-23 Amendment for the post- T-009 QemuVirtCpu struct shape is exactly this pattern — adding to an existing entry's content via Amendment. Strict §3 reading matches the precedent. - The §3 text "must not be rewritten once committed" is unambiguous when read literally. The strict reading is what the rule says. - "Merge boundary" as the locking event introduces a second ambiguity (PR merge ≠ commit, and not observable from inside the audit log). Strict introducing-commit boundary is auditable. Mechanics: removed the HCR_EL2 paragraph from §Invariants relied on (reverting to f289d4d's introducing-commit shape); added a new Amendment block at the end of UNSAFE-2026-0017 with the same content + a "Discipline note for future readers" paragraph documenting the introducing-commit boundary so the next PR does not stumble into the same trap. The existing 2026-04-27-second-review-round Amendment (GAS halt-loop syntax correction) stays as-is, now placed after the HCR_EL2 Amendment in chronological order (first review-round commit 9a8e312 → HCR_EL2; second review-round commit 39dd978 → GAS-syntax). Orta #1 — T-008 + T-013 task-file checkbox flips Same discipline T-011 received in 9a8e312, applied to T-008 and T-013. Every delivered AC + DoD item flipped to [x] with parenthetical delivery notes. T-013's two QEMU smoke items (AC "Tests — QEMU smoke at default config" / "QEMU smoke at -machine virtualization=on" + DoD's QEMU smoke line) are *intentionally* left at [ ] — settled item 8 from the second review pass: those gates are deferred to the maintainer / CI runner and the PR test plan checklists them explicitly. T-008's "Cross-references go both ways" DoD item is now [x] thanks to Düşük #3 below. Düşük #1 — boot.s:21 leading-comment typo Same wfe; b .- malformed token that V13 corrected via Amendment in UNSAFE-2026-0017 also appeared in bsp-qemu-virt/src/boot.s line 21 (the leading comment block, not the actual asm — the asm uses the named-label form correctly). In-source comment, not audit-log body; in-place fine. Replaced with the named-label loop description so the comment matches the asm. Düşük #2 — coverage rerun report follow-up note The 2026-04-27 coverage report's tables describe T-011's 761af95-tip state. The second review-round's drop_first_child_… test fix (in 9a8e312) bumped cap/table.rs regions from 97.46 % → 97.60 % (+0.14 pp), with a corresponding +0.04 pp on the workspace total (96.33 % → 96.37 %). The original tables are intentionally not rewritten — the report describes T-011's tip; the post-fix delta lives in a follow-up note appended to the report. Both AC gates (sched ≥ 90 %, workspace ≥ 96 %) remain comfortably met. Düşük #3 — bidirectional ADR ↔ architecture-doc cross-references T-008's DoD line 88 ("Cross-references go both ways") was deferred ("ADRs cited from architecture docs are the same ADRs whose §References sections cite the new architecture docs — or will cite, in their next rider"). Closed in this commit with five short §Revision notes riders pointing back at the architecture docs: - ADR-0010 (Timer trait) → architecture/hal.md Timer subsection. - ADR-0017 (IPC primitive set) → architecture/ipc.md. - ADR-0019 (Scheduler shape) → architecture/scheduler.md. - ADR-0020 (ContextSwitch + Cpu v2) → architecture/scheduler.md. - ADR-0022 (Idle task + typed Deadlock) → architecture/scheduler.md. ADR-0021 already had a direct architecture/scheduler.md citation (observed by the second review pass), so it does not need a rider. Each rider is one short paragraph dated 2026-04-27, framed "pointer to architecture doc" — same shape as ADR-0013's 2026-04-27 pointer-to-ADR-0025 rider. ADR-0019's rider also notes that its previously-open "Idle task" question was settled by ADR-0022. Verification: cargo fmt clean; host-clippy / kernel-clippy / kernel-build clean; host-test 143 / 143 green. Refs: ADR-0010, ADR-0017, ADR-0019, ADR-0020, ADR-0022, ADR-0024 Audit: UNSAFE-2026-0017 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two Düşük findings from the T-012-arc approval review:
1. `bsp-qemu-virt/src/main.rs::idle_entry` — function-level doc-comment
was stale: still described the body as `spin_loop` + `yield_now`
("not `wfi` + `yield_now` — until a timer IRQ source lands (T-009)")
and projected `wait_for_interrupt` as future work. The body itself
landed in commit b4ed68c using `cpu.wait_for_interrupt()`; only the
doc-comment lagged. The PR-#9 post-fix-sweep DoD rule
(code-review.md item 7 — `git grep -F` after a behaviour change)
would have caught this. Rewritten to describe the now-current
behaviour: WFI + yield, ADR-0022 first-rider Sub-rider closed,
v1's IPC demo never reaching idle in practice (so WFI is
structurally unreachable in the demo path).
2. `docs/decisions/0021-raw-pointer-scheduler-ipc-bridge.md:123` —
Amendment header read "2026-04-27" but the cited commit `28c5ce9`
landed 2026-04-28; the matching UNSAFE-2026-0014 Amendment in
`docs/audits/unsafe-log.md:225` already had the correct
2026-04-28 date. One-character fix bringing the two Amendments
into agreement on when the discipline-extension landed.
Bilgi #3 (TrapFrame `_reserved` never written by trampoline) and
Bilgi #4 (`compiler_fence` on spurious-acknowledge path is defensive)
are noted but not actioned — observation-grade only.
Gates: cargo fmt / host-clippy / kernel-clippy / kernel-build clean.
148/148 host tests unchanged (no code paths touched).
Refs: T-012, ADR-0021 (Amendment), ADR-0022 (Sub-rider closure)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code (5 fixes): 1. `bsp-qemu-virt/src/exceptions.rs` — irq_entry and panic_entry are now `unsafe extern "C" fn` (was `extern "C" fn`). Both have caller-side safety preconditions (frame validity for irq_entry; class validity for panic_entry) already documented in `# Safety` doc-comments; making them `unsafe fn` enforces the contract at the type level for any future Rust caller. The asm trampolines `bl` them unchanged. 2. `bsp-qemu-virt/src/gic.rs` — enable/disable now assert `irq.0 < GIC_MAX_IRQ` (1020) per ARM IHI 0048B §4.3.2 before computing distributor offsets. Without the check, an out-of-range IrqNumber would compute an MMIO offset outside the distributor window even with saturating arithmetic — writing to reserved or wrong-window addresses. The IrqController trait returns no Result, so panic is the right mechanism for kernel-internal contract violations. SAFETY comments updated to cite the range invariant. 3. `hal/src/timer.rs` — ns_to_ticks switched from floor to ceiling division (via `u128::div_ceil`). ADR-0010 §Decision outcome specifies `arm_deadline` semantics as "When `now_ns()` reaches or exceeds `deadline_ns`, the hardware timer IRQ fires"; flooring could arm the comparator at a tick whose time-equivalent is sub-tick *before* deadline_ns, violating the contract. Ceiling guarantees the comparator's tick-equivalent is ≥ deadline_ns. New `# Rounding` doc section explains the choice. Existing tests (round-trip at 62.5 MHz divisor frequency, one-second-yields- frequency at multiple frequencies, zero-ns, panic-on-zero-frequency) all still pass — they exercise frequencies that divide evenly into 1e9 ns/s, where ceiling and floor agree. 4. `hal/src/timer.rs::ns_to_ticks_saturates_at_u64_max` — added two over-boundary cases (`freq = 1_000_000_001` and `freq = u64::MAX`) so the saturation *branch* (not just the boundary) is exercised. The original test hit only the exact boundary (`huge_ns * 1e9 = u64::MAX * 1e9` divides exactly). 5. `bsp-qemu-virt/src/exceptions.rs::TrapFrame` — no code change; the doc-comment's "Padding" claim about `_reserved` was already correct. Reviewer's Bilgi #3 was an observation, not a fix request. Documentation (8 fixes): 6. `.claude/skills/conduct-approval-review/SKILL.md` — Turkish quote in line 30 translated to English ("kabul edilen bulguların tamamı düzeltildi mi?" → "Have all accepted findings been fixed?"); severity labels in the output-format template translated (`Yüksek`/`Orta`/`Düşük` → `High`/`Medium`/`Low`); the line-53 "Yüksek finding" prose translated; the example fenced code block gained a `markdown` language tag. CLAUDE.md §3 mandates English in all committed artefacts. 7. `docs/analysis/reviews/business-reviews/2026-04-27-B0-closure.md` — "Five `unsafe` regions" → "Seven" to match the actual table count (UNSAFE-2026-0012 through 0018 inclusive = 7 entries). 8. `docs/analysis/tasks/phase-b/README.md` — T-012 row status changed from `In Progress (design-first 2026-04-28)` to `In Review (2026-04-28)` to match the task file's header. 9. `docs/architecture/exceptions.md` — IRQ dispatch flow Mermaid diagram updated to reflect shipped behaviour: timer-IRQ branch now shows "msr CNTV_CTL_EL0 → gic.end_of_interrupt → return" (ack-and-ignore for v1) instead of the placeholder sched::on_timer_irq call. Spurious-INTID arm added (returns without EOI per GICv2 architecture spec). Saved-register list corrected: trampoline saves `x0..x18, x30` (not `x0..x29`); the prose now explicitly states that `x19..x29` are AAPCS64 callee-saved and preserved by the Rust handler, not pushed by the trampoline. 10. `docs/decisions/0021-raw-pointer-scheduler-ipc-bridge.md` — the Amendment paragraph that said "When the first scheduler-touching IRQ handler lands, ... UNSAFE-2026-0014 ... gains an Amendment" reworded to reflect that the Amendment **already exists** (commit `28c5ce9`) and v1's body has no live citation yet because `irq_entry` is ack-and-ignore. Future scheduler-touching arcs add a follow-up Amendment recording the activation. 11. `docs/roadmap/current.md` — pre-T-011 baseline coverage numbers reconciled with the canonical B0 closure retro: sched/mod.rs was `83.93%` → `84.16%`; workspace was `96.33%` → `96.37%`. The post- T-011 numbers (93.97% / 96.37%) were already correct. 12. `docs/roadmap/phases/phase-b.md` — "T-012 closed" → "T-012 delivered (pending verification)". The bullet's existing parenthetical already explained that closure waits on QEMU verification; the lead label is now consistent with that. 13. `docs/analysis/tasks/phase-b/T-012-exception-and-irq-infrastructure.md` — Dependencies field: T-009 status `In Review` → `Done` (closed 2026-04-27 via PR #9); ADR-0024 mention sharpened to record its Accept date (2026-04-27). Bilgi findings #3 (`TrapFrame._reserved` never written) and #4 (spurious-acknowledge `compiler_fence` is defensive-not-load-bearing) intentionally not actioned per their "no action today" tags. Gates: cargo fmt / host-clippy / kernel-clippy / kernel-build all clean. host-test 148 / 148 unchanged (24 hal + 90 kernel + 34 test-hal); the new ns_to_ticks_saturates_at_u64_max assertions extend an existing test rather than add a new one. Refs: T-012, ADR-0010 (rounding clarification rider candidate), ADR-0021 (Amendment reword), ADR-0024 (precondition language) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… 2026-05-08 + 2026-05-07 follow-ups Closes the 12 remaining items flagged by the 2026-05-08 multi-axis review's §Follow-up backlog (3 Track-2 Majors, 1 Track-3 Major fix already closed in `59c08e9`, 6 Track-3 / Track-2 Minors, 2 Track-4 Minors, 2 Track-2 Nits) plus the carry-forward 2026-05-07 Track-H NIT-1. **Track 2 Majors (forward-flagged → ADR-0027 / phase-b ledger riders):** - M1 (escape-hatch doc): ADR-0027 §Decision outcome (c) gains a bullet documenting `mem::forget` / `ManuallyDrop` / `let _ = ...` as deliberate-but-rare escape hatches, mirroring the `x86_64::structures::paging::MapperFlush` precedent. - M2 (MMU-instance binding): ADR-0027 §Decision outcome (c) gains a bullet noting `MapperFlush::flush(self, mmu: &impl Mmu)` accepts any `Mmu`; multi-`Mmu` deployments (B3+ per-task `AddressSpace`, Phase C multi-CPU) will need a stronger token type. Out of scope for v1. - M3 (ADR-0034 placeholder): ADR-0027 §Decision outcome adds an "ADR-0034 (kernel-image section permissions) placeholder" block alongside ADR-0033, and `phase-b.md` ADR ledger gains rows for both ADR-0033 and ADR-0034 with named-but-unallocated discipline. **Track 3 Minors (governance / wording polish):** - m1 + m2 (current.md L52): drop "Phase-2" prefix on "§Simulation table" (the table walks Steps 0–4, not a "Phase 2"); "Accept will be" → "Accept landed as" + actual commit SHA `bb0a6ba`. - m3 (commit-style.md PR-numbering rider): new §"PR-number references in committed artefacts" subsection naming the recurrence (PR #18 + PR #20 each had a one-commit PR-number fix-up) and codifying three acceptable disciplines (defer banner authoring; reference branch slug; or use commit SHA). - n1 (framing alignment): "first to apply Simulation forward" wording in current.md (banner + Active decisions row) and phase-b.md §B2 status block aligned to the precise "first non-recovery-primitive state-machine ADR drafted under §Simulation" phrasing — ADR-0032's Propose did land with a table; the prior framing was technically defensible only under a narrow reading of "retro-extracted". **Track 2 Nits (substance riders in ADR-0027):** - #4 (DSB ISH vs DSB NSH rationale): §Simulation gains a rationale paragraph after the table — `ISH` is forward-compatible with the eventual SMP boot, sub-microsecond cost on single-core, matches Linux aarch64 `arch/arm64/mm/proc.S` for the same reason. - #5 (TCR_EL1.AS wording tighten): line 59 reworded — `AS = 0` selects 8-bit ASID *size*, not "the ASID value is 0" (the value is `TTBR0_EL1.ASID = 0` and is what's "globally used in v1"). - #7 (line 17 §-citation precision) + Track-3 n1: "first non-recovery-primitive state-machine" framing now precise. - #8 (memory-management.md L88 page-table descriptor cosmetic): ASCII bit-field diagram redrawn to match L2 block-descriptor reality (OutputAddress[47:21], not [47:12]); explanatory note added for L1-block / L3-page variants. **Track 4 Minors (perf-harness.sh):** - #1 (Ctrl-C cleanup trap): new `cleanup_in_flight` shell function + `trap '...' EXIT INT TERM` that kills any in-flight QEMU + watchdog PIDs tracked in `CURRENT_CMD_PID` / `CURRENT_WATCHDOG_PID` shell globals. `run_with_timeout` updates the globals at every call so the trap addresses whichever pair is currently in flight; clears them at every clean exit so the trap is a no-op outside iterations. - #2 (p99 small-N reporting hygiene): generated baseline report Methodology section gains a "**Note on p99 at small `n`**" paragraph explaining that under nearest-rank `p99 == max` for `n < 100` and callers should not over-read it as a tail-latency signal until `n >= 100`. **Track 4 Nit #3 (read_stats refactor):** **Already closed by PR #21 review-round commit `ef30b5c`** — the 8 `echo | awk` parses became a single `while read` loop. Verified at HEAD (`grep -c "while read -r key val" tools/perf-harness.sh` → 1 hit). **2026-05-07 Track-H NIT-1 (Pending Amendment closure-path indexing):** UNSAFE-2026-0019 / 0020 / 0021 each gain a 2026-05-08 "closure-path indexed" Amendment naming the canonical clearance trigger (B5 Milestone, ADR-0030 entry-point, deadline-arming syscall) explicitly, so a future reader of `unsafe-log.md` alone has the full picture without leaving the file. No semantic change; co-locates information that was previously distributed across `phase-b.md` cross-references. Verification gates re-run on the integration branch: - `cargo fmt --all -- --check` clean - `cargo host-test` 159/159 (25 + 100 + 34) - `cargo host-clippy` clean (-D warnings) - `cargo kernel-clippy` clean - `cargo kernel-build` clean - `tools/perf-harness.sh --iterations=3` runs end-to-end with the new trap + Methodology note + while-read parsing intact This commit + the prior 2026-05-08 review's recommendations close all follow-ups identified by both 2026-05-07 and 2026-05-08 multi-axis reviews. Forward-flagged items (10/12/13 from 2026-05-07) and any review-round bot input on this integration branch remain the only open items. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…xt) (#37) * docs(security): T-022 high-half migration security review — Approve Standalone, time-separated (2026-05-31) security pass over the T-022 / ADR-0033 high-half kernel migration, discharging the "awaiting explicit security review" flag current.md carried for UNSAFE-2026-0031. Verdict: Approve — eight axes pass (cryptography N/A), no live finding, no fix required. The migration is a structural kernel/user isolation *strengthening*: the kernel becomes absent from the TTBR0_EL1 regime an EL0 task runs on (removing the Meltdown substrate, replacing a standing per-descriptor AP invariant with structural absence). UNSAFE-2026-0031 + the 0022/0023/0024 T-022 Amendments are policy-conformant and second-reviewer-signed; the migration is smoke-verified fault-clean. The review-round-3 hardenings (fail-fast assert! range guards + regime-correct panic-handler UART alias) are verified sound and cannot misfire in v1 (incl. the no-double-panic property on the panic path). Forward-flagged (non-blocking, pre-B6): the three T-021 carry-forward gates (gate #1 — per-task console_write window + per-page user-VA->kernel-VA translation — is the single most important pre-B6 gate) and ADR-0034 (kernel-image section W^X; privileged-side gap, EL0-unreachable). Refs: T-022, ADR-0033 Security-Review: @cemililik (+ Claude Opus 4.8 (1M context) agent) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(adr): propose ADR-0037 — EL0 entry context Sibling ADR to ADR-0033 (which scoped itself to the migration and named the EL0 context a separate B6 task). Settles how an EL0 task's first entry is expressed and how it integrates with the cooperative scheduler. Decision (Option 1): reuse the existing Aarch64TaskContext + a one-shot enter_el0 ERET trampoline; carry (user_entry, user_sp) in the x19/x20 callee-saved slots the cooperative switch already restores; lr = enter_el0, sp = kernel stack. SP_EL1 (gate #2) is closed by construction — the task enters EL0 from its own kernel context, so a later +0x400 trap lands on that kernel stack. D2 settled: neither add EL0 fields to the kernel-object Task NOR a separate TaskContext struct — the 168-byte Aarch64TaskContext layout (and the security-critical context_switch_asm offsets) stay UNCHANGED, the exact drift the T-022 security review flagged. v1 simplification: SPSR_EL1 = 0x3C0 (EL0t, DAIF masked — cooperative, no preemption). Opens T-023 (Draft) in the same commit per ADR-0025 §Rule 1. Introduces UNSAFE-2026-0032 (the enter_el0 asm; security-sensitive → second-reviewer). Refs: ADR-0037, T-023 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(adr): Accept ADR-0037 — EL0 entry context Careful re-read (write-adr §10) passed in a separate commit from Propose: forward-references grounded (T-023 exists; downstream consumers deliberately absent as consumers-not-dependencies, per ADR-0033's pattern), dependency chain complete, negative consequences carry real mitigations, the §Simulation table walks the EL0 first-entry + trap state machine. Refs: ADR-0037 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(task): T-023 — EL0 entry context (init_user_context + enter_el0 trampoline + per-task SP_EL1) Implements ADR-0037 (the B6 EL0 entry-context decision) — the dormant mechanism a userspace task needs to drop to EL0 on first dispatch. Closes T-021 carry-forward gate #2 (SP_EL1). No runnable EL0 task yet (that is the B6 wire-up, after gate #1 / gate #3); the mechanism is dormant and the QEMU trace is byte-stable. What lands: - HAL: `ContextSwitch::init_user_context(ctx, user_entry, user_sp, kernel_stack_top)` — additive sibling of `init_context`; its `# Safety` states the EL0-trap kernel-stack size contract (must hold the ~272-byte trap frame + handler call tree per +0x400, stronger than init_context's). - BSP: `QemuVirtCpu::init_user_context` seeds the *existing* Aarch64TaskContext (x19=user_entry, x20=user_sp, lr=enter_el0, sp=kernel_stack_top) — no struct/layout change, so context_switch_asm offsets are untouched (D2) — plus the `enter_el0` `#[unsafe(naked)]` ERET trampoline. - Scheduler: `Scheduler::add_user_task` (mirrors add_task, calls init_user_context; debug_asserts kernel_stack_top non-null + 16-aligned — gate #2 belt-and-braces). - test-hal: FakeContextSwitch::init_user_context + FakeTaskContext {is_user,user_sp}; host tests for the test-hal recorder and the scheduler route. All four ContextSwitch implementors updated (BSP + test-hal + scheduler's FakeCpu/ResetQueuesCpu). - Audit: new UNSAFE-2026-0032 (the enter_el0 ERET asm; second-reviewer-flagged). 2026-05-31 review-round (1 HIGH + 3 Medium + 4 nits) folded in BEFORE this commit: - HIGH — enter_el0 now SCRUBS the register file (x0–x30 + v0–v31) before ERET. The cooperative switch restores only callee-saved regs, so without the scrub the first EL0 instruction would observe kernel pointers/state in x0/x1/x8 (context ptrs / kernel stack), x30 (kernel code addr), and the caller-saved SIMD — a disclosure orthogonal to ADR-0033's AP/UXN memory isolation. Recorded as a load-bearing invariant (UNSAFE-2026-0032) and in ADR-0037 §Revision notes (CLAUDE.md #1). - SPSR_EL1 written via the register form (mov x8,#0x3c0; msr spsr_el1,x8) — the immediate MSR form is PSTATE-fields-only. - gate #2 reconciled (by-construction + the debug_assert); EL0-trap stack-size contract documented; test-scope claim narrowed (the host test pins the scheduler route via the fake — the real QemuVirtCpu slot write + the scrub are audit + kernel-build verified, the BSP being bare-metal). Docs synced: ADR-0037 §Revision notes (rider, append-only per ADR-0025 §Rule 2), T-023 (AC + review-history + row-to-verification mapping), hal.md + scheduler.md (ContextSwitch surface gains init_user_context / add_user_task), current.md (B6-opening banner + active-task/working-branch). Gates green: cargo fmt; host + kernel clippy -D warnings; 342 host tests (+2: test-hal init_user_context recorder, scheduler add_user_task route); kernel build (the 67-instruction scrub asm assembles); QEMU smoke byte-stable + fault-clean (mechanism dormant); cargo +nightly miri test --workspace --exclude tyrne-bsp-qemu-virt clean (0 UB). Security-relevant (the EL1→EL0 first-entry trust-boundary primitive); the ADR + task design review-round is folded in, the formal code security pass on UNSAFE-2026-0032 remains a DoD item before a real EL0 task is wired. Refs: T-023, ADR-0037, UNSAFE-2026-0032 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(task): T-023 review-round 2 — FPCR/FPSR/TPIDR scrub, offset_of! asserts, user_sp + AS-active guards Folds in the 2026-05-31 second review-round against the committed implementation (verdict: Approve-with-nits). 2 Low + 4 nits, all verified against the code and addressed. - Low-1 — enter_el0's scrub omitted the FP-control/status + EL0 thread-ID register classes. It zeroed x0–x30 + v0–v31 but NOT FPCR/FPSR/TPIDR_EL0/ TPIDRRO_EL0 — all EL0-readable, with architecturally-UNKNOWN reset values on real HW (Pi 4), so "clean, defined register file" was over-claimed off-QEMU (FPCR governs EL0 rounding/FZ/default-NaN; Linux zeroes FPCR/FPSR on exec). enter_el0 now also `msr {fpcr,fpsr,tpidr_el0,tpidrro_el0}, xzr`. No current secret leak (kernel does no FP arithmetic, never writes TPIDR), but the EL0 environment is now defined on real HW, not just QEMU's zeroed reset. - Low-2 — the implicit x19/x20/lr/sp hand-off was pinned only by size_of == 168; a layout-preserving field reorder (e.g. fp/lr swap) would silently corrupt both context_switch_asm and the enter_el0 hand-off. Added offset_of! asserts for all five offsets (0/80/88/96/104) — hardens the pre-existing context_switch_asm coupling too. - Nit — add_user_task now debug_asserts user_sp 16-byte alignment (it becomes SP_EL0; SCTLR_EL1.SA0 faults a misaligned EL0 stack), symmetric with the kernel-stack assert. - Nit — the AS-must-be-active obligation (TTBR0 installed + EPD0 cleared at first dispatch) is now stated on the caller-facing # Safety of both init_user_context and add_user_task (it previously lived only on enter_el0's doc, which the future task_create_from_image author would not read). - Nit (process honesty) — the review-round-1 rider shipped WITH bbc42e8, after ADR-0037's Accept (546febc): the careful-re-read missed the scrub HIGH; recorded in ADR-0037 §Revision notes per ADR-0025 §Rule 2. - Nit — "67-instruction scrub" wording dropped (it is 63 register-zeros + 4 special-register writes); docs reworded. Docs synced: UNSAFE-2026-0032 (scrub scope), ADR-0037 §Revision notes (second rider), T-023 (AC + review-history), hal.md, current.md. Gates green: cargo fmt; host + kernel clippy -D warnings; 342 host tests; kernel build (offset_of! asserts + the new MSRs assemble); QEMU smoke byte-stable + fault-clean (mechanism still dormant); cargo +nightly miri test --workspace --exclude tyrne-bsp-qemu-virt clean (0 UB). Refs: T-023, ADR-0037, UNSAFE-2026-0032 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(task): T-023 review-round 3 — fix non-existent SHA, init_count doc drift, T-023 status/ACs A multi-agent adversarial review of the whole T-023 arc (b6549d7..HEAD; 5 lenses × per-finding skeptic verification) found NO code/security/correctness/ asm defect — the register scrub (incl FPCR/FPSR/TPIDR), the offset_of! asserts, the SPSR register form, the dormancy + gate ordering, all four ContextSwitch implementors, and the host tests were verified correct. The only confirmed findings were three doc-only consistency defects, all fixed: - The T-022 security review (2026-05-31) cited ADR-0033's Accept as `db392c1`, which does not exist (transposed digit) — corrected to `db892c1`. - `FakeContextSwitch::init_count()`'s rustdoc said "init_context calls" but the shared counter is now also incremented by `init_user_context`; doc broadened to match (mirrors the already-updated `FakeTaskContext.initialized` field doc). - T-023 was still `Status: Draft` with all nine ACs unchecked despite being implemented + gates-green; moved to "In Review (… awaiting explicit security review …)" per the T-022 house convention and checked the nine satisfied ACs. Gates re-confirmed: cargo fmt, host clippy -D warnings, 342 host tests (kernel binary unchanged → the prior byte-stable QEMU smoke + Miri-0-UB still hold). Refs: T-023, ADR-0037 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(test-hal): T-023 review-round 4 — init_context clears prior user markers on slot reuse FakeContextSwitch::init_context (the kernel path) seeded initialized/ entry_addr/stack_top but left is_user / user_sp untouched, so a context slot re-seeded user-task -> kernel-task would report stale is_user=true / a stale user_sp. init_context now fully re-seeds a kernel context: clears is_user and user_sp. Adds a regression test (init_context_clears_prior_user_markers_on_reuse). Test-double-only correctness; no runtime/kernel effect. Skipped (with reason) from the same review: - "msr {fpcr,fpsr,tpidr_el0,tpidrro_el0}, xzr leaks SP" — INVALID. For MSR/MRS, Rt=31 is XZR (the X[]/ZR accessor), not SP (the 31=SP rule is for load/store base + SP-form arithmetic). Verified: kernel-build assembles these cleanly, and the shipped `msr ttbr0_el1, xzr` disassembles as `msr TTBR0_EL1, xzr` (d518201f) and is smoke-clean (TTBR0 -> 0, would fault if SP). No change. - "factor the enter_el0 scrub into a .irp/macro" — SKIPPED. The explicit per-register form is deliberately maximally auditable for a security-critical scrub (every scrubbed register visible); the register set is fixed; no defect. - "give init_user_context a default impl (unimplemented!/delegate)" — SKIPPED. A required trait method already fails loudest (compile error) if an implementor omits it; unimplemented! would defer that to a runtime panic (against the kernel's panic-free discipline + kernel-clippy denies panic), and delegating to init_context would silently produce a broken EL0 entry. Gates: cargo fmt; host clippy -D warnings; 343 host tests (+1); Miri 0 UB (kernel binary unchanged -> prior byte-stable QEMU smoke holds). Refs: T-023 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary by Sourcery
Introduce the initial IPC subsystem with synchronous endpoint-based send/receive/notify operations, including capability transfer, and update documentation and roadmap to mark the kernel-object milestone complete and define the new IPC milestone.
New Features:
Enhancements:
Documentation:
Tests:
Summary by CodeRabbit
New Features
send,recv, andnotifyoperations for synchronous message passing between kernel objects with capability-gated access.Documentation