77 changes: 41 additions & 36 deletions notes/api-redesign-prototype/api_redesign_proposal.md

Current `boa_gc` uses implicit rooting via `Clone`/`Drop` on `Gc<T>`. Every clone touches root counts, adding overhead in hot VM paths. It also needs `thread_local`, blocking `no_std`.

This proposes lifetime-branded `Gc<'gc, T>` for zero-cost pointers and explicit `Root<'id, T>` for persistence.

## Core API


### Weak Reference Separation
```rust
pub struct WeakGc<'id, T: Trace + ?Sized> {
ptr: NonNull<GcBox<T>>,
_marker: PhantomData<*mut &'id ()>,
}

impl<'id, T: Trace + ?Sized> WeakGc<'id, T> {
pub fn upgrade<'gc>(&self, cx: &MutationContext<'id, 'gc>) -> Option<Gc<'gc, T>> { ... }
}
```
Weak references carry the same `'id` brand as the context they came from. `upgrade` requires a matching `MutationContext<'id, 'gc>`, so cross-context upgrade is a compile error.

The `'gc` lifetime ties the pointer to its collector. Copying is free: no root-count manipulation.

### Root for Persistence

```rust
pub struct Root<'id, T: Trace> {
raw: NonNull<RootNode<'id, T>>,
}

#[repr(C)]
pub(crate) struct RootNode<'id, T: Trace> {
link: RootLink, // at offset 0, bare link* == RootNode*
gc_ptr: NonNull<GcBox<T>>, // T: Sized keeps this thin for type-erased offset_of!
_marker: PhantomData<*mut &'id ()>,
}

impl<'id, T: Trace> Root<'id, T> {
pub fn get<'gc>(&self, _cx: &MutationContext<'id, 'gc>) -> Gc<'gc, T> { ... }
}

impl<'id, T: Trace> Drop for Root<'id, T> {
fn drop(&mut self) {
unsafe {
let node = Box::from_raw(self.raw.as_ptr());
if node.link.is_linked() {
RootLink::unlink(NonNull::from(&node.link));
}
}
}
}
```

`Root<'id, T>` escapes the `'gc` lifetime but is tied to the `GcContext<'id>` that created it. The node is heap-allocated via `Box::into_raw`, keeping its address stable for the intrusive list without requiring `Pin` on the public API. `Drop` reclaims the allocation after unlinking. Cross-context misuse is a compile error, not a runtime panic.

**No `Rc` required.** A root only needs its own embedded `prev`/`next` pointers to remove itself from the list. The `Collector` owns a **sentinel** node; insertion and removal are pure pointer surgery with no allocation and no reference counting.

### MutationContext

```rust
pub struct MutationContext<'id, 'gc> {
collector: &'gc Collector,
_marker: PhantomData<*mut &'id ()>,
}

impl<'id, 'gc> MutationContext<'id, 'gc> {
pub fn alloc<T: Trace>(&self, value: T) -> Gc<'gc, T> { ... }
pub fn alloc_weak<T: Trace>(&self, value: T) -> WeakGc<'id, T> { ... }
pub fn root<T: Trace>(&self, gc: Gc<'gc, T>) -> Root<'id, T> { ... }
pub fn collect(&self) { ... }
}
```
The `Collector` owns one **pinned sentinel** `RootLink` (a bare link node with no payload) as the list head:

```
Collector::sentinel -> root_a.link -> root_b.link -> root_c.link -> None
```

Roots insert themselves immediately after the sentinel via `RootLink::link_after`. During collection, `RootLink::iter_from_sentinel(sentinel)` starts from `sentinel.next`, so the sentinel itself is never yielded. For each link, `gc_ptr` is recovered via `offset_of!(RootNode<i32>, gc_ptr)` and used to mark the allocation. A `debug_assert_eq!` with a second concrete type verifies the offset is stable across all `T: Sized`.

### Entry Point

```rust
pub struct GcContext<'id> {
collector: Collector,
_marker: PhantomData<*mut &'id ()>,
}

pub fn with_gc<R, F: for<'id> FnOnce(GcContext<'id>) -> R>(f: F) -> R { ... }

impl<'id> GcContext<'id> {
pub fn mutate<R>(&self, f: impl for<'gc> FnOnce(&MutationContext<'id, 'gc>) -> R) -> R { ... }
}
```

`with_gc` is the only way to create a `GcContext`. The `for<'id>` bound gives each context a fresh, unique lifetime that cannot unify with any other context's `'id`. `GcContext::mutate` threads that same `'id` into every `MutationContext` produced inside the closure.

### Tracing Mechanism
Note: `trace` takes `&mut self` instead of `&self`, keeping the door open for potential moving-collector support.
| | Current `boa_gc` | Proposed |
| --- | --- | --- |
| **Rooting** | Implicit (inc/dec on clone/drop) | Explicit (`Root<T>`) |
| **Copy cost** | Cell write | Zero |
| **Drop cost** | TLS access (futex lock) | Zero (Copy type) |
| **Isolation** | Runtime only | Compile-time only |

## Why This Works

**Allocation**: Uses `mempool3::PoolAllocator` with size-class pooling instead of individual `Box` allocations, avoiding fragmentation.

**Safety**:
- Cross-context use of `Gc`, `Root`, and `WeakGc` is a compile error, not a runtime panic
- No `collector_id` field, no atomic counter, no branch in `Root::get`
- Explicit `!Send`/`!Sync` prevents threading bugs
- Intrusive sentinel-based linked list for O(1) insertion and self-unlink
- `Root` holds **no `Rc`**; unlink is pure pointer surgery on embedded `prev`/`next`
- Node address stability comes from `Box::into_raw`; `Pin` is not required on the public API

## Open Questions

78 changes: 21 additions & 57 deletions notes/api-redesign-prototype/prototype_findings.md
# Prototype Findings

Prototyping lifetime-branded GC API for Boa. Testing if `Gc<'gc, T>` + `Root<'id, T>` is viable.

Works, but migration will be challenging.

Fix: `RefCell` inside collector, take `&self`.

```rust
struct JsContext {
global_object: Root<'id, JsObject>, // escapes 'gc, tied to its GcContext<'id>
}
```

Root re-enters via `root.get(cx)` where `cx: &MutationContext<'id, 'gc>` must share the same `'id`.

### Cross-Context Safety via `'id` Brand

Problem: `Root<T>` from context A used with context B -> dangling pointer.

Solution: `with_gc` gives each context a fresh, unnamed `'id` lifetime via `for<'id>`. `Root<'id, T>` and `MutationContext<'id, 'gc>` share that brand, so the borrow checker rejects any mismatch at compile time:

```rust
impl<'id, T: Trace> Root<'id, T> {
pub fn get<'gc>(&self, _cx: &MutationContext<'id, 'gc>) -> Gc<'gc, T> { ... }
}
```

No runtime check, no `collector_id` field, no atomic counter. The compiler does all the work.

### Gc Access Safety

Taking the `intrusive_collections` crate as inspiration, here is what we adopted:
2. **O(1) Self Removal**: `unlink` drops nodes safely without a reference to the `Collector`.
3. **Double Unlink Protection**: `is_linked()` enforces safe dropping.
4. **Sentinel Node**: `Collector` owns a pinned `RootLink` as the list head.
5. **Type Erased Marking**: `RootNode<T>` is `#[repr(C)]` with `link` at offset 0. The GC walks the links and recovers `gc_ptr` using `offset_of!(RootNode<i32>, gc_ptr)`. A `debug_assert_eq!` with a second concrete type checks the offset is stable across all `T: Sized`. No `Trace` bound is needed.

#### Evolution of approaches


## Validated

**Compile-time isolation**: Borrow checker prevents mixing `Gc`, `Root`, and `WeakGc` from different contexts. Cross-context use is a compile error, not a runtime panic.

**Root cleanup**: Drop unlinking removes from root list. `Box::from_raw` reclaims the node allocation.

**Interior Mutability Tracing**: Using `GcRefCell<T>` allows `RefCell` semantics to persist efficiently while fulfilling `Trace` safety requirements without borrowing errors.

**Branded Weak Binding**: `WeakGc<'id, T>` carries the same context brand. `upgrade` requires a matching `MutationContext<'id, 'gc>`, so cross-context upgrade is also a compile error.

**Functional Builtin Prototyping**: Explicit tests matching exactly against definitions like `Array.prototype.push` (taking a `&Gc<'gc, GcRefCell<JsArray<'gc>>>` + `arg` buffer bound to `_cx: &MutationContext<'id, 'gc>`) compiled gracefully and safely.

### Performance


**Migration**: Boa has thousands of `Gc<T>` uses, and `'gc` must be added everywhere. A gradual phase-in starting with isolated systems is feasible.

### Root Node Stability via `Box::into_raw`

`Pin<Box<Root<T>>>` was the original approach: pinning kept the intrusive list node address stable.

The current approach is simpler: `cx.root()` allocates the node with `Box::new`, calls `Box::into_raw` immediately, and stores the raw `NonNull` inside a thin `Root<'id, T>` handle. The heap address is stable by construction. `Drop` calls `Box::from_raw` to reclaim it after unlinking.

This removes `Pin` from the public API entirely. `root()` returns `Root<'id, T>` (one word on the stack), not `Pin<Box<Root<T>>>`. The cost is still one heap allocation per escaping root, same as before.


## Conclusion

`Gc<'gc, T>` + `Root<'id, T>` is:
- **Sound**: Compile-time catches all cross-context misuse for `Gc`, `Root` and `WeakGc`
- **Fast**: Zero cost transient pointers, no atomic counters, no branch in `Root::get`
- **Feasible**: Can coexist with current API

The main risk is migration effort; we can go with the phased approach.