Skip to content

refactor(ocap-kernel): Extract VatManager and SubclusterManager from Kernel class#651

Merged
sirtimid merged 6 commits intomainfrom
sirtimid/extract-vat-managers
Oct 15, 2025
Merged

refactor(ocap-kernel): Extract VatManager and SubclusterManager from Kernel class#651
sirtimid merged 6 commits intomainfrom
sirtimid/extract-vat-managers

Conversation

@sirtimid
Copy link
Copy Markdown
Contributor

@sirtimid sirtimid commented Sep 30, 2025

Ref: #632

Split the monolithic Kernel class into focused manager classes to improve separation of concerns and maintainability. Created VatManager to handle all vat lifecycle operations (launch, terminate, restart, ping) and SubclusterManager for subcluster operations (launch, reload, terminate). This refactoring reduces the Kernel class from handling 15+ responsibilities to focusing on core coordination, while delegating vat and subcluster management to specialized classes. Added comprehensive test coverage for both new manager classes.


Note

Split vat and subcluster lifecycle logic into new managers, refactoring Kernel to delegate to them and adding comprehensive tests with updated coverage thresholds.

  • Core refactor:
    • Kernel: Replaces in-class vat/subcluster lifecycle logic with delegation to VatManager and SubclusterManager (e.g., initializeAllVats, launchSubcluster, terminateSubcluster, reloadSubcluster, restartVat, terminateVat, getVats/Ids, pin/unpin, pingVat, collectGarbage, reload). Removes direct vat map management and related helpers.
    • New Managers:
      • vats/VatManager.ts: Handles vat lifecycle (run/launch/restart/terminate/pin/unpin/ping, reap, collect garbage, initialize from store).
      • vats/SubclusterManager.ts: Handles subcluster lifecycle (launch/reload/terminate, membership queries, reload all subclusters).
  • Tests:
    • Add vats/VatManager.test.ts and vats/SubclusterManager.test.ts with extensive coverage of manager behaviors.
    • Update Kernel.test.ts to align with delegation and behavior changes (e.g., restartVat now returns the new handle and preserves vatId).
  • Build/Config:
    • Tweak coverage thresholds in vitest.config.ts for packages/ocap-kernel/**.

Written by Cursor Bugbot for commit e12bf61. This will update automatically on new commits. Configure here.

@sirtimid sirtimid requested a review from a team as a code owner September 30, 2025 19:30
@sirtimid sirtimid enabled auto-merge (squash) September 30, 2025 19:42
cursor[bot]

This comment was marked as outdated.

@sirtimid sirtimid force-pushed the sirtimid/extract-vat-managers branch from af6dcc8 to 9bc842d Compare October 2, 2025 17:42
cursor[bot]

This comment was marked as outdated.

Copy link
Copy Markdown
Member

@rekmarks rekmarks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this PR and its sibling #653, I'm left with some questions that I don't have easy answers to. The Kernel.ts we're left with after #653 (ref) is much shorter than on current main (ref). It maintains its existing interface, and the primary difference is that the body of function calls into a method that lives on one of the new manager classes. In other words, the refactor amounts to turning the Kernel into a facade.

Normally, I understand a facade to be something you erect on top of a messy agglomeration of stuff to present a single interface to some consumer. Here, we already have a single interface for our consumers in the form of the Kernel class, and are instead rearranging dividing its internals into separate components.

The goal of this refactor is to improve maintainability. It seems that the refactor is a net gain in tests, and I imagine it may be difficult to unit test the Kernel as thoroughly as we have unit tested its constituent components in this PR. However, I wonder if
coordinating the new sub-components of the kernel will be more work than keeping them in a single place? I have seen this kind of "fat"
orchestrator class grow beyond all reason, but I don't believe we are there yet with the kernel.

I'm curious as to how you view the tradeoffs here @sirtimid.

@sirtimid
Copy link
Copy Markdown
Contributor Author

sirtimid commented Oct 3, 2025

Looking at this PR and its sibling #653, I'm left with some questions that I don't have easy answers to. The Kernel.ts we're left with after #653 (ref) is much shorter than on current main (ref). It maintains its existing interface, and the primary difference is that the body of function calls into a method that lives on one of the new manager classes. In other words, the refactor amounts to turning the Kernel into a facade.

Normally, I understand a facade to be something you erect on top of a messy agglomeration of stuff to present a single interface to some consumer. Here, we already have a single interface for our consumers in the form of the Kernel class, and are instead rearranging dividing its internals into separate components.

The goal of this refactor is to improve maintainability. It seems that the refactor is a net gain in tests, and I imagine it may be difficult to unit test the Kernel as thoroughly as we have unit tested its constituent components in this PR. However, I wonder if coordinating the new sub-components of the kernel will be more work than keeping them in a single place? I have seen this kind of "fat" orchestrator class grow beyond all reason, but I don't believe we are there yet with the kernel.

I'm curious as to how you view the tradeoffs here @sirtimid.

@rekmarks I would argue this isn't a facade pattern, but rather a separation of concerns refactor following the Single Responsibility Principle. We're not hiding complexity, we're organizing it better. We didn't think "Kernel is hard to use, let's add a simple interface." We thought "Kernel is doing too many things in one place, let's organize it better." Here's why it's worth it:

Clearer Responsibilities

Instead of one 600+ line file doing everything, we now have smaller, focused files:

  • VatManager (317 lines) - manages vat lifecycle
  • SubclusterManager (226 lines) - handles vat clusters
  • RemoteManager (196 lines) - handles remote connections
  • KernelServiceManager (146 lines) - manages kernel services
  • OcapURLManager (124 lines) - handles OCAP URL operations

Concrete Benefits

  • Safer to change: When fixing remote communication bugs, you can't accidentally break vat management since they're in separate files
  • Better testing: We added 56 focused tests with 100% coverage. Testing everything through the main Kernel class would be much harder
  • Simple coordination: The managers don't require complex orchestration - they're passed shared dependencies (store, queue) during construction and operate independently

Why Refactor Now?

While 600 lines isn't "unreasonably large" yet, waiting until it gets worse makes refactoring harder. The code was already naturally grouped (all remote methods together, all service methods together), we just made those boundaries official. Early refactoring prevents the technical debt that comes from waiting until it's "bad enough."

Addressing Coordination Overhead

You raised a valid concern about coordination complexity. In practice, it's minimal:
The managers are mostly independent:

  • KernelServiceManager doesn't interact with RemoteManager
  • VatManager doesn't know about OcapURLManager
  • They share dependencies (KernelStore, KernelQueue) but not mutable state

The coordination is minimal, managers share kernelStore/kernelQueue and a few necessary callbacks where domains overlap (like SubclusterManager needing to get kernel services). But there's no complex state synchronization or distributed transactions. Each does its own job without requiring synchronization with others.

Changes stay localized:

  • Bug in OCAP URLs? Only touch OcapURLManager (124 lines)
  • New vat feature? Only modify VatManager (317 lines)
  • Compare to searching through 600+ lines to find the right section

The coordination "cost" is essentially zero, just one function call, while the benefits (focused testing, safer changes, clearer boundaries) are immediate.

The Public API Benefit

Most importantly, the Kernel now clearly defines our public API. Before, it mixed public methods with internal implementation details. Now, anyone reading Kernel.ts immediately sees the complete public API without wading through implementation code. The one-liner delegations aren't boilerplate, they're explicit declarations of what's public.

The refactor makes the codebase more intentional: public API in Kernel, implementation in managers. What do you think works better for the team?

cursor[bot]

This comment was marked as outdated.

@sirtimid sirtimid requested a review from rekmarks October 3, 2025 16:40
rekmarks
rekmarks previously approved these changes Oct 3, 2025
Copy link
Copy Markdown
Member

@rekmarks rekmarks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, let's see how this plays out!

@sirtimid sirtimid disabled auto-merge October 3, 2025 17:43
cursor[bot]

This comment was marked as outdated.

@sirtimid sirtimid force-pushed the sirtimid/extract-vat-managers branch from 2fc63d8 to 3b3846a Compare October 13, 2025 17:15
@sirtimid sirtimid requested a review from rekmarks October 13, 2025 17:15
throw new SubclusterNotFoundError(subclusterId);
}
const vatIdsToTerminate = this.#kernelStore.getSubclusterVats(subclusterId);
for (const vatId of vatIdsToTerminate.reverse()) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Array Mutation in Subcluster Management

The terminateSubcluster and reloadSubcluster methods directly mutate arrays (like vatIdsToTerminate and subcluster.vats) by calling .reverse(). This mutates what might be internal kernel store data, potentially corrupting state and causing unexpected behavior for other parts of the system.

Additional Locations (1)

Fix in Cursor Fix in Web

@sirtimid sirtimid merged commit 024f0bc into main Oct 15, 2025
23 checks passed
@sirtimid sirtimid deleted the sirtimid/extract-vat-managers branch October 15, 2025 07:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants