feat: add runners to pb protocol #2433

MasterPtato · 2025-05-09T01:33:37Z

Changes

MasterPtato · 2025-05-09T01:33:59Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

_{An organization admin has enabled the Graphite Merge Queue in this repository.} _{Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.}

This stack of pull requests is managed by Graphite. Learn more about stacking.

cloudflare-workers-and-pages · 2025-05-09T01:37:28Z

Deploying rivet with Cloudflare Pages

Latest commit:	`cdf9857`
Status:	🚫 Build failed.

View logs

cloudflare-workers-and-pages · 2025-05-17T01:22:13Z

Deploying rivet-studio with Cloudflare Pages

Latest commit:	`cdf9857`
Status:	🚫 Build failed.

View logs

cloudflare-workers-and-pages · 2025-05-17T01:25:45Z

Deploying rivet-hub with Cloudflare Pages

Latest commit:	`cdf9857`
Status:	🚫 Build failed.

View logs

greptile-apps

PR Summary

This PR introduces a major architectural change to the Rivet project by adding runners to the pegboard protocol and removing the isolate-v8-runner implementation. The changes consolidate runner management into a more generic protocol while maintaining backward compatibility.

Removes isolate-v8-runner package and related JavaScript runtime functionality, suggesting a shift away from V8-based isolation
Adds new actor2 workflow with support for multi-slot runners and improved state management in packages/edge/services/pegboard/src/workflows/actor2/
Introduces runner tracking by remaining slots with new key structures in packages/edge/services/pegboard/src/keys/runner.rs
Refactors container runner to use generic runner terminology and WebSocket-based communication in packages/edge/infra/client/container-runner/
Adds subnet configuration and IP management for runners in pegboard_configure.sh with new variables like SUBNET_IPV4 and SUBNET_IPV6

_{76 file(s) reviewed, 49 comment(s)}
_{Edit PR Review Bot Settings | Greptile}

greptile-apps · 2025-05-21T01:40:39Z

docker/universal/Dockerfile

@@ -47,10 +47,10 @@ RUN \
    --mount=type=cache,target=/root/.cache,id=universal-user-cache \
    --mount=type=cache,target=/root/.npm,id=universal-user-npm \
    --mount=type=cache,target=/root/.yarn,id=universal-user-yarn \
-    RUSTFLAGS="--cfg tokio_unstable" RIVET_BUILD_HUB=0 cargo build --bin rivet-server --bin rivet-edge-server --bin rivet-guard --bin rivet-client --bin rivet-isolate-v8-runner --bin rivet-container-runner && \
+    RUSTFLAGS="--cfg tokio_unstable" RIVET_BUILD_HUB=0 cargo build --bin rivet-server --bin rivet-edge-server --bin rivet-guard --bin rivet-client --bin rivet-container-runner && \


logic: The rivet-isolate-v8-runner binary has been removed from the build command. Verify that this functionality is either no longer needed or has been replaced elsewhere in the codebase.

greptile-apps · 2025-05-21T01:41:08Z

examples/system-test-actor/tests/client.ts

+						http2: {
+							protocol: "http",
+							internalPort: 8085,
+							routing: {
+								guard: {},
+							},
+						},


logic: HTTP2 port is configured but no tests are implemented to verify HTTP2 functionality

greptile-apps · 2025-05-21T01:42:24Z

packages/common/hub-embed/build.rs

+	// Copy dist directory to out_dir
+	let dist_path = hub_path.join("dist");
+	fs_extra::dir::copy(
+		dist_path.clone(),
+		out_dir,
+		&fs_extra::dir::CopyOptions::new().content_only(true),
+	)?;


logic: This will fail if the dist directory doesn't exist when RIVET_BUILD_HUB is false. The old code created an empty directory in this case, but now it tries to copy a potentially non-existent directory.

Suggested change

// Copy dist directory to out_dir

let dist_path = hub_path.join("dist");

fs_extra::dir::copy(

dist_path.clone(),

out_dir,

&fs_extra::dir::CopyOptions::new().content_only(true),

)?;

// Copy dist directory to out_dir if it exists

let dist_path = hub_path.join("dist");

if dist_path.exists() {

fs_extra::dir::copy(

dist_path.clone(),

out_dir,

&fs_extra::dir::CopyOptions::new().content_only(true),

)?;

} else {

// Create empty directory when dist doesn't exist

fs::create_dir_all(out_dir)?

}

greptile-apps · 2025-05-21T01:42:25Z

packages/common/hub-embed/build.rs

+	println!("cargo:rerun-if-changed={}", hub_path.display());
+	println!("cargo:rerun-if-env-changed=FONTAWESOME_PACKAGE_TOKEN");


style: These rerun-if-changed triggers are now outside the build conditional, meaning they'll trigger rebuilds even when RIVET_BUILD_HUB is false.

greptile-apps · 2025-05-21T01:42:57Z

packages/edge/api/actor/src/route/actors.rs

-		let mut create_sub = ctx
-			.subscribe::<pegboard::workflows::actor::CreateComplete>(("actor_id", actor_id))
+
+	if let build::types::BuildAllocationType::None = build.allocation_type {


style: Consider extracting common workflow setup code to avoid duplication between allocation type branches

greptile-apps · 2025-05-21T01:59:20Z

packages/edge/services/pegboard/src/protocol.rs

+pub struct RunnerConfig {
+	pub image: Image,
+	pub root_user_enabled: bool,
+	pub resources: Resources,
+	pub env: HashableMap<String, String>,
+	pub ports: HashableMap<String, Port>,
+	pub network_mode: NetworkMode,
+}


style: RunnerConfig duplicates fields from deprecated ActorConfig. Consider reusing the existing types to maintain consistency.

greptile-apps · 2025-05-21T01:59:46Z

packages/edge/services/pegboard/src/workflows/actor2/runtime.rs

+			if let BuildAllocationType::Multi = input.build_allocation_type {
+				loop {
+					let Some(entry) = stream.try_next().await? else {
+						return Ok(None);
+					};
+
+					let old_runner_allocation_key = keys::subspace()
+						.unpack::<keys::datacenter::RunnersByRemainingSlotsKey>(entry.key())
+						.map_err(|x| fdb::FdbBindingError::CustomError(x.into()))?;
+
+					let data = old_runner_allocation_key
+						.deserialize(entry.value())
+						.map_err(|x| fdb::FdbBindingError::CustomError(x.into()))?;
+
+					// Add read conflict only for this key
+					tx.add_conflict_range(
+						entry.key(),
+						&end_of_key_range(entry.key()),
+						ConflictRangeType::Read,
+					)?;
+
+					// Clear old entry
+					tx.clear(entry.key());
+
+					let new_remaining_slots =
+						old_runner_allocation_key.remaining_slots.saturating_sub(1);
+
+					// Write new allocation key with 1 less slot
+					let new_allocation_key = keys::datacenter::RunnersByRemainingSlotsKey::new(
+						input.image_id,
+						new_remaining_slots,
+						old_runner_allocation_key.runner_id,
+					);
+					tx.set(&keys::subspace().pack(&new_allocation_key), entry.value());
+
+					// Update runner record
+					let remaining_slots_key =
+						keys::runner::RemainingSlotsKey::new(old_runner_allocation_key.runner_id);
+					tx.set(
+						&keys::subspace().pack(&remaining_slots_key),
+						&remaining_slots_key
+							.serialize(new_remaining_slots)
+							.map_err(|x| fdb::FdbBindingError::CustomError(x.into()))?,
+					);
+
+					// Insert actor index key
+					let client_actor_key =
+						keys::client::ActorKey::new(data.client_id, input.actor_id);
+					tx.set(
+						&keys::subspace().pack(&client_actor_key),
+						&client_actor_key
+							.serialize(input.generation)
+							.map_err(|x| fdb::FdbBindingError::CustomError(x.into()))?,
+					);
+
+					return Ok(Some(AllocateActorOutput {
+						runner_id: old_runner_allocation_key.runner_id,
+						new_runner: false,
+						client_id: data.client_id,
+						client_workflow_id: data.client_workflow_id,
+					}));
+				}
+			}


logic: The allocation loop for multi-type builds lacks a check for remaining_slots > 0, which could lead to allocating to runners with no capacity

greptile-apps · 2025-05-21T01:59:47Z

packages/edge/services/pegboard/src/workflows/actor2/runtime.rs

+				if let BuildAllocationType::Multi = input.build_allocation_type {
+					let remaining_slots = input.build_allocation_total_slots.saturating_sub(1);
+					let total_slots = input.build_allocation_total_slots;
+
+					// Insert runner records
+					let remaining_slots_key = keys::runner::RemainingSlotsKey::new(runner_id);
+					tx.set(
+						&keys::subspace().pack(&remaining_slots_key),
+						&remaining_slots_key
+							.serialize(remaining_slots)
+							.map_err(|x| fdb::FdbBindingError::CustomError(x.into()))?,
+					);
+
+					let total_slots_key = keys::runner::TotalSlotsKey::new(runner_id);
+					tx.set(
+						&keys::subspace().pack(&total_slots_key),
+						&total_slots_key
+							.serialize(total_slots)
+							.map_err(|x| fdb::FdbBindingError::CustomError(x.into()))?,
+					);
+
+					// Insert runner index key
+					let runner_idx_key = keys::datacenter::RunnersByRemainingSlotsKey::new(
+						input.image_id,
+						remaining_slots,
+						runner_id,
+					);
+					tx.set(
+						&keys::subspace().pack(&runner_idx_key),
+						&runner_idx_key
+							.serialize(keys::datacenter::RunnersByRemainingSlotsKeyData {
+								client_id: old_client_allocation_key.client_id,
+								client_workflow_id,
+							})
+							.map_err(|x| fdb::FdbBindingError::CustomError(x.into()))?,
+					);
+				}


logic: Runner records are only created when build_allocation_type is Multi, but this creates an inconsistency since Single type builds should also have runner records

Suggested change

if let BuildAllocationType::Multi = input.build_allocation_type {

let remaining_slots = input.build_allocation_total_slots.saturating_sub(1);

let total_slots = input.build_allocation_total_slots;

// Insert runner records

let remaining_slots_key = keys::runner::RemainingSlotsKey::new(runner_id);

tx.set(

&keys::subspace().pack(&remaining_slots_key),

&remaining_slots_key

.serialize(remaining_slots)

.map_err(|x| fdb::FdbBindingError::CustomError(x.into()))?,

);

let total_slots_key = keys::runner::TotalSlotsKey::new(runner_id);

tx.set(

&keys::subspace().pack(&total_slots_key),

&total_slots_key

.serialize(total_slots)

.map_err(|x| fdb::FdbBindingError::CustomError(x.into()))?,

);

// Insert runner index key

let runner_idx_key = keys::datacenter::RunnersByRemainingSlotsKey::new(

input.image_id,

remaining_slots,

runner_id,

);

tx.set(

&keys::subspace().pack(&runner_idx_key),

&runner_idx_key

.serialize(keys::datacenter::RunnersByRemainingSlotsKeyData {

client_id: old_client_allocation_key.client_id,

client_workflow_id,

})

.map_err(|x| fdb::FdbBindingError::CustomError(x.into()))?,

);

}

// Insert runner records for all allocation types

let (remaining_slots, total_slots) = match input.build_allocation_type {

BuildAllocationType::Multi => {

(input.build_allocation_total_slots.saturating_sub(1), input.build_allocation_total_slots)

},

BuildAllocationType::Single => (0, 1),

};

// Insert runner records

let remaining_slots_key = keys::runner::RemainingSlotsKey::new(runner_id);

tx.set(

&keys::subspace().pack(&remaining_slots_key),

&remaining_slots_key

.serialize(remaining_slots)

.map_err(|x| fdb::FdbBindingError::CustomError(x.into()))?,

);

let total_slots_key = keys::runner::TotalSlotsKey::new(runner_id);

tx.set(

&keys::subspace().pack(&total_slots_key),

&total_slots_key

.serialize(total_slots)

.map_err(|x| fdb::FdbBindingError::CustomError(x.into()))?,

);

// Insert runner index key

let runner_idx_key = keys::datacenter::RunnersByRemainingSlotsKey::new(

input.image_id,

remaining_slots,

runner_id,

);

tx.set(

&keys::subspace().pack(&runner_idx_key),

&runner_idx_key

.serialize(keys::datacenter::RunnersByRemainingSlotsKeyData {

client_id: old_client_allocation_key.client_id,

client_workflow_id,

})

.map_err(|x| fdb::FdbBindingError::CustomError(x.into()))?,

);

greptile-apps · 2025-05-21T01:59:47Z

packages/edge/services/pegboard/src/workflows/actor2/runtime.rs

+				destroy::clear_ports_and_resources(
+					input.actor_id,
+					input.image_id,
+					Some(build.allocation_type),
+					ingress_ports,
+					Some(input.runner_id),
+					Some(input.client_id),
+					Some(input.client_workflow_id),
+					selected_resources_memory_mib,
+					selected_resources_cpu_millicores,
+					&tx,
+				)
+				.await
+			}


logic: clear_ports_and_resources is called with build.allocation_type before checking if build exists, which could cause a panic if build is not found

Suggested change

destroy::clear_ports_and_resources(

input.actor_id,

input.image_id,

Some(build.allocation_type),

ingress_ports,

Some(input.runner_id),

Some(input.client_id),

Some(input.client_workflow_id),

selected_resources_memory_mib,

selected_resources_cpu_millicores,

&tx,

)

.await

}

destroy::clear_ports_and_resources(

input.actor_id,

input.image_id,

build.map(|b| b.allocation_type),

ingress_ports,

Some(input.runner_id),

Some(input.client_id),

Some(input.client_workflow_id),

selected_resources_memory_mib,

selected_resources_cpu_millicores,

&tx,

)

.await

}

greptile-apps · 2025-05-21T01:59:55Z

packages/edge/services/pegboard/src/workflows/mod.rs

@@ -1,2 +1,3 @@
 pub mod actor;
+pub mod actor2;


style: Consider using a more descriptive name like actor_v2 or similar instead of actor2 to better indicate this is a new version

claude · 2025-07-03T01:32:42Z

Claude finished @MasterPtato's task —— View job

PR Review: feat: add runners to pb protocol

I've completed a comprehensive review of this major architectural change. This PR introduces significant improvements to the Rivet platform by adding runner support to the pegboard protocol while removing the isolate-v8-runner implementation.

Review Checklist:

Read repository guidelines (CLAUDE.md)
Analyze core architectural changes (actor2 workflow)
Review protocol and runner management changes
Examine security implications
Check error handling and resource management
Assess test coverage
Provide comprehensive feedback

🏗️ Architecture & Design

Strengths:

Unified Runner Protocol: The consolidation from separate isolate and container runners to a generic runner protocol is a solid architectural improvement
Multi-slot Support: The new slot-based allocation system (BuildAllocationType::Multi) enables better resource utilization
Actor2 Workflow: Clean separation between actor and actor2 workflows allows for incremental migration
Database Design: The new runner tracking in FDB with remaining/total slots provides efficient allocation lookups

Concerns:

Breaking Change Scope: Removing the entire isolate-v8-runner package without clear migration path could break existing JavaScript runtime functionality
Dual Workflow Complexity: Having both actor and actor2 workflows increases cognitive load and maintenance burden

🐛 Potential Bugs & Issues

Critical Issues

Runner Allocation Logic Gap (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:255):

// Missing remaining_slots > 0 check for multi-type builds
if let BuildAllocationType::Multi = input.build_allocation_type {
    loop {
        // Should verify remaining_slots > 0 before allocation

SQL Parameter Mismatch (packages/edge/infra/client/manager/src/runner/mod.rs:510):
```
SET delete_ts = ?3  -- Should be ?2, missing bind parameter
WHERE
```
Race Condition in Runner Socket (packages/edge/infra/client/manager/src/ctx.rs:877):
Runner socket connection could still occur between the connection check lines.

Build Type Validation Error (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:1021):

// build.allocation_type accessed before checking if build exists
let build = unwrap_with!(build_res.builds.first(), BUILD_NOT_FOUND);
// Later used without null check
destroy::clear_ports_and_resources(..., build.allocation_type, ...)

Security Concerns

Hardcoded UUIDs (packages/edge/infra/client/echo/src/main.rs:68):

let payload = json!({
    "init": {
        "runner_id": Uuid::nil(), // Should be unique UUID
    },
});

Missing Input Validation (packages/edge/infra/client/echo/src/main.rs:107):
No error handling for missing actor_id or generation fields in packets.

🚀 Performance Considerations

Improvements:

FDB Allocation Strategy: Using SNAPSHOT reads for allocation prevents conflicts while maintaining consistency
Bin-packing Algorithm: Reverse ordering in allocation queries optimizes resource utilization
Connection Pooling: WebSocket connection management looks efficient

Potential Issues:

Tokio "full" Feature: packages/edge/infra/client/echo/Cargo.toml enables all Tokio features unnecessarily
Allocation Loop Performance: The multi-slot allocation loop could benefit from early termination checks
Ping Thread Cleanup: Ping threads continue running even after main WebSocket connections close

🔒 Security Assessment

Positive:

Process Isolation: Container runner maintains proper process separation
Signal Handling: Appropriate use of Unix signals for process management
Resource Bounds: Memory and CPU limits properly enforced

Concerns:

Error Exposure: WebSocket error messages might leak internal system information
Resource Exhaustion: Missing rate limiting could allow unconstrained logging (removed throttle.rs)

🧪 Test Coverage

Missing Tests:

Runner Lifecycle: No comprehensive tests for runner creation, allocation, and cleanup
Error Scenarios: Limited testing of failure modes and edge cases
Protocol Validation: Missing tests for malformed protocol messages
Multi-slot Allocation: No tests for the new slot-based allocation system

Existing Tests:

Basic client state tests exist but assume generation 0
Integration tests are minimal given the scope of changes

📝 Code Quality Issues

Style & Maintainability

Inconsistent Naming: actor2 should be actor_v2 for clarity
TODO Overuse: Multiple unresolved TODOs in critical paths
Deprecated Fields: Multiple deprecated protocol fields still marked as required
Comment Accuracy: TotalSlotsKey comment says "MiB" but measures slots

Error Handling

Incomplete Error Paths: Several unwrap() calls that should handle errors gracefully
Missing Context: Error messages lack sufficient debugging information
Inconsistent Returns: Some functions return Ok(()) where errors should be propagated

🔧 Recommendations

Immediate Fixes Required

Fix SQL Parameter Binding: Correct the missing parameter in runner deletion query
Add Remaining Slots Check: Validate slot availability before allocation attempts
Improve Error Handling: Replace unwrap() calls with proper error handling
Fix UUID Generation: Use Uuid::new_v4() instead of Uuid::nil() for runner IDs

Architecture Improvements

Migration Strategy: Provide clear documentation for migrating from isolate-v8-runner
Test Coverage: Add comprehensive integration tests for the new runner system
Monitoring: Implement metrics for runner allocation success/failure rates
Documentation: Update API documentation to reflect the new allocation types

Performance Optimizations

Resource Cleanup: Implement proper cleanup for ping threads and WebSocket connections
Feature Minimization: Use specific Tokio features instead of "full"
Allocation Efficiency: Add early termination conditions in allocation loops

✅ Overall Assessment

This PR represents a significant and necessary architectural improvement that consolidates runner management and enables multi-slot allocation. While the core design is sound, several critical bugs and missing error handling need to be addressed before merging.

Verdict: Approve with required changes - The architectural direction is correct, but the implementation needs refinement in error handling, test coverage, and bug fixes identified above.

Priority Issues to Address:

SQL parameter binding fix (High)
Remaining slots validation (High)
Error handling improvements (High)
Test coverage expansion (Medium)
Documentation updates (Medium)

claude · 2025-07-03T01:36:09Z

Claude finished @MasterPtato's task —— View job

PR Review: feat: add runners to pb protocol

I've completed a comprehensive review of this major architectural change. This PR introduces significant improvements to the Rivet platform by adding runner support to the pegboard protocol while removing the isolate-v8-runner implementation.

Review Checklist:

Read repository guidelines (CLAUDE.md)
Analyze core architectural changes (actor2 workflow)
Review protocol and runner management changes
Examine security implications
Check error handling and resource management
Assess test coverage
Provide comprehensive feedback

🏗️ Architecture & Design

Strengths:

Unified Runner Protocol: The consolidation from separate isolate and container runners to a generic runner protocol is a solid architectural improvement
Multi-slot Support: The new slot-based allocation system (BuildAllocationType::Multi) enables better resource utilization
Actor2 Workflow: Clean separation between actor and actor2 workflows allows for incremental migration
Database Design: The new runner tracking in FDB with remaining/total slots provides efficient allocation lookups

Concerns:

Breaking Change Scope: Removing the entire isolate-v8-runner package without clear migration path could break existing JavaScript runtime functionality
Dual Workflow Complexity: Having both actor and actor2 workflows increases cognitive load and maintenance burden

🐛 Potential Bugs & Issues

Critical Issues

Runner Allocation Logic Gap (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:255):

// Missing remaining_slots > 0 check for multi-type builds
if let BuildAllocationType::Multi = input.build_allocation_type {
    loop {
        // Should verify remaining_slots > 0 before allocation

SQL Parameter Mismatch (packages/edge/infra/client/manager/src/runner/mod.rs:510):
```
SET delete_ts = ?3  -- Should be ?2, missing bind parameter
WHERE
```
Race Condition in Runner Socket (packages/edge/infra/client/manager/src/ctx.rs:877):
Runner socket connection could still occur between the connection check lines.

Build Type Validation Error (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:1021):

// build.allocation_type accessed before checking if build exists
let build = unwrap_with!(build_res.builds.first(), BUILD_NOT_FOUND);
// Later used without null check
destroy::clear_ports_and_resources(..., build.allocation_type, ...)

Security Concerns

Hardcoded UUIDs (packages/edge/infra/client/echo/src/main.rs:68):

let payload = json!({
    "init": {
        "runner_id": Uuid::nil(), // Should be unique UUID
    },
});

Missing Input Validation (packages/edge/infra/client/echo/src/main.rs:107):
No error handling for missing actor_id or generation fields in packets.

🚀 Performance Considerations

Improvements:

FDB Allocation Strategy: Using SNAPSHOT reads for allocation prevents conflicts while maintaining consistency
Bin-packing Algorithm: Reverse ordering in allocation queries optimizes resource utilization
Connection Pooling: WebSocket connection management looks efficient

Potential Issues:

Tokio "full" Feature: packages/edge/infra/client/echo/Cargo.toml enables all Tokio features unnecessarily
Allocation Loop Performance: The multi-slot allocation loop could benefit from early termination checks
Ping Thread Cleanup: Ping threads continue running even after main WebSocket connections close

🔒 Security Assessment

Positive:

Process Isolation: Container runner maintains proper process separation
Signal Handling: Appropriate use of Unix signals for process management
Resource Bounds: Memory and CPU limits properly enforced

Concerns:

Error Exposure: WebSocket error messages might leak internal system information
Resource Exhaustion: Missing rate limiting could allow unconstrained logging (removed throttle.rs)

🧪 Test Coverage

Missing Tests:

Runner Lifecycle: No comprehensive tests for runner creation, allocation, and cleanup
Error Scenarios: Limited testing of failure modes and edge cases
Protocol Validation: Missing tests for malformed protocol messages
Multi-slot Allocation: No tests for the new slot-based allocation system

Existing Tests:

Basic client state tests exist but assume generation 0
Integration tests are minimal given the scope of changes

📝 Code Quality Issues

Style & Maintainability

Inconsistent Naming: actor2 should be actor_v2 for clarity
TODO Overuse: Multiple unresolved TODOs in critical paths
Deprecated Fields: Multiple deprecated protocol fields still marked as required
Comment Accuracy: TotalSlotsKey comment says "MiB" but measures slots

Error Handling

Incomplete Error Paths: Several unwrap() calls that should handle errors gracefully
Missing Context: Error messages lack sufficient debugging information
Inconsistent Returns: Some functions return Ok(()) where errors should be propagated

🔧 Recommendations

Immediate Fixes Required

Fix SQL Parameter Binding: Correct the missing parameter in runner deletion query
Add Remaining Slots Check: Validate slot availability before allocation attempts
Improve Error Handling: Replace unwrap() calls with proper error handling
Fix UUID Generation: Use Uuid::new_v4() instead of Uuid::nil() for runner IDs

Architecture Improvements

Migration Strategy: Provide clear documentation for migrating from isolate-v8-runner
Test Coverage: Add comprehensive integration tests for the new runner system
Monitoring: Implement metrics for runner allocation success/failure rates
Documentation: Update API documentation to reflect the new allocation types

Performance Optimizations

Resource Cleanup: Implement proper cleanup for ping threads and WebSocket connections
Feature Minimization: Use specific Tokio features instead of "full"
Allocation Efficiency: Add early termination conditions in allocation loops

✅ Overall Assessment

This PR represents a significant and necessary architectural improvement that consolidates runner management and enables multi-slot allocation. While the core design is sound, several critical bugs and missing error handling need to be addressed before merging.

Verdict: Approve with required changes - The architectural direction is correct, but the implementation needs refinement in error handling, test coverage, and bug fixes identified above.

Priority Issues to Address:

SQL parameter binding fix (High)
Remaining slots validation (High)
Error handling improvements (High)
Test coverage expansion (Medium)
Documentation updates (Medium)

coderabbitai · 2025-07-04T01:27:15Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

claude · 2025-07-04T01:31:19Z

Claude finished @MasterPtato's task —— View job

PR Review: feat: add runners to pb protocol

I've completed a comprehensive review of this major architectural change. This PR introduces significant improvements to the Rivet platform by adding runner support to the pegboard protocol while removing the isolate-v8-runner implementation.

Review Checklist:

Read repository guidelines (CLAUDE.md)
Analyze overall architecture and design decisions
Review actor2 workflow implementation
Examine runner management and protocol changes
Check security implications
Assess error handling and edge cases
Evaluate performance considerations
Review test coverage
Provide structured feedback and recommendations

🏗️ Architecture & Design

Strengths:

Unified Runner Protocol: The consolidation from separate isolate and container runners to a generic runner protocol is a solid architectural improvement
Multi-slot Support: The new slot-based allocation system (BuildAllocationType::Multi) enables better resource utilization
Actor2 Workflow: Clean separation between actor and actor2 workflows allows for incremental migration
Database Design: The new runner tracking in FDB with remaining/total slots provides efficient allocation lookups

Concerns:

Breaking Change Scope: Removing the entire isolate-v8-runner package without clear migration path could break existing JavaScript runtime functionality
Dual Workflow Complexity: Having both actor and actor2 workflows increases cognitive load and maintenance burden

🐛 Potential Bugs & Issues

Critical Issues

Runner Allocation Logic Gap (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:255):

// Missing remaining_slots > 0 check for multi-type builds
if let BuildAllocationType::Multi = input.build_allocation_type {
    loop {
        // Should verify remaining_slots > 0 before allocation

SQL Parameter Mismatch (packages/edge/infra/client/manager/src/runner/mod.rs:510):
```
SET delete_ts = ?3  -- Should be ?2, missing bind parameter
WHERE
```
Race Condition in Runner Socket (packages/edge/infra/client/manager/src/ctx.rs:877):
Runner socket connection could still occur between the connection check lines.

Build Type Validation Error (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:1021):

// build.allocation_type accessed before checking if build exists
let build = unwrap_with!(build_res.builds.first(), BUILD_NOT_FOUND);
// Later used without null check
destroy::clear_ports_and_resources(..., build.allocation_type, ...)

Security Concerns

Hardcoded UUIDs (packages/edge/infra/client/echo/src/main.rs:68):

let payload = json!({
    "init": {
        "runner_id": Uuid::nil(), // Should be unique UUID
    },
});

Missing Input Validation (packages/edge/infra/client/echo/src/main.rs:107):
No error handling for missing actor_id or generation fields in packets.

🚀 Performance Considerations

Improvements:

FDB Allocation Strategy: Using SNAPSHOT reads for allocation prevents conflicts while maintaining consistency
Bin-packing Algorithm: Reverse ordering in allocation queries optimizes resource utilization
Connection Pooling: WebSocket connection management looks efficient

Potential Issues:

Tokio "full" Feature: packages/edge/infra/client/echo/Cargo.toml enables all Tokio features unnecessarily
Allocation Loop Performance: The multi-slot allocation loop could benefit from early termination checks
Ping Thread Cleanup: Ping threads continue running even after main WebSocket connections close

🔒 Security Assessment

Positive:

Process Isolation: Container runner maintains proper process separation
Signal Handling: Appropriate use of Unix signals for process management
Resource Bounds: Memory and CPU limits properly enforced

Concerns:

Error Exposure: WebSocket error messages might leak internal system information
Resource Exhaustion: Missing rate limiting could allow unconstrained logging (removed throttle.rs)

🧪 Test Coverage

Missing Tests:

Runner Lifecycle: No comprehensive tests for runner creation, allocation, and cleanup
Error Scenarios: Limited testing of failure modes and edge cases
Protocol Validation: Missing tests for malformed protocol messages
Multi-slot Allocation: No tests for the new slot-based allocation system

Existing Tests:

Basic client state tests exist but assume generation 0
Integration tests are minimal given the scope of changes

📝 Code Quality Issues

Style & Maintainability

Inconsistent Naming: actor2 should be actor_v2 for clarity
TODO Overuse: Multiple unresolved TODOs in critical paths
Deprecated Fields: Multiple deprecated protocol fields still marked as required
Comment Accuracy: TotalSlotsKey comment says "MiB" but measures slots

Error Handling

Incomplete Error Paths: Several unwrap() calls that should handle errors gracefully
Missing Context: Error messages lack sufficient debugging information
Inconsistent Returns: Some functions return Ok(()) where errors should be propagated

🔧 Recommendations

Immediate Fixes Required

Fix SQL Parameter Binding: Correct the missing parameter in runner deletion query
Add Remaining Slots Check: Validate slot availability before allocation attempts
Improve Error Handling: Replace unwrap() calls with proper error handling
Fix UUID Generation: Use Uuid::new_v4() instead of Uuid::nil() for runner IDs

Architecture Improvements

Migration Strategy: Provide clear documentation for migrating from isolate-v8-runner
Test Coverage: Add comprehensive integration tests for the new runner system
Monitoring: Implement metrics for runner allocation success/failure rates
Documentation: Update API documentation to reflect the new allocation types

Performance Optimizations

Resource Cleanup: Implement proper cleanup for ping threads and WebSocket connections
Feature Minimization: Use specific Tokio features instead of "full"
Allocation Efficiency: Add early termination conditions in allocation loops

✅ Overall Assessment

This PR represents a significant and necessary architectural improvement that consolidates runner management and enables multi-slot allocation. While the core design is sound, several critical bugs and missing error handling need to be addressed before merging.

Verdict: Approve with required changes - The architectural direction is correct, but the implementation needs refinement in error handling, test coverage, and bug fixes identified above.

Priority Issues to Address:

SQL parameter binding fix (High)
Remaining slots validation (High)
Error handling improvements (High)
Test coverage expansion (Medium)
Documentation updates (Medium)

claude · 2025-07-04T01:32:40Z

Claude finished @MasterPtato's task —— View job

PR Review: feat: add runners to pb protocol

I've completed a comprehensive review of this major architectural change. This PR introduces significant improvements to the Rivet platform by adding runner support to the pegboard protocol while removing the isolate-v8-runner implementation.

Review Checklist:

Read repository guidelines (CLAUDE.md)
Analyze overall architecture and design decisions
Review actor2 workflow implementation
Examine runner management and protocol changes
Check security implications
Assess error handling and edge cases
Evaluate performance considerations
Review test coverage
Provide structured feedback and recommendations

🏗️ Architecture & Design

Strengths:

Unified Runner Protocol: The consolidation from separate isolate and container runners to a generic runner protocol is a solid architectural improvement
Multi-slot Support: The new slot-based allocation system (BuildAllocationType::Multi) enables better resource utilization
Actor2 Workflow: Clean separation between actor and actor2 workflows allows for incremental migration
Database Design: The new runner tracking in FDB with remaining/total slots provides efficient allocation lookups

Concerns:

Breaking Change Scope: Removing the entire isolate-v8-runner package without clear migration path could break existing JavaScript runtime functionality
Dual Workflow Complexity: Having both actor and actor2 workflows increases cognitive load and maintenance burden

🐛 Potential Bugs & Issues

Critical Issues

Runner Allocation Logic Gap (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:255):

// Missing remaining_slots > 0 check for multi-type builds
if let BuildAllocationType::Multi = input.build_allocation_type {
    loop {
        // Should verify remaining_slots > 0 before allocation

SQL Parameter Mismatch (packages/edge/infra/client/manager/src/runner/mod.rs:510):
```
SET delete_ts = ?3  -- Should be ?2, missing bind parameter
WHERE
```
Race Condition in Runner Socket (packages/edge/infra/client/manager/src/ctx.rs:877):
Runner socket connection could still occur between the connection check lines.

Build Type Validation Error (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:1021):

// build.allocation_type accessed before checking if build exists
let build = unwrap_with!(build_res.builds.first(), BUILD_NOT_FOUND);
// Later used without null check
destroy::clear_ports_and_resources(..., build.allocation_type, ...)

Security Concerns

Hardcoded UUIDs (packages/edge/infra/client/echo/src/main.rs:68):

let payload = json!({
    "init": {
        "runner_id": Uuid::nil(), // Should be unique UUID
    },
});

Missing Input Validation (packages/edge/infra/client/echo/src/main.rs:107):
No error handling for missing actor_id or generation fields in packets.

🚀 Performance Considerations

Improvements:

FDB Allocation Strategy: Using SNAPSHOT reads for allocation prevents conflicts while maintaining consistency
Bin-packing Algorithm: Reverse ordering in allocation queries optimizes resource utilization
Connection Pooling: WebSocket connection management looks efficient

Potential Issues:

Tokio "full" Feature: packages/edge/infra/client/echo/Cargo.toml enables all Tokio features unnecessarily
Allocation Loop Performance: The multi-slot allocation loop could benefit from early termination checks
Ping Thread Cleanup: Ping threads continue running even after main WebSocket connections close

🔒 Security Assessment

Positive:

Process Isolation: Container runner maintains proper process separation
Signal Handling: Appropriate use of Unix signals for process management
Resource Bounds: Memory and CPU limits properly enforced

Concerns:

Error Exposure: WebSocket error messages might leak internal system information
Resource Exhaustion: Missing rate limiting could allow unconstrained logging (removed throttle.rs)

🧪 Test Coverage

Missing Tests:

Runner Lifecycle: No comprehensive tests for runner creation, allocation, and cleanup
Error Scenarios: Limited testing of failure modes and edge cases
Protocol Validation: Missing tests for malformed protocol messages
Multi-slot Allocation: No tests for the new slot-based allocation system

Existing Tests:

Basic client state tests exist but assume generation 0
Integration tests are minimal given the scope of changes

📝 Code Quality Issues

Style & Maintainability

Inconsistent Naming: actor2 should be actor_v2 for clarity
TODO Overuse: Multiple unresolved TODOs in critical paths
Deprecated Fields: Multiple deprecated protocol fields still marked as required
Comment Accuracy: TotalSlotsKey comment says "MiB" but measures slots

Error Handling

Incomplete Error Paths: Several unwrap() calls that should handle errors gracefully
Missing Context: Error messages lack sufficient debugging information
Inconsistent Returns: Some functions return Ok(()) where errors should be propagated

🔧 Recommendations

Immediate Fixes Required

Fix SQL Parameter Binding: Correct the missing parameter in runner deletion query
Add Remaining Slots Check: Validate slot availability before allocation attempts
Improve Error Handling: Replace unwrap() calls with proper error handling
Fix UUID Generation: Use Uuid::new_v4() instead of Uuid::nil() for runner IDs

Architecture Improvements

Migration Strategy: Provide clear documentation for migrating from isolate-v8-runner
Test Coverage: Add comprehensive integration tests for the new runner system
Monitoring: Implement metrics for runner allocation success/failure rates
Documentation: Update API documentation to reflect the new allocation types

Performance Optimizations

Resource Cleanup: Implement proper cleanup for ping threads and WebSocket connections
Feature Minimization: Use specific Tokio features instead of "full"
Allocation Efficiency: Add early termination conditions in allocation loops

✅ Overall Assessment

This PR represents a significant and necessary architectural improvement that consolidates runner management and enables multi-slot allocation. While the core design is sound, several critical bugs and missing error handling need to be addressed before merging.

Verdict: Approve with required changes - The architectural direction is correct, but the implementation needs refinement in error handling, test coverage, and bug fixes identified above.

Priority Issues to Address:

SQL parameter binding fix (High)
Remaining slots validation (High)
Error handling improvements (High)
Test coverage expansion (Medium)
Documentation updates (Medium)

claude · 2025-07-11T22:33:29Z

Claude finished @MasterPtato's task —— View job

PR Review: feat: add runners to pb protocol

I've completed a comprehensive review of this major architectural change. This PR introduces significant improvements to the Rivet platform by adding runner support to the pegboard protocol while removing the isolate-v8-runner implementation.

Review Checklist:

Read repository guidelines (CLAUDE.md)
Analyze overall architecture and design decisions
Review actor2 workflow implementation
Examine runner management and protocol changes
Check security implications
Assess error handling and edge cases
Evaluate performance considerations
Review test coverage
Provide structured feedback and recommendations

🏗️ Architecture & Design

Strengths:

Unified Runner Protocol: The consolidation from separate isolate and container runners to a generic runner protocol is a solid architectural improvement
Multi-slot Support: The new slot-based allocation system (BuildAllocationType::Multi) enables better resource utilization
Actor2 Workflow: Clean separation between actor and actor2 workflows allows for incremental migration
Database Design: The new runner tracking in FDB with remaining/total slots provides efficient allocation lookups

Concerns:

Breaking Change Scope: Removing the entire isolate-v8-runner package without clear migration path could break existing JavaScript runtime functionality
Dual Workflow Complexity: Having both actor and actor2 workflows increases cognitive load and maintenance burden

🐛 Potential Bugs & Issues

Critical Issues

Runner Allocation Logic Gap (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:255):

// Missing remaining_slots > 0 check for multi-type builds
if let BuildAllocationType::Multi = input.build_allocation_type {
    loop {
        // Should verify remaining_slots > 0 before allocation

SQL Parameter Mismatch (packages/edge/infra/client/manager/src/runner/mod.rs:510):
```
SET delete_ts = ?3  -- Should be ?2, missing bind parameter
WHERE
```
Race Condition in Runner Socket (packages/edge/infra/client/manager/src/ctx.rs:877):
Runner socket connection could still occur between the connection check lines.

Build Type Validation Error (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:1021):

// build.allocation_type accessed before checking if build exists
let build = unwrap_with!(build_res.builds.first(), BUILD_NOT_FOUND);
// Later used without null check
destroy::clear_ports_and_resources(..., build.allocation_type, ...)

Security Concerns

Hardcoded UUIDs (packages/edge/infra/client/echo/src/main.rs:68):

let payload = json!({
    "init": {
        "runner_id": Uuid::nil(), // Should be unique UUID
    },
});

Missing Input Validation (packages/edge/infra/client/echo/src/main.rs:107):
No error handling for missing actor_id or generation fields in packets.

🚀 Performance Considerations

Improvements:

FDB Allocation Strategy: Using SNAPSHOT reads for allocation prevents conflicts while maintaining consistency
Bin-packing Algorithm: Reverse ordering in allocation queries optimizes resource utilization
Connection Pooling: WebSocket connection management looks efficient

Potential Issues:

Tokio "full" Feature: packages/edge/infra/client/echo/Cargo.toml enables all Tokio features unnecessarily
Allocation Loop Performance: The multi-slot allocation loop could benefit from early termination checks
Ping Thread Cleanup: Ping threads continue running even after main WebSocket connections close

🔒 Security Assessment

Positive:

Process Isolation: Container runner maintains proper process separation
Signal Handling: Appropriate use of Unix signals for process management
Resource Bounds: Memory and CPU limits properly enforced

Concerns:

Error Exposure: WebSocket error messages might leak internal system information
Resource Exhaustion: Missing rate limiting could allow unconstrained logging (removed throttle.rs)

🧪 Test Coverage

Missing Tests:

Runner Lifecycle: No comprehensive tests for runner creation, allocation, and cleanup
Error Scenarios: Limited testing of failure modes and edge cases
Protocol Validation: Missing tests for malformed protocol messages
Multi-slot Allocation: No tests for the new slot-based allocation system

Existing Tests:

Basic client state tests exist but assume generation 0
Integration tests are minimal given the scope of changes

📝 Code Quality Issues

Style & Maintainability

Inconsistent Naming: actor2 should be actor_v2 for clarity
TODO Overuse: Multiple unresolved TODOs in critical paths
Deprecated Fields: Multiple deprecated protocol fields still marked as required
Comment Accuracy: TotalSlotsKey comment says "MiB" but measures slots

Error Handling

Incomplete Error Paths: Several unwrap() calls that should handle errors gracefully
Missing Context: Error messages lack sufficient debugging information
Inconsistent Returns: Some functions return Ok(()) where errors should be propagated

🔧 Recommendations

Immediate Fixes Required

Fix SQL Parameter Binding: Correct the missing parameter in runner deletion query
Add Remaining Slots Check: Validate slot availability before allocation attempts
Improve Error Handling: Replace unwrap() calls with proper error handling
Fix UUID Generation: Use Uuid::new_v4() instead of Uuid::nil() for runner IDs

Architecture Improvements

Migration Strategy: Provide clear documentation for migrating from isolate-v8-runner
Test Coverage: Add comprehensive integration tests for the new runner system
Monitoring: Implement metrics for runner allocation success/failure rates
Documentation: Update API documentation to reflect the new allocation types

Performance Optimizations

Resource Cleanup: Implement proper cleanup for ping threads and WebSocket connections
Feature Minimization: Use specific Tokio features instead of "full"
Allocation Efficiency: Add early termination conditions in allocation loops

✅ Overall Assessment

This PR represents a significant and necessary architectural improvement that consolidates runner management and enables multi-slot allocation. While the core design is sound, several critical bugs and missing error handling need to be addressed before merging.

Verdict: Approve with required changes - The architectural direction is correct, but the implementation needs refinement in error handling, test coverage, and bug fixes identified above.

Priority Issues to Address:

SQL parameter binding fix (High)
Remaining slots validation (High)
Error handling improvements (High)
Test coverage expansion (Medium)
Documentation updates (Medium)

claude · 2025-07-11T22:33:37Z

Claude finished @MasterPtato's task —— View job

PR Review: feat: add runners to pb protocol

I've completed a comprehensive review of this major architectural change. This PR introduces significant improvements to the Rivet platform by adding runner support to the pegboard protocol while removing the isolate-v8-runner implementation.

Review Checklist:

Read repository guidelines (CLAUDE.md)
Analyze overall architecture and design decisions
Review actor2 workflow implementation
Examine runner management and protocol changes
Check security implications
Assess error handling and edge cases
Evaluate performance considerations
Review test coverage
Provide structured feedback and recommendations

🏗️ Architecture & Design

Strengths:

Unified Runner Protocol: The consolidation from separate isolate and container runners to a generic runner protocol is a solid architectural improvement
Multi-slot Support: The new slot-based allocation system (BuildAllocationType::Multi) enables better resource utilization
Actor2 Workflow: Clean separation between actor and actor2 workflows allows for incremental migration
Database Design: The new runner tracking in FDB with remaining/total slots provides efficient allocation lookups

Concerns:

Breaking Change Scope: Removing the entire isolate-v8-runner package without clear migration path could break existing JavaScript runtime functionality
Dual Workflow Complexity: Having both actor and actor2 workflows increases cognitive load and maintenance burden

🐛 Potential Bugs & Issues

Critical Issues

Runner Allocation Logic Gap (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:255):

// Missing remaining_slots > 0 check for multi-type builds
if let BuildAllocationType::Multi = input.build_allocation_type {
    loop {
        // Should verify remaining_slots > 0 before allocation

SQL Parameter Mismatch (packages/edge/infra/client/manager/src/runner/mod.rs:510):
```
SET delete_ts = ?3  -- Should be ?2, missing bind parameter
WHERE
```
Race Condition in Runner Socket (packages/edge/infra/client/manager/src/ctx.rs:877):
Runner socket connection could still occur between the connection check lines.

Build Type Validation Error (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:1021):

// build.allocation_type accessed before checking if build exists
let build = unwrap_with!(build_res.builds.first(), BUILD_NOT_FOUND);
// Later used without null check
destroy::clear_ports_and_resources(..., build.allocation_type, ...)

Security Concerns

Hardcoded UUIDs (packages/edge/infra/client/echo/src/main.rs:68):

let payload = json!({
    "init": {
        "runner_id": Uuid::nil(), // Should be unique UUID
    },
});

Missing Input Validation (packages/edge/infra/client/echo/src/main.rs:107):
No error handling for missing actor_id or generation fields in packets.

🚀 Performance Considerations

Improvements:

FDB Allocation Strategy: Using SNAPSHOT reads for allocation prevents conflicts while maintaining consistency
Bin-packing Algorithm: Reverse ordering in allocation queries optimizes resource utilization
Connection Pooling: WebSocket connection management looks efficient

Potential Issues:

Tokio "full" Feature: packages/edge/infra/client/echo/Cargo.toml enables all Tokio features unnecessarily
Allocation Loop Performance: The multi-slot allocation loop could benefit from early termination checks
Ping Thread Cleanup: Ping threads continue running even after main WebSocket connections close

🔒 Security Assessment

Positive:

Process Isolation: Container runner maintains proper process separation
Signal Handling: Appropriate use of Unix signals for process management
Resource Bounds: Memory and CPU limits properly enforced

Concerns:

Error Exposure: WebSocket error messages might leak internal system information
Resource Exhaustion: Missing rate limiting could allow unconstrained logging (removed throttle.rs)

🧪 Test Coverage

Missing Tests:

Runner Lifecycle: No comprehensive tests for runner creation, allocation, and cleanup
Error Scenarios: Limited testing of failure modes and edge cases
Protocol Validation: Missing tests for malformed protocol messages
Multi-slot Allocation: No tests for the new slot-based allocation system

Existing Tests:

Basic client state tests exist but assume generation 0
Integration tests are minimal given the scope of changes

📝 Code Quality Issues

Style & Maintainability

Inconsistent Naming: actor2 should be actor_v2 for clarity
TODO Overuse: Multiple unresolved TODOs in critical paths
Deprecated Fields: Multiple deprecated protocol fields still marked as required
Comment Accuracy: TotalSlotsKey comment says "MiB" but measures slots

Error Handling

Incomplete Error Paths: Several unwrap() calls that should handle errors gracefully
Missing Context: Error messages lack sufficient debugging information
Inconsistent Returns: Some functions return Ok(()) where errors should be propagated

🔧 Recommendations

Immediate Fixes Required

Fix SQL Parameter Binding: Correct the missing parameter in runner deletion query
Add Remaining Slots Check: Validate slot availability before allocation attempts
Improve Error Handling: Replace unwrap() calls with proper error handling
Fix UUID Generation: Use Uuid::new_v4() instead of Uuid::nil() for runner IDs

Architecture Improvements

Migration Strategy: Provide clear documentation for migrating from isolate-v8-runner
Test Coverage: Add comprehensive integration tests for the new runner system
Monitoring: Implement metrics for runner allocation success/failure rates
Documentation: Update API documentation to reflect the new allocation types

Performance Optimizations

Resource Cleanup: Implement proper cleanup for ping threads and WebSocket connections
Feature Minimization: Use specific Tokio features instead of "full"
Allocation Efficiency: Add early termination conditions in allocation loops

✅ Overall Assessment

This PR represents a significant and necessary architectural improvement that consolidates runner management and enables multi-slot allocation. While the core design is sound, several critical bugs and missing error handling need to be addressed before merging.

Verdict: Approve with required changes - The architectural direction is correct, but the implementation needs refinement in error handling, test coverage, and bug fixes identified above.

Priority Issues to Address:

SQL parameter binding fix (High)
Remaining slots validation (High)
Error handling improvements (High)
Test coverage expansion (Medium)
Documentation updates (Medium)

claude · 2025-07-12T00:47:10Z

Claude finished @MasterPtato's task —— View job

PR Review: feat: add runners to pb protocol

I've completed a comprehensive review of this major architectural change. This PR introduces significant improvements to the Rivet platform by adding runner support to the pegboard protocol while removing the isolate-v8-runner implementation.

Review Checklist:

Read repository guidelines (CLAUDE.md)
Analyze overall architecture and design decisions
Review actor2 workflow implementation
Examine runner management and protocol changes
Check security implications
Assess error handling and edge cases
Evaluate performance considerations
Review test coverage
Provide structured feedback and recommendations

🏗️ Architecture & Design

Strengths:

Unified Runner Protocol: The consolidation from separate isolate and container runners to a generic runner protocol is a solid architectural improvement
Multi-slot Support: The new slot-based allocation system (BuildAllocationType::Multi) enables better resource utilization
Actor2 Workflow: Clean separation between actor and actor2 workflows allows for incremental migration
Database Design: The new runner tracking in FDB with remaining/total slots provides efficient allocation lookups

Concerns:

Breaking Change Scope: Removing the entire isolate-v8-runner package without clear migration path could break existing JavaScript runtime functionality
Dual Workflow Complexity: Having both actor and actor2 workflows increases cognitive load and maintenance burden

🐛 Potential Bugs & Issues

Critical Issues

Runner Allocation Logic Gap (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:255):

// Missing remaining_slots > 0 check for multi-type builds
if let BuildAllocationType::Multi = input.build_allocation_type {
    loop {
        // Should verify remaining_slots > 0 before allocation

SQL Parameter Mismatch (packages/edge/infra/client/manager/src/runner/mod.rs:510):
```
SET delete_ts = ?3  -- Should be ?2, missing bind parameter
WHERE
```
Race Condition in Runner Socket (packages/edge/infra/client/manager/src/ctx.rs:877):
Runner socket connection could still occur between the connection check lines.

Build Type Validation Error (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:1021):

// build.allocation_type accessed before checking if build exists
destroy::clear_ports_and_resources(
    input.actor_id,
    input.image_id,
    Some(build.allocation_type), // Potential panic

Security Concerns

Hardcoded UUIDs (packages/edge/infra/client/echo/src/main.rs:68):

let payload = json!({
    "init": {
        "runner_id": Uuid::nil(), // Should be unique UUID
    },
});

Missing Input Validation (packages/edge/infra/client/echo/src/main.rs:107):
No error handling for missing actor_id or generation fields in packets.

🚀 Performance Considerations

Improvements:

FDB Allocation Strategy: Using SNAPSHOT reads for allocation prevents conflicts while maintaining consistency
Bin-packing Algorithm: Reverse ordering in allocation queries optimizes resource utilization
Connection Pooling: WebSocket connection management looks efficient

Potential Issues:

Tokio "full" Feature: packages/edge/infra/client/echo/Cargo.toml enables all Tokio features unnecessarily
Allocation Loop Performance: The multi-slot allocation loop could benefit from early termination checks
Ping Thread Cleanup: Ping threads continue running even after main WebSocket connections close

🔒 Security Assessment

Positive:

Process Isolation: Container runner maintains proper process separation
Signal Handling: Appropriate use of Unix signals for process management
Resource Bounds: Memory and CPU limits properly enforced

Concerns:

Error Exposure: WebSocket error messages might leak internal system information
Resource Exhaustion: Missing rate limiting could allow unconstrained logging (removed throttle.rs)

🧪 Test Coverage

Missing Tests:

Runner Lifecycle: No comprehensive tests for runner creation, allocation, and cleanup
Error Scenarios: Limited testing of failure modes and edge cases
Protocol Validation: Missing tests for malformed protocol messages
Multi-slot Allocation: No tests for the new slot-based allocation system

Existing Tests:

Basic client state tests exist but assume generation 0
Integration tests are minimal given the scope of changes

📝 Code Quality Issues

Style & Maintainability

Inconsistent Naming: actor2 should be actor_v2 for clarity
TODO Overuse: Multiple unresolved TODOs in critical paths
Deprecated Fields: Multiple deprecated protocol fields still marked as required
Comment Accuracy: TotalSlotsKey comment says "MiB" but measures slots

Error Handling

Incomplete Error Paths: Several unwrap() calls that should handle errors gracefully
Missing Context: Error messages lack sufficient debugging information
Inconsistent Returns: Some functions return Ok(()) where errors should be propagated

🔧 Recommendations

Immediate Fixes Required

Fix SQL Parameter Binding: Correct the missing parameter in runner deletion query
Add Remaining Slots Check: Validate slot availability before allocation attempts
Improve Error Handling: Replace unwrap() calls with proper error handling
Fix UUID Generation: Use Uuid::new_v4() instead of Uuid::nil() for runner IDs

Architecture Improvements

Migration Strategy: Provide clear documentation for migrating from isolate-v8-runner
Test Coverage: Add comprehensive integration tests for the new runner system
Monitoring: Implement metrics for runner allocation success/failure rates
Documentation: Update API documentation to reflect the new allocation types

Performance Optimizations

Resource Cleanup: Implement proper cleanup for ping threads and WebSocket connections
Feature Minimization: Use specific Tokio features instead of "full"
Allocation Efficiency: Add early termination conditions in allocation loops

✅ Overall Assessment

This PR represents a significant and necessary architectural improvement that consolidates runner management and enables multi-slot allocation. While the core design is sound, several critical bugs and missing error handling need to be addressed before merging.

Verdict: Approve with required changes - The architectural direction is correct, but the implementation needs refinement in error handling, test coverage, and bug fixes identified above.

Priority Issues to Address:

SQL parameter binding fix (High)
Remaining slots validation (High)
Error handling improvements (High)
Test coverage expansion (Medium)
Documentation updates (Medium)

claude · 2025-07-12T00:47:17Z

Claude finished @MasterPtato's task —— View job

PR Review: feat: add runners to pb protocol

I've completed a comprehensive review of this major architectural change. This PR introduces significant improvements to the Rivet platform by adding runner support to the pegboard protocol while removing the isolate-v8-runner implementation.

Review Checklist:

Read repository guidelines (CLAUDE.md)
Analyze overall architecture and design decisions
Review actor2 workflow implementation
Examine runner management and protocol changes
Check security implications
Assess error handling and edge cases
Evaluate performance considerations
Review test coverage
Provide structured feedback and recommendations

🏗️ Architecture & Design

Strengths:

Unified Runner Protocol: The consolidation from separate isolate and container runners to a generic runner protocol is a solid architectural improvement
Multi-slot Support: The new slot-based allocation system (BuildAllocationType::Multi) enables better resource utilization
Actor2 Workflow: Clean separation between actor and actor2 workflows allows for incremental migration
Database Design: The new runner tracking in FDB with remaining/total slots provides efficient allocation lookups

Concerns:

Breaking Change Scope: Removing the entire isolate-v8-runner package without clear migration path could break existing JavaScript runtime functionality
Dual Workflow Complexity: Having both actor and actor2 workflows increases cognitive load and maintenance burden

🐛 Potential Bugs & Issues

Critical Issues

Runner Allocation Logic Gap (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:255):

// Missing remaining_slots > 0 check for multi-type builds
if let BuildAllocationType::Multi = input.build_allocation_type {
    loop {
        // Should verify remaining_slots > 0 before allocation

SQL Parameter Mismatch (packages/edge/infra/client/manager/src/runner/mod.rs:510):
```
SET delete_ts = ?3  -- Should be ?2, missing bind parameter
WHERE
```
Race Condition in Runner Socket (packages/edge/infra/client/manager/src/ctx.rs:877):
Runner socket connection could still occur between the connection check lines.

Build Type Validation Error (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:1021):

// build.allocation_type accessed before checking if build exists
let build = unwrap_with!(build_res.builds.first(), BUILD_NOT_FOUND);
// Later used without null check
destroy::clear_ports_and_resources(..., build.allocation_type, ...)

Security Concerns

Hardcoded UUIDs (packages/edge/infra/client/echo/src/main.rs:68):

let payload = json!({
    "init": {
        "runner_id": Uuid::nil(), // Should be unique UUID
    },
});

Missing Input Validation (packages/edge/infra/client/echo/src/main.rs:107):
No error handling for missing actor_id or generation fields in packets.

🚀 Performance Considerations

Improvements:

FDB Allocation Strategy: Using SNAPSHOT reads for allocation prevents conflicts while maintaining consistency
Bin-packing Algorithm: Reverse ordering in allocation queries optimizes resource utilization
Connection Pooling: WebSocket connection management looks efficient

Potential Issues:

Tokio "full" Feature: packages/edge/infra/client/echo/Cargo.toml enables all Tokio features unnecessarily
Allocation Loop Performance: The multi-slot allocation loop could benefit from early termination checks
Ping Thread Cleanup: Ping threads continue running even after main WebSocket connections close

🔒 Security Assessment

Positive:

Process Isolation: Container runner maintains proper process separation
Signal Handling: Appropriate use of Unix signals for process management
Resource Bounds: Memory and CPU limits properly enforced

Concerns:

Error Exposure: WebSocket error messages might leak internal system information
Resource Exhaustion: Missing rate limiting could allow unconstrained logging (removed throttle.rs)

🧪 Test Coverage

Missing Tests:

Runner Lifecycle: No comprehensive tests for runner creation, allocation, and cleanup
Error Scenarios: Limited testing of failure modes and edge cases
Protocol Validation: Missing tests for malformed protocol messages
Multi-slot Allocation: No tests for the new slot-based allocation system

Existing Tests:

Basic client state tests exist but assume generation 0
Integration tests are minimal given the scope of changes

📝 Code Quality Issues

Style & Maintainability

Inconsistent Naming: actor2 should be actor_v2 for clarity
TODO Overuse: Multiple unresolved TODOs in critical paths
Deprecated Fields: Multiple deprecated protocol fields still marked as required
Comment Accuracy: TotalSlotsKey comment says "MiB" but measures slots

Error Handling

Incomplete Error Paths: Several unwrap() calls that should handle errors gracefully
Missing Context: Error messages lack sufficient debugging information
Inconsistent Returns: Some functions return Ok(()) where errors should be propagated

🔧 Recommendations

Immediate Fixes Required

Fix SQL Parameter Binding: Correct the missing parameter in runner deletion query
Add Remaining Slots Check: Validate slot availability before allocation attempts
Improve Error Handling: Replace unwrap() calls with proper error handling
Fix UUID Generation: Use Uuid::new_v4() instead of Uuid::nil() for runner IDs

Architecture Improvements

Migration Strategy: Provide clear documentation for migrating from isolate-v8-runner
Test Coverage: Add comprehensive integration tests for the new runner system
Monitoring: Implement metrics for runner allocation success/failure rates
Documentation: Update API documentation to reflect the new allocation types

Performance Optimizations

Resource Cleanup: Implement proper cleanup for ping threads and WebSocket connections
Feature Minimization: Use specific Tokio features instead of "full"
Allocation Efficiency: Add early termination conditions in allocation loops

✅ Overall Assessment

This PR represents a significant and necessary architectural improvement that consolidates runner management and enables multi-slot allocation. While the core design is sound, several critical bugs and missing error handling need to be addressed before merging.

Verdict: Approve with required changes - The architectural direction is correct, but the implementation needs refinement in error handling, test coverage, and bug fixes identified above.

Priority Issues to Address:

SQL parameter binding fix (High)
Remaining slots validation (High)
Error handling improvements (High)
Test coverage expansion (Medium)
Documentation updates (Medium)

… bytes read

vercel · 2025-07-18T18:51:26Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
rivet-site	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Jul 18, 2025 7:04pm

claude · 2025-07-18T18:57:02Z

Claude finished @MasterPtato's task —— View job

PR Review: feat: add runners to pb protocol

I've completed a comprehensive review of this major architectural change. This PR introduces significant improvements to the Rivet platform by adding runner support to the pegboard protocol while removing the isolate-v8-runner implementation.

Review Tasks:

Read repository guidelines (CLAUDE.md)
Analyze overall architecture and design decisions
Review specific code changes and implementation details
Assess security implications and potential vulnerabilities
Evaluate performance considerations
Check test coverage and quality assurance
Provide structured feedback and recommendations

🏗️ Architecture & Design

Strengths:

Unified Runner Protocol: The consolidation from separate isolate and container runners to a generic runner protocol is a solid architectural improvement
Multi-slot Support: The new slot-based allocation system (BuildAllocationType::Multi) enables better resource utilization
Actor2 Workflow: Clean separation between actor and actor2 workflows allows for incremental migration
Database Design: The new runner tracking in FDB with remaining/total slots provides efficient allocation lookups

Concerns:

Breaking Change Scope: Removing the entire isolate-v8-runner package without clear migration path could break existing JavaScript runtime functionality
Dual Workflow Complexity: Having both actor and actor2 workflows increases cognitive load and maintenance burden

🐛 Potential Bugs & Issues

Critical Issues

Runner Allocation Logic Gap (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:255):

// Missing remaining_slots > 0 check for multi-type builds
if let BuildAllocationType::Multi = input.build_allocation_type {
    loop {
        // Should verify remaining_slots > 0 before allocation

SQL Parameter Mismatch (packages/edge/infra/client/manager/src/runner/mod.rs:510):
```
SET delete_ts = ?3  -- Should be ?2, missing bind parameter
WHERE
```
Race Condition in Runner Socket (packages/edge/infra/client/manager/src/ctx.rs:877):
Runner socket connection could still occur between the connection check lines.

Build Type Validation Error (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:1021):

// build.allocation_type accessed before checking if build exists
let build = unwrap_with!(build_res.builds.first(), BUILD_NOT_FOUND);
// Later used without null check
destroy::clear_ports_and_resources(..., build.allocation_type, ...)

Security Concerns

Hardcoded UUIDs (packages/edge/infra/client/echo/src/main.rs:68):

let payload = json!({
    "init": {
        "runner_id": Uuid::nil(), // Should be unique UUID
    },
});

Missing Input Validation (packages/edge/infra/client/echo/src/main.rs:107):
No error handling for missing actor_id or generation fields in packets.

🚀 Performance Considerations

Improvements:

FDB Allocation Strategy: Using SNAPSHOT reads for allocation prevents conflicts while maintaining consistency
Bin-packing Algorithm: Reverse ordering in allocation queries optimizes resource utilization
Connection Pooling: WebSocket connection management looks efficient

Potential Issues:

Tokio "full" Feature: packages/edge/infra/client/echo/Cargo.toml enables all Tokio features unnecessarily
Allocation Loop Performance: The multi-slot allocation loop could benefit from early termination checks
Ping Thread Cleanup: Ping threads continue running even after main WebSocket connections close

🔒 Security Assessment

Positive:

Process Isolation: Container runner maintains proper process separation
Signal Handling: Appropriate use of Unix signals for process management
Resource Bounds: Memory and CPU limits properly enforced

Concerns:

Error Exposure: WebSocket error messages might leak internal system information
Resource Exhaustion: Missing rate limiting could allow unconstrained logging (removed throttle.rs)

🧪 Test Coverage

Missing Tests:

Runner Lifecycle: No comprehensive tests for runner creation, allocation, and cleanup
Error Scenarios: Limited testing of failure modes and edge cases
Protocol Validation: Missing tests for malformed protocol messages
Multi-slot Allocation: No tests for the new slot-based allocation system

Existing Tests:

Basic client state tests exist but assume generation 0
Integration tests are minimal given the scope of changes

📝 Code Quality Issues

Style & Maintainability

Inconsistent Naming: actor2 should be actor_v2 for clarity
TODO Overuse: Multiple unresolved TODOs in critical paths
Deprecated Fields: Multiple deprecated protocol fields still marked as required
Comment Accuracy: TotalSlotsKey comment says "MiB" but measures slots

Error Handling

Incomplete Error Paths: Several unwrap() calls that should handle errors gracefully
Missing Context: Error messages lack sufficient debugging information
Inconsistent Returns: Some functions return Ok(()) where errors should be propagated

🔧 Recommendations

Immediate Fixes Required

Fix SQL Parameter Binding: Correct the missing parameter in runner deletion query
Add Remaining Slots Check: Validate slot availability before allocation attempts
Improve Error Handling: Replace unwrap() calls with proper error handling
Fix UUID Generation: Use Uuid::new_v4() instead of Uuid::nil() for runner IDs

Architecture Improvements

Migration Strategy: Provide clear documentation for migrating from isolate-v8-runner
Test Coverage: Add comprehensive integration tests for the new runner system
Monitoring: Implement metrics for runner allocation success/failure rates
Documentation: Update API documentation to reflect the new allocation types

Performance Optimizations

Resource Cleanup: Implement proper cleanup for ping threads and WebSocket connections
Feature Minimization: Use specific Tokio features instead of "full"
Allocation Efficiency: Add early termination conditions in allocation loops

✅ Overall Assessment

This PR represents a significant and necessary architectural improvement that consolidates runner management and enables multi-slot allocation. While the core design is sound, several critical bugs and missing error handling need to be addressed before merging.

Verdict: Approve with required changes - The architectural direction is correct, but the implementation needs refinement in error handling, test coverage, and bug fixes identified above.

Priority Issues to Address:

SQL parameter binding fix (High)
Remaining slots validation (High)
Error handling improvements (High)
Test coverage expansion (Medium)
Documentation updates (Medium)

claude · 2025-07-18T18:57:34Z

Claude finished @MasterPtato's task —— View job

PR Review: feat: add runners to pb protocol

I've completed a comprehensive review of this major architectural change. This PR introduces significant improvements to the Rivet platform by adding runner support to the pegboard protocol while removing the isolate-v8-runner implementation.

Review Checklist:

Read repository guidelines (CLAUDE.md)
Analyze overall architecture and design decisions
Review actor2 workflow implementation
Examine runner management and protocol changes
Check security implications
Assess error handling and edge cases
Evaluate performance considerations
Review test coverage
Provide structured feedback and recommendations

🏗️ Architecture & Design

Strengths:

Unified Runner Protocol: The consolidation from separate isolate and container runners to a generic runner protocol is a solid architectural improvement
Multi-slot Support: The new slot-based allocation system (BuildAllocationType::Multi) enables better resource utilization
Actor2 Workflow: Clean separation between actor and actor2 workflows allows for incremental migration
Database Design: The new runner tracking in FDB with remaining/total slots provides efficient allocation lookups

Concerns:

Breaking Change Scope: Removing the entire isolate-v8-runner package without clear migration path could break existing JavaScript runtime functionality
Dual Workflow Complexity: Having both actor and actor2 workflows increases cognitive load and maintenance burden

🐛 Potential Bugs & Issues

Critical Issues

Runner Allocation Logic Gap (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:255):

// Missing remaining_slots > 0 check for multi-type builds
if let BuildAllocationType::Multi = input.build_allocation_type {
    loop {
        // Should verify remaining_slots > 0 before allocation

SQL Parameter Mismatch (packages/edge/infra/client/manager/src/runner/mod.rs:510):
```
SET delete_ts = ?3  -- Should be ?2, missing bind parameter
WHERE
```
Race Condition in Runner Socket (packages/edge/infra/client/manager/src/ctx.rs:877):
Runner socket connection could still occur between the connection check lines.

Build Type Validation Error (packages/edge/services/pegboard/src/workflows/actor2/runtime.rs:1021):

// build.allocation_type accessed before checking if build exists
let build = unwrap_with!(build_res.builds.first(), BUILD_NOT_FOUND);
// Later used without null check
destroy::clear_ports_and_resources(..., build.allocation_type, ...)

Security Concerns

Hardcoded UUIDs (packages/edge/infra/client/echo/src/main.rs:68):

let payload = json!({
    "init": {
        "runner_id": Uuid::nil(), // Should be unique UUID
    },
});

Missing Input Validation (packages/edge/infra/client/echo/src/main.rs:107):
No error handling for missing actor_id or generation fields in packets.

🚀 Performance Considerations

Improvements:

FDB Allocation Strategy: Using SNAPSHOT reads for allocation prevents conflicts while maintaining consistency
Bin-packing Algorithm: Reverse ordering in allocation queries optimizes resource utilization
Connection Pooling: WebSocket connection management looks efficient

Potential Issues:

Tokio "full" Feature: packages/edge/infra/client/echo/Cargo.toml enables all Tokio features unnecessarily
Allocation Loop Performance: The multi-slot allocation loop could benefit from early termination checks
Ping Thread Cleanup: Ping threads continue running even after main WebSocket connections close

🔒 Security Assessment

Positive:

Process Isolation: Container runner maintains proper process separation
Signal Handling: Appropriate use of Unix signals for process management
Resource Bounds: Memory and CPU limits properly enforced

Concerns:

Error Exposure: WebSocket error messages might leak internal system information
Resource Exhaustion: Missing rate limiting could allow unconstrained logging (removed throttle.rs)

🧪 Test Coverage

Missing Tests:

Runner Lifecycle: No comprehensive tests for runner creation, allocation, and cleanup
Error Scenarios: Limited testing of failure modes and edge cases
Protocol Validation: Missing tests for malformed protocol messages
Multi-slot Allocation: No tests for the new slot-based allocation system

Existing Tests:

Basic client state tests exist but assume generation 0
Integration tests are minimal given the scope of changes

📝 Code Quality Issues

Style & Maintainability

Inconsistent Naming: actor2 should be actor_v2 for clarity
TODO Overuse: Multiple unresolved TODOs in critical paths
Deprecated Fields: Multiple deprecated protocol fields still marked as required
Comment Accuracy: TotalSlotsKey comment says "MiB" but measures slots

Error Handling

Incomplete Error Paths: Several unwrap() calls that should handle errors gracefully
Missing Context: Error messages lack sufficient debugging information
Inconsistent Returns: Some functions return Ok(()) where errors should be propagated

🔧 Recommendations

Immediate Fixes Required

Fix SQL Parameter Binding: Correct the missing parameter in runner deletion query
Add Remaining Slots Check: Validate slot availability before allocation attempts
Improve Error Handling: Replace unwrap() calls with proper error handling
Fix UUID Generation: Use Uuid::new_v4() instead of Uuid::nil() for runner IDs

Architecture Improvements

Migration Strategy: Provide clear documentation for migrating from isolate-v8-runner
Test Coverage: Add comprehensive integration tests for the new runner system
Monitoring: Implement metrics for runner allocation success/failure rates
Documentation: Update API documentation to reflect the new allocation types

Performance Optimizations

Resource Cleanup: Implement proper cleanup for ping threads and WebSocket connections
Feature Minimization: Use specific Tokio features instead of "full"
Allocation Efficiency: Add early termination conditions in allocation loops

✅ Overall Assessment

This PR represents a significant and necessary architectural improvement that consolidates runner management and enables multi-slot allocation. While the core design is sound, several critical bugs and missing error handling need to be addressed before merging.

Verdict: Approve with required changes - The architectural direction is correct, but the implementation needs refinement in error handling, test coverage, and bug fixes identified above.

Priority Issues to Address:

SQL parameter binding fix (High)
Remaining slots validation (High)
Error handling improvements (High)
Test coverage expansion (Medium)
Documentation updates (Medium)

MasterPtato mentioned this pull request May 9, 2025

fix(pegboard): fix netns path bug #2431

Closed

MasterPtato mentioned this pull request May 9, 2025

fix: swap status monitor dns resolver with native resolver #2432

Closed

MasterPtato force-pushed the 05-08-feat_add_runners_to_pb_protocol branch 5 times, most recently from d3bc56e to 9e9302b Compare May 9, 2025 23:59

MasterPtato force-pushed the 05-07-fix_swap_status_monitor_dns_resolver_with_native_resolver branch from 13d14bb to 35fc915 Compare May 9, 2025 23:59

graphite-app bot changed the base branch from 05-07-fix_swap_status_monitor_dns_resolver_with_native_resolver to graphite-base/2433 May 10, 2025 00:01

MasterPtato force-pushed the 05-08-feat_add_runners_to_pb_protocol branch from 9e9302b to b4a61a7 Compare May 15, 2025 01:34

MasterPtato force-pushed the graphite-base/2433 branch from 35fc915 to bb8ccd5 Compare May 15, 2025 01:34

MasterPtato changed the base branch from graphite-base/2433 to main May 15, 2025 01:34

MasterPtato force-pushed the 05-08-feat_add_runners_to_pb_protocol branch 3 times, most recently from faaaf17 to 542b3c5 Compare May 16, 2025 20:15

MasterPtato mentioned this pull request May 16, 2025

feat(pb): add runner logs #2447

Open

MasterPtato force-pushed the 05-08-feat_add_runners_to_pb_protocol branch from 542b3c5 to 885534d Compare May 17, 2025 01:22

MasterPtato marked this pull request as ready for review May 21, 2025 01:38

MasterPtato requested a review from NathanFlurry May 21, 2025 01:38

MasterPtato mentioned this pull request May 21, 2025

feat(pb): get multi-actors working e2e, add resources for builds #2465

Open

greptile-apps bot reviewed May 21, 2025

View reviewed changes

This was referenced May 22, 2025

feat: implement new actor ids #2467

Open

fix(pb): get new actor ids working e2e #2471

Open

MasterPtato changed the base branch from main to graphite-base/2433 May 27, 2025 02:01

MasterPtato force-pushed the 05-08-feat_add_runners_to_pb_protocol branch from 885534d to d0f985a Compare May 27, 2025 02:02

MasterPtato changed the base branch from graphite-base/2433 to 05-27-chore_pegboard_add_exit_code_log_from_container_runner May 27, 2025 02:02

MasterPtato mentioned this pull request Jul 3, 2025

fix: increase default actor kill timeout to 5s #2701

Open

MasterPtato force-pushed the 05-08-feat_add_runners_to_pb_protocol branch from 8fad6b0 to a832749 Compare July 3, 2025 01:31

MasterPtato force-pushed the 05-27-chore_pegboard_add_exit_code_log_from_container_runner branch from f20b30f to 03f6f81 Compare July 3, 2025 01:31

MasterPtato force-pushed the 05-27-chore_pegboard_add_exit_code_log_from_container_runner branch from 03f6f81 to c6943e6 Compare July 4, 2025 01:27

MasterPtato force-pushed the 05-08-feat_add_runners_to_pb_protocol branch from a832749 to 0074232 Compare July 4, 2025 01:27

MasterPtato mentioned this pull request Jul 4, 2025

fix: convert runner protocol to protobuf #2702

Open

This was referenced Jul 4, 2025

fix: even more actor api changes #2714

Open

fix: get pb manager test working e2e again #2721

Open

MasterPtato force-pushed the 05-27-chore_pegboard_add_exit_code_log_from_container_runner branch from c6943e6 to d651a8c Compare July 11, 2025 22:23

MasterPtato force-pushed the 05-08-feat_add_runners_to_pb_protocol branch from 0074232 to c518c13 Compare July 11, 2025 22:23

MasterPtato force-pushed the 05-08-feat_add_runners_to_pb_protocol branch from c518c13 to 6c559a1 Compare July 12, 2025 00:40

MasterPtato force-pushed the 05-27-chore_pegboard_add_exit_code_log_from_container_runner branch from d651a8c to 017878a Compare July 12, 2025 00:40

NathanFlurry and others added 3 commits July 18, 2025 18:49

chore(pegboard): replace computing image size manually with using tar…

8452f42

… bytes read

feat: implement clickhouse-user-query

2c6945f

feat: add runners to pb protocol

cdf9857

MasterPtato force-pushed the 05-27-chore_pegboard_add_exit_code_log_from_container_runner branch from 017878a to d6072b5 Compare July 18, 2025 18:51

MasterPtato force-pushed the 05-08-feat_add_runners_to_pb_protocol branch from 6c559a1 to cdf9857 Compare July 18, 2025 18:51

vercel bot deployed to Preview July 18, 2025 19:04 View deployment

		println!("cargo:rerun-if-changed={}", hub_path.display());
		println!("cargo:rerun-if-env-changed=FONTAWESOME_PACKAGE_TOKEN");

feat: add runners to pb protocol #2433

Are you sure you want to change the base?

feat: add runners to pb protocol #2433

Uh oh!

Conversation

MasterPtato commented May 9, 2025

Changes

Uh oh!

MasterPtato commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

How to use the Graphite Merge Queue

Uh oh!

cloudflare-workers-and-pages bot commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying rivet with Cloudflare Pages

Uh oh!

cloudflare-workers-and-pages bot commented May 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying rivet-studio with Cloudflare Pages

Uh oh!

cloudflare-workers-and-pages bot commented May 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying rivet-hub with Cloudflare Pages

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

PR Summary

Uh oh!

greptile-apps bot May 21, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot May 21, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot May 21, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot May 21, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot May 21, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot May 21, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot May 21, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot May 21, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot May 21, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot May 21, 2025

Choose a reason for hiding this comment

Uh oh!

claude bot commented Jul 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: feat: add runners to pb protocol

Review Checklist:

🏗️ Architecture & Design

🐛 Potential Bugs & Issues

Critical Issues

Security Concerns

🚀 Performance Considerations

🔒 Security Assessment

🧪 Test Coverage

📝 Code Quality Issues

Style & Maintainability

Error Handling

🔧 Recommendations

Immediate Fixes Required

Architecture Improvements

Performance Optimizations

✅ Overall Assessment

Uh oh!

claude bot commented Jul 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

MasterPtato commented May 9, 2025 •

edited

Loading

cloudflare-workers-and-pages bot commented May 9, 2025 •

edited

Loading

cloudflare-workers-and-pages bot commented May 17, 2025 •

edited

Loading

cloudflare-workers-and-pages bot commented May 17, 2025 •

edited

Loading

claude bot commented Jul 3, 2025 •

edited

Loading

claude bot commented Jul 3, 2025 •

edited

Loading

coderabbitai bot commented Jul 4, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

claude bot commented Jul 4, 2025 •

edited

Loading

claude bot commented Jul 4, 2025 •

edited

Loading