Skip to content

Conversation

@ammar-agent
Copy link
Collaborator

@ammar-agent ammar-agent commented Oct 27, 2025

Prevents indefinite hangs when cancelling long-running operations by passing abort signals through the runtime stack.

Changes

  • Runtime interfaces: Added optional abortSignal parameter to operations with long timeouts (≥30s)
  • LocalRuntime: Accepts but ignores abort signals (local operations are fast)
  • SSHRuntime: Passes abort signals to all exec() calls and spawn() operations
  • Tool integration: Bash tool passes abort signal to runtime operations

Impact

Long-running operations can now be cancelled cleanly:

  • Git operations (clone, checkout, init hooks)
  • SSH file I/O (readFile, writeFile)
  • Workspace operations (create, rename, delete)
  • Bundle creation for SSH sync

Generated with cmux

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

// Step 2: Create bundle locally and pipe to remote file via SSH
initLogger.logStep(`Creating git bundle...`);
await new Promise<void>((resolve, reject) => {
const sshArgs = this.buildSSHArgs(true);
const command = `cd ${shescape.quote(projectPath)} && git bundle create - --all | ssh ${sshArgs.join(" ")} "cat > ${bundleTempPath}"`;
log.debug(`Creating bundle: ${command}`);
const proc = spawn("bash", ["-c", command]);
streamProcessToLogger(proc, initLogger, {
logStdout: false,
logStderr: true,
});
let stderr = "";
proc.stderr.on("data", (data: Buffer) => {
stderr += data.toString();
});
proc.on("close", (code) => {
if (code === 0) {
resolve();
} else {
reject(new Error(`Failed to create bundle: ${stderr}`));
}
});
proc.on("error", (err) => {
reject(err);
});
});

P1 Badge Wire abortSignal to git bundle spawn

The new abortSignal is passed into syncProjectToRemote, but the expensive "git bundle" step still spawns a local process without listening for that signal. If a workspace initialization is cancelled while the bundle is being generated, this Promise keeps running until the bundle command finishes, so callers still block for minutes despite cancellation. Consider hooking the signal to proc.kill() (and rejecting the Promise) so that cancellation is effective for the longest phase of workspace syncing.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@ammar-agent ammar-agent force-pushed the rt-abort branch 2 times, most recently from db92f27 to ebd6ced Compare October 27, 2025 19:03
Add abort signal support to all long-running runtime operations to prevent indefinite hangs. Operations with timeouts <10s are excluded per design to avoid noise.

Changes:
- Runtime interface: Add optional abortSignal to readFile/writeFile/stat and workspace operations
- LocalRuntime: Accept and check abort signals in file operations and workspace lifecycle
- SSHRuntime: Pass abort signals through to exec() calls for long-running operations:
  - File I/O (300s timeout)
  - Git clone operations (300s timeout)
  - Init hook execution (3600s timeout)
  - Workspace rename/delete (30s timeout)
- Helper functions: Accept and propagate abort signals through readFileString/writeFileString
- Bash tool: Pass abort signal to runtime.writeFile() for overflow files

All abort signals are optional parameters for backward compatibility.

Testing:
- 794 unit tests pass
- 57 integration tests pass (executeBash, runtimeExecuteBash, runtimeFileEditing, createWorkspace, renameWorkspace, removeWorkspace)
- Typecheck and lint clean

# Conflicts:
#	src/runtime/LocalRuntime.ts
Local file operations (readFile, writeFile, stat, renameWorkspace, deleteWorkspace)
are fast and don't need cancellation support. The abort signal parameters remain
in the interface for SSHRuntime, but LocalRuntime now ignores them with a comment
explaining why.

This simplifies LocalRuntime while keeping abort signal support where it matters
most - in SSHRuntime's network-bound operations.
@ammar-agent
Copy link
Collaborator Author

@codex This has been addressed. The bundle creation spawn() now hooks the abort signal to proc.kill() and rejects the Promise when cancelled. See the latest commit.

@chatgpt-codex-connector

This comment has been minimized.

During rebase conflict resolution, symlink resolution logic was accidentally
removed from LocalRuntime.writeFile(). This restores:
- resolvedPath: Resolves symlinks to write through them
- originalMode: Preserves file permissions across writes

The abort signal parameter addition is preserved.
@ammar-agent
Copy link
Collaborator Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

// Spawn ssh command
const sshProcess = spawn("ssh", sshArgs, {
stdio: ["pipe", "pipe", "pipe"],
});
// Convert Node.js streams to Web Streams
const stdout = Readable.toWeb(sshProcess.stdout) as unknown as ReadableStream<Uint8Array>;
const stderr = Readable.toWeb(sshProcess.stderr) as unknown as ReadableStream<Uint8Array>;
const stdin = Writable.toWeb(sshProcess.stdin) as unknown as WritableStream<Uint8Array>;
// Track if we killed the process due to timeout
let timedOut = false;
// Create promises for exit code and duration
// Uses special exit codes (EXIT_CODE_ABORTED, EXIT_CODE_TIMEOUT) for expected error conditions
const exitCode = new Promise<number>((resolve, reject) => {
sshProcess.on("close", (code, signal) => {
// Check abort first (highest priority)
if (options.abortSignal?.aborted) {
resolve(EXIT_CODE_ABORTED);
return;
}
// Check if we killed the process due to timeout
// Don't check signal - if we set timedOut, we timed out regardless of how process died
if (timedOut) {
resolve(EXIT_CODE_TIMEOUT);
return;
}
resolve(code ?? (signal ? -1 : 0));
});
sshProcess.on("error", (err) => {
reject(new RuntimeErrorClass(`Failed to execute SSH command: ${err.message}`, "exec", err));
});
});
const duration = exitCode.then(() => performance.now() - startTime);
// Handle abort signal
if (options.abortSignal) {
options.abortSignal.addEventListener("abort", () => sshProcess.kill("SIGKILL"));
}

P1 Badge Respect already-aborted signals before spawning SSH commands

The new abort handling in SSHRuntime.exec only registers a listener after the SSH process is spawned and never checks whether options.abortSignal is already in the aborted state. AbortSignal events do not fire retroactively, so if the controller is aborted before exec is called (for example, a user cancels initialization between sequential steps), the method will still launch the remote command and it will run until completion or timeout, defeating the cancellation the commit is trying to add. Consider short‑circuiting at the top of exec or killing the process immediately when abortSignal.aborted is true so that later operations don’t start once cancellation has been requested.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

AbortSignal events don't fire retroactively, so if a signal is already
aborted when a method is called, the abort listener never fires and the
operation runs to completion despite being cancelled.

Added pre-checks at operation start in:
- SSHRuntime.exec(): Short-circuit before spawning SSH process
- SSHRuntime.syncProjectToRemote(): Check at function entry and before bundle spawn

This ensures operations cancelled between sequential steps don't start.

Addresses Codex feedback.
Rather than having callers manually set up abort handlers and cleanup,
streamProcessToLogger now accepts an optional abortSignal and returns
a cleanup function. This centralizes the abort handling logic and makes
call sites cleaner.

Benefits:
- Reduces boilerplate at call sites
- Ensures consistent cleanup (no risk of forgetting removeEventListener)
- Single place to maintain abort handling logic

Addresses code review feedback.
Add abortSignal parameter to all file edit tools and helpers:
- executeFileEditOperation() accepts and passes abortSignal
- fileExists() accepts and passes abortSignal to runtime.stat()
- file_edit_replace_string passes abortSignal from AI SDK
- file_edit_replace_lines passes abortSignal from AI SDK
- file_edit_insert passes abortSignal from AI SDK

These tools now receive abort signals from AI SDK tool execution
context and properly pass them to runtime operations (stat, readFile,
writeFile). SSH runtime will cancel operations when abort is triggered.
Local runtime ignores abort signals (fast operations).
@ammario ammario enabled auto-merge October 28, 2025 19:04
@ammario ammario added this pull request to the merge queue Oct 28, 2025
auto-merge was automatically disabled October 28, 2025 19:32

Pull Request is not mergeable

Merged via the queue into main with commit 52adb75 Oct 28, 2025
18 of 19 checks passed
@ammario ammario deleted the rt-abort branch October 28, 2025 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants