feat: adopt RxP types + add process-group handling (Phase 3a) #2

Merged
Velascat merged 1 commit into main from feat/rxp-types-and-process-group on May 6, 2026

Velascat (Owner) commented on May 6, 2026

Summary

Phase 3a of the OperationsCenter runtime extraction. Aligns ExecutorRuntime with the canonical RxP protocol and brings the SubprocessRunner up to kodo-equivalent process-group safety.

Type alignment with RxP

ExecutorRuntime previously had its own dataclass copies of `RuntimeInvocation` / `RuntimeResult` / `RuntimeStatus` — a recipe for protocol drift. This PR collapses them into RxP re-exports:

```python
from rxp.contracts import RuntimeInvocation, RuntimeResult, ArtifactDescriptor
```

Status is RxP's `runtime_status` vocabulary (string literals): `pending | running | succeeded | failed | timed_out | cancelled | rejected`.

Added `rxp @ git+https://github.com/Velascat/RxP.git` as a dependency.

Contract shape changes

  • `RuntimeInvocation` now requires `runtime_kind` (RxP-required field)
  • `RuntimeResult` now requires `runtime_kind`
  • `RuntimeResult.artifacts` is `list[ArtifactDescriptor]`, not `list[str]`
  • Deleted `contracts/status.py` (RuntimeStatus enum)
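
To make the new shape concrete, here is a minimal stand-in sketch. It uses only stdlib dataclasses so it runs without installing RxP; the real contracts are pydantic models in `rxp.contracts` and almost certainly carry more fields than shown. The class bodies and the `__post_init__` check are assumptions for illustration, and only the field names `runtime_kind`, `status`, `artifacts`, `artifact_id`, and `kind` come from this PR.

```python
from dataclasses import dataclass, field
from typing import Literal, get_args

# Status vocabulary as quoted in this PR description.
RuntimeStatus = Literal[
    "pending", "running", "succeeded", "failed",
    "timed_out", "cancelled", "rejected",
]


@dataclass(frozen=True)
class ArtifactDescriptor:
    """Stand-in only; not the rxp.contracts class."""
    artifact_id: str  # e.g. "stdout" or "stderr"
    kind: str         # e.g. "log_excerpt"


@dataclass(frozen=True)
class RuntimeResult:
    """Stand-in only; not the rxp.contracts class."""
    runtime_kind: str  # required: no default, so omitting it raises TypeError
    status: str        # plain string literal, no enum
    artifacts: list[ArtifactDescriptor] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Hypothetical check mimicking vocabulary validation on `status`.
        if self.status not in get_args(RuntimeStatus):
            raise ValueError(f"unknown status: {self.status!r}")
```

With this shape, callers compare `result.status == "succeeded"` directly and read `result.artifacts[0].artifact_id` instead of indexing a list of strings.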

Process-group safety

The seed SubprocessRunner did basic `subprocess.run` with no group handling. Kodo's existing `_run_subprocess` had robust orphan-prevention; we mirror it now in ExecutorRuntime:

  • `start_new_session=True` → child becomes its own process-group leader
  • On timeout → `os.killpg(pgid, SIGKILL)` kills the entire group (prevents orphaned claude/codex worker subprocesses)
  • Transient SIGTERM handler installed for the duration of the run; if the supervising Python process is itself killed (supervisor stop, OOM), the child group is killed before exit. The previous handler is restored on return.

This is not optional polish — without it, kodo runs that hit timeout would leave orphans consuming API quota. Phase 3b in OperationsCenter cannot safely use ExecutorRuntime until this lands.
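
The three behaviors listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in SubprocessRunner: the function name `run_in_own_group` and its return shape are hypothetical, it is POSIX-only, and it must be called from the main thread because `signal.signal` requires that.

```python
import os
import signal
import subprocess
import sys


def run_in_own_group(argv, timeout_s):
    """Hypothetical helper: run argv as a session leader, kill its group if needed."""
    # start_new_session=True makes the child call setsid(), so it leads a new
    # process group whose pgid equals the child's pid.
    proc = subprocess.Popen(argv, start_new_session=True)

    def kill_group():
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group has already exited

    def on_sigterm(signum, frame):
        # The supervising process is being stopped: take the child group down first.
        kill_group()
        raise SystemExit(128 + signum)

    previous = signal.signal(signal.SIGTERM, on_sigterm)
    try:
        try:
            returncode = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            kill_group()
            proc.wait()  # collect the direct child so it does not linger as a zombie
            return "timed_out", None
        return ("succeeded" if returncode == 0 else "failed"), returncode
    finally:
        # Restore whatever handler was installed before the run.
        signal.signal(
            signal.SIGTERM, previous if previous is not None else signal.SIG_DFL
        )


if __name__ == "__main__":
    print(run_in_own_group([sys.executable, "-c", "import time; time.sleep(30)"], 0.2))
```

Killing by group rather than by pid is the point: a worker that has spawned helpers of its own is taken down together with them, provided those helpers have not moved themselves into a different session.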

io/json_io.py

  • `dataclasses.asdict` → `model_dump_json`
  • `RuntimeInvocation(**payload)` → `RuntimeInvocation.model_validate(payload)`

Tests

  • Every fixture gains `runtime_kind="subprocess"`
  • `RuntimeStatus.SUCCEEDED` → `"succeeded"` (string literal comparisons)
  • New: rejects unknown `runtime_kind` via pydantic `ValidationError`
  • New: `result.artifacts` is a list of `ArtifactDescriptor` with `artifact_id` `"stdout"` / `"stderr"` and `kind` `"log_excerpt"`

15 tests pass (was 13; +2 new).

Test plan

  • Full test suite passes
  • No remaining import of `contracts.status`
  • Public API (`from executor_runtime import ExecutorRuntime`) unchanged
  • RxP RuntimeInvocation accepted by `SubprocessRunner.run`
  • RuntimeResult returned has the RxP shape (`runtime_kind`, `ArtifactDescriptor` list, string-literal status)
  • Phase 3b (OperationsCenter integration) lands after this merges

🤖 Generated with Claude Code

ExecutorRuntime now consumes the canonical RxP RuntimeInvocation /
RuntimeResult / ArtifactDescriptor contracts directly. The previous
parallel dataclass copies + RuntimeStatus enum are deleted; status is
RxP's runtime_status vocabulary (string literals).

Added: rxp @ git+https://github.com/Velascat/RxP.git as a dep.

Contract changes
- RuntimeInvocation: now requires `runtime_kind` (RxP's required field)
- RuntimeResult: now requires `runtime_kind`; `artifacts` is
  list[ArtifactDescriptor] not list[str]
- Removed: contracts/status.py (RuntimeStatus enum)

Process-group safety (matches kodo's previous behavior)
- SubprocessRunner spawns children with start_new_session=True so the
  child is its own process-group leader
- On timeout: os.killpg(SIGKILL) reaps the entire group, preventing
  orphan worker subprocesses (claude / codex / kodo helpers) from
  continuing to consume CPU/API quota
- Transient SIGTERM handler: if the supervising Python process is
  itself killed (supervisor stop, OOM), the child group is killed
  before exit. Previous handler is restored on return.

io/json_io.py
- Switched from dataclasses.asdict to pydantic model_dump_json
- Switched from RuntimeInvocation(**payload) to model_validate

Tests
- runtime_kind required in every fixture
- RuntimeStatus enum imports replaced with string-literal comparisons
  (e.g. assert result.status == "succeeded")
- New: rejects unknown runtime_kind via pydantic ValidationError
- New: artifacts come back as ArtifactDescriptor instances with
  artifact_id "stdout" / "stderr" and kind "log_excerpt"

15 tests pass (was 13; +2 new).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Velascat merged commit 9a44232 into main on May 6, 2026
Velascat deleted the feat/rxp-types-and-process-group branch on May 6, 2026 21:22
Velascat added a commit that referenced this pull request on May 7, 2026

* feat: AsyncHttpRunner — kickoff (202) + poll-until-terminal HTTP runner

Closes the async-shaped HTTP gap that HttpRunner explicitly punted on. Pairs with the new RuntimeKind 'http_async' (RxP PR #2).

Behavior:
- POST kickoff at http.url; expects 202 + JSON body with run_id at configurable dotted path. 200 also accepted (sync result treated as immediately-terminal).
- Substitutes {run_id} in http.poll_url_template, then polls that URL until status (at configurable dotted path) is in terminal_states. Success/failure determined by terminal status membership in success_states.
- Sleep between polls is injectable; defaults to time.sleep. Tests use a no-op or a counter to drive the loop deterministically.
- Sync from caller's POV — run() blocks until terminal or invocation timeout. No global state; each call uses a short-lived httpx.Client.
- Network errors, kickoff non-202/200, run_id extraction failure, poll non-200, status extraction failure, and timeouts each map to distinct error_summary messages.

Wire shape lives in RuntimeInvocation.metadata — strings only, matching RxP metadata typing. Comma-separated lists for terminal/success states keep the schema flat.

18 new unit tests covering happy paths (kickoff+poll, alternate JSON paths, sync 200 fast path), validation/rejection (5 missing-metadata cases), kickoff errors (non-202, timeout, run_id extraction), poll errors (non-200, timeout, status extraction), and poll-loop wiring (sleep called between polls, zero interval). 56 ER tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci: re-run after RxP main has http_async kind

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
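
The poll-until-terminal loop described in the AsyncHttpRunner commit above can be sketched as follows. This is a simplified illustration, not the runner itself: the HTTP call is replaced by an injected `get_json` callable (the commit says the real runner uses a short-lived `httpx.Client`), and the names `extract`, `poll_until_terminal`, and every parameter are hypothetical. What it takes from the commit message is the `{run_id}` substitution, the dotted-path status lookup, the comma-separated state lists, and the injectable sleep.

```python
import time
from typing import Any, Callable


def extract(body: dict, dotted_path: str) -> Any:
    """Walk a dotted path such as 'job.state' through nested dicts."""
    current: Any = body
    for key in dotted_path.split("."):
        current = current[key]  # KeyError/TypeError means the path did not resolve
    return current


def poll_until_terminal(
    get_json: Callable[[str], dict],
    poll_url_template: str,
    run_id: str,
    status_path: str,
    terminal_states: str,  # comma-separated, e.g. "done,error"
    success_states: str,   # comma-separated subset of terminal_states
    timeout_s: float,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    url = poll_url_template.replace("{run_id}", run_id)
    terminal = {s.strip() for s in terminal_states.split(",")}
    success = {s.strip() for s in success_states.split(",")}
    deadline = time.monotonic() + timeout_s
    while True:
        status = extract(get_json(url), status_path)
        if status in terminal:
            return "succeeded" if status in success else "failed"
        if time.monotonic() >= deadline:
            return "timed_out"
        sleep(interval_s)  # injectable so tests can drive the loop without waiting
```

Passing `sleep=recorded.append` with `interval_s=0` lets a test count the pauses between polls without any real delay.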