[codex] migrate codexd into a self-contained Culture agent #7
OriNachum wants to merge 18 commits into
Version not bumped:
Review Summary by Qodo
Migrate Codex Culture daemon into self-contained codexd runtime with harness orchestration
Walkthroughs
Description
• Migrate Codex Culture daemon and shared harness internals into codexd as a self-contained runtime
• Implement universal BaseDaemon orchestrator managing IRC transport, socket server, agent runner, and webhook client for all four backends (claude/codex/copilot/acp)
• Add daemon lifecycle management with attention-based polling, crash recovery with circuit breaker, and IPC request dispatch
• Implement CodexAgentRunner for managing Codex app-server subprocess via JSON-RPC with turn execution, token tracking, and OpenTelemetry instrumentation
• Add daemon and managed repo workflow commands including workspace validation, Git helpers, and Codex app-server orchestration
• Implement shared harness modules: IRC transport client with tracing, Unix socket IPC server, webhook alerting, message buffering, and room metadata parsing
• Add OpenTelemetry integration with tracer/meter initialization, audit JSONL sink with rotation, and W3C trace-context propagation for IRC messages
• Implement per-target attention state machine with stimulus-driven promotion and time-based decay for poll-loop scheduling
• Add PID file management for daemon process lifecycle tracking and server listing
• Implement repo workflow orchestration for clone, run, commit, and push operations with workspace validation
• Package Codex Culture assets (culture.yaml, SKILL.md) and update documentation
• Add comprehensive test coverage for all migrated components including daemon config, attention tracking, telemetry, Git operations, and CLI dispatch
Diagram
flowchart LR
CLI["CLI<br/>daemon/repo commands"]
BaseDaemon["BaseDaemon<br/>universal orchestrator"]
CodexDaemon["CodexDaemon<br/>Codex-specific"]
IRC["IRC Transport<br/>async client"]
SocketServer["Socket Server<br/>IPC"]
AgentRunner["AgentRunner<br/>JSON-RPC"]
Supervisor["Supervisor<br/>behavior eval"]
Telemetry["Telemetry<br/>OTEL/audit"]
RepoWorkflow["Repo Workflow<br/>Git ops"]
Attention["Attention<br/>state machine"]
CLI -->|start/repo| CodexDaemon
CodexDaemon -->|extends| BaseDaemon
BaseDaemon -->|uses| IRC
BaseDaemon -->|uses| SocketServer
BaseDaemon -->|uses| Attention
CodexDaemon -->|uses| AgentRunner
CodexDaemon -->|uses| Supervisor
BaseDaemon -->|uses| Telemetry
CLI -->|repo run/push| RepoWorkflow
RepoWorkflow -->|executes| AgentRunner
File Changes
1. codexd/harness/base_daemon.py
Code Review by Qodo
Context used
✅ Compliance rules (platform): 11 rules
1. culture-irc skill file misplaced
    ---
    name: culture-irc
    description: >
      Communicate over IRC on the Culture network. Use when the user asks to
      read messages, send messages, check who's online, join/part channels, or
      interact with other agents on the IRC mesh.
    ---

    # IRC Skill for Culture

    This skill lets you communicate over IRC through the culture daemon.
    The daemon runs as a background process and maintains a persistent IRC connection.

    ## Setup
1. culture-irc skill file misplaced 📘 Rule violation ⌂ Architecture
A new Codex skill definition file was added at codexd/agent/skill/SKILL.md instead of under .agents/skills/, which violates the required repository-local skills layout. This can break skill discovery and creates a non-standard skill location in the repo.
Agent Prompt
## Issue description
A repository-local Codex skill definition (`SKILL.md` with skill front-matter) was introduced outside `.agents/skills/`.
## Issue Context
The repo already uses `.agents/skills/<skill>/SKILL.md` for skill discovery/standardization. The new skill file is currently located under the Python package path and is also explicitly included in the wheel build.
## Fix Focus Areas
- codexd/agent/skill/SKILL.md[1-194]
- pyproject.toml[41-44]
    ## [Unreleased]

    ### Added

    - Migrated the Codex Culture daemon into `codexd` as a self-contained runtime.
    - Added `codexd repo run` for managed clone, Codex app-server execution, commit,
      and branch push workflows.
2. Package version not bumped 📘 Rule violation § Compliance
This PR adds significant production functionality (daemon/runtime and new CLI workflows) but the package version in pyproject.toml remains 0.1.2. Compliance requires updating the version (and maintaining the changelog entry) for PRs that change published runtime behavior.
Agent Prompt
## Issue description
Production code and CLI behavior changed, but the distribution version was not updated.
## Issue Context
`CHANGELOG.md` has new entries under `[Unreleased]` describing newly added daemon and repo-work functionality, while `pyproject.toml` still declares `version = "0.1.2"`.
## Fix Focus Areas
- pyproject.toml[2-4]
- CHANGELOG.md[8-15]
    def checkout_branch(repo: Path, branch: str, *, base: str | None = None) -> None:
        if base:
            run_git(repo, "fetch", "origin")
            run_git(repo, "checkout", "-B", branch, base)
            return
        run_git(repo, "checkout", "-B", branch)
3. Git base option injection 🐞 Bug ⛨ Security
checkout_branch() forwards the user-supplied base directly into git checkout -B ... base, so a base starting with - is interpreted as a git option and can change behavior or fail unexpectedly. This is reachable from repo run --base via run_repo_task() with no validation.
Agent Prompt
### Issue description
`codexd.repo.git.checkout_branch()` passes `base` directly to `git checkout -B`, allowing option-style values (e.g., starting with `-`) to be parsed as git flags.
### Issue Context
`base` originates from the CLI (`codexd repo run --base`) and is forwarded through `codexd.repo.workflow.run_repo_task()`.
### Fix Focus Areas
- codexd/repo/workspace.py[36-56]
- codexd/repo/workflow.py[48-66]
- codexd/repo/git.py[39-44]
### Suggested fix
- Add a `validate_base_ref()` (or reuse a generalized ref validator) that rejects:
- leading `-`
- whitespace/control chars
- `..`, trailing `.lock`, `@{`, etc. (same class of checks as branch safety)
- Call it in `run_repo_task()` before invoking `checkout_branch()`.
- Optionally, also validate inside `checkout_branch()` as defense-in-depth.
- Consider verifying the ref exists (e.g., `git rev-parse --verify <base>^{commit}`) and raise a clear `GitError` if not.
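A minimal sketch of the suggested validator, assuming a `validate_base_ref()` helper that does not exist in the PR yet. The `GitError` class here is a stand-in for the project's own exception, and the exact rule set is an assumption modelled on the checks listed above. Note that it also rejects revision expressions such as `HEAD~1`, which may or may not be desirable for `--base`.

```python
import re


class GitError(RuntimeError):
    """Stand-in for the project's GitError (assumption for this sketch)."""


# Control characters, space, DEL, and characters git disallows in ref names.
_FORBIDDEN = re.compile(r"[\x00-\x20\x7f~^:?*\[\\]")


def validate_base_ref(base: str) -> str:
    """Return `base` unchanged if it looks like a safe ref name, else raise GitError."""
    if not base:
        raise GitError("base ref must not be empty")
    if base.startswith("-"):
        # A leading dash would be parsed by git as a command-line option.
        raise GitError(f"base ref must not start with '-': {base!r}")
    if _FORBIDDEN.search(base):
        raise GitError(f"base ref contains forbidden characters: {base!r}")
    if ".." in base or "@{" in base or base.endswith((".lock", "/", ".")):
        raise GitError(f"base ref is not a valid ref name: {base!r}")
    return base
```

Under these assumptions, `run_repo_task()` could call `validate_base_ref(base)` before `checkout_branch()`, so that a value like `--orphan` fails with a clear `GitError` instead of reaching git.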
    def run_git(cwd: Path, *args: str) -> subprocess.CompletedProcess[str]:
        result = subprocess.run(
            ["git", *args],
            cwd=cwd,
            text=True,
            capture_output=True,
            check=False,
        )
        if result.returncode != 0:
            detail = result.stderr.strip() or result.stdout.strip() or "git command failed"
            raise GitError(detail)
        return result


    def clone(remote: str, checkout_path: Path) -> None:
        result = subprocess.run(
            ["git", "clone", remote, str(checkout_path)],
            text=True,
            capture_output=True,
            check=False,
        )
4. Git commands can hang 🐞 Bug ☼ Reliability
run_git() and clone() call subprocess.run(...) with no timeout and default environment, so clone/fetch/push can block indefinitely and wedge repo workflows. This can leave the CLI stuck with no recovery path besides external interruption.
Agent Prompt
### Issue description
Git subprocesses are invoked without timeouts and with an inherited environment, so they can block forever (network stall, credential prompts).
### Issue Context
`codexd repo run/push` relies on these helpers to complete; a hang wedges the whole workflow.
### Fix Focus Areas
- codexd/repo/git.py[13-37]
- codexd/repo/workflow.py[48-83]
### Suggested fix
- Add a timeout parameter (configurable; e.g., 300s) to `run_git()` / `clone()`.
- Disable interactive prompts by setting env overrides (at least):
- `GIT_TERMINAL_PROMPT=0`
- optionally `GIT_ASKPASS=` and/or `SSH_ASKPASS=`
- Catch `subprocess.TimeoutExpired` and raise `GitError` with a clear message including the git args.
- Consider emitting a hint that the remote may require credentials when prompt is disabled.
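One way the suggested fix might look for `run_git()`, as a sketch rather than the PR's actual change. The `timeout` parameter and the `DEFAULT_GIT_TIMEOUT` constant are hypothetical names, and `GitError` is a stand-in for the project's exception. `GIT_TERMINAL_PROMPT=0` is a real git environment variable that disables terminal credential prompts.

```python
import os
import subprocess
from pathlib import Path


class GitError(RuntimeError):
    """Stand-in for the project's GitError (assumption for this sketch)."""


DEFAULT_GIT_TIMEOUT = 300.0  # seconds; hypothetical default taken from the suggestion


def run_git(
    cwd: Path, *args: str, timeout: float = DEFAULT_GIT_TIMEOUT
) -> subprocess.CompletedProcess[str]:
    # Inherit the environment but forbid interactive credential prompts.
    env = {**os.environ, "GIT_TERMINAL_PROMPT": "0"}
    try:
        result = subprocess.run(
            ["git", *args],
            cwd=cwd,
            env=env,
            text=True,
            capture_output=True,
            check=False,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        # Surface the stalled command instead of blocking the workflow forever.
        raise GitError(f"git {' '.join(args)} timed out after {timeout:g}s") from exc
    if result.returncode != 0:
        detail = result.stderr.strip() or result.stdout.strip() or "git command failed"
        raise GitError(detail)
    return result
```

The same `env` and `timeout` arguments would apply to `clone()`, which calls `subprocess.run` directly in the snippet under review.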
    async def _terminate_process(self) -> None:
        """Terminate the subprocess, escalating to kill on timeout."""
        if not self._process:
            return
        try:
            self._process.terminate()
            async with asyncio.timeout(_codex_const.PROCESS_TERMINATE_GRACE_SECONDS):
                await self._process.wait()
        except (asyncio.TimeoutError, ProcessLookupError):
            try:
                self._process.kill()
            except ProcessLookupError:
                pass
5. Killed process not reaped 🐞 Bug ☼ Reliability
CodexAgentRunner._terminate_process() calls kill() on timeout but never awaits process.wait(), which can leave a zombie subprocess and leak resources. Over repeated failures/timeouts this can exhaust process table resources and destabilize the daemon.
Agent Prompt
### Issue description
On terminate timeout, the code sends SIGKILL but does not await `wait()`, so the child may not be reaped.
### Issue Context
This path is reachable whenever the codex app-server becomes unresponsive during shutdown.
### Fix Focus Areas
- codexd/agent/agent_runner.py[164-177]
### Suggested fix
- After `self._process.kill()`, always `await self._process.wait()` (optionally under a short timeout).
- Ensure `_process` is cleared/reset after shutdown to avoid reuse of a dead handle.
- Consider logging when escalation to kill occurs so operators can spot repeated unclean terminations.
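The terminate, kill, and reap sequence described above can be sketched as a standalone coroutine. The function name `terminate_and_reap` and the `GRACE_SECONDS` constant are hypothetical stand-ins for the method and for `PROCESS_TERMINATE_GRACE_SECONDS` in the PR. The sketch uses `asyncio.wait_for` rather than `asyncio.timeout` so that it also runs on Python versions before 3.11.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

GRACE_SECONDS = 5.0  # hypothetical stand-in for PROCESS_TERMINATE_GRACE_SECONDS


async def terminate_and_reap(
    process: asyncio.subprocess.Process, grace: float = GRACE_SECONDS
) -> int:
    """Terminate `process`, escalate to kill after `grace` seconds, always reap it."""
    if process.returncode is not None:
        return process.returncode  # already exited and reaped
    try:
        process.terminate()
        return await asyncio.wait_for(process.wait(), timeout=grace)
    except ProcessLookupError:
        # The child is already gone; wait() still collects its exit status.
        return await process.wait()
    except asyncio.TimeoutError:
        logger.warning("process %s ignored terminate; escalating to kill", process.pid)
    try:
        process.kill()
    except ProcessLookupError:
        pass
    # Awaiting wait() after kill() reaps the child so no zombie is left behind.
    return await process.wait()
```

In the runner, the caller would also reset `self._process` to `None` once this returns, so a dead handle is never reused.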
Summary
Why
codexd previously only contained package metadata, docs, and scaffold-level CLI behavior. This change lands the runnable Codex-only daemon and the delegated remote-repository workflow described in the migration plan.
Validation
uv run pytest -n auto --cov=codexd
uv run black --check codexd tests
uv run isort --check-only codexd tests
uv run flake8 codexd tests
uv run bandit -c pyproject.toml -r codexd
markdownlint-cli2 "**/*.md"
uv build
uv run python -m codexd --version