Below is a full, end‑to‑end implementation you can copy into a repo and run.

What you’ll get:

* A **custom MCP server** (Python, STDIO) you can plug into **Codex CLI** and/or **Gemini CLI** as a tool server.
* Internally, it orchestrates:

  * **Codex CLI as an MCP server** (spawned via `npx … codex mcp-server`)
  * **Gemini CLI in headless mode** (`gemini -p …`)
* Automatic **reasoning-depth selection** per task, mapped to:

  * Codex config override: `model_reasoning_effort`
  * GPT‑5.2 Agents: `ModelSettings(reasoning=Reasoning(effort=…), verbosity=…)`
* “GPT‑5.2 prompting guidance” baked into agent prompts: verbosity clamping, scope discipline, tool rules, ambiguity handling


## 1) Architecture

### Processes

1. **Your MCP host** (Codex CLI, Gemini CLI, Cursor, Claude Desktop, etc.)
2. **This custom MCP server**: `codex_multireason_mcp` (FastMCP server, STDIO)
3. **Inside the server**, on-demand:

   * Launches **Codex CLI MCP server**: exposes `codex` and `codex-reply` tools
   * Calls **Gemini CLI** headlessly for “second opinion” review

### What “multiple reasoning levels” means here

* For **Codex** sessions, we set `model_reasoning_effort` per task (fast vs deep) using the MCP `codex` tool’s `config` override (it overrides `$CODEX_HOME/config.toml`).
* For **planner/reviewer agents**, we set GPT‑5.2 `reasoning.effort` and `verbosity`.
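
As a rough illustration of these two knobs, the sketch below maps one chosen level to both payloads as plain dictionaries. The helper name `settings_for` and the dictionary layout are assumptions made for illustration; the actual calls appear in `workflow.py` later.

```python
from typing import Any


def settings_for(effort: str, verbosity: str = "low") -> dict[str, Any]:
    """Map one reasoning level to both knobs (hypothetical helper, not a library API)."""
    return {
        # Sent in the `config` field of the Codex MCP `codex` tool call.
        "codex_config": {"model_reasoning_effort": effort},
        # Mirrors what the planner/reviewer agents receive.
        "agent_settings": {"reasoning": {"effort": effort}, "verbosity": verbosity},
    }


fast = settings_for("low")
deep = settings_for("high", verbosity="medium")
print(fast["codex_config"])
print(deep["agent_settings"])
```

Keeping both payloads behind one function means a routing decision only has to name a level once.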


## 2) Prerequisites

### System requirements

* **Python 3.10+**
* **Node.js 18+** (needed for `npx`)
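* `OPENAI_API_KEY` set in the environment or in `.env`
* Gemini CLI installed, with `GEMINI_API_KEY` set (only needed for the Gemini review step)
* Git (optional; used to collect the diff after a Codex run)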
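
A quick preflight check can confirm these prerequisites before you go further. This is a minimal sketch using only the standard library; the function name `preflight` is an assumption, and `git` is included because the workflow later shells out to `git diff`.

```python
import shutil
import sys


def preflight(tools: tuple[str, ...] = ("npx", "gemini", "git")) -> dict[str, bool]:
    """Report whether Python is new enough and which CLI tools are on PATH."""
    report = {"python>=3.10": sys.version_info >= (3, 10)}
    for tool in tools:
        # shutil.which returns None when the executable cannot be found.
        report[tool] = shutil.which(tool) is not None
    return report


for name, ok in preflight().items():
    print(f"{'ok' if ok else 'MISSING':8} {name}")
```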


## 3) Create the project


```bash
mkdir codex-multireason-mcp
cd codex-multireason-mcp
python -m venv .venv
source .venv/bin/activate
```



### Install Python deps

The Agents SDK guide recommends:

```bash
pip install --upgrade openai openai-agents python-dotenv
```


Add the MCP server framework:

```bash
pip install fastmcp
```



FastMCP is a widely used framework for building MCP servers: you create a `FastMCP(...)` instance, register tools on it, and serve it with `mcp.run(transport="stdio")`.

Optional (recommended):


```bash
pip install pydantic rich
```


## 4) Install / configure Codex CLI

You don’t need a global Codex install if `npx` can resolve the `codex` command, but you do need a working Codex CLI setup.

### Codex config file

Codex reads its configuration from `~/.codex/config.toml`.

Create or edit it:

```toml
# ~/.codex/config.toml
# Base defaults (can be overridden per task by this system via MCP `codex` tool's `config` field)
model = "gpt-5-codex"
model_reasoning_effort = "medium"
sandbox_mode = "workspace-write"
approval_policy = "never"

[profiles.fast]
model_reasoning_effort = "minimal"

[profiles.standard]
model_reasoning_effort = "medium"

[profiles.deep]
model_reasoning_effort = "high"
```

Codex supports `model_reasoning_effort` and profiles under `[profiles.<name>]`, and describes precedence (CLI flags > profile > root config > defaults).
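
To make that precedence order concrete, here is a small model of it over plain dictionaries. It only illustrates the lookup order described above and is not how Codex itself is implemented; `resolve_setting` is a hypothetical name.

```python
from typing import Optional


def resolve_setting(
    key: str,
    cli_overrides: dict[str, str],
    config: dict,
    profile: Optional[str] = None,
    default: Optional[str] = None,
) -> Optional[str]:
    """Resolve one key using: CLI override > profile > root config > default."""
    if key in cli_overrides:
        return cli_overrides[key]
    profile_table = config.get("profiles", {}).get(profile, {}) if profile else {}
    if key in profile_table:
        return profile_table[key]
    if key in config:
        return config[key]
    return default


config = {
    "model_reasoning_effort": "medium",
    "profiles": {
        "fast": {"model_reasoning_effort": "minimal"},
        "deep": {"model_reasoning_effort": "high"},
    },
}
key = "model_reasoning_effort"
print(resolve_setting(key, {}, config))                        # medium
print(resolve_setting(key, {}, config, profile="deep"))        # high
print(resolve_setting(key, {key: "low"}, config, profile="deep"))  # low
```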

> Note: exact supported values for `model_reasoning_effort` can vary by Codex version/model. This system will attempt your chosen value and still works even if you only use `low/medium/high`.


## 5) Install / configure Gemini CLI

### Install

```bash
npm install -g @google/gemini-cli
```

### Set API key

```bash
export GEMINI_API_KEY="..."
```

The Gemini CLI codelab authenticates with the `GEMINI_API_KEY` environment variable.

### Config file location

* User settings: `~/.gemini/settings.json`
* Project settings: `.gemini/settings.json` (in repo)

You can optionally define model aliases (helpful for “fast vs deep” reviews). Gemini CLI supports custom aliases under `modelConfigs.customAliases`.

Example `~/.gemini/settings.json` snippet (optional):

```json
{
  "modelConfigs": {
    "customAliases": {
      "review-fast": {
        "modelConfig": { "model": "gemini-2.5-flash" }
      },
      "review-deep": {
        "modelConfig": { "model": "gemini-2.5-pro" }
      }
    }
  }
}
```

### Headless mode flags

Gemini CLI supports non-interactive prompting (`-p`) and output formatting such as `--output-format json`.


## 6) Add the implementation files

Create this folder structure:

```bash
mkdir -p codex_multireason_mcp
touch codex_multireason_mcp/__init__.py
```


### 6.1 `codex_multireason_mcp/policy.py`

```python
from __future__ import annotations

from dataclasses import dataclass
import re
from typing import Literal

CodexEffort = Literal["minimal", "low", "medium", "high", "xhigh"]
OpenAIEffort = Literal["minimal", "low", "medium", "high", "xhigh"]
Verbosity = Literal["low", "medium", "high"]
Sandbox = Literal["read-only", "workspace-write"]
TaskType = Literal["code", "review", "research", "doc", "other"]


_LEVEL_ORDER: list[str] = ["minimal", "low", "medium", "high", "xhigh"]
_LEVEL_RANK = {lvl: i for i, lvl in enumerate(_LEVEL_ORDER)}


def _max_level(a: str, b: str) -> str:
    return a if _LEVEL_RANK[a] >= _LEVEL_RANK[b] else b


@dataclass(frozen=True)
class RouteDecision:
    task_type: TaskType
    codex_effort: CodexEffort
    openai_effort: OpenAIEffort
    verbosity: Verbosity
    sandbox: Sandbox
    use_gemini: bool
    rationale: str


# Patterns are matched against the lowercased task text, so they are written in lowercase.
# Optional suffixes (s?, \w*) let plurals and inflected forms such as "tests" or "encryption" match.
_RISK_KEYWORDS = {
    # high-risk / irreversible / security
    r"\bsecurity\b": 4,
    r"\bauth\b|\boauth\b|\bjwt\b": 3,
    r"\bcredentials?\b|\bsecrets?\b|\btokens?\b|\bapi keys?\b": 4,
    r"\bencrypt\w*\b|\bcrypto\w*\b": 3,
    r"\bmigrations?\b|\bdatabases?\b|\bschemas?\b": 4,
    r"\bdelete\b|\bdrop\b|\btruncate\b": 4,
    r"\bproduction\b|\bprod\b": 3,
    r"\bcompliance\b|\bpolicy\b|\bpii\b|\bhipaa\b|\bsox\b": 4,
}

_COMPLEXITY_KEYWORDS = {
    r"\brefactor\w*\b|\bre-architect\b|\barchitecture\b": 4,
    r"\bmonorepo\b|\bmicroservices?\b": 3,
    r"\bperformance\b|\boptimi[sz]e\b": 2,
    r"\btests?\b|\bunit tests?\b|\bintegration\b": 2,
    r"\bci\b|\bcd\b|\bpipelines?\b": 3,
    r"\bcontainers?\b|\bdocker\b|\bkubernetes\b": 3,
    r"\btypescript\b|\brust\b|\bgo\b|\bpython\b": 1,
    r"\bmultiple files\b|\bmulti-file\b": 2,
}


def recommend(task: str, prefer: str = "auto") -> RouteDecision:
    """
    Deterministic router. This is the core of "agents automatically choose best reasoning level".
    prefer: auto|fast|deep
    """
    t = task.strip()
    t_lower = t.lower()

    # Task type heuristics
    if any(k in t_lower for k in ["diff", "review", "code review", "pr review"]):
        task_type: TaskType = "review"
    elif any(k in t_lower for k in ["summarize", "rewrite", "doc", "markdown", "readme"]):
        task_type = "doc"
    elif any(k in t_lower for k in ["research", "compare", "options", "evaluate", "trade-off"]):
        task_type = "research"
    elif any(k in t_lower for k in ["bug", "fix", "implement", "refactor", "feature", "tests", "build"]):
        task_type = "code"
    else:
        task_type = "other"

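    # The text is lowercased, so re.IGNORECASE keeps mixed-case patterns working against it.
    risk = 0
    for pat, w in _RISK_KEYWORDS.items():
        if re.search(pat, t_lower, re.IGNORECASE):
            risk += w

    complexity = 0
    for pat, w in _COMPLEXITY_KEYWORDS.items():
        if re.search(pat, t_lower, re.IGNORECASE):
            complexity += w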

    # Length adds mild complexity
    if len(t) > 800:
        complexity += 2
    elif len(t) > 250:
        complexity += 1

    # Map scores to levels
    if risk >= 10:
        base = "xhigh"
    elif risk >= 6 or complexity >= 8:
        base = "high"
    elif risk >= 3 or complexity >= 5:
        base = "medium"
    elif complexity >= 2:
        base = "low"
    else:
        base = "minimal"

    # User preference override
    if prefer == "fast":
        base = "minimal"
    elif prefer == "deep":
        base = "high"

    # Sandboxing policy
    sandbox: Sandbox = "workspace-write" if task_type in ("code",) else "read-only"

    # Gemini usage: always for review/research tasks, otherwise only at medium effort or above
    use_gemini = (task_type in ("review", "research")) or (base in ("medium", "high", "xhigh"))

    # Planner effort: same level as Codex, except that xhigh is capped at high for speed
    openai_effort: OpenAIEffort = base
    if base == "xhigh":
        openai_effort = "high"

    verbosity: Verbosity = "low" if base in ("minimal", "low") else "medium"

    rationale = f"type={task_type}, risk={risk}, complexity={complexity}, prefer={prefer}, chosen={base}"

    return RouteDecision(
        task_type=task_type,
        codex_effort=base,          # passed to Codex via MCP `config.model_reasoning_effort`
        openai_effort=openai_effort,
        verbosity=verbosity,
        sandbox=sandbox,
        use_gemini=use_gemini,
        rationale=rationale,
    )
```
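
One thing to watch in the task-type heuristics: they use substring checks, so `"doc"` also matches "docker" and `"diff"` matches "different". If that misroutes your tasks, a word-boundary variant could look like the sketch below. The names `_TYPE_PATTERNS` and `classify` are hypothetical, and the keyword lists are a trimmed copy of the ones in `recommend`.

```python
import re

# Ordered (task_type, pattern) pairs; the first match wins, as in `recommend`.
_TYPE_PATTERNS = [
    ("review", r"\b(diff|review)\b"),
    ("doc", r"\b(summarize|rewrite|docs?|markdown|readme)\b"),
    ("research", r"\b(research|compare|options|evaluate|trade-off)\b"),
    ("code", r"\b(bug|fix|implement|refactor|feature|tests|build)\b"),
]


def classify(task: str) -> str:
    """Return the first task type whose pattern matches a whole word."""
    lowered = task.lower()
    for task_type, pattern in _TYPE_PATTERNS:
        if re.search(pattern, lowered):
            return task_type
    return "other"


print(classify("Fix the docker build"))      # code
print(classify("Update the README"))         # doc
print(classify("Try a different approach"))  # other
```

With the substring version, the first example would be classified as `doc` and therefore run in the `read-only` sandbox.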


### 6.2 `codex_multireason_mcp/gemini_cli.py`

```python
from __future__ import annotations

import json
import os
import subprocess
from typing import Any, Optional


class GeminiCLIError(RuntimeError):
    pass


def run_gemini_headless(
    prompt: str,
    model: str = "review-fast",
    timeout_s: int = 120,
) -> dict[str, Any]:
    """
    Calls Gemini CLI in headless mode.
    - Gemini supports non-interactive prompting via -p. 
    - Gemini CLI supports --output-format json. 
    """
    env = os.environ.copy()

    cmd = [
        "gemini",
        "--model",
        model,
        "--prompt",
        prompt,
        "--output-format",
        "json",
    ]

    try:
        p = subprocess.run(
            cmd,
            input="",
            capture_output=True,
            text=True,
            env=env,
            timeout=timeout_s,
            check=False,
        )
    except FileNotFoundError as e:
        raise GeminiCLIError(
            "gemini CLI not found on PATH. Install it (npm install -g @google/gemini-cli)."
        ) from e
    except subprocess.TimeoutExpired as e:
        raise GeminiCLIError(f"gemini CLI timed out after {timeout_s}s") from e

    if p.returncode != 0:
        raise GeminiCLIError(
            f"gemini CLI failed (exit={p.returncode}). stderr:\n{p.stderr.strip()}"
        )

    raw = p.stdout.strip()

    # Best-effort JSON parse (format can vary by version)
    try:
        data = json.loads(raw)
        return {"raw_json": data, "text": _extract_text(data) or raw}
    except Exception:
        return {"raw_json": None, "text": raw}


def _extract_text(obj: Any) -> Optional[str]:
    """
    Try common shapes. Keep permissive since CLI schemas evolve.
    """
    if isinstance(obj, dict):
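        # "response" is included on the assumption that recent Gemini CLI versions
        # return the answer under that key in --output-format json mode.
        for k in ("response", "text", "output_text", "message"):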
            v = obj.get(k)
            if isinstance(v, str) and v.strip():
                return v
        # candidates[0].content.parts[0].text style
        cands = obj.get("candidates")
        if isinstance(cands, list) and cands:
            c0 = cands[0]
            if isinstance(c0, dict):
                content = c0.get("content")
                if isinstance(content, dict):
                    parts = content.get("parts")
                    if isinstance(parts, list) and parts:
                        p0 = parts[0]
                        if isinstance(p0, dict) and isinstance(p0.get("text"), str):
                            return p0["text"]
    return None
```


### 6.3 `codex_multireason_mcp/workflow.py`

This is where GPT‑5.2 prompting guidance + Codex MCP calls happen.

```python
from __future__ import annotations

import asyncio
import os
import subprocess
from dataclasses import asdict
from typing import Any

from dotenv import load_dotenv
from agents import Agent, ModelSettings, Runner, set_default_openai_api
from agents.mcp import MCPServerStdio
from openai.types.shared import Reasoning

from .policy import RouteDecision
from .gemini_cli import run_gemini_headless


# GPT‑5.2 prompting guidance emphasizes explicit verbosity clamps, scope discipline,
# and crisp tool descriptions. 
OUTPUT_VERBOSITY_SPEC = """
<output_verbosity_spec>
- Default: 3–6 sentences or ≤5 bullets.
- For multi-step tasks: 1 short overview paragraph, then ≤5 bullets tagged:
  What changed, Where, Risks, Next steps, Open questions.
- Avoid long narrative paragraphs; prefer compact bullets and short sections.
- Do not rephrase the user’s request unless it changes semantics.
</output_verbosity_spec>
""".strip()

SCOPE_DISCIPLINE = """
<design_and_scope_constraints>
- Implement EXACTLY and ONLY what the user requests.
- No extra features, no embellishments, no new dependencies unless asked.
- If something is ambiguous, choose the simplest valid interpretation.
</design_and_scope_constraints>
""".strip()

TOOL_RULES = """
<tool_usage_rules>
- Use tools when you need repo-specific truth (file contents, diffs, tests).
- Parallelize independent reads when possible.
- After any write/update, restate: What changed, Where, and validation performed.
</tool_usage_rules>
""".strip()

UNCERTAINTY = """
<uncertainty_and_ambiguity>
- If ambiguous, present 2 plausible interpretations with labeled assumptions.
- Never invent exact details (paths, outputs) you didn't verify.
</uncertainty_and_ambiguity>
""".strip()


def _git_available(workspace: str) -> bool:
    try:
        p = subprocess.run(
            ["git", "rev-parse", "--is-inside-work-tree"],
            cwd=workspace,
            capture_output=True,
            text=True,
            check=False,
        )
        return p.returncode == 0 and p.stdout.strip() == "true"
    except FileNotFoundError:
        return False


def _git_diff_bundle(workspace: str, max_chars: int = 120_000) -> dict[str, Any]:
    if not _git_available(workspace):
        return {"is_git_repo": False, "stat": None, "name_only": None, "diff": None}

    stat = subprocess.run(
        ["git", "diff", "--stat"],
        cwd=workspace,
        capture_output=True,
        text=True,
        check=False,
    ).stdout

    name_only = subprocess.run(
        ["git", "diff", "--name-only"],
        cwd=workspace,
        capture_output=True,
        text=True,
        check=False,
    ).stdout

    diff = subprocess.run(
        ["git", "diff"],
        cwd=workspace,
        capture_output=True,
        text=True,
        check=False,
    ).stdout

    if len(diff) > max_chars:
        diff = diff[:max_chars] + "\n\n…(diff truncated)…\n"

    return {
        "is_git_repo": True,
        "stat": stat.strip() or None,
        "name_only": name_only.strip() or None,
        "diff": diff.strip() or None,
    }


async def run_workflow(task: str, workspace: str, decision: RouteDecision) -> dict[str, Any]:
    """
    Orchestrates:
    1) GPT‑5.2 planner (short plan + acceptance criteria)
    2) GPT‑5.2 implementer agent that calls Codex MCP with chosen reasoning effort
    3) Optional Gemini CLI review
    4) GPT‑5.2 summarizer
    """
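    load_dotenv(override=True)
    # The Agents SDK picks up OPENAI_API_KEY from the environment once .env is loaded.
    # `set_default_openai_api` selects the API flavor ("responses" or "chat_completions");
    # it does not take an API key.
    set_default_openai_api("responses")
    # Resolve to an absolute path so Codex and git operate on the same directory.
    workspace = os.path.abspath(workspace)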

    planner = Agent(
        name="Planner",
        model="gpt-5.2",
        model_settings=ModelSettings(
            reasoning=Reasoning(effort=decision.openai_effort),
            verbosity=decision.verbosity,
        ),
        instructions="\n\n".join([
            "You turn user requests into an execution plan with crisp constraints.",
            OUTPUT_VERBOSITY_SPEC,
            SCOPE_DISCIPLINE,
            TOOL_RULES,
            UNCERTAINTY,
            "Output format:\n"
            "- 1 short overview paragraph\n"
            "- then bullets:\n"
            "  - Goals\n"
            "  - Non-goals\n"
            "  - Acceptance criteria\n"
            "  - Risks\n"
            "  - Minimal validation steps\n",
        ]),
    )

    # Start Codex CLI as an MCP server, as shown in the Agents SDK guide. 
    async with MCPServerStdio(
        name="Codex CLI",
        params={"command": "npx", "args": ["-y", "codex", "mcp-server"]},
        client_session_timeout_seconds=360000,
    ) as codex_mcp_server:

        implementer = Agent(
            name="Implementer",
            model="gpt-5.2",
            model_settings=ModelSettings(
                reasoning=Reasoning(effort=decision.openai_effort),
                verbosity="low",
            ),
            # Attaches Codex MCP tools to this agent. 
            mcp_servers=[codex_mcp_server],
            instructions="\n\n".join([
                "You implement the plan by calling the Codex MCP tool.",
                "Rules:",
                "- You MUST call the `codex` tool at least once for implementation work.",
                "- Use sandbox strictly as instructed.",
                "- Do not add extra features or dependencies.",
                OUTPUT_VERBOSITY_SPEC,
                SCOPE_DISCIPLINE,
                TOOL_RULES,
                UNCERTAINTY,
                "",
                "When calling Codex MCP, ALWAYS use JSON with these fields (Codex MCP supports them): "
                "`prompt`, `approval-policy`, `sandbox`, and optional `config` overrides. ",
                "",
                "Use this exact parameter shape:",
                f"""
{{
  "prompt": "...",
  "approval-policy": "never",
  "sandbox": "{decision.sandbox}",
  "config": {{
    "model_reasoning_effort": "{decision.codex_effort}"
  }},
  "cwd": "{workspace}"
}}
""".strip(),
                "",
                "After Codex completes, respond with ONLY:",
                "- 1 line: 'Codex run complete.'",
            ]),
        )

        summarizer = Agent(
            name="Summarizer",
            model="gpt-5.2",
            model_settings=ModelSettings(
                reasoning=Reasoning(effort="low"),
                verbosity="medium",
            ),
            instructions="\n\n".join([
                "You produce the final user-facing summary.",
                OUTPUT_VERBOSITY_SPEC,
                "If you see potential risk or missing validation, call it out in 'Risks' and 'Next steps'.",
            ]),
        )

        plan_res = await Runner.run(planner, task)
        plan_text = plan_res.final_output

        impl_input = (
            "Task:\n"
            f"{task}\n\n"
            "Plan:\n"
            f"{plan_text}\n\n"
            "Now implement using Codex MCP tool.\n"
        )
        _ = await Runner.run(implementer, impl_input)

    # After Codex finishes, collect diff (outside Codex, deterministic)
    diff_bundle = _git_diff_bundle(workspace)

    gemini_review = None
    if decision.use_gemini and diff_bundle.get("diff"):
        gem_prompt = (
            "You are a meticulous code reviewer.\n"
            "Review this diff for correctness, edge cases, security, and missing tests.\n"
            "Return:\n"
            "- Findings (bullets)\n"
            "- Severity per finding (low/med/high)\n"
            "- Suggested fixes (bullets)\n\n"
            f"Task:\n{task}\n\n"
            f"Diff stat:\n{diff_bundle.get('stat')}\n\n"
            f"Diff:\n{diff_bundle.get('diff')}\n"
        )
        # Choose model alias depending on effort (optional)
        model_alias = "review-deep" if decision.codex_effort in ("high", "xhigh") else "review-fast"
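        # Run the blocking subprocess call in a worker thread so the event loop stays responsive.
        # A failed review is recorded instead of aborting the workflow after Codex has already run.
        try:
            gemini_review = await asyncio.to_thread(run_gemini_headless, gem_prompt, model=model_alias)
        except RuntimeError as exc:  # GeminiCLIError subclasses RuntimeError
            gemini_review = {"raw_json": None, "text": f"Gemini review failed: {exc}"}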

    summary_input = (
        "Summarize the completed work.\n\n"
        f"Routing decision: {decision.rationale}\n\n"
        f"Plan:\n{plan_text}\n\n"
        f"Git diff stat:\n{diff_bundle.get('stat')}\n\n"
        f"Changed files:\n{diff_bundle.get('name_only')}\n\n"
        f"Gemini review:\n{(gemini_review or {}).get('text')}\n"
    )

    summary_res = await Runner.run(summarizer, summary_input)

    return {
        "decision": asdict(decision),
        "plan": plan_text,
        "git": diff_bundle,
        "gemini_review": gemini_review,
        "final": summary_res.final_output,
    }
```


Key points this uses from docs:

* Codex MCP `codex` tool supports `config` overrides and `sandbox` modes
* Codex supports `model_reasoning_effort` in config.toml and via CLI `--config model_reasoning_effort="high"`
* Agents SDK runs Codex MCP via `MCPServerStdio(... npx -y codex mcp-server ...)`
* GPT‑5.2 prompting guidance: verbosity clamps, scope discipline, tool rules
* Agents SDK supports `ModelSettings(reasoning=Reasoning(effort=…), verbosity=…)`
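
The implementer prompt in `workflow.py` interpolates `workspace` into a JSON snippet with an f-string, which can produce invalid JSON if the path contains quotes or backslashes. A hedged alternative is to render the arguments with `json.dumps`. The helper name `codex_tool_args` is an assumption; the field names are the ones used in the prompt.

```python
import json
from pathlib import Path


def codex_tool_args(prompt: str, workspace: str, sandbox: str, effort: str) -> str:
    """Render the `codex` tool arguments as JSON (hypothetical helper)."""
    args = {
        "prompt": prompt,
        "approval-policy": "never",
        "sandbox": sandbox,
        "config": {"model_reasoning_effort": effort},
        # An absolute path avoids depending on the Codex subprocess's working directory.
        "cwd": str(Path(workspace).resolve()),
    }
    # json.dumps escapes quotes and backslashes that an f-string would pass through raw.
    return json.dumps(args, indent=2)


rendered = codex_tool_args('Fix the "loader" bug', ".", "workspace-write", "medium")
print(rendered)
```

The rendered string can be dropped into the instructions in place of the hand-written JSON template.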


### 6.4 `codex_multireason_mcp/server.py` (Your custom MCP server)

```python
from __future__ import annotations

import os
from fastmcp import FastMCP

from .policy import recommend
from .workflow import run_workflow
from .gemini_cli import run_gemini_headless


mcp = FastMCP("codex-multireason-mcp")


@mcp.tool()
def recommend_route(task: str, prefer: str = "auto") -> dict:
    """
    Tool: returns the chosen reasoning/sandbox policy without executing anything.
    """
    d = recommend(task, prefer=prefer)
    return {
        "task": task,
        "prefer": prefer,
        "decision": d.__dict__,
    }


@mcp.tool()
async def run(task: str, workspace: str = ".", prefer: str = "auto") -> dict:
    """
    Tool: runs the full workflow (plan -> codex -> optional gemini -> summary).
    """
    d = recommend(task, prefer=prefer)
    return await run_workflow(task=task, workspace=workspace, decision=d)


@mcp.tool()
def gemini(prompt: str, model: str = "review-fast") -> dict:
    """
    Tool: direct access to Gemini CLI headless, useful for ad-hoc second opinions.
    """
    return run_gemini_headless(prompt, model=model)


def main() -> None:
    # Serve MCP over STDIO (typical for desktop/CLI hosts). 
    mcp.run(transport="stdio")


if __name__ == "__main__":
    main()
```


### 6.5 `codex_multireason_mcp/cli.py` (local smoke test without any MCP host)

```python
from __future__ import annotations

import asyncio
import sys

from .policy import recommend
from .workflow import run_workflow


async def _main() -> None:
    if len(sys.argv) < 2:
        print('Usage: python -m codex_multireason_mcp.cli "your task here" [workspace] [prefer]')
        raise SystemExit(2)

    task = sys.argv[1]
    workspace = sys.argv[2] if len(sys.argv) >= 3 else "."
    prefer = sys.argv[3] if len(sys.argv) >= 4 else "auto"

    decision = recommend(task, prefer=prefer)
    result = await run_workflow(task, workspace, decision)
    print(result["final"])


if __name__ == "__main__":
    asyncio.run(_main())
```


## 7) Run it locally (smoke test)

Create `.env`:

```bash
cat > .env << 'EOF'
OPENAI_API_KEY=sk-...
# GEMINI_API_KEY=...  # only needed if you want Gemini reviews
EOF
```

Now run:

```bash
python -m codex_multireason_mcp.cli "Refactor the config loader to support profiles and add tests."
```

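You should see a short, structured final summary printed to the terminal. The CLI prints only the `final` field; when the same workflow runs via MCP, the returned object also carries the routing decision, the plan, the Gemini review, and (in a git repo) the diff bundle.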


## 8) Use it as an MCP server

### Option A: Connect from Codex CLI

Codex MCP server configuration lives in `~/.codex/config.toml`

Add:

```toml
[mcp_servers.codex_multireason]
command = "python"
args = ["-m", "codex_multireason_mcp.server"]

[mcp_servers.codex_multireason.env]
OPENAI_API_KEY = "sk-..."   # or omit and rely on env
GEMINI_API_KEY = "..."      # optional
```

Codex supports MCP server entries via `[mcp_servers.<name>]` with `command`, `args`, and optional env.

Then launch Codex and check `/mcp` in the TUI.

### Option B: Connect from Gemini CLI as a tool host

Gemini CLI supports `mcpServers` in `settings.json`, with configuration files at `~/.gemini/settings.json` or `.gemini/settings.json`.

Example `~/.gemini/settings.json`:

```json
{
  "mcpServers": {
    "codex_multireason": {
      "command": "python",
      "args": ["-m", "codex_multireason_mcp.server"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```

Now Gemini CLI can call your MCP tools when prompted (depending on its tool-routing behavior).
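
If you would rather script this than edit the file by hand, the following sketch merges one `mcpServers` entry into an existing settings file without touching other keys. It assumes the file is plain JSON without comments; `register_mcp_server` is a hypothetical helper, and the demo writes to a temporary directory rather than your real settings.

```python
import json
import tempfile
from pathlib import Path


def register_mcp_server(settings_path: Path, name: str, command: str, args: list[str]) -> dict:
    """Add or replace one `mcpServers` entry, preserving other settings."""
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    servers = settings.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": args}
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


# Demo against a throwaway file; point the path at ~/.gemini/settings.json for real use.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".gemini" / "settings.json"
    register_mcp_server(path, "codex_multireason", "python", ["-m", "codex_multireason_mcp.server"])
    print(path.read_text(encoding="utf-8"))
```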


## 9) How the “automatic reasoning” works (and how to tune it)

### Current policy behavior (deterministic)

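* No keyword hits (for example, a typo or formatting fix) ⇒ `minimal`
* Complexity score ≥ 2 ⇒ `low`; risk ≥ 3 or complexity ≥ 5 ⇒ `medium`
* Risk ≥ 6 or complexity ≥ 8 ⇒ `high`; risk ≥ 10 ⇒ `xhigh`
* For example, “refactor” alone scores complexity 4 (`low`), “migration” alone scores risk 4 (`medium`), and “security” plus “migration” scores risk 8 (`high`)
* `prefer="fast"` forces `minimal`; `prefer="deep"` forces `high`
* Code-changing tasks ⇒ `sandbox="workspace-write"`, otherwise `read-only`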

This is the knob Codex exposes as “Reasoning depth” via `model_reasoning_effort`.
And the Codex MCP tool lets us override config per session via `config`.
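
To see how scores translate into levels, the threshold step of `recommend` can be isolated as below. The function name `level_for` is an assumption; the thresholds are copied from `policy.py`.

```python
def level_for(risk: int, complexity: int) -> str:
    """Mirror the score-to-level thresholds used in `recommend`."""
    if risk >= 10:
        return "xhigh"
    if risk >= 6 or complexity >= 8:
        return "high"
    if risk >= 3 or complexity >= 5:
        return "medium"
    if complexity >= 2:
        return "low"
    return "minimal"


# "refactor" alone scores complexity 4.
print(level_for(risk=0, complexity=4))   # low
# "migration" alone scores risk 4.
print(level_for(risk=4, complexity=0))   # medium
# "security" + "migration" scores risk 8.
print(level_for(risk=8, complexity=0))   # high
# "security" + "migration" + "production" scores risk 11.
print(level_for(risk=11, complexity=0))  # xhigh
```

Risk dominates: a single risk keyword with weight 3 or more already lifts a task to `medium`, while complexity alone needs a score of 5.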

### GPT‑5.2 prompt shaping used here

We embed the guide’s best practices:

* Explicit verbosity clamps
* Scope drift prevention
* Tool usage rules

### Quick customization points

* Edit `policy.py` keyword weights
* Add an explicit “prefer=fast|deep” argument from your MCP host
* Modify `workflow.py` to:

  * always run Gemini on `high+`
  * use “review-deep” alias whenever diff > N lines


## 10) Notes & operational caveats

* **Gemini headless mode** (`-p`) can’t authorize interactive tools or run shell commands. That’s fine here because we only use it for text review.
* Codex MCP tool has `approval-policy` values like `untrusted`, `on-failure`, `never` and `sandbox` modes like `read-only` / `workspace-write` / `danger-full-access`.
  This implementation uses `approval-policy="never"` to avoid blocking automation, and uses sandboxing as the safety boundary.
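
Because sandboxing is the only safety boundary here, it can help to validate these two values before any Codex call. The guard below is a sketch: the allowed sets contain only the values named above (the real set may differ by Codex version), and `check_run_policy` is a hypothetical name.

```python
# Values named in the text above; the actual set may differ by Codex version.
APPROVAL_POLICIES = {"untrusted", "on-failure", "never"}
SANDBOX_MODES = {"read-only", "workspace-write", "danger-full-access"}


def check_run_policy(approval_policy: str, sandbox: str, allow_full_access: bool = False) -> None:
    """Raise ValueError for unknown values or unapproved full access (hypothetical guard)."""
    if approval_policy not in APPROVAL_POLICIES:
        raise ValueError(f"unknown approval-policy: {approval_policy!r}")
    if sandbox not in SANDBOX_MODES:
        raise ValueError(f"unknown sandbox: {sandbox!r}")
    if sandbox == "danger-full-access" and not allow_full_access:
        raise ValueError("danger-full-access requires an explicit override")


check_run_policy("never", "workspace-write")  # passes silently
try:
    check_run_policy("never", "danger-full-access")
except ValueError as exc:
    print(f"blocked: {exc}")
```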


## 11) What to do next (recommended upgrades)

If you want this to feel “production grade”:

1. **Keep Codex MCP server warm**
   Right now, each `run()` call starts a new Codex MCP subprocess. You can optimize by managing a long-lived Codex MCP server instance inside FastMCP (single process, reused per call).

2. **Progressive deepening**

   * Start with `low`
   * If tests fail or diff touches sensitive files, re-run with `high`

3. **Add a “guardrail tool”**

   * Block risky commands
   * Require explicit user override for `danger-full-access`
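
The progressive-deepening idea in item 2 can be sketched as a small retry loop. Here `attempt` is a hypothetical callback that runs the workflow at a given level and returns `True` when validation (for example, the test suite) passes; nothing in this block calls Codex.

```python
from typing import Callable, Sequence


def run_with_deepening(
    attempt: Callable[[str], bool],
    levels: Sequence[str] = ("low", "high"),
) -> tuple[str, bool]:
    """Try each effort level in order and stop at the first that succeeds."""
    if not levels:
        raise ValueError("levels must not be empty")
    ok = False
    for level in levels:
        ok = attempt(level)
        if ok:
            break
    # Returns the last level tried and whether it succeeded.
    return level, ok


# Stand-in for a real run: pretend only the "high" pass makes the tests green.
calls: list[str] = []


def fake_attempt(level: str) -> bool:
    calls.append(level)
    return level == "high"


print(run_with_deepening(fake_attempt))  # ('high', True)
print(calls)                             # ['low', 'high']
```

In a real integration, `attempt` would wrap `run_workflow` with a `RouteDecision` built for the given level and then run your validation step.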

If you tell me your typical task categories (e.g., “mostly refactors + CI”, “mostly docs + small fixes”, “often security reviews”), I can give you a tuned `policy.py` profile and stronger, role-specific prompt blocks (e.g., a dedicated “Security Reviewer” agent that always forces `sandbox="read-only"` and `high` effort).
