Merged
10 changes: 5 additions & 5 deletions docs/mcp/autonomous_agent_guide.md
@@ -123,11 +123,11 @@ An advanced pattern: use PlanExe to plan the agent's own work.
4. Agent executes the plan step by step, tracking progress against the WBS

Key files in the zip for agent consumption:
- `018-2-wbs_level1.json` — High-level work packages
- `018-5-wbs_level2.json` — Detailed tasks within each package
- `023-2-wbs_level3.json` — Sub-tasks with effort estimates
- `004-2-pre_project_assessment.json` — Feasibility assessment
- `003-6-distill_assumptions_raw.json` — Key assumptions to validate
- `wbs_level1.json` — High-level work packages
- `wbs_level2.json` — Detailed tasks within each package
- `wbs_level3.json` — Sub-tasks with effort estimates
- `pre_project_assessment.json` — Feasibility assessment
- `distill_assumptions_raw.json` — Key assumptions to validate
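
An agent that has to read zips produced both before and after this rename could look files up by their unprefixed name. The sketch below assumes the old prefix always has the shape `NNN-` or `NNN-N-`, as in the filenames shown in this diff; `strip_numeric_prefix` and `find_artifact` are hypothetical helpers, not part of PlanExe.

```python
import re

# Assumption: legacy names start with three digits, an optional "-<digits>", then "-".
_PREFIX = re.compile(r"^\d{3}(?:-\d+)?-")


def strip_numeric_prefix(filename: str) -> str:
    """Return the filename without a leading numeric step prefix, if it has one."""
    return _PREFIX.sub("", filename, count=1)


def find_artifact(names, wanted):
    """Return the first archive member whose unprefixed basename equals `wanted`."""
    for name in names:
        basename = name.rsplit("/", 1)[-1]
        if strip_numeric_prefix(basename) == wanted:
            return name
    return None
```

With a real archive, `names` would typically come from `zipfile.ZipFile(path).namelist()`.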

## Prompt writing tips for agents

4 changes: 2 additions & 2 deletions docs/mcp/mcp_details.md
@@ -223,7 +223,7 @@ curl -H "X-API-Key: pex_0123456789abcdef" -O "https://mcp.planexe.org/download/2

Download report:
```bash
curl -H "X-API-Key: pex_0123456789abcdef" -O "https://mcp.planexe.org/download/2d57a448-1b09-45aa-ad37-e69891ff6ec7/030-report.html"
curl -H "X-API-Key: pex_0123456789abcdef" -O "https://mcp.planexe.org/download/2d57a448-1b09-45aa-ad37-e69891ff6ec7/report.html"
```
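
For agents that call the download endpoint from Python rather than `curl`, the same request could be built roughly as follows. This is a sketch that only mirrors the URL layout and `X-API-Key` header of the `curl` example; `build_download_request` is a hypothetical helper, and nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
from urllib.request import Request


def build_download_request(base_url: str, plan_id: str, filename: str, api_key: str) -> Request:
    """Build (but do not send) a GET request for one downloadable plan file."""
    url = f"{base_url.rstrip('/')}/download/{plan_id}/{filename}"
    return Request(url, headers={"X-API-Key": api_key})


request = build_download_request(
    "https://mcp.planexe.org",
    "2d57a448-1b09-45aa-ad37-e69891ff6ec7",
    "report.html",
    "pex_0123456789abcdef",
)
```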

## Tool Catalog, `mcp_local`
@@ -248,7 +248,7 @@ Example call:
- Save directory is `PLANEXE_PATH`, or current working directory if unset.
- Non-existing directories are created automatically.
- If `PLANEXE_PATH` points to a file, download fails.
- Filename is prefixed with plan id (for example `<plan_id>-030-report.html`).
- Filename is prefixed with plan id (for example `<plan_id>-report.html`).
- Response includes `saved_path` with the exact local file location.

## Minimal error-handling contract
4 changes: 2 additions & 2 deletions docs/mcp/planexe_mcp_interface.md
@@ -522,7 +522,7 @@ Use `plan_resume` when `plan_status` shows `failed` or `stopped` and plan genera

**Required semantics**

- The MCP tool only accepts plans in `failed` state. However, the underlying Luigi mechanism is more general: Luigi skips any task whose output file already exists and re-executes any task whose output file is missing. This means a completed plan can be partially re-run by deleting `999-pipeline_complete.txt` and the output files of the tasks you want to regenerate — Luigi will re-execute those tasks and all their downstream dependents. The MCP API does not yet expose this capability; it is available when running the pipeline locally via `run_plan_pipeline.py`.
- The MCP tool only accepts plans in `failed` state. However, the underlying Luigi mechanism is more general: Luigi skips any task whose output file already exists and re-executes any task whose output file is missing. This means a completed plan can be partially re-run by deleting `pipeline_complete.txt` and the output files of the tasks you want to regenerate — Luigi will re-execute those tasks and all their downstream dependents. The MCP API does not yet expose this capability; it is available when running the pipeline locally via `run_plan_pipeline.py`.
- On success, the same plan_id is reset to `pending` and requeued.
- Prior artifacts are **preserved** — the worker restores the output directory from the stored zip snapshot.
- `resume_count` tracks how many times the plan has been resumed.
@@ -577,7 +577,7 @@ Bump `PIPELINE_VERSION` whenever the pipeline changes in a way that would break
- Save directory is `PLANEXE_PATH`.
- If `PLANEXE_PATH` is unset, save to current working directory.
- If `PLANEXE_PATH` points to a file (not a directory), return an error.
- Filenames are `<plan_id>-030-report.html` or `<plan_id>-run.zip`.
- Filenames are `<plan_id>-report.html` or `<plan_id>-run.zip`.
- If a filename already exists, append `-1`, `-2`, ... before extension.
- Successful responses include `saved_path`.
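
The save-path rules listed above can be sketched as a single function. This is an illustration of the stated rules only, not the actual `mcp_local` code; `resolve_save_path` and its `env` parameter are hypothetical names introduced here.

```python
import os
from pathlib import Path


def resolve_save_path(plan_id: str, artifact_name: str, env=None) -> Path:
    """Pick a non-colliding local path for a downloaded artifact."""
    env = os.environ if env is None else env
    # Rule: use PLANEXE_PATH, or the current working directory if unset.
    base = Path(env.get("PLANEXE_PATH") or Path.cwd())
    # Rule: PLANEXE_PATH pointing at a file is an error.
    if base.is_file():
        raise NotADirectoryError(f"PLANEXE_PATH points to a file: {base}")
    base.mkdir(parents=True, exist_ok=True)
    stem, suffix = Path(artifact_name).stem, Path(artifact_name).suffix
    candidate = base / f"{plan_id}-{stem}{suffix}"
    counter = 1
    # Rule: on collision append -1, -2, ... before the extension.
    while candidate.exists():
        candidate = base / f"{plan_id}-{stem}-{counter}{suffix}"
        counter += 1
    return candidate
```

The returned path is what a response would report as `saved_path` once the file has been written.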

4 changes: 2 additions & 2 deletions docs/proposals/06-adopt-on-the-fly.md
@@ -6,9 +6,9 @@ This is a concrete implementation plan for making PlanExe's agent behavior adapt

PlanExe already has multiple "early classification" concepts and quality gates that we can build on:

- **Purpose classification (business/personal/other)**: `worker_plan/worker_plan_internal/assume/identify_purpose.py` produces `002-6-identify_purpose.md` and is already used downstream (e.g., SWOT prompt selection).
- **Purpose classification (business/personal/other)**: `worker_plan/worker_plan_internal/assume/identify_purpose.py` produces `identify_purpose.md` and is already used downstream (e.g., SWOT prompt selection).

- **Plan type classification (digital/physical)**: `worker_plan/worker_plan_internal/assume/identify_plan_type.py` produces `002-8-plan_type.md`. Note: it intentionally labels most software development as "physical" (because it assumes a physical workspace/devices).
- **Plan type classification (digital/physical)**: `worker_plan/worker_plan_internal/assume/identify_plan_type.py` produces `plan_type.md`. Note: it intentionally labels most software development as "physical" (because it assumes a physical workspace/devices).

- **Levers pipeline**: `worker_plan/worker_plan_internal/lever/*` produces potential levers -> deduped -> enriched -> "vital few" -> scenarios/strategic decisions.

8 changes: 4 additions & 4 deletions docs/proposals/101-luigi-resume-enhancements.md
@@ -85,9 +85,9 @@ Behavior:
```
$ planexe invalidate SelectScenarioTask --run-dir ./run/Qwen_Clean_v1
Would delete:
run/Qwen_Clean_v1/002-17-selected_scenario_raw.json
run/Qwen_Clean_v1/002-18-selected_scenario.json
run/Qwen_Clean_v1/002-19-scenarios.md
run/Qwen_Clean_v1/selected_scenario_raw.json
run/Qwen_Clean_v1/selected_scenario.json
run/Qwen_Clean_v1/scenarios.md
Proceed? [y/N]
```
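
The dry-run listing in the transcript above could be produced by something like the sketch below. The task-to-files mapping is hard-coded here purely for illustration and is an assumption; a real command would presumably derive it from each task's declared outputs. `TASK_OUTPUTS` and `files_to_invalidate` are hypothetical names.

```python
from pathlib import Path

# Hypothetical mapping, copied from the example transcript for illustration.
TASK_OUTPUTS = {
    "SelectScenarioTask": [
        "selected_scenario_raw.json",
        "selected_scenario.json",
        "scenarios.md",
    ],
}


def files_to_invalidate(task_name: str, run_dir: Path, task_outputs=TASK_OUTPUTS) -> list:
    """List the existing output files of a task, without deleting anything."""
    return [run_dir / name for name in task_outputs[task_name] if (run_dir / name).exists()]
```

Keeping the listing separate from the deletion makes the `Proceed? [y/N]` confirmation step straightforward to implement.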

@@ -101,7 +101,7 @@ Tonight we needed to re-run `SelectScenarioTask` after applying a fix. Without k

### The problem

The input plan (`001-2-plan.txt`) is locked in at run start. If a user wants to refine the plan description mid-run — clarify scope, correct a factual error, tighten the framing — there is no supported path. The only option is to start a new run from scratch.
The input plan (`plan.txt`) is locked in at run start. If a user wants to refine the plan description mid-run — clarify scope, correct a factual error, tighten the framing — there is no supported path. The only option is to start a new run from scratch.

### What we want

6 changes: 3 additions & 3 deletions docs/proposals/107-domain-aware-normalizer.md
@@ -401,9 +401,9 @@ MakeAssumptions → [QuantifiedAssumptionExtractor] → [FermiSanityCheck] → [

The three new tasks (in brackets) are inserted between the existing MakeAssumptions and DistillAssumptions tasks. Each produces output files following PlanExe's standard naming convention:

- `003-12-fermi_sanity_check_report.json` — detailed per-assumption verdicts
- `003-13-fermi_sanity_check_summary.md` — human-readable summary of findings
- `003-14-normalized_assumptions.json` — all assumptions in standard representation
- `fermi_sanity_check_report.json` — detailed per-assumption verdicts
- `fermi_sanity_check_summary.md` — human-readable summary of findings
- `normalized_assumptions.json` — all assumptions in standard representation

The FermiSanityCheck report includes a section on ethical flags, making it visible to both the downstream pipeline tasks and human reviewers.

4 changes: 2 additions & 2 deletions docs/proposals/112-end-to-end-test-plan.md
@@ -25,7 +25,7 @@ These tests exercise the MCP server, database, and worker interactions without i

**Variant — worker-side check:**
1. Bypass the MCP-layer check (e.g. manually set `parameters["pipeline_version"]` to match current).
2. But ensure the `001-3-planexe_metadata.json` in the zip snapshot has a different version.
2. But ensure the `planexe_metadata.json` in the zip snapshot has a different version.
3. Let the worker pick up the resumed plan.
4. Assert: worker sets plan to failed with progress_message containing "Not resumable".

@@ -87,7 +87,7 @@ These tests invoke real LLMs and are non-deterministic, slow (~10-20 min per pla
4. Call `plan_file_info` with `artifact: "report"` — assert `download_url` is present.
5. Call `plan_file_info` with `artifact: "zip"` — assert `download_url` is present.
6. Download the report and verify it is valid HTML containing expected sections.
7. Download the zip and verify `001-3-planexe_metadata.json` is present with correct `pipeline_version`.
7. Download the zip and verify `planexe_metadata.json` is present with correct `pipeline_version`.
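
The final check in the list above might look like the following in a test helper. It assumes `planexe_metadata.json` sits at the root of the zip and stores the version under a top-level `pipeline_version` key; both are assumptions, and `metadata_pipeline_version` is a hypothetical helper.

```python
import json
import zipfile


def metadata_pipeline_version(zip_file, member: str = "planexe_metadata.json"):
    """Read pipeline_version from the metadata file inside a plan zip.

    Raises KeyError if the metadata file is not present in the archive.
    """
    with zipfile.ZipFile(zip_file) as archive:
        with archive.open(member) as handle:
            return json.load(handle).get("pipeline_version")
```

A test would then compare the returned value against the expected `PIPELINE_VERSION`.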

### 7. Resume after mid-generation failure

6 changes: 3 additions & 3 deletions docs/proposals/114-mcp-interface-feedback-stress-test.md
@@ -68,7 +68,7 @@ During the stress test, Plan 1 (20f1cfac) stalled at 5.5% with zero diagnostic i
"state": "failed",
"error": {
"failure_reason": "generation_error",
"failed_step": "016-expert_criticism",
"failed_step": "expert_criticism",
"message": "LLM provider returned 503",
"recoverable": true
}
@@ -248,7 +248,7 @@ This is a trust gap: the agent cannot confidently tell the user "your plan is re
"sections_complete": 108,
"sections_partial": 2,
"partial_details": [
{"step": "016-expert_criticism", "note": "2/8 experts provided feedback"}
{"step": "expert_criticism", "note": "2/8 experts provided feedback"}
]
}
```
@@ -507,7 +507,7 @@ No stale error information leaked between states.

### Files list ordering fix

The files list in `plan_status` now shows the most recent 10 files instead of the first 10. When the plan completed, the agent saw `029-2-self_audit.md`, `030-report.html`, `999-pipeline_complete.txt` etc. instead of the same early pipeline files every time. Much more useful for monitoring progress.
The files list in `plan_status` now shows the most recent 10 files instead of the first 10. When the plan completed, the agent saw `self_audit.md`, `report.html`, `pipeline_complete.txt` etc. instead of the same early pipeline files every time. Much more useful for monitoring progress.

### Agent-server capability mismatch (systemic observation)

12 changes: 6 additions & 6 deletions docs/proposals/117-system-prompt-optimizer.md
@@ -303,10 +303,10 @@ populate_baseline.py # script to populate baseline from zip files
baseline/ # current outputs (extracted from dataset zips)
train/
20260310_hong_kong_game/
001-1-start_time.json
001-2-plan.txt
start_time.json
plan.txt
...
030-report.html
report.html
20250329_gta_game/
...
20250321_silo/
@@ -338,8 +338,8 @@ history/ # captured output, global run coun
outputs.jsonl
outputs/
20250321_silo/
002-9-potential_levers_raw.json
002-10-potential_levers.json
potential_levers_raw.json
potential_levers.json
activity_overview.json
usage_metrics.jsonl
20260310_hong_kong_game/
@@ -382,7 +382,7 @@ scores/ # longitudinal tracking
full_plan_comparisons/ # Stage 3 periodic full-plan regenerations
2026-03-20/
hong_kong_game/
030-report.html
report.html
kpi_comparison.json
```

26 changes: 13 additions & 13 deletions docs/proposals/133-dag-and-rca.md
@@ -81,7 +81,7 @@ Example:

{
"id": "executive_summary_markdown",
"path": "025-2-executive_summary.md",
"path": "executive_summary.md",
"format": "md",
"role": "summary_markdown"
}
@@ -132,7 +132,7 @@ A stronger format could allow fields like:

{
"from_node": "executive_summary",
"artifact_path": "025-2-executive_summary.md",
"artifact_path": "executive_summary.md",
"used_for": "decision-maker summary section"
}

@@ -143,7 +143,7 @@ How RCA can work with the current format
Goal

The goal of RCA is to answer questions like:
• Why is a false claim shown in 030-report.html?
• Why is a false claim shown in report.html?
• Which upstream artifact first contained it?
• Which node likely introduced it?
• Which source file should be inspected first?
@@ -153,7 +153,7 @@ Investigation strategy
Step 1: Start from the final artifact

Begin with the final output artifact, such as:
030-report.html
• report.html

Find the node that produces it.

@@ -210,15 +210,15 @@ Suppose the final report contains the false claim:
The project requires 12 full-time engineers.

A practical investigation would look like this:
1. search 030-report.html for the claim
1. search report.html for the claim
2. inspect the report node inputs
3. search 025-2-executive_summary.md
4. search 024-2-review_plan.md
5. search 013-team.md
6. if the claim appears in 013-team.md, inspect the team_markdown node
3. search executive_summary.md
4. search review_plan.md
5. search team.md
6. if the claim appears in team.md, inspect the team_markdown node
7. inspect that node’s inputs:
011-2-enrich_team_members_environment_info.json
012-review_team_raw.json
• enrich_team_members_environment_info.json
• review_team_raw.json
8. search those artifacts for the same claim or the numeric value
9. continue upstream until the earliest occurrence is found
10. inspect the producing node’s source_files
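
The upstream walk described in these steps can be sketched as a small graph search. The node table shape used here (`artifacts` as file names, `inputs` as upstream node ids) is an assumption made for the sketch and not the proposal's exact schema; `node_mentions` and `trace_claim` are hypothetical helpers, and plain substring matching stands in for whatever matching a real RCA tool would use.

```python
from pathlib import Path


def node_mentions(claim: str, artifacts: list, run_dir: Path) -> bool:
    """True if any existing artifact of the node contains the claim text."""
    for name in artifacts:
        path = run_dir / name
        if path.is_file() and claim in path.read_text(encoding="utf-8", errors="replace"):
            return True
    return False


def trace_claim(claim: str, start: str, nodes: dict, run_dir: Path) -> list:
    """Walk upstream from `start` and return the nodes that mention the claim
    while none of their inputs do (the likely points of introduction)."""
    origins, seen, stack = [], set(), [start]
    while stack:
        node_id = stack.pop()
        if node_id in seen:
            continue
        seen.add(node_id)
        node = nodes[node_id]
        if not node_mentions(claim, node["artifacts"], run_dir):
            continue
        carriers = [
            upstream
            for upstream in node["inputs"]
            if node_mentions(claim, nodes[upstream]["artifacts"], run_dir)
        ]
        if carriers:
            stack.extend(carriers)  # keep following the claim upstream
        else:
            origins.append(node_id)  # earliest node where the claim appears
    return origins
```

The returned node ids are where step 10 would start inspecting `source_files`.
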
@@ -255,7 +255,7 @@ Example:

{
"id": "review_plan_markdown",
"path": "024-2-review_plan.md",
"path": "review_plan.md",
"format": "md",
"role": "review_output"
}
@@ -266,7 +266,7 @@ Example:

{
"from_node": "review_plan",
"artifact_path": "024-2-review_plan.md",
"artifact_path": "review_plan.md",
"used_for": "quality review section"
}
