Skip to content

feat(#106): Planner agent + TaskSpec + auto-inference — Phase B backend#121

Merged
Abernaughty merged 6 commits intomainfrom
feature/106-planner-agent
Apr 5, 2026
Merged

feat(#106): Planner agent + TaskSpec + auto-inference — Phase B backend#121
Abernaughty merged 6 commits intomainfrom
feature/106-planner-agent

Conversation

@Abernaughty
Copy link
Copy Markdown
Owner

@Abernaughty Abernaughty commented Apr 4, 2026

Summary

Implements Issue #106 Phase B backend: the Planner agent, TaskSpec schema, workspace auto-inference, readiness checklist validation, and three new API endpoints for conversational task planning.

Closes the backend portion of #106 Phase B. Dashboard (ChatView) updates will follow in a subsequent PR.

What's New

Planner Agent (src/agents/planner.py)

  • TaskSpec Pydantic model — structured task specification with workspace, objective, languages, frameworks, output_type, acceptance_criteria, constraints, related_files
  • PlannerSession — manages conversational planning session state with 30-min idle TTL
  • ChecklistStatus — tracks readiness checklist completion (Required/Recommended/Optional tiers)
  • infer_workspace_stack() — auto-detects languages and frameworks from project files (package.json, pyproject.toml, Cargo.toml, go.mod, etc.)
  • send_planner_message() — core Planner interaction: sends user message to LLM, extracts TaskSpec updates from response JSON block, updates checklist
  • PlannerSessionStore — in-memory session store with lazy TTL cleanup
  • PLANNER_MODEL env var — configurable model (default: gemini-2.0-flash-lite), supports Gemini, Anthropic, and OpenAI-compatible models
  • Zero tool access enforced — Planner never gets filesystem, GitHub, or sandbox tools

API Endpoints (src/api/main.py)

  • POST /tasks/plan — Start a new Planner session with workspace validation + auto-inference
  • POST /tasks/plan/{session_id}/message — Send a message to the Planner agent
  • POST /tasks/plan/{session_id}/submit — Confirm and create task from TaskSpec (validates required checklist items)

SSE Events (src/api/events.py)

  • PLANNER_MESSAGE event type — streams Planner responses to the dashboard in real-time

API Models (src/api/models.py)

  • PlannerStartRequest, PlannerMessageRequest — request bodies
  • PlannerSessionResponse, PlannerSubmitResponse — response schemas
  • PlannerChecklistItemResponse, PlannerChecklistResponse — readiness checklist display
  • PlannerTaskSpecResponse — structured task spec for dashboard rendering

Tests (tests/test_planner.py)

  • TaskSpec serialization and to_description()
  • Checklist validation (required satisfied, recommended warnings, partial required)
  • Auto-inference: JavaScript, TypeScript, Python/FastAPI, SvelteKit, Rust, Go, multi-language projects
  • JSON extraction from Planner LLM responses
  • TaskSpec update application (including workspace override protection)
  • Session lifecycle (creation, expiry, touch, store CRUD)
  • Planner message flow with mocked LLM (field extraction, minimal-input warnings, workspace safety)

Security Considerations

  • Workspace never overridden: _apply_spec_updates() explicitly skips workspace field from LLM output
  • Zero tools: Planner agent has no filesystem/GitHub/sandbox access
  • Session TTL: 30-minute idle timeout prevents memory leaks from abandoned sessions
  • Auto-inference whitelist: Only reads known project manifest files (package.json, pyproject.toml, etc.)
  • Protected workspace PIN: Checked at session start, consistent with existing POST /tasks flow

Design Decisions

Decision Resolution
Planner scope Pre-graph conversational agent — NOT a LangGraph node
Model PLANNER_MODEL env var, default gemini-2.0-flash-lite
Tool access Zero — non-negotiable
Session storage In-memory dict with TTL (production: consider Redis)
Submission guard Required fields must be satisfied; recommended warn but don't block

What's Next (Session 2)

  • Dashboard: ChatView Planner flow — conversational messages, summary card, "Submit to Architect" button
  • Dashboard: Planner message styling (distinct from system/orchestrator)
  • Dashboard: Minimal-input warning display
  • Wire SSE planner_message events to dashboard store

Testing

cd dev-suite
uv run --group dev pytest tests/test_planner.py -v

Summary by CodeRabbit

  • New Features

    • Interactive AI-guided task planning with checklist-based readiness assessment
    • Automatic workspace technology detection (languages, frameworks)
    • Multi-turn conversational planning sessions with persistence, expiry, and session lifecycle APIs
    • Real-time event streaming for planner responses (SSE)
    • API endpoints to start sessions, send messages, and submit planned tasks
    • User-facing warnings for incomplete or minimal inputs
  • Tests

    • Comprehensive test coverage for planner inference, checklist logic, message flow, and session lifecycle

Phase B Step 1-3: Planner agent module with:
- TaskSpec Pydantic model for structured task specifications
- PlannerSession for managing conversational planning sessions
- ChecklistStatus for tracking readiness checklist completion
- infer_workspace_stack() for auto-detecting languages/frameworks
- Planner LLM integration (PLANNER_MODEL env var, zero tools)
- Comprehensive unit tests for all planner logic
- EventType.PLANNER_MESSAGE for streaming planner responses via SSE
- PlannerStartRequest, PlannerMessageRequest for API endpoints
- PlannerSessionResponse, PlannerSubmitResponse for responses
- PlannerChecklistItemResponse, PlannerChecklistResponse for readiness
- PlannerTaskSpecResponse for structured task spec display
Three new endpoints for the Planner conversational flow:
- POST /tasks/plan — start a session with workspace auto-inference
- POST /tasks/plan/{id}/message — send message to Planner agent
- POST /tasks/plan/{id}/submit — confirm and create task from TaskSpec

Includes workspace validation, protected workspace PIN checks,
PLANNER_MESSAGE SSE events, and submission guardrails (required
fields must be satisfied, workspace never overridden by LLM).
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 4, 2026

📝 Walkthrough

Walkthrough

Adds a Planner agent for conversational task-spec refinement: workspace manifest inference, checklist validation, TTL session management, LLM-driven structured updates, an in-memory session store, SSE events, three API endpoints (create/message/submit), and tests covering inference, checklist, JSON handling, and sessions.

Changes

Cohort / File(s) Summary
Core Planner Agent
dev-suite/src/agents/planner.py
New Planner implementation: Pydantic models (TaskSpec, checklist enums/models, PlannerSession, messages/responses), workspace manifest inference (infer_workspace_stack), checklist builder (build_checklist), session lifecycle & TTL, in-memory PlannerSessionStore singleton, LLM-driven flow (send_planner_message), JSON extract/apply utilities.
API Routes & Integration
dev-suite/src/api/main.py
Adds planner endpoints: POST /tasks/plan (start session, infer workspace), POST /tasks/plan/{session_id}/message (send message, call planner LLM, publish PLANNER_MESSAGE SSE), POST /tasks/plan/{session_id}/submit (validate readiness, create/submit task). Adds _planner_session_to_response helper and SSE integration.
API Event Types
dev-suite/src/api/events.py
Added EventType.PLANNER_MESSAGE = "planner_message" to support planner SSE events.
API Models
dev-suite/src/api/models.py
New request/response models for Planner API: PlannerStartRequest, PlannerMessageRequest, checklist/task-spec/session/submit response models.
Tests
dev-suite/tests/test_planner.py
Comprehensive pytest suite validating TaskSpec/to_description, checklist construction & warnings, manifest-based workspace inference across languages/frameworks, JSON extraction/stripping, spec update application rules (no workspace override, coercions, skip empties), session TTL/store behavior, and end-to-end send_planner_message with mocked LLM.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant API as API Server
    participant Planner as Planner Agent
    participant LLM as LLM Service
    participant Store as Session Store
    participant EventBus as Event Bus

    Client->>API: POST /tasks/plan (workspace)
    API->>Planner: infer_workspace_stack(workspace)
    Planner-->>API: {languages, frameworks}
    API->>Planner: create_planner_session(...)
    Planner->>Store: store session
    Planner-->>API: PlannerSessionResponse

    Client->>API: POST /tasks/plan/{id}/message
    API->>Store: get session
    API->>Planner: send_planner_message(session, message)
    Planner->>LLM: call model with spec + checklist
    LLM-->>Planner: text + JSON block
    Planner->>Planner: extract JSON, apply updates, rebuild checklist
    Planner-->>API: PlannerResponse (ready, warnings)
    API->>EventBus: publish PLANNER_MESSAGE
    EventBus-->>Client: SSE event

    Client->>API: POST /tasks/plan/{id}/submit
    API->>Store: get & validate session
    API->>Planner: verify required_satisfied
    API->>API: state_manager.create_task + task_runner.submit
    API->>Store: mark session submitted
    API-->>Client: PlannerSubmitResponse
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Poem

🐰 I hop through manifests and whisper the plan,
I sniff languages and frameworks by scan,
With checklist in paw and JSON to bake,
I polish your spec for the Architect's sake,
Hoppity-hop — now send it off, fluffle and fan! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 53.66% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically describes the main change: implementing a Planner agent with TaskSpec and workspace auto-inference as Phase B backend for Issue #106.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/106-planner-agent

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dev-suite/src/agents/planner.py`:
- Around line 271-315: The heuristics currently use substring matches and weak
file presence checks which cause false positives (e.g., matching "preact" as
React via _PACKAGE_JSON_FRAMEWORK_MAP and treating any build.gradle as Kotlin);
update the inference logic to use exact package-name matching for package.json
deps (compare dependency keys exactly against keys in
_PACKAGE_JSON_FRAMEWORK_MAP, allow scoped-prefix exceptions explicitly like
"react-" if intended), change pyproject checks to require whole-token matches
against _PYPROJECT_FRAMEWORK_MAP, and tighten Kotlin detection so build.gradle
alone does not cause "Kotlin" to be appended—only add Kotlin when a
build.gradle.kts exists or when build.gradle contains explicit Kotlin DSL/plugin
coordinates or the Kotlin JVM/Android plugin identifiers; keep references to the
same variables (languages, frameworks, workspace) and ensure build_checklist()
receives only strongly-evidenced entries.
- Around line 549-564: The _apply_spec_updates function currently writes any
parsed JSON value into TaskSpec fields, which allows wrong types (e.g.,
"languages": "Python") to corrupt the spec; add runtime validation before
mutating: for each field in updates (inside _apply_spec_updates) check the
expected TaskSpec field type/shape (use TaskSpec.model_fields or
TaskSpec.__fields__ to get expected types), coerce or reject mismatched types
(e.g., require list for languages, dict/structured types for
acceptance_criteria), log or skip invalid updates, and only setattr when the
value matches the expected schema; reference TaskSpec, _apply_spec_updates,
build_checklist(), and TaskSpec.to_description() to ensure incoming values will
be compatible with downstream consumers.

In `@dev-suite/src/api/main.py`:
- Around line 450-456: Before calling state_manager.create_task/
task_runner.submit with session.task_spec, re-validate the workspace policy for
session.task_spec.workspace (i.e., run the same allow-list and PIN checks that
were used at session open) or verify a carried PlannerSession auth flag is still
valid; if the re-check fails, abort and return an error instead of creating the
task. Locate the usage around session.task_spec.to_description(),
state_manager.create_task and task_runner.submit and either invoke the existing
workspace policy check function (the one used when opening a session) or consult
and invalidate a stored PlannerSession auth state before proceeding.

In `@dev-suite/tests/test_planner.py`:
- Around line 307-312: The test test_ignore_unknown_fields is currently vacuous
because the assertion uses "or True"; update the assertion in that test to
actually verify unknown fields are not set: remove the "or True" and assert that
hasattr(spec, "unknown_field") is False (or use "assert not hasattr(spec,
'unknown_field')"), ensuring _apply_spec_updates and TaskSpec behavior is
exercised; locate the test function test_ignore_unknown_fields and replace the
final assertion accordingly so it fails if unknown fields are applied.
- Around line 9-28: The import block includes unused names (Path,
ChecklistPriority, PlannerSession) and is not sorted, causing Ruff I001/F401;
remove the unused imports (drop Path, ChecklistPriority, PlannerSession) from
the import list and reorder the remaining imports into a properly sorted group
(standard library first like json, time, pathlib if kept, then third-party like
pytest, then local module imports) or run isort/ruff --fix to normalize
ordering; ensure the test references still match the retained symbols such as
AsyncMock, patch, pytest, and the planner symbols that are actually used (e.g.,
TaskSpec, _apply_spec_updates, build_checklist, create_planner_session, etc.).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b5a3b371-c820-4886-ae50-b35b6fedb694

📥 Commits

Reviewing files that changed from the base of the PR and between e43843c and f316940.

📒 Files selected for processing (5)
  • dev-suite/src/agents/planner.py
  • dev-suite/src/api/events.py
  • dev-suite/src/api/main.py
  • dev-suite/src/api/models.py
  • dev-suite/tests/test_planner.py

1. test_planner.py: Remove unused imports (Path, ChecklistPriority,
   PlannerSession) fixing ruff F401/I001 CI failure
2. test_planner.py: Fix vacuous assertion (remove `or True`)
3. test_planner.py: Add tests for type coercion and preact/gradle edge cases
4. planner.py: Use exact dep matching in infer_workspace_stack —
   prevents "preact" → React false positive
5. planner.py: Tighten Kotlin detection — only from .kts, not plain
   build.gradle
6. planner.py: Add type validation in _apply_spec_updates — coerce
   string→list, reject non-string/non-list values
7. planner.py: Move `import re` to top-level (was lazy-imported twice)
Addresses CodeRabbit Major: submit_planner_session now re-checks
ws_mgr.is_allowed() and ws_mgr.is_protected() before creating the
task. Prevents a session opened before workspace removal/protection
from submitting into that directory.
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
dev-suite/tests/test_planner.py (1)

15-25: ⚠️ Potential issue | 🟡 Minor

Fix import sorting to resolve CI failure.

The ruff linter is failing with I001 because imports within the src.agents.planner block are not sorted according to ruff's isort rules (typically case-insensitive alphabetical order).

🧹 Proposed fix
 from src.agents.planner import (
-    PlannerSessionStore,
-    TaskSpec,
     _apply_spec_updates,
     _extract_task_spec_updates,
     _strip_json_block,
     build_checklist,
     create_planner_session,
     infer_workspace_stack,
+    PlannerSessionStore,
     send_planner_message,
+    TaskSpec,
 )

Alternatively, run ruff check --fix or ruff format to auto-fix.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/tests/test_planner.py` around lines 15 - 25, The import block from
src.agents.planner is not sorted which triggers ruff I001; reorder the imported
symbols (PlannerSessionStore, TaskSpec, _apply_spec_updates,
_extract_task_spec_updates, _strip_json_block, build_checklist,
create_planner_session, infer_workspace_stack, send_planner_message) into
ruff/isort-compliant alphabetical order (case-insensitive) or run `ruff check
--fix`/`ruff format` to auto-fix the imports so the test file passes CI.
🧹 Nitpick comments (2)
dev-suite/src/agents/planner.py (2)

797-841: Consider thread safety if scaling beyond single-worker deployment.

PlannerSessionStore uses a plain dict which is not thread-safe for concurrent modifications. This is fine for:

  • Single-worker asyncio (default uvicorn mode)
  • Multiple workers (each has isolated memory)

If you later need shared session state across workers or use threaded concurrency, consider using threading.Lock or migrating to Redis/external store.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/src/agents/planner.py` around lines 797 - 841, PlannerSessionStore
is not thread-safe because it mutates _sessions without synchronization; add a
threading.Lock (e.g. self._lock = threading.Lock()) and acquire/release it (use
context manager) around all accesses and mutations of self._sessions in create,
get, remove, count, and _cleanup_expired to prevent concurrent races, or
alternatively replace the in-memory dict with a process-shared store (e.g.
Redis) and update create/get/remove/count/_cleanup_expired to use that backend
if cross-worker sharing is required.

373-385: Semantic note on auto_inferred flag.

The auto_inferred flag is set to True whenever languages or frameworks have values, even if the user manually provided them. This is technically imprecise—it only indicates "has values" rather than "values came from auto-inference."

For the MVP this is acceptable since it's primarily for UI hints, but consider tracking the actual inference source if future features need to distinguish user-provided vs auto-detected values.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/src/agents/planner.py` around lines 373 - 385, The auto_inferred
flag for the ChecklistItem entries for "languages" and "frameworks" incorrectly
treats any non-empty value as auto-inferred; update the code that builds those
items (the ChecklistItem construction for fields "languages" and "frameworks" in
planner.py) to read a true inference indicator from the TaskSpec instead of
using bool(task_spec.languages/frameworks). Add or use explicit flags on
TaskSpec (e.g., task_spec.inferred_languages and task_spec.inferred_frameworks
or similar properties), set auto_inferred=task_spec.inferred_languages for the
languages item and auto_inferred=task_spec.inferred_frameworks for the
frameworks item, and fall back to False if those inference flags are not present
to preserve current behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@dev-suite/tests/test_planner.py`:
- Around line 15-25: The import block from src.agents.planner is not sorted
which triggers ruff I001; reorder the imported symbols (PlannerSessionStore,
TaskSpec, _apply_spec_updates, _extract_task_spec_updates, _strip_json_block,
build_checklist, create_planner_session, infer_workspace_stack,
send_planner_message) into ruff/isort-compliant alphabetical order
(case-insensitive) or run `ruff check --fix`/`ruff format` to auto-fix the
imports so the test file passes CI.

---

Nitpick comments:
In `@dev-suite/src/agents/planner.py`:
- Around line 797-841: PlannerSessionStore is not thread-safe because it mutates
_sessions without synchronization; add a threading.Lock (e.g. self._lock =
threading.Lock()) and acquire/release it (use context manager) around all
accesses and mutations of self._sessions in create, get, remove, count, and
_cleanup_expired to prevent concurrent races, or alternatively replace the
in-memory dict with a process-shared store (e.g. Redis) and update
create/get/remove/count/_cleanup_expired to use that backend if cross-worker
sharing is required.
- Around line 373-385: The auto_inferred flag for the ChecklistItem entries for
"languages" and "frameworks" incorrectly treats any non-empty value as
auto-inferred; update the code that builds those items (the ChecklistItem
construction for fields "languages" and "frameworks" in planner.py) to read a
true inference indicator from the TaskSpec instead of using
bool(task_spec.languages/frameworks). Add or use explicit flags on TaskSpec
(e.g., task_spec.inferred_languages and task_spec.inferred_frameworks or similar
properties), set auto_inferred=task_spec.inferred_languages for the languages
item and auto_inferred=task_spec.inferred_frameworks for the frameworks item,
and fall back to False if those inference flags are not present to preserve
current behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f08a9345-e23e-460c-8f51-4274f8863c65

📥 Commits

Reviewing files that changed from the base of the PR and between f316940 and e17062c.

📒 Files selected for processing (2)
  • dev-suite/src/agents/planner.py
  • dev-suite/tests/test_planner.py

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dev-suite/src/api/main.py`:
- Around line 376-381: The exception message returned to the client leaks
internal details; keep the existing logger.error with exc_info=True for full
diagnostics but replace the client-facing _error call so it sends a generic
message (e.g., "Planner failed" or "Internal planner error") and the same 502
status without including the exception object; update the except block around
send_planner_message to call _error with a sanitized string while still logging
the full exception via logger.error and then return.
- Around line 462-464: The planner submission path currently calls
state_manager.create_task(...) and task_runner.submit(task_id, description,
workspace=workspace) but omits the publish_pr flag that the regular POST /tasks
path forwards; update the planner flow to accept and propagate publish_pr (e.g.
add publish_pr to PlannerStartRequest or the planner handler's params) and pass
it into task_runner.submit(task_id, description, workspace=workspace,
publish_pr=publish_pr) so planner-created tasks honor the same PR publishing
option as the regular endpoint.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0a539a69-8463-4bdb-80e9-e34c070186cd

📥 Commits

Reviewing files that changed from the base of the PR and between e17062c and 61a7456.

📒 Files selected for processing (1)
  • dev-suite/src/api/main.py

Comment on lines +376 to +381
try:
planner_resp = await send_planner_message(session, body.message)
except Exception as e:
logger.error("Planner LLM call failed: %s", e, exc_info=True)
_error(f"Planner failed: {e}", 502)
return
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Sanitize exception details before returning to client.

The error message f"Planner failed: {e}" may leak internal details (stack traces, file paths, or configuration info) from the underlying LLM call or JSON extraction. Consider a generic message for the client and log the full exception separately.

🛡️ Suggested fix
     try:
         planner_resp = await send_planner_message(session, body.message)
     except Exception as e:
         logger.error("Planner LLM call failed: %s", e, exc_info=True)
-        _error(f"Planner failed: {e}", 502)
+        _error("Planner service temporarily unavailable. Please try again.", 502)
         return
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
planner_resp = await send_planner_message(session, body.message)
except Exception as e:
logger.error("Planner LLM call failed: %s", e, exc_info=True)
_error(f"Planner failed: {e}", 502)
return
try:
planner_resp = await send_planner_message(session, body.message)
except Exception as e:
logger.error("Planner LLM call failed: %s", e, exc_info=True)
_error("Planner service temporarily unavailable. Please try again.", 502)
return
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/src/api/main.py` around lines 376 - 381, The exception message
returned to the client leaks internal details; keep the existing logger.error
with exc_info=True for full diagnostics but replace the client-facing _error
call so it sends a generic message (e.g., "Planner failed" or "Internal planner
error") and the same 502 status without including the exception object; update
the except block around send_planner_message to call _error with a sanitized
string while still logging the full exception via logger.error and then return.

Comment on lines +462 to +464
# Create task via existing flow
task_id = await state_manager.create_task(description, workspace=workspace)
task_runner.submit(task_id, description, workspace=workspace)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Missing publish_pr parameter for planner-submitted tasks.

The regular POST /tasks endpoint at line 201 passes publish_pr=body.publish_pr to task_runner.submit(), but the planner submit path omits it. Planner-created tasks will always use the default PR behavior with no user control.

Consider adding publish_pr to the submit request or PlannerStartRequest, or document that planner tasks always use the default.

🔧 Suggested fix if adding to submit endpoint
+from .models import PlannerSubmitRequest  # Add new request model

 `@app.post`("/tasks/plan/{session_id}/submit", response_model=ApiResponse)
 async def submit_planner_session(
     session_id: str,
+    body: PlannerSubmitRequest | None = None,
     _auth: str | None = Depends(require_auth),
 ):
     # ... existing validation ...

     # Create task via existing flow
     task_id = await state_manager.create_task(description, workspace=workspace)
-    task_runner.submit(task_id, description, workspace=workspace)
+    task_runner.submit(
+        task_id,
+        description,
+        workspace=workspace,
+        publish_pr=body.publish_pr if body else None,
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/src/api/main.py` around lines 462 - 464, The planner submission
path currently calls state_manager.create_task(...) and
task_runner.submit(task_id, description, workspace=workspace) but omits the
publish_pr flag that the regular POST /tasks path forwards; update the planner
flow to accept and propagate publish_pr (e.g. add publish_pr to
PlannerStartRequest or the planner handler's params) and pass it into
task_runner.submit(task_id, description, workspace=workspace,
publish_pr=publish_pr) so planner-created tasks honor the same PR publishing
option as the regular endpoint.

Single blank line between import block and first code, not two.
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
dev-suite/tests/test_planner.py (1)

386-389: Use SESSION_TTL_SECONDS in expiry tests instead of hard-coded 3600.

These assertions are coupled to an implicit 30-minute policy but encode 1 hour directly. If TTL changes, tests can become misleading. Prefer deriving from the source constant.

Proposed diff
 from src.agents.planner import (
     PlannerSessionStore,
+    SESSION_TTL_SECONDS,
     TaskSpec,
@@
     def test_session_expiry(self):
         session = create_planner_session(workspace="/proj")
-        session.last_activity = time.time() - 3600  # 1 hour ago
+        session.last_activity = time.time() - (SESSION_TTL_SECONDS + 1)
         assert session.is_expired
@@
     def test_expired_session_removed(self):
         store = PlannerSessionStore()
         session = create_planner_session(workspace="/proj")
-        session.last_activity = time.time() - 3600
+        session.last_activity = time.time() - (SESSION_TTL_SECONDS + 1)
         store.create(session)
         assert store.get(session.session_id) is None

Also applies to: 412-417

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/tests/test_planner.py` around lines 386 - 389, Update the expiry
tests to compute the stale timestamp using the module constant
SESSION_TTL_SECONDS instead of the hard-coded 3600; specifically, in
test_session_expiry (and the similar test around lines for session inactivity)
set session.last_activity = time.time() - (SESSION_TTL_SECONDS + 1) (or -
SESSION_TTL_SECONDS to test boundary) so the assertion session.is_expired
derives from SESSION_TTL_SECONDS, and import or reference the constant from the
planner/session module where create_planner_session and session.is_expired are
defined.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dev-suite/tests/test_planner.py`:
- Around line 391-396: The test_session_touch uses time.sleep which causes
flakiness; instead, set session.last_activity to a deterministic older timestamp
and call session.touch(), then assert session.last_activity is greater than the
previous value. Modify test_session_touch to create the session via
create_planner_session(), assign session.last_activity to a past datetime (or
timestamp) value, call session.touch(), and assert the updated
session.last_activity > previous value (referencing the test function name
test_session_touch and methods create_planner_session, session.touch, and
session.last_activity).

---

Nitpick comments:
In `@dev-suite/tests/test_planner.py`:
- Around line 386-389: Update the expiry tests to compute the stale timestamp
using the module constant SESSION_TTL_SECONDS instead of the hard-coded 3600;
specifically, in test_session_expiry (and the similar test around lines for
session inactivity) set session.last_activity = time.time() -
(SESSION_TTL_SECONDS + 1) (or - SESSION_TTL_SECONDS to test boundary) so the
assertion session.is_expired derives from SESSION_TTL_SECONDS, and import or
reference the constant from the planner/session module where
create_planner_session and session.is_expired are defined.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 68c5fb54-8cf2-4e77-afce-254a0cc3424b

📥 Commits

Reviewing files that changed from the base of the PR and between 61a7456 and a34fbd4.

📒 Files selected for processing (1)
  • dev-suite/tests/test_planner.py

Comment on lines +391 to +396
def test_session_touch(self):
session = create_planner_session(workspace="/proj")
old_time = session.last_activity
time.sleep(0.01)
session.touch()
assert session.last_activity > old_time
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Avoid time.sleep() in test_session_touch to reduce CI flakiness.

Sleeping in unit tests is slow and timing-sensitive. You can set last_activity to a known older value and assert touch() updates it.

Proposed diff
     def test_session_touch(self):
         session = create_planner_session(workspace="/proj")
-        old_time = session.last_activity
-        time.sleep(0.01)
+        old_time = session.last_activity
+        session.last_activity = old_time - 10
         session.touch()
-        assert session.last_activity > old_time
+        assert session.last_activity > old_time
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_session_touch(self):
session = create_planner_session(workspace="/proj")
old_time = session.last_activity
time.sleep(0.01)
session.touch()
assert session.last_activity > old_time
def test_session_touch(self):
session = create_planner_session(workspace="/proj")
old_time = session.last_activity
session.last_activity = old_time - 10
session.touch()
assert session.last_activity > old_time
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/tests/test_planner.py` around lines 391 - 396, The
test_session_touch uses time.sleep which causes flakiness; instead, set
session.last_activity to a deterministic older timestamp and call
session.touch(), then assert session.last_activity is greater than the previous
value. Modify test_session_touch to create the session via
create_planner_session(), assign session.last_activity to a past datetime (or
timestamp) value, call session.touch(), and assert the updated
session.last_activity > previous value (referencing the test function name
test_session_touch and methods create_planner_session, session.touch, and
session.last_activity).

@Abernaughty Abernaughty merged commit c92efbc into main Apr 5, 2026
3 checks passed
@Abernaughty Abernaughty deleted the feature/106-planner-agent branch April 5, 2026 03:40
Abernaughty added a commit that referenced this pull request Apr 5, 2026
Wire the dashboard ChatView to the Planner backend (PR #121).
Conversational task planning flow: workspace select → describe
objective → Planner validates checklist → submit to Architect.

New files:
- stores/planner.svelte.ts — session state, message history, checklist
- routes/api/planner/+server.ts — proxy POST /tasks/plan
- routes/api/planner/[id]/message/+server.ts — proxy message endpoint
- routes/api/planner/[id]/submit/+server.ts — proxy submit endpoint

Modified files:
- types/api.ts — Planner types + planner_message SSE event
- sse.ts — dispatch planner_message events to plannerStore
- stores/index.ts — export + teardown plannerStore
- components/views/ChatView.svelte — full rewrite with Planner flow

Closes #106
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant