v0.1.1 by joshwilhelmi · Pull Request #2 · GobbyAI/gobby

joshwilhelmi · 2026-01-08T16:01:42Z

Catching up main to current

Summary by CodeRabbit

New Features
- Autonomous sessions: graceful stop signals, progress tracking, stagnation/stuck detection, and task-selection monitoring
- Enhanced handoff generation: compact/full summaries with DB and file export options
- Metrics: failing-tool reporting and tool-level metrics reset
- New HTTP/WebSocket endpoints for signaling/querying/clearing session stops
- Database migrations to support autonomous workflows
Bug Fixes
- Workflow validation now prevents creating child sessions for invalid workflows
Documentation
- CLI examples updated from "workflow" → "workflows"; roadmap and guides reorganized
Tests
- Extensive new and expanded test suites across adapters, agents, and spawners

_{✏️ Tip: You can customize this high-level summary in your review settings.}

- Add language specifier to code block in tasks.md - Move lifecycle workflow validation before session creation in runner.py - Add empty command validation in embedded.py spawner - Use full app_path instead of app_name in macos.py spawner - Fix PowerShell command injection in windows.py spawner - Skip None values in verification_dict loop in init.py - Fix hook detection to only check GOBBY_HOOK_START marker - Fix xargs filename handling for spaces in git_hooks.py - Add configurable limit parameter to reregister_active_sessions - Log JSONDecodeError with context in codex_executor.py - Convert search_messages to async in session_messages.py - Add direction validation in task_sync.py - Move cast and WorkflowLoader imports to module level in worktrees.py - Use Literal type for provider parameter in worktrees.py - Remove unreachable if-check in tasks.py - Move auto-transition inside transaction in tasks.py - Add robust func_name extraction with regex in expansion.py - Catch ValueError in validate_workflow_for_agent in loader.py

… status - Update Sprint 10, 12, 21, 22, 30 statuses - Update Milestones 5, 7, 8, 12 with completion details - Fix terminology: Phase-based → Step-based workflow enforcement - Document remaining Task V2 items (git hooks, CLI commands) - Document remaining Worktree items (merge resolution)

All hook events are integrated with workflows: - on_session_start, on_session_end, on_stop, on_pre_compact - on_before_tool (aliased from on_tool_call) - on_after_tool (aliased from on_tool_result) - on_before_agent (aliased from on_prompt_submit) TRIGGER_ALIASES in engine.py provides backward compatibility for legacy names.

All Session Message Tracking sprints are fully implemented: - 7.1: Database schema, LocalSessionMessageManager, ParsedMessage - 7.2: SessionMessageProcessor with byte-offset polling - 7.3: Runner/HookManager integration, WebSocket broadcasting

…ling - Clarify schema descriptions: skip_validation for bypassing LLM validation, no_commit_needed only for tasks with no code changes - Add summarize_diff_for_validation() to ensure all file names are visible in large diffs by prioritizing file list over full content - Add is_doc_only_diff() to detect documentation-only changes - Auto-skip LLM validation for doc-only changes (.md, .txt, .rst, etc.) - Add tests for new functions

The gobby tasks hooks command duplicated functionality already in cli/installers/git_hooks.py with an inferior implementation (no hook chaining, no backups, no pre-commit framework support). - Delete src/gobby/cli/tasks/hooks.py - Remove hooks_cmd from main.py registration - Update docs to reference gobby install instead - Mark git hook integration complete in ROADMAP.md

Mark session chaining and task-driven work loops as complete: - autonomous-loop.yaml workflow exists and is functional - autonomous-task.yaml provides task-driven execution - start_new_session action implemented in session_actions.py Remaining work: multi-surface stop signals, progress tracking with stuck detection

…orkflows' Consistent with other CLI commands (tasks, sessions, agents) and the MCP server naming (gobby-workflows). Updated: - src/gobby/cli/workflows.py: Renamed function workflow() to workflows() - src/gobby/cli/__init__.py: Updated import and add_command - CLAUDE.md, AGENTS.md, ROADMAP.md: Updated documentation - docs/guides/workflows.md: Updated CLI reference - docs/architecture/cli-commands.md: Updated command reference - docs/plans/completed/WORKFLOWS.md: Updated historical docs - docs/examples/workflows/*.yaml, README.md: Updated examples

- Add get_failing_tools() to ToolMetricsManager with failure rate query - Add get_failing_tools MCP tool to gobby-metrics server - Add include_metrics parameter to list_tools HTTP endpoint - Add get_metrics_manager dependency for FastAPI routes - Update ROADMAP.md to mark Sprints 12-15 as complete All MCP Proxy Enhancement milestones are now complete: - Sprint 12: Tool Metrics (get_failing_tools, include_metrics) - Sprint 13: Lazy Server Init (already implemented) - Sprint 14: Semantic Tool Search (already implemented) - Sprint 15: Self-Healing & Incremental Indexing (already implemented)

- Add tool_name parameter to ToolMetricsManager.reset_metrics() - Add tool_name parameter to reset_metrics MCP tool - Allows resetting metrics for a specific tool, not just server-wide

- Add new reset_tool_metrics(server_name, tool_name) MCP tool - Provides focused interface for resetting specific tool metrics - Complements existing reset_metrics() which accepts project_id

- Sprint 20 (Session Management): Pending → ✅ Complete All session MCP tools are implemented: get_session, get_current_session, list_sessions, session_stats, create_handoff, get_handoff_context, get_session_commits, pickup, mark_loop_complete - Sprint 29 (Autonomous Work Loop): Pending → 🔶 Partial Matches Milestone 11 which shows session chaining and task-driven work loops as complete, with stop signals and progress tracking remaining

… workflows - Add src/gobby/autonomous/stop_registry.py with StopRegistry class - Thread-safe stop signal management - signal_stop(), get_signal(), has_pending_signal(), acknowledge(), clear() - Database-backed persistence with session_stop_signals table - Add database migration (v37) for session_stop_signals table - Foreign key to sessions table with CASCADE delete - Indexed for pending signal lookups - Add workflow actions (check_stop_signal, request_stop, clear_stop_signal) - Integrated into ActionExecutor with stop_registry parameter - Workflow variables updated with signal info - Add has_stop_signal() condition function to evaluator - Can be used in workflow transition conditions - Registered via evaluator.register_stop_registry() - Wire up StopRegistry in HookManager - Passed to ActionExecutor - Registered with ConditionEvaluator

- Add src/gobby/autonomous/progress_tracker.py with ProgressTracker class - Track progress events by type (file_modified, task_completed, etc.) - Classify events as high-value or low-value - Stagnation detection based on time and event counts - Add database migration (v38) for loop_progress table - Indexed for efficient session queries - Separate index for high-value event lookups - Export ProgressTracker, ProgressType, ProgressEvent, ProgressSummary from gobby.autonomous module

- Add src/gobby/autonomous/stuck_detector.py with StuckDetector class - Layer 1: Task selection loop detection - Layer 2: Progress stagnation detection (via ProgressTracker) - Layer 3: Tool call pattern detection - Add database migration (v39) for task_selection_history table - Tracks task selections by session for loop detection - Indexed for efficient lookups - Add workflow actions in autonomous_actions.py: - start_progress_tracking / stop_progress_tracking - record_progress / get_progress_summary - detect_task_loop / detect_stuck - record_task_selection - Wire up ProgressTracker and StuckDetector in HookManager - Register all actions in ActionExecutor Note: check_stop_signal was already implemented in gt-f29c73

Add REST API endpoints for stop signal management: - POST /sessions/{session_id}/stop - Signal a session to stop gracefully - GET /sessions/{session_id}/stop - Check for pending stop signal - DELETE /sessions/{session_id}/stop - Clear a stop signal These endpoints integrate with the StopRegistry through HookManager, allowing external systems (dashboards, monitoring) to request graceful shutdown of autonomous sessions. Includes comprehensive tests for all endpoints and error cases.

Add WebSocket integration for autonomous loop monitoring: - Add broadcast_autonomous_event() method to WebSocketServer for events: - loop_started/stopped: Progress tracking lifecycle - task_started: Task selected for work - progress_recorded: High-value progress events - stuck_detected: Loop stuck conditions detected - stop_requested: External stop signal received - Add stop_request message handler to WebSocket server for clients to request session stops via WebSocket in addition to HTTP - Update ActionExecutor with websocket_server parameter to enable event broadcasting from autonomous workflow actions - Wire up stop_registry to WebSocket server for stop_request handling

Enable conditional branching in workflows based on webhook responses: - Add `webhook` condition type to ConditionEvaluator.check_exit_conditions() - Add evaluate_webhook_conditions() async method for pre-evaluating webhooks - Support checking webhook response status codes, body content, JSON fields - Allow storing webhook results in workflow variables (store_as option) - Register webhook_executor method for enabling webhook conditions Webhook condition config options: - url, method, headers, payload, timeout - expect_success: Check response is 2xx (default true) - status_code: Check specific status code or list of codes - body_contains: Check body contains string - json_field: Check JSON field value (dot notation) - json_value: Expected value for json_field - store_as: Store full result in named variable Includes comprehensive tests for all webhook condition features.

Add agent mode for external task validation that spawns an isolated agent instance instead of using direct LLM API calls. Changes: - Add external_validator_mode config option (llm|agent, default: llm) - Add AgentRunner-based validation in external_validator.py - Wire external validation into close_task() flow - Pass agent_runner through create_task_registry → setup_internal_registries - Add 5 new tests for agent mode validation The agent mode provides more thorough validation as the agent can use tools to read files, run tests, and verify implementation details.

…ions - Add 2,314 new tests (from 3,413 to 5,727) - Add ~20,000 lines of test code across 30 files - Fix all originally failing tests - Add tests for: storage, hooks, llm, sessions, workflows, mcp_proxy, agents, memory, tasks Coverage improvements: - hooks/event_handlers.py: +coverage - hooks/hook_manager.py: +coverage - hooks/plugins.py: +coverage - memory/extractor.py: +coverage - llm/claude.py: +coverage - sessions/summary.py: +coverage - storage/mcp.py: +coverage - sync/tasks.py: +coverage - runner.py: +coverage - workflows/actions.py: +coverage - worktrees/git.py: +coverage - And many more modules Target of 80% requires additional work on hard-to-test adapter, CLI, and spawner modules.

Add test file tests/agents/spawners/test_windows_spawner.py with 83 tests covering all Windows spawner classes: - WindowsTerminalSpawner: 18 tests for is_available() and spawn() - CmdSpawner: 15 tests for is_available() and spawn() - PowerShellSpawner: 18 tests including pwsh/powershell fallback - WSLSpawner: 18 tests including Windows-to-WSL path conversion Additional test coverage: - Security tests for command injection prevention - Edge case tests for paths with spaces and special characters - Environment variable handling and validation - Platform-specific skip decorators for Windows-only integration tests All tests use mocked Windows APIs to allow running on any platform. Module coverage: 99%

- Complete rewrite of test_loader.py with 49 tests - Cover all functions: WorkflowLoader, _find_workflow_file, _merge_workflows, _merge_steps, discover_lifecycle_workflows, _scan_directory, clear_discovery_cache, validate_workflow_for_agent - Test workflow loading scenarios: valid YAML, invalid YAML, caching, project paths - Test inheritance: valid chains, self-cycles, two-way cycles, three-level cycles - Test error handling: missing workflows, missing parents, circular inheritance - Test edge cases: empty YAML, files vs directories in scan, priority sorting - Achieve 100% code coverage for loader.py (up from 82%)

- Added 66 tests covering all functions in git_hooks.py - Tests cover install_git_hooks, uninstall_git_hooks, and helper functions - Tests include different input scenarios: empty files, existing hooks, pre-commit framework hooks - Tests cover error handling: backup failures, subprocess timeouts, missing directories - Tests mock file system operations and git operations appropriately - Achieved 100% code coverage for git_hooks.py (up from 12%)

- Add 65 tests covering all commands: start, list, show, status, cancel, stats, cleanup - Test success cases, error handling, JSON output, and edge cases - Mock external dependencies (database, HTTP calls, daemon connection) - Use Click's CliRunner for CLI command testing - Coverage for agents.py increased from 15% to 99%

- Add comprehensive tests for CLI modules (daemon, agents, extensions, init, install) - Add tests for CLI installers (shared, claude, codex, gemini, git_hooks, antigravity) - Add tests for agents/spawners/windows.py and agents/tty_config.py - Add tests for adapters/base.py - Add tests for workflows (actions, mcp_actions, todo_actions, llm_actions, git_utils, etc.) - Add tests for mcp_proxy modules (actions, services, transports) - Add tests for utils (git.py, project_context.py, project_init.py) - Rename conflicting test files to avoid import collisions - Coverage increased from 74% to 80.51% - Total tests: 7,239 (up from 5,914)

Renamed tests/utils/test_git.py to tests/utils/test_utils_git.py to avoid conflict with tests/worktrees/test_git.py

…hared.py, codex.py, and memory_actions.py - Fix test_remove_json_write_error in test_shared.py (99% coverage) - Create 35 tests for codex installer module (100% coverage) - Add edge case test for memory_actions.py (100% coverage)

- Consolidate completed sprints into single table (sprints 1-17) - Rename 'Testing & Recovery' to 'End-to-End Testing' for clarity - Move Testing (18) and Documentation (19) to 'Pending (Final Polish)' section - Ensure all sprints are in numerical order

- Remove sprint numbering from completed sprints table - Update POST_MVP_ENHANCEMENTS reference to docs/plans/enhancements.md - Simplify completed sprints to just Focus and Plan Reference columns

- Remove sprint numbering from completed sprints - Rename 'Post-MVP Sprints' to 'Remaining Sprints' with simplified layout - Move Session Management Tools and Subagent System to completed - Add remaining task notes for partial items (Task V2, Worktree, Autonomous) - Move Testing and Documentation to end as 'Production Ready' milestone - Simplify Milestones section (Completed vs Remaining) - Clean up Parallel Tracks and Recommendations sections

…files - Add assertions for unused result variables where testing behavior was incomplete - Remove dead code (unused variables like call_count, original_shutdown) - Remove assignments where only side effects matter (e.g., GobbyRunner constructor) - Add proper assertions for spawn results, connect_all results, and cleanup counts

- Fix E741 ambiguous variable names (l -> line) - Fix E402 imports not at top of file (move imports) - Fix B007 unused loop control variables (_index, _name)

Renamed 'f' to 'file_path' in sessions.py to avoid shadowing the file handle variable from earlier in the file, which was causing a mypy type mismatch error.

coderabbitai · 2026-01-08T16:01:56Z

Caution

Review failed

The pull request is closed.

Note

`.coderabbit.yaml` has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)

Validation error: Unrecognized key(s) in object: 'tools', 'ignore'

⚙️ Configuration instructions

Please see the configuration documentation for more information.
You can also validate your configuration using the online YAML validator.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

📝 Walkthrough

Walkthrough

Adds an autonomous execution subsystem (progress tracking, stop-signal registry, stuck detection) and wires it into workflow actions, HookManager, HTTP/WebSocket, and servers; introduces DB migrations and task_type column rename; renames CLI workflow commands to plural, consolidates git-hook installation, adds agent-mode external validation and diff summarization, and large test and docs updates.

Changes

Cohort / File(s)	Summary
Autonomous core `src/gobby/autonomous/__init__.py`, `src/gobby/autonomous/progress_tracker.py`, `src/gobby/autonomous/stop_registry.py`, `src/gobby/autonomous/stuck_detector.py`	New package: progress tracking (event recording, stagnation detection), thread-safe stop-signal registry, task-selection history and multi-layer stuck detection with DB persistence and public APIs.
Workflow integrations `src/gobby/workflows/autonomous_actions.py`, `src/gobby/workflows/stop_signal_actions.py`, `src/gobby/workflows/actions.py`, `src/gobby/workflows/evaluator.py`	New workflow actions and evaluator wiring for progress tracking, recording progress/task selections, detecting loops/stuck, and stop-signal handling; ActionExecutor now accepts stop_registry/progress_tracker/stuck_detector/websocket_server.
Server & realtime API `src/gobby/servers/websocket.py`, `src/gobby/servers/http.py`, `src/gobby/servers/routes/sessions.py`, `src/gobby/servers/routes/dependencies.py`, `src/gobby/servers/routes/mcp.py`	WebSocket supports stop_request messages and broadcasts autonomous events; HTTP startup wires stop_registry into HookManager; new session endpoints to signal/query/clear stop signals; DI for metrics_manager and optional tool metrics enrichment on MCP endpoints.
HookManager & executor wiring `src/gobby/hooks/hook_manager.py`, `src/gobby/workflows/actions.py`	HookManager initializes autonomous components and passes them into ActionExecutor; ActionExecutor registers and implements stop/autonomous actions and non-blocking WebSocket broadcasts.
Database schema & storage `src/gobby/storage/migrations.py`, `src/gobby/storage/tasks.py`, `src/gobby/storage/database.py`, `src/gobby/storage/sessions.py`, `src/gobby/storage/*`	New migrations for session_stop_signals, loop_progress, task_selection_history; task column renamed to task_type usage; safe_update helper; storage methods refactored to use safe_update; added nosec annotations.
Agent runner & task registry `src/gobby/agents/runner.py`, `src/gobby/mcp_proxy/registries.py`, `src/gobby/mcp_proxy/tools/tasks.py`	prepare_run now validates/loads workflows before creating child sessions and initializes workflow state conditionally; create_task_registry and task tools wired to accept optional agent_runner for external/agent-mode validation.
External validation & commit helpers `src/gobby/tasks/external_validator.py`, `src/gobby/tasks/commits.py`, `src/gobby/config/tasks.py`	External validation supports "llm" and new "agent" modes via AgentRunner; added is_doc_only_diff and summarize_diff_for_validation to pre-process diffs; TaskValidationConfig gets external_validator_mode field.
Handoff / session summaries `src/gobby/cli/sessions.py`, `src/gobby/mcp_proxy/tools/session_messages.py`	New create-handoff CLI/tool with compact/full modes, LLM-based full summaries, file/db persistence, transcript formatting helper and expanded options for output and paths.
CLI and docs `src/gobby/cli/workflows.py`, `src/gobby/cli/__init__.py`, `docs/`, `AGENTS.md`, `CLAUDE.md`, `GEMINI.md`, `ROADMAP.md`, `docs/examples/`	Renamed workflow CLI group to `workflows`; updated many docs/examples to pluralized commands; removed CLI git-hooks module and consolidated hook install instructions; added GEMINI.md and large roadmap/docs edits.
MCP metrics & tools `src/gobby/mcp_proxy/metrics.py`, `src/gobby/mcp_proxy/tools/metrics.py`, `src/gobby/servers/routes/mcp.py`	Added ToolMetricsManager.get_failing_tools; reset_metrics accepts tool_name; MCP tools expose get_failing_tools and reset_tool_metrics; list_all_mcp_tools can include metrics enrichment.
Tests added/expanded `tests/**` (many files)	Large additions/expansions of unit and integration tests covering adapters (Codex/Gemini), spawners (Embedded/Headless/Windows), agent runner/registry, spawn helpers, TTY config, autonomous package scaffolding, and more.
Type hints, linting, small refactors many `src/gobby/*` files	Numerous typing refinements, signature narrowings, minor refactors, and linter/nosec annotations across adapters, sessions, storage, utils, and workflows — mostly non-functional.
Metadata & utilities `.gobby/tasks_meta.json`, `scripts/cleanup_skills.py`, `.gitignore`, `pyproject.toml`	Metadata timestamp/hash bump; new cleanup_skills script; .gitignore updated to ignore .scripts/; pyproject updated with tooling and dev deps.

Sequence Diagram(s)

sequenceDiagram
    participant Session
    participant ProgressTracker
    participant DB
    participant StuckDetector
    participant WebSocket

    Session->>ProgressTracker: record_tool_call(session_id, tool_name, result)
    ProgressTracker->>DB: INSERT loop_progress event
    ProgressTracker->>ProgressTracker: infer is_high_value
    ProgressTracker->>DB: query recent events (get_summary)
    ProgressTracker->>StuckDetector: provide summary / recent events
    StuckDetector->>DB: read task_selection_history / loop patterns
    alt stuck_detected
        StuckDetector->>DB: (optional) persist detection result
        StuckDetector->>WebSocket: broadcast_autonomous_event(stuck_detected, session_id)
    end

sequenceDiagram
    participant Client
    participant WebSocket
    participant StopRegistry
    participant DB
    participant WorkflowExecutor

    Client->>WebSocket: send stop_request(session_id, reason, source)
    WebSocket->>StopRegistry: signal_stop(session_id, source, reason)
    StopRegistry->>DB: upsert session_stop_signals
    StopRegistry-->>WebSocket: return StopSignal
    WebSocket-->>Client: stop_response
    WebSocket->>WebSocket: broadcast_autonomous_event(stop_requested, session_id)
    Note over WorkflowExecutor,StopRegistry: During workflow evaluation
    WorkflowExecutor->>StopRegistry: has_pending_signal(session_id)
    alt pending
        StopRegistry-->>WorkflowExecutor: True
        WorkflowExecutor->>StopRegistry: acknowledge(session_id)
        WorkflowExecutor->>WorkflowExecutor: inject stop handling / graceful shutdown
    end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

v0.1.1 #2 — appears closely related: overlapping additions of the autonomous subsystem (progress_tracker/stop_registry/stuck_detector), workflow CLI rename to "workflows", DB migrations, and edits in agents/runner.py and workflows/actions.py.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	Title 'v0.1.1' is a version identifier, which is vague and does not meaningfully describe the primary changes in the changeset.	Use a more descriptive title that highlights the main purpose of the changes, such as 'Release v0.1.1: database schema refactor and comprehensive type safety improvements' or similar.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 93.69% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3286222 and 2ddc509.

📒 Files selected for processing (11)

.gobby/tasks.jsonl
.gobby/tasks_meta.json
GEMINI.md
ROADMAP.md
src/gobby/servers/routes/mcp.py
tests/cli/test_cli_daemon.py
tests/llm/test_resolver.py
tests/utils/test_project_context.py
tests/workflows/test_actions_coverage.py
tests/workflows/test_session_actions.py
tests/workflows/test_webhook_condition.py

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

This PR consolidates changes to bring the main branch up to date with v0.1.1. The changes include significant additions to the testing infrastructure for agents, spawners, and related functionality, as well as a configuration update for the tasks registry initialization.

Key changes:

Added comprehensive test suites for terminal spawners (cross-platform, macOS, Linux, Windows)
Added tests for agent runner, registry, and spawn functionality
Updated MCP proxy registries to pass agent_runner parameter

Reviewed changes

Copilot reviewed 63 out of 160 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
tests/autonomous/init.py	Added module docstring for autonomous session management tests
tests/agents/test_tty_config.py	New comprehensive test suite for terminal configuration functionality
tests/agents/test_spawners.py	New comprehensive test suite for terminal spawner implementations
tests/agents/test_spawn.py	Extended tests for spawn functionality including prompt file management and terminal preparation
tests/agents/test_runner.py	Added tests for agent runner methods including get_run, list_runs, cancel_run, and workflow handling
tests/agents/test_registry.py	New test suite for running agent registry functionality
tests/agents/spawners/test_windows_spawner.py	New comprehensive test suite for Windows terminal spawners
tests/agents/spawners/test_embedded_spawner.py	New comprehensive test suite for embedded PTY spawner
tests/agents/spawners/init.py	Added module docstring for spawner tests
src/gobby/mcp_proxy/registries.py	Added agent_runner parameter to tasks registry initialization

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

coderabbitai

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/gobby/mcp_proxy/tools/tasks.py (1)

785-815: Schema docs overstate override_justification requirement vs implementation

The input schema descriptions say:

skip_validation: “Provide override_justification explaining why.”

override_justification: “Required when skip_validation=True or no_commit_needed=True.”

no_commit_needed: “Requires override_justification.”

In code, a hard requirement is enforced only when:

requires_commit_check is True,

there are no linked commits,

and no_commit_needed is True.

skip_validation=True is not currently enforced to require override_justification; the close can proceed with override_justification=None.

Either:

Enforce the documented requirement for skip_validation (returning a structured error when it’s True and justification is missing), or

Loosen the schema text to reflect actual behavior to avoid confusing agents/tools using this schema.

🤖 Fix all issues with AI agents

In @src/gobby/autonomous/progress_tracker.py:
- Around line 437-445: The comprehension building ProgressEvent objects in
ProgressTracker (the list returned where ProgressEvent(...,
details=eval(row["details"]) if row["details"] else {}, ...)) is using eval
which is unsafe; replace eval with a safe deserializer such as
ast.literal_eval(row["details"]) or json.loads(row["details"]) depending on how
details were serialized, and update any writers to use json.dumps when storing
JSON; ensure the change is applied to the details field in the ProgressEvent
construction and any related read/write helpers so stored details are
consistently serialized/deserialized without executing code.

In @src/gobby/autonomous/stop_registry.py:
- Around line 1-8: The file needs to be reformatted to satisfy ruff; run the
formatter against the module (e.g., run "ruff format
src/gobby/autonomous/stop_registry.py" or run ruff format for the repo) to apply
the required whitespace and style changes so the module-level docstring and file
formatting comply with CI checks; commit the reformatted file (stop_registry.py)
and push.
- Around line 245-269: The cutoff calculation in cleanup_stale is wrong because
it sets cutoff to midnight and then subtracts hours; change it to compute a true
"now minus max_age_hours" cutoff: in cleanup_stale compute cutoff =
datetime.now(UTC) - timedelta(hours=max_age_hours) (add timedelta import) and
update the SQL to compare acknowledged_at directly against that cutoff (e.g.,
WHERE datetime(acknowledged_at) < datetime(?) and pass cutoff.isoformat()),
keeping the rest of the deletion logic and logging intact.

In @src/gobby/autonomous/stuck_detector.py:
- Around line 1-7: The file src/gobby/autonomous/stuck_detector.py fails CI
formatting; run the formatter (ruff format
src/gobby/autonomous/stuck_detector.py) or apply ruff-compatible formatting to
the module (ensure the top-level docstring and surrounding whitespace follow
ruff/PEP8 rules) and commit the formatted file so the pipeline passes.
- Around line 130-136: The code is serializing `context` with `str(context)`
which emits Python-style single-quoted strings that break `json.loads()`;
replace `str(context) if context else None` with `json.dumps(context) if context
is not None else None`, add `import json` at the top, and optionally wrap
`json.dumps` in a try/except to fall back to `str(context)` if `context` is not
JSON-serializable; keep the surrounding tuple (session_id, task_id,
now.isoformat(), ...) unchanged so downstream
`json.loads()`/`ast.literal_eval()` can reliably parse the stored value.

In @src/gobby/cli/sessions.py:
- Around line 610-611: The loop variable `f` in the `for f in files_written:`
loop shadows the file handle `f` used earlier in the `with open(path) as f:`
block, causing a mypy type error; rename the loop variable (e.g., `filename` or
`written_path`) and update the `click.echo` call to use the new name
(`click.echo(f"  File: {filename}")`) so the file handle symbol is no longer
shadowed.

In @src/gobby/tasks/external_validator.py:
- Around line 188-199: AgentConfig is missing required session context fields so
agent_runner.run() fails during prepare_run; update the external validator to
supply parent_session_id, project_id, and machine_id (e.g., from a
session_context passed into this function) when constructing AgentConfig, or
implement a separate path such as a validate_only mode that skips session
creation and bypasses prepare_run; specifically, modify where AgentConfig is
built in external_validator.py and ensure either AgentConfig contains
parent_session_id/project_id/machine_id or that agent_runner.run() is invoked
via a validation-only execution that does not require prepare_run.

In @tests/adapters/test_base.py:
- Around line 475-609: Add the pytest integration marker to the
TestAdapterIntegration test class: annotate the class TestAdapterIntegration
with @pytest.mark.integration and ensure pytest is imported in the test module
if not already; this will mark all methods (test_session_lifecycle_simulation,
test_tool_blocking_scenario, test_context_injection_scenario) as integration
tests.

In @tests/adapters/test_codex.py:
- Around line 1507-1536: Add the pytest integration marker to the
TestCodexAdapterEventMapping test class by decorating the class with
@pytest.mark.integration so all its tests are categorized as integration tests;
ensure pytest is imported in the test module if not already present. Target the
TestCodexAdapterEventMapping class (and its methods
test_event_map_contains_all_supported_events, test_tool_item_types_complete,
test_session_tracking_events_complete) when applying the decorator.

In @tests/adapters/test_gemini.py:
- Around line 885-1000: The TestIntegration class lacks the integration marker;
add the pytest integration marker by decorating the TestIntegration class with
@pytest.mark.integration (ensure pytest is imported if not already) so all
methods (test_session_lifecycle_roundtrip, test_tool_execution_roundtrip,
test_tool_denied_by_workflow) are treated as integration tests; place the
decorator immediately above the class definition referencing the TestIntegration
symbol.

In @tests/agents/spawners/test_headless_spawner.py:
- Around line 1017-1027: The test test_spawn_with_unicode_in_args should not
accept an unrelated ASCII fallback; remove the fallback check and assert only
that the Unicode substring "\u4e16\u754c" appears in the captured stdout after
calling HeadlessSpawner.spawn (check result.success and
result.process.communicate() as currently done), or alternatively make the test
explicitly skip/mark xfail on environments that cannot support Unicode; update
the assertion to only verify "\u4e16\u754c" in stdout (and keep the existing
result.success check) so the test fails when Unicode is lost rather than
silently passing.

In @tests/agents/test_spawners.py:
- Around line 493-509: The test test_spawn_disables_destroy_unattached should
not assert a non-existent pid because TmuxSpawner.spawn() returns SpawnResult
with pid=None on success; update the test to remove or change the pid assertion
(e.g., assert result.pid is None or simply omit the pid check), keep asserting
result.success is True, and ensure the existing assertion(s) verifying the
chained tmux set-option behavior remain (use the mock_popen/mock_process
expectations to validate set-option calls if needed).

🧹 Nitpick comments (36)

tests/agents/test_spawners.py (2)

1622-1670: Duplicate EmbeddedPTYResult.close() tests across modules

This file defines TestEmbeddedPTYResultClose, while tests/agents/spawners/test_embedded_spawner.py also has an almost identical TestEmbeddedPTYResult suite. That duplication increases maintenance cost without adding much value.

Consider consolidating the EmbeddedPTYResult.close() tests into a single module (likely the more focused embedded_spawner test file) and removing one copy here.

680-861: Overlap between EmbeddedSpawner tests here and in test_embedded_spawner.py

TestEmbeddedSpawner and TestEmbeddedSpawnerUnix in this file cover many of the same behaviors (Windows unsupported, PTY errors, parent-path handling, env vars, Unix PTY integration) as the dedicated tests/agents/spawners/test_embedded_spawner.py suite.

The behavior under test is valid, but you now have two places to update when EmbeddedSpawner changes. Consider keeping one “unit-style” suite (likely test_embedded_spawner.py) and trimming redundant cases here, or narrowing this file to higher-level integration/security checks only.

tests/agents/spawners/test_embedded_spawner.py (2)

771-779: Consider avoiding hard-coding MAX_ENV_PROMPT_LENGTH in multiple test modules

This file asserts MAX_ENV_PROMPT_LENGTH == 4096, and tests/agents/test_spawn.py does the same. The check is fine as a specification of current behavior, but duplicating the exact value across modules means two places to update if the threshold ever changes.

If you want to keep the value check, consider centralizing it in a single test file or factoring a small helper to assert it once.

560-569: Generic exception test adds little beyond more specific fork/openpty tests

test_spawn_error_handling_comprehensive covers the “generic Exception during fork” path on top of already-specific tests for fork and openpty errors. It’s valid, but it’s largely overlapping with test_spawn_fork_error and test_spawn_closes_fds_on_exception.

If test runtime becomes a concern, this one is a good candidate to drop or narrow (e.g., assert only that no exception leaks out).

tests/agents/spawners/test_windows_spawner.py (2)

369-490: CmdSpawner tests rely on outer /c and inner /k semantics

The CmdSpawner tests assert both /c (outer cmd /c start ...) and /k (keeping the inner window open). This matches the typical pattern of cmd /c start "" cmd /k ... and aligns with the described implementation.

Just be aware that if the spawn implementation is ever simplified to a single-level cmd /k ... call, these tests will need updating; conversely, they currently lock in the two-level cmd /c start structure.

1172-1236: Security/injection tests are sanity checks rather than strict guarantees

The injection-prevention tests for CmdSpawner, PowerShellSpawner, and WSLSpawner currently only assert that the dangerous payload does not appear as a standalone token (split() / split("&&")) in the final script/command. That’s a useful smoke test and matches current quoting behavior, but it’s not a full guarantee against all shell-injection variants.

If you want stronger guarantees later, you might tighten these assertions to inspect the exact command string structure or add negative cases around specific quoting patterns, but this is optional for now.

tests/agents/test_spawn.py (4)

1111-1226: Prompt-file tests correctly exercise behavior but are tightly coupled to private globals

TestPromptFileManagement and TestCreatePromptFileExceptionHandling validate real behavior of _create_prompt_file, _cleanup_all_prompt_files, and _prompt_files_to_cleanup (content, directory, permissions, cleanup, atexit behavior, and error handling). That’s valuable coverage.

The trade-off is heavy reliance on private module state (_prompt_files_to_cleanup and _atexit_registered). Any internal refactor of how prompt files are tracked will require updating these tests. If you want to reduce brittleness later, consider:

Adding a small public helper around prompt-file lifecycle and asserting through that, or

Narrowing tests to “black-box” effects (file exists with correct mode, then is cleaned up), without asserting on the internal set directly.

For now, behavior is consistent with the current implementation.

1289-1407: TerminalSpawner preference and AUTO-behavior tests match intended API

The new TestTerminalSpawnerMethods and parts of TestTerminalSpawnerSpawnAgent/TestTerminalSpawnerAutoDetect correctly specify:

get_available_terminals() returns only spawners whose is_available() is true.

get_preferred_terminal() respects user preferences from get_tty_config().get_preferences() and skips unknown/unsupported terminals.

spawn(..., terminal=TerminalType.AUTO) uses get_preferred_terminal() and fails clearly when nothing is available.

String terminal names (e.g. "tmux") are converted to TerminalType before dispatch.

The patching strategy is a bit intricate (mixing class-level and instance-level mocks). If you refactor TerminalSpawner internals, consider simplifying these tests to patch spawner._spawners[...] directly (as done in other tests) to keep them robust against constructor changes.

1747-1782: PreparedSpawn dataclass tests are reasonable but somewhat redundant

TestPreparedSpawnDataclass mainly checks that field assignment on PreparedSpawn works and that workflow_name can be None. This is fine but fairly minimal for a dataclass and somewhat overlaps with the prepare_terminal_spawn tests that already assert end-to-end values.

If you later want to trim test volume, this small dataclass-only test is a low-risk candidate to remove, relying instead on the higher-level prepare_terminal_spawn coverage.

1785-1842: MAX_ENV_PROMPT_LENGTH tests duplicate constant-value checks

You now assert MAX_ENV_PROMPT_LENGTH == 4096 in both this file and tests/agents/spawners/test_embedded_spawner.py. It’s valid to lock in that specification, but duplicating the exact value increases maintenance overhead on any future threshold change.

Consider centralizing the constant-value assertion in one place and letting the rest of the suite focus on behavioral boundaries (which you already test thoroughly via spawn_agent and prepare_terminal_spawn).
tests/agents/spawners/test_headless_spawner.py (4)
42-45: Consider importing the constant for comparison.

The test hardcodes 4096, which could break silently if the production constant changes. Consider importing MAX_ENV_PROMPT_LENGTH directly for comparison to ensure the test stays in sync.
♻️ Suggested improvement
+from gobby.agents.spawn import MAX_ENV_PROMPT_LENGTH
+
 def test_max_env_prompt_length_value(self):
     """_get_spawn_utils returns correct MAX_ENV_PROMPT_LENGTH."""
     _, _, max_env_prompt_length = _get_spawn_utils()
-    assert max_env_prompt_length == 4096
+    assert max_env_prompt_length == MAX_ENV_PROMPT_LENGTH
527-540: Consider extracting env extraction helper to reduce duplication.

The pattern call_args[0][2] if len(call_args[0]) > 2 else call_args[1].get("env") is repeated across multiple tests. Consider a helper fixture or method for cleaner extraction.
♻️ Optional helper function
def _get_env_from_call_args(call_args):
    """Extract env dict from mock.call_args."""
    return call_args[0][2] if len(call_args[0]) > 2 else call_args[1].get("env")
910-912: Consider adding @pytest.mark.integration marker.

Per the coding guidelines, tests should use pytest.mark.integration for categorization. These integration tests run real processes and would benefit from the marker for selective test execution.
♻️ Suggested marker addition
 @pytest.mark.skipif(sys.platform == "win32", reason="Unix-specific tests")
+@pytest.mark.integration
 class TestHeadlessSpawnerIntegration:
     """Integration tests for HeadlessSpawner on Unix systems."""
997-1003: Test relies on implementation detail.

The test mocks subprocess.Popen to raise IndexError("Empty command"), but this assumes how Python handles empty command lists. The actual behavior may vary. Consider testing the actual behavior without mocking, or at least documenting this assumption.
src/gobby/mcp_proxy/tools/metrics.py (1)

203-233: Consider clarifying the distinction between reset_metrics and reset_tool_metrics.

Both tools call metrics_manager.reset_metrics() but with different parameter subsets:

reset_metrics: accepts project_id, server_name, tool_name

reset_tool_metrics: accepts only server_name, tool_name (described as "admin operation")

This appears intentional for different access patterns, but the distinction may not be immediately clear to users. Consider enhancing the description on Line 205 to clarify when to use each, e.g., "Admin tool to reset metrics for a specific tool across all projects."

src/gobby/mcp_proxy/tools/session_messages.py (1)

101-121: Duplicated helper function across multiple modules.

_format_turns_for_llm is duplicated in at least three locations:

src/gobby/cli/sessions.py

src/gobby/sessions/summary.py

src/gobby/mcp_proxy/tools/session_messages.py (this file)

Additionally, src/gobby/workflows/actions.py already has a wrapper that calls a shared format_turns_for_llm. This version also differs from summary.py in that it doesn't handle the "thinking" block type.

Consider consolidating into a single shared utility (e.g., in gobby.sessions.transcripts or a dedicated formatting module) and importing it where needed.

tests/agents/test_tty_config.py (1)

412-435: Consider using pytest fixtures or context managers for guaranteed cleanup.

Several tests manually call Path(f.name).unlink() at the end. If an assertion fails before cleanup, temp files may leak. Using pytest.fixture with cleanup in a finally block, or addCleanup, would be more robust.

This is a minor style improvement and doesn't block the PR.
src/gobby/cli/sessions.py (2)
27-47: Duplicate helper function.

As noted in the MCP tools file, _format_turns_for_llm is duplicated here. Consider consolidating into a shared module.

538-543: Inconsistent async runtime usage.

This function uses anyio.run() while other CLI commands in this file use asyncio.run(). Consider using asyncio.run() for consistency:
Suggested change
-            import anyio
-
-            async def _generate() -> str:
-                return await provider.generate_summary(context, prompt_template=prompt_template)
-
-            full_markdown = anyio.run(_generate)
+            full_markdown = asyncio.run(
+                provider.generate_summary(context, prompt_template=prompt_template)
+            )
src/gobby/servers/http.py (1)

318-325: Consider exposing _stop_registry as a public property or method.

The code accesses the private attribute _stop_registry with defensive hasattr checks, which is consistent with the established pattern across the codebase (also used in src/gobby/servers/routes/sessions.py). However, this direct access to private attributes across multiple components indicates architectural coupling. While the defensive checks mitigate immediate risk, consider adding a public accessor method (e.g., get_stop_registry()) to HookManager to formalize this as an intentional public API and reduce reliance on private attribute access.
src/gobby/tasks/expansion.py (1)
467-495: Consider compiling regex patterns for better performance.

The function name extraction logic is robust with good fallback handling. However, the regex patterns are compiled on every call within the loop.

For better performance, especially when processing many signatures during task expansion, consider compiling the patterns once at module or class level:
♻️ Optional refactor to improve performance

At the module level (after imports):
# Compiled regex patterns for function signature parsing
_FUNC_NAME_PATTERN = re.compile(r"(?:async\s+)?def\s+(\w+)")
_FUNC_PAREN_PATTERN = re.compile(r"(\w+)\s*\(")
Then update the extraction logic:
                     # Extract function name robustly using regex
                     # Handles: "def func_name(", "async def func_name(", "func_name("
                     func_name = None
                     # Try regex patterns first
-                    match = re.search(r"(?:async\s+)?def\s+(\w+)", sig)
+                    match = _FUNC_NAME_PATTERN.search(sig)
                     if match:
                         func_name = match.group(1)
                     else:
                         # Fallback: try to get name before first paren
-                        match = re.search(r"(\w+)\s*\(", sig)
+                        match = _FUNC_PAREN_PATTERN.search(sig)
                         if match:
                             func_name = match.group(1)
src/gobby/agents/runner.py (1)

285-369: Workflow validation-before-session creation is a solid robustness improvement

Validating and loading the workflow (and rejecting lifecycle workflows) before creating the child session, then initializing workflow state only when a definition is present, avoids orphan sessions and keeps workflow/tool filtering consistent with WorkflowStateManager. As an optional follow-up, you might also populate WorkflowState.initial_step when constructing the state so reset flows can rely on it without inferring from step, but the current behavior is functionally sound.

src/gobby/hooks/hook_manager.py (1)

36-39: Autonomous stop/progress/stuck wiring into HookManager looks consistent

Initializing StopRegistry, ProgressTracker, and StuckDetector on the shared LocalDatabase, threading websocket_server from the broadcaster, passing all of these into ActionExecutor, and registering the stop_registry with the workflow evaluator line up cleanly with the new migrations and autonomous components. One optional cleanup would be to expose stop_registry via a HookManager accessor rather than relying on the _stop_registry private attribute (used by the HTTP routes), but the current wiring is functionally correct.

Also applies to: 191-197, 249-253, 268-283

src/gobby/tasks/commits.py (1)

148-245: Diff summarization respects char budget; consider guarding header-only overflow

The summarization logic (per-file stats header, per-file truncation, final hard cap) gives the validator a good overview while enforcing max_chars. One edge case: if there are so many files that the header alone exceeds max_chars, remaining_chars becomes <= 0 and chars_per_file turns negative, so file hunks are truncated in a slightly arbitrary way before the final global truncation. Not a correctness bug, but you might add a guard like if remaining_chars <= 0: return "".join(summary_parts)[:max_chars] + "...truncated..." to skip per-file bodies in that scenario.

src/gobby/servers/websocket.py (1)

74-80: Consider tightening stop_registry typing for better static safety

stop_registry is currently typed as Any; given it’s always expected to be a StopRegistry, consider adding a TYPE_CHECKING import and annotating this parameter/attribute accordingly to catch integration mistakes at mypy time. This matches how other core dependencies are typed elsewhere.

Also applies to: 89-95

src/gobby/mcp_proxy/tools/tasks.py (1)

105-113: Assumes config.get_gobby_tasks_config().validation is always populated

create_task_registry pulls validation_config = config.get_gobby_tasks_config().validation and immediately reads auto_generate_on_create/auto_generate_on_expand from it. This is fine if validation is guaranteed to be a TaskValidationConfig (via defaults), but will raise if it can be None in misconfigured setups.

If validation is not strictly required in DaemonConfig, consider either:

Providing a non-optional default in the config model, or

Guarding here and falling back to the hardcoded True defaults when validation_config is falsy.

Also applies to: 128-138

src/gobby/servers/routes/mcp.py (1)

306-327: Metrics enrichment logic is sound; consider optional server-level filtering

The enrichment block correctly:

Fetches all per-tool metrics for the resolved project,

Indexes them by (server_name, tool_name),

Populates call_count, success_rate, and avg_latency_ms on each tool, with sensible 0/None defaults when missing.

If metrics volume grows large and server_filter is commonly used, you might want to pass server_name=server_filter into metrics_manager.get_metrics in that case to avoid scanning all metrics, but the current design is functionally correct.
src/gobby/workflows/actions.py (2)
102-121: New autonomous/stop-signal dependencies are wired consistently

The additions of stop_registry, progress_tracker, stuck_detector, and websocket_server to ActionExecutor.__init__ and their registrations in _register_defaults are coherent and follow the existing “store dependency on self and wrap via handler” pattern. Given the rest of this class already uses Any | None for injected services, keeping that convention here is acceptable, though you could tighten types via TYPE_CHECKING imports later if desired.

Also applies to: 135-138, 243-254

1117-1148: Avoid calling task.exception() on cancelled broadcast tasks

In _broadcast_autonomous_event, the done callback currently does:
task.add_done_callback(
    lambda t: logger.debug(f"Broadcast {event} failed: {t.exception()}")
    if t.exception()
    else None
)
If the broadcast task is cancelled (e.g., during shutdown), t.exception() will itself raise CancelledError inside the callback, which can surface as “Task exception was never retrieved”.

Safer pattern:

Check t.cancelled() first, and only call t.exception() when it is not cancelled.
Proposed tweak to the callback
-        task.add_done_callback(
-            lambda t: logger.debug(f"Broadcast {event} failed: {t.exception()}")
-            if t.exception()
-            else None
-        )
+        def _on_done(t: asyncio.Task) -> None:
+            if t.cancelled():
+                return
+            exc = t.exception()
+            if exc:
+                logger.debug(f"Broadcast {event} failed: {exc}")
+
+        task.add_done_callback(_on_done)
src/gobby/workflows/stop_signal_actions.py (1)

17-83: Stop-signal helpers align well with StopRegistry API

check_stop_signal, has_stop_signal, request_stop, and clear_stop_signal all match the semantics of StopRegistry (using get_signal(), has_pending_signal(), signal_stop(), and clear() respectively), handle a missing registry defensively, and return structured dicts suitable for workflow engines.

One minor assumption is that state.variables is always a dict; given WorkflowState sets a dict default, that’s reasonable, but if other code ever assigns None into variables, you may want a quick if state.variables is None: state.variables = {} before writing _stop_signal_* keys.

Also applies to: 86-163
tests/agents/test_registry.py (2)
925-975: Concurrent add/remove test could assert stronger invariants

test_concurrent_add_and_remove currently only checks that no exceptions were recorded, but doesn’t assert anything about the final registry state (there’s just a comment about expected counts).

Given the add/remove sets don’t overlap, you could safely assert something like registry.count() == 100 afterwards to catch subtle race issues in the internal bookkeeping, without making the test brittle.

1265-1283: test_cleanup_stale_exact_boundary doesn’t currently validate behavior

This test sets an agent’s started_at to exactly one hour ago and then asserts:
assert len(removed) >= 0
which is always true and doesn’t check any real behavior. Either:

Decide what the intended contract is at the boundary (e.g., age > max_age vs >=), and assert accordingly, or

Drop the test if the implementation is allowed to be fuzzy there.

As-is, it’s effectively just coverage padding.
src/gobby/tasks/external_validator.py (1)
81-97: Mode access can use direct attribute instead of getattr.

Since TaskValidationConfig explicitly defines external_validator_mode with a typed Literal["llm", "agent"] default, you can access it directly rather than using getattr with a fallback.
♻️ Suggested simplification
-    mode = getattr(config, "external_validator_mode", "llm")
+    mode = config.external_validator_mode
src/gobby/workflows/evaluator.py (1)
132-133: Consider using TYPE_CHECKING import for StopRegistry type.

_stop_registry is typed as Any, but since you already use TYPE_CHECKING for WebhookExecutor, you could also import StopRegistry for better type safety.
♻️ Suggested improvement
 if TYPE_CHECKING:
     from .webhook_executor import WebhookExecutor
+    from gobby.autonomous.stop_registry import StopRegistry

 # ...

     def __init__(self) -> None:
         """Initialize the condition evaluator."""
         self._plugin_conditions: dict[str, Any] = {}
         self._task_manager: Any = None
-        self._stop_registry: Any = None
+        self._stop_registry: StopRegistry | None = None
         self._webhook_executor: WebhookExecutor | None = None
src/gobby/workflows/autonomous_actions.py (1)
118-124: Consider logging a warning for unrecognized progress types.

Silent fallback to TOOL_CALL for invalid enum values may hide bugs. A debug log would help troubleshoot issues.
♻️ Add debug logging for fallback
     if isinstance(progress_type, str):
         try:
             progress_type = ProgressType(progress_type)
         except ValueError:
+            logger.debug(f"Unknown progress type '{progress_type}', falling back to TOOL_CALL")
             progress_type = ProgressType.TOOL_CALL
src/gobby/autonomous/stuck_detector.py (1)
245-252: Tool pattern key generation is safe but could be more robust.

The key generation works correctly since ProgressEvent.details defaults to an empty dict. However, the key format "tool_name:[args_list]" creates string representations of lists which may vary. Consider using a more stable key format.
♻️ Consider tuple-based key for stability
             if event.tool_name:
                 # Create a key from tool name and key args
-                key = f"{event.tool_name}:{event.details.get('tool_args_keys', [])}"
+                args_keys = tuple(sorted(event.details.get('tool_args_keys', [])))
+                key = f"{event.tool_name}:{args_keys}"
                 tool_counts[key] = tool_counts.get(key, 0) + 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e44eaa0 and 49e9cbc.

📒 Files selected for processing (160)

.gobby/tasks.jsonl
.gobby/tasks_meta.json
AGENTS.md
CLAUDE.md
GEMINI.md
ROADMAP.md
docs/architecture/app_decomposition.md
docs/architecture/tasks_decomposition.md
docs/examples/workflows/README.md
docs/examples/workflows/agent-delegation.yaml
docs/examples/workflows/parallel-worktree-agents.yaml
docs/guides/cli-commands.md
docs/guides/http-endpoints.md
docs/guides/mcp-tools.md
docs/guides/tasks.md
docs/guides/workflow-actions.md
docs/guides/workflows.md
docs/old/automation-summary.md
docs/old/config-settings-audit.md
docs/old/webhook-action-schema.md
docs/plans/completed/TASKS.md
docs/plans/completed/WORKFLOWS.md
docs/plans/enhancements.md
docs/plans/todo-list.md
docs/workflow-actions.md
src/gobby/agents/runner.py
src/gobby/autonomous/__init__.py
src/gobby/autonomous/progress_tracker.py
src/gobby/autonomous/stop_registry.py
src/gobby/autonomous/stuck_detector.py
src/gobby/cli/__init__.py
src/gobby/cli/sessions.py
src/gobby/cli/tasks/hooks.py
src/gobby/cli/tasks/main.py
src/gobby/cli/workflows.py
src/gobby/config/tasks.py
src/gobby/hooks/hook_manager.py
src/gobby/mcp_proxy/metrics.py
src/gobby/mcp_proxy/registries.py
src/gobby/mcp_proxy/tools/metrics.py
src/gobby/mcp_proxy/tools/session_messages.py
src/gobby/mcp_proxy/tools/tasks.py
src/gobby/mcp_proxy/tools/worktrees.py
src/gobby/servers/http.py
src/gobby/servers/routes/dependencies.py
src/gobby/servers/routes/mcp.py
src/gobby/servers/routes/sessions.py
src/gobby/servers/websocket.py
src/gobby/storage/migrations.py
src/gobby/storage/tasks.py
src/gobby/tasks/commits.py
src/gobby/tasks/expansion.py
src/gobby/tasks/external_validator.py
src/gobby/workflows/actions.py
src/gobby/workflows/autonomous_actions.py
src/gobby/workflows/evaluator.py
src/gobby/workflows/loader.py
src/gobby/workflows/stop_signal_actions.py
tests/adapters/test_base.py
tests/adapters/test_codex.py
tests/adapters/test_gemini.py
tests/agents/spawners/__init__.py
tests/agents/spawners/test_embedded_spawner.py
tests/agents/spawners/test_headless_spawner.py
tests/agents/spawners/test_windows_spawner.py
tests/agents/test_registry.py
tests/agents/test_runner.py
tests/agents/test_spawn.py
tests/agents/test_spawners.py
tests/agents/test_tty_config.py
tests/autonomous/__init__.py
tests/autonomous/test_autonomous.py
tests/cli/installers/__init__.py
tests/cli/installers/test_antigravity.py
tests/cli/installers/test_claude.py
tests/cli/installers/test_codex_installer.py
tests/cli/installers/test_gemini_installer.py
tests/cli/installers/test_git_hooks_installer.py
tests/cli/installers/test_shared.py
tests/cli/test_cli_agents.py
tests/cli/test_cli_daemon.py
tests/cli/test_cli_extensions.py
tests/cli/test_cli_init.py
tests/cli/test_cli_install.py
tests/cli/test_tasks_cli.py
tests/config/test_app_config.py
tests/hooks/test_event_handlers.py
tests/hooks/test_hooks_manager.py
tests/hooks/test_plugins.py
tests/integration/test_task_expansion_flow.py
tests/llm/test_codex_executor.py
tests/llm/test_llm_claude.py
tests/llm/test_resolver.py
tests/mcp_proxy/services/__init__.py
tests/mcp_proxy/services/test_system.py
tests/mcp_proxy/test_actions.py
tests/mcp_proxy/test_internal_registries.py
tests/mcp_proxy/test_lazy.py
tests/mcp_proxy/test_manager_coverage.py
tests/mcp_proxy/test_mcp_tools.py
tests/mcp_proxy/test_mcp_tools_session_messages.py
tests/mcp_proxy/test_validation_integration.py
tests/mcp_proxy/test_validation_mcp_tools.py
tests/mcp_proxy/tools/test_agents.py
tests/mcp_proxy/tools/test_session_messages_coverage.py
tests/mcp_proxy/tools/test_task_expansion.py
tests/mcp_proxy/tools/test_tasks_coverage.py
tests/mcp_proxy/transports/__init__.py
tests/mcp_proxy/transports/test_base.py
tests/memory/test_extractor.py
tests/memory/test_manager.py
tests/servers/test_http_coverage.py
tests/servers/test_http_server.py
tests/servers/test_mcp_routes.py
tests/servers/test_sessions_routes.py
tests/sessions/test_analyzer.py
tests/sessions/test_sessions_processor_unit.py
tests/sessions/test_sessions_transcripts_claude.py
tests/sessions/test_sessions_transcripts_codex.py
tests/sessions/test_sessions_transcripts_gemini.py
tests/sessions/test_summary.py
tests/sessions/test_transcript_parsers.py
tests/storage/test_labels.py
tests/storage/test_storage_agents.py
tests/storage/test_storage_mcp.py
tests/storage/test_storage_memories.py
tests/storage/test_storage_sessions.py
tests/storage/test_storage_tasks.py
tests/sync/test_skill_sync.py
tests/tasks/test_auto_decompose.py
tests/tasks/test_commits.py
tests/tasks/test_context.py
tests/tasks/test_expansion_coverage.py
tests/tasks/test_external_validator.py
tests/tasks/test_research.py
tests/tasks/test_sync_tasks.py
tests/tasks/test_task_filters.py
tests/tasks/test_validation.py
tests/test_runner.py
tests/utils/test_project_context.py
tests/utils/test_utils_git.py
tests/utils/test_utils_metrics.py
tests/utils/test_utils_project_init.py
tests/workflows/test_actions_coverage.py
tests/workflows/test_artifact_actions.py
tests/workflows/test_context_actions.py
tests/workflows/test_engine_coverage.py
tests/workflows/test_git_utils.py
tests/workflows/test_hooks.py
tests/workflows/test_llm_actions.py
tests/workflows/test_loader.py
tests/workflows/test_memory_actions.py
tests/workflows/test_session_actions.py
tests/workflows/test_summary_actions.py
tests/workflows/test_task_enforcement.py
tests/workflows/test_todo_actions.py
tests/workflows/test_webhook_condition.py
tests/workflows/test_workflow_actions.py
tests/workflows/test_workflow_mcp_actions.py
tests/worktrees/test_git.py

💤 Files with no reviewable changes (5)

docs/workflow-actions.md
docs/architecture/tasks_decomposition.md
src/gobby/cli/tasks/main.py
docs/architecture/app_decomposition.md
src/gobby/cli/tasks/hooks.py

🧰 Additional context used

📓 Path-based instructions (4)

src/**/*.py

⚙️ CodeRabbit configuration file

src/**/*.py: - Check for proper type hints (project uses strict mypy)

Ensure async functions are properly awaited

Watch for potential security issues (SQL injection, command injection)

Verify proper error handling

Files:

src/gobby/servers/http.py
src/gobby/config/tasks.py
src/gobby/servers/routes/dependencies.py
src/gobby/autonomous/stop_registry.py
src/gobby/workflows/actions.py
src/gobby/mcp_proxy/registries.py
src/gobby/workflows/autonomous_actions.py
src/gobby/workflows/stop_signal_actions.py
src/gobby/agents/runner.py
src/gobby/mcp_proxy/tools/tasks.py
src/gobby/mcp_proxy/tools/session_messages.py
src/gobby/workflows/evaluator.py
src/gobby/autonomous/stuck_detector.py
src/gobby/tasks/commits.py
src/gobby/cli/sessions.py
src/gobby/workflows/loader.py
src/gobby/servers/routes/sessions.py
src/gobby/storage/tasks.py
src/gobby/servers/routes/mcp.py
src/gobby/storage/migrations.py
src/gobby/hooks/hook_manager.py
src/gobby/tasks/expansion.py
src/gobby/autonomous/progress_tracker.py
src/gobby/cli/workflows.py
src/gobby/mcp_proxy/tools/worktrees.py
src/gobby/autonomous/__init__.py
src/gobby/mcp_proxy/metrics.py
src/gobby/tasks/external_validator.py
src/gobby/cli/__init__.py
src/gobby/mcp_proxy/tools/metrics.py
src/gobby/servers/websocket.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use asyncio with pytest configured with asyncio_mode = 'auto' for automatic async test detection
Use pytest markers slow, integration, and e2e for test categorization

Files:

tests/autonomous/__init__.py
tests/agents/test_spawners.py
tests/agents/spawners/test_embedded_spawner.py
tests/agents/test_tty_config.py
tests/adapters/test_codex.py
tests/adapters/test_base.py
tests/agents/spawners/test_headless_spawner.py
tests/agents/test_registry.py
tests/agents/spawners/test_windows_spawner.py
tests/agents/spawners/__init__.py
tests/agents/test_spawn.py
tests/adapters/test_gemini.py
tests/agents/test_runner.py

⚙️ CodeRabbit configuration file

tests/**/*.py: - Ensure tests are meaningful, not just coverage padding

Check for proper mocking and fixtures

Verify async tests use pytest-asyncio markers

Files:

tests/autonomous/__init__.py
tests/agents/test_spawners.py
tests/agents/spawners/test_embedded_spawner.py
tests/agents/test_tty_config.py
tests/adapters/test_codex.py
tests/adapters/test_base.py
tests/agents/spawners/test_headless_spawner.py
tests/agents/test_registry.py
tests/agents/spawners/test_windows_spawner.py
tests/agents/spawners/__init__.py
tests/agents/test_spawn.py
tests/adapters/test_gemini.py
tests/agents/test_runner.py

src/gobby/mcp_proxy/**

⚙️ CodeRabbit configuration file

src/gobby/mcp_proxy/**: - This is the MCP (Model Context Protocol) proxy layer

Check for proper tool schema validation

Ensure tools return appropriate response structures

Files:

src/gobby/mcp_proxy/registries.py
src/gobby/mcp_proxy/tools/tasks.py
src/gobby/mcp_proxy/tools/session_messages.py
src/gobby/mcp_proxy/tools/worktrees.py
src/gobby/mcp_proxy/metrics.py
src/gobby/mcp_proxy/tools/metrics.py

src/gobby/hooks/**

⚙️ CodeRabbit configuration file

src/gobby/hooks/**: - Hook handlers must be idempotent

Check for proper event handling patterns

Ensure plugins follow the HookPlugin interface

Files:

src/gobby/hooks/hook_manager.py

🧠 Learnings (6)

📚 Learning: 2025-12-17T19:24:27.134Z

Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-17T19:24:27.134Z
Learning: Install and use bd git hooks by running `bd hooks install` to ensure DB ↔ JSONL consistency

Applied to files:

docs/guides/tasks.md
docs/plans/completed/TASKS.md

📚 Learning: 2025-12-17T19:54:30.415Z

Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-17T19:54:30.415Z
Learning: Applies to .gobby/tasks.jsonl : Include `.gobby/tasks.jsonl` in commits when completing work tracked in Gobby

Applied to files:

docs/guides/tasks.md
docs/plans/completed/TASKS.md
src/gobby/mcp_proxy/tools/tasks.py
.gobby/tasks_meta.json
src/gobby/tasks/commits.py
src/gobby/storage/tasks.py

📚 Learning: 2025-12-17T19:54:30.415Z

Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-17T19:54:30.415Z
Learning: Use Gobby's native task system for ALL task tracking instead of markdown TODOs, task lists, or other tracking methods

Applied to files:

docs/guides/tasks.md
docs/plans/completed/TASKS.md

📚 Learning: 2025-12-17T19:54:30.415Z

Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-17T19:54:30.415Z
Learning: Do NOT duplicate tracking systems; rely solely on Gobby tasks

Applied to files:

docs/guides/tasks.md
docs/plans/completed/TASKS.md

📚 Learning: 2025-12-17T19:24:40.567Z

Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T19:24:40.567Z
Learning: Use `gobby-*` prefix for internal server names and route them locally via InternalRegistryManager, while proxying other servers via MCPClientManager

Applied to files:

src/gobby/servers/routes/dependencies.py
src/gobby/servers/routes/mcp.py

📚 Learning: 2025-12-17T19:54:30.415Z

Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-17T19:54:30.415Z
Learning: Store AI-generated planning and design documents (PLAN.md, IMPLEMENTATION.md, ARCHITECTURE.md, DESIGN.md, etc.) in a dedicated `history/` directory

Applied to files:

ROADMAP.md

🧬 Code graph analysis (28)

tests/agents/test_spawners.py (7)

src/gobby/agents/spawners/base.py (2)

EmbeddedPTYResult (61-85)

TerminalType (22-46)

src/gobby/agents/spawners/cross_platform.py (3)

AlacrittySpawner (92-145)

KittySpawner (19-89)

TmuxSpawner (148-266)

src/gobby/agents/spawners/embedded.py (3)

EmbeddedSpawner (49-227)

spawn (57-160)

spawn_agent (162-227)

src/gobby/agents/spawners/linux.py (2)

GnomeTerminalSpawner (16-69)

KonsoleSpawner (72-125)

src/gobby/agents/spawners/macos.py (4)

GhosttySpawner (27-105)

ITermSpawner (108-190)

TerminalAppSpawner (193-262)

escape_applescript (19-24)

tests/agents/spawners/test_embedded_spawner.py (1)

spawner (33-35)

src/gobby/servers/websocket.py (1)

close (37-37)

tests/agents/spawners/test_embedded_spawner.py (3)

src/gobby/agents/spawners/base.py (1)

EmbeddedPTYResult (61-85)

src/gobby/agents/spawners/embedded.py (4)

EmbeddedSpawner (49-227)

_get_spawn_utils (30-46)

spawn (57-160)

spawn_agent (162-227)

src/gobby/agents/spawn.py (1)

build_cli_command (147-211)

src/gobby/servers/routes/dependencies.py (2)

src/gobby/servers/routes/mcp.py (1)

mcp_proxy (990-1090)

src/gobby/mcp_proxy/metrics.py (1)

ToolMetricsManager (71-620)

src/gobby/autonomous/stop_registry.py (1)

src/gobby/storage/database.py (3)

LocalDatabase (17-99)

fetchone (66-69)

fetchall (71-74)

tests/agents/test_tty_config.py (1)

src/gobby/agents/tty_config.py (9)

PlatformPreferences (39-73)

TerminalConfig (18-36)

TTYConfig (123-175)

generate_default_tty_config (205-290)

get_tty_config (297-307)

load_tty_config (178-202)

reload_tty_config (310-319)

get_terminal_config (135-160)

get_preferences (162-175)

src/gobby/workflows/autonomous_actions.py (3)

src/gobby/autonomous/progress_tracker.py (7)

ProgressTracker (87-446)

ProgressType (20-33)

clear_session (396-414)

get_summary (233-315)

is_stagnant (317-331)

record_event (124-173)

is_high_value (68-70)

src/gobby/autonomous/stuck_detector.py (5)

StuckDetector (45-371)

clear_session (306-326)

detect_task_loop (142-187)

is_stuck (274-304)

record_task_selection (99-140)

src/gobby/workflows/definitions.py (1)

WorkflowState (94-130)

src/gobby/workflows/stop_signal_actions.py (2)

src/gobby/autonomous/stop_registry.py (7)

StopRegistry (37-270)

acknowledge (156-180)

get_signal (113-142)

is_pending (32-34)

has_pending_signal (144-154)

signal_stop (60-111)

clear (182-202)

src/gobby/workflows/definitions.py (1)

WorkflowState (94-130)

src/gobby/agents/runner.py (4)

src/gobby/agents/session.py (2)

create_child_session (142-206)

ChildSessionConfig (23-48)

src/gobby/llm/executor.py (1)

AgentResult (63-94)

src/gobby/workflows/definitions.py (1)

WorkflowState (94-130)

src/gobby/workflows/state_manager.py (1)

save_state (53-97)

src/gobby/mcp_proxy/tools/tasks.py (4)

src/gobby/tasks/commits.py (2)

is_doc_only_diff (120-145)

summarize_diff_for_validation (148-245)

src/gobby/agents/runner.py (1)

AgentRunner (147-920)

src/gobby/config/tasks.py (1)

TaskValidationConfig (173-281)

src/gobby/tasks/external_validator.py (1)

run_external_validation (44-97)

src/gobby/mcp_proxy/tools/session_messages.py (3)

src/gobby/cli/sessions.py (2)

_format_turns_for_llm (27-47)

sessions (51-53)

src/gobby/workflows/actions.py (1)

_format_turns_for_llm (836-838)

src/gobby/sessions/summary.py (1)

_format_turns_for_llm (395-426)

src/gobby/workflows/evaluator.py (3)

src/gobby/workflows/definitions.py (1)

WorkflowState (94-130)

src/gobby/workflows/webhook_executor.py (1)

json_body (34-46)

src/gobby/autonomous/stop_registry.py (1)

has_pending_signal (144-154)

src/gobby/autonomous/stuck_detector.py (5)

tests/autonomous/test_autonomous.py (2)

progress_tracker (85-87)

session_id (73-81)

src/gobby/autonomous/progress_tracker.py (3)

ProgressTracker (87-446)

get_summary (233-315)

get_recent_events (416-446)

src/gobby/storage/database.py (2)

LocalDatabase (17-99)

fetchall (71-74)

tests/memory/test_manager.py (1)

db (31-36)

src/gobby/workflows/autonomous_actions.py (2)

record_task_selection (223-255)

detect_task_loop (142-175)

tests/adapters/test_codex.py (2)

src/gobby/adapters/codex.py (4)

CodexConnectionState (66-72)

CodexItem (98-105)

CodexThread (76-82)

CodexTurn (86-94)

src/gobby/hooks/events.py (3)

HookEventType (19-55)

HookResponse (111-136)

SessionSource (58-64)

tests/adapters/test_base.py (2)

src/gobby/adapters/base.py (1)

BaseAdapter (17-91)

src/gobby/hooks/events.py (4)

HookEvent (68-107)

HookEventType (19-55)

HookResponse (111-136)

SessionSource (58-64)

tests/agents/spawners/test_headless_spawner.py (4)

src/gobby/agents/spawners/base.py (2)

HeadlessResult (89-104)

get_output (102-104)

src/gobby/agents/spawners/headless.py (2)

HeadlessSpawner (38-225)

spawn_and_capture (95-158)

src/gobby/agents/spawners/embedded.py (3)

_get_spawn_utils (30-46)

spawn (57-160)

spawn_agent (162-227)

src/gobby/agents/spawn.py (2)

build_cli_command (147-211)

_create_prompt_file (96-144)

src/gobby/servers/routes/sessions.py (2)

src/gobby/utils/metrics.py (1)

inc_counter (316-328)

src/gobby/autonomous/stop_registry.py (3)

signal_stop (60-111)

get_signal (113-142)

clear (182-202)

tests/agents/test_registry.py (1)

src/gobby/agents/registry.py (12)

RunningAgent (25-86)

RunningAgentRegistry (89-398)

get_running_agent_registry (406-418)

list_all (283-291)

remove (192-219)

get_by_session (221-235)

get_by_pid (237-251)

list_by_parent (253-268)

list_by_mode (270-281)

count_by_parent (303-316)

cleanup_by_pids (318-351)

add_event_callback (124-134)

tests/agents/spawners/test_windows_spawner.py (2)

src/gobby/agents/spawners/base.py (2)

SpawnResult (50-57)

TerminalType (22-46)

src/gobby/agents/spawners/windows.py (4)

CmdSpawner (76-136)

PowerShellSpawner (139-217)

WindowsTerminalSpawner (18-73)

WSLSpawner (220-308)

src/gobby/servers/routes/mcp.py (3)

src/gobby/servers/routes/dependencies.py (2)

get_metrics_manager (115-118)

get_server (39-47)

src/gobby/mcp_proxy/metrics.py (2)

ToolMetricsManager (71-620)

get_metrics (156-217)

src/gobby/servers/http.py (1)

_resolve_project_id (234-270)

src/gobby/autonomous/progress_tracker.py (2)

src/gobby/storage/database.py (3)

LocalDatabase (17-99)

fetchall (71-74)

fetchone (66-69)

tests/memory/test_manager.py (1)

db (31-36)

tests/agents/test_spawn.py (1)

src/gobby/agents/spawn.py (10)

spawn (285-341)

PreparedSpawn (437-459)

TerminalSpawner (214-433)

_cleanup_all_prompt_files (85-93)

_create_prompt_file (96-144)

build_cli_command (147-211)

prepare_terminal_spawn (462-554)

read_prompt_from_env (557-582)

spawn_agent (343-417)

_write_prompt_file (419-433)

tests/adapters/test_gemini.py (2)

src/gobby/adapters/gemini.py (2)

GeminiAdapter (35-302)

normalize_tool_name (127-136)

src/gobby/hooks/events.py (4)

HookEvent (68-107)

HookEventType (19-55)

HookResponse (111-136)

SessionSource (58-64)

src/gobby/autonomous/__init__.py (3)

src/gobby/autonomous/progress_tracker.py (4)

ProgressEvent (58-70)

ProgressSummary (74-84)

ProgressTracker (87-446)

ProgressType (20-33)

src/gobby/autonomous/stop_registry.py (2)

StopRegistry (37-270)

StopSignal (22-34)

src/gobby/autonomous/stuck_detector.py (3)

StuckDetectionResult (35-42)

StuckDetector (45-371)

TaskSelectionEvent (25-31)

src/gobby/mcp_proxy/metrics.py (1)

src/gobby/mcp_proxy/tools/metrics.py (2)

get_failing_tools (104-133)

reset_metrics (174-201)

src/gobby/tasks/external_validator.py (3)

src/gobby/config/tasks.py (1)

TaskValidationConfig (173-281)

src/gobby/llm/service.py (1)

LLMService (27-238)

src/gobby/agents/runner.py (2)

AgentRunner (147-920)

AgentConfig (35-111)

src/gobby/cli/__init__.py (1)

src/gobby/cli/workflows.py (1)

workflows (39-41)

src/gobby/mcp_proxy/tools/metrics.py (2)

src/gobby/mcp_proxy/tools/internal.py (1)

tool (75-140)

src/gobby/mcp_proxy/metrics.py (2)

get_failing_tools (292-343)

reset_metrics (345-384)

src/gobby/servers/websocket.py (3)

tests/autonomous/test_autonomous.py (3)

stop_registry (91-93)

session_id (73-81)

signal (947-952)

src/gobby/autonomous/stop_registry.py (1)

signal_stop (60-111)

tests/servers/test_http_server.py (1)

signal_stop (949-954)

🪛 GitHub Actions: CI

tests/agents/test_spawners.py

[error] 508-508: SpawnResult pid is None when success expected.

src/gobby/autonomous/stop_registry.py

[error] 1-1: ruff format check reported: 4 files would be reformatted. File would be reformatted to satisfy formatting standards.

src/gobby/autonomous/stuck_detector.py

[error] 1-1: ruff format check reported: 4 files would be reformatted. File would be reformatted to satisfy formatting standards.

src/gobby/cli/sessions.py

[error] 610-610: Incompatible types in assignment (expression has type "str", variable has type "TextIOWrapper[_WrappedBuffer]") [assignment]

src/gobby/mcp_proxy/tools/worktrees.py

[warning] 57-57: Bandit: B108 hardcoded_tmp_directory: probable insecure usage of temporary directory /tmp.

src/gobby/mcp_proxy/metrics.py

[warning] 189-189: Bandit: B608 hardcoded_sql_expressions: possible SQL injection risk due to string-based query construction.

🪛 markdownlint-cli2 (0.18.1)

GEMINI.md

59-59: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Test (Python 3.12)
GitHub Check: Test (Python 3.11)

coderabbitai · 2026-01-08T16:13:33Z

+"""Stop signal registry for autonomous session management.
+
+Provides thread-safe stop signal management for autonomous workflows.
+External systems (HTTP, WebSocket, CLI, MCP) can signal sessions to stop
+gracefully, and workflows can check for pending stop signals at step
+transitions.
+"""
+


⚠️ Potential issue | 🟡 Minor

Pipeline failure: formatting required.

The CI pipeline reports that this file needs reformatting to satisfy ruff format standards. Run ruff format src/gobby/autonomous/stop_registry.py to fix.

🧰 Tools

🪛 GitHub Actions: CI

[error] 1-1: ruff format check reported: 4 files would be reformatted. File would be reformatted to satisfy formatting standards.

🤖 Prompt for AI Agents

In @src/gobby/autonomous/stop_registry.py around lines 1 - 8, The file needs to be reformatted to satisfy ruff; run the formatter against the module (e.g., run "ruff format src/gobby/autonomous/stop_registry.py" or run ruff format for the repo) to apply the required whitespace and style changes so the module-level docstring and file formatting comply with CI checks; commit the reformatted file (stop_registry.py) and push.

coderabbitai · 2026-01-08T16:13:33Z

+"""Stuck detection for autonomous session management.
+
+Provides multi-layer stuck detection for autonomous workflows:
+1. Task selection loop detection - same tasks being selected repeatedly
+2. Progress stagnation - no meaningful progress being made
+3. Tool call patterns - repeated identical tool calls
+"""


⚠️ Potential issue | 🟡 Minor

Pipeline failure: File requires formatting.

The CI pipeline reports this file needs to be reformatted to satisfy ruff format standards. Run ruff format src/gobby/autonomous/stuck_detector.py to fix.

🧰 Tools

🪛 GitHub Actions: CI

[error] 1-1: ruff format check reported: 4 files would be reformatted. File would be reformatted to satisfy formatting standards.

🤖 Prompt for AI Agents

In @src/gobby/autonomous/stuck_detector.py around lines 1 - 7, The file src/gobby/autonomous/stuck_detector.py fails CI formatting; run the formatter (ruff format src/gobby/autonomous/stuck_detector.py) or apply ruff-compatible formatting to the module (ensure the top-level docstring and surrounding whitespace follow ruff/PEP8 rules) and commit the formatted file so the pipeline passes.

coderabbitai · 2026-01-08T16:13:34Z

+class TestAdapterIntegration:
+    """Integration tests for adapter behavior with realistic scenarios."""
+
+    def test_session_lifecycle_simulation(self):
+        """Simulate full session lifecycle through adapter."""
+        events_processed = []
+
+        class LifecycleAdapter(BaseAdapter):
+            source = SessionSource.CLAUDE
+
+            def translate_to_hook_event(self, native_event: dict) -> HookEvent | None:
+                event_type_map = {
+                    "session-start": HookEventType.SESSION_START,
+                    "pre-tool-use": HookEventType.BEFORE_TOOL,
+                    "post-tool-use": HookEventType.AFTER_TOOL,
+                    "session-end": HookEventType.SESSION_END,
+                }
+                hook_type = native_event.get("hook_type")
+                if hook_type not in event_type_map:
+                    return None
+
+                events_processed.append(hook_type)
+                return HookEvent(
+                    event_type=event_type_map[hook_type],
+                    session_id=native_event.get("session_id", "test-sess"),
+                    source=SessionSource.CLAUDE,
+                    timestamp=datetime.now(UTC),
+                    data=native_event.get("input_data", {}),
+                )
+
+            def translate_from_hook_response(self, response: HookResponse) -> dict:
+                return {"continue": response.decision == "allow"}
+
+        adapter = LifecycleAdapter()
+        mock_manager = MagicMock()
+        mock_manager.handle.return_value = HookResponse(decision="allow")
+
+        # Simulate session lifecycle
+        session_events = [
+            {"hook_type": "session-start", "session_id": "sess-1"},
+            {"hook_type": "pre-tool-use", "session_id": "sess-1"},
+            {"hook_type": "post-tool-use", "session_id": "sess-1"},
+            {"hook_type": "pre-tool-use", "session_id": "sess-1"},
+            {"hook_type": "post-tool-use", "session_id": "sess-1"},
+            {"hook_type": "session-end", "session_id": "sess-1"},
+        ]
+
+        for event in session_events:
+            result = adapter.handle_native(event, mock_manager)
+            assert result["continue"] is True
+
+        assert events_processed == [
+            "session-start",
+            "pre-tool-use",
+            "post-tool-use",
+            "pre-tool-use",
+            "post-tool-use",
+            "session-end",
+        ]
+        assert mock_manager.handle.call_count == 6
+
+    def test_tool_blocking_scenario(self):
+        """Simulate tool being blocked by hook manager."""
+
+        class BlockingAdapter(BaseAdapter):
+            source = SessionSource.CLAUDE
+
+            def translate_to_hook_event(self, native_event: dict) -> HookEvent | None:
+                return HookEvent(
+                    event_type=HookEventType.BEFORE_TOOL,
+                    session_id="blocking-session",
+                    source=SessionSource.CLAUDE,
+                    timestamp=datetime.now(UTC),
+                    data=native_event.get("input_data", {}),
+                )
+
+            def translate_from_hook_response(self, response: HookResponse) -> dict:
+                result = {"continue": response.decision == "allow"}
+                if response.reason:
+                    result["stopReason"] = response.reason
+                if response.system_message:
+                    result["systemMessage"] = response.system_message
+                return result
+
+        adapter = BlockingAdapter()
+        mock_manager = MagicMock()
+        mock_manager.handle.return_value = HookResponse(
+            decision="deny",
+            reason="No task claimed",
+            system_message="Please claim a task before editing files",
+        )
+
+        result = adapter.handle_native(
+            {"input_data": {"tool_name": "Write"}},
+            mock_manager,
+        )
+
+        assert result["continue"] is False
+        assert result["stopReason"] == "No task claimed"
+        assert "Please claim a task" in result["systemMessage"]
+
+    def test_context_injection_scenario(self):
+        """Simulate context being injected via hook response."""
+
+        class ContextAdapter(BaseAdapter):
+            source = SessionSource.CLAUDE
+
+            def translate_to_hook_event(self, native_event: dict) -> HookEvent | None:
+                return HookEvent(
+                    event_type=HookEventType.SESSION_START,
+                    session_id="context-session",
+                    source=SessionSource.CLAUDE,
+                    timestamp=datetime.now(UTC),
+                    data={},
+                )
+
+            def translate_from_hook_response(self, response: HookResponse) -> dict:
+                result = {"continue": True}
+                if response.context:
+                    result["result"] = response.context
+                return result
+
+        adapter = ContextAdapter()
+        mock_manager = MagicMock()
+        mock_manager.handle.return_value = HookResponse(
+            decision="allow",
+            context="## Continuation Context\nYou have 3 pending tasks.",
+        )
+
+        result = adapter.handle_native({}, mock_manager)
+
+        assert result["continue"] is True
+        assert "Continuation Context" in result["result"]
+        assert "3 pending tasks" in result["result"]
+


🛠️ Refactor suggestion | 🟠 Major

Add integration marker to integration tests.

The TestAdapterIntegration class contains integration-style tests that simulate full lifecycle scenarios. According to the coding guidelines, these should be marked with @pytest.mark.integration.

🏷️ Add integration marker

Add the marker to the class:

+@pytest.mark.integration class TestAdapterIntegration: """Integration tests for adapter behavior with realistic scenarios."""

Based on coding guidelines for test categorization.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

class TestAdapterIntegration:

"""Integration tests for adapter behavior with realistic scenarios."""

def test_session_lifecycle_simulation(self):

"""Simulate full session lifecycle through adapter."""

events_processed = []

class LifecycleAdapter(BaseAdapter):

source = SessionSource.CLAUDE

def translate_to_hook_event(self, native_event: dict) -> HookEvent | None:

event_type_map = {

"session-start": HookEventType.SESSION_START,

"pre-tool-use": HookEventType.BEFORE_TOOL,

"post-tool-use": HookEventType.AFTER_TOOL,

"session-end": HookEventType.SESSION_END,

}

hook_type = native_event.get("hook_type")

if hook_type not in event_type_map:

return None

events_processed.append(hook_type)

return HookEvent(

event_type=event_type_map[hook_type],

session_id=native_event.get("session_id", "test-sess"),

source=SessionSource.CLAUDE,

timestamp=datetime.now(UTC),

data=native_event.get("input_data", {}),

)

def translate_from_hook_response(self, response: HookResponse) -> dict:

return {"continue": response.decision == "allow"}

adapter = LifecycleAdapter()

mock_manager = MagicMock()

mock_manager.handle.return_value = HookResponse(decision="allow")

# Simulate session lifecycle

session_events = [

{"hook_type": "session-start", "session_id": "sess-1"},

{"hook_type": "pre-tool-use", "session_id": "sess-1"},

{"hook_type": "post-tool-use", "session_id": "sess-1"},

{"hook_type": "pre-tool-use", "session_id": "sess-1"},

{"hook_type": "post-tool-use", "session_id": "sess-1"},

{"hook_type": "session-end", "session_id": "sess-1"},

]

for event in session_events:

result = adapter.handle_native(event, mock_manager)

assert result["continue"] is True

assert events_processed == [

"session-start",

"pre-tool-use",

"post-tool-use",

"pre-tool-use",

"post-tool-use",

"session-end",

]

assert mock_manager.handle.call_count == 6

def test_tool_blocking_scenario(self):

"""Simulate tool being blocked by hook manager."""

class BlockingAdapter(BaseAdapter):

source = SessionSource.CLAUDE

def translate_to_hook_event(self, native_event: dict) -> HookEvent | None:

return HookEvent(

event_type=HookEventType.BEFORE_TOOL,

session_id="blocking-session",

source=SessionSource.CLAUDE,

timestamp=datetime.now(UTC),

data=native_event.get("input_data", {}),

)

def translate_from_hook_response(self, response: HookResponse) -> dict:

result = {"continue": response.decision == "allow"}

if response.reason:

result["stopReason"] = response.reason

if response.system_message:

result["systemMessage"] = response.system_message

return result

adapter = BlockingAdapter()

mock_manager = MagicMock()

mock_manager.handle.return_value = HookResponse(

decision="deny",

reason="No task claimed",

system_message="Please claim a task before editing files",

)

result = adapter.handle_native(

{"input_data": {"tool_name": "Write"}},

mock_manager,

)

assert result["continue"] is False

assert result["stopReason"] == "No task claimed"

assert "Please claim a task" in result["systemMessage"]

def test_context_injection_scenario(self):

"""Simulate context being injected via hook response."""

class ContextAdapter(BaseAdapter):

source = SessionSource.CLAUDE

def translate_to_hook_event(self, native_event: dict) -> HookEvent | None:

return HookEvent(

event_type=HookEventType.SESSION_START,

session_id="context-session",

source=SessionSource.CLAUDE,

timestamp=datetime.now(UTC),

data={},

)

def translate_from_hook_response(self, response: HookResponse) -> dict:

result = {"continue": True}

if response.context:

result["result"] = response.context

return result

adapter = ContextAdapter()

mock_manager = MagicMock()

mock_manager.handle.return_value = HookResponse(

decision="allow",

context="## Continuation Context\nYou have 3 pending tasks.",

)

result = adapter.handle_native({}, mock_manager)

assert result["continue"] is True

assert "Continuation Context" in result["result"]

assert "3 pending tasks" in result["result"]

@pytest.mark.integration

class TestAdapterIntegration:

"""Integration tests for adapter behavior with realistic scenarios."""

def test_session_lifecycle_simulation(self):

"""Simulate full session lifecycle through adapter."""

events_processed = []

class LifecycleAdapter(BaseAdapter):

source = SessionSource.CLAUDE

def translate_to_hook_event(self, native_event: dict) -> HookEvent | None:

event_type_map = {

"session-start": HookEventType.SESSION_START,

"pre-tool-use": HookEventType.BEFORE_TOOL,

"post-tool-use": HookEventType.AFTER_TOOL,

"session-end": HookEventType.SESSION_END,

}

hook_type = native_event.get("hook_type")

if hook_type not in event_type_map:

return None

events_processed.append(hook_type)

return HookEvent(

event_type=event_type_map[hook_type],

session_id=native_event.get("session_id", "test-sess"),

source=SessionSource.CLAUDE,

timestamp=datetime.now(UTC),

data=native_event.get("input_data", {}),

)

def translate_from_hook_response(self, response: HookResponse) -> dict:

return {"continue": response.decision == "allow"}

adapter = LifecycleAdapter()

mock_manager = MagicMock()

mock_manager.handle.return_value = HookResponse(decision="allow")

# Simulate session lifecycle

session_events = [

{"hook_type": "session-start", "session_id": "sess-1"},

{"hook_type": "pre-tool-use", "session_id": "sess-1"},

{"hook_type": "post-tool-use", "session_id": "sess-1"},

{"hook_type": "pre-tool-use", "session_id": "sess-1"},

{"hook_type": "post-tool-use", "session_id": "sess-1"},

{"hook_type": "session-end", "session_id": "sess-1"},

]

for event in session_events:

result = adapter.handle_native(event, mock_manager)

assert result["continue"] is True

assert events_processed == [

"session-start",

"pre-tool-use",

"post-tool-use",

"pre-tool-use",

"post-tool-use",

"session-end",

]

assert mock_manager.handle.call_count == 6

def test_tool_blocking_scenario(self):

"""Simulate tool being blocked by hook manager."""

class BlockingAdapter(BaseAdapter):

source = SessionSource.CLAUDE

def translate_to_hook_event(self, native_event: dict) -> HookEvent | None:

return HookEvent(

event_type=HookEventType.BEFORE_TOOL,

session_id="blocking-session",

source=SessionSource.CLAUDE,

timestamp=datetime.now(UTC),

data=native_event.get("input_data", {}),

)

def translate_from_hook_response(self, response: HookResponse) -> dict:

result = {"continue": response.decision == "allow"}

if response.reason:

result["stopReason"] = response.reason

if response.system_message:

result["systemMessage"] = response.system_message

return result

adapter = BlockingAdapter()

mock_manager = MagicMock()

mock_manager.handle.return_value = HookResponse(

decision="deny",

reason="No task claimed",

system_message="Please claim a task before editing files",

)

result = adapter.handle_native(

{"input_data": {"tool_name": "Write"}},

mock_manager,

)

assert result["continue"] is False

assert result["stopReason"] == "No task claimed"

assert "Please claim a task" in result["systemMessage"]

def test_context_injection_scenario(self):

"""Simulate context being injected via hook response."""

class ContextAdapter(BaseAdapter):

source = SessionSource.CLAUDE

def translate_to_hook_event(self, native_event: dict) -> HookEvent | None:

return HookEvent(

event_type=HookEventType.SESSION_START,

session_id="context-session",

source=SessionSource.CLAUDE,

timestamp=datetime.now(UTC),

data={},

)

def translate_from_hook_response(self, response: HookResponse) -> dict:

result = {"continue": True}

if response.context:

result["result"] = response.context

return result

adapter = ContextAdapter()

mock_manager = MagicMock()

mock_manager.handle.return_value = HookResponse(

decision="allow",

context="## Continuation Context\nYou have 3 pending tasks.",

)

result = adapter.handle_native({}, mock_manager)

assert result["continue"] is True

assert "Continuation Context" in result["result"]

assert "3 pending tasks" in result["result"]

🤖 Prompt for AI Agents

In @tests/adapters/test_base.py around lines 475 - 609, Add the pytest integration marker to the TestAdapterIntegration test class: annotate the class TestAdapterIntegration with @pytest.mark.integration and ensure pytest is imported in the test module if not already; this will mark all methods (test_session_lifecycle_simulation, test_tool_blocking_scenario, test_context_injection_scenario) as integration tests.

coderabbitai · 2026-01-08T16:13:34Z

+class TestCodexAdapterEventMapping:
+    """Tests verifying event type mapping constants."""
+
+    def test_event_map_contains_all_supported_events(self):
+        """EVENT_MAP contains all events we claim to support."""
+        expected_methods = [
+            "thread/started",
+            "thread/archive",
+            "turn/started",
+            "turn/completed",
+            "item/commandExecution/requestApproval",
+            "item/fileChange/requestApproval",
+            "item/completed",
+        ]
+
+        for method in expected_methods:
+            assert method in CodexAdapter.EVENT_MAP
+
+    def test_tool_item_types_complete(self):
+        """TOOL_ITEM_TYPES contains all tool-related item types."""
+        assert "commandExecution" in CodexAdapter.TOOL_ITEM_TYPES
+        assert "fileChange" in CodexAdapter.TOOL_ITEM_TYPES
+        assert "mcpToolCall" in CodexAdapter.TOOL_ITEM_TYPES
+
+    def test_session_tracking_events_complete(self):
+        """SESSION_TRACKING_EVENTS contains necessary events."""
+        assert "thread/started" in CodexAdapter.SESSION_TRACKING_EVENTS
+        assert "turn/started" in CodexAdapter.SESSION_TRACKING_EVENTS
+        assert "turn/completed" in CodexAdapter.SESSION_TRACKING_EVENTS
+        assert "item/completed" in CodexAdapter.SESSION_TRACKING_EVENTS


🛠️ Refactor suggestion | 🟠 Major

Add integration marker to API surface tests.

The TestCodexAdapterEventMapping class validates the public API surface and event mapping constants. According to the coding guidelines, integration tests should be marked with @pytest.mark.integration.

🏷️ Add integration marker

+@pytest.mark.integration class TestCodexAdapterEventMapping: """Tests verifying event type mapping constants."""

Based on coding guidelines for test categorization.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

class TestCodexAdapterEventMapping:

"""Tests verifying event type mapping constants."""

def test_event_map_contains_all_supported_events(self):

"""EVENT_MAP contains all events we claim to support."""

expected_methods = [

"thread/started",

"thread/archive",

"turn/started",

"turn/completed",

"item/commandExecution/requestApproval",

"item/fileChange/requestApproval",

"item/completed",

]

for method in expected_methods:

assert method in CodexAdapter.EVENT_MAP

def test_tool_item_types_complete(self):

"""TOOL_ITEM_TYPES contains all tool-related item types."""

assert "commandExecution" in CodexAdapter.TOOL_ITEM_TYPES

assert "fileChange" in CodexAdapter.TOOL_ITEM_TYPES

assert "mcpToolCall" in CodexAdapter.TOOL_ITEM_TYPES

def test_session_tracking_events_complete(self):

"""SESSION_TRACKING_EVENTS contains necessary events."""

assert "thread/started" in CodexAdapter.SESSION_TRACKING_EVENTS

assert "turn/started" in CodexAdapter.SESSION_TRACKING_EVENTS

assert "turn/completed" in CodexAdapter.SESSION_TRACKING_EVENTS

assert "item/completed" in CodexAdapter.SESSION_TRACKING_EVENTS

@pytest.mark.integration

class TestCodexAdapterEventMapping:

"""Tests verifying event type mapping constants."""

def test_event_map_contains_all_supported_events(self):

"""EVENT_MAP contains all events we claim to support."""

expected_methods = [

"thread/started",

"thread/archive",

"turn/started",

"turn/completed",

"item/commandExecution/requestApproval",

"item/fileChange/requestApproval",

"item/completed",

]

for method in expected_methods:

assert method in CodexAdapter.EVENT_MAP

def test_tool_item_types_complete(self):

"""TOOL_ITEM_TYPES contains all tool-related item types."""

assert "commandExecution" in CodexAdapter.TOOL_ITEM_TYPES

assert "fileChange" in CodexAdapter.TOOL_ITEM_TYPES

assert "mcpToolCall" in CodexAdapter.TOOL_ITEM_TYPES

def test_session_tracking_events_complete(self):

"""SESSION_TRACKING_EVENTS contains necessary events."""

assert "thread/started" in CodexAdapter.SESSION_TRACKING_EVENTS

assert "turn/started" in CodexAdapter.SESSION_TRACKING_EVENTS

assert "turn/completed" in CodexAdapter.SESSION_TRACKING_EVENTS

assert "item/completed" in CodexAdapter.SESSION_TRACKING_EVENTS

🤖 Prompt for AI Agents

In @tests/adapters/test_codex.py around lines 1507 - 1536, Add the pytest integration marker to the TestCodexAdapterEventMapping test class by decorating the class with @pytest.mark.integration so all its tests are categorized as integration tests; ensure pytest is imported in the test module if not already present. Target the TestCodexAdapterEventMapping class (and its methods test_event_map_contains_all_supported_events, test_tool_item_types_complete, test_session_tracking_events_complete) when applying the decorator.

coderabbitai · 2026-01-08T16:13:34Z

+class TestIntegration:
+    """Integration tests for full round-trip scenarios."""
+
+    @pytest.fixture
+    def adapter(self):
+        """Create a GeminiAdapter instance."""
+        return GeminiAdapter()
+
+    @pytest.fixture
+    def mock_hook_manager(self):
+        """Create a mock HookManager."""
+        return MagicMock()
+
+    def test_session_lifecycle_roundtrip(self, adapter, mock_hook_manager):
+        """Tests full session start/end lifecycle."""
+        # Session start
+        mock_hook_manager.handle.return_value = HookResponse(
+            decision="allow",
+            context="Welcome! You have 3 pending tasks.",
+        )
+
+        start_event = {
+            "hook_type": "SessionStart",
+            "input_data": {
+                "session_id": "gemini-lifecycle-123",
+                "cwd": "/home/user/project",
+                "timestamp": "2025-01-15T10:00:00Z",
+            },
+        }
+
+        start_result = adapter.handle_native(start_event, mock_hook_manager)
+
+        assert start_result["decision"] == "allow"
+        assert "pending tasks" in start_result["hookSpecificOutput"]["additionalContext"]
+
+        # Session end
+        mock_hook_manager.handle.return_value = HookResponse(decision="allow")
+
+        end_event = {
+            "hook_type": "SessionEnd",
+            "input_data": {
+                "session_id": "gemini-lifecycle-123",
+                "timestamp": "2025-01-15T11:00:00Z",
+            },
+        }
+
+        end_result = adapter.handle_native(end_event, mock_hook_manager)
+
+        assert end_result["decision"] == "allow"
+
+    def test_tool_execution_roundtrip(self, adapter, mock_hook_manager):
+        """Tests full tool execution lifecycle."""
+        # Before tool
+        mock_hook_manager.handle.return_value = HookResponse(decision="allow")
+
+        before_event = {
+            "hook_type": "BeforeTool",
+            "input_data": {
+                "session_id": "gemini-tool-456",
+                "tool_name": "WriteFileTool",
+                "tool_input": {
+                    "path": "/tmp/test.txt",
+                    "content": "Hello, World!",
+                },
+            },
+        }
+
+        before_result = adapter.handle_native(before_event, mock_hook_manager)
+
+        assert before_result["decision"] == "allow"
+
+        # Verify the tool name was normalized in the HookEvent
+        call_args = mock_hook_manager.handle.call_args[0][0]
+        assert call_args.metadata["normalized_tool_name"] == "Write"
+
+        # After tool
+        mock_hook_manager.handle.return_value = HookResponse(decision="allow")
+
+        after_event = {
+            "hook_type": "AfterTool",
+            "input_data": {
+                "session_id": "gemini-tool-456",
+                "tool_name": "WriteFileTool",
+                "tool_output": {"success": True, "bytes_written": 13},
+            },
+        }
+
+        after_result = adapter.handle_native(after_event, mock_hook_manager)
+
+        assert after_result["decision"] == "allow"
+
+    def test_tool_denied_by_workflow(self, adapter, mock_hook_manager):
+        """Tests tool denial scenario."""
+        mock_hook_manager.handle.return_value = HookResponse(
+            decision="deny",
+            reason="No task claimed. Use gobby-tasks.create_task() first.",
+            system_message="File modifications blocked: claim a task first.",
+        )
+
+        native_event = {
+            "hook_type": "BeforeTool",
+            "input_data": {
+                "session_id": "gemini-deny-789",
+                "tool_name": "EditFileTool",
+                "tool_input": {
+                    "path": "/src/main.py",
+                    "edit": "...",
+                },
+            },
+        }
+
+        result = adapter.handle_native(native_event, mock_hook_manager)
+
+        assert result["decision"] == "deny"
+        assert "No task claimed" in result["reason"]
+        assert result["systemMessage"] == "File modifications blocked: claim a task first."


🛠️ Refactor suggestion | 🟠 Major

Add integration marker to integration tests.

The TestIntegration class contains integration tests that simulate full lifecycle scenarios including session start/end and tool execution. According to the coding guidelines, these should be marked with @pytest.mark.integration.

🏷️ Add integration marker

+@pytest.mark.integration class TestIntegration: """Integration tests for full round-trip scenarios."""

Based on coding guidelines for test categorization.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

class TestIntegration:

"""Integration tests for full round-trip scenarios."""

@pytest.fixture

def adapter(self):

"""Create a GeminiAdapter instance."""

return GeminiAdapter()

@pytest.fixture

def mock_hook_manager(self):

"""Create a mock HookManager."""

return MagicMock()

def test_session_lifecycle_roundtrip(self, adapter, mock_hook_manager):

"""Tests full session start/end lifecycle."""

# Session start

mock_hook_manager.handle.return_value = HookResponse(

decision="allow",

context="Welcome! You have 3 pending tasks.",

)

start_event = {

"hook_type": "SessionStart",

"input_data": {

"session_id": "gemini-lifecycle-123",

"cwd": "/home/user/project",

"timestamp": "2025-01-15T10:00:00Z",

},

}

start_result = adapter.handle_native(start_event, mock_hook_manager)

assert start_result["decision"] == "allow"

assert "pending tasks" in start_result["hookSpecificOutput"]["additionalContext"]

# Session end

mock_hook_manager.handle.return_value = HookResponse(decision="allow")

end_event = {

"hook_type": "SessionEnd",

"input_data": {

"session_id": "gemini-lifecycle-123",

"timestamp": "2025-01-15T11:00:00Z",

},

}

end_result = adapter.handle_native(end_event, mock_hook_manager)

assert end_result["decision"] == "allow"

def test_tool_execution_roundtrip(self, adapter, mock_hook_manager):

"""Tests full tool execution lifecycle."""

# Before tool

mock_hook_manager.handle.return_value = HookResponse(decision="allow")

before_event = {

"hook_type": "BeforeTool",

"input_data": {

"session_id": "gemini-tool-456",

"tool_name": "WriteFileTool",

"tool_input": {

"path": "/tmp/test.txt",

"content": "Hello, World!",

},

},

}

before_result = adapter.handle_native(before_event, mock_hook_manager)

assert before_result["decision"] == "allow"

# Verify the tool name was normalized in the HookEvent

call_args = mock_hook_manager.handle.call_args[0][0]

assert call_args.metadata["normalized_tool_name"] == "Write"

# After tool

mock_hook_manager.handle.return_value = HookResponse(decision="allow")

after_event = {

"hook_type": "AfterTool",

"input_data": {

"session_id": "gemini-tool-456",

"tool_name": "WriteFileTool",

"tool_output": {"success": True, "bytes_written": 13},

},

}

after_result = adapter.handle_native(after_event, mock_hook_manager)

assert after_result["decision"] == "allow"

def test_tool_denied_by_workflow(self, adapter, mock_hook_manager):

"""Tests tool denial scenario."""

mock_hook_manager.handle.return_value = HookResponse(

decision="deny",

reason="No task claimed. Use gobby-tasks.create_task() first.",

system_message="File modifications blocked: claim a task first.",

)

native_event = {

"hook_type": "BeforeTool",

"input_data": {

"session_id": "gemini-deny-789",

"tool_name": "EditFileTool",

"tool_input": {

"path": "/src/main.py",

"edit": "...",

},

},

}

result = adapter.handle_native(native_event, mock_hook_manager)

assert result["decision"] == "deny"

assert "No task claimed" in result["reason"]

assert result["systemMessage"] == "File modifications blocked: claim a task first."

@pytest.mark.integration

class TestIntegration:

"""Integration tests for full round-trip scenarios."""

@pytest.fixture

def adapter(self):

"""Create a GeminiAdapter instance."""

return GeminiAdapter()

@pytest.fixture

def mock_hook_manager(self):

"""Create a mock HookManager."""

return MagicMock()

def test_session_lifecycle_roundtrip(self, adapter, mock_hook_manager):

"""Tests full session start/end lifecycle."""

# Session start

mock_hook_manager.handle.return_value = HookResponse(

decision="allow",

context="Welcome! You have 3 pending tasks.",

)

start_event = {

"hook_type": "SessionStart",

"input_data": {

"session_id": "gemini-lifecycle-123",

"cwd": "/home/user/project",

"timestamp": "2025-01-15T10:00:00Z",

},

}

start_result = adapter.handle_native(start_event, mock_hook_manager)

assert start_result["decision"] == "allow"

assert "pending tasks" in start_result["hookSpecificOutput"]["additionalContext"]

# Session end

mock_hook_manager.handle.return_value = HookResponse(decision="allow")

end_event = {

"hook_type": "SessionEnd",

"input_data": {

"session_id": "gemini-lifecycle-123",

"timestamp": "2025-01-15T11:00:00Z",

},

}

end_result = adapter.handle_native(end_event, mock_hook_manager)

assert end_result["decision"] == "allow"

def test_tool_execution_roundtrip(self, adapter, mock_hook_manager):

"""Tests full tool execution lifecycle."""

# Before tool

mock_hook_manager.handle.return_value = HookResponse(decision="allow")

before_event = {

"hook_type": "BeforeTool",

"input_data": {

"session_id": "gemini-tool-456",

"tool_name": "WriteFileTool",

"tool_input": {

"path": "/tmp/test.txt",

"content": "Hello, World!",

},

},

}

before_result = adapter.handle_native(before_event, mock_hook_manager)

assert before_result["decision"] == "allow"

# Verify the tool name was normalized in the HookEvent

call_args = mock_hook_manager.handle.call_args[0][0]

assert call_args.metadata["normalized_tool_name"] == "Write"

# After tool

mock_hook_manager.handle.return_value = HookResponse(decision="allow")

after_event = {

"hook_type": "AfterTool",

"input_data": {

"session_id": "gemini-tool-456",

"tool_name": "WriteFileTool",

"tool_output": {"success": True, "bytes_written": 13},

},

}

after_result = adapter.handle_native(after_event, mock_hook_manager)

assert after_result["decision"] == "allow"

def test_tool_denied_by_workflow(self, adapter, mock_hook_manager):

"""Tests tool denial scenario."""

mock_hook_manager.handle.return_value = HookResponse(

decision="deny",

reason="No task claimed. Use gobby-tasks.create_task() first.",

system_message="File modifications blocked: claim a task first.",

)

native_event = {

"hook_type": "BeforeTool",

"input_data": {

"session_id": "gemini-deny-789",

"tool_name": "EditFileTool",

"tool_input": {

"path": "/src/main.py",

"edit": "...",

},

},

}

result = adapter.handle_native(native_event, mock_hook_manager)

assert result["decision"] == "deny"

assert "No task claimed" in result["reason"]

assert result["systemMessage"] == "File modifications blocked: claim a task first."

🤖 Prompt for AI Agents

In @tests/adapters/test_gemini.py around lines 885 - 1000, The TestIntegration class lacks the integration marker; add the pytest integration marker by decorating the TestIntegration class with @pytest.mark.integration (ensure pytest is imported if not already) so all methods (test_session_lifecycle_roundtrip, test_tool_execution_roundtrip, test_tool_denied_by_workflow) are treated as integration tests; place the decorator immediately above the class definition referencing the TestIntegration symbol.

coderabbitai · 2026-01-08T16:13:34Z

+    def test_spawn_with_unicode_in_args(self):
+        """spawn() handles unicode in arguments."""
+        spawner = HeadlessSpawner()
+        result = spawner.spawn(
+            ["echo", "hello \u4e16\u754c"],  # "hello world" in Chinese
+            cwd="/tmp",
+        )
+
+        assert result.success is True
+        stdout, _ = result.process.communicate()
+        assert "\u4e16\u754c" in stdout or "world" in stdout.lower()


⚠️ Potential issue | 🟡 Minor

Questionable fallback assertion for Unicode test.

Line 1027's fallback or "world" in stdout.lower() doesn't logically relate to testing Chinese characters \u4e16\u754c (世界, meaning "world" in Chinese). If the terminal doesn't support Unicode, this fallback masks the actual test failure. Consider removing it or using a more appropriate assertion.

🐛 Suggested fix

assert result.success is True stdout, _ = result.process.communicate() - assert "\u4e16\u754c" in stdout or "world" in stdout.lower() + assert "\u4e16\u754c" in stdout

🤖 Prompt for AI Agents

In @tests/agents/spawners/test_headless_spawner.py around lines 1017 - 1027, The test test_spawn_with_unicode_in_args should not accept an unrelated ASCII fallback; remove the fallback check and assert only that the Unicode substring "\u4e16\u754c" appears in the captured stdout after calling HeadlessSpawner.spawn (check result.success and result.process.communicate() as currently done), or alternatively make the test explicitly skip/mark xfail on environments that cannot support Unicode; update the assertion to only verify "\u4e16\u754c" in stdout (and keep the existing result.success check) so the test fails when Unicode is lost rather than silently passing.

… table - Add migration 40 to rename column using ALTER TABLE RENAME COLUMN - Update all SQL queries to use task_type instead of type - Update Task.from_row() to read from task_type column - Remove outdated comments about DB column mapping This aligns DB column names with Python field names, supporting the safe_update helper for SQL injection remediation.

Remove obsolete type: ignore comments that mypy no longer requires: - json_helpers.py: msgspec decode type ignore - codex.py: fix variable shadowing (api_key -> env_api_key) - mcp.py: t.name attribute type ignore - stdio.py: 12 @mcp.tool() decorator type ignores

claude · 2026-01-08T17:25:54Z

PR Review: v0.1.1

This is a substantial PR merging dev into main with 88K+ additions across 100+ files. I've reviewed the major changes focusing on the new autonomous execution features, task management enhancements, and session handling improvements.

Summary

Major Features Added:

Autonomous session execution with progress tracking and stuck detection
Graceful stop signals for session management
Enhanced handoff generation with compact and full summaries
Tool failure metrics tracking
Consolidated git hook management
HTTP and WebSocket endpoints for session control

Overall Assessment: The code quality is high with good separation of concerns, comprehensive test coverage for new autonomous features, and proper error handling. There are a few areas for improvement noted below.

Code Quality and Best Practices

Strengths

Excellent Test Coverage - The new tests/autonomous/test_autonomous.py is exemplary with 1,474 lines covering all three detection layers, thread safety, and integration tests
Well-Structured Modules - Clean separation with progress_tracker.py, stop_registry.py, and stuck_detector.py each having clear responsibilities
Strong Type Safety - Good use of dataclasses, type hints, and TYPE_CHECKING guards
Thread Safety - Proper use of locks in critical sections with context managers
Database Schema Design - Well-designed tables with foreign key constraints, CASCADE delete, and strategic indexes including partial indexes

Areas for Improvement

Security: JSON Deserialization

Location: src/gobby/autonomous/stuck_detector.py:366-369
Issue: Using ast.literal_eval on untrusted data is risky
Recommendation: Use json.loads exclusively for user-provided context data

Error Handling

Location: src/gobby/cli/sessions.py:456-467
Issue: Broad Exception catching silently ignores errors
Recommendation: Be specific about expected exceptions

Code Duplication

Location: src/gobby/cli/sessions.py:110-127, 169-180, 269-280
Issue: Session ID resolution logic duplicated 3+ times
Recommendation: Extract to helper function

Input Validation

Location: src/gobby/autonomous/progress_tracker.py:203-218
Issue: Bash command parsing without type checking
Recommendation: Add explicit str() conversion

Potential Bugs

Race Condition in Stop Signal Creation

Location: src/gobby/autonomous/stop_registry.py:78-85
Issue: get_signal queries database without holding lock, early return bypasses ON CONFLICT
Fix: Move duplicate check after upsert

Timezone Handling Inconsistency

Location: Throughout autonomous modules
Issue: Using datetime.fromisoformat without timezone validation
Recommendation: Enforce UTC everywhere and validate timezone info

Performance Considerations

Database Indexes - Good coverage with partial index on session_stop_signals.acknowledged_at WHERE acknowledged_at IS NULL

Query Optimization - ProgressTracker.get_summary makes 4 separate queries, could combine with CTEs for better performance

Memory Usage - Loading entire transcript into memory in cli/sessions.py:421-433 could be problematic for very long sessions

Security Concerns

Command Injection - Safe, using list form of subprocess commands not shell=True

Path Traversal - Low risk, proper Path.exists checks in place

HTTP Endpoint Authorization - Stop signal endpoints lack authentication checks, anyone can stop any session if no upstream middleware

Test Coverage

Excellent for autonomous features:

1,474 lines of comprehensive tests
All three stuck detection layers covered
Thread safety and integration tests included
225 total test files in project

Missing tests for:

src/gobby/tasks/external_validator.py - 410 lines of complex logic
src/gobby/tasks/commits.py - Auto-linking logic and regex patterns
src/gobby/cli/sessions.py - create-handoff command with 178 new lines

Architecture and Design

Excellent Separation of Concerns - Three-layer stuck detection is well-designed:

Task loops - Business logic level
Progress stagnation - Work output level
Tool loops - Execution level

Good Design Patterns - Registry, Tracker, and Strategy patterns used appropriately

Recommendations Summary

High Priority

Security: Remove ast.literal_eval usage in stuck_detector.py
Testing: Add tests for external_validator.py and commits.py
Bug: Fix race condition in StopRegistry.signal_stop

Medium Priority

Code Quality: Extract session ID resolution to helper function
Performance: Optimize ProgressTracker.get_summary with single query
Error Handling: Be more specific with exception catching

Low Priority

Documentation: Add architecture diagrams for autonomous workflow
Validation: Add explicit input type checks for command strings
Security: Consider authentication for stop signal endpoints

Conclusion

Recommendation: Approve with minor fixes

This is a well-crafted PR with significant new functionality. The autonomous execution infrastructure is thoughtfully designed with excellent test coverage. Main concerns are:

A couple of security and safety issues with JSON parsing and exception handling
Some missing test coverage for non-autonomous features
Minor performance optimizations possible

The issues identified are not blockers but should be addressed before the next release. The autonomous features are production-ready, and the test coverage gives confidence in their reliability.

Great work on the comprehensive testing and clean architecture!

Review completed by: Claude Sonnet 4.5
Files examined: 20+ key files focusing on autonomous infrastructure, sessions, and task management
Test files reviewed: tests/autonomous/test_autonomous.py with 1,474 lines of comprehensive tests

…-2321c7]

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

src/gobby/tasks/research.py (1)
143-146: Fix dictionary key access for tool objects returned by mcp_manager.list_tools().

The code accesses t.name and t.description as object attributes, but mcp_manager.list_tools() returns dictionaries (per the type signature dict[str, list[dict[str, Any]]] in manager.py). This will raise AttributeError: 'dict' object has no attribute 'name' at runtime. Use dictionary key access instead:
Required change
for t in all_tools:
    if t["name"] in ("search_web", "google_search", "brave_search"):
        prompt += f"5. {t['name']}(query): {t['description'][:100]}...\n"
        break
src/gobby/tasks/validation.py (2)
500-500: Critical: Awaiting non-async function causes runtime error.

Line 500 attempts to await the call to gather_validation_context, but this method is not declared as async (line 440). This will cause a runtime error when context_files is provided.
🐛 Fix: Remove await or make the method async

Option 1 (Recommended): Remove await since the function is synchronous
         # Gather context if provided
         file_context = ""
         if context_files:
-            file_context = await self.gather_validation_context(context_files)
+            file_context = self.gather_validation_context(context_files)
Option 2: Make gather_validation_context async (if future async file I/O is planned)
-    async def gather_validation_context(self, file_paths: list[str]) -> str:
+    async def gather_validation_context(self, file_paths: list[str]) -> str:
Then use async file I/O operations inside the method.
440-459: Arbitrary file read via unvalidated file paths in gather_validation_context.

The method reads files directly from user-provided paths without validation. Since file_paths originates from task_data["files"] (user input passed to the Task class), a caller could include paths like /etc/passwd or ../../secrets to read files outside the intended project scope.

Implement path validation before reading files:

Resolve all paths to absolute paths and verify they're within an allowed directory

Reject paths with .. components or absolute paths outside the project

Maintain an allowlist of safe directories if multiple locations need access

🤖 Fix all issues with AI agents

In @pyproject.toml:
- Line 141: The pyproject dependency line currently pins Black as
"black>=24.0.0"; update this constraint to a more recent stable release by
replacing that token with either "black>=25.0.0" (looser) or "black>=25.12.0"
(recommended/pinned) so the project uses the latest fixes and improvements;
ensure any lockfile or CI config referencing the old version is updated to
match.

In @scripts/cleanup_skills.py:
- Around line 17-20: The script creates LocalDatabase and LocalSkillManager
without running migrations; after instantiating LocalDatabase (symbol:
LocalDatabase) call run_migrations(db) before constructing LocalSkillManager
(symbol: LocalSkillManager) so the schema is applied; update the script to
import or reference run_migrations and invoke run_migrations(db) right after db
= LocalDatabase() and before manager = LocalSkillManager(db=db).

In @src/gobby/autonomous/stuck_detector.py:
- Around line 123-136: The INSERT uses str(context) which stores Python reprs;
change it to json.dumps(context) (and add/import json) so context is stored as
valid JSON in the task_selection_history row, and in get_selection_history
replace ast.literal_eval fallback by using json.loads only: treat NULL/None as
None, wrap json.loads in a try/except to catch JSONDecodeError and log/handle
gracefully, and remove use of ast.literal_eval to avoid security risks; refer to
the insert block around the task_selection_history INSERT and the
get_selection_history function for where to change code and add error handling.

In @src/gobby/utils/machine_id.py:
- Line 118: Remove the unnecessary str() conversion around machineid.id() in the
function that returns the machine id in machine_id.py: replace the return that
currently wraps machineid.id() with a direct return of machineid.id() (i.e.,
drop str()), leaving other conversions for uuid.uuid4() untouched; this keeps
the return type consistent and avoids redundant conversion.

🧹 Nitpick comments (18)

docs/plans/todo-list.md (1)

1-7: Use Gobby tasks instead of markdown TODO lists.

According to team conventions documented in prior reviews, markdown TODO lists should not be created; use Gobby tasks instead for tracking planned work.

Please convert these items to Gobby tasks and remove this file, or confirm if this markdown file is intended for a different purpose (e.g., archived planning notes).
src/gobby/sessions/transcripts/claude.py (1)
211-213: Duplicate comment on lines 211-212.

The comment # Check for /clear command marker appears twice consecutively.
Remove duplicate comment
-        # Check for /clear command marker
         # Check for /clear command marker
         return "<command-name>/clear</command-name>" in str(content)
src/gobby/cli/daemon.py (1)
161-168: Add type annotation for consistency with line 270.

For consistency and to align with strict mypy requirements, this status_kwargs should have the same type annotation as the one at line 270.
📝 Suggested type annotation
-        status_kwargs = {
+        status_kwargs: dict[str, Any] = {
             "running": daemon_healthy,
             "pid": process.pid,
             "pid_file": str(pid_file),
             "log_files": str(log_file.parent),
             "http_port": http_port,
             "websocket_port": ws_port,
         }
src/gobby/sessions/summary.py (1)

395-426: Consider consolidating duplicate implementations across the codebase.

The type annotation update is correct. However, this method has nearly identical implementations in multiple files:

src/gobby/cli/sessions.py (lines 27-47)

src/gobby/mcp_proxy/tools/session_messages.py (lines 100-120)

The implementations are ~90% identical, with only this file including "thinking" block handling. Consider extracting this to a shared utility module to eliminate duplication and ensure consistency.

Based on relevant code snippets.

♻️ Consolidation approach

Create a shared utility module (e.g., src/gobby/sessions/formatting.py) and move this implementation there, then import and use it across all locations. The src/gobby/workflows/actions.py already delegates to a format_turns_for_llm function, which suggests this pattern is already partially established.
src/gobby/servers/routes/mcp.py (2)
243-250: Consider documenting the project_id resolution behavior

When project_id is not provided, _resolve_project_id uses the HTTP server's current working directory to find .gobby/project.json. In a server context, this directory is where the daemon was started, which may not be intuitive for API consumers.

The current implementation handles this gracefully by skipping metrics enrichment if resolution fails. However, the docstring could clarify that:

Metrics enrichment requires either an explicit project_id parameter, or

The server must be running from within a Gobby project directory
📝 Suggested docstring enhancement
     """
     List tools from MCP servers.
 
     Args:
         server_filter: Optional server name to filter by
         include_metrics: When True, include call_count, success_rate, avg_latency for each tool
-        project_id: Project ID for metrics lookup (uses current project if not specified)
+        project_id: Project ID for metrics lookup. If not provided, attempts to infer from
+                   the server's working directory. Metrics enrichment is skipped if no
+                   project context can be determined.
 
     Returns:
         Dict of server names to tool lists
     """
1212-1213: Minor inconsistency in attribute access pattern

Line 1212 uses direct attribute access with a type: ignore comment, while line 1213 uses getattr() for the same object. For consistency and to avoid the type: ignore directive, consider using getattr() for both:
                    {
-                       "name": t.name,  # type: ignore[attr-defined]
+                       "name": getattr(t, "name", ""),
                        "description": getattr(t, "description", ""),
                        "inputSchema": schema,
                    }
This assumes name could potentially be missing (requiring a default), but if name is guaranteed to exist and an empty string default would be invalid, the current approach with type: ignore is acceptable for strict mypy compliance with external SDK types.
src/gobby/workflows/templates.py (2)
31-39: Prefer bare raise to preserve the original traceback.

The str() cast is fine for explicit typing. However, raise e re-raises the exception but may alter the traceback in some edge cases. Using bare raise is the idiomatic pattern.
♻️ Suggested change
         except Exception as e:
             logger.error(f"Error rendering template: {e}", exc_info=True)
-            # Fallback to original string or raise?
-            # For workflows, it might be better to fail typically, but let's return error message in string for visibility if strict validation isn't on.
-            # actually, better to raise so the action fails and handles it.
-            raise e
+            raise
45-50: Same suggestion: prefer bare raise.

The str() cast is appropriate. Consider using bare raise instead of raise e for consistency with Python best practices.
♻️ Suggested change
         except Exception as e:
             logger.error(f"Error rendering template file '{template_name}': {e}", exc_info=True)
-            raise e
+            raise
src/gobby/cli/workflows.py (1)

193-195: Consider type guard pattern instead of bare assert for session_id.

The assert session_id is not None statements are used for mypy type narrowing after the conditional branch guarantees session_id is set. While this works, bare assert statements can be stripped when Python runs with -O (optimize) flag, potentially causing issues in production.

Since the preceding logic already raises SystemExit(1) when no session is found, the assert is effectively unreachable in normal flow and serves only as a type guard for mypy. This is an acceptable pattern for internal CLI code, but for consistency you could consider an explicit check or a typing.cast.
src/gobby/mcp_proxy/tools/session_messages.py (1)
101-121: Code duplication with format_turns_for_llm in summary_actions.py.

This helper duplicates logic from src/gobby/sessions/summary.py (lines 394-425) and src/gobby/workflows/summary_actions.py. Notably, this version omits handling for "thinking" blocks that the summary.py version includes:
# In summary.py but missing here:
elif block.get("type") == "thinking":
    text_parts.append(f"[Thinking: {block.get('thinking', '')}]")
Consider either:

Consolidating into a shared utility function, or

Adding the missing "thinking" block handling for consistency
♻️ Add missing thinking block handling
 def _format_turns_for_llm(turns: list[dict[str, Any]]) -> str:
     """Format transcript turns for LLM analysis."""
     formatted: list[str] = []
     for i, turn in enumerate(turns):
         message = turn.get("message", {})
         role = message.get("role", "unknown")
         content = message.get("content", "")
 
         if isinstance(content, list):
             text_parts: list[str] = []
             for block in content:
                 if isinstance(block, dict):
                     if block.get("type") == "text":
                         text_parts.append(str(block.get("text", "")))
+                    elif block.get("type") == "thinking":
+                        text_parts.append(f"[Thinking: {block.get('thinking', '')}]")
                     elif block.get("type") == "tool_use":
                         text_parts.append(f"[Tool: {block.get('name', 'unknown')}]")
             content = " ".join(text_parts)
 
         formatted.append(f"[Turn {i + 1} - {role}]: {content}")
 
     return "\n\n".join(formatted)
scripts/cleanup_skills.py (1)

13-13: Async function performs no async operations.

cleanup_db is declared async but calls only synchronous methods (list_skills, delete_skill, input). Consider making it a regular function, or if async is intentional for future use, add a comment.
src/gobby/servers/routes/sessions.py (1)
427-434: Consider extracting stop_registry access.

The hook_manager/stop_registry validation is repeated across all three endpoints. A helper function could reduce duplication:
Optional helper extraction
def _get_stop_registry(request: Request) -> "StopRegistry":
    """Get stop registry from request state, raising 503 if unavailable."""
    if not hasattr(request.app.state, "hook_manager"):
        raise HTTPException(status_code=503, detail="Hook manager not available")
    hook_manager = request.app.state.hook_manager
    if not hasattr(hook_manager, "_stop_registry") or not hook_manager._stop_registry:
        raise HTTPException(status_code=503, detail="Stop registry not available")
    return hook_manager._stop_registry
Also applies to: 487-494, 542-549
src/gobby/servers/websocket.py (1)

434-502: Minor documentation inconsistency: signal_id is actually the session_id.

The docstring at line 450 documents "signal_id": "uuid" suggesting a unique signal identifier, but the implementation returns signal.session_id (lines 483, 495). Since StopSignal is keyed by session (one pending signal per session), this is functionally correct but potentially confusing for API consumers expecting a distinct signal identifier.

Consider either:

Renaming the response field to clarify it's the session_id, or

Updating the docstring to reflect the actual semantics
src/gobby/workflows/actions.py (1)
1117-1150: Consider moving asyncio import to module level.

The import asyncio at line 1127 is inside the method body. Since this file already uses async patterns extensively, moving the import to the module level would be more idiomatic and avoid repeated import overhead on each call.

The fire-and-forget pattern with add_done_callback for silent error logging is appropriate for non-critical broadcasts.
♻️ Suggested change

Move the import to the top of the file with other imports:
import asyncio  # at module level, around line 1
Then remove line 1127.
src/gobby/autonomous/stop_registry.py (1)

203-242: list_pending lacks lock protection for consistency.

Unlike signal_stop, acknowledge, clear, and cleanup_stale, the list_pending method doesn't acquire self._lock. While individual SQLite reads are atomic, if consistency with concurrent writes is needed during enumeration, consider adding locking. For read-only listing operations, this is typically acceptable, but worth noting for completeness.

src/gobby/cli/sessions.py (1)

28-48: Code duplication: _format_turns_for_llm exists in multiple modules.

This helper function appears to be duplicated across:

src/gobby/sessions/summary.py

src/gobby/mcp_proxy/tools/session_messages.py

Here in src/gobby/cli/sessions.py

Consider extracting this to a shared utility module to maintain consistency and reduce maintenance burden.

src/gobby/workflows/evaluator.py (1)

304-323: Webhook conditions require pre-evaluation - document the contract.

The comment at lines 305-306 correctly notes that evaluate_webhook_conditions() must be called first, but callers might miss this. Consider:

Adding this requirement to the method docstring of check_exit_conditions

Or raising a more descriptive error when the pre-evaluated result is missing

The current warning log is helpful for debugging but doesn't prevent silent failures in production.
src/gobby/autonomous/stuck_detector.py (1)
257-263: Ensure consistent key generation for tool pattern detection.

Line 262 creates a pattern key by concatenating tool name with tool_args_keys list converted to string. Converting a list directly to a string (e.g., str([]) → '[]') might not produce consistent keys for comparison purposes, especially with nested structures.
♻️ Proposed improvement
         # Count tool call patterns
         tool_counts: dict[str, int] = {}
         for event in recent_events:
             if event.tool_name:
                 # Create a key from tool name and key args
-                key = f"{event.tool_name}:{event.details.get('tool_args_keys', [])}"
+                args_keys = event.details.get('tool_args_keys', [])
+                args_key = json.dumps(sorted(args_keys)) if isinstance(args_keys, list) else str(args_keys)
+                key = f"{event.tool_name}:{args_key}"
                 tool_counts[key] = tool_counts.get(key, 0) + 1
This ensures that lists with the same elements in different orders produce the same key, and handles edge cases more reliably.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 49e9cbc and 3286222.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (136)

.gitignore
.gobby/tasks.jsonl
.gobby/tasks_meta.json
docs/plans/todo-list.md
examples/plugins/code_guardian.py
pyproject.toml
scripts/cleanup_skills.py
src/gobby/adapters/base.py
src/gobby/adapters/codex.py
src/gobby/agents/spawners/base.py
src/gobby/autonomous/progress_tracker.py
src/gobby/autonomous/stop_registry.py
src/gobby/autonomous/stuck_detector.py
src/gobby/cli/agents.py
src/gobby/cli/daemon.py
src/gobby/cli/sessions.py
src/gobby/cli/workflows.py
src/gobby/config/persistence.py
src/gobby/hooks/events.py
src/gobby/hooks/plugins.py
src/gobby/llm/claude.py
src/gobby/llm/claude_executor.py
src/gobby/llm/codex.py
src/gobby/llm/gemini_executor.py
src/gobby/mcp_proxy/lazy.py
src/gobby/mcp_proxy/manager.py
src/gobby/mcp_proxy/stdio.py
src/gobby/mcp_proxy/tools/session_messages.py
src/gobby/mcp_proxy/tools/task_readiness.py
src/gobby/mcp_proxy/tools/workflows.py
src/gobby/memory/manager.py
src/gobby/runner.py
src/gobby/servers/http.py
src/gobby/servers/routes/admin.py
src/gobby/servers/routes/mcp.py
src/gobby/servers/routes/sessions.py
src/gobby/servers/websocket.py
src/gobby/sessions/lifecycle.py
src/gobby/sessions/manager.py
src/gobby/sessions/processor.py
src/gobby/sessions/summary.py
src/gobby/sessions/transcripts/base.py
src/gobby/sessions/transcripts/claude.py
src/gobby/sessions/transcripts/codex.py
src/gobby/sessions/transcripts/gemini.py
src/gobby/skills/learner.py
src/gobby/storage/compaction.py
src/gobby/storage/database.py
src/gobby/storage/mcp.py
src/gobby/storage/memories.py
src/gobby/storage/migrations.py
src/gobby/storage/projects.py
src/gobby/storage/sessions.py
src/gobby/storage/skills.py
src/gobby/storage/tasks.py
src/gobby/storage/workflow_audit.py
src/gobby/storage/worktrees.py
src/gobby/sync/memories.py
src/gobby/sync/skills.py
src/gobby/sync/tasks.py
src/gobby/tasks/context.py
src/gobby/tasks/issue_extraction.py
src/gobby/tasks/research.py
src/gobby/tasks/spec_parser.py
src/gobby/tasks/validation.py
src/gobby/utils/json_helpers.py
src/gobby/utils/logging.py
src/gobby/utils/machine_id.py
src/gobby/workflows/actions.py
src/gobby/workflows/engine.py
src/gobby/workflows/evaluator.py
src/gobby/workflows/hooks.py
src/gobby/workflows/loader.py
src/gobby/workflows/state_manager.py
src/gobby/workflows/summary_actions.py
src/gobby/workflows/templates.py
tests/agents/spawners/test_headless_spawner.py
tests/agents/test_context_integration.py
tests/agents/test_spawners.py
tests/agents/test_tty_config.py
tests/cli/test_cli.py
tests/cli/test_tasks_commits.py
tests/cli/test_validation_cli.py
tests/config/test_mcp_config.py
tests/examples/test_code_guardian.py
tests/hooks/test_api_messages.py
tests/hooks/test_plugins.py
tests/hooks/test_webhooks.py
tests/integration/test_agent_execution.py
tests/integration/test_terminal_mode_worktrees.py
tests/integration/test_workflow_tool_filtering.py
tests/llm/test_claude_executor.py
tests/llm/test_executor.py
tests/llm/test_gemini_executor.py
tests/llm/test_litellm_executor.py
tests/llm/test_llm_codex.py
tests/llm/test_llm_litellm.py
tests/llm/test_resolver.py
tests/mcp_proxy/test_fallback_resolver.py
tests/mcp_proxy/test_gobby_daemon_tools.py
tests/mcp_proxy/test_mcp_manager.py
tests/mcp_proxy/test_semantic_search.py
tests/mcp_proxy/test_server_mgmt.py
tests/memory/test_context.py
tests/memory/test_search_benchmark.py
tests/plugins/test_example_notify.py
tests/servers/test_http_coverage.py
tests/servers/test_mcp_routes.py
tests/sessions/test_sessions_processor_integration.py
tests/storage/test_audit_coverage.py
tests/storage/test_storage_database.py
tests/storage/test_storage_migrations.py
tests/storage/test_storage_session_tasks.py
tests/sync/test_skill_sync.py
tests/tasks/test_context.py
tests/tasks/test_context_gatherer.py
tests/tasks/test_enhanced_validator.py
tests/tasks/test_escalation.py
tests/tasks/test_expansion_coverage.py
tests/tasks/test_issue_extraction.py
tests/tasks/test_prompts.py
tests/tasks/test_task_validation.py
tests/tasks/test_validation_history.py
tests/tasks/test_validation_models.py
tests/utils/test_utils_daemon_client.py
tests/utils/test_utils_status.py
tests/workflows/test_artifact_actions.py
tests/workflows/test_compact_handoff.py
tests/workflows/test_context_sources.py
tests/workflows/test_evaluator.py
tests/workflows/test_memory_lifecycle.py
tests/workflows/test_plugin_action_workflow.py
tests/workflows/test_templates.py
tests/workflows/test_webhook_executor.py
tests/workflows/test_webhook_workflow_integration.py
tests/worktrees/test_git.py

✅ Files skipped from review due to trivial changes (9)

src/gobby/storage/worktrees.py
src/gobby/mcp_proxy/tools/workflows.py
src/gobby/mcp_proxy/tools/task_readiness.py
src/gobby/storage/memories.py
src/gobby/sync/tasks.py
src/gobby/servers/routes/admin.py
src/gobby/workflows/state_manager.py
src/gobby/storage/skills.py
src/gobby/mcp_proxy/lazy.py

🚧 Files skipped from review as they are similar to previous changes (5)

.gobby/tasks_meta.json
src/gobby/autonomous/progress_tracker.py
src/gobby/servers/http.py
src/gobby/storage/tasks.py
tests/agents/spawners/test_headless_spawner.py

🧰 Additional context used

📓 Path-based instructions (4)

src/**/*.py

⚙️ CodeRabbit configuration file

src/**/*.py: - Check for proper type hints (project uses strict mypy)

Ensure async functions are properly awaited

Watch for potential security issues (SQL injection, command injection)

Verify proper error handling

Files:

src/gobby/sync/memories.py
src/gobby/llm/gemini_executor.py
src/gobby/workflows/summary_actions.py
src/gobby/sessions/transcripts/claude.py
src/gobby/storage/projects.py
src/gobby/tasks/issue_extraction.py
src/gobby/tasks/validation.py
src/gobby/workflows/engine.py
src/gobby/sessions/processor.py
src/gobby/sessions/transcripts/codex.py
src/gobby/tasks/research.py
src/gobby/sync/skills.py
src/gobby/workflows/actions.py
src/gobby/adapters/base.py
src/gobby/utils/machine_id.py
src/gobby/cli/agents.py
src/gobby/hooks/events.py
src/gobby/agents/spawners/base.py
src/gobby/servers/routes/sessions.py
src/gobby/llm/codex.py
src/gobby/autonomous/stuck_detector.py
src/gobby/utils/logging.py
src/gobby/utils/json_helpers.py
src/gobby/memory/manager.py
src/gobby/servers/routes/mcp.py
src/gobby/storage/compaction.py
src/gobby/cli/workflows.py
src/gobby/mcp_proxy/manager.py
src/gobby/llm/claude.py
src/gobby/tasks/spec_parser.py
src/gobby/servers/websocket.py
src/gobby/workflows/templates.py
src/gobby/runner.py
src/gobby/llm/claude_executor.py
src/gobby/sessions/transcripts/base.py
src/gobby/tasks/context.py
src/gobby/autonomous/stop_registry.py
src/gobby/sessions/lifecycle.py
src/gobby/cli/daemon.py
src/gobby/storage/sessions.py
src/gobby/storage/mcp.py
src/gobby/storage/database.py
src/gobby/hooks/plugins.py
src/gobby/skills/learner.py
src/gobby/workflows/loader.py
src/gobby/config/persistence.py
src/gobby/storage/workflow_audit.py
src/gobby/cli/sessions.py
src/gobby/mcp_proxy/stdio.py
src/gobby/workflows/hooks.py
src/gobby/sessions/summary.py
src/gobby/sessions/transcripts/gemini.py
src/gobby/storage/migrations.py
src/gobby/sessions/manager.py
src/gobby/workflows/evaluator.py
src/gobby/mcp_proxy/tools/session_messages.py
src/gobby/adapters/codex.py

src/gobby/hooks/**

⚙️ CodeRabbit configuration file

src/gobby/hooks/**: - Hook handlers must be idempotent

Check for proper event handling patterns

Ensure plugins follow the HookPlugin interface

Files:

src/gobby/hooks/events.py
src/gobby/hooks/plugins.py

src/gobby/mcp_proxy/**

⚙️ CodeRabbit configuration file

src/gobby/mcp_proxy/**: - This is the MCP (Model Context Protocol) proxy layer

Check for proper tool schema validation

Ensure tools return appropriate response structures

Files:

src/gobby/mcp_proxy/manager.py
src/gobby/mcp_proxy/stdio.py
src/gobby/mcp_proxy/tools/session_messages.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use asyncio with pytest configured with asyncio_mode = 'auto' for automatic async test detection
Use pytest markers slow, integration, and e2e for test categorization

Files:

tests/agents/test_context_integration.py

⚙️ CodeRabbit configuration file

tests/**/*.py: - Ensure tests are meaningful, not just coverage padding

Check for proper mocking and fixtures

Verify async tests use pytest-asyncio markers

Files:

tests/agents/test_context_integration.py

🧠 Learnings (5)

📚 Learning: 2025-12-17T19:24:40.567Z

Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T19:24:40.567Z
Learning: Use `gobby-*` prefix for internal server names and route them locally via InternalRegistryManager, while proxying other servers via MCPClientManager

Applied to files:

src/gobby/servers/routes/mcp.py

📚 Learning: 2025-12-17T19:54:30.424Z

Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-17T19:54:30.424Z
Learning: Do NOT create markdown TODO lists; use Gobby tasks instead

Applied to files:

docs/plans/todo-list.md

📚 Learning: 2025-12-17T19:24:40.567Z

Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T19:24:40.567Z
Learning: Applies to src/storage/database.py : Use SQLite for session, project, task, and MCP server/tool storage with thread-local database connections in `src/storage/database.py`

Applied to files:

src/gobby/storage/database.py

📚 Learning: 2025-12-17T19:24:40.567Z

Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T19:24:40.567Z
Learning: Applies to src/mcp_proxy/server.py : Use FastMCP server implementation for MCP proxy functionality in `src/mcp_proxy/server.py`

Applied to files:

src/gobby/mcp_proxy/stdio.py

📚 Learning: 2025-12-17T19:24:40.567Z

Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T19:24:40.567Z
Learning: Implement MCP tool progressive disclosure with three-step discovery: list_tools() → get_tool_schema() → call_tool()

Applied to files:

src/gobby/mcp_proxy/stdio.py

🧬 Code graph analysis (30)

src/gobby/sessions/transcripts/claude.py (1)

src/gobby/sessions/transcripts/base.py (3)

extract_last_messages (68-81)

extract_turns_since_clear (83-101)

is_session_boundary (103-113)

src/gobby/tasks/issue_extraction.py (3)

src/gobby/tasks/expansion.py (1)

_extract_json (287-289)

src/gobby/mcp_proxy/importer.py (1)

_extract_json (389-409)

src/gobby/tasks/validation_models.py (1)

Issue (30-89)

src/gobby/workflows/engine.py (1)

src/gobby/workflows/definitions.py (1)

WorkflowState (94-130)

src/gobby/sessions/transcripts/codex.py (2)

src/gobby/sessions/transcripts/base.py (3)

extract_last_messages (68-81)

extract_turns_since_clear (83-101)

is_session_boundary (103-113)

src/gobby/sessions/transcripts/claude.py (3)

extract_last_messages (43-82)

extract_turns_since_clear (84-187)

is_session_boundary (189-213)

src/gobby/workflows/actions.py (3)

src/gobby/workflows/autonomous_actions.py (6)

detect_stuck (178-220)

detect_task_loop (142-175)

get_progress_summary (258-286)

record_progress (94-139)

start_progress_tracking (20-49)

stop_progress_tracking (52-91)

src/gobby/workflows/stop_signal_actions.py (3)

check_stop_signal (17-83)

clear_stop_signal (143-163)

request_stop (109-140)

src/gobby/servers/websocket.py (1)

broadcast_autonomous_event (638-670)

src/gobby/adapters/base.py (2)

src/gobby/adapters/codex.py (5)

translate_to_hook_event (911-1006)

translate_to_hook_event (1192-1253)

translate_from_hook_response (1008-1027)

translate_from_hook_response (1255-1273)

handle_native (1275-1292)

src/gobby/hooks/events.py (2)

HookEvent (68-107)

HookResponse (111-136)

src/gobby/autonomous/stuck_detector.py (2)

src/gobby/autonomous/progress_tracker.py (3)

get_summary (234-316)

is_stagnant (318-332)

get_recent_events (417-447)

src/gobby/storage/database.py (2)

execute (63-65)

fetchall (76-79)

src/gobby/memory/manager.py (3)

src/gobby/mcp_proxy/schema_hash.py (1)

get_stats (347-384)

src/gobby/mcp_proxy/semantic_search.py (1)

get_embedding_stats (366-406)

src/gobby/memory/semantic_search.py (1)

get_embedding_stats (361-399)

src/gobby/servers/routes/mcp.py (3)

src/gobby/servers/routes/dependencies.py (1)

get_metrics_manager (115-118)

src/gobby/mcp_proxy/metrics.py (2)

ToolMetricsManager (71-620)

get_metrics (156-217)

src/gobby/servers/http.py (1)

_resolve_project_id (234-270)

src/gobby/storage/compaction.py (1)

src/gobby/memory/manager.py (1)

get_stats (316-350)

src/gobby/cli/workflows.py (2)

src/gobby/agents/runner.py (1)

session_id (137-139)

src/gobby/mcp_proxy/lazy.py (1)

get_state (212-222)

src/gobby/tasks/spec_parser.py (5)

src/gobby/storage/sessions.py (1)

to_dict (71-95)

src/gobby/storage/worktrees.py (1)

to_dict (59-73)

src/gobby/mcp_proxy/metrics.py (1)

to_dict (52-68)

src/gobby/memory/extractor.py (1)

to_dict (47-54)

src/gobby/agents/registry.py (1)

to_dict (71-86)

tests/agents/test_context_integration.py (3)

tests/workflows/test_webhook_workflow_integration.py (1)

mock_session_manager (128-132)

tests/hooks/test_plugins.py (1)

mock_session_manager (639-643)

tests/workflows/test_plugin_action_workflow.py (1)

mock_session_manager (129-133)

scripts/cleanup_skills.py (3)

src/gobby/storage/skills.py (1)

LocalSkillManager (69-243)

src/gobby/storage/database.py (1)

LocalDatabase (22-166)

tests/memory/test_extractor.py (1)

db (15-19)

src/gobby/servers/websocket.py (3)

src/gobby/autonomous/stop_registry.py (1)

signal_stop (60-110)

tests/servers/test_http_server.py (1)

signal_stop (949-954)

tests/servers/test_sessions_routes.py (1)

signal_stop (707-712)

src/gobby/workflows/templates.py (1)

tests/workflows/test_webhook_workflow_integration.py (1)

render (140-147)

src/gobby/sessions/transcripts/base.py (3)

src/gobby/sessions/transcripts/claude.py (3)

extract_last_messages (43-82)

extract_turns_since_clear (84-187)

is_session_boundary (189-213)

src/gobby/sessions/transcripts/codex.py (3)

extract_last_messages (29-41)

extract_turns_since_clear (43-48)

is_session_boundary (50-51)

src/gobby/sessions/transcripts/gemini.py (3)

extract_last_messages (36-59)

extract_turns_since_clear (61-69)

is_session_boundary (71-76)

src/gobby/autonomous/stop_registry.py (1)

src/gobby/storage/database.py (4)

LocalDatabase (22-166)

execute (63-65)

fetchone (71-74)

fetchall (76-79)

src/gobby/storage/sessions.py (1)

src/gobby/storage/database.py (3)

safe_update (81-141)

fetchall (76-79)

fetchone (71-74)

src/gobby/storage/mcp.py (1)

src/gobby/storage/database.py (1)

execute (63-65)

src/gobby/skills/learner.py (2)

src/gobby/llm/codex.py (1)

generate_text (349-376)

src/gobby/llm/claude.py (1)

generate_text (397-455)

src/gobby/cli/sessions.py (3)

src/gobby/mcp_proxy/tools/session_messages.py (2)

_format_turns_for_llm (101-121)

create_handoff (291-520)

src/gobby/sessions/summary.py (1)

_format_turns_for_llm (395-426)

src/gobby/sessions/transcripts/claude.py (3)

ClaudeTranscriptParser (19-335)

extract_turns_since_clear (84-187)

extract_last_messages (43-82)

src/gobby/mcp_proxy/stdio.py (1)

src/gobby/mcp_proxy/tools/internal.py (1)

tool (75-140)

src/gobby/workflows/hooks.py (1)

src/gobby/hooks/events.py (1)

HookEvent (68-107)

src/gobby/sessions/summary.py (3)

src/gobby/cli/sessions.py (1)

_format_turns_for_llm (28-48)

src/gobby/mcp_proxy/tools/session_messages.py (1)

_format_turns_for_llm (101-121)

src/gobby/workflows/actions.py (1)

_format_turns_for_llm (836-838)

examples/plugins/code_guardian.py (1)

src/gobby/hooks/events.py (2)

HookEvent (68-107)

HookResponse (111-136)

src/gobby/sessions/transcripts/gemini.py (3)

src/gobby/sessions/transcripts/base.py (3)

extract_last_messages (68-81)

extract_turns_since_clear (83-101)

is_session_boundary (103-113)

src/gobby/sessions/transcripts/claude.py (3)

extract_last_messages (43-82)

extract_turns_since_clear (84-187)

is_session_boundary (189-213)

src/gobby/sessions/transcripts/codex.py (3)

extract_last_messages (29-41)

extract_turns_since_clear (43-48)

is_session_boundary (50-51)

src/gobby/workflows/evaluator.py (2)

src/gobby/workflows/webhook_executor.py (2)

WebhookExecutor (58-399)

json_body (34-46)

src/gobby/autonomous/stop_registry.py (1)

has_pending_signal (143-153)

src/gobby/mcp_proxy/tools/session_messages.py (2)

src/gobby/sessions/summary.py (1)

_format_turns_for_llm (395-426)

src/gobby/sessions/transcripts/claude.py (3)

ClaudeTranscriptParser (19-335)

extract_turns_since_clear (84-187)

extract_last_messages (43-82)

src/gobby/adapters/codex.py (5)

src/gobby/hooks/events.py (1)

HookEvent (68-107)

src/gobby/adapters/base.py (2)

translate_to_hook_event (34-47)

handle_native (64-93)

src/gobby/adapters/claude_code.py (2)

translate_to_hook_event (72-116)

handle_native (193-220)

src/gobby/adapters/gemini.py (2)

translate_to_hook_event (138-214)

handle_native (275-302)

src/gobby/hooks/hook_manager.py (1)

HookManager (72-827)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Test (Python 3.12)
GitHub Check: Test (Python 3.11)
GitHub Check: claude-review
GitHub Check: Test (Python 3.12)
GitHub Check: Test (Python 3.11)

coderabbitai · 2026-01-08T17:28:19Z

 [dependency-groups]
 dev = [
    "bandit>=1.8.0",
+    "black>=24.0.0",


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

latest Black Python formatter version January 2026

💡 Result:

As of January 8, 2026:

Stable release: black 25.12.0. [1]

Preview / pre-release (2026 style): black 26.1a1 (pre-release). [2]

Sources: GitHub releases for psf/black [1], release notes for 26.1a1 preview [2].

Update the Black version constraint to a more recent stable version.

Black 24.0.0 is outdated; the latest stable version as of January 2026 is 25.12.0. Update the constraint to black>=25.0.0 or pin to black>=25.12.0 to benefit from recent bug fixes and improvements.

🤖 Prompt for AI Agents

In @pyproject.toml at line 141, The pyproject dependency line currently pins Black as "black>=24.0.0"; update this constraint to a more recent stable release by replacing that token with either "black>=25.0.0" (looser) or "black>=25.12.0" (recommended/pinned) so the project uses the latest fixes and improvements; ensure any lockfile or CI config referencing the old version is updated to match.

coderabbitai · 2026-01-08T17:28:19Z

+        from gobby.storage.database import LocalDatabase
+
+        db = LocalDatabase()
+        manager = LocalSkillManager(db=db)


⚠️ Potential issue | 🟠 Major

Missing database migrations before use.

The script instantiates LocalDatabase and LocalSkillManager without running migrations. Per the test fixture pattern in tests/memory/test_extractor.py, run_migrations(db) should be called after database initialization to ensure the schema is up-to-date.

Proposed fix

from gobby.storage.database import LocalDatabase +from gobby.storage.migrations import run_migrations db = LocalDatabase() +run_migrations(db) manager = LocalSkillManager(db=db)

🤖 Prompt for AI Agents

In @scripts/cleanup_skills.py around lines 17 - 20, The script creates LocalDatabase and LocalSkillManager without running migrations; after instantiating LocalDatabase (symbol: LocalDatabase) call run_migrations(db) before constructing LocalSkillManager (symbol: LocalSkillManager) so the schema is applied; update the script to import or reference run_migrations and invoke run_migrations(db) right after db = LocalDatabase() and before manager = LocalSkillManager(db=db).

coderabbitai · 2026-01-08T17:28:20Z

+        with self._lock:
+            self.db.execute(
+                """
+                INSERT INTO task_selection_history (
+                    session_id, task_id, selected_at, context
+                ) VALUES (?, ?, ?, ?)
+                """,
+                (
+                    session_id,
+                    task_id,
+                    now.isoformat(),
+                    str(context) if context else None,
+                ),
+            )


⚠️ Potential issue | 🟡 Minor

Use JSON serialization for context instead of Python string representation.

Line 134 uses str(context) to serialize the context dictionary, which produces Python's repr format (e.g., {'key': 'value'}). This creates a mismatch with the deserialization logic in get_selection_history (lines 366-370), which first attempts ast.literal_eval and then falls back to json.loads.

Using str() is fragile and can fail with complex nested structures. Additionally, ast.literal_eval poses a security concern if the database could contain untrusted data.

🔧 Proposed fix

self.db.execute( """ INSERT INTO task_selection_history ( session_id, task_id, selected_at, context ) VALUES (?, ?, ?, ?) """, ( session_id, task_id, now.isoformat(), - str(context) if context else None, + json.dumps(context) if context else None, ), )

Then simplify the deserialization in get_selection_history:

context = None if row["context"]: try: - context = ast.literal_eval(row["context"]) - except (ValueError, SyntaxError): - try: - context = json.loads(row["context"]) - except json.JSONDecodeError: - logger.warning( - f"Failed to parse context for task selection: {row['context'][:100]}" - ) - context = None + context = json.loads(row["context"]) + except json.JSONDecodeError: + logger.warning( + f"Failed to parse context for task selection: {row['context'][:100]}" + ) + context = None

Based on coding guidelines, ensure proper error handling and avoid security risks.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In @src/gobby/autonomous/stuck_detector.py around lines 123 - 136, The INSERT uses str(context) which stores Python reprs; change it to json.dumps(context) (and add/import json) so context is stored as valid JSON in the task_selection_history row, and in get_selection_history replace ast.literal_eval fallback by using json.loads only: treat NULL/None as None, wrap json.loads in a try/except to catch JSONDecodeError and log/handle gracefully, and remove use of ast.literal_eval to avoid security risks; refer to the insert block around the task_selection_history INSERT and the get_selection_history function for where to change code and add error handling.

coderabbitai · 2026-01-08T17:28:20Z

        import machineid

-        return machineid.id()
+        return str(machineid.id())


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's view the complete file context around line 118 cat -n src/gobby/utils/machine_id.py | sed -n '110,135p'

Repository: GobbyAI/gobby

Length of output: 964

🏁 Script executed:

# Check if there's a requirements file or setup.py that specifies the version fd -e "requirements*.txt" -e "setup.py" -e "pyproject.toml" -e "poetry.lock" | head -10 | xargs -I {} sh -c 'echo "=== {} ===" && cat {}'

Repository: GobbyAI/gobby

Length of output: 39

🌐 Web query:

py-machineid library machineid.id() return type documentation

💡 Result:

machineid.id() returns a string (the raw machine GUID). See the py-machineid README and PyPI docs showing id() -> str and notes about the optional winregistry kwarg. [1][2]

Sources

py-machineid GitHub README. [1]

py-machineid on PyPI. [2]

Remove redundant str() conversion on line 118.

The py-machineid library's machineid.id() function already returns a string. The str() wrapper is unnecessary and creates an inconsistency with the intent of similar conversions elsewhere in the code (lines 121, 125, which correctly convert uuid.uuid4() objects that require string conversion).

Replace:

return str(machineid.id())

With:

return machineid.id()

🤖 Prompt for AI Agents

In @src/gobby/utils/machine_id.py at line 118, Remove the unnecessary str() conversion around machineid.id() in the function that returns the machine id in machine_id.py: replace the return that currently wraps machineid.id() with a direct return of machineid.id() (i.e., drop str()), leaving other conversions for uuid.uuid4() untouched; this keeps the return type consistent and avoids redundant conversion.

- Fix cleanup_stale time calculation in stop_registry.py - Fix stuck_detector.py time comparison in detect_task_loop - Fix mcp.py metrics enrichment error handling and tool guards - Fix websocket.py StopSignal attribute references - Move collections import to module level in learner.py - Add Windows skip marker to test_headless_spawner.py - Add integration marker to TestMacOSIntegration - Add integration/slow markers to TestEmbeddedSpawnerUnix - Fix test_cli_daemon.py HOME isolation - Add integration marker to TestExecutorCreationWithConfig - Rename misleading test_concurrent_read_simulation test - Add assertion for clear_session not called in test - Update test_session_actions.py docstring - Fix test_webhook_condition.py assertion for error list

claude · 2026-01-08T17:36:27Z

PR Review: v0.1.1

Thank you for this comprehensive release! This is a major PR (88K additions, 268 files) merging significant autonomous session features from dev to main. Here is my detailed review:

🎯 Summary

This PR introduces autonomous workflow capabilities, including progress tracking, stuck detection, graceful stop signals, and enhanced handoff generation. The features are well-architected and thoroughly tested.

✅ Strengths

1. Excellent Code Quality

Comprehensive test coverage: 1,474 lines of tests just for the autonomous module
Thread-safe implementations: All three autonomous modules use proper locking
Type safety: Proper use of dataclasses, type hints, and enums
Clear separation of concerns: Progress tracking, stop signals, and stuck detection are cleanly separated

2. Strong Architecture

Multi-layer stuck detection (lines 45-63 in stuck_detector.py):
- Layer 1: Task selection loops
- Layer 2: Progress stagnation
- Layer 3: Tool call patterns
Configurable thresholds: All detection limits are parameterized
Database-backed persistence: Proper use of SQLite with migrations

3. Robust Database Migrations

Progressive schema evolution (migrations 1-26+)
Proper use of PRAGMA foreign_keys for safe table alterations
Orphaned data handling with fallback project IDs

4. Security Considerations

JSON parsing is mostly safe (controlled inputs)
Thread safety throughout critical sections
Proper cleanup methods to prevent data leaks

⚠️ Issues & Concerns

1. Critical: Potential Security Risk

Location: stuck_detector.py:366-369

Issue: Using ast.literal_eval() on database-stored strings is risky. While safer than eval(), it can still fail or be exploited if the context field is corrupted.

Current code:

context = ast.literal_eval(row["context"])

Recommendation: Use JSON consistently:

try:
    context = json.loads(row["context"]) if row["context"] else None
except json.JSONDecodeError:
    logger.warning(f"Invalid JSON context")
    context = None

2. Performance: Missing Composite Index

Location: Migration for loop_progress table

Issue: The loop_progress table queries frequently filter by session_id and is_high_value together (progress_tracker.py:262-270), but there is no composite index.

Recommendation: Add this index:

CREATE INDEX idx_loop_progress_session_high_value 
ON loop_progress(session_id, is_high_value);

3. Bug: Time Comparison Type Mismatch

Location: stuck_detector.py:157

Issue: String comparison of ISO8601 timestamps works most of the time, but can fail with timezone edge cases:

cutoff = (datetime.now(UTC) - timedelta(hours=1)).isoformat()
# Later: WHERE selected_at > ?

Recommendation: Use explicit datetime comparison:

AND datetime(selected_at) > datetime(?)

4. Code Smell: Hardcoded Thresholds

Location: progress_tracker.py:101-105

Issue: Critical autonomous behavior thresholds are hardcoded constants instead of being configurable.

Recommendation: Move to config file support.

5. Memory Leak Risk: No Automatic Cleanup

Issue: The autonomous tables (loop_progress, task_selection_history, session_stop_signals) grow indefinitely without automatic cleanup.

Recommendation: Implement periodic cleanup similar to cleanup_stale in stop_registry.py:244.

🔍 Minor Issues

6. Testing Gaps

Missing edge case: concurrent signal_stop + acknowledge race condition
No migration rollback tests
Database corruption recovery scenarios

7. Documentation

New tables lack inline schema comments
Some lines exceed 100 characters

8. Type Safety

progress_tracker.py:444: json.loads() without try/except (marked as safe, but risky)
Consider using TypedDict for details dictionaries

🎓 Best Practices Observed

✅ Excellent exception handling in test cleanup
✅ Proper use of threading locks
✅ Comprehensive docstrings with Args/Returns
✅ Integration tests verifying cross-module behavior
✅ Backward-compatible migrations

📊 Test Coverage

225 test files total
1,474 lines of autonomous tests alone
Thread safety tests included
Integration tests covering workflow scenarios

Estimated coverage: Greater than 80 percent (meets project threshold)

🚀 Recommendations

Before Merge (Critical):

Fix ast.literal_eval() security issue in stuck_detector.py:366
Add composite index for loop_progress queries

Follow-up PRs (Medium Priority):

Add autonomous config to config.yaml
Implement automatic cleanup for autonomous tables
Add migration rollback tests

✅ Approval Recommendation

Approve with minor fixes. This is excellent work overall. The autonomous features are well-designed and thoroughly tested. Address the critical issues above before merging.

🎉 Overall Assessment

Score: 9/10

This is a substantial, well-engineered feature addition. The multi-layer stuck detection is particularly clever, and the thread-safety throughout is commendable. The test coverage is outstanding. Great work!

Review completed

- Updated delete command to accept multiple task refs like enrich - Supports: #1,#2,#3 or #1 #2 #3 formats - Added --yes/-y flag to skip confirmation for batch operations - Shows summary when deleting multiple tasks Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…oped session refs - Add migration 76 to change sessions.seq_num index from global to project-scoped - Add migration 77 to re-backfill session seq_num values per project - Update seq_num assignment in sessions.py to be per-project - Update resolve_session_reference() to accept optional project_id parameter - Update both BASELINE_SCHEMA and BASELINE_SCHEMA_V2 for new databases This enables different projects to have independent session numbering (#1, #2, etc.) matching how tasks already work. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- clones.py: Wrap merge_branch in try/finally so temp ref cleanup (branch -D) runs even if merge raises (#16) - migrations.py: Make migration 147 idempotent with PRAGMA table_info check before ALTER TABLE ADD COLUMN (#25) - pipelines.py: Escape SQL wildcards (%, _) in LIKE patterns with ESCAPE clause (#26) - pipeline_executor.py: Namespace child pipeline outputs under "output" key to prevent clobbering wrapper keys like "status" (#28) - pipeline_heartbeat.py: Catch exceptions in _has_alive_agents to prevent killing the heartbeat loop (#29) - lifecycle_monitor.py: Log swallowed exceptions in orphan completion instead of bare except/pass (#2) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The post-#12138 tests/workflows/test_pipeline_executor.py hit 3,164 lines, violating CLAUDE.md rule #2 (files must stay under 1,000 lines). Split into seven focused modules with shared fixtures lifted into tests/workflows/conftest.py: - test_pipeline_executor_core.py (427 lines) — Init, Execute, StepExecution, ExecStep, PromptStep - test_pipeline_executor_nested.py (534 lines) — nested pipelines, dict-form invoke_pipeline, depth/cycle - test_pipeline_executor_conditions.py (252 lines) — condition evaluation, default input merging - test_pipeline_executor_approvals.py (657 lines) — approval gate, approve/reject - test_pipeline_executor_child_session.py (565 lines) — pipeline child session, MCP session resolution, parent_session_id - test_pipeline_executor_outputs.py (423 lines) — build_outputs, coerce, emit_event, notify_completion, close_session - test_pipeline_executor_errors.py (179 lines) — execute() error paths, wait step Fixture sharing: mock_db, mock_execution_manager, mock_llm_service, mock_template_engine, mock_webhook_notifier, simple_pipeline, pipeline_with_prompt, pipeline_with_inputs are now in tests/workflows/conftest.py. Other workflow test modules (test_mcp_step, test_pipeline_resume, test_summary_actions, test_webhook_executor) define module-local overrides for these names — pytest fixture resolution keeps those isolated. Verified all still pass. Behavior-preserving: 118 tests (same count as post-#12138+#12211 monolith), 0 tests added or removed. One shell command rewritten (exec_step_captures_stderr) from python -c to sh -c to avoid escaped-quote JSON-proxy issues — same stderr capture semantics, no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… callers New shared primitive at web/src/components/ui/SegmentedControl.tsx, next to the existing Switch primitive. Generic over a string-union value type, full radiogroup keyboard nav (ArrowLeft/Right wrap, Home jumps to first, End jumps to last), per-option onClick escape hatch for callers that need a side effect alongside state mutation, and locked-in subdued bg-accent/15 text-accent active styling — saturated bg-accent fill is intentionally not exposed as a variant because that treatment was a Pass 3 anti-pattern, not a useful option. Two sizes: sm (~23px, default, text-xs px-2 py-1) and md (~28px, text-sm px-2.5 py-1.5). Wider variants can be added when a third size has a real caller, not preemptively. Test suite at web/src/components/ui/__tests__/SegmentedControl.test.tsx covers click, per-option onClick co-firing, ArrowLeft/Right wrap, Home/End, disabled state, and an active-class regression guard so a future "let's add a saturated variant" PR can't slip through unnoticed. Migrated four simple callers, dropping ~50 lines of duplicated class chains and hand-rolled keyboard plumbing: - chat/ModeSelector.tsx — Plan/Act/Auto, was already subdued, now keyboard-navigable for free. - dashboard/TimeRangePills.tsx — was using saturated bg-accent fill, now demoted to subdued tint (fixes accent-violation #1). - activity/PipelinesTab.tsx — was using saturated bg-accent + a 30+ line hand-rolled roving-tabindex implementation. Now subdued + nav comes from the primitive (fixes #2 + drops the keyboard helpers). - workflows/ReportsPage.tsx — was saturated, now subdued (fixes #3). ProjectSelector and the SessionsTab Live/Expired select land in the next commit. The activity-panel header layout fix lands in commit 3.

…imit Pull search, indexing/lifecycle, and cross-reference logic out of the 1,702-line MemoryManager god-class into three focused services under src/gobby/memory/services/. MemoryManager keeps every public method as a thin facade wired through the new services, so all external callers and test surfaces (including the static MemoryManager._rrf_merge / _rrf_scores helpers and the patched _fetch_all_project_memories / _search_graph_for_memories hooks) keep working. CrossrefRebuildError moves to services/crossref.py and is re-exported from manager for backward compatibility. The huge search_memories body — formerly 312 lines with an inline build_results closure — is now split into _search_with_graph, _search_qdrant_fts, and a shared _build_results method on SearchService. manager.py drops from 1,702 to 920 lines, well under the 1,000-line monolith threshold from CLAUDE.md guideline #2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

joshwilhelmi added 30 commits January 7, 2026 15:48

[gt-c23ff1] feat: add tool_name parameter to reset_metrics

98c960d

- Add tool_name parameter to ToolMetricsManager.reset_metrics() - Add tool_name parameter to reset_metrics MCP tool - Allows resetting metrics for a specific tool, not just server-wide

[gt-c23ff1] feat: add reset_tool_metrics admin MCP tool

7b9ad92

- Add new reset_tool_metrics(server_name, tool_name) MCP tool - Provides focused interface for resetting specific tool metrics - Complements existing reset_metrics() which accepts project_id

[gt-42d58e] fix: rename test_git.py to avoid import collision

f1e80ed

Renamed tests/utils/test_git.py to tests/utils/test_utils_git.py to avoid conflict with tests/worktrees/test_git.py

[gt-1b9bd4] chore: simplify ROADMAP.md sprint tables

eb9e6c1

- Remove sprint numbering from completed sprints table - Update POST_MVP_ENHANCEMENTS reference to docs/plans/enhancements.md - Simplify completed sprints to just Focus and Plan Reference columns

joshwilhelmi added 5 commits January 8, 2026 09:13

[gt-dce99b] fix: resolve additional pre-commit linting errors

49e9cbc

- Fix E741 ambiguous variable names (l -> line) - Fix E402 imports not at top of file (move imports) - Fix B007 unused loop control variables (_index, _name)

[gt-64df15] fix: rename shadowed loop variable to fix mypy type error

21252b7

Renamed 'f' to 'file_path' in sessions.py to avoid shadowing the file handle variable from earlier in the file, which was causing a mypy type mismatch error.

chore: formatting, strict type fixes, and ci prep

315e1a3

fix: ruff import order and formatting

8d608ea

joshwilhelmi requested a review from Copilot January 8, 2026 16:01

joshwilhelmi self-assigned this Jan 8, 2026

Copilot AI reviewed Jan 8, 2026

View reviewed changes

coderabbitai Bot reviewed Jan 8, 2026

View reviewed changes

joshwilhelmi added 6 commits January 8, 2026 10:54

Fix remaining code issues and test failures

d5356fd

fix: resolve remaining lint, type, and test issues

9637faf

style: apply pre-commit formatting fixes

e02b26c

fix: resolve remaining mypy errors

5529f40

Fix documentation lint errors (GEMINI.md, ROADMAP.md) [gt-fc4070] [gt…

cdd0fd2

…-2321c7]

coderabbitai Bot reviewed Jan 8, 2026

View reviewed changes

joshwilhelmi merged commit 2ddc509 into main Jan 8, 2026
3 of 14 checks passed

coderabbitai Bot mentioned this pull request May 18, 2026

0.4.6 release #15

Merged

3 tasks

coderabbitai Bot mentioned this pull request May 26, 2026

Release v0.4.9 #18

Merged

Conversation

joshwilhelmi commented Jan 8, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jan 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

.coderabbit.yaml has unrecognized properties

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

claude Bot commented Jan 8, 2026

PR Review: v0.1.1

Summary

Code Quality and Best Practices

Strengths

Areas for Improvement

Potential Bugs

Performance Considerations

Security Concerns

Test Coverage

Architecture and Design

Recommendations Summary

High Priority

Medium Priority

Low Priority

Conclusion

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

claude Bot commented Jan 8, 2026

PR Review: v0.1.1

joshwilhelmi commented Jan 8, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jan 8, 2026 •

edited

Loading

`.coderabbit.yaml` has unrecognized properties