v0.1.1#2
Conversation
- Add language specifier to code block in tasks.md - Move lifecycle workflow validation before session creation in runner.py - Add empty command validation in embedded.py spawner - Use full app_path instead of app_name in macos.py spawner - Fix PowerShell command injection in windows.py spawner - Skip None values in verification_dict loop in init.py - Fix hook detection to only check GOBBY_HOOK_START marker - Fix xargs filename handling for spaces in git_hooks.py - Add configurable limit parameter to reregister_active_sessions - Log JSONDecodeError with context in codex_executor.py - Convert search_messages to async in session_messages.py - Add direction validation in task_sync.py - Move cast and WorkflowLoader imports to module level in worktrees.py - Use Literal type for provider parameter in worktrees.py - Remove unreachable if-check in tasks.py - Move auto-transition inside transaction in tasks.py - Add robust func_name extraction with regex in expansion.py - Catch ValueError in validate_workflow_for_agent in loader.py
… status - Update Sprint 10, 12, 21, 22, 30 statuses - Update Milestones 5, 7, 8, 12 with completion details - Fix terminology: Phase-based → Step-based workflow enforcement - Document remaining Task V2 items (git hooks, CLI commands) - Document remaining Worktree items (merge resolution)
All hook events are integrated with workflows: - on_session_start, on_session_end, on_stop, on_pre_compact - on_before_tool (aliased from on_tool_call) - on_after_tool (aliased from on_tool_result) - on_before_agent (aliased from on_prompt_submit) TRIGGER_ALIASES in engine.py provides backward compatibility for legacy names.
All Session Message Tracking sprints are fully implemented: - 7.1: Database schema, LocalSessionMessageManager, ParsedMessage - 7.2: SessionMessageProcessor with byte-offset polling - 7.3: Runner/HookManager integration, WebSocket broadcasting
…ling - Clarify schema descriptions: skip_validation for bypassing LLM validation, no_commit_needed only for tasks with no code changes - Add summarize_diff_for_validation() to ensure all file names are visible in large diffs by prioritizing file list over full content - Add is_doc_only_diff() to detect documentation-only changes - Auto-skip LLM validation for doc-only changes (.md, .txt, .rst, etc.) - Add tests for new functions
The gobby tasks hooks command duplicated functionality already in cli/installers/git_hooks.py with an inferior implementation (no hook chaining, no backups, no pre-commit framework support). - Delete src/gobby/cli/tasks/hooks.py - Remove hooks_cmd from main.py registration - Update docs to reference gobby install instead - Mark git hook integration complete in ROADMAP.md
Mark session chaining and task-driven work loops as complete: - autonomous-loop.yaml workflow exists and is functional - autonomous-task.yaml provides task-driven execution - start_new_session action implemented in session_actions.py Remaining work: multi-surface stop signals, progress tracking with stuck detection
…orkflows' Consistent with other CLI commands (tasks, sessions, agents) and the MCP server naming (gobby-workflows). Updated: - src/gobby/cli/workflows.py: Renamed function workflow() to workflows() - src/gobby/cli/__init__.py: Updated import and add_command - CLAUDE.md, AGENTS.md, ROADMAP.md: Updated documentation - docs/guides/workflows.md: Updated CLI reference - docs/architecture/cli-commands.md: Updated command reference - docs/plans/completed/WORKFLOWS.md: Updated historical docs - docs/examples/workflows/*.yaml, README.md: Updated examples
- Add get_failing_tools() to ToolMetricsManager with failure rate query - Add get_failing_tools MCP tool to gobby-metrics server - Add include_metrics parameter to list_tools HTTP endpoint - Add get_metrics_manager dependency for FastAPI routes - Update ROADMAP.md to mark Sprints 12-15 as complete All MCP Proxy Enhancement milestones are now complete: - Sprint 12: Tool Metrics (get_failing_tools, include_metrics) - Sprint 13: Lazy Server Init (already implemented) - Sprint 14: Semantic Tool Search (already implemented) - Sprint 15: Self-Healing & Incremental Indexing (already implemented)
- Add tool_name parameter to ToolMetricsManager.reset_metrics() - Add tool_name parameter to reset_metrics MCP tool - Allows resetting metrics for a specific tool, not just server-wide
- Add new reset_tool_metrics(server_name, tool_name) MCP tool - Provides focused interface for resetting specific tool metrics - Complements existing reset_metrics() which accepts project_id
- Sprint 20 (Session Management): Pending → ✅ Complete All session MCP tools are implemented: get_session, get_current_session, list_sessions, session_stats, create_handoff, get_handoff_context, get_session_commits, pickup, mark_loop_complete - Sprint 29 (Autonomous Work Loop): Pending → 🔶 Partial Matches Milestone 11 which shows session chaining and task-driven work loops as complete, with stop signals and progress tracking remaining
… workflows - Add src/gobby/autonomous/stop_registry.py with StopRegistry class - Thread-safe stop signal management - signal_stop(), get_signal(), has_pending_signal(), acknowledge(), clear() - Database-backed persistence with session_stop_signals table - Add database migration (v37) for session_stop_signals table - Foreign key to sessions table with CASCADE delete - Indexed for pending signal lookups - Add workflow actions (check_stop_signal, request_stop, clear_stop_signal) - Integrated into ActionExecutor with stop_registry parameter - Workflow variables updated with signal info - Add has_stop_signal() condition function to evaluator - Can be used in workflow transition conditions - Registered via evaluator.register_stop_registry() - Wire up StopRegistry in HookManager - Passed to ActionExecutor - Registered with ConditionEvaluator
- Add src/gobby/autonomous/progress_tracker.py with ProgressTracker class - Track progress events by type (file_modified, task_completed, etc.) - Classify events as high-value or low-value - Stagnation detection based on time and event counts - Add database migration (v38) for loop_progress table - Indexed for efficient session queries - Separate index for high-value event lookups - Export ProgressTracker, ProgressType, ProgressEvent, ProgressSummary from gobby.autonomous module
- Add src/gobby/autonomous/stuck_detector.py with StuckDetector class - Layer 1: Task selection loop detection - Layer 2: Progress stagnation detection (via ProgressTracker) - Layer 3: Tool call pattern detection - Add database migration (v39) for task_selection_history table - Tracks task selections by session for loop detection - Indexed for efficient lookups - Add workflow actions in autonomous_actions.py: - start_progress_tracking / stop_progress_tracking - record_progress / get_progress_summary - detect_task_loop / detect_stuck - record_task_selection - Wire up ProgressTracker and StuckDetector in HookManager - Register all actions in ActionExecutor Note: check_stop_signal was already implemented in gt-f29c73
Add REST API endpoints for stop signal management:
- POST /sessions/{session_id}/stop - Signal a session to stop gracefully
- GET /sessions/{session_id}/stop - Check for pending stop signal
- DELETE /sessions/{session_id}/stop - Clear a stop signal
These endpoints integrate with the StopRegistry through HookManager,
allowing external systems (dashboards, monitoring) to request graceful
shutdown of autonomous sessions.
Includes comprehensive tests for all endpoints and error cases.
Add WebSocket integration for autonomous loop monitoring: - Add broadcast_autonomous_event() method to WebSocketServer for events: - loop_started/stopped: Progress tracking lifecycle - task_started: Task selected for work - progress_recorded: High-value progress events - stuck_detected: Loop stuck conditions detected - stop_requested: External stop signal received - Add stop_request message handler to WebSocket server for clients to request session stops via WebSocket in addition to HTTP - Update ActionExecutor with websocket_server parameter to enable event broadcasting from autonomous workflow actions - Wire up stop_registry to WebSocket server for stop_request handling
Enable conditional branching in workflows based on webhook responses: - Add `webhook` condition type to ConditionEvaluator.check_exit_conditions() - Add evaluate_webhook_conditions() async method for pre-evaluating webhooks - Support checking webhook response status codes, body content, JSON fields - Allow storing webhook results in workflow variables (store_as option) - Register webhook_executor method for enabling webhook conditions Webhook condition config options: - url, method, headers, payload, timeout - expect_success: Check response is 2xx (default true) - status_code: Check specific status code or list of codes - body_contains: Check body contains string - json_field: Check JSON field value (dot notation) - json_value: Expected value for json_field - store_as: Store full result in named variable Includes comprehensive tests for all webhook condition features.
Add agent mode for external task validation that spawns an isolated agent instance instead of using direct LLM API calls. Changes: - Add external_validator_mode config option (llm|agent, default: llm) - Add AgentRunner-based validation in external_validator.py - Wire external validation into close_task() flow - Pass agent_runner through create_task_registry → setup_internal_registries - Add 5 new tests for agent mode validation The agent mode provides more thorough validation as the agent can use tools to read files, run tests, and verify implementation details.
…ions - Add 2,314 new tests (from 3,413 to 5,727) - Add ~20,000 lines of test code across 30 files - Fix all originally failing tests - Add tests for: storage, hooks, llm, sessions, workflows, mcp_proxy, agents, memory, tasks Coverage improvements: - hooks/event_handlers.py: +coverage - hooks/hook_manager.py: +coverage - hooks/plugins.py: +coverage - memory/extractor.py: +coverage - llm/claude.py: +coverage - sessions/summary.py: +coverage - storage/mcp.py: +coverage - sync/tasks.py: +coverage - runner.py: +coverage - workflows/actions.py: +coverage - worktrees/git.py: +coverage - And many more modules Target of 80% requires additional work on hard-to-test adapter, CLI, and spawner modules.
Add test file tests/agents/spawners/test_windows_spawner.py with 83 tests covering all Windows spawner classes: - WindowsTerminalSpawner: 18 tests for is_available() and spawn() - CmdSpawner: 15 tests for is_available() and spawn() - PowerShellSpawner: 18 tests including pwsh/powershell fallback - WSLSpawner: 18 tests including Windows-to-WSL path conversion Additional test coverage: - Security tests for command injection prevention - Edge case tests for paths with spaces and special characters - Environment variable handling and validation - Platform-specific skip decorators for Windows-only integration tests All tests use mocked Windows APIs to allow running on any platform. Module coverage: 99%
- Complete rewrite of test_loader.py with 49 tests - Cover all functions: WorkflowLoader, _find_workflow_file, _merge_workflows, _merge_steps, discover_lifecycle_workflows, _scan_directory, clear_discovery_cache, validate_workflow_for_agent - Test workflow loading scenarios: valid YAML, invalid YAML, caching, project paths - Test inheritance: valid chains, self-cycles, two-way cycles, three-level cycles - Test error handling: missing workflows, missing parents, circular inheritance - Test edge cases: empty YAML, files vs directories in scan, priority sorting - Achieve 100% code coverage for loader.py (up from 82%)
- Added 66 tests covering all functions in git_hooks.py - Tests cover install_git_hooks, uninstall_git_hooks, and helper functions - Tests include different input scenarios: empty files, existing hooks, pre-commit framework hooks - Tests cover error handling: backup failures, subprocess timeouts, missing directories - Tests mock file system operations and git operations appropriately - Achieved 100% code coverage for git_hooks.py (up from 12%)
- Add 65 tests covering all commands: start, list, show, status, cancel, stats, cleanup - Test success cases, error handling, JSON output, and edge cases - Mock external dependencies (database, HTTP calls, daemon connection) - Use Click's CliRunner for CLI command testing - Coverage for agents.py increased from 15% to 99%
- Add comprehensive tests for CLI modules (daemon, agents, extensions, init, install) - Add tests for CLI installers (shared, claude, codex, gemini, git_hooks, antigravity) - Add tests for agents/spawners/windows.py and agents/tty_config.py - Add tests for adapters/base.py - Add tests for workflows (actions, mcp_actions, todo_actions, llm_actions, git_utils, etc.) - Add tests for mcp_proxy modules (actions, services, transports) - Add tests for utils (git.py, project_context.py, project_init.py) - Rename conflicting test files to avoid import collisions - Coverage increased from 74% to 80.51% - Total tests: 7,239 (up from 5,914)
Renamed tests/utils/test_git.py to tests/utils/test_utils_git.py to avoid conflict with tests/worktrees/test_git.py
…hared.py, codex.py, and memory_actions.py - Fix test_remove_json_write_error in test_shared.py (99% coverage) - Create 35 tests for codex installer module (100% coverage) - Add edge case test for memory_actions.py (100% coverage)
- Consolidate completed sprints into single table (sprints 1-17) - Rename 'Testing & Recovery' to 'End-to-End Testing' for clarity - Move Testing (18) and Documentation (19) to 'Pending (Final Polish)' section - Ensure all sprints are in numerical order
- Remove sprint numbering from completed sprints table - Update POST_MVP_ENHANCEMENTS reference to docs/plans/enhancements.md - Simplify completed sprints to just Focus and Plan Reference columns
- Remove sprint numbering from completed sprints - Rename 'Post-MVP Sprints' to 'Remaining Sprints' with simplified layout - Move Session Management Tools and Subagent System to completed - Add remaining task notes for partial items (Task V2, Worktree, Autonomous) - Move Testing and Documentation to end as 'Production Ready' milestone - Simplify Milestones section (Completed vs Remaining) - Clean up Parallel Tracks and Recommendations sections
…files - Add assertions for unused result variables where testing behavior was incomplete - Remove dead code (unused variables like call_count, original_shutdown) - Remove assignments where only side effects matter (e.g., GobbyRunner constructor) - Add proper assertions for spawn results, connect_all results, and cleanup counts
- Fix E741 ambiguous variable names (l -> line) - Fix E402 imports not at top of file (move imports) - Fix B007 unused loop control variables (_index, _name)
Renamed 'f' to 'file_path' in sessions.py to avoid shadowing the file handle variable from earlier in the file, which was causing a mypy type mismatch error.
|
Caution Review failedThe pull request is closed. Note
|
| Cohort / File(s) | Summary |
|---|---|
Autonomous core src/gobby/autonomous/__init__.py, src/gobby/autonomous/progress_tracker.py, src/gobby/autonomous/stop_registry.py, src/gobby/autonomous/stuck_detector.py |
New package: progress tracking (event recording, stagnation detection), thread-safe stop-signal registry, task-selection history and multi-layer stuck detection with DB persistence and public APIs. |
Workflow integrations src/gobby/workflows/autonomous_actions.py, src/gobby/workflows/stop_signal_actions.py, src/gobby/workflows/actions.py, src/gobby/workflows/evaluator.py |
New workflow actions and evaluator wiring for progress tracking, recording progress/task selections, detecting loops/stuck, and stop-signal handling; ActionExecutor now accepts stop_registry/progress_tracker/stuck_detector/websocket_server. |
Server & realtime API src/gobby/servers/websocket.py, src/gobby/servers/http.py, src/gobby/servers/routes/sessions.py, src/gobby/servers/routes/dependencies.py, src/gobby/servers/routes/mcp.py |
WebSocket supports stop_request messages and broadcasts autonomous events; HTTP startup wires stop_registry into HookManager; new session endpoints to signal/query/clear stop signals; DI for metrics_manager and optional tool metrics enrichment on MCP endpoints. |
HookManager & executor wiring src/gobby/hooks/hook_manager.py, src/gobby/workflows/actions.py |
HookManager initializes autonomous components and passes them into ActionExecutor; ActionExecutor registers and implements stop/autonomous actions and non-blocking WebSocket broadcasts. |
Database schema & storage src/gobby/storage/migrations.py, src/gobby/storage/tasks.py, src/gobby/storage/database.py, src/gobby/storage/sessions.py, src/gobby/storage/* |
New migrations for session_stop_signals, loop_progress, task_selection_history; task column renamed to task_type usage; safe_update helper; storage methods refactored to use safe_update; added nosec annotations. |
Agent runner & task registry src/gobby/agents/runner.py, src/gobby/mcp_proxy/registries.py, src/gobby/mcp_proxy/tools/tasks.py |
prepare_run now validates/loads workflows before creating child sessions and initializes workflow state conditionally; create_task_registry and task tools wired to accept optional agent_runner for external/agent-mode validation. |
External validation & commit helpers src/gobby/tasks/external_validator.py, src/gobby/tasks/commits.py, src/gobby/config/tasks.py |
External validation supports "llm" and new "agent" modes via AgentRunner; added is_doc_only_diff and summarize_diff_for_validation to pre-process diffs; TaskValidationConfig gets external_validator_mode field. |
Handoff / session summaries src/gobby/cli/sessions.py, src/gobby/mcp_proxy/tools/session_messages.py |
New create-handoff CLI/tool with compact/full modes, LLM-based full summaries, file/db persistence, transcript formatting helper and expanded options for output and paths. |
CLI and docs src/gobby/cli/workflows.py, src/gobby/cli/__init__.py, docs/**, AGENTS.md, CLAUDE.md, GEMINI.md, ROADMAP.md, docs/examples/** |
Renamed workflow CLI group to workflows; updated many docs/examples to pluralized commands; removed CLI git-hooks module and consolidated hook install instructions; added GEMINI.md and large roadmap/docs edits. |
MCP metrics & tools src/gobby/mcp_proxy/metrics.py, src/gobby/mcp_proxy/tools/metrics.py, src/gobby/servers/routes/mcp.py |
Added ToolMetricsManager.get_failing_tools; reset_metrics accepts tool_name; MCP tools expose get_failing_tools and reset_tool_metrics; list_all_mcp_tools can include metrics enrichment. |
Tests added/expanded tests/** (many files) |
Large additions/expansions of unit and integration tests covering adapters (Codex/Gemini), spawners (Embedded/Headless/Windows), agent runner/registry, spawn helpers, TTY config, autonomous package scaffolding, and more. |
| Type hints, linting, small refactors many src/gobby/* files |
Numerous typing refinements, signature narrowings, minor refactors, and linter/nosec annotations across adapters, sessions, storage, utils, and workflows — mostly non-functional. |
Metadata & utilities .gobby/tasks_meta.json, scripts/cleanup_skills.py, .gitignore, pyproject.toml |
Metadata timestamp/hash bump; new cleanup_skills script; .gitignore updated to ignore .scripts/; pyproject updated with tooling and dev deps. |
Sequence Diagram(s)
sequenceDiagram
participant Session
participant ProgressTracker
participant DB
participant StuckDetector
participant WebSocket
Session->>ProgressTracker: record_tool_call(session_id, tool_name, result)
ProgressTracker->>DB: INSERT loop_progress event
ProgressTracker->>ProgressTracker: infer is_high_value
ProgressTracker->>DB: query recent events (get_summary)
ProgressTracker->>StuckDetector: provide summary / recent events
StuckDetector->>DB: read task_selection_history / loop patterns
alt stuck_detected
StuckDetector->>DB: (optional) persist detection result
StuckDetector->>WebSocket: broadcast_autonomous_event(stuck_detected, session_id)
end
sequenceDiagram
participant Client
participant WebSocket
participant StopRegistry
participant DB
participant WorkflowExecutor
Client->>WebSocket: send stop_request(session_id, reason, source)
WebSocket->>StopRegistry: signal_stop(session_id, source, reason)
StopRegistry->>DB: upsert session_stop_signals
StopRegistry-->>WebSocket: return StopSignal
WebSocket-->>Client: stop_response
WebSocket->>WebSocket: broadcast_autonomous_event(stop_requested, session_id)
Note over WorkflowExecutor,StopRegistry: During workflow evaluation
WorkflowExecutor->>StopRegistry: has_pending_signal(session_id)
alt pending
StopRegistry-->>WorkflowExecutor: True
WorkflowExecutor->>StopRegistry: acknowledge(session_id)
WorkflowExecutor->>WorkflowExecutor: inject stop handling / graceful shutdown
end
Estimated code review effort
🎯 5 (Critical) | ⏱️ ~120 minutes
Possibly related PRs
- v0.1.1 #2 — appears closely related: overlapping additions of the autonomous subsystem (progress_tracker/stop_registry/stuck_detector), workflow CLI rename to "workflows", DB migrations, and edits in agents/runner.py and workflows/actions.py.
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | Title 'v0.1.1' is a version identifier, which is vague and does not meaningfully describe the primary changes in the changeset. | Use a more descriptive title that highlights the main purpose of the changes, such as 'Release v0.1.1: database schema refactor and comprehensive type safety improvements' or similar. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 93.69% which is sufficient. The required threshold is 80.00%. |
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing touches
- 📝 Generate docstrings
📜 Recent review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
.gobby/tasks.jsonl.gobby/tasks_meta.jsonGEMINI.mdROADMAP.mdsrc/gobby/servers/routes/mcp.pytests/cli/test_cli_daemon.pytests/llm/test_resolver.pytests/utils/test_project_context.pytests/workflows/test_actions_coverage.pytests/workflows/test_session_actions.pytests/workflows/test_webhook_condition.py
Comment @coderabbitai help to get the list of available commands and usage tips.
There was a problem hiding this comment.
Pull request overview
This PR consolidates changes to bring the main branch up to date with v0.1.1. The changes include significant additions to the testing infrastructure for agents, spawners, and related functionality, as well as a configuration update for the tasks registry initialization.
Key changes:
- Added comprehensive test suites for terminal spawners (cross-platform, macOS, Linux, Windows)
- Added tests for agent runner, registry, and spawn functionality
- Updated MCP proxy registries to pass agent_runner parameter
Reviewed changes
Copilot reviewed 63 out of 160 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| tests/autonomous/init.py | Added module docstring for autonomous session management tests |
| tests/agents/test_tty_config.py | New comprehensive test suite for terminal configuration functionality |
| tests/agents/test_spawners.py | New comprehensive test suite for terminal spawner implementations |
| tests/agents/test_spawn.py | Extended tests for spawn functionality including prompt file management and terminal preparation |
| tests/agents/test_runner.py | Added tests for agent runner methods including get_run, list_runs, cancel_run, and workflow handling |
| tests/agents/test_registry.py | New test suite for running agent registry functionality |
| tests/agents/spawners/test_windows_spawner.py | New comprehensive test suite for Windows terminal spawners |
| tests/agents/spawners/test_embedded_spawner.py | New comprehensive test suite for embedded PTY spawner |
| tests/agents/spawners/init.py | Added module docstring for spawner tests |
| src/gobby/mcp_proxy/registries.py | Added agent_runner parameter to tasks registry initialization |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/gobby/mcp_proxy/tools/tasks.py (1)
785-815: Schema docs overstateoverride_justificationrequirement vs implementationThe input schema descriptions say:
skip_validation: “Provide override_justification explaining why.”override_justification: “Required when skip_validation=True or no_commit_needed=True.”no_commit_needed: “Requires override_justification.”In code, a hard requirement is enforced only when:
requires_commit_checkis True,- there are no linked commits,
- and
no_commit_neededis True.
skip_validation=Trueis not currently enforced to requireoverride_justification; the close can proceed withoverride_justification=None.Either:
- Enforce the documented requirement for
skip_validation(returning a structured error when it’s True and justification is missing), or- Loosen the schema text to reflect actual behavior to avoid confusing agents/tools using this schema.
🤖 Fix all issues with AI agents
In @src/gobby/autonomous/progress_tracker.py:
- Around line 437-445: The comprehension building ProgressEvent objects in
ProgressTracker (the list returned where ProgressEvent(...,
details=eval(row["details"]) if row["details"] else {}, ...)) is using eval
which is unsafe; replace eval with a safe deserializer such as
ast.literal_eval(row["details"]) or json.loads(row["details"]) depending on how
details were serialized, and update any writers to use json.dumps when storing
JSON; ensure the change is applied to the details field in the ProgressEvent
construction and any related read/write helpers so stored details are
consistently serialized/deserialized without executing code.
In @src/gobby/autonomous/stop_registry.py:
- Around line 1-8: The file needs to be reformatted to satisfy ruff; run the
formatter against the module (e.g., run "ruff format
src/gobby/autonomous/stop_registry.py" or run ruff format for the repo) to apply
the required whitespace and style changes so the module-level docstring and file
formatting comply with CI checks; commit the reformatted file (stop_registry.py)
and push.
- Around line 245-269: The cutoff calculation in cleanup_stale is wrong because
it sets cutoff to midnight and then subtracts hours; change it to compute a true
"now minus max_age_hours" cutoff: in cleanup_stale compute cutoff =
datetime.now(UTC) - timedelta(hours=max_age_hours) (add timedelta import) and
update the SQL to compare acknowledged_at directly against that cutoff (e.g.,
WHERE datetime(acknowledged_at) < datetime(?) and pass cutoff.isoformat()),
keeping the rest of the deletion logic and logging intact.
In @src/gobby/autonomous/stuck_detector.py:
- Around line 1-7: The file src/gobby/autonomous/stuck_detector.py fails CI
formatting; run the formatter (ruff format
src/gobby/autonomous/stuck_detector.py) or apply ruff-compatible formatting to
the module (ensure the top-level docstring and surrounding whitespace follow
ruff/PEP8 rules) and commit the formatted file so the pipeline passes.
- Around line 130-136: The code is serializing `context` with `str(context)`
which emits Python-style single-quoted strings that break `json.loads()`;
replace `str(context) if context else None` with `json.dumps(context) if context
is not None else None`, add `import json` at the top, and optionally wrap
`json.dumps` in a try/except to fall back to `str(context)` if `context` is not
JSON-serializable; keep the surrounding tuple (session_id, task_id,
now.isoformat(), ...) unchanged so downstream
`json.loads()`/`ast.literal_eval()` can reliably parse the stored value.
In @src/gobby/cli/sessions.py:
- Around line 610-611: The loop variable `f` in the `for f in files_written:`
loop shadows the file handle `f` used earlier in the `with open(path) as f:`
block, causing a mypy type error; rename the loop variable (e.g., `filename` or
`written_path`) and update the `click.echo` call to use the new name
(`click.echo(f" File: {filename}")`) so the file handle symbol is no longer
shadowed.
In @src/gobby/tasks/external_validator.py:
- Around line 188-199: AgentConfig is missing required session context fields so
agent_runner.run() fails during prepare_run; update the external validator to
supply parent_session_id, project_id, and machine_id (e.g., from a
session_context passed into this function) when constructing AgentConfig, or
implement a separate path such as a validate_only mode that skips session
creation and bypasses prepare_run; specifically, modify where AgentConfig is
built in external_validator.py and ensure either AgentConfig contains
parent_session_id/project_id/machine_id or that agent_runner.run() is invoked
via a validation-only execution that does not require prepare_run.
In @tests/adapters/test_base.py:
- Around line 475-609: Add the pytest integration marker to the
TestAdapterIntegration test class: annotate the class TestAdapterIntegration
with @pytest.mark.integration and ensure pytest is imported in the test module
if not already; this will mark all methods (test_session_lifecycle_simulation,
test_tool_blocking_scenario, test_context_injection_scenario) as integration
tests.
In @tests/adapters/test_codex.py:
- Around line 1507-1536: Add the pytest integration marker to the
TestCodexAdapterEventMapping test class by decorating the class with
@pytest.mark.integration so all its tests are categorized as integration tests;
ensure pytest is imported in the test module if not already present. Target the
TestCodexAdapterEventMapping class (and its methods
test_event_map_contains_all_supported_events, test_tool_item_types_complete,
test_session_tracking_events_complete) when applying the decorator.
In @tests/adapters/test_gemini.py:
- Around line 885-1000: The TestIntegration class lacks the integration marker;
add the pytest integration marker by decorating the TestIntegration class with
@pytest.mark.integration (ensure pytest is imported if not already) so all
methods (test_session_lifecycle_roundtrip, test_tool_execution_roundtrip,
test_tool_denied_by_workflow) are treated as integration tests; place the
decorator immediately above the class definition referencing the TestIntegration
symbol.
In @tests/agents/spawners/test_headless_spawner.py:
- Around line 1017-1027: The test test_spawn_with_unicode_in_args should not
accept an unrelated ASCII fallback; remove the fallback check and assert only
that the Unicode substring "\u4e16\u754c" appears in the captured stdout after
calling HeadlessSpawner.spawn (check result.success and
result.process.communicate() as currently done), or alternatively make the test
explicitly skip/mark xfail on environments that cannot support Unicode; update
the assertion to only verify "\u4e16\u754c" in stdout (and keep the existing
result.success check) so the test fails when Unicode is lost rather than
silently passing.
In @tests/agents/test_spawners.py:
- Around line 493-509: The test test_spawn_disables_destroy_unattached should
not assert a non-existent pid because TmuxSpawner.spawn() returns SpawnResult
with pid=None on success; update the test to remove or change the pid assertion
(e.g., assert result.pid is None or simply omit the pid check), keep asserting
result.success is True, and ensure the existing assertion(s) verifying the
chained tmux set-option behavior remain (use the mock_popen/mock_process
expectations to validate set-option calls if needed).
🧹 Nitpick comments (36)
tests/agents/test_spawners.py (2)
1622-1670: Duplicate EmbeddedPTYResult.close() tests across modulesThis file defines
TestEmbeddedPTYResultClose, whiletests/agents/spawners/test_embedded_spawner.pyalso has an almost identicalTestEmbeddedPTYResultsuite. That duplication increases maintenance cost without adding much value.Consider consolidating the
EmbeddedPTYResult.close()tests into a single module (likely the more focused embedded_spawner test file) and removing one copy here.
680-861: Overlap between EmbeddedSpawner tests here and in test_embedded_spawner.py
TestEmbeddedSpawnerandTestEmbeddedSpawnerUnixin this file cover many of the same behaviors (Windows unsupported, PTY errors, parent-path handling, env vars, Unix PTY integration) as the dedicatedtests/agents/spawners/test_embedded_spawner.pysuite.The behavior under test is valid, but you now have two places to update when EmbeddedSpawner changes. Consider keeping one “unit-style” suite (likely
test_embedded_spawner.py) and trimming redundant cases here, or narrowing this file to higher-level integration/security checks only.tests/agents/spawners/test_embedded_spawner.py (2)
771-779: Consider avoiding hard-coding MAX_ENV_PROMPT_LENGTH in multiple test modulesThis file asserts
MAX_ENV_PROMPT_LENGTH == 4096, andtests/agents/test_spawn.pydoes the same. The check is fine as a specification of current behavior, but duplicating the exact value across modules means two places to update if the threshold ever changes.If you want to keep the value check, consider centralizing it in a single test file or factoring a small helper to assert it once.
560-569: Generic exception test adds little beyond more specific fork/openpty tests
test_spawn_error_handling_comprehensivecovers the “generic Exception during fork” path on top of already-specific tests for fork and openpty errors. It’s valid, but it’s largely overlapping withtest_spawn_fork_errorandtest_spawn_closes_fds_on_exception.If test runtime becomes a concern, this one is a good candidate to drop or narrow (e.g., assert only that no exception leaks out).
tests/agents/spawners/test_windows_spawner.py (2)
369-490: CmdSpawner tests rely on outer/cand inner/ksemanticsThe CmdSpawner tests assert both
/c(outercmd /c start ...) and/k(keeping the inner window open). This matches the typical pattern ofcmd /c start "" cmd /k ...and aligns with the described implementation.Just be aware that if the spawn implementation is ever simplified to a single-level
cmd /k ...call, these tests will need updating; conversely, they currently lock in the two-levelcmd /c startstructure.
1172-1236: Security/injection tests are sanity checks rather than strict guaranteesThe injection-prevention tests for CmdSpawner, PowerShellSpawner, and WSLSpawner currently only assert that the dangerous payload does not appear as a standalone token (
split()/split("&&")) in the final script/command. That’s a useful smoke test and matches current quoting behavior, but it’s not a full guarantee against all shell-injection variants.If you want stronger guarantees later, you might tighten these assertions to inspect the exact command string structure or add negative cases around specific quoting patterns, but this is optional for now.
tests/agents/test_spawn.py (4)
1111-1226: Prompt-file tests correctly exercise behavior but are tightly coupled to private globals
TestPromptFileManagementandTestCreatePromptFileExceptionHandlingvalidate real behavior of_create_prompt_file,_cleanup_all_prompt_files, and_prompt_files_to_cleanup(content, directory, permissions, cleanup, atexit behavior, and error handling). That’s valuable coverage.The trade-off is heavy reliance on private module state (
_prompt_files_to_cleanupand_atexit_registered). Any internal refactor of how prompt files are tracked will require updating these tests. If you want to reduce brittleness later, consider:
- Adding a small public helper around prompt-file lifecycle and asserting through that, or
- Narrowing tests to “black-box” effects (file exists with correct mode, then is cleaned up), without asserting on the internal set directly.
For now, behavior is consistent with the current implementation.
1289-1407: TerminalSpawner preference and AUTO-behavior tests match intended APIThe new
TestTerminalSpawnerMethodsand parts ofTestTerminalSpawnerSpawnAgent/TestTerminalSpawnerAutoDetectcorrectly specify:
get_available_terminals()returns only spawners whoseis_available()is true.get_preferred_terminal()respects user preferences fromget_tty_config().get_preferences()and skips unknown/unsupported terminals.spawn(..., terminal=TerminalType.AUTO)usesget_preferred_terminal()and fails clearly when nothing is available.- String terminal names (e.g.
"tmux") are converted toTerminalTypebefore dispatch.The patching strategy is a bit intricate (mixing class-level and instance-level mocks). If you refactor TerminalSpawner internals, consider simplifying these tests to patch
spawner._spawners[...]directly (as done in other tests) to keep them robust against constructor changes.
1747-1782: PreparedSpawn dataclass tests are reasonable but somewhat redundant
TestPreparedSpawnDataclassmainly checks that field assignment onPreparedSpawnworks and thatworkflow_namecan beNone. This is fine but fairly minimal for a dataclass and somewhat overlaps with theprepare_terminal_spawntests that already assert end-to-end values.If you later want to trim test volume, this small dataclass-only test is a low-risk candidate to remove, relying instead on the higher-level
prepare_terminal_spawncoverage.
1785-1842: MAX_ENV_PROMPT_LENGTH tests duplicate constant-value checksYou now assert
MAX_ENV_PROMPT_LENGTH == 4096in both this file andtests/agents/spawners/test_embedded_spawner.py. It’s valid to lock in that specification, but duplicating the exact value increases maintenance overhead on any future threshold change.Consider centralizing the constant-value assertion in one place and letting the rest of the suite focus on behavioral boundaries (which you already test thoroughly via
spawn_agentandprepare_terminal_spawn).tests/agents/spawners/test_headless_spawner.py (4)
42-45: Consider importing the constant for comparison.The test hardcodes
4096, which could break silently if the production constant changes. Consider importingMAX_ENV_PROMPT_LENGTHdirectly for comparison to ensure the test stays in sync.♻️ Suggested improvement
+from gobby.agents.spawn import MAX_ENV_PROMPT_LENGTH + def test_max_env_prompt_length_value(self): """_get_spawn_utils returns correct MAX_ENV_PROMPT_LENGTH.""" _, _, max_env_prompt_length = _get_spawn_utils() - assert max_env_prompt_length == 4096 + assert max_env_prompt_length == MAX_ENV_PROMPT_LENGTH
527-540: Consider extracting env extraction helper to reduce duplication.The pattern
call_args[0][2] if len(call_args[0]) > 2 else call_args[1].get("env")is repeated across multiple tests. Consider a helper fixture or method for cleaner extraction.♻️ Optional helper function
def _get_env_from_call_args(call_args): """Extract env dict from mock.call_args.""" return call_args[0][2] if len(call_args[0]) > 2 else call_args[1].get("env")
910-912: Consider adding@pytest.mark.integrationmarker.Per the coding guidelines, tests should use
pytest.mark.integrationfor categorization. These integration tests run real processes and would benefit from the marker for selective test execution.♻️ Suggested marker addition
@pytest.mark.skipif(sys.platform == "win32", reason="Unix-specific tests") +@pytest.mark.integration class TestHeadlessSpawnerIntegration: """Integration tests for HeadlessSpawner on Unix systems."""
997-1003: Test relies on implementation detail.The test mocks
subprocess.Popento raiseIndexError("Empty command"), but this assumes how Python handles empty command lists. The actual behavior may vary. Consider testing the actual behavior without mocking, or at least documenting this assumption.src/gobby/mcp_proxy/tools/metrics.py (1)
203-233: Consider clarifying the distinction betweenreset_metricsandreset_tool_metrics.Both tools call
metrics_manager.reset_metrics()but with different parameter subsets:
reset_metrics: acceptsproject_id,server_name,tool_namereset_tool_metrics: accepts onlyserver_name,tool_name(described as "admin operation")This appears intentional for different access patterns, but the distinction may not be immediately clear to users. Consider enhancing the description on Line 205 to clarify when to use each, e.g., "Admin tool to reset metrics for a specific tool across all projects."
src/gobby/mcp_proxy/tools/session_messages.py (1)
101-121: Duplicated helper function across multiple modules.
_format_turns_for_llmis duplicated in at least three locations:
src/gobby/cli/sessions.pysrc/gobby/sessions/summary.pysrc/gobby/mcp_proxy/tools/session_messages.py(this file)Additionally,
src/gobby/workflows/actions.pyalready has a wrapper that calls a sharedformat_turns_for_llm. This version also differs fromsummary.pyin that it doesn't handle the"thinking"block type.Consider consolidating into a single shared utility (e.g., in
gobby.sessions.transcriptsor a dedicated formatting module) and importing it where needed.tests/agents/test_tty_config.py (1)
412-435: Consider usingpytestfixtures or context managers for guaranteed cleanup.Several tests manually call
Path(f.name).unlink()at the end. If an assertion fails before cleanup, temp files may leak. Usingpytest.fixturewith cleanup in afinallyblock, oraddCleanup, would be more robust.This is a minor style improvement and doesn't block the PR.
src/gobby/cli/sessions.py (2)
27-47: Duplicate helper function.As noted in the MCP tools file,
_format_turns_for_llmis duplicated here. Consider consolidating into a shared module.
538-543: Inconsistent async runtime usage.This function uses
anyio.run()while other CLI commands in this file useasyncio.run(). Consider usingasyncio.run()for consistency:Suggested change
- import anyio - - async def _generate() -> str: - return await provider.generate_summary(context, prompt_template=prompt_template) - - full_markdown = anyio.run(_generate) + full_markdown = asyncio.run( + provider.generate_summary(context, prompt_template=prompt_template) + )src/gobby/servers/http.py (1)
318-325: Consider exposing_stop_registryas a public property or method.The code accesses the private attribute
_stop_registrywith defensivehasattrchecks, which is consistent with the established pattern across the codebase (also used insrc/gobby/servers/routes/sessions.py). However, this direct access to private attributes across multiple components indicates architectural coupling. While the defensive checks mitigate immediate risk, consider adding a public accessor method (e.g.,get_stop_registry()) toHookManagerto formalize this as an intentional public API and reduce reliance on private attribute access.src/gobby/tasks/expansion.py (1)
467-495: Consider compiling regex patterns for better performance.The function name extraction logic is robust with good fallback handling. However, the regex patterns are compiled on every call within the loop.
For better performance, especially when processing many signatures during task expansion, consider compiling the patterns once at module or class level:
♻️ Optional refactor to improve performance
At the module level (after imports):
# Compiled regex patterns for function signature parsing _FUNC_NAME_PATTERN = re.compile(r"(?:async\s+)?def\s+(\w+)") _FUNC_PAREN_PATTERN = re.compile(r"(\w+)\s*\(")Then update the extraction logic:
# Extract function name robustly using regex # Handles: "def func_name(", "async def func_name(", "func_name(" func_name = None # Try regex patterns first - match = re.search(r"(?:async\s+)?def\s+(\w+)", sig) + match = _FUNC_NAME_PATTERN.search(sig) if match: func_name = match.group(1) else: # Fallback: try to get name before first paren - match = re.search(r"(\w+)\s*\(", sig) + match = _FUNC_PAREN_PATTERN.search(sig) if match: func_name = match.group(1)src/gobby/agents/runner.py (1)
285-369: Workflow validation-before-session creation is a solid robustness improvementValidating and loading the workflow (and rejecting lifecycle workflows) before creating the child session, then initializing workflow state only when a definition is present, avoids orphan sessions and keeps workflow/tool filtering consistent with
WorkflowStateManager. As an optional follow-up, you might also populateWorkflowState.initial_stepwhen constructing the state so reset flows can rely on it without inferring fromstep, but the current behavior is functionally sound.src/gobby/hooks/hook_manager.py (1)
36-39: Autonomous stop/progress/stuck wiring into HookManager looks consistentInitializing
StopRegistry,ProgressTracker, andStuckDetectoron the sharedLocalDatabase, threadingwebsocket_serverfrom the broadcaster, passing all of these intoActionExecutor, and registering thestop_registrywith the workflow evaluator line up cleanly with the new migrations and autonomous components. One optional cleanup would be to exposestop_registryvia a HookManager accessor rather than relying on the_stop_registryprivate attribute (used by the HTTP routes), but the current wiring is functionally correct.Also applies to: 191-197, 249-253, 268-283
src/gobby/tasks/commits.py (1)
148-245: Diff summarization respects char budget; consider guarding header-only overflowThe summarization logic (per-file stats header, per-file truncation, final hard cap) gives the validator a good overview while enforcing
max_chars. One edge case: if there are so many files that the header alone exceedsmax_chars,remaining_charsbecomes <= 0 andchars_per_fileturns negative, so file hunks are truncated in a slightly arbitrary way before the final global truncation. Not a correctness bug, but you might add a guard likeif remaining_chars <= 0: return "".join(summary_parts)[:max_chars] + "...truncated..."to skip per-file bodies in that scenario.src/gobby/servers/websocket.py (1)
74-80: Consider tighteningstop_registrytyping for better static safety
stop_registryis currently typed asAny; given it’s always expected to be aStopRegistry, consider adding aTYPE_CHECKINGimport and annotating this parameter/attribute accordingly to catch integration mistakes at mypy time. This matches how other core dependencies are typed elsewhere.Also applies to: 89-95
src/gobby/mcp_proxy/tools/tasks.py (1)
105-113: Assumesconfig.get_gobby_tasks_config().validationis always populated
create_task_registrypullsvalidation_config = config.get_gobby_tasks_config().validationand immediately readsauto_generate_on_create/auto_generate_on_expandfrom it. This is fine ifvalidationis guaranteed to be aTaskValidationConfig(via defaults), but will raise if it can beNonein misconfigured setups.If
validationis not strictly required inDaemonConfig, consider either:
- Providing a non-optional default in the config model, or
- Guarding here and falling back to the hardcoded True defaults when
validation_configis falsy.Also applies to: 128-138
src/gobby/servers/routes/mcp.py (1)
306-327: Metrics enrichment logic is sound; consider optional server-level filteringThe enrichment block correctly:
- Fetches all per-tool metrics for the resolved project,
- Indexes them by
(server_name, tool_name),- Populates
call_count,success_rate, andavg_latency_mson each tool, with sensible 0/None defaults when missing.If metrics volume grows large and
server_filteris commonly used, you might want to passserver_name=server_filterintometrics_manager.get_metricsin that case to avoid scanning all metrics, but the current design is functionally correct.src/gobby/workflows/actions.py (2)
102-121: New autonomous/stop-signal dependencies are wired consistentlyThe additions of
stop_registry,progress_tracker,stuck_detector, andwebsocket_servertoActionExecutor.__init__and their registrations in_register_defaultsare coherent and follow the existing “store dependency onselfand wrap via handler” pattern. Given the rest of this class already usesAny | Nonefor injected services, keeping that convention here is acceptable, though you could tighten types viaTYPE_CHECKINGimports later if desired.Also applies to: 135-138, 243-254
1117-1148: Avoid callingtask.exception()on cancelled broadcast tasksIn
_broadcast_autonomous_event, the done callback currently does:task.add_done_callback( lambda t: logger.debug(f"Broadcast {event} failed: {t.exception()}") if t.exception() else None )If the broadcast task is cancelled (e.g., during shutdown),
t.exception()will itself raiseCancelledErrorinside the callback, which can surface as “Task exception was never retrieved”.Safer pattern:
- Check
t.cancelled()first, and only callt.exception()when it is not cancelled.Proposed tweak to the callback
- task.add_done_callback( - lambda t: logger.debug(f"Broadcast {event} failed: {t.exception()}") - if t.exception() - else None - ) + def _on_done(t: asyncio.Task) -> None: + if t.cancelled(): + return + exc = t.exception() + if exc: + logger.debug(f"Broadcast {event} failed: {exc}") + + task.add_done_callback(_on_done)src/gobby/workflows/stop_signal_actions.py (1)
17-83: Stop-signal helpers align well withStopRegistryAPI
check_stop_signal,has_stop_signal,request_stop, andclear_stop_signalall match the semantics ofStopRegistry(usingget_signal(),has_pending_signal(),signal_stop(), andclear()respectively), handle a missing registry defensively, and return structured dicts suitable for workflow engines.One minor assumption is that
state.variablesis always a dict; givenWorkflowStatesets a dict default, that’s reasonable, but if other code ever assignsNoneintovariables, you may want a quickif state.variables is None: state.variables = {}before writing_stop_signal_*keys.Also applies to: 86-163
tests/agents/test_registry.py (2)
925-975: Concurrent add/remove test could assert stronger invariants
test_concurrent_add_and_removecurrently only checks that no exceptions were recorded, but doesn’t assert anything about the final registry state (there’s just a comment about expected counts).Given the add/remove sets don’t overlap, you could safely assert something like
registry.count() == 100afterwards to catch subtle race issues in the internal bookkeeping, without making the test brittle.
1265-1283:test_cleanup_stale_exact_boundarydoesn’t currently validate behaviorThis test sets an agent’s
started_atto exactly one hour ago and then asserts:assert len(removed) >= 0which is always true and doesn’t check any real behavior. Either:
- Decide what the intended contract is at the boundary (e.g.,
age > max_agevs>=), and assert accordingly, or- Drop the test if the implementation is allowed to be fuzzy there.
As-is, it’s effectively just coverage padding.
src/gobby/tasks/external_validator.py (1)
81-97: Mode access can use direct attribute instead of getattr.Since
TaskValidationConfigexplicitly definesexternal_validator_modewith a typedLiteral["llm", "agent"]default, you can access it directly rather than usinggetattrwith a fallback.♻️ Suggested simplification
- mode = getattr(config, "external_validator_mode", "llm") + mode = config.external_validator_modesrc/gobby/workflows/evaluator.py (1)
132-133: Consider using TYPE_CHECKING import for StopRegistry type.
_stop_registryis typed asAny, but since you already useTYPE_CHECKINGforWebhookExecutor, you could also importStopRegistryfor better type safety.♻️ Suggested improvement
if TYPE_CHECKING: from .webhook_executor import WebhookExecutor + from gobby.autonomous.stop_registry import StopRegistry # ... def __init__(self) -> None: """Initialize the condition evaluator.""" self._plugin_conditions: dict[str, Any] = {} self._task_manager: Any = None - self._stop_registry: Any = None + self._stop_registry: StopRegistry | None = None self._webhook_executor: WebhookExecutor | None = Nonesrc/gobby/workflows/autonomous_actions.py (1)
118-124: Consider logging a warning for unrecognized progress types.Silent fallback to
TOOL_CALLfor invalid enum values may hide bugs. A debug log would help troubleshoot issues.♻️ Add debug logging for fallback
if isinstance(progress_type, str): try: progress_type = ProgressType(progress_type) except ValueError: + logger.debug(f"Unknown progress type '{progress_type}', falling back to TOOL_CALL") progress_type = ProgressType.TOOL_CALLsrc/gobby/autonomous/stuck_detector.py (1)
245-252: Tool pattern key generation is safe but could be more robust.The key generation works correctly since
ProgressEvent.detailsdefaults to an empty dict. However, the key format"tool_name:[args_list]"creates string representations of lists which may vary. Consider using a more stable key format.♻️ Consider tuple-based key for stability
if event.tool_name: # Create a key from tool name and key args - key = f"{event.tool_name}:{event.details.get('tool_args_keys', [])}" + args_keys = tuple(sorted(event.details.get('tool_args_keys', []))) + key = f"{event.tool_name}:{args_keys}" tool_counts[key] = tool_counts.get(key, 0) + 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (160)
.gobby/tasks.jsonl.gobby/tasks_meta.jsonAGENTS.mdCLAUDE.mdGEMINI.mdROADMAP.mddocs/architecture/app_decomposition.mddocs/architecture/tasks_decomposition.mddocs/examples/workflows/README.mddocs/examples/workflows/agent-delegation.yamldocs/examples/workflows/parallel-worktree-agents.yamldocs/guides/cli-commands.mddocs/guides/http-endpoints.mddocs/guides/mcp-tools.mddocs/guides/tasks.mddocs/guides/workflow-actions.mddocs/guides/workflows.mddocs/old/automation-summary.mddocs/old/config-settings-audit.mddocs/old/webhook-action-schema.mddocs/plans/completed/TASKS.mddocs/plans/completed/WORKFLOWS.mddocs/plans/enhancements.mddocs/plans/todo-list.mddocs/workflow-actions.mdsrc/gobby/agents/runner.pysrc/gobby/autonomous/__init__.pysrc/gobby/autonomous/progress_tracker.pysrc/gobby/autonomous/stop_registry.pysrc/gobby/autonomous/stuck_detector.pysrc/gobby/cli/__init__.pysrc/gobby/cli/sessions.pysrc/gobby/cli/tasks/hooks.pysrc/gobby/cli/tasks/main.pysrc/gobby/cli/workflows.pysrc/gobby/config/tasks.pysrc/gobby/hooks/hook_manager.pysrc/gobby/mcp_proxy/metrics.pysrc/gobby/mcp_proxy/registries.pysrc/gobby/mcp_proxy/tools/metrics.pysrc/gobby/mcp_proxy/tools/session_messages.pysrc/gobby/mcp_proxy/tools/tasks.pysrc/gobby/mcp_proxy/tools/worktrees.pysrc/gobby/servers/http.pysrc/gobby/servers/routes/dependencies.pysrc/gobby/servers/routes/mcp.pysrc/gobby/servers/routes/sessions.pysrc/gobby/servers/websocket.pysrc/gobby/storage/migrations.pysrc/gobby/storage/tasks.pysrc/gobby/tasks/commits.pysrc/gobby/tasks/expansion.pysrc/gobby/tasks/external_validator.pysrc/gobby/workflows/actions.pysrc/gobby/workflows/autonomous_actions.pysrc/gobby/workflows/evaluator.pysrc/gobby/workflows/loader.pysrc/gobby/workflows/stop_signal_actions.pytests/adapters/test_base.pytests/adapters/test_codex.pytests/adapters/test_gemini.pytests/agents/spawners/__init__.pytests/agents/spawners/test_embedded_spawner.pytests/agents/spawners/test_headless_spawner.pytests/agents/spawners/test_windows_spawner.pytests/agents/test_registry.pytests/agents/test_runner.pytests/agents/test_spawn.pytests/agents/test_spawners.pytests/agents/test_tty_config.pytests/autonomous/__init__.pytests/autonomous/test_autonomous.pytests/cli/installers/__init__.pytests/cli/installers/test_antigravity.pytests/cli/installers/test_claude.pytests/cli/installers/test_codex_installer.pytests/cli/installers/test_gemini_installer.pytests/cli/installers/test_git_hooks_installer.pytests/cli/installers/test_shared.pytests/cli/test_cli_agents.pytests/cli/test_cli_daemon.pytests/cli/test_cli_extensions.pytests/cli/test_cli_init.pytests/cli/test_cli_install.pytests/cli/test_tasks_cli.pytests/config/test_app_config.pytests/hooks/test_event_handlers.pytests/hooks/test_hooks_manager.pytests/hooks/test_plugins.pytests/integration/test_task_expansion_flow.pytests/llm/test_codex_executor.pytests/llm/test_llm_claude.pytests/llm/test_resolver.pytests/mcp_proxy/services/__init__.pytests/mcp_proxy/services/test_system.pytests/mcp_proxy/test_actions.pytests/mcp_proxy/test_internal_registries.pytests/mcp_proxy/test_lazy.pytests/mcp_proxy/test_manager_coverage.pytests/mcp_proxy/test_mcp_tools.pytests/mcp_proxy/test_mcp_tools_session_messages.pytests/mcp_proxy/test_validation_integration.pytests/mcp_proxy/test_validation_mcp_tools.pytests/mcp_proxy/tools/test_agents.pytests/mcp_proxy/tools/test_session_messages_coverage.pytests/mcp_proxy/tools/test_task_expansion.pytests/mcp_proxy/tools/test_tasks_coverage.pytests/mcp_proxy/transports/__init__.pytests/mcp_proxy/transports/test_base.pytests/memory/test_extractor.pytests/memory/test_manager.pytests/servers/test_http_coverage.pytests/servers/test_http_server.pytests/servers/test_mcp_routes.pytests/servers/test_sessions_routes.pytests/sessions/test_analyzer.pytests/sessions/test_sessions_processor_unit.pytests/sessions/test_sessions_transcripts_claude.pytests/sessions/test_sessions_transcripts_codex.pytests/sessions/test_sessions_transcripts_gemini.pytests/sessions/test_summary.pytests/sessions/test_transcript_parsers.pytests/storage/test_labels.pytests/storage/test_storage_agents.pytests/storage/test_storage_mcp.pytests/storage/test_storage_memories.pytests/storage/test_storage_sessions.pytests/storage/test_storage_tasks.pytests/sync/test_skill_sync.pytests/tasks/test_auto_decompose.pytests/tasks/test_commits.pytests/tasks/test_context.pytests/tasks/test_expansion_coverage.pytests/tasks/test_external_validator.pytests/tasks/test_research.pytests/tasks/test_sync_tasks.pytests/tasks/test_task_filters.pytests/tasks/test_validation.pytests/test_runner.pytests/utils/test_project_context.pytests/utils/test_utils_git.pytests/utils/test_utils_metrics.pytests/utils/test_utils_project_init.pytests/workflows/test_actions_coverage.pytests/workflows/test_artifact_actions.pytests/workflows/test_context_actions.pytests/workflows/test_engine_coverage.pytests/workflows/test_git_utils.pytests/workflows/test_hooks.pytests/workflows/test_llm_actions.pytests/workflows/test_loader.pytests/workflows/test_memory_actions.pytests/workflows/test_session_actions.pytests/workflows/test_summary_actions.pytests/workflows/test_task_enforcement.pytests/workflows/test_todo_actions.pytests/workflows/test_webhook_condition.pytests/workflows/test_workflow_actions.pytests/workflows/test_workflow_mcp_actions.pytests/worktrees/test_git.py
💤 Files with no reviewable changes (5)
- docs/workflow-actions.md
- docs/architecture/tasks_decomposition.md
- src/gobby/cli/tasks/main.py
- docs/architecture/app_decomposition.md
- src/gobby/cli/tasks/hooks.py
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py
⚙️ CodeRabbit configuration file
src/**/*.py: - Check for proper type hints (project uses strict mypy)
- Ensure async functions are properly awaited
- Watch for potential security issues (SQL injection, command injection)
- Verify proper error handling
Files:
src/gobby/servers/http.pysrc/gobby/config/tasks.pysrc/gobby/servers/routes/dependencies.pysrc/gobby/autonomous/stop_registry.pysrc/gobby/workflows/actions.pysrc/gobby/mcp_proxy/registries.pysrc/gobby/workflows/autonomous_actions.pysrc/gobby/workflows/stop_signal_actions.pysrc/gobby/agents/runner.pysrc/gobby/mcp_proxy/tools/tasks.pysrc/gobby/mcp_proxy/tools/session_messages.pysrc/gobby/workflows/evaluator.pysrc/gobby/autonomous/stuck_detector.pysrc/gobby/tasks/commits.pysrc/gobby/cli/sessions.pysrc/gobby/workflows/loader.pysrc/gobby/servers/routes/sessions.pysrc/gobby/storage/tasks.pysrc/gobby/servers/routes/mcp.pysrc/gobby/storage/migrations.pysrc/gobby/hooks/hook_manager.pysrc/gobby/tasks/expansion.pysrc/gobby/autonomous/progress_tracker.pysrc/gobby/cli/workflows.pysrc/gobby/mcp_proxy/tools/worktrees.pysrc/gobby/autonomous/__init__.pysrc/gobby/mcp_proxy/metrics.pysrc/gobby/tasks/external_validator.pysrc/gobby/cli/__init__.pysrc/gobby/mcp_proxy/tools/metrics.pysrc/gobby/servers/websocket.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use asyncio with pytest configured withasyncio_mode = 'auto'for automatic async test detection
Use pytest markersslow,integration, ande2efor test categorization
Files:
tests/autonomous/__init__.pytests/agents/test_spawners.pytests/agents/spawners/test_embedded_spawner.pytests/agents/test_tty_config.pytests/adapters/test_codex.pytests/adapters/test_base.pytests/agents/spawners/test_headless_spawner.pytests/agents/test_registry.pytests/agents/spawners/test_windows_spawner.pytests/agents/spawners/__init__.pytests/agents/test_spawn.pytests/adapters/test_gemini.pytests/agents/test_runner.py
⚙️ CodeRabbit configuration file
tests/**/*.py: - Ensure tests are meaningful, not just coverage padding
- Check for proper mocking and fixtures
- Verify async tests use pytest-asyncio markers
Files:
tests/autonomous/__init__.pytests/agents/test_spawners.pytests/agents/spawners/test_embedded_spawner.pytests/agents/test_tty_config.pytests/adapters/test_codex.pytests/adapters/test_base.pytests/agents/spawners/test_headless_spawner.pytests/agents/test_registry.pytests/agents/spawners/test_windows_spawner.pytests/agents/spawners/__init__.pytests/agents/test_spawn.pytests/adapters/test_gemini.pytests/agents/test_runner.py
src/gobby/mcp_proxy/**
⚙️ CodeRabbit configuration file
src/gobby/mcp_proxy/**: - This is the MCP (Model Context Protocol) proxy layer
- Check for proper tool schema validation
- Ensure tools return appropriate response structures
Files:
src/gobby/mcp_proxy/registries.pysrc/gobby/mcp_proxy/tools/tasks.pysrc/gobby/mcp_proxy/tools/session_messages.pysrc/gobby/mcp_proxy/tools/worktrees.pysrc/gobby/mcp_proxy/metrics.pysrc/gobby/mcp_proxy/tools/metrics.py
src/gobby/hooks/**
⚙️ CodeRabbit configuration file
src/gobby/hooks/**: - Hook handlers must be idempotent
- Check for proper event handling patterns
- Ensure plugins follow the HookPlugin interface
Files:
src/gobby/hooks/hook_manager.py
🧠 Learnings (6)
📚 Learning: 2025-12-17T19:24:27.134Z
Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-17T19:24:27.134Z
Learning: Install and use bd git hooks by running `bd hooks install` to ensure DB ↔ JSONL consistency
Applied to files:
docs/guides/tasks.mddocs/plans/completed/TASKS.md
📚 Learning: 2025-12-17T19:54:30.415Z
Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-17T19:54:30.415Z
Learning: Applies to .gobby/tasks.jsonl : Include `.gobby/tasks.jsonl` in commits when completing work tracked in Gobby
Applied to files:
docs/guides/tasks.mddocs/plans/completed/TASKS.mdsrc/gobby/mcp_proxy/tools/tasks.py.gobby/tasks_meta.jsonsrc/gobby/tasks/commits.pysrc/gobby/storage/tasks.py
📚 Learning: 2025-12-17T19:54:30.415Z
Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-17T19:54:30.415Z
Learning: Use Gobby's native task system for ALL task tracking instead of markdown TODOs, task lists, or other tracking methods
Applied to files:
docs/guides/tasks.mddocs/plans/completed/TASKS.md
📚 Learning: 2025-12-17T19:54:30.415Z
Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-17T19:54:30.415Z
Learning: Do NOT duplicate tracking systems; rely solely on Gobby tasks
Applied to files:
docs/guides/tasks.mddocs/plans/completed/TASKS.md
📚 Learning: 2025-12-17T19:24:40.567Z
Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T19:24:40.567Z
Learning: Use `gobby-*` prefix for internal server names and route them locally via InternalRegistryManager, while proxying other servers via MCPClientManager
Applied to files:
src/gobby/servers/routes/dependencies.pysrc/gobby/servers/routes/mcp.py
📚 Learning: 2025-12-17T19:54:30.415Z
Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-17T19:54:30.415Z
Learning: Store AI-generated planning and design documents (PLAN.md, IMPLEMENTATION.md, ARCHITECTURE.md, DESIGN.md, etc.) in a dedicated `history/` directory
Applied to files:
ROADMAP.md
🧬 Code graph analysis (28)
tests/agents/test_spawners.py (7)
src/gobby/agents/spawners/base.py (2)
EmbeddedPTYResult(61-85)TerminalType(22-46)src/gobby/agents/spawners/cross_platform.py (3)
AlacrittySpawner(92-145)KittySpawner(19-89)TmuxSpawner(148-266)src/gobby/agents/spawners/embedded.py (3)
EmbeddedSpawner(49-227)spawn(57-160)spawn_agent(162-227)src/gobby/agents/spawners/linux.py (2)
GnomeTerminalSpawner(16-69)KonsoleSpawner(72-125)src/gobby/agents/spawners/macos.py (4)
GhosttySpawner(27-105)ITermSpawner(108-190)TerminalAppSpawner(193-262)escape_applescript(19-24)tests/agents/spawners/test_embedded_spawner.py (1)
spawner(33-35)src/gobby/servers/websocket.py (1)
close(37-37)
tests/agents/spawners/test_embedded_spawner.py (3)
src/gobby/agents/spawners/base.py (1)
EmbeddedPTYResult(61-85)src/gobby/agents/spawners/embedded.py (4)
EmbeddedSpawner(49-227)_get_spawn_utils(30-46)spawn(57-160)spawn_agent(162-227)src/gobby/agents/spawn.py (1)
build_cli_command(147-211)
src/gobby/servers/routes/dependencies.py (2)
src/gobby/servers/routes/mcp.py (1)
mcp_proxy(990-1090)src/gobby/mcp_proxy/metrics.py (1)
ToolMetricsManager(71-620)
src/gobby/autonomous/stop_registry.py (1)
src/gobby/storage/database.py (3)
LocalDatabase(17-99)fetchone(66-69)fetchall(71-74)
tests/agents/test_tty_config.py (1)
src/gobby/agents/tty_config.py (9)
PlatformPreferences(39-73)TerminalConfig(18-36)TTYConfig(123-175)generate_default_tty_config(205-290)get_tty_config(297-307)load_tty_config(178-202)reload_tty_config(310-319)get_terminal_config(135-160)get_preferences(162-175)
src/gobby/workflows/autonomous_actions.py (3)
src/gobby/autonomous/progress_tracker.py (7)
ProgressTracker(87-446)ProgressType(20-33)clear_session(396-414)get_summary(233-315)is_stagnant(317-331)record_event(124-173)is_high_value(68-70)src/gobby/autonomous/stuck_detector.py (5)
StuckDetector(45-371)clear_session(306-326)detect_task_loop(142-187)is_stuck(274-304)record_task_selection(99-140)src/gobby/workflows/definitions.py (1)
WorkflowState(94-130)
src/gobby/workflows/stop_signal_actions.py (2)
src/gobby/autonomous/stop_registry.py (7)
StopRegistry(37-270)acknowledge(156-180)get_signal(113-142)is_pending(32-34)has_pending_signal(144-154)signal_stop(60-111)clear(182-202)src/gobby/workflows/definitions.py (1)
WorkflowState(94-130)
src/gobby/agents/runner.py (4)
src/gobby/agents/session.py (2)
create_child_session(142-206)ChildSessionConfig(23-48)src/gobby/llm/executor.py (1)
AgentResult(63-94)src/gobby/workflows/definitions.py (1)
WorkflowState(94-130)src/gobby/workflows/state_manager.py (1)
save_state(53-97)
src/gobby/mcp_proxy/tools/tasks.py (4)
src/gobby/tasks/commits.py (2)
is_doc_only_diff(120-145)summarize_diff_for_validation(148-245)src/gobby/agents/runner.py (1)
AgentRunner(147-920)src/gobby/config/tasks.py (1)
TaskValidationConfig(173-281)src/gobby/tasks/external_validator.py (1)
run_external_validation(44-97)
src/gobby/mcp_proxy/tools/session_messages.py (3)
src/gobby/cli/sessions.py (2)
_format_turns_for_llm(27-47)sessions(51-53)src/gobby/workflows/actions.py (1)
_format_turns_for_llm(836-838)src/gobby/sessions/summary.py (1)
_format_turns_for_llm(395-426)
src/gobby/workflows/evaluator.py (3)
src/gobby/workflows/definitions.py (1)
WorkflowState(94-130)src/gobby/workflows/webhook_executor.py (1)
json_body(34-46)src/gobby/autonomous/stop_registry.py (1)
has_pending_signal(144-154)
src/gobby/autonomous/stuck_detector.py (5)
tests/autonomous/test_autonomous.py (2)
progress_tracker(85-87)session_id(73-81)src/gobby/autonomous/progress_tracker.py (3)
ProgressTracker(87-446)get_summary(233-315)get_recent_events(416-446)src/gobby/storage/database.py (2)
LocalDatabase(17-99)fetchall(71-74)tests/memory/test_manager.py (1)
db(31-36)src/gobby/workflows/autonomous_actions.py (2)
record_task_selection(223-255)detect_task_loop(142-175)
tests/adapters/test_codex.py (2)
src/gobby/adapters/codex.py (4)
CodexConnectionState(66-72)CodexItem(98-105)CodexThread(76-82)CodexTurn(86-94)src/gobby/hooks/events.py (3)
HookEventType(19-55)HookResponse(111-136)SessionSource(58-64)
tests/adapters/test_base.py (2)
src/gobby/adapters/base.py (1)
BaseAdapter(17-91)src/gobby/hooks/events.py (4)
HookEvent(68-107)HookEventType(19-55)HookResponse(111-136)SessionSource(58-64)
tests/agents/spawners/test_headless_spawner.py (4)
src/gobby/agents/spawners/base.py (2)
HeadlessResult(89-104)get_output(102-104)src/gobby/agents/spawners/headless.py (2)
HeadlessSpawner(38-225)spawn_and_capture(95-158)src/gobby/agents/spawners/embedded.py (3)
_get_spawn_utils(30-46)spawn(57-160)spawn_agent(162-227)src/gobby/agents/spawn.py (2)
build_cli_command(147-211)_create_prompt_file(96-144)
src/gobby/servers/routes/sessions.py (2)
src/gobby/utils/metrics.py (1)
inc_counter(316-328)src/gobby/autonomous/stop_registry.py (3)
signal_stop(60-111)get_signal(113-142)clear(182-202)
tests/agents/test_registry.py (1)
src/gobby/agents/registry.py (12)
RunningAgent(25-86)RunningAgentRegistry(89-398)get_running_agent_registry(406-418)list_all(283-291)remove(192-219)get_by_session(221-235)get_by_pid(237-251)list_by_parent(253-268)list_by_mode(270-281)count_by_parent(303-316)cleanup_by_pids(318-351)add_event_callback(124-134)
tests/agents/spawners/test_windows_spawner.py (2)
src/gobby/agents/spawners/base.py (2)
SpawnResult(50-57)TerminalType(22-46)src/gobby/agents/spawners/windows.py (4)
CmdSpawner(76-136)PowerShellSpawner(139-217)WindowsTerminalSpawner(18-73)WSLSpawner(220-308)
src/gobby/servers/routes/mcp.py (3)
src/gobby/servers/routes/dependencies.py (2)
get_metrics_manager(115-118)get_server(39-47)src/gobby/mcp_proxy/metrics.py (2)
ToolMetricsManager(71-620)get_metrics(156-217)src/gobby/servers/http.py (1)
_resolve_project_id(234-270)
src/gobby/autonomous/progress_tracker.py (2)
src/gobby/storage/database.py (3)
LocalDatabase(17-99)fetchall(71-74)fetchone(66-69)tests/memory/test_manager.py (1)
db(31-36)
tests/agents/test_spawn.py (1)
src/gobby/agents/spawn.py (10)
spawn(285-341)PreparedSpawn(437-459)TerminalSpawner(214-433)_cleanup_all_prompt_files(85-93)_create_prompt_file(96-144)build_cli_command(147-211)prepare_terminal_spawn(462-554)read_prompt_from_env(557-582)spawn_agent(343-417)_write_prompt_file(419-433)
tests/adapters/test_gemini.py (2)
src/gobby/adapters/gemini.py (2)
GeminiAdapter(35-302)normalize_tool_name(127-136)src/gobby/hooks/events.py (4)
HookEvent(68-107)HookEventType(19-55)HookResponse(111-136)SessionSource(58-64)
src/gobby/autonomous/__init__.py (3)
src/gobby/autonomous/progress_tracker.py (4)
ProgressEvent(58-70)ProgressSummary(74-84)ProgressTracker(87-446)ProgressType(20-33)src/gobby/autonomous/stop_registry.py (2)
StopRegistry(37-270)StopSignal(22-34)src/gobby/autonomous/stuck_detector.py (3)
StuckDetectionResult(35-42)StuckDetector(45-371)TaskSelectionEvent(25-31)
src/gobby/mcp_proxy/metrics.py (1)
src/gobby/mcp_proxy/tools/metrics.py (2)
get_failing_tools(104-133)reset_metrics(174-201)
src/gobby/tasks/external_validator.py (3)
src/gobby/config/tasks.py (1)
TaskValidationConfig(173-281)src/gobby/llm/service.py (1)
LLMService(27-238)src/gobby/agents/runner.py (2)
AgentRunner(147-920)AgentConfig(35-111)
src/gobby/cli/__init__.py (1)
src/gobby/cli/workflows.py (1)
workflows(39-41)
src/gobby/mcp_proxy/tools/metrics.py (2)
src/gobby/mcp_proxy/tools/internal.py (1)
tool(75-140)src/gobby/mcp_proxy/metrics.py (2)
get_failing_tools(292-343)reset_metrics(345-384)
src/gobby/servers/websocket.py (3)
tests/autonomous/test_autonomous.py (3)
stop_registry(91-93)session_id(73-81)signal(947-952)src/gobby/autonomous/stop_registry.py (1)
signal_stop(60-111)tests/servers/test_http_server.py (1)
signal_stop(949-954)
🪛 GitHub Actions: CI
tests/agents/test_spawners.py
[error] 508-508: SpawnResult pid is None when success expected.
src/gobby/autonomous/stop_registry.py
[error] 1-1: ruff format check reported: 4 files would be reformatted. File would be reformatted to satisfy formatting standards.
src/gobby/autonomous/stuck_detector.py
[error] 1-1: ruff format check reported: 4 files would be reformatted. File would be reformatted to satisfy formatting standards.
src/gobby/cli/sessions.py
[error] 610-610: Incompatible types in assignment (expression has type "str", variable has type "TextIOWrapper[_WrappedBuffer]") [assignment]
src/gobby/mcp_proxy/tools/worktrees.py
[warning] 57-57: Bandit: B108 hardcoded_tmp_directory: probable insecure usage of temporary directory /tmp.
src/gobby/mcp_proxy/metrics.py
[warning] 189-189: Bandit: B608 hardcoded_sql_expressions: possible SQL injection risk due to string-based query construction.
🪛 markdownlint-cli2 (0.18.1)
GEMINI.md
59-59: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test (Python 3.12)
- GitHub Check: Test (Python 3.11)
| """Stop signal registry for autonomous session management. | ||
|
|
||
| Provides thread-safe stop signal management for autonomous workflows. | ||
| External systems (HTTP, WebSocket, CLI, MCP) can signal sessions to stop | ||
| gracefully, and workflows can check for pending stop signals at step | ||
| transitions. | ||
| """ | ||
|
|
There was a problem hiding this comment.
Pipeline failure: formatting required.
The CI pipeline reports that this file needs reformatting to satisfy ruff format standards. Run ruff format src/gobby/autonomous/stop_registry.py to fix.
🧰 Tools
🪛 GitHub Actions: CI
[error] 1-1: ruff format check reported: 4 files would be reformatted. File would be reformatted to satisfy formatting standards.
🤖 Prompt for AI Agents
In @src/gobby/autonomous/stop_registry.py around lines 1 - 8, The file needs to
be reformatted to satisfy ruff; run the formatter against the module (e.g., run
"ruff format src/gobby/autonomous/stop_registry.py" or run ruff format for the
repo) to apply the required whitespace and style changes so the module-level
docstring and file formatting comply with CI checks; commit the reformatted file
(stop_registry.py) and push.
| """Stuck detection for autonomous session management. | ||
|
|
||
| Provides multi-layer stuck detection for autonomous workflows: | ||
| 1. Task selection loop detection - same tasks being selected repeatedly | ||
| 2. Progress stagnation - no meaningful progress being made | ||
| 3. Tool call patterns - repeated identical tool calls | ||
| """ |
There was a problem hiding this comment.
Pipeline failure: File requires formatting.
The CI pipeline reports this file needs to be reformatted to satisfy ruff format standards. Run ruff format src/gobby/autonomous/stuck_detector.py to fix.
🧰 Tools
🪛 GitHub Actions: CI
[error] 1-1: ruff format check reported: 4 files would be reformatted. File would be reformatted to satisfy formatting standards.
🤖 Prompt for AI Agents
In @src/gobby/autonomous/stuck_detector.py around lines 1 - 7, The file
src/gobby/autonomous/stuck_detector.py fails CI formatting; run the formatter
(ruff format src/gobby/autonomous/stuck_detector.py) or apply ruff-compatible
formatting to the module (ensure the top-level docstring and surrounding
whitespace follow ruff/PEP8 rules) and commit the formatted file so the pipeline
passes.
| class TestAdapterIntegration: | ||
| """Integration tests for adapter behavior with realistic scenarios.""" | ||
|
|
||
| def test_session_lifecycle_simulation(self): | ||
| """Simulate full session lifecycle through adapter.""" | ||
| events_processed = [] | ||
|
|
||
| class LifecycleAdapter(BaseAdapter): | ||
| source = SessionSource.CLAUDE | ||
|
|
||
| def translate_to_hook_event(self, native_event: dict) -> HookEvent | None: | ||
| event_type_map = { | ||
| "session-start": HookEventType.SESSION_START, | ||
| "pre-tool-use": HookEventType.BEFORE_TOOL, | ||
| "post-tool-use": HookEventType.AFTER_TOOL, | ||
| "session-end": HookEventType.SESSION_END, | ||
| } | ||
| hook_type = native_event.get("hook_type") | ||
| if hook_type not in event_type_map: | ||
| return None | ||
|
|
||
| events_processed.append(hook_type) | ||
| return HookEvent( | ||
| event_type=event_type_map[hook_type], | ||
| session_id=native_event.get("session_id", "test-sess"), | ||
| source=SessionSource.CLAUDE, | ||
| timestamp=datetime.now(UTC), | ||
| data=native_event.get("input_data", {}), | ||
| ) | ||
|
|
||
| def translate_from_hook_response(self, response: HookResponse) -> dict: | ||
| return {"continue": response.decision == "allow"} | ||
|
|
||
| adapter = LifecycleAdapter() | ||
| mock_manager = MagicMock() | ||
| mock_manager.handle.return_value = HookResponse(decision="allow") | ||
|
|
||
| # Simulate session lifecycle | ||
| session_events = [ | ||
| {"hook_type": "session-start", "session_id": "sess-1"}, | ||
| {"hook_type": "pre-tool-use", "session_id": "sess-1"}, | ||
| {"hook_type": "post-tool-use", "session_id": "sess-1"}, | ||
| {"hook_type": "pre-tool-use", "session_id": "sess-1"}, | ||
| {"hook_type": "post-tool-use", "session_id": "sess-1"}, | ||
| {"hook_type": "session-end", "session_id": "sess-1"}, | ||
| ] | ||
|
|
||
| for event in session_events: | ||
| result = adapter.handle_native(event, mock_manager) | ||
| assert result["continue"] is True | ||
|
|
||
| assert events_processed == [ | ||
| "session-start", | ||
| "pre-tool-use", | ||
| "post-tool-use", | ||
| "pre-tool-use", | ||
| "post-tool-use", | ||
| "session-end", | ||
| ] | ||
| assert mock_manager.handle.call_count == 6 | ||
|
|
||
| def test_tool_blocking_scenario(self): | ||
| """Simulate tool being blocked by hook manager.""" | ||
|
|
||
| class BlockingAdapter(BaseAdapter): | ||
| source = SessionSource.CLAUDE | ||
|
|
||
| def translate_to_hook_event(self, native_event: dict) -> HookEvent | None: | ||
| return HookEvent( | ||
| event_type=HookEventType.BEFORE_TOOL, | ||
| session_id="blocking-session", | ||
| source=SessionSource.CLAUDE, | ||
| timestamp=datetime.now(UTC), | ||
| data=native_event.get("input_data", {}), | ||
| ) | ||
|
|
||
| def translate_from_hook_response(self, response: HookResponse) -> dict: | ||
| result = {"continue": response.decision == "allow"} | ||
| if response.reason: | ||
| result["stopReason"] = response.reason | ||
| if response.system_message: | ||
| result["systemMessage"] = response.system_message | ||
| return result | ||
|
|
||
| adapter = BlockingAdapter() | ||
| mock_manager = MagicMock() | ||
| mock_manager.handle.return_value = HookResponse( | ||
| decision="deny", | ||
| reason="No task claimed", | ||
| system_message="Please claim a task before editing files", | ||
| ) | ||
|
|
||
| result = adapter.handle_native( | ||
| {"input_data": {"tool_name": "Write"}}, | ||
| mock_manager, | ||
| ) | ||
|
|
||
| assert result["continue"] is False | ||
| assert result["stopReason"] == "No task claimed" | ||
| assert "Please claim a task" in result["systemMessage"] | ||
|
|
||
| def test_context_injection_scenario(self): | ||
| """Simulate context being injected via hook response.""" | ||
|
|
||
| class ContextAdapter(BaseAdapter): | ||
| source = SessionSource.CLAUDE | ||
|
|
||
| def translate_to_hook_event(self, native_event: dict) -> HookEvent | None: | ||
| return HookEvent( | ||
| event_type=HookEventType.SESSION_START, | ||
| session_id="context-session", | ||
| source=SessionSource.CLAUDE, | ||
| timestamp=datetime.now(UTC), | ||
| data={}, | ||
| ) | ||
|
|
||
| def translate_from_hook_response(self, response: HookResponse) -> dict: | ||
| result = {"continue": True} | ||
| if response.context: | ||
| result["result"] = response.context | ||
| return result | ||
|
|
||
| adapter = ContextAdapter() | ||
| mock_manager = MagicMock() | ||
| mock_manager.handle.return_value = HookResponse( | ||
| decision="allow", | ||
| context="## Continuation Context\nYou have 3 pending tasks.", | ||
| ) | ||
|
|
||
| result = adapter.handle_native({}, mock_manager) | ||
|
|
||
| assert result["continue"] is True | ||
| assert "Continuation Context" in result["result"] | ||
| assert "3 pending tasks" in result["result"] | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Add integration marker to integration tests.
The TestAdapterIntegration class contains integration-style tests that simulate full lifecycle scenarios. According to the coding guidelines, these should be marked with @pytest.mark.integration.
🏷️ Add integration marker
Add the marker to the class:
+@pytest.mark.integration
class TestAdapterIntegration:
"""Integration tests for adapter behavior with realistic scenarios."""Based on coding guidelines for test categorization.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| class TestAdapterIntegration: | |
| """Integration tests for adapter behavior with realistic scenarios.""" | |
| def test_session_lifecycle_simulation(self): | |
| """Simulate full session lifecycle through adapter.""" | |
| events_processed = [] | |
| class LifecycleAdapter(BaseAdapter): | |
| source = SessionSource.CLAUDE | |
| def translate_to_hook_event(self, native_event: dict) -> HookEvent | None: | |
| event_type_map = { | |
| "session-start": HookEventType.SESSION_START, | |
| "pre-tool-use": HookEventType.BEFORE_TOOL, | |
| "post-tool-use": HookEventType.AFTER_TOOL, | |
| "session-end": HookEventType.SESSION_END, | |
| } | |
| hook_type = native_event.get("hook_type") | |
| if hook_type not in event_type_map: | |
| return None | |
| events_processed.append(hook_type) | |
| return HookEvent( | |
| event_type=event_type_map[hook_type], | |
| session_id=native_event.get("session_id", "test-sess"), | |
| source=SessionSource.CLAUDE, | |
| timestamp=datetime.now(UTC), | |
| data=native_event.get("input_data", {}), | |
| ) | |
| def translate_from_hook_response(self, response: HookResponse) -> dict: | |
| return {"continue": response.decision == "allow"} | |
| adapter = LifecycleAdapter() | |
| mock_manager = MagicMock() | |
| mock_manager.handle.return_value = HookResponse(decision="allow") | |
| # Simulate session lifecycle | |
| session_events = [ | |
| {"hook_type": "session-start", "session_id": "sess-1"}, | |
| {"hook_type": "pre-tool-use", "session_id": "sess-1"}, | |
| {"hook_type": "post-tool-use", "session_id": "sess-1"}, | |
| {"hook_type": "pre-tool-use", "session_id": "sess-1"}, | |
| {"hook_type": "post-tool-use", "session_id": "sess-1"}, | |
| {"hook_type": "session-end", "session_id": "sess-1"}, | |
| ] | |
| for event in session_events: | |
| result = adapter.handle_native(event, mock_manager) | |
| assert result["continue"] is True | |
| assert events_processed == [ | |
| "session-start", | |
| "pre-tool-use", | |
| "post-tool-use", | |
| "pre-tool-use", | |
| "post-tool-use", | |
| "session-end", | |
| ] | |
| assert mock_manager.handle.call_count == 6 | |
| def test_tool_blocking_scenario(self): | |
| """Simulate tool being blocked by hook manager.""" | |
| class BlockingAdapter(BaseAdapter): | |
| source = SessionSource.CLAUDE | |
| def translate_to_hook_event(self, native_event: dict) -> HookEvent | None: | |
| return HookEvent( | |
| event_type=HookEventType.BEFORE_TOOL, | |
| session_id="blocking-session", | |
| source=SessionSource.CLAUDE, | |
| timestamp=datetime.now(UTC), | |
| data=native_event.get("input_data", {}), | |
| ) | |
| def translate_from_hook_response(self, response: HookResponse) -> dict: | |
| result = {"continue": response.decision == "allow"} | |
| if response.reason: | |
| result["stopReason"] = response.reason | |
| if response.system_message: | |
| result["systemMessage"] = response.system_message | |
| return result | |
| adapter = BlockingAdapter() | |
| mock_manager = MagicMock() | |
| mock_manager.handle.return_value = HookResponse( | |
| decision="deny", | |
| reason="No task claimed", | |
| system_message="Please claim a task before editing files", | |
| ) | |
| result = adapter.handle_native( | |
| {"input_data": {"tool_name": "Write"}}, | |
| mock_manager, | |
| ) | |
| assert result["continue"] is False | |
| assert result["stopReason"] == "No task claimed" | |
| assert "Please claim a task" in result["systemMessage"] | |
| def test_context_injection_scenario(self): | |
| """Simulate context being injected via hook response.""" | |
| class ContextAdapter(BaseAdapter): | |
| source = SessionSource.CLAUDE | |
| def translate_to_hook_event(self, native_event: dict) -> HookEvent | None: | |
| return HookEvent( | |
| event_type=HookEventType.SESSION_START, | |
| session_id="context-session", | |
| source=SessionSource.CLAUDE, | |
| timestamp=datetime.now(UTC), | |
| data={}, | |
| ) | |
| def translate_from_hook_response(self, response: HookResponse) -> dict: | |
| result = {"continue": True} | |
| if response.context: | |
| result["result"] = response.context | |
| return result | |
| adapter = ContextAdapter() | |
| mock_manager = MagicMock() | |
| mock_manager.handle.return_value = HookResponse( | |
| decision="allow", | |
| context="## Continuation Context\nYou have 3 pending tasks.", | |
| ) | |
| result = adapter.handle_native({}, mock_manager) | |
| assert result["continue"] is True | |
| assert "Continuation Context" in result["result"] | |
| assert "3 pending tasks" in result["result"] | |
| @pytest.mark.integration | |
| class TestAdapterIntegration: | |
| """Integration tests for adapter behavior with realistic scenarios.""" | |
| def test_session_lifecycle_simulation(self): | |
| """Simulate full session lifecycle through adapter.""" | |
| events_processed = [] | |
| class LifecycleAdapter(BaseAdapter): | |
| source = SessionSource.CLAUDE | |
| def translate_to_hook_event(self, native_event: dict) -> HookEvent | None: | |
| event_type_map = { | |
| "session-start": HookEventType.SESSION_START, | |
| "pre-tool-use": HookEventType.BEFORE_TOOL, | |
| "post-tool-use": HookEventType.AFTER_TOOL, | |
| "session-end": HookEventType.SESSION_END, | |
| } | |
| hook_type = native_event.get("hook_type") | |
| if hook_type not in event_type_map: | |
| return None | |
| events_processed.append(hook_type) | |
| return HookEvent( | |
| event_type=event_type_map[hook_type], | |
| session_id=native_event.get("session_id", "test-sess"), | |
| source=SessionSource.CLAUDE, | |
| timestamp=datetime.now(UTC), | |
| data=native_event.get("input_data", {}), | |
| ) | |
| def translate_from_hook_response(self, response: HookResponse) -> dict: | |
| return {"continue": response.decision == "allow"} | |
| adapter = LifecycleAdapter() | |
| mock_manager = MagicMock() | |
| mock_manager.handle.return_value = HookResponse(decision="allow") | |
| # Simulate session lifecycle | |
| session_events = [ | |
| {"hook_type": "session-start", "session_id": "sess-1"}, | |
| {"hook_type": "pre-tool-use", "session_id": "sess-1"}, | |
| {"hook_type": "post-tool-use", "session_id": "sess-1"}, | |
| {"hook_type": "pre-tool-use", "session_id": "sess-1"}, | |
| {"hook_type": "post-tool-use", "session_id": "sess-1"}, | |
| {"hook_type": "session-end", "session_id": "sess-1"}, | |
| ] | |
| for event in session_events: | |
| result = adapter.handle_native(event, mock_manager) | |
| assert result["continue"] is True | |
| assert events_processed == [ | |
| "session-start", | |
| "pre-tool-use", | |
| "post-tool-use", | |
| "pre-tool-use", | |
| "post-tool-use", | |
| "session-end", | |
| ] | |
| assert mock_manager.handle.call_count == 6 | |
| def test_tool_blocking_scenario(self): | |
| """Simulate tool being blocked by hook manager.""" | |
| class BlockingAdapter(BaseAdapter): | |
| source = SessionSource.CLAUDE | |
| def translate_to_hook_event(self, native_event: dict) -> HookEvent | None: | |
| return HookEvent( | |
| event_type=HookEventType.BEFORE_TOOL, | |
| session_id="blocking-session", | |
| source=SessionSource.CLAUDE, | |
| timestamp=datetime.now(UTC), | |
| data=native_event.get("input_data", {}), | |
| ) | |
| def translate_from_hook_response(self, response: HookResponse) -> dict: | |
| result = {"continue": response.decision == "allow"} | |
| if response.reason: | |
| result["stopReason"] = response.reason | |
| if response.system_message: | |
| result["systemMessage"] = response.system_message | |
| return result | |
| adapter = BlockingAdapter() | |
| mock_manager = MagicMock() | |
| mock_manager.handle.return_value = HookResponse( | |
| decision="deny", | |
| reason="No task claimed", | |
| system_message="Please claim a task before editing files", | |
| ) | |
| result = adapter.handle_native( | |
| {"input_data": {"tool_name": "Write"}}, | |
| mock_manager, | |
| ) | |
| assert result["continue"] is False | |
| assert result["stopReason"] == "No task claimed" | |
| assert "Please claim a task" in result["systemMessage"] | |
| def test_context_injection_scenario(self): | |
| """Simulate context being injected via hook response.""" | |
| class ContextAdapter(BaseAdapter): | |
| source = SessionSource.CLAUDE | |
| def translate_to_hook_event(self, native_event: dict) -> HookEvent | None: | |
| return HookEvent( | |
| event_type=HookEventType.SESSION_START, | |
| session_id="context-session", | |
| source=SessionSource.CLAUDE, | |
| timestamp=datetime.now(UTC), | |
| data={}, | |
| ) | |
| def translate_from_hook_response(self, response: HookResponse) -> dict: | |
| result = {"continue": True} | |
| if response.context: | |
| result["result"] = response.context | |
| return result | |
| adapter = ContextAdapter() | |
| mock_manager = MagicMock() | |
| mock_manager.handle.return_value = HookResponse( | |
| decision="allow", | |
| context="## Continuation Context\nYou have 3 pending tasks.", | |
| ) | |
| result = adapter.handle_native({}, mock_manager) | |
| assert result["continue"] is True | |
| assert "Continuation Context" in result["result"] | |
| assert "3 pending tasks" in result["result"] |
🤖 Prompt for AI Agents
In @tests/adapters/test_base.py around lines 475 - 609, Add the pytest
integration marker to the TestAdapterIntegration test class: annotate the class
TestAdapterIntegration with @pytest.mark.integration and ensure pytest is
imported in the test module if not already; this will mark all methods
(test_session_lifecycle_simulation, test_tool_blocking_scenario,
test_context_injection_scenario) as integration tests.
| class TestCodexAdapterEventMapping: | ||
| """Tests verifying event type mapping constants.""" | ||
|
|
||
| def test_event_map_contains_all_supported_events(self): | ||
| """EVENT_MAP contains all events we claim to support.""" | ||
| expected_methods = [ | ||
| "thread/started", | ||
| "thread/archive", | ||
| "turn/started", | ||
| "turn/completed", | ||
| "item/commandExecution/requestApproval", | ||
| "item/fileChange/requestApproval", | ||
| "item/completed", | ||
| ] | ||
|
|
||
| for method in expected_methods: | ||
| assert method in CodexAdapter.EVENT_MAP | ||
|
|
||
| def test_tool_item_types_complete(self): | ||
| """TOOL_ITEM_TYPES contains all tool-related item types.""" | ||
| assert "commandExecution" in CodexAdapter.TOOL_ITEM_TYPES | ||
| assert "fileChange" in CodexAdapter.TOOL_ITEM_TYPES | ||
| assert "mcpToolCall" in CodexAdapter.TOOL_ITEM_TYPES | ||
|
|
||
| def test_session_tracking_events_complete(self): | ||
| """SESSION_TRACKING_EVENTS contains necessary events.""" | ||
| assert "thread/started" in CodexAdapter.SESSION_TRACKING_EVENTS | ||
| assert "turn/started" in CodexAdapter.SESSION_TRACKING_EVENTS | ||
| assert "turn/completed" in CodexAdapter.SESSION_TRACKING_EVENTS | ||
| assert "item/completed" in CodexAdapter.SESSION_TRACKING_EVENTS |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Add integration marker to API surface tests.
The TestCodexAdapterEventMapping class validates the public API surface and event mapping constants. According to the coding guidelines, integration tests should be marked with @pytest.mark.integration.
🏷️ Add integration marker
+@pytest.mark.integration
class TestCodexAdapterEventMapping:
"""Tests verifying event type mapping constants."""Based on coding guidelines for test categorization.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| class TestCodexAdapterEventMapping: | |
| """Tests verifying event type mapping constants.""" | |
| def test_event_map_contains_all_supported_events(self): | |
| """EVENT_MAP contains all events we claim to support.""" | |
| expected_methods = [ | |
| "thread/started", | |
| "thread/archive", | |
| "turn/started", | |
| "turn/completed", | |
| "item/commandExecution/requestApproval", | |
| "item/fileChange/requestApproval", | |
| "item/completed", | |
| ] | |
| for method in expected_methods: | |
| assert method in CodexAdapter.EVENT_MAP | |
| def test_tool_item_types_complete(self): | |
| """TOOL_ITEM_TYPES contains all tool-related item types.""" | |
| assert "commandExecution" in CodexAdapter.TOOL_ITEM_TYPES | |
| assert "fileChange" in CodexAdapter.TOOL_ITEM_TYPES | |
| assert "mcpToolCall" in CodexAdapter.TOOL_ITEM_TYPES | |
| def test_session_tracking_events_complete(self): | |
| """SESSION_TRACKING_EVENTS contains necessary events.""" | |
| assert "thread/started" in CodexAdapter.SESSION_TRACKING_EVENTS | |
| assert "turn/started" in CodexAdapter.SESSION_TRACKING_EVENTS | |
| assert "turn/completed" in CodexAdapter.SESSION_TRACKING_EVENTS | |
| assert "item/completed" in CodexAdapter.SESSION_TRACKING_EVENTS | |
| @pytest.mark.integration | |
| class TestCodexAdapterEventMapping: | |
| """Tests verifying event type mapping constants.""" | |
| def test_event_map_contains_all_supported_events(self): | |
| """EVENT_MAP contains all events we claim to support.""" | |
| expected_methods = [ | |
| "thread/started", | |
| "thread/archive", | |
| "turn/started", | |
| "turn/completed", | |
| "item/commandExecution/requestApproval", | |
| "item/fileChange/requestApproval", | |
| "item/completed", | |
| ] | |
| for method in expected_methods: | |
| assert method in CodexAdapter.EVENT_MAP | |
| def test_tool_item_types_complete(self): | |
| """TOOL_ITEM_TYPES contains all tool-related item types.""" | |
| assert "commandExecution" in CodexAdapter.TOOL_ITEM_TYPES | |
| assert "fileChange" in CodexAdapter.TOOL_ITEM_TYPES | |
| assert "mcpToolCall" in CodexAdapter.TOOL_ITEM_TYPES | |
| def test_session_tracking_events_complete(self): | |
| """SESSION_TRACKING_EVENTS contains necessary events.""" | |
| assert "thread/started" in CodexAdapter.SESSION_TRACKING_EVENTS | |
| assert "turn/started" in CodexAdapter.SESSION_TRACKING_EVENTS | |
| assert "turn/completed" in CodexAdapter.SESSION_TRACKING_EVENTS | |
| assert "item/completed" in CodexAdapter.SESSION_TRACKING_EVENTS |
🤖 Prompt for AI Agents
In @tests/adapters/test_codex.py around lines 1507 - 1536, Add the pytest
integration marker to the TestCodexAdapterEventMapping test class by decorating
the class with @pytest.mark.integration so all its tests are categorized as
integration tests; ensure pytest is imported in the test module if not already
present. Target the TestCodexAdapterEventMapping class (and its methods
test_event_map_contains_all_supported_events, test_tool_item_types_complete,
test_session_tracking_events_complete) when applying the decorator.
| class TestIntegration: | ||
| """Integration tests for full round-trip scenarios.""" | ||
|
|
||
| @pytest.fixture | ||
| def adapter(self): | ||
| """Create a GeminiAdapter instance.""" | ||
| return GeminiAdapter() | ||
|
|
||
| @pytest.fixture | ||
| def mock_hook_manager(self): | ||
| """Create a mock HookManager.""" | ||
| return MagicMock() | ||
|
|
||
| def test_session_lifecycle_roundtrip(self, adapter, mock_hook_manager): | ||
| """Tests full session start/end lifecycle.""" | ||
| # Session start | ||
| mock_hook_manager.handle.return_value = HookResponse( | ||
| decision="allow", | ||
| context="Welcome! You have 3 pending tasks.", | ||
| ) | ||
|
|
||
| start_event = { | ||
| "hook_type": "SessionStart", | ||
| "input_data": { | ||
| "session_id": "gemini-lifecycle-123", | ||
| "cwd": "/home/user/project", | ||
| "timestamp": "2025-01-15T10:00:00Z", | ||
| }, | ||
| } | ||
|
|
||
| start_result = adapter.handle_native(start_event, mock_hook_manager) | ||
|
|
||
| assert start_result["decision"] == "allow" | ||
| assert "pending tasks" in start_result["hookSpecificOutput"]["additionalContext"] | ||
|
|
||
| # Session end | ||
| mock_hook_manager.handle.return_value = HookResponse(decision="allow") | ||
|
|
||
| end_event = { | ||
| "hook_type": "SessionEnd", | ||
| "input_data": { | ||
| "session_id": "gemini-lifecycle-123", | ||
| "timestamp": "2025-01-15T11:00:00Z", | ||
| }, | ||
| } | ||
|
|
||
| end_result = adapter.handle_native(end_event, mock_hook_manager) | ||
|
|
||
| assert end_result["decision"] == "allow" | ||
|
|
||
| def test_tool_execution_roundtrip(self, adapter, mock_hook_manager): | ||
| """Tests full tool execution lifecycle.""" | ||
| # Before tool | ||
| mock_hook_manager.handle.return_value = HookResponse(decision="allow") | ||
|
|
||
| before_event = { | ||
| "hook_type": "BeforeTool", | ||
| "input_data": { | ||
| "session_id": "gemini-tool-456", | ||
| "tool_name": "WriteFileTool", | ||
| "tool_input": { | ||
| "path": "/tmp/test.txt", | ||
| "content": "Hello, World!", | ||
| }, | ||
| }, | ||
| } | ||
|
|
||
| before_result = adapter.handle_native(before_event, mock_hook_manager) | ||
|
|
||
| assert before_result["decision"] == "allow" | ||
|
|
||
| # Verify the tool name was normalized in the HookEvent | ||
| call_args = mock_hook_manager.handle.call_args[0][0] | ||
| assert call_args.metadata["normalized_tool_name"] == "Write" | ||
|
|
||
| # After tool | ||
| mock_hook_manager.handle.return_value = HookResponse(decision="allow") | ||
|
|
||
| after_event = { | ||
| "hook_type": "AfterTool", | ||
| "input_data": { | ||
| "session_id": "gemini-tool-456", | ||
| "tool_name": "WriteFileTool", | ||
| "tool_output": {"success": True, "bytes_written": 13}, | ||
| }, | ||
| } | ||
|
|
||
| after_result = adapter.handle_native(after_event, mock_hook_manager) | ||
|
|
||
| assert after_result["decision"] == "allow" | ||
|
|
||
| def test_tool_denied_by_workflow(self, adapter, mock_hook_manager): | ||
| """Tests tool denial scenario.""" | ||
| mock_hook_manager.handle.return_value = HookResponse( | ||
| decision="deny", | ||
| reason="No task claimed. Use gobby-tasks.create_task() first.", | ||
| system_message="File modifications blocked: claim a task first.", | ||
| ) | ||
|
|
||
| native_event = { | ||
| "hook_type": "BeforeTool", | ||
| "input_data": { | ||
| "session_id": "gemini-deny-789", | ||
| "tool_name": "EditFileTool", | ||
| "tool_input": { | ||
| "path": "/src/main.py", | ||
| "edit": "...", | ||
| }, | ||
| }, | ||
| } | ||
|
|
||
| result = adapter.handle_native(native_event, mock_hook_manager) | ||
|
|
||
| assert result["decision"] == "deny" | ||
| assert "No task claimed" in result["reason"] | ||
| assert result["systemMessage"] == "File modifications blocked: claim a task first." |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Add integration marker to integration tests.
The TestIntegration class contains integration tests that simulate full lifecycle scenarios including session start/end and tool execution. According to the coding guidelines, these should be marked with @pytest.mark.integration.
🏷️ Add integration marker
+@pytest.mark.integration
class TestIntegration:
"""Integration tests for full round-trip scenarios."""Based on coding guidelines for test categorization.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| class TestIntegration: | |
| """Integration tests for full round-trip scenarios.""" | |
| @pytest.fixture | |
| def adapter(self): | |
| """Create a GeminiAdapter instance.""" | |
| return GeminiAdapter() | |
| @pytest.fixture | |
| def mock_hook_manager(self): | |
| """Create a mock HookManager.""" | |
| return MagicMock() | |
| def test_session_lifecycle_roundtrip(self, adapter, mock_hook_manager): | |
| """Tests full session start/end lifecycle.""" | |
| # Session start | |
| mock_hook_manager.handle.return_value = HookResponse( | |
| decision="allow", | |
| context="Welcome! You have 3 pending tasks.", | |
| ) | |
| start_event = { | |
| "hook_type": "SessionStart", | |
| "input_data": { | |
| "session_id": "gemini-lifecycle-123", | |
| "cwd": "/home/user/project", | |
| "timestamp": "2025-01-15T10:00:00Z", | |
| }, | |
| } | |
| start_result = adapter.handle_native(start_event, mock_hook_manager) | |
| assert start_result["decision"] == "allow" | |
| assert "pending tasks" in start_result["hookSpecificOutput"]["additionalContext"] | |
| # Session end | |
| mock_hook_manager.handle.return_value = HookResponse(decision="allow") | |
| end_event = { | |
| "hook_type": "SessionEnd", | |
| "input_data": { | |
| "session_id": "gemini-lifecycle-123", | |
| "timestamp": "2025-01-15T11:00:00Z", | |
| }, | |
| } | |
| end_result = adapter.handle_native(end_event, mock_hook_manager) | |
| assert end_result["decision"] == "allow" | |
| def test_tool_execution_roundtrip(self, adapter, mock_hook_manager): | |
| """Tests full tool execution lifecycle.""" | |
| # Before tool | |
| mock_hook_manager.handle.return_value = HookResponse(decision="allow") | |
| before_event = { | |
| "hook_type": "BeforeTool", | |
| "input_data": { | |
| "session_id": "gemini-tool-456", | |
| "tool_name": "WriteFileTool", | |
| "tool_input": { | |
| "path": "/tmp/test.txt", | |
| "content": "Hello, World!", | |
| }, | |
| }, | |
| } | |
| before_result = adapter.handle_native(before_event, mock_hook_manager) | |
| assert before_result["decision"] == "allow" | |
| # Verify the tool name was normalized in the HookEvent | |
| call_args = mock_hook_manager.handle.call_args[0][0] | |
| assert call_args.metadata["normalized_tool_name"] == "Write" | |
| # After tool | |
| mock_hook_manager.handle.return_value = HookResponse(decision="allow") | |
| after_event = { | |
| "hook_type": "AfterTool", | |
| "input_data": { | |
| "session_id": "gemini-tool-456", | |
| "tool_name": "WriteFileTool", | |
| "tool_output": {"success": True, "bytes_written": 13}, | |
| }, | |
| } | |
| after_result = adapter.handle_native(after_event, mock_hook_manager) | |
| assert after_result["decision"] == "allow" | |
| def test_tool_denied_by_workflow(self, adapter, mock_hook_manager): | |
| """Tests tool denial scenario.""" | |
| mock_hook_manager.handle.return_value = HookResponse( | |
| decision="deny", | |
| reason="No task claimed. Use gobby-tasks.create_task() first.", | |
| system_message="File modifications blocked: claim a task first.", | |
| ) | |
| native_event = { | |
| "hook_type": "BeforeTool", | |
| "input_data": { | |
| "session_id": "gemini-deny-789", | |
| "tool_name": "EditFileTool", | |
| "tool_input": { | |
| "path": "/src/main.py", | |
| "edit": "...", | |
| }, | |
| }, | |
| } | |
| result = adapter.handle_native(native_event, mock_hook_manager) | |
| assert result["decision"] == "deny" | |
| assert "No task claimed" in result["reason"] | |
| assert result["systemMessage"] == "File modifications blocked: claim a task first." | |
| @pytest.mark.integration | |
| class TestIntegration: | |
| """Integration tests for full round-trip scenarios.""" | |
| @pytest.fixture | |
| def adapter(self): | |
| """Create a GeminiAdapter instance.""" | |
| return GeminiAdapter() | |
| @pytest.fixture | |
| def mock_hook_manager(self): | |
| """Create a mock HookManager.""" | |
| return MagicMock() | |
| def test_session_lifecycle_roundtrip(self, adapter, mock_hook_manager): | |
| """Tests full session start/end lifecycle.""" | |
| # Session start | |
| mock_hook_manager.handle.return_value = HookResponse( | |
| decision="allow", | |
| context="Welcome! You have 3 pending tasks.", | |
| ) | |
| start_event = { | |
| "hook_type": "SessionStart", | |
| "input_data": { | |
| "session_id": "gemini-lifecycle-123", | |
| "cwd": "/home/user/project", | |
| "timestamp": "2025-01-15T10:00:00Z", | |
| }, | |
| } | |
| start_result = adapter.handle_native(start_event, mock_hook_manager) | |
| assert start_result["decision"] == "allow" | |
| assert "pending tasks" in start_result["hookSpecificOutput"]["additionalContext"] | |
| # Session end | |
| mock_hook_manager.handle.return_value = HookResponse(decision="allow") | |
| end_event = { | |
| "hook_type": "SessionEnd", | |
| "input_data": { | |
| "session_id": "gemini-lifecycle-123", | |
| "timestamp": "2025-01-15T11:00:00Z", | |
| }, | |
| } | |
| end_result = adapter.handle_native(end_event, mock_hook_manager) | |
| assert end_result["decision"] == "allow" | |
| def test_tool_execution_roundtrip(self, adapter, mock_hook_manager): | |
| """Tests full tool execution lifecycle.""" | |
| # Before tool | |
| mock_hook_manager.handle.return_value = HookResponse(decision="allow") | |
| before_event = { | |
| "hook_type": "BeforeTool", | |
| "input_data": { | |
| "session_id": "gemini-tool-456", | |
| "tool_name": "WriteFileTool", | |
| "tool_input": { | |
| "path": "/tmp/test.txt", | |
| "content": "Hello, World!", | |
| }, | |
| }, | |
| } | |
| before_result = adapter.handle_native(before_event, mock_hook_manager) | |
| assert before_result["decision"] == "allow" | |
| # Verify the tool name was normalized in the HookEvent | |
| call_args = mock_hook_manager.handle.call_args[0][0] | |
| assert call_args.metadata["normalized_tool_name"] == "Write" | |
| # After tool | |
| mock_hook_manager.handle.return_value = HookResponse(decision="allow") | |
| after_event = { | |
| "hook_type": "AfterTool", | |
| "input_data": { | |
| "session_id": "gemini-tool-456", | |
| "tool_name": "WriteFileTool", | |
| "tool_output": {"success": True, "bytes_written": 13}, | |
| }, | |
| } | |
| after_result = adapter.handle_native(after_event, mock_hook_manager) | |
| assert after_result["decision"] == "allow" | |
| def test_tool_denied_by_workflow(self, adapter, mock_hook_manager): | |
| """Tests tool denial scenario.""" | |
| mock_hook_manager.handle.return_value = HookResponse( | |
| decision="deny", | |
| reason="No task claimed. Use gobby-tasks.create_task() first.", | |
| system_message="File modifications blocked: claim a task first.", | |
| ) | |
| native_event = { | |
| "hook_type": "BeforeTool", | |
| "input_data": { | |
| "session_id": "gemini-deny-789", | |
| "tool_name": "EditFileTool", | |
| "tool_input": { | |
| "path": "/src/main.py", | |
| "edit": "...", | |
| }, | |
| }, | |
| } | |
| result = adapter.handle_native(native_event, mock_hook_manager) | |
| assert result["decision"] == "deny" | |
| assert "No task claimed" in result["reason"] | |
| assert result["systemMessage"] == "File modifications blocked: claim a task first." |
🤖 Prompt for AI Agents
In @tests/adapters/test_gemini.py around lines 885 - 1000, The TestIntegration
class lacks the integration marker; add the pytest integration marker by
decorating the TestIntegration class with @pytest.mark.integration (ensure
pytest is imported if not already) so all methods
(test_session_lifecycle_roundtrip, test_tool_execution_roundtrip,
test_tool_denied_by_workflow) are treated as integration tests; place the
decorator immediately above the class definition referencing the TestIntegration
symbol.
| def test_spawn_with_unicode_in_args(self): | ||
| """spawn() handles unicode in arguments.""" | ||
| spawner = HeadlessSpawner() | ||
| result = spawner.spawn( | ||
| ["echo", "hello \u4e16\u754c"], # "hello world" in Chinese | ||
| cwd="/tmp", | ||
| ) | ||
|
|
||
| assert result.success is True | ||
| stdout, _ = result.process.communicate() | ||
| assert "\u4e16\u754c" in stdout or "world" in stdout.lower() |
There was a problem hiding this comment.
Questionable fallback assertion for Unicode test.
Line 1027's fallback or "world" in stdout.lower() doesn't logically relate to testing Chinese characters \u4e16\u754c (世界, meaning "world" in Chinese). If the terminal doesn't support Unicode, this fallback masks the actual test failure. Consider removing it or using a more appropriate assertion.
🐛 Suggested fix
assert result.success is True
stdout, _ = result.process.communicate()
- assert "\u4e16\u754c" in stdout or "world" in stdout.lower()
+ assert "\u4e16\u754c" in stdout🤖 Prompt for AI Agents
In @tests/agents/spawners/test_headless_spawner.py around lines 1017 - 1027, The
test test_spawn_with_unicode_in_args should not accept an unrelated ASCII
fallback; remove the fallback check and assert only that the Unicode substring
"\u4e16\u754c" appears in the captured stdout after calling
HeadlessSpawner.spawn (check result.success and result.process.communicate() as
currently done), or alternatively make the test explicitly skip/mark xfail on
environments that cannot support Unicode; update the assertion to only verify
"\u4e16\u754c" in stdout (and keep the existing result.success check) so the
test fails when Unicode is lost rather than silently passing.
… table - Add migration 40 to rename column using ALTER TABLE RENAME COLUMN - Update all SQL queries to use task_type instead of type - Update Task.from_row() to read from task_type column - Remove outdated comments about DB column mapping This aligns DB column names with Python field names, supporting the safe_update helper for SQL injection remediation.
Remove obsolete type: ignore comments that mypy no longer requires: - json_helpers.py: msgspec decode type ignore - codex.py: fix variable shadowing (api_key -> env_api_key) - mcp.py: t.name attribute type ignore - stdio.py: 12 @mcp.tool() decorator type ignores
PR Review: v0.1.1This is a substantial PR merging dev into main with 88K+ additions across 100+ files. I've reviewed the major changes focusing on the new autonomous execution features, task management enhancements, and session handling improvements. SummaryMajor Features Added:
Overall Assessment: The code quality is high with good separation of concerns, comprehensive test coverage for new autonomous features, and proper error handling. There are a few areas for improvement noted below. Code Quality and Best PracticesStrengths
Areas for ImprovementSecurity: JSON Deserialization
Error Handling
Code Duplication
Input Validation
Potential BugsRace Condition in Stop Signal Creation
Timezone Handling Inconsistency
Performance ConsiderationsDatabase Indexes - Good coverage with partial index on session_stop_signals.acknowledged_at WHERE acknowledged_at IS NULL Query Optimization - ProgressTracker.get_summary makes 4 separate queries, could combine with CTEs for better performance Memory Usage - Loading entire transcript into memory in cli/sessions.py:421-433 could be problematic for very long sessions Security ConcernsCommand Injection - Safe, using list form of subprocess commands not shell=True Path Traversal - Low risk, proper Path.exists checks in place HTTP Endpoint Authorization - Stop signal endpoints lack authentication checks, anyone can stop any session if no upstream middleware Test CoverageExcellent for autonomous features:
Missing tests for:
Architecture and DesignExcellent Separation of Concerns - Three-layer stuck detection is well-designed:
Good Design Patterns - Registry, Tracker, and Strategy patterns used appropriately Recommendations SummaryHigh Priority
Medium Priority
Low Priority
ConclusionRecommendation: Approve with minor fixes This is a well-crafted PR with significant new functionality. The autonomous execution infrastructure is thoughtfully designed with excellent test coverage. Main concerns are:
The issues identified are not blockers but should be addressed before the next release. The autonomous features are production-ready, and the test coverage gives confidence in their reliability. Great work on the comprehensive testing and clean architecture! Review completed by: Claude Sonnet 4.5 |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/gobby/tasks/research.py (1)
143-146: Fix dictionary key access for tool objects returned by mcp_manager.list_tools().The code accesses
t.nameandt.descriptionas object attributes, butmcp_manager.list_tools()returns dictionaries (per the type signaturedict[str, list[dict[str, Any]]]in manager.py). This will raiseAttributeError: 'dict' object has no attribute 'name'at runtime. Use dictionary key access instead:Required change
for t in all_tools: if t["name"] in ("search_web", "google_search", "brave_search"): prompt += f"5. {t['name']}(query): {t['description'][:100]}...\n" breaksrc/gobby/tasks/validation.py (2)
500-500: Critical: Awaiting non-async function causes runtime error.Line 500 attempts to
awaitthe call togather_validation_context, but this method is not declared asasync(line 440). This will cause a runtime error whencontext_filesis provided.🐛 Fix: Remove await or make the method async
Option 1 (Recommended): Remove await since the function is synchronous
# Gather context if provided file_context = "" if context_files: - file_context = await self.gather_validation_context(context_files) + file_context = self.gather_validation_context(context_files)Option 2: Make gather_validation_context async (if future async file I/O is planned)
- async def gather_validation_context(self, file_paths: list[str]) -> str: + async def gather_validation_context(self, file_paths: list[str]) -> str:Then use async file I/O operations inside the method.
440-459: Arbitrary file read via unvalidated file paths ingather_validation_context.The method reads files directly from user-provided paths without validation. Since
file_pathsoriginates fromtask_data["files"](user input passed to the Task class), a caller could include paths like/etc/passwdor../../secretsto read files outside the intended project scope.Implement path validation before reading files:
- Resolve all paths to absolute paths and verify they're within an allowed directory
- Reject paths with
..components or absolute paths outside the project- Maintain an allowlist of safe directories if multiple locations need access
🤖 Fix all issues with AI agents
In @pyproject.toml:
- Line 141: The pyproject dependency line currently pins Black as
"black>=24.0.0"; update this constraint to a more recent stable release by
replacing that token with either "black>=25.0.0" (looser) or "black>=25.12.0"
(recommended/pinned) so the project uses the latest fixes and improvements;
ensure any lockfile or CI config referencing the old version is updated to
match.
In @scripts/cleanup_skills.py:
- Around line 17-20: The script creates LocalDatabase and LocalSkillManager
without running migrations; after instantiating LocalDatabase (symbol:
LocalDatabase) call run_migrations(db) before constructing LocalSkillManager
(symbol: LocalSkillManager) so the schema is applied; update the script to
import or reference run_migrations and invoke run_migrations(db) right after db
= LocalDatabase() and before manager = LocalSkillManager(db=db).
In @src/gobby/autonomous/stuck_detector.py:
- Around line 123-136: The INSERT uses str(context) which stores Python reprs;
change it to json.dumps(context) (and add/import json) so context is stored as
valid JSON in the task_selection_history row, and in get_selection_history
replace ast.literal_eval fallback by using json.loads only: treat NULL/None as
None, wrap json.loads in a try/except to catch JSONDecodeError and log/handle
gracefully, and remove use of ast.literal_eval to avoid security risks; refer to
the insert block around the task_selection_history INSERT and the
get_selection_history function for where to change code and add error handling.
In @src/gobby/utils/machine_id.py:
- Line 118: Remove the unnecessary str() conversion around machineid.id() in the
function that returns the machine id in machine_id.py: replace the return that
currently wraps machineid.id() with a direct return of machineid.id() (i.e.,
drop str()), leaving other conversions for uuid.uuid4() untouched; this keeps
the return type consistent and avoids redundant conversion.
🧹 Nitpick comments (18)
docs/plans/todo-list.md (1)
1-7: Use Gobby tasks instead of markdown TODO lists.According to team conventions documented in prior reviews, markdown TODO lists should not be created; use Gobby tasks instead for tracking planned work.
Please convert these items to Gobby tasks and remove this file, or confirm if this markdown file is intended for a different purpose (e.g., archived planning notes).
src/gobby/sessions/transcripts/claude.py (1)
211-213: Duplicate comment on lines 211-212.The comment
# Check for /clear command markerappears twice consecutively.Remove duplicate comment
- # Check for /clear command marker # Check for /clear command marker return "<command-name>/clear</command-name>" in str(content)src/gobby/cli/daemon.py (1)
161-168: Add type annotation for consistency with line 270.For consistency and to align with strict mypy requirements, this
status_kwargsshould have the same type annotation as the one at line 270.📝 Suggested type annotation
- status_kwargs = { + status_kwargs: dict[str, Any] = { "running": daemon_healthy, "pid": process.pid, "pid_file": str(pid_file), "log_files": str(log_file.parent), "http_port": http_port, "websocket_port": ws_port, }src/gobby/sessions/summary.py (1)
395-426: Consider consolidating duplicate implementations across the codebase.The type annotation update is correct. However, this method has nearly identical implementations in multiple files:
src/gobby/cli/sessions.py(lines 27-47)src/gobby/mcp_proxy/tools/session_messages.py(lines 100-120)The implementations are ~90% identical, with only this file including "thinking" block handling. Consider extracting this to a shared utility module to eliminate duplication and ensure consistency.
Based on relevant code snippets.
♻️ Consolidation approach
Create a shared utility module (e.g.,
src/gobby/sessions/formatting.py) and move this implementation there, then import and use it across all locations. Thesrc/gobby/workflows/actions.pyalready delegates to aformat_turns_for_llmfunction, which suggests this pattern is already partially established.src/gobby/servers/routes/mcp.py (2)
243-250: Consider documenting the project_id resolution behaviorWhen
project_idis not provided,_resolve_project_iduses the HTTP server's current working directory to find.gobby/project.json. In a server context, this directory is where the daemon was started, which may not be intuitive for API consumers.The current implementation handles this gracefully by skipping metrics enrichment if resolution fails. However, the docstring could clarify that:
- Metrics enrichment requires either an explicit
project_idparameter, or- The server must be running from within a Gobby project directory
📝 Suggested docstring enhancement
""" List tools from MCP servers. Args: server_filter: Optional server name to filter by include_metrics: When True, include call_count, success_rate, avg_latency for each tool - project_id: Project ID for metrics lookup (uses current project if not specified) + project_id: Project ID for metrics lookup. If not provided, attempts to infer from + the server's working directory. Metrics enrichment is skipped if no + project context can be determined. Returns: Dict of server names to tool lists """
1212-1213: Minor inconsistency in attribute access patternLine 1212 uses direct attribute access with a
type: ignorecomment, while line 1213 usesgetattr()for the same object. For consistency and to avoid the type: ignore directive, consider usinggetattr()for both:{ - "name": t.name, # type: ignore[attr-defined] + "name": getattr(t, "name", ""), "description": getattr(t, "description", ""), "inputSchema": schema, }This assumes
namecould potentially be missing (requiring a default), but ifnameis guaranteed to exist and an empty string default would be invalid, the current approach withtype: ignoreis acceptable for strict mypy compliance with external SDK types.src/gobby/workflows/templates.py (2)
31-39: Prefer bareraiseto preserve the original traceback.The
str()cast is fine for explicit typing. However,raise ere-raises the exception but may alter the traceback in some edge cases. Using bareraiseis the idiomatic pattern.♻️ Suggested change
except Exception as e: logger.error(f"Error rendering template: {e}", exc_info=True) - # Fallback to original string or raise? - # For workflows, it might be better to fail typically, but let's return error message in string for visibility if strict validation isn't on. - # actually, better to raise so the action fails and handles it. - raise e + raise
45-50: Same suggestion: prefer bareraise.The
str()cast is appropriate. Consider using bareraiseinstead ofraise efor consistency with Python best practices.♻️ Suggested change
except Exception as e: logger.error(f"Error rendering template file '{template_name}': {e}", exc_info=True) - raise e + raisesrc/gobby/cli/workflows.py (1)
193-195: Consider type guard pattern instead of bare assert for session_id.The
assert session_id is not Nonestatements are used for mypy type narrowing after the conditional branch guaranteessession_idis set. While this works, bareassertstatements can be stripped when Python runs with-O(optimize) flag, potentially causing issues in production.Since the preceding logic already raises
SystemExit(1)when no session is found, the assert is effectively unreachable in normal flow and serves only as a type guard for mypy. This is an acceptable pattern for internal CLI code, but for consistency you could consider an explicit check or atyping.cast.src/gobby/mcp_proxy/tools/session_messages.py (1)
101-121: Code duplication withformat_turns_for_llmin summary_actions.py.This helper duplicates logic from
src/gobby/sessions/summary.py(lines 394-425) andsrc/gobby/workflows/summary_actions.py. Notably, this version omits handling for"thinking"blocks that the summary.py version includes:# In summary.py but missing here: elif block.get("type") == "thinking": text_parts.append(f"[Thinking: {block.get('thinking', '')}]")Consider either:
- Consolidating into a shared utility function, or
- Adding the missing
"thinking"block handling for consistency♻️ Add missing thinking block handling
def _format_turns_for_llm(turns: list[dict[str, Any]]) -> str: """Format transcript turns for LLM analysis.""" formatted: list[str] = [] for i, turn in enumerate(turns): message = turn.get("message", {}) role = message.get("role", "unknown") content = message.get("content", "") if isinstance(content, list): text_parts: list[str] = [] for block in content: if isinstance(block, dict): if block.get("type") == "text": text_parts.append(str(block.get("text", ""))) + elif block.get("type") == "thinking": + text_parts.append(f"[Thinking: {block.get('thinking', '')}]") elif block.get("type") == "tool_use": text_parts.append(f"[Tool: {block.get('name', 'unknown')}]") content = " ".join(text_parts) formatted.append(f"[Turn {i + 1} - {role}]: {content}") return "\n\n".join(formatted)scripts/cleanup_skills.py (1)
13-13: Async function performs no async operations.
cleanup_dbis declaredasyncbut calls only synchronous methods (list_skills,delete_skill,input). Consider making it a regular function, or if async is intentional for future use, add a comment.src/gobby/servers/routes/sessions.py (1)
427-434: Consider extracting stop_registry access.The hook_manager/stop_registry validation is repeated across all three endpoints. A helper function could reduce duplication:
Optional helper extraction
def _get_stop_registry(request: Request) -> "StopRegistry": """Get stop registry from request state, raising 503 if unavailable.""" if not hasattr(request.app.state, "hook_manager"): raise HTTPException(status_code=503, detail="Hook manager not available") hook_manager = request.app.state.hook_manager if not hasattr(hook_manager, "_stop_registry") or not hook_manager._stop_registry: raise HTTPException(status_code=503, detail="Stop registry not available") return hook_manager._stop_registryAlso applies to: 487-494, 542-549
src/gobby/servers/websocket.py (1)
434-502: Minor documentation inconsistency:signal_idis actually thesession_id.The docstring at line 450 documents
"signal_id": "uuid"suggesting a unique signal identifier, but the implementation returnssignal.session_id(lines 483, 495). SinceStopSignalis keyed by session (one pending signal per session), this is functionally correct but potentially confusing for API consumers expecting a distinct signal identifier.Consider either:
- Renaming the response field to clarify it's the session_id, or
- Updating the docstring to reflect the actual semantics
src/gobby/workflows/actions.py (1)
1117-1150: Consider movingasyncioimport to module level.The
import asyncioat line 1127 is inside the method body. Since this file already uses async patterns extensively, moving the import to the module level would be more idiomatic and avoid repeated import overhead on each call.The fire-and-forget pattern with
add_done_callbackfor silent error logging is appropriate for non-critical broadcasts.♻️ Suggested change
Move the import to the top of the file with other imports:
import asyncio # at module level, around line 1Then remove line 1127.
src/gobby/autonomous/stop_registry.py (1)
203-242:list_pendinglacks lock protection for consistency.Unlike
signal_stop,acknowledge,clear, andcleanup_stale, thelist_pendingmethod doesn't acquireself._lock. While individual SQLite reads are atomic, if consistency with concurrent writes is needed during enumeration, consider adding locking. For read-only listing operations, this is typically acceptable, but worth noting for completeness.src/gobby/cli/sessions.py (1)
28-48: Code duplication:_format_turns_for_llmexists in multiple modules.This helper function appears to be duplicated across:
src/gobby/sessions/summary.pysrc/gobby/mcp_proxy/tools/session_messages.py- Here in
src/gobby/cli/sessions.pyConsider extracting this to a shared utility module to maintain consistency and reduce maintenance burden.
src/gobby/workflows/evaluator.py (1)
304-323: Webhook conditions require pre-evaluation - document the contract.The comment at lines 305-306 correctly notes that
evaluate_webhook_conditions()must be called first, but callers might miss this. Consider:
- Adding this requirement to the method docstring of
check_exit_conditions- Or raising a more descriptive error when the pre-evaluated result is missing
The current warning log is helpful for debugging but doesn't prevent silent failures in production.
src/gobby/autonomous/stuck_detector.py (1)
257-263: Ensure consistent key generation for tool pattern detection.Line 262 creates a pattern key by concatenating tool name with
tool_args_keyslist converted to string. Converting a list directly to a string (e.g.,str([])→'[]') might not produce consistent keys for comparison purposes, especially with nested structures.♻️ Proposed improvement
# Count tool call patterns tool_counts: dict[str, int] = {} for event in recent_events: if event.tool_name: # Create a key from tool name and key args - key = f"{event.tool_name}:{event.details.get('tool_args_keys', [])}" + args_keys = event.details.get('tool_args_keys', []) + args_key = json.dumps(sorted(args_keys)) if isinstance(args_keys, list) else str(args_keys) + key = f"{event.tool_name}:{args_key}" tool_counts[key] = tool_counts.get(key, 0) + 1This ensures that lists with the same elements in different orders produce the same key, and handles edge cases more reliably.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (136)
.gitignore.gobby/tasks.jsonl.gobby/tasks_meta.jsondocs/plans/todo-list.mdexamples/plugins/code_guardian.pypyproject.tomlscripts/cleanup_skills.pysrc/gobby/adapters/base.pysrc/gobby/adapters/codex.pysrc/gobby/agents/spawners/base.pysrc/gobby/autonomous/progress_tracker.pysrc/gobby/autonomous/stop_registry.pysrc/gobby/autonomous/stuck_detector.pysrc/gobby/cli/agents.pysrc/gobby/cli/daemon.pysrc/gobby/cli/sessions.pysrc/gobby/cli/workflows.pysrc/gobby/config/persistence.pysrc/gobby/hooks/events.pysrc/gobby/hooks/plugins.pysrc/gobby/llm/claude.pysrc/gobby/llm/claude_executor.pysrc/gobby/llm/codex.pysrc/gobby/llm/gemini_executor.pysrc/gobby/mcp_proxy/lazy.pysrc/gobby/mcp_proxy/manager.pysrc/gobby/mcp_proxy/stdio.pysrc/gobby/mcp_proxy/tools/session_messages.pysrc/gobby/mcp_proxy/tools/task_readiness.pysrc/gobby/mcp_proxy/tools/workflows.pysrc/gobby/memory/manager.pysrc/gobby/runner.pysrc/gobby/servers/http.pysrc/gobby/servers/routes/admin.pysrc/gobby/servers/routes/mcp.pysrc/gobby/servers/routes/sessions.pysrc/gobby/servers/websocket.pysrc/gobby/sessions/lifecycle.pysrc/gobby/sessions/manager.pysrc/gobby/sessions/processor.pysrc/gobby/sessions/summary.pysrc/gobby/sessions/transcripts/base.pysrc/gobby/sessions/transcripts/claude.pysrc/gobby/sessions/transcripts/codex.pysrc/gobby/sessions/transcripts/gemini.pysrc/gobby/skills/learner.pysrc/gobby/storage/compaction.pysrc/gobby/storage/database.pysrc/gobby/storage/mcp.pysrc/gobby/storage/memories.pysrc/gobby/storage/migrations.pysrc/gobby/storage/projects.pysrc/gobby/storage/sessions.pysrc/gobby/storage/skills.pysrc/gobby/storage/tasks.pysrc/gobby/storage/workflow_audit.pysrc/gobby/storage/worktrees.pysrc/gobby/sync/memories.pysrc/gobby/sync/skills.pysrc/gobby/sync/tasks.pysrc/gobby/tasks/context.pysrc/gobby/tasks/issue_extraction.pysrc/gobby/tasks/research.pysrc/gobby/tasks/spec_parser.pysrc/gobby/tasks/validation.pysrc/gobby/utils/json_helpers.pysrc/gobby/utils/logging.pysrc/gobby/utils/machine_id.pysrc/gobby/workflows/actions.pysrc/gobby/workflows/engine.pysrc/gobby/workflows/evaluator.pysrc/gobby/workflows/hooks.pysrc/gobby/workflows/loader.pysrc/gobby/workflows/state_manager.pysrc/gobby/workflows/summary_actions.pysrc/gobby/workflows/templates.pytests/agents/spawners/test_headless_spawner.pytests/agents/test_context_integration.pytests/agents/test_spawners.pytests/agents/test_tty_config.pytests/cli/test_cli.pytests/cli/test_tasks_commits.pytests/cli/test_validation_cli.pytests/config/test_mcp_config.pytests/examples/test_code_guardian.pytests/hooks/test_api_messages.pytests/hooks/test_plugins.pytests/hooks/test_webhooks.pytests/integration/test_agent_execution.pytests/integration/test_terminal_mode_worktrees.pytests/integration/test_workflow_tool_filtering.pytests/llm/test_claude_executor.pytests/llm/test_executor.pytests/llm/test_gemini_executor.pytests/llm/test_litellm_executor.pytests/llm/test_llm_codex.pytests/llm/test_llm_litellm.pytests/llm/test_resolver.pytests/mcp_proxy/test_fallback_resolver.pytests/mcp_proxy/test_gobby_daemon_tools.pytests/mcp_proxy/test_mcp_manager.pytests/mcp_proxy/test_semantic_search.pytests/mcp_proxy/test_server_mgmt.pytests/memory/test_context.pytests/memory/test_search_benchmark.pytests/plugins/test_example_notify.pytests/servers/test_http_coverage.pytests/servers/test_mcp_routes.pytests/sessions/test_sessions_processor_integration.pytests/storage/test_audit_coverage.pytests/storage/test_storage_database.pytests/storage/test_storage_migrations.pytests/storage/test_storage_session_tasks.pytests/sync/test_skill_sync.pytests/tasks/test_context.pytests/tasks/test_context_gatherer.pytests/tasks/test_enhanced_validator.pytests/tasks/test_escalation.pytests/tasks/test_expansion_coverage.pytests/tasks/test_issue_extraction.pytests/tasks/test_prompts.pytests/tasks/test_task_validation.pytests/tasks/test_validation_history.pytests/tasks/test_validation_models.pytests/utils/test_utils_daemon_client.pytests/utils/test_utils_status.pytests/workflows/test_artifact_actions.pytests/workflows/test_compact_handoff.pytests/workflows/test_context_sources.pytests/workflows/test_evaluator.pytests/workflows/test_memory_lifecycle.pytests/workflows/test_plugin_action_workflow.pytests/workflows/test_templates.pytests/workflows/test_webhook_executor.pytests/workflows/test_webhook_workflow_integration.pytests/worktrees/test_git.py
✅ Files skipped from review due to trivial changes (9)
- src/gobby/storage/worktrees.py
- src/gobby/mcp_proxy/tools/workflows.py
- src/gobby/mcp_proxy/tools/task_readiness.py
- src/gobby/storage/memories.py
- src/gobby/sync/tasks.py
- src/gobby/servers/routes/admin.py
- src/gobby/workflows/state_manager.py
- src/gobby/storage/skills.py
- src/gobby/mcp_proxy/lazy.py
🚧 Files skipped from review as they are similar to previous changes (5)
- .gobby/tasks_meta.json
- src/gobby/autonomous/progress_tracker.py
- src/gobby/servers/http.py
- src/gobby/storage/tasks.py
- tests/agents/spawners/test_headless_spawner.py
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py
⚙️ CodeRabbit configuration file
src/**/*.py: - Check for proper type hints (project uses strict mypy)
- Ensure async functions are properly awaited
- Watch for potential security issues (SQL injection, command injection)
- Verify proper error handling
Files:
src/gobby/sync/memories.pysrc/gobby/llm/gemini_executor.pysrc/gobby/workflows/summary_actions.pysrc/gobby/sessions/transcripts/claude.pysrc/gobby/storage/projects.pysrc/gobby/tasks/issue_extraction.pysrc/gobby/tasks/validation.pysrc/gobby/workflows/engine.pysrc/gobby/sessions/processor.pysrc/gobby/sessions/transcripts/codex.pysrc/gobby/tasks/research.pysrc/gobby/sync/skills.pysrc/gobby/workflows/actions.pysrc/gobby/adapters/base.pysrc/gobby/utils/machine_id.pysrc/gobby/cli/agents.pysrc/gobby/hooks/events.pysrc/gobby/agents/spawners/base.pysrc/gobby/servers/routes/sessions.pysrc/gobby/llm/codex.pysrc/gobby/autonomous/stuck_detector.pysrc/gobby/utils/logging.pysrc/gobby/utils/json_helpers.pysrc/gobby/memory/manager.pysrc/gobby/servers/routes/mcp.pysrc/gobby/storage/compaction.pysrc/gobby/cli/workflows.pysrc/gobby/mcp_proxy/manager.pysrc/gobby/llm/claude.pysrc/gobby/tasks/spec_parser.pysrc/gobby/servers/websocket.pysrc/gobby/workflows/templates.pysrc/gobby/runner.pysrc/gobby/llm/claude_executor.pysrc/gobby/sessions/transcripts/base.pysrc/gobby/tasks/context.pysrc/gobby/autonomous/stop_registry.pysrc/gobby/sessions/lifecycle.pysrc/gobby/cli/daemon.pysrc/gobby/storage/sessions.pysrc/gobby/storage/mcp.pysrc/gobby/storage/database.pysrc/gobby/hooks/plugins.pysrc/gobby/skills/learner.pysrc/gobby/workflows/loader.pysrc/gobby/config/persistence.pysrc/gobby/storage/workflow_audit.pysrc/gobby/cli/sessions.pysrc/gobby/mcp_proxy/stdio.pysrc/gobby/workflows/hooks.pysrc/gobby/sessions/summary.pysrc/gobby/sessions/transcripts/gemini.pysrc/gobby/storage/migrations.pysrc/gobby/sessions/manager.pysrc/gobby/workflows/evaluator.pysrc/gobby/mcp_proxy/tools/session_messages.pysrc/gobby/adapters/codex.py
src/gobby/hooks/**
⚙️ CodeRabbit configuration file
src/gobby/hooks/**: - Hook handlers must be idempotent
- Check for proper event handling patterns
- Ensure plugins follow the HookPlugin interface
Files:
src/gobby/hooks/events.pysrc/gobby/hooks/plugins.py
src/gobby/mcp_proxy/**
⚙️ CodeRabbit configuration file
src/gobby/mcp_proxy/**: - This is the MCP (Model Context Protocol) proxy layer
- Check for proper tool schema validation
- Ensure tools return appropriate response structures
Files:
src/gobby/mcp_proxy/manager.pysrc/gobby/mcp_proxy/stdio.pysrc/gobby/mcp_proxy/tools/session_messages.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use asyncio with pytest configured withasyncio_mode = 'auto'for automatic async test detection
Use pytest markersslow,integration, ande2efor test categorization
Files:
tests/agents/test_context_integration.py
⚙️ CodeRabbit configuration file
tests/**/*.py: - Ensure tests are meaningful, not just coverage padding
- Check for proper mocking and fixtures
- Verify async tests use pytest-asyncio markers
Files:
tests/agents/test_context_integration.py
🧠 Learnings (5)
📚 Learning: 2025-12-17T19:24:40.567Z
Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T19:24:40.567Z
Learning: Use `gobby-*` prefix for internal server names and route them locally via InternalRegistryManager, while proxying other servers via MCPClientManager
Applied to files:
src/gobby/servers/routes/mcp.py
📚 Learning: 2025-12-17T19:54:30.424Z
Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-17T19:54:30.424Z
Learning: Do NOT create markdown TODO lists; use Gobby tasks instead
Applied to files:
docs/plans/todo-list.md
📚 Learning: 2025-12-17T19:24:40.567Z
Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T19:24:40.567Z
Learning: Applies to src/storage/database.py : Use SQLite for session, project, task, and MCP server/tool storage with thread-local database connections in `src/storage/database.py`
Applied to files:
src/gobby/storage/database.py
📚 Learning: 2025-12-17T19:24:40.567Z
Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T19:24:40.567Z
Learning: Applies to src/mcp_proxy/server.py : Use FastMCP server implementation for MCP proxy functionality in `src/mcp_proxy/server.py`
Applied to files:
src/gobby/mcp_proxy/stdio.py
📚 Learning: 2025-12-17T19:24:40.567Z
Learnt from: CR
Repo: GobbyAI/gobby PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T19:24:40.567Z
Learning: Implement MCP tool progressive disclosure with three-step discovery: list_tools() → get_tool_schema() → call_tool()
Applied to files:
src/gobby/mcp_proxy/stdio.py
🧬 Code graph analysis (30)
src/gobby/sessions/transcripts/claude.py (1)
src/gobby/sessions/transcripts/base.py (3)
extract_last_messages(68-81)extract_turns_since_clear(83-101)is_session_boundary(103-113)
src/gobby/tasks/issue_extraction.py (3)
src/gobby/tasks/expansion.py (1)
_extract_json(287-289)src/gobby/mcp_proxy/importer.py (1)
_extract_json(389-409)src/gobby/tasks/validation_models.py (1)
Issue(30-89)
src/gobby/workflows/engine.py (1)
src/gobby/workflows/definitions.py (1)
WorkflowState(94-130)
src/gobby/sessions/transcripts/codex.py (2)
src/gobby/sessions/transcripts/base.py (3)
extract_last_messages(68-81)extract_turns_since_clear(83-101)is_session_boundary(103-113)src/gobby/sessions/transcripts/claude.py (3)
extract_last_messages(43-82)extract_turns_since_clear(84-187)is_session_boundary(189-213)
src/gobby/workflows/actions.py (3)
src/gobby/workflows/autonomous_actions.py (6)
detect_stuck(178-220)detect_task_loop(142-175)get_progress_summary(258-286)record_progress(94-139)start_progress_tracking(20-49)stop_progress_tracking(52-91)src/gobby/workflows/stop_signal_actions.py (3)
check_stop_signal(17-83)clear_stop_signal(143-163)request_stop(109-140)src/gobby/servers/websocket.py (1)
broadcast_autonomous_event(638-670)
src/gobby/adapters/base.py (2)
src/gobby/adapters/codex.py (5)
translate_to_hook_event(911-1006)translate_to_hook_event(1192-1253)translate_from_hook_response(1008-1027)translate_from_hook_response(1255-1273)handle_native(1275-1292)src/gobby/hooks/events.py (2)
HookEvent(68-107)HookResponse(111-136)
src/gobby/autonomous/stuck_detector.py (2)
src/gobby/autonomous/progress_tracker.py (3)
get_summary(234-316)is_stagnant(318-332)get_recent_events(417-447)src/gobby/storage/database.py (2)
execute(63-65)fetchall(76-79)
src/gobby/memory/manager.py (3)
src/gobby/mcp_proxy/schema_hash.py (1)
get_stats(347-384)src/gobby/mcp_proxy/semantic_search.py (1)
get_embedding_stats(366-406)src/gobby/memory/semantic_search.py (1)
get_embedding_stats(361-399)
src/gobby/servers/routes/mcp.py (3)
src/gobby/servers/routes/dependencies.py (1)
get_metrics_manager(115-118)src/gobby/mcp_proxy/metrics.py (2)
ToolMetricsManager(71-620)get_metrics(156-217)src/gobby/servers/http.py (1)
_resolve_project_id(234-270)
src/gobby/storage/compaction.py (1)
src/gobby/memory/manager.py (1)
get_stats(316-350)
src/gobby/cli/workflows.py (2)
src/gobby/agents/runner.py (1)
session_id(137-139)src/gobby/mcp_proxy/lazy.py (1)
get_state(212-222)
src/gobby/tasks/spec_parser.py (5)
src/gobby/storage/sessions.py (1)
to_dict(71-95)src/gobby/storage/worktrees.py (1)
to_dict(59-73)src/gobby/mcp_proxy/metrics.py (1)
to_dict(52-68)src/gobby/memory/extractor.py (1)
to_dict(47-54)src/gobby/agents/registry.py (1)
to_dict(71-86)
tests/agents/test_context_integration.py (3)
tests/workflows/test_webhook_workflow_integration.py (1)
mock_session_manager(128-132)tests/hooks/test_plugins.py (1)
mock_session_manager(639-643)tests/workflows/test_plugin_action_workflow.py (1)
mock_session_manager(129-133)
scripts/cleanup_skills.py (3)
src/gobby/storage/skills.py (1)
LocalSkillManager(69-243)src/gobby/storage/database.py (1)
LocalDatabase(22-166)tests/memory/test_extractor.py (1)
db(15-19)
src/gobby/servers/websocket.py (3)
src/gobby/autonomous/stop_registry.py (1)
signal_stop(60-110)tests/servers/test_http_server.py (1)
signal_stop(949-954)tests/servers/test_sessions_routes.py (1)
signal_stop(707-712)
src/gobby/workflows/templates.py (1)
tests/workflows/test_webhook_workflow_integration.py (1)
render(140-147)
src/gobby/sessions/transcripts/base.py (3)
src/gobby/sessions/transcripts/claude.py (3)
extract_last_messages(43-82)extract_turns_since_clear(84-187)is_session_boundary(189-213)src/gobby/sessions/transcripts/codex.py (3)
extract_last_messages(29-41)extract_turns_since_clear(43-48)is_session_boundary(50-51)src/gobby/sessions/transcripts/gemini.py (3)
extract_last_messages(36-59)extract_turns_since_clear(61-69)is_session_boundary(71-76)
src/gobby/autonomous/stop_registry.py (1)
src/gobby/storage/database.py (4)
LocalDatabase(22-166)execute(63-65)fetchone(71-74)fetchall(76-79)
src/gobby/storage/sessions.py (1)
src/gobby/storage/database.py (3)
safe_update(81-141)fetchall(76-79)fetchone(71-74)
src/gobby/storage/mcp.py (1)
src/gobby/storage/database.py (1)
execute(63-65)
src/gobby/skills/learner.py (2)
src/gobby/llm/codex.py (1)
generate_text(349-376)src/gobby/llm/claude.py (1)
generate_text(397-455)
src/gobby/cli/sessions.py (3)
src/gobby/mcp_proxy/tools/session_messages.py (2)
_format_turns_for_llm(101-121)create_handoff(291-520)src/gobby/sessions/summary.py (1)
_format_turns_for_llm(395-426)src/gobby/sessions/transcripts/claude.py (3)
ClaudeTranscriptParser(19-335)extract_turns_since_clear(84-187)extract_last_messages(43-82)
src/gobby/mcp_proxy/stdio.py (1)
src/gobby/mcp_proxy/tools/internal.py (1)
tool(75-140)
src/gobby/workflows/hooks.py (1)
src/gobby/hooks/events.py (1)
HookEvent(68-107)
src/gobby/sessions/summary.py (3)
src/gobby/cli/sessions.py (1)
_format_turns_for_llm(28-48)src/gobby/mcp_proxy/tools/session_messages.py (1)
_format_turns_for_llm(101-121)src/gobby/workflows/actions.py (1)
_format_turns_for_llm(836-838)
examples/plugins/code_guardian.py (1)
src/gobby/hooks/events.py (2)
HookEvent(68-107)HookResponse(111-136)
src/gobby/sessions/transcripts/gemini.py (3)
src/gobby/sessions/transcripts/base.py (3)
extract_last_messages(68-81)extract_turns_since_clear(83-101)is_session_boundary(103-113)src/gobby/sessions/transcripts/claude.py (3)
extract_last_messages(43-82)extract_turns_since_clear(84-187)is_session_boundary(189-213)src/gobby/sessions/transcripts/codex.py (3)
extract_last_messages(29-41)extract_turns_since_clear(43-48)is_session_boundary(50-51)
src/gobby/workflows/evaluator.py (2)
src/gobby/workflows/webhook_executor.py (2)
WebhookExecutor(58-399)json_body(34-46)src/gobby/autonomous/stop_registry.py (1)
has_pending_signal(143-153)
src/gobby/mcp_proxy/tools/session_messages.py (2)
src/gobby/sessions/summary.py (1)
_format_turns_for_llm(395-426)src/gobby/sessions/transcripts/claude.py (3)
ClaudeTranscriptParser(19-335)extract_turns_since_clear(84-187)extract_last_messages(43-82)
src/gobby/adapters/codex.py (5)
src/gobby/hooks/events.py (1)
HookEvent(68-107)src/gobby/adapters/base.py (2)
translate_to_hook_event(34-47)handle_native(64-93)src/gobby/adapters/claude_code.py (2)
translate_to_hook_event(72-116)handle_native(193-220)src/gobby/adapters/gemini.py (2)
translate_to_hook_event(138-214)handle_native(275-302)src/gobby/hooks/hook_manager.py (1)
HookManager(72-827)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test (Python 3.12)
- GitHub Check: Test (Python 3.11)
- GitHub Check: claude-review
- GitHub Check: Test (Python 3.12)
- GitHub Check: Test (Python 3.11)
| [dependency-groups] | ||
| dev = [ | ||
| "bandit>=1.8.0", | ||
| "black>=24.0.0", |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
latest Black Python formatter version January 2026
💡 Result:
As of January 8, 2026:
- Stable release: black 25.12.0. [1]
- Preview / pre-release (2026 style): black 26.1a1 (pre-release). [2]
Sources: GitHub releases for psf/black [1], release notes for 26.1a1 preview [2].
Update the Black version constraint to a more recent stable version.
Black 24.0.0 is outdated; the latest stable version as of January 2026 is 25.12.0. Update the constraint to black>=25.0.0 or pin to black>=25.12.0 to benefit from recent bug fixes and improvements.
🤖 Prompt for AI Agents
In @pyproject.toml at line 141, The pyproject dependency line currently pins
Black as "black>=24.0.0"; update this constraint to a more recent stable release
by replacing that token with either "black>=25.0.0" (looser) or "black>=25.12.0"
(recommended/pinned) so the project uses the latest fixes and improvements;
ensure any lockfile or CI config referencing the old version is updated to
match.
| from gobby.storage.database import LocalDatabase | ||
|
|
||
| db = LocalDatabase() | ||
| manager = LocalSkillManager(db=db) |
There was a problem hiding this comment.
Missing database migrations before use.
The script instantiates LocalDatabase and LocalSkillManager without running migrations. Per the test fixture pattern in tests/memory/test_extractor.py, run_migrations(db) should be called after database initialization to ensure the schema is up-to-date.
Proposed fix
from gobby.storage.database import LocalDatabase
+from gobby.storage.migrations import run_migrations
db = LocalDatabase()
+run_migrations(db)
manager = LocalSkillManager(db=db)🤖 Prompt for AI Agents
In @scripts/cleanup_skills.py around lines 17 - 20, The script creates
LocalDatabase and LocalSkillManager without running migrations; after
instantiating LocalDatabase (symbol: LocalDatabase) call run_migrations(db)
before constructing LocalSkillManager (symbol: LocalSkillManager) so the schema
is applied; update the script to import or reference run_migrations and invoke
run_migrations(db) right after db = LocalDatabase() and before manager =
LocalSkillManager(db=db).
| with self._lock: | ||
| self.db.execute( | ||
| """ | ||
| INSERT INTO task_selection_history ( | ||
| session_id, task_id, selected_at, context | ||
| ) VALUES (?, ?, ?, ?) | ||
| """, | ||
| ( | ||
| session_id, | ||
| task_id, | ||
| now.isoformat(), | ||
| str(context) if context else None, | ||
| ), | ||
| ) |
There was a problem hiding this comment.
Use JSON serialization for context instead of Python string representation.
Line 134 uses str(context) to serialize the context dictionary, which produces Python's repr format (e.g., {'key': 'value'}). This creates a mismatch with the deserialization logic in get_selection_history (lines 366-370), which first attempts ast.literal_eval and then falls back to json.loads.
Using str() is fragile and can fail with complex nested structures. Additionally, ast.literal_eval poses a security concern if the database could contain untrusted data.
🔧 Proposed fix
self.db.execute(
"""
INSERT INTO task_selection_history (
session_id, task_id, selected_at, context
) VALUES (?, ?, ?, ?)
""",
(
session_id,
task_id,
now.isoformat(),
- str(context) if context else None,
+ json.dumps(context) if context else None,
),
)Then simplify the deserialization in get_selection_history:
context = None
if row["context"]:
try:
- context = ast.literal_eval(row["context"])
- except (ValueError, SyntaxError):
- try:
- context = json.loads(row["context"])
- except json.JSONDecodeError:
- logger.warning(
- f"Failed to parse context for task selection: {row['context'][:100]}"
- )
- context = None
+ context = json.loads(row["context"])
+ except json.JSONDecodeError:
+ logger.warning(
+ f"Failed to parse context for task selection: {row['context'][:100]}"
+ )
+ context = NoneBased on coding guidelines, ensure proper error handling and avoid security risks.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In @src/gobby/autonomous/stuck_detector.py around lines 123 - 136, The INSERT
uses str(context) which stores Python reprs; change it to json.dumps(context)
(and add/import json) so context is stored as valid JSON in the
task_selection_history row, and in get_selection_history replace
ast.literal_eval fallback by using json.loads only: treat NULL/None as None,
wrap json.loads in a try/except to catch JSONDecodeError and log/handle
gracefully, and remove use of ast.literal_eval to avoid security risks; refer to
the insert block around the task_selection_history INSERT and the
get_selection_history function for where to change code and add error handling.
| import machineid | ||
|
|
||
| return machineid.id() | ||
| return str(machineid.id()) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's view the complete file context around line 118
cat -n src/gobby/utils/machine_id.py | sed -n '110,135p'Repository: GobbyAI/gobby
Length of output: 964
🏁 Script executed:
# Check if there's a requirements file or setup.py that specifies the version
fd -e "requirements*.txt" -e "setup.py" -e "pyproject.toml" -e "poetry.lock" | head -10 | xargs -I {} sh -c 'echo "=== {} ===" && cat {}'Repository: GobbyAI/gobby
Length of output: 39
🌐 Web query:
py-machineid library machineid.id() return type documentation
💡 Result:
machineid.id() returns a string (the raw machine GUID). See the py-machineid README and PyPI docs showing id() -> str and notes about the optional winregistry kwarg. [1][2]
Sources
- py-machineid GitHub README. [1]
- py-machineid on PyPI. [2]
Remove redundant str() conversion on line 118.
The py-machineid library's machineid.id() function already returns a string. The str() wrapper is unnecessary and creates an inconsistency with the intent of similar conversions elsewhere in the code (lines 121, 125, which correctly convert uuid.uuid4() objects that require string conversion).
Replace:
return str(machineid.id())With:
return machineid.id()🤖 Prompt for AI Agents
In @src/gobby/utils/machine_id.py at line 118, Remove the unnecessary str()
conversion around machineid.id() in the function that returns the machine id in
machine_id.py: replace the return that currently wraps machineid.id() with a
direct return of machineid.id() (i.e., drop str()), leaving other conversions
for uuid.uuid4() untouched; this keeps the return type consistent and avoids
redundant conversion.
- Fix cleanup_stale time calculation in stop_registry.py - Fix stuck_detector.py time comparison in detect_task_loop - Fix mcp.py metrics enrichment error handling and tool guards - Fix websocket.py StopSignal attribute references - Move collections import to module level in learner.py - Add Windows skip marker to test_headless_spawner.py - Add integration marker to TestMacOSIntegration - Add integration/slow markers to TestEmbeddedSpawnerUnix - Fix test_cli_daemon.py HOME isolation - Add integration marker to TestExecutorCreationWithConfig - Rename misleading test_concurrent_read_simulation test - Add assertion for clear_session not called in test - Update test_session_actions.py docstring - Fix test_webhook_condition.py assertion for error list
PR Review: v0.1.1Thank you for this comprehensive release! This is a major PR (88K additions, 268 files) merging significant autonomous session features from dev to main. Here is my detailed review: 🎯 SummaryThis PR introduces autonomous workflow capabilities, including progress tracking, stuck detection, graceful stop signals, and enhanced handoff generation. The features are well-architected and thoroughly tested. ✅ Strengths1. Excellent Code Quality
2. Strong Architecture
3. Robust Database Migrations
4. Security Considerations
|
…oped session refs - Add migration 76 to change sessions.seq_num index from global to project-scoped - Add migration 77 to re-backfill session seq_num values per project - Update seq_num assignment in sessions.py to be per-project - Update resolve_session_reference() to accept optional project_id parameter - Update both BASELINE_SCHEMA and BASELINE_SCHEMA_V2 for new databases This enables different projects to have independent session numbering (#1, #2, etc.) matching how tasks already work. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- clones.py: Wrap merge_branch in try/finally so temp ref cleanup (branch -D) runs even if merge raises (#16) - migrations.py: Make migration 147 idempotent with PRAGMA table_info check before ALTER TABLE ADD COLUMN (#25) - pipelines.py: Escape SQL wildcards (%, _) in LIKE patterns with ESCAPE clause (#26) - pipeline_executor.py: Namespace child pipeline outputs under "output" key to prevent clobbering wrapper keys like "status" (#28) - pipeline_heartbeat.py: Catch exceptions in _has_alive_agents to prevent killing the heartbeat loop (#29) - lifecycle_monitor.py: Log swallowed exceptions in orphan completion instead of bare except/pass (#2) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The post-#12138 tests/workflows/test_pipeline_executor.py hit 3,164 lines, violating CLAUDE.md rule #2 (files must stay under 1,000 lines). Split into seven focused modules with shared fixtures lifted into tests/workflows/conftest.py: - test_pipeline_executor_core.py (427 lines) — Init, Execute, StepExecution, ExecStep, PromptStep - test_pipeline_executor_nested.py (534 lines) — nested pipelines, dict-form invoke_pipeline, depth/cycle - test_pipeline_executor_conditions.py (252 lines) — condition evaluation, default input merging - test_pipeline_executor_approvals.py (657 lines) — approval gate, approve/reject - test_pipeline_executor_child_session.py (565 lines) — pipeline child session, MCP session resolution, parent_session_id - test_pipeline_executor_outputs.py (423 lines) — build_outputs, coerce, emit_event, notify_completion, close_session - test_pipeline_executor_errors.py (179 lines) — execute() error paths, wait step Fixture sharing: mock_db, mock_execution_manager, mock_llm_service, mock_template_engine, mock_webhook_notifier, simple_pipeline, pipeline_with_prompt, pipeline_with_inputs are now in tests/workflows/conftest.py. Other workflow test modules (test_mcp_step, test_pipeline_resume, test_summary_actions, test_webhook_executor) define module-local overrides for these names — pytest fixture resolution keeps those isolated. Verified all still pass. Behavior-preserving: 118 tests (same count as post-#12138+#12211 monolith), 0 tests added or removed. One shell command rewritten (exec_step_captures_stderr) from python -c to sh -c to avoid escaped-quote JSON-proxy issues — same stderr capture semantics, no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… callers New shared primitive at web/src/components/ui/SegmentedControl.tsx, next to the existing Switch primitive. Generic over a string-union value type, full radiogroup keyboard nav (ArrowLeft/Right wrap, Home jumps to first, End jumps to last), per-option onClick escape hatch for callers that need a side effect alongside state mutation, and locked-in subdued bg-accent/15 text-accent active styling — saturated bg-accent fill is intentionally not exposed as a variant because that treatment was a Pass 3 anti-pattern, not a useful option. Two sizes: sm (~23px, default, text-xs px-2 py-1) and md (~28px, text-sm px-2.5 py-1.5). Wider variants can be added when a third size has a real caller, not preemptively. Test suite at web/src/components/ui/__tests__/SegmentedControl.test.tsx covers click, per-option onClick co-firing, ArrowLeft/Right wrap, Home/End, disabled state, and an active-class regression guard so a future "let's add a saturated variant" PR can't slip through unnoticed. Migrated four simple callers, dropping ~50 lines of duplicated class chains and hand-rolled keyboard plumbing: - chat/ModeSelector.tsx — Plan/Act/Auto, was already subdued, now keyboard-navigable for free. - dashboard/TimeRangePills.tsx — was using saturated bg-accent fill, now demoted to subdued tint (fixes accent-violation #1). - activity/PipelinesTab.tsx — was using saturated bg-accent + a 30+ line hand-rolled roving-tabindex implementation. Now subdued + nav comes from the primitive (fixes #2 + drops the keyboard helpers). - workflows/ReportsPage.tsx — was saturated, now subdued (fixes #3). ProjectSelector and the SessionsTab Live/Expired select land in the next commit. The activity-panel header layout fix lands in commit 3.
…imit Pull search, indexing/lifecycle, and cross-reference logic out of the 1,702-line MemoryManager god-class into three focused services under src/gobby/memory/services/. MemoryManager keeps every public method as a thin facade wired through the new services, so all external callers and test surfaces (including the static MemoryManager._rrf_merge / _rrf_scores helpers and the patched _fetch_all_project_memories / _search_graph_for_memories hooks) keep working. CrossrefRebuildError moves to services/crossref.py and is re-exported from manager for backward compatibility. The huge search_memories body — formerly 312 lines with an inline build_results closure — is now split into _search_with_graph, _search_qdrant_fts, and a shared _build_results method on SearchService. manager.py drops from 1,702 to 920 lines, well under the 1,000-line monolith threshold from CLAUDE.md guideline #2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Catching up main to current
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.