Feature v10.1 #5
Merged
Pull request overview
This PR introduces the Phase 10.1 feature set: dynamic plugin/model configuration, personal assistant (HITL) integrations, multi-agent orchestration upgrades, and expanded evaluation/red-team test tooling.
Changes:
- Added plugin + personal assistant API routes and frontend UI (Plugins tab, Draft Cards + HITL send).
- Improved orchestration reliability/perf (agentic engine guardrails, Redis checkpointing/tests, pooled Ollama streaming, media pipeline offloading).
- Added extensive eval, red-team, and integration test assets (csv case matrices, runners, Playwright specs).
Reviewed changes
Copilot reviewed 70 out of 75 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/verify_plugins_api.py | Adds a script-style check for plugin endpoints (currently pytest-collectable). |
| tests/test_web_search.py | Adds a standalone DuckDuckGo tool test script (ignored by pytest config). |
| tests/test_tools_integration.py | Updates integration test to mock new orchestration/repo dependencies and validate JSON response. |
| tests/test_reflection.py | Adds unit tests for new ReflectionHandler behavior. |
| tests/test_mcp_server_management.py | Adds a multi-phase script for MCP server CRUD via plugin API (currently pytest-collectable). |
| tests/test_mcp_integration.py | Updates MCP integration script to refresh and query remote tools. |
| tests/test_mcp_config.py | Updates MCP config tests for async get_mcp_servers() and settings-manager mocking. |
| tests/test_checkpointing.py | Adds tests for Redis-backed checkpoint save/load/clear. |
| tests/test_agent_handoff.py | Adds tests for agent config selection and phase-based prompt substitution. |
| tests/redteam/tools/test_mcp_payloads.py | Adds red-team payload tests for MCP tools. |
| tests/redteam/frontend/test_frontend_concurrency.spec.ts | Adds Playwright red-team tests for frontend concurrency resilience. |
| tests/redteam/conftest.py | Adds shared red-team fixtures (mock provider, repos, app overrides). |
| tests/redteam/backend/test_thread_isolation.py | Adds red-team tests for thread isolation, concurrency interleaving, and overflow payloads. |
| tests/redteam/agentic/test_agentic_cycles.py | Adds red-team tests for cycle detection, circuit breaker, and tool hallucination handling. |
| tests/evals/test.csv | Adds an eval case matrix for benchmark runner. |
| tests/evals/run_phase10_orchestrator_live.py | Adds a live websocket eval runner with reporting. |
| tests/evals/run_benchmarks.py | Adds a mock/live benchmark runner that records tool trajectories. |
| tests/evals/phase10_report.md | Adds a generated mock eval report artifact. |
| tests/evals/phase10_live_report.md | Adds a generated live eval report artifact. |
| tests/evals/phase10_cases.csv | Adds generated Phase 10 case matrix for eval runner. |
| tests/evals/golden_db.csv | Adds fixtures backing the eval suite. |
| tests/evals/generate_phase10_cases.py | Adds generator script for Phase 10 eval cases. |
| tests/evals/check_phase10_report.py | Adds a CI gate for eval pass rate threshold. |
| tests/conftest.py | Configures pytest to ignore certain script-style tests. |
| src/chatbot_ai_system/tools/registry.py | Registers GetCurrentTimeTool in the default tool registry. |
| `src/chatbot_ai_system/tools/__init__.py` | Makes default tool registration tolerant to duplicates (via direct `_tools` access). |
| src/chatbot_ai_system/services/tool_reliability.py | Adds Redis-backed tool reliability tracking and ranking. |
| src/chatbot_ai_system/services/reflection.py | Adds reflection-based retry parser/LLM prompt flow. |
| src/chatbot_ai_system/services/media_pipeline.py | Offloads CPU-bound media work + adds singleton Whisper model caching. |
| src/chatbot_ai_system/services/agents.py | Adds AgentConfig registry and phase-aware tool-executor agent selection. |
| src/chatbot_ai_system/services/agentic_engine.py | Adds retries/timeouts, cycle detection, tool whitelist, HITL gating, and fail-closed behavior. |
| src/chatbot_ai_system/server/routes.py | Uses dynamic model/provider settings, atomic sequence numbers, request_id correlation, and rollback on errors. |
| src/chatbot_ai_system/server/plugin_routes.py | Adds plugin routes for model activation and MCP server management. |
| src/chatbot_ai_system/server/personal_routes.py | Adds personal integration config/connect endpoints and HITL /send. |
| src/chatbot_ai_system/server/multimodal_routes.py | Uses settings_manager to source the active model for voice flow. |
| src/chatbot_ai_system/server/main.py | Migrates startup/shutdown to lifespan and mounts new routers. |
| src/chatbot_ai_system/repositories/conversation.py | Adds DB-derived next sequence number + pgvector feature detection guards. |
| src/chatbot_ai_system/providers/ollama.py | Adds pooled httpx client reuse for streaming and dynamic model fallback. |
| src/chatbot_ai_system/prompts.py | Centralizes system/router/synthesis/verification/reflection prompts. |
| src/chatbot_ai_system/personal/constants.py | Adds HITL tool name list and platform config schemas. |
| src/chatbot_ai_system/models/schemas.py | Switches timestamps to timezone-aware UTC. |
| src/chatbot_ai_system/database/models.py | Replaces deprecated utcnow usage and adds SystemSetting model. |
| src/chatbot_ai_system/config/settings_manager.py | Adds DB-backed dynamic settings with validation hooks. |
| src/chatbot_ai_system/config/settings.py | Updates Settings config to pydantic v2 SettingsConfigDict and aliases. |
| src/chatbot_ai_system/config/mcp_server_config.py | Makes MCP server loading async; adds personal + dynamic server configs. |
| scripts/live_audit_matrix.py | Adds a live audit runner script to validate end-to-end behavior. |
| pyproject.toml | Adds media/tooling dependencies (Pillow, pydub, faster-whisper, opencv-headless). |
| frontend/test-results/.last-run.json | Adds a Playwright results artifact (likely unintended commit). |
| frontend/package.json | Adds Playwright as a dev dependency. |
| frontend/components/Sidebar.tsx | Adds tab switching UI (Chats vs Plugins). |
| frontend/components/ChatArea.tsx | Adds DraftCard UI for HITL tools and send callback wiring. |
| frontend/app/page.tsx | Adds PluginsDashboard routing and HITL send flow, plus request_id correlation. |
| docs/phase_7.1.md | Adds documentation for hardening and concurrency fixes. |
| docs/phase_7.0.md | Adds documentation for model integration and plugin dashboard. |
| docs/phase_10.1.md | Adds documentation for orchestrator graph + multi-agent + checkpointing. |
| docs/phase9.0.md | Adds documentation for personal assistant integrations and HITL flow. |
| docs/phase10.0_testing.md | Adds documentation for Phase 10 test plan and eval approach. |
| docs/phase10.0.md | Adds documentation for routing/tool reliability upgrades. |
| docs/personal_platform_integration.md | Adds specification doc for personal platform integrations. |
| alembic/versions/cdc18a2dc7b3_add_system_settings_table.py | Adds migration for system_settings table. |
| .pgvector_build | Adds a subproject pointer for pgvector build. |
| .github/workflows/phase10_eval.yml | Adds a GitHub Actions job to run Phase 10 evals and enforce pass rate. |
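
The `schemas.py` and `database/models.py` rows above mention timezone-aware UTC timestamps and the removal of deprecated `utcnow` usage. A minimal sketch of that kind of change, assuming a helper named `utc_now` (a hypothetical name; the PR's actual code is not shown here):

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Return the current time as a timezone-aware UTC datetime.

    Hypothetical helper used for illustration only.
    """
    # datetime.utcnow() returns a naive datetime and is deprecated since
    # Python 3.12; datetime.now(timezone.utc) attaches tzinfo explicitly.
    return datetime.now(timezone.utc)


stamp = utc_now()
```

An aware timestamp compares and serializes unambiguously, which a naive `utcnow()` value does not.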
Files not reviewed (1)
- frontend/package-lock.json: Language not supported
Comments suppressed due to low confidence (15)
tests/verify_plugins_api.py:1
- This is pytest-collectable (function name starts with `test_`) but it performs a real HTTP call to localhost and has no assertions; it will be flaky/fail in CI. Move it to `scripts/` (or rename to avoid the `test_` prefix) and/or add it to `tests/conftest.py::collect_ignore`, or convert it into a proper pytest test using FastAPI `TestClient` with deterministic assertions.
tests/test_mcp_server_management.py:1 (comment raised 5 times)
- This is a script-style integration runner that hits a live server, but it lives under `tests/` and matches pytest discovery (`test_*.py`). It should either be moved to `scripts/` or added to `tests/conftest.py::collect_ignore` to prevent unintended execution during unit test runs.
tests/conftest.py:1
- Given the addition of other script-style files under `tests/` (e.g., `verify_plugins_api.py`, `test_mcp_server_management.py`, possibly others), this ignore list likely needs to be expanded to keep pytest runs hermetic. Consider adding the new script-style runners here or relocating them under `scripts/`.
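
The ignore-list suggestion can be sketched as follows. pytest reads a module-level `collect_ignore` list from `conftest.py`; the entries below are only the two files named in the comments, and whatever the existing list already contains is not shown in this review, so treat this as an assumption-laden fragment rather than the file's real contents.

```python
# tests/conftest.py (sketch; existing entries in the real file are not shown)
# pytest skips these paths, relative to this conftest's directory,
# during test collection.
collect_ignore = [
    "verify_plugins_api.py",          # script-style check against a live server
    "test_mcp_server_management.py",  # multi-phase runner against a live server
]
```

Relocating the runners under `scripts/` achieves the same isolation without needing to keep the list in sync.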
src/chatbot_ai_system/repositories/conversation.py:1
- Binding `conversation_id` as `str(conversation_id)` can cause Postgres type mismatch (`uuid = text`) with a UUID column, especially in raw `text()` queries where the bind param isn't typed. Pass the UUID object directly (or cast `:cid::uuid` / use SQLAlchemy `select(func.max(...))` against the model column) to ensure correct typing.
src/chatbot_ai_system/services/media_pipeline.py:1 (comment raised 3 times)
- Inside `async def`, prefer `asyncio.get_running_loop()` over `get_event_loop()` (which is deprecated/behavior-changed in newer Python versions). Update these call sites to avoid runtime warnings/errors under Python 3.12+.
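
A minimal sketch of the suggested call pattern, assuming a stand-in function `cpu_bound_work` and coroutine `process` (both hypothetical; the pipeline's real functions are not shown in this review):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def cpu_bound_work(n: int) -> int:
    """Stand-in for CPU-heavy media work (hypothetical)."""
    return sum(i * i for i in range(n))


async def process(n: int, executor: ThreadPoolExecutor) -> int:
    # Inside a coroutine a loop is always running, so get_running_loop()
    # returns it directly and never creates a new loop implicitly.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, cpu_bound_work, n)


with ThreadPoolExecutor(max_workers=2) as pool:
    result = asyncio.run(process(10, pool))
```

Outside a running loop `get_running_loop()` raises `RuntimeError`, which surfaces misuse early instead of silently creating a second loop.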
src/chatbot_ai_system/services/media_pipeline.py:1
- The class-level `ThreadPoolExecutor` is never shut down. In long-running processes (and especially during local dev reloads/tests), this can leak threads/resources. Consider adding a shutdown hook (e.g., a `close()`/`shutdown()` classmethod) and calling it from FastAPI lifespan shutdown.
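
One way the shutdown hook could look, sketched with the standard library only. The class name `MediaPipeline`, its `get_executor`/`shutdown` methods, and the `lifespan` function here are assumptions for illustration; the lifespan is written with `contextlib.asynccontextmanager`, the same shape a FastAPI lifespan handler takes.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from contextlib import asynccontextmanager
from typing import Optional


class MediaPipeline:
    """Hypothetical stand-in for the pipeline class; names are assumptions."""

    _executor: Optional[ThreadPoolExecutor] = None

    @classmethod
    def get_executor(cls) -> ThreadPoolExecutor:
        # Create the shared pool lazily on first use.
        if cls._executor is None:
            cls._executor = ThreadPoolExecutor(max_workers=2)
        return cls._executor

    @classmethod
    def shutdown(cls) -> None:
        # Release worker threads; safe to call when no pool was created.
        if cls._executor is not None:
            cls._executor.shutdown(wait=True)
            cls._executor = None


@asynccontextmanager
async def lifespan(app):
    # Code after `yield` runs when the application shuts down.
    try:
        yield
    finally:
        MediaPipeline.shutdown()


async def demo() -> bool:
    async with lifespan(app=None):
        # Blocking .result() is acceptable only in this tiny demo.
        MediaPipeline.get_executor().submit(lambda: None).result()
    return MediaPipeline._executor is None


executor_released = asyncio.run(demo())
```

Resetting `_executor` to `None` lets tests and dev reloads create a fresh pool afterwards.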
src/chatbot_ai_system/services/reflection.py:1
- This fallback regex only matches JSON objects without nested braces, so it will never capture a typical tool call like `{"name": "x", "arguments": { ... }}` when it's not in a code block. Consider replacing this with a brace-balancing extraction or a more robust JSON-snippet finder so nested `arguments` objects can be parsed.
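
A possible JSON-snippet finder, sketched with `json.JSONDecoder.raw_decode` rather than manual brace counting, so braces inside string values do not confuse it. The function name `extract_first_json_object` and the sample reply are hypothetical; this is not the parser in `reflection.py`.

```python
import json
from typing import Any, Optional


def extract_first_json_object(text: str) -> Optional[dict[str, Any]]:
    """Return the first parseable JSON object embedded in free text."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any trailing text after it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not valid JSON from this brace; try the next candidate.
        start = text.find("{", start + 1)
    return None


reply = 'Sure, calling it now: {"name": "x", "arguments": {"city": "Oslo"}} done.'
call = extract_first_json_object(reply)
```

Because the decoder does the parsing, nested `arguments` objects of any depth are handled without a custom regex.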
`src/chatbot_ai_system/tools/__init__.py`:1
- This relies on the private attribute `registry._tools`. Prefer a public API (e.g., `registry.has_tool(name)`), or make `ToolRegistry.register()` idempotent (no-op on duplicates) to avoid reaching into internal state.
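
Both options from this comment can be sketched on a toy registry. The class below is a hypothetical minimal version; the project's real `ToolRegistry` signature (for example, whether `register()` takes a name or a tool instance) is not shown in this review.

```python
class ToolRegistry:
    """Hypothetical minimal registry; the project's real class may differ."""

    def __init__(self) -> None:
        self._tools: dict[str, object] = {}

    def has_tool(self, name: str) -> bool:
        # Public membership check so callers never touch _tools directly.
        return name in self._tools

    def register(self, name: str, tool: object) -> bool:
        """Register a tool; return False and do nothing if the name is taken."""
        if self.has_tool(name):
            return False
        self._tools[name] = tool
        return True


registry = ToolRegistry()
first = registry.register("get_current_time", object())
second = registry.register("get_current_time", object())  # duplicate: no-op
```

Returning a boolean keeps duplicate registration silent while still letting callers log or assert on it if they care.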
src/chatbot_ai_system/server/main.py:1
- This commented-out shutdown code is now misleading since Redis shutdown is handled in the new `lifespan()` context manager. Consider removing the commented line to avoid confusion.
No description provided.