Feature v10.1 #5

Merged
Manthya merged 13 commits into main from feature_v10.1 on Mar 10, 2026

Conversation

Manthya (Owner) commented Mar 10, 2026

No description provided.

Manthya requested a review from Copilot on March 10, 2026 16:03

Copilot AI left a comment

Pull request overview

This PR introduces the Phase 10.1 feature set: dynamic plugin/model configuration, personal assistant (HITL) integrations, multi-agent orchestration upgrades, and expanded evaluation/red-team test tooling.

Changes:

  • Added plugin and personal assistant API routes and frontend UI (Plugins tab, Draft Cards, and HITL send).
  • Improved orchestration reliability and performance (agentic engine guardrails, Redis checkpointing with tests, pooled Ollama streaming, media pipeline offloading).
  • Added extensive eval, red-team, and integration test assets (CSV case matrices, runners, Playwright specs).

Reviewed changes

Copilot reviewed 70 out of 75 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/verify_plugins_api.py | Adds a script-style check for plugin endpoints (currently pytest-collectable). |
| tests/test_web_search.py | Adds a standalone DuckDuckGo tool test script (ignored by pytest config). |
| tests/test_tools_integration.py | Updates integration test to mock new orchestration/repo dependencies and validate JSON response. |
| tests/test_reflection.py | Adds unit tests for new ReflectionHandler behavior. |
| tests/test_mcp_server_management.py | Adds a multi-phase script for MCP server CRUD via plugin API (currently pytest-collectable). |
| tests/test_mcp_integration.py | Updates MCP integration script to refresh and query remote tools. |
| tests/test_mcp_config.py | Updates MCP config tests for async get_mcp_servers() and settings-manager mocking. |
| tests/test_checkpointing.py | Adds tests for Redis-backed checkpoint save/load/clear. |
| tests/test_agent_handoff.py | Adds tests for agent config selection and phase-based prompt substitution. |
| tests/redteam/tools/test_mcp_payloads.py | Adds red-team payload tests for MCP tools. |
| tests/redteam/frontend/test_frontend_concurrency.spec.ts | Adds Playwright red-team tests for frontend concurrency resilience. |
| tests/redteam/conftest.py | Adds shared red-team fixtures (mock provider, repos, app overrides). |
| tests/redteam/backend/test_thread_isolation.py | Adds red-team tests for thread isolation, concurrency interleaving, and overflow payloads. |
| tests/redteam/agentic/test_agentic_cycles.py | Adds red-team tests for cycle detection, circuit breaker, and tool hallucination handling. |
| tests/evals/test.csv | Adds an eval case matrix for benchmark runner. |
| tests/evals/run_phase10_orchestrator_live.py | Adds a live websocket eval runner with reporting. |
| tests/evals/run_benchmarks.py | Adds a mock/live benchmark runner that records tool trajectories. |
| tests/evals/phase10_report.md | Adds a generated mock eval report artifact. |
| tests/evals/phase10_live_report.md | Adds a generated live eval report artifact. |
| tests/evals/phase10_cases.csv | Adds generated Phase 10 case matrix for eval runner. |
| tests/evals/golden_db.csv | Adds fixtures backing the eval suite. |
| tests/evals/generate_phase10_cases.py | Adds generator script for Phase 10 eval cases. |
| tests/evals/check_phase10_report.py | Adds a CI gate for eval pass rate threshold. |
| tests/conftest.py | Configures pytest to ignore certain script-style tests. |
| src/chatbot_ai_system/tools/registry.py | Registers GetCurrentTimeTool in the default tool registry. |
| src/chatbot_ai_system/tools/init.py | Makes default tool registration tolerant to duplicates (via direct _tools access). |
| src/chatbot_ai_system/services/tool_reliability.py | Adds Redis-backed tool reliability tracking and ranking. |
| src/chatbot_ai_system/services/reflection.py | Adds reflection-based retry parser/LLM prompt flow. |
| src/chatbot_ai_system/services/media_pipeline.py | Offloads CPU-bound media work + adds singleton Whisper model caching. |
| src/chatbot_ai_system/services/agents.py | Adds AgentConfig registry and phase-aware tool-executor agent selection. |
| src/chatbot_ai_system/services/agentic_engine.py | Adds retries/timeouts, cycle detection, tool whitelist, HITL gating, and fail-closed behavior. |
| src/chatbot_ai_system/server/routes.py | Uses dynamic model/provider settings, atomic sequence numbers, request_id correlation, and rollback on errors. |
| src/chatbot_ai_system/server/plugin_routes.py | Adds plugin routes for model activation and MCP server management. |
| src/chatbot_ai_system/server/personal_routes.py | Adds personal integration config/connect endpoints and HITL /send. |
| src/chatbot_ai_system/server/multimodal_routes.py | Uses settings_manager to source the active model for voice flow. |
| src/chatbot_ai_system/server/main.py | Migrates startup/shutdown to lifespan and mounts new routers. |
| src/chatbot_ai_system/repositories/conversation.py | Adds DB-derived next sequence number + pgvector feature detection guards. |
| src/chatbot_ai_system/providers/ollama.py | Adds pooled httpx client reuse for streaming and dynamic model fallback. |
| src/chatbot_ai_system/prompts.py | Centralizes system/router/synthesis/verification/reflection prompts. |
| src/chatbot_ai_system/personal/constants.py | Adds HITL tool name list and platform config schemas. |
| src/chatbot_ai_system/models/schemas.py | Switches timestamps to timezone-aware UTC. |
| src/chatbot_ai_system/database/models.py | Replaces deprecated utcnow usage and adds SystemSetting model. |
| src/chatbot_ai_system/config/settings_manager.py | Adds DB-backed dynamic settings with validation hooks. |
| src/chatbot_ai_system/config/settings.py | Updates Settings config to pydantic v2 SettingsConfigDict and aliases. |
| src/chatbot_ai_system/config/mcp_server_config.py | Makes MCP server loading async; adds personal + dynamic server configs. |
| scripts/live_audit_matrix.py | Adds a live audit runner script to validate end-to-end behavior. |
| pyproject.toml | Adds media/tooling dependencies (Pillow, pydub, faster-whisper, opencv-headless). |
| frontend/test-results/.last-run.json | Adds a Playwright results artifact (likely unintended commit). |
| frontend/package.json | Adds Playwright as a dev dependency. |
| frontend/components/Sidebar.tsx | Adds tab switching UI (Chats vs Plugins). |
| frontend/components/ChatArea.tsx | Adds DraftCard UI for HITL tools and send callback wiring. |
| frontend/app/page.tsx | Adds PluginsDashboard routing and HITL send flow, plus request_id correlation. |
| docs/phase_7.1.md | Adds documentation for hardening and concurrency fixes. |
| docs/phase_7.0.md | Adds documentation for model integration and plugin dashboard. |
| docs/phase_10.1.md | Adds documentation for orchestrator graph + multi-agent + checkpointing. |
| docs/phase9.0.md | Adds documentation for personal assistant integrations and HITL flow. |
| docs/phase10.0_testing.md | Adds documentation for Phase 10 test plan and eval approach. |
| docs/phase10.0.md | Adds documentation for routing/tool reliability upgrades. |
| docs/personal_platform_integration.md | Adds specification doc for personal platform integrations. |
| alembic/versions/cdc18a2dc7b3_add_system_settings_table.py | Adds migration for system_settings table. |
| .pgvector_build | Adds a subproject pointer for pgvector build. |
| .github/workflows/phase10_eval.yml | Adds a GitHub Actions job to run Phase 10 evals and enforce pass rate. |

Files not reviewed (1)
  • frontend/package-lock.json: Language not supported
Comments suppressed due to low confidence (15)

tests/verify_plugins_api.py:1

  • This is pytest-collectable (function name starts with test_) but it performs a real HTTP call to localhost and has no assertions; it will be flaky/fail in CI. Move it to scripts/ (or rename to avoid test_ prefix) and/or add it to tests/conftest.py::collect_ignore, or convert it into a proper pytest test using FastAPI TestClient with deterministic assertions.

tests/test_mcp_server_management.py:1 (same comment reported 5 times)

  • This is a script-style integration runner that hits a live server, but it lives under tests/ and matches pytest discovery (test_*.py). It should either be moved to scripts/ or added to tests/conftest.py::collect_ignore to prevent unintended execution during unit test runs.

tests/conftest.py:1

  • Given the addition of other script-style files under tests/ (e.g., verify_plugins_api.py, test_mcp_server_management.py, possibly others), this ignore list likely needs to be expanded to keep pytest runs hermetic. Consider adding the new script-style runners here or relocating them under scripts/.
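
A minimal sketch of what the expanded ignore list could look like. It covers only the two files named in the comment; the entries already present in tests/conftest.py are not shown in this review and would need to be kept alongside them. pytest reads a module-level `collect_ignore` list from conftest.py and resolves each entry relative to that file's directory.

```python
# tests/conftest.py (sketch; existing entries are not shown in the review
# and should be preserved when merging this in)

# Script-style runners that call a live server and should not be collected
# during normal unit test runs. Paths are relative to this conftest.py.
collect_ignore = [
    "verify_plugins_api.py",
    "test_mcp_server_management.py",
]
```

Relocating the runners under scripts/ would make the ignore entries unnecessary, at the cost of changing how they are invoked.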

src/chatbot_ai_system/repositories/conversation.py:1

  • Binding conversation_id as str(conversation_id) can cause Postgres type mismatch (uuid = text) with a UUID column, especially in raw text() queries where the bind param isn't typed. Pass the UUID object directly (or cast :cid::uuid / use SQLAlchemy select(func.max(...)) against the model column) to ensure correct typing.

src/chatbot_ai_system/services/media_pipeline.py:1 (same comment reported 3 times)

  • Inside async def, prefer asyncio.get_running_loop() over get_event_loop() (which is deprecated/behavior-changed in newer Python versions). Update these call sites to avoid runtime warnings/errors under Python 3.12+.
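
A minimal sketch of the suggested call-site change, using only the standard library. The names `_executor`, `cpu_bound_work`, and `process_media` are hypothetical stand-ins; the actual media_pipeline.py code is not shown in this review.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pool standing in for the one used by the media pipeline.
_executor = ThreadPoolExecutor(max_workers=2)


def cpu_bound_work(data: bytes) -> int:
    """Stand-in for CPU-bound media work; returns the payload size."""
    return len(data)


async def process_media(data: bytes) -> int:
    # get_running_loop() returns the loop this coroutine is running on and
    # raises RuntimeError if called outside of one, instead of implicitly
    # creating or guessing a loop.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, cpu_bound_work, data)


result = asyncio.run(process_media(b"abc"))
_executor.shutdown()
```

Where a dedicated pool is not required, `asyncio.to_thread(cpu_bound_work, data)` is a shorter alternative that uses the loop's default executor.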

src/chatbot_ai_system/services/media_pipeline.py:1

  • The class-level ThreadPoolExecutor is never shut down. In long-running processes (and especially during local dev reloads/tests), this can leak threads/resources. Consider adding a shutdown hook (e.g., a close()/shutdown() classmethod) and calling it from FastAPI lifespan shutdown.
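
One way the shutdown hook could look, sketched with the standard library only. `MediaPipeline`, `get_executor`, and the lazy pool creation are assumptions made for illustration, not the project's actual class; the `lifespan` function has the shape FastAPI accepts via `FastAPI(lifespan=lifespan)`.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from contextlib import asynccontextmanager
from typing import Optional


class MediaPipeline:
    """Hypothetical stand-in for the pipeline class that owns the pool."""

    _executor: Optional[ThreadPoolExecutor] = None

    @classmethod
    def get_executor(cls) -> ThreadPoolExecutor:
        # Create the pool lazily so it can be rebuilt after a shutdown.
        if cls._executor is None:
            cls._executor = ThreadPoolExecutor(max_workers=4)
        return cls._executor

    @classmethod
    def shutdown(cls) -> None:
        # Wait for in-flight work, release the threads, and drop the pool.
        if cls._executor is not None:
            cls._executor.shutdown(wait=True)
            cls._executor = None


@asynccontextmanager
async def lifespan(app):
    try:
        yield  # the application serves requests while suspended here
    finally:
        MediaPipeline.shutdown()


async def _demo() -> bool:
    async with lifespan(app=None):
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(MediaPipeline.get_executor(), sum, [1, 2, 3])
    return MediaPipeline._executor is None


pool_released = asyncio.run(_demo())
```

Calling `shutdown()` in the `finally` branch means the pool is released even when the application exits through an exception.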

src/chatbot_ai_system/services/reflection.py:1

  • This fallback regex only matches JSON objects without nested braces, so it will never capture a typical tool call like {"name": "x", "arguments": { ... }} when it's not in a code block. Consider replacing this with a brace-balancing extraction or a more robust JSON-snippet finder so nested arguments objects can be parsed.
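
A possible JSON-snippet finder, sketched under the assumption that the parser only needs the first object embedded in free text. Rather than balancing braces by hand, it lets the standard library's `json.JSONDecoder.raw_decode` parse from each `{` it finds, so nested objects and braces inside string values are handled by the decoder. The function name `extract_first_json_object` is hypothetical.

```python
import json
from typing import Any, Optional


def extract_first_json_object(text: str) -> Optional[dict[str, Any]]:
    """Return the first JSON object embedded in text, or None if none parses."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at `start` and
            # ignores any trailing text after it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not valid JSON from this brace; try the next candidate.
        start = text.find("{", start + 1)
    return None


reply = 'I will call the tool: {"name": "x", "arguments": {"q": "hi"}} now.'
tool_call = extract_first_json_object(reply)
```

In this example `tool_call` holds the full object, including the nested `arguments` dictionary that a no-nested-braces regex would miss.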

src/chatbot_ai_system/tools/init.py:1

  • This relies on the private attribute registry._tools. Prefer a public API (e.g., registry.has_tool(name)), or make ToolRegistry.register() idempotent (no-op on duplicates) to avoid reaching into internal state.
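
A rough sketch of both options from the comment: a public `has_tool()` check and an idempotent `register()`. The `Tool` dataclass and this `ToolRegistry` are simplified, hypothetical stand-ins; the project's real registry is not shown in this review and likely carries more state.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Tool:
    """Hypothetical minimal tool record, keyed by name."""

    name: str


class ToolRegistry:
    """Simplified stand-in for the project's registry."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def has_tool(self, name: str) -> bool:
        # Public membership check, so callers never need to touch _tools.
        return name in self._tools

    def register(self, tool: Tool) -> bool:
        """Register a tool; return False and do nothing on a duplicate name."""
        if self.has_tool(tool.name):
            return False
        self._tools[tool.name] = tool
        return True


registry = ToolRegistry()
first = registry.register(Tool("get_current_time"))
second = registry.register(Tool("get_current_time"))  # duplicate is a no-op
```

Returning a boolean lets default-registration code log or ignore duplicates without catching an exception; raising on conflict would be the stricter alternative.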

src/chatbot_ai_system/server/main.py:1

  • This commented-out shutdown code is now misleading since Redis shutdown is handled in the new lifespan() context manager. Consider removing the commented line to avoid confusion.

Comment thread src/chatbot_ai_system/config/settings_manager.py
Comment thread frontend/components/Sidebar.tsx
Comment thread frontend/components/Sidebar.tsx
Comment thread frontend/test-results/.last-run.json
Manthya merged commit a3c5c26 into main on Mar 10, 2026
1 check passed