Add MCP stdio bridge and database discovery tools #10
Conversation
Add a database discovery agent prototype that uses LLMs to explore databases through the MCP Query endpoint. Includes:

- Rich CLI (discover_cli.py): Working async CLI with Rich TUI, proper MCP tools/call JSON-RPC method, and full tracing support
- FastAPI_deprecated_POC: Early prototype with incorrect MCP protocol, kept for reference only

The Rich CLI version implements a multi-expert agent architecture (a sketch of the allowlist idea follows below):

- Planner: Chooses next tasks based on state
- Structural Expert: Analyzes table structure and relationships
- Statistical Expert: Profiles tables and columns
- Semantic Expert: Infers domain meaning
- Query Expert: Validates access patterns
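The expert separation is enforced through per-expert tool allowlists (the CLI's `ALLOWED_TOOLS` mapping, discussed in the reviews below). A minimal sketch of the idea — several tool names here (notably the profiling tools) are illustrative, not necessarily the CLI's exact list:

```python
# Sketch: each expert may only request tools on its allowlist (enforced server-side).
ALLOWED_TOOLS = {
    "planner": set(),                                    # plans tasks; calls no tools
    "structural": {"list_schemas", "list_tables"},       # tables and relationships
    "statistical": {"profile_table", "profile_column"},  # hypothetical profiling tools
    "semantic": {"catalog_get", "catalog_upsert"},       # domain meaning via the catalog
    "query": {"run_sql_readonly"},                       # access-pattern validation
}

def validate_tool_call(expert: str, tool: str) -> None:
    """Reject any tool call that is not on the calling expert's allowlist."""
    if tool not in ALLOWED_TOOLS.get(expert, set()):
        raise PermissionError(f"Expert {expert!r} may not call tool {tool!r}")
```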
LLM improvements:

- Add configurable chat path (LLM_CHAT_PATH or --llm-chat-path) to support non-standard endpoints like Z.ai's /api/coding/paas/v4
- Add optional JSON mode (LLM_JSON_MODE or --llm-json-mode) for models that support native JSON output
- Enhanced tracing: log HTTP status and response body snippet on every request
- Safer JSON parsing: treat empty content as error with helpful message
- Better error messages with diagnostic hints

Code cleanup:

- Remove intent command (simplify CLI)
- Remove user_intent reading and passing
- Simplify stopping logic (just run max_iterations)
- Clean up formatting and remove unused code
Add a Python stdio-based MCP server that bridges to ProxySQL's HTTPS MCP endpoint, enabling Claude Code to use ProxySQL MCP tools directly. The bridge:

- Implements stdio MCP server protocol (for Claude Code)
- Acts as MCP client to ProxySQL's HTTPS endpoint
- Supports initialize, tools/list, tools/call methods
- Handles authentication via Bearer tokens
- Configurable via environment variables

Usage:

- Configure in Claude Code MCP settings
- Set PROXYSQL_MCP_ENDPOINT environment variable
- Optional: PROXYSQL_MCP_TOKEN for auth
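In outline, the bridge reads newline-delimited JSON-RPC from stdin, forwards requests to ProxySQL's HTTPS endpoint, and writes responses back to stdout. A simplified sketch of that loop under the environment variables described above (the real bridge also answers `initialize` locally and adds logging):

```python
import json
import os
import sys

import httpx  # HTTP client used to reach ProxySQL's MCP endpoint

ENDPOINT = os.getenv("PROXYSQL_MCP_ENDPOINT", "https://127.0.0.1:6071/mcp/query")
TOKEN = os.getenv("PROXYSQL_MCP_TOKEN", "")
INSECURE = os.getenv("PROXYSQL_MCP_INSECURE_SSL", "0").lower() in ("1", "true", "yes")

def main() -> None:
    headers = {"Authorization": f"Bearer {TOKEN}"} if TOKEN else {}
    with httpx.Client(verify=not INSECURE, headers=headers) as client:
        for line in sys.stdin:  # one JSON-RPC message per line (stdio framing)
            line = line.strip()
            if not line:
                continue
            request = json.loads(line)
            response = client.post(ENDPOINT, json=request).json()
            sys.stdout.write(json.dumps(response) + "\n")
            sys.stdout.flush()  # unflushed responses never reach Claude Code

if __name__ == "__main__":
    main()
```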
Add PROXYSQL_MCP_DEBUG environment variable to enable verbose logging of all stdio communication and ProxySQL HTTP requests/responses.
ProxySQL MCP wraps tool responses in {"result": {...}, "success": true}.
The bridge now unwraps this to return just the actual result to Claude Code.
This fixes the LLM error 'The prompt parameter was not received normally'
which occurred because the LLM was receiving the malformed nested structure.
- Unwrap ProxySQL's {"success": ..., "result": ...} wrapper in tool responses
for MCP protocol compliance
- Fix proxysql.cfg missing closing brace for mcp_variables section
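A minimal sketch of the unwrapping step described above (later commits in this thread revert it once the server itself emits the spec-compliant format):

```python
def unwrap_proxysql_result(payload):
    """Strip ProxySQL's {"success": ..., "result": ...} envelope when present."""
    if isinstance(payload, dict) and "success" in payload and "result" in payload:
        return payload["result"]
    return payload  # already bare; pass through unchanged
```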
…bridge

- Redirect stderr to /tmp/proxysql_mcp_bridge.log for debugging
- Add extreme verbosity with timestamps for all stdin/stdout/HTTP traffic
- CRITICAL FIX: Set stdout to line-buffered mode to prevent responses from being buffered and never reaching Claude Code (causing timeouts)
- Log all HTTP requests/responses to ProxySQL MCP server
- Log all message handling and unwrapping operations
… buffering

The previous sys.stdout.reconfigure(line_buffering=True) didn't work when stderr is redirected. Now we create a new io.TextIOWrapper around sys.stdout.buffer with line_buffering=False, ensuring immediate flush. Also sets PYTHONUNBUFFERED=1 for extra safety.
…timeout

- Revert the stdout replacement changes (was probably not the issue)
- Add detailed logging to _writeline to see exactly what's happening when writing to stdout
The MCP protocol requires tool call results to be wrapped in content items
with type and text fields. This matches what other MCP servers do.
Before: {"result": [{"name": "testdb", ...}]}
After: {"result": [{"type": "text", "text": "[{\"name\": \"testdb\", ...}]"}]}
This should fix the issue where Claude Code was timing out waiting for responses.
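A sketch of that transformation as the bridge might apply it, assuming the raw result is JSON-serializable (the wrapping is itself revisited in the commits that follow):

```python
import json

def wrap_as_content(result) -> dict:
    """Wrap a raw tool result in MCP content items with type/text fields."""
    return {"result": [{"type": "text", "text": json.dumps(result)}]}

# wrap_as_content([{"name": "testdb"}])
# -> {"result": [{"type": "text", "text": "[{\"name\": \"testdb\"}]"}]}
```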
Remove all the unwrapping and TextContent wrapping logic that was added. Go back to the original simple pass-through that just returns the result from ProxySQL directly. The original format was correct.
The TextIOWrapper may have buffering issues when stderr is redirected. Writing directly to the binary buffer with encoded bytes ensures immediate delivery of responses to Claude Code.
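A sketch of the direct binary write described here:

```python
import json
import sys

def write_response(message: dict) -> None:
    """Encode a JSON-RPC response and push it straight to the stdout buffer."""
    data = (json.dumps(message) + "\n").encode("utf-8")
    sys.stdout.buffer.write(data)
    sys.stdout.buffer.flush()  # bypasses the text layer, so no hidden buffering
```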
The sys.stderr redirection may be interfering with stdout/stdin pipes. Now logging writes directly to a file handle instead of through sys.stderr.
Restore to exact original code from commit 01c182c. The original code is clean and simple - just passes through responses. Only added minimal file-based logging for debugging.
Added _log() calls to track:

- stdout writes (bytes and content preview)
- tools/call handler (name, response, result)
- main startup

Log is written to /tmp/proxysql_mcp_bridge.log
Per MCP spec (https://modelcontextprotocol.io/specification/2025-11-25/server/tools):

- Tool call responses MUST have a "content" array
- Each content item has "type" and "text" fields
- Response includes "isError" boolean

This was the root cause - Claude Code was waiting for the correct format.
The ProxySQL MCP server now wraps tool results in the correct MCP format:

- result.content: array of content items (type: "text", text: "...")
- result.isError: boolean

Per MCP spec: https://modelcontextprotocol.io/specification/2025-11-25/server/tools

Also simplified the bridge to pass through results directly since the server now returns the correct format.
Added documentation for:

- Log file location (/tmp/proxysql_mcp_bridge.log)
- What information is logged
- How to use logs for debugging
Summary of Changes

Hello @renecannao, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the interaction capabilities with ProxySQL's MCP endpoint by introducing a stdio bridge for Claude Code and a robust set of database discovery tools. It also establishes a framework for LLM-driven database exploration, with both a FastAPI-based prototype (now deprecated) and a more refined Rich CLI implementation, designed to intelligently gather and synthesize database metadata while adhering to MCP standards.
Caution: Review failed. The pull request is closed.

Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough
Walkthrough

Adds a multi-component AI/Discovery stack: DiscoveryAgent prototypes (FastAPI deprecated POC, Rich CLI, headless scripts), a ProxySQL MCP stdio→HTTPS bridge, AI feature C++ components (LLM bridge, Anomaly Detector, Vector storage, AI manager, AI tool handler), MCP tool-response wrapping, many docs, tests, and utility scripts across Python, C++, and shell.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as CLI / FastAPI Client
    participant Orch as Orchestrator
    participant LLM as LLM (Planner / Experts)
    participant MCP as MCP Client
    participant Catalog as MCP Catalog / tools

    Client->>Orch: start run (schema?)
    Orch->>MCP: list schemas & tables
    MCP-->>Orch: schemas/tables
    Orch->>MCP: fetch intent & catalog snippets
    MCP-->>Orch: context data
    loop iterations
        Orch->>LLM: plan tasks (context + intent)
        LLM-->>Orch: planned tasks
        loop per task
            Orch->>LLM: expert.act(context)
            LLM->>MCP: tools/call (allowed tool)
            MCP-->>LLM: tool results
            LLM->>Orch: reflect (writes, insights, questions)
            Orch->>Catalog: upsert catalog writes / questions
        end
        Orch->>Orch: evaluate insights / decide stop
    end
    Orch->>Client: emit events / final status
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Code Review
This pull request introduces an MCP stdio bridge and a comprehensive set of database discovery tools. The changes include a deprecated FastAPI-based proof-of-concept and a new, more robust Rich CLI-based implementation. The new CLI tool is well-structured, featuring good separation of concerns, robust error handling in the clients, and detailed tracing capabilities. The documentation is also thorough and helpful.
My review focuses on the new discover_cli.py implementation, with several key suggestions for improving correctness, performance, and usability. I've also noted a significant anti-pattern in the deprecated agent_app.py that should be avoided in future code.
```python
except Exception:
    return None
```
Silently catching a broad Exception and returning None is a dangerous pattern. It hides all potential errors (e.g., network issues, invalid JSON from MCP, permission errors) and makes debugging extremely difficult. Even for a deprecated proof-of-concept, it's better to at least log the exception before returning. This will provide visibility into why _get_user_intent might be failing and is a good practice to follow.
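A sketch of the suggested pattern, based on the `_get_user_intent` snippet quoted later in this review (the mixin class and logger setup are illustrative scaffolding, not the POC's actual structure):

```python
import json
import logging

logger = logging.getLogger(__name__)

class DiscoveryAgentMixin:
    async def _get_user_intent(self):
        try:
            res = await self.mcp.call(
                "catalog_get", {"kind": "intent", "key": f"intent/{self.run_id}"}
            )
            doc = res.get("document") if res else None
            return json.loads(doc) if doc else None
        except Exception:
            # Log the failure before swallowing it, so debugging stays possible.
            logger.exception("catalog_get for user intent failed")
            return None
```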
```python
schemas_res = await self.mcp.call_tool("list_schemas", {"page_size": 50})
schemas = schemas_res if isinstance(schemas_res, list) else normalize_list(schemas_res, ("schemas", "items", "result"))
if not schemas:
    raise MCPError("No schemas returned by MCP list_schemas")

chosen_schema = schema or item_name(schemas[0])
ui.last_event = f"Selected schema: {chosen_schema}"

tables_res = await self.mcp.call_tool("list_tables", {"schema": chosen_schema, "page_size": 500})
tables = tables_res if isinstance(tables_res, list) else normalize_list(tables_res, ("tables", "items", "result"))
table_names = [item_name(t) for t in tables]
if not table_names:
    raise MCPError(f"No tables returned by MCP list_tables(schema={chosen_schema})")
```
The agent currently only fetches the first page of schemas and tables, using hardcoded page_size values. If the database has more schemas or tables than the page size, the agent will miss them, leading to incomplete discovery. The MCP list_schemas and list_tables tools likely support pagination via a page_token (as mentioned in doc/MCP/Tool_Discovery_Guide.md). The implementation should be updated to loop through all pages of results.
Here is an example of how you could implement pagination for schemas:
```python
# Example for schemas
all_schemas = []
page_token = None
while True:
    params = {"page_size": 50}
    if page_token:
        params["page_token"] = page_token
    schemas_res = await self.mcp.call_tool("list_schemas", params)
    schemas_page = schemas_res if isinstance(schemas_res, list) else normalize_list(schemas_res, ("schemas", "items", "result"))
    all_schemas.extend(schemas_page)
    # Assuming the response contains a next_page_token
    page_token = schemas_res.get("next_page_token") if isinstance(schemas_res, dict) else None
    if not page_token:
        break
if not all_schemas:
    raise MCPError("No schemas returned by MCP list_schemas")
# Use all_schemas instead of schemas
```

```python
class TraceLogger:
    def __init__(self, path: Optional[str]):
        self.path = path

    def write(self, record: Dict[str, Any]):
        if not self.path:
            return
        rec = dict(record)
        rec["ts"] = time.time()
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```
The TraceLogger opens and closes the trace file for every write call. For frequent logging, this can become a performance bottleneck due to I/O overhead. I suggest refactoring the class to open the file once upon initialization and close it when the application exits.
You will also need to add a close() method to the class and call trace.close() from a finally block in the cmd_run function to ensure the file is closed properly on exit.
```diff
 class TraceLogger:
     def __init__(self, path: Optional[str]):
         self.path = path
+        self._file = open(self.path, "a", encoding="utf-8") if self.path else None

     def write(self, record: Dict[str, Any]):
-        if not self.path:
+        if not self._file:
             return
         rec = dict(record)
         rec["ts"] = time.time()
-        with open(self.path, "a", encoding="utf-8") as f:
-            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
+        self._file.write(json.dumps(rec, ensure_ascii=False) + "\n")
+
+    def close(self):
+        if self._file:
+            self._file.close()
+            self._file = None
```
```python
if ui.last_event:
    events.append(ui.last_event)
if ui.last_error:
    events.append("\\n")
```

```python
if args.debug:
    tb = traceback.format_exc()
    trace.write({"type": "error.traceback", "traceback": tb})
    ui.last_error += "\\n" + tb
```
The traceback is being appended to the error string with a double backslash (\\n). This will render as a literal \n in the UI instead of a newline. To ensure the traceback appears on new lines for readability, you should use a single backslash (\n).
```diff
-ui.last_error += "\\n" + tb
+ui.last_error += "\n" + tb
```
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In @scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/agent_app.py:
- Line 499: The RUNS dict stores completed runs indefinitely causing memory
leaks; update the runner() shutdown/finally logic to remove its entry from RUNS
after completion by scheduling a delayed cleanup (e.g., use
asyncio.get_event_loop().call_later or an asyncio.create_task that awaits a
sleep then RUNS.pop(run_id, None)) so events can be consumed before deletion;
ensure cleanup is added alongside existing shutdown calls to mcp.close() and
llm.close(), and reference the RUNS key using run_id so the exact entry is
removed safely.
In @scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/requirements.txt:
- Around line 1-5: The requirements pin fastapi==0.115.0 (in requirements.txt)
which pulls vulnerable Starlette versions; either delete this deprecated POC
file entirely or update the dependency pins: replace fastapi==0.115.0 with a
current FastAPI release that depends on Starlette >=0.47.2 (or add an explicit
starlette>=0.47.2 pin to requirements.txt to override the transitive
dependency), bump/align related packages (e.g., pydantic) as needed, then run
the app/tests/CI to verify compatibility and remove the deprecated artifact if
you choose not to maintain it.
In @scripts/mcp/DiscoveryAgent/Rich/discover_cli.py:
- Around line 1-2: Remove the stray backslash so the shebang line is the very
first line of the script; specifically, delete the leading "\" before
"#!/usr/bin/env python3" in discover_cli.py so the interpreter directive is
recognized when executing the file directly and retain the file's executable
bit.
- Around line 471-473: The code appends a literal "\\n" instead of an actual
newline before logging ui.last_error; update the events.append call so it uses a
real newline string ("\n") (i.e., replace the "\\n" argument with "\n") so Rich
renders a line break before events.append(ui.last_error, style="bold red").
- Around line 571-573: Change the KeyboardInterrupt handler to use a real
newline and exception chaining: catch the interrupt as an exception (e.g.,
"except KeyboardInterrupt as e"), call Console().print with
"\n[yellow]Interrupted[/yellow]" (not an escaped "\\n"), and re-raise
SystemExit(130) using "from e" so the SystemExit chains from the original
KeyboardInterrupt.
- Around line 526-527: The code appends a literal backslash-n to ui.last_error;
change the concatenation in the error path where trace.write({"type":
"error.traceback", "traceback": tb}) is used so that ui.last_error += "\\n" + tb
becomes use an actual newline character (replace the escaped "\\n" with "\n")
when updating ui.last_error; ensure this change is applied where ui.last_error
is set so the stored error message contains real newlines.
In @scripts/mcp/DiscoveryAgent/Rich/README.md:
- Around line 56-66: The README.md contains an inconsistent reference to
requirements_cli.txt while the directory actually has requirements.txt; update
the second install snippet or remove it: either change "pip install -r
requirements_cli.txt" to "pip install -r requirements.txt" (or to the actual
renamed file name if you intentionally renamed it) or delete the alternative
block entirely so both instructions consistently reference requirements.txt;
look for the lines in README.md that mention requirements_cli.txt and replace or
remove them accordingly.
In @scripts/mcp/DiscoveryAgent/Rich/requirements.txt:
- Around line 1-4: Update the pinned package versions in requirements.txt to the
requested newer releases: change httpx from 0.27.0 to 0.28.1, pydantic from
2.8.2 to 2.12.5, python-dotenv from 1.0.1 to 1.2.1, and rich from 13.7.1 to
14.2.0; edit the requirements.txt entry lines for httpx, pydantic,
python-dotenv, and rich to use the new version strings and save the file so the
project installs the updated packages.
🧹 Nitpick comments (14)
scripts/mcp/DiscoveryAgent/.gitignore (2)
8-10: Consider narrowing the `*.jsonl` pattern if needed. The blanket `*.jsonl` pattern will ignore all JSONL files in the workspace. If there are legitimate JSONL files (such as test fixtures, reference data, or configuration) that should be version-controlled, consider using a more specific pattern like `trace*.jsonl` instead.

♻️ Alternative approach

```diff
 # Trace files (optional - comment out if you want to commit traces)
-trace.jsonl
-*.jsonl
+trace*.jsonl
```
12-15: Add `.env` files and Python build artifacts to .gitignore. Consider adding patterns for:

- Environment files (`.env`, `.env.*`) — important for preventing accidental credential commits, especially since the MCP bridge connects to ProxySQL endpoints
- Python build/distribution artifacts (`dist/`, `build/`, `*.egg-info/`, `.eggs/`) — standard best practice

📦 Suggested additions

```diff
 # IDE
 .vscode/
 .idea/
 *.swp
+
+# Environment files
+.env
+.env.*
+
+# Python build artifacts
+dist/
+build/
+*.egg-info/
+.eggs/
```

scripts/mcp/DiscoveryAgent/Rich/README.md (2)
82-86: Clarify MCP_INSECURE_TLS configuration example. Line 84 sets `MCP_INSECURE_TLS="1"` but line 85 is commented out with a different purpose (auth token). This might confuse users about whether `MCP_INSECURE_TLS` is optional.

📝 Suggested clarification

```diff
 export MCP_ENDPOINT="https://127.0.0.1:6071/mcp/query"
-export MCP_AUTH_TOKEN="YOUR_TOKEN"
-export MCP_INSECURE_TLS="1"
-# export MCP_AUTH_TOKEN="..." # if your MCP needs auth
+export MCP_INSECURE_TLS="1" # for self-signed certificates
+# export MCP_AUTH_TOKEN="YOUR_TOKEN" # if your MCP needs auth
```
190-194: Document how to configure allow-lists for the Query expert. Line 193 mentions removing `run_sql_readonly` from the "Query expert allow-list" but doesn't explain where or how this allow-list is configured. Users following the safety guidance won't know how to implement this restriction.

📝 Suggested addition

Add a subsection after line 194 explaining where the allow-list is configured, for example:

> To modify the allow-list, edit `discover_cli.py` and locate the Query expert's `allowed_tools` configuration. Remove `run_sql_readonly` from the list.

Or provide the exact location if available in the implementation.
scripts/mcp/DiscoveryAgent/Rich/discover_cli.py (4)
101-111: Consider adding error handling for file writes. If the trace file path is invalid or the disk is full, the `write` method will raise an exception that could interrupt the main flow. For a debugging logger, you may want to silently handle write failures.

♻️ Suggested improvement

```diff
 def write(self, record: Dict[str, Any]):
     if not self.path:
         return
-    rec = dict(record)
-    rec["ts"] = time.time()
-    with open(self.path, "a", encoding="utf-8") as f:
-        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
+    try:
+        rec = dict(record)
+        rec["ts"] = time.time()
+        with open(self.path, "a", encoding="utf-8") as f:
+            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
+    except OSError:
+        pass  # Silently ignore trace write failures
```
228-232: Chain exception for better debuggability. When re-raising as `LLMError`, chain from the original exception to preserve the stack trace for debugging.

♻️ Proposed fix

```diff
 try:
     content = data["choices"][0]["message"]["content"]
-except Exception:
+except (KeyError, IndexError, TypeError) as e:
     self.trace.write({"type": "llm.unexpected_schema", "keys": list(data.keys())})
-    raise LLMError(f"Unexpected LLM response schema. Keys={list(data.keys())}. Body={body_snip}")
+    raise LLMError(f"Unexpected LLM response schema. Keys={list(data.keys())}. Body={body_snip}") from e
```
261-269: Chain exceptions in repair error paths. Both error paths should chain from the original exception for better debugging context.

♻️ Proposed fix

```diff
 if r2.status_code != 200:
-    raise LLMError(f"LLM repair HTTP {r2.status_code}: {r2.text}")
+    raise LLMError(f"LLM repair HTTP {r2.status_code}: {r2.text}") from None
 data2 = r2.json()
 content2 = data2.get("choices", [{}])[0].get("message", {}).get("content", "")
 self.trace.write({"type": "llm.repair.raw", "content": content2})
 if not str(content2).strip():
-    raise LLMError("LLM repair returned empty content (gateway misconfig or unsupported endpoint).")
+    raise LLMError("LLM repair returned empty content (gateway misconfig or unsupported endpoint).") from None
```
428-433: Consider adding resilience for individual tool call failures. A failing tool call will currently abort the entire run. For a discovery agent, you may want to catch and log individual failures while continuing with other tools/tasks.

♻️ Optional improvement for resilience

```diff
 tool_results: List[Dict[str, Any]] = []
 for call in act.tool_calls:
     ui.last_event = f"MCP tool: {call.name}"
     ui.tool_calls += 1
-    res = await self.mcp.call_tool(call.name, call.args)
-    tool_results.append({"tool": call.name, "args": call.args, "result": res})
+    try:
+        res = await self.mcp.call_tool(call.name, call.args)
+        tool_results.append({"tool": call.name, "args": call.args, "result": res})
+    except MCPError as e:
+        self.trace.write({"type": "mcp.error", "tool": call.name, "error": str(e)})
+        tool_results.append({"tool": call.name, "args": call.args, "error": str(e)})
```

scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/agent_app.py (5)
1-12: Consider adding a deprecation notice at the top of the file. Since this is in a `FastAPI_deprecated_POC` directory, adding a prominent deprecation notice at the top of the code file would help prevent accidental use.

♻️ Suggested improvement

```diff
+"""
+DEPRECATED: This FastAPI-based implementation is a non-working proof-of-concept.
+Please use the Rich CLI version at ../Rich/discover_cli.py instead.
+"""
+
 import asyncio
 import json
```
22-43: MCPClient lacks tool name validation. Unlike the Rich CLI version, this `MCPClient` doesn't validate tool names against `KNOWN_MCP_TOOLS` before calling. The validation happens later in `_execute_tool_calls`, but the `call` method can be invoked directly without validation.

Since this is deprecated, this is a minor observation. The Rich CLI version handles this better.
89-92: Chain exception for better debuggability. Same issue as in the Rich version - chain the original exception when re-raising.

♻️ Proposed fix

```diff
 try:
     content = data["choices"][0]["message"]["content"]
-except Exception:
-    raise LLMError(f"Unexpected LLM response: {data}")
+except (KeyError, IndexError, TypeError) as e:
+    raise LLMError(f"Unexpected LLM response: {data}") from e
```
571-600: New MCP connection created per request. `upsert_intent` and `list_questions` each create a new `MCPClient` connection. For a prototype this is acceptable, but note the overhead for production use.

Consider using a shared client or connection pool if this code is to be maintained; see the sketch below.
316-327: Catch specific exceptions instead of bare Exception. The `_get_user_intent` method catches all exceptions silently. Consider catching more specific exceptions to avoid masking unexpected errors.

♻️ Proposed fix

```diff
 async def _get_user_intent(self) -> Optional[Dict[str, Any]]:
     # Convention: kind="intent", key="intent/<run_id>"
     try:
         res = await self.mcp.call("catalog_get", {"kind": "intent", "key": f"intent/{self.run_id}"})
         if not res:
             return None
         doc = res.get("document")
         if not doc:
             return None
         return json.loads(doc)
-    except Exception:
+    except (MCPError, json.JSONDecodeError, KeyError, TypeError):
         return None
```

scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/README.md (1)
248-250: Update or remove placeholder license section. The "Prototype / internal use" license placeholder should be resolved before any public release. Either add an actual license or remove this section if it will remain internal.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- scripts/mcp/DiscoveryAgent/.gitignore
- scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/DEPRECATED.md
- scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/README.md
- scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/TODO.md
- scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/agent_app.py
- scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/requirements.txt
- scripts/mcp/DiscoveryAgent/Rich/README.md
- scripts/mcp/DiscoveryAgent/Rich/TODO.md
- scripts/mcp/DiscoveryAgent/Rich/discover_cli.py
- scripts/mcp/DiscoveryAgent/Rich/requirements.txt
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/agent_app.py (1)
scripts/mcp/DiscoveryAgent/Rich/discover_cli.py (7)
- MCPError (113-114)
- MCPClient (116-158)
- close (157-158)
- close (184-185)
- LLMError (160-161)
- LLMClient (163-274)
- chat_json (187-274)
🪛 LanguageTool
scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/README.md
[grammar] ~222-~222: Ensure spelling is correct
Context: ...ch expert can only request tools in its allow‑list. This is enforced server‑side: - preve...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~228-~228: Ensure spelling is correct
Context: ...r predictable You can tighten or relax allow‑lists in ALLOWED_TOOLS. --- ## Notes on M...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/TODO.md
40-40: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
305-305: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 OSV Scanner (2.3.1)
scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/requirements.txt
[HIGH] 1-1: starlette 0.38.6: Starlette has possible denial-of-service vector when parsing large files in multipart forms
[HIGH] 1-1: starlette 0.38.6: Starlette Denial of service (DoS) via multipart/form-data
🪛 Ruff (0.14.11)
scripts/mcp/DiscoveryAgent/Rich/discover_cli.py
2-2: Shebang should be at the beginning of the file
(EXE005)
136-136: Avoid specifying long messages outside the exception class
(TRY003)
139-139: Avoid specifying long messages outside the exception class
(TRY003)
144-144: Avoid specifying long messages outside the exception class
(TRY003)
153-153: Avoid specifying long messages outside the exception class
(TRY003)
221-221: Avoid specifying long messages outside the exception class
(TRY003)
226-226: Avoid specifying long messages outside the exception class
(TRY003)
230-230: Do not catch blind exception: Exception
(BLE001)
232-232: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
232-232: Avoid specifying long messages outside the exception class
(TRY003)
239-239: Avoid specifying long messages outside the exception class
(TRY003)
243-243: Do not catch blind exception: Exception
(BLE001)
262-262: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
262-262: Avoid specifying long messages outside the exception class
(TRY003)
269-269: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
269-269: Avoid specifying long messages outside the exception class
(TRY003)
274-274: Avoid specifying long messages outside the exception class
(TRY003)
374-374: Avoid specifying long messages outside the exception class
(TRY003)
376-376: Avoid specifying long messages outside the exception class
(TRY003)
397-397: Avoid specifying long messages outside the exception class
(TRY003)
406-406: Avoid specifying long messages outside the exception class
(TRY003)
520-520: Do not catch blind exception: Exception
(BLE001)
573-573: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/agent_app.py
36-36: Avoid specifying long messages outside the exception class
(TRY003)
39-39: Avoid specifying long messages outside the exception class
(TRY003)
86-86: Avoid specifying long messages outside the exception class
(TRY003)
91-91: Do not catch blind exception: Exception
(BLE001)
92-92: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
92-92: Avoid specifying long messages outside the exception class
(TRY003)
97-97: Do not catch blind exception: Exception
(BLE001)
111-111: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
111-111: Avoid specifying long messages outside the exception class
(TRY003)
117-117: Avoid specifying long messages outside the exception class
(TRY003)
269-269: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
269-269: Avoid specifying long messages outside the exception class
(TRY003)
286-286: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
286-286: Avoid specifying long messages outside the exception class
(TRY003)
326-326: Do not catch blind exception: Exception
(BLE001)
349-349: Avoid specifying long messages outside the exception class
(TRY003)
351-351: Avoid specifying long messages outside the exception class
(TRY003)
375-375: Do not catch blind exception: Exception
(BLE001)
395-395: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
395-395: Avoid specifying long messages outside the exception class
(TRY003)
403-403: Avoid specifying long messages outside the exception class
(TRY003)
412-412: Avoid specifying long messages outside the exception class
(TRY003)
448-448: Do not catch blind exception: Exception
(BLE001)
549-549: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (12)
scripts/mcp/DiscoveryAgent/.gitignore (1)
1-6: LGTM! Standard Python virtual environment and cache patterns are correctly specified.
scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/DEPRECATED.md (1)
1-18: LGTM! Clear deprecation notice. The documentation effectively communicates that this is a non-working prototype, explains the technical issue (incorrect MCP protocol usage), and directs users to the working Rich CLI implementation. This is exactly the kind of clear guidance needed for deprecated code.
scripts/mcp/DiscoveryAgent/Rich/TODO.md (1)
1-68: LGTM! Well-structured roadmap for future enhancements. The TODO document provides a clear, thematic roadmap organized into logical enhancement areas. The progression from immediate improvements (catalog consistency, expert strategies) to future optimizations (concurrency, MCP enhancements) is well-considered.
Minor observation: Line 46 mentions Textual as an optional UI upgrade from Rich. If this becomes a priority, remember to add it to requirements.txt or create a separate requirements-dev.txt.
scripts/mcp/DiscoveryAgent/Rich/README.md (2)
1-200: Overall: Comprehensive and well-structured documentation. The README provides excellent coverage of installation, configuration, usage, and troubleshooting. The multi-expert architecture explanation is clear, and the examples for different LLM providers (OpenAI, Z.ai, local servers) are helpful.
Key strengths:
- Clear separation of MCP and LLM configuration
- Practical debugging guidance with trace JSONL format
- Safety notes about SQL execution
- Environment variable and CLI flag flexibility
105-111: No action required. Z.ai is a publicly accessible LLM provider with the documented endpoint (https://api.z.ai/api/coding/paas/v4) correctly configured for coding workflows, and the GLM-4.7 model is available and properly referenced.

scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/TODO.md (1)
1-346: LGTM! Well-structured planning document with clear implementation guidance, rationale, and prioritized next steps. The document provides valuable context for evolving the prototype into a robust system.
scripts/mcp/DiscoveryAgent/Rich/discover_cli.py (5)
50-63: LGTM! Tool allowlists are well-defined with appropriate restrictions per expert role, enforcing the principle of least privilege for each agent type.
65-99: LGTM! Pydantic models are well-structured with proper use of `Field(default_factory=...)` for list defaults, avoiding mutable default argument pitfalls.
116-158: LGTM! MCPClient implementation is robust with proper JSON-RPC handling, bearer token authentication, TLS configuration, and tracing. The tool validation against `KNOWN_MCP_TOOLS` provides good guardrails.
277-314: LGTM! System prompts are well-structured with clear rules, output schemas, and constraints. The separation between planner and expert roles is well-defined.
316-330: LGTM! Correct use of `__post_init__` to handle the mutable default for `planned_tasks`, avoiding the common mutable default argument pitfall.

scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/README.md (1)
1-250: LGTM! Comprehensive and well-structured documentation. The limitations section honestly documents the prototype's constraints. Consider adding a prominent deprecation notice at the top referencing the working Rich CLI version.
```python
app = FastAPI(title="Database Discovery Agent (LLM + Multi-Expert)")

RUNS: Dict[str, Dict[str, Any]] = {}
```
In-memory RUNS dict will leak memory.
Completed runs are never removed from RUNS, leading to unbounded memory growth. Consider adding a cleanup mechanism or TTL for completed runs.
🧹 Suggested approach
Add cleanup after run completion:
```python
# In runner() after the run completes:
async def runner():
    try:
        orch = Orchestrator(run_id, mcp, llm, emit)
        await orch.run(schema=req.schema, max_iterations=req.max_iterations, tasks_per_iter=req.tasks_per_iter)
    except Exception as e:
        await emit("run", "error", {"error": str(e)})
    finally:
        await mcp.close()
        await llm.close()
        # Schedule cleanup after a delay to allow event consumption
        asyncio.get_event_loop().call_later(60, lambda: RUNS.pop(run_id, None))
```

🤖 Prompt for AI Agents
In @scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/agent_app.py at line 499,
The RUNS dict stores completed runs indefinitely causing memory leaks; update
the runner() shutdown/finally logic to remove its entry from RUNS after
completion by scheduling a delayed cleanup (e.g., use
asyncio.get_event_loop().call_later or an asyncio.create_task that awaits a
sleep then RUNS.pop(run_id, None)) so events can be consumed before deletion;
ensure cleanup is added alongside existing shutdown calls to mcp.close() and
llm.close(), and reference the RUNS key using run_id so the exact entry is
removed safely.
```
fastapi==0.115.0
uvicorn[standard]==0.30.6
httpx==0.27.0
pydantic==2.8.2
python-dotenv==1.0.1
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Check if the file exists and verify its contents
if [ -f "scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/requirements.txt" ]; then
  echo "=== File exists. Contents: ==="
  cat "scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/requirements.txt"
else
  echo "File not found at specified path"
  # Try to find it
  find . -name "requirements.txt" -path "*DiscoveryAgent*" 2>/dev/null
fi
```

Repository: ProxySQL/proxysql-vec
Length of output: 189
🌐 Web query:
GHSA-2c2j-9gv5-cj73 Starlette DoS multipart
💡 Result:
GHSA-2c2j-9gv5-cj73 is a GitHub advisory (CVE-2025-54121) for Starlette describing a moderate-severity DoS: parsing multipart/form-data with large files can block the event loop when a SpooledTemporaryFile rolls to disk, preventing new connections. Affected versions: < 0.47.2. Fix: upgrade to starlette ≥ 0.47.2 (PR/commit linked in advisory). [1][2]
References:
- GitHub advisory (GHSA-2c2j-9gv5-cj73 / Starlette multipart DoS). [1]
- OSV / CVE entry (CVE-2025-54121) and related links. [2]
🌐 Web query:
GHSA-f96h-pmfr-66vw Starlette DoS multipart form-data
💡 Result:
Summary: GHSA-f96h-pmfr-66vw (CVE-2024-47874) is a high-severity DoS in Starlette where multipart/form-data parts without a filename are treated as text fields and buffered with no size limit, allowing an attacker to exhaust memory and crash or hang servers. Affects starlette < 0.40.0; fixed in 0.40.0. Mitigation: upgrade to >= 0.40.0 (or apply the upstream patch adding max_part_size). [1][2][3]
References
- GitHub security advisory (GHSA-f96h-pmfr-66vw). [1]
- Fix commit adding max_part_size (fd038f3 / v0.40.0). [2]
- Vendor/OS advisories (Debian/Ubuntu CVE pages). [3]
🏁 Script executed:
```bash
#!/bin/bash
# Check what version of Starlette is a dependency of FastAPI 0.115.0
# We'll fetch from PyPI and parse the metadata
echo "=== Checking Starlette dependency for FastAPI 0.115.0 ==="
curl -s "https://pypi.org/pypi/fastapi/0.115.0/json" | jq '.info.requires_dist[]' | grep -i starlette
```

Repository: ProxySQL/proxysql-vec
Length of output: 150
🌐 Web query:
FastAPI 0.115.0 Starlette dependency version
💡 Result:
FastAPI 0.115.0 requires Starlette >=0.37.2,<0.39.0. [1][2]
Sources:
- GitLab dependency resolution output showing fastapi (0.115.0) -> starlette (>=0.37.2,<0.39.0). [1]
- Fedora package changelog for FastAPI 0.115.0 (package metadata). [2]
Address security vulnerabilities in dependencies.
Static analysis has flagged vulnerabilities in Starlette (a transitive dependency of FastAPI):
- GHSA-f96h-pmfr-66vw (HIGH severity): DoS via multipart/form-data parts without filename, unbounded buffering. Affects Starlette < 0.40.0.
- GHSA-2c2j-9gv5-cj73 (MODERATE severity): DoS when parsing large files in multipart forms causes event loop blocking. Affects Starlette < 0.47.2.
FastAPI 0.115.0 requires Starlette >=0.37.2,<0.39.0, which falls within both vulnerable ranges.
While this code is marked as deprecated and should not be run, keeping vulnerable dependencies in the repository presents supply chain risks and may trigger security scanners in CI/CD pipelines. Consider either removing this deprecated POC entirely or updating the dependencies to address these vulnerabilities.
🧰 Tools
🪛 OSV Scanner (2.3.1)
[HIGH] 1-1: starlette 0.38.6: Starlette has possible denial-of-service vector when parsing large files in multipart forms
[HIGH] 1-1: starlette 0.38.6: Starlette Denial of service (DoS) via multipart/form-data
🤖 Prompt for AI Agents
In @scripts/mcp/DiscoveryAgent/FastAPI_deprecated_POC/requirements.txt around
lines 1 - 5, The requirements pin fastapi==0.115.0 (in requirements.txt) which
pulls vulnerable Starlette versions; either delete this deprecated POC file
entirely or update the dependency pins: replace fastapi==0.115.0 with a current
FastAPI release that depends on Starlette >=0.47.2 (or add an explicit
starlette>=0.47.2 pin to requirements.txt to override the transitive
dependency), bump/align related packages (e.g., pydantic) as needed, then run
the app/tests/CI to verify compatibility and remove the deprecated artifact if
you choose not to maintain it.
```python
except KeyboardInterrupt:
    Console().print("\\n[yellow]Interrupted[/yellow]")
    raise SystemExit(130)
```
Same escaped newline issue and missing exception chaining.
Two issues here: the escaped newline and the SystemExit should chain from the KeyboardInterrupt.
🐛 Proposed fix
```diff
 except KeyboardInterrupt:
-    Console().print("\\n[yellow]Interrupted[/yellow]")
-    raise SystemExit(130)
+    Console().print("\n[yellow]Interrupted[/yellow]")
+    raise SystemExit(130) from None
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
except KeyboardInterrupt:
    Console().print("\n[yellow]Interrupted[/yellow]")
    raise SystemExit(130) from None
```
🧰 Tools
🪛 Ruff (0.14.11)
573-573: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
In @scripts/mcp/DiscoveryAgent/Rich/discover_cli.py around lines 571 - 573,
Change the KeyboardInterrupt handler to use a real newline and exception
chaining: catch the interrupt as an exception (e.g., "except KeyboardInterrupt
as e"), call Console().print with "\n[yellow]Interrupted[/yellow]" (not an
escaped "\\n"), and re-raise SystemExit(130) using "from e" so the SystemExit
chains from the original KeyboardInterrupt.
Python bridge (scripts/mcp/proxysql_mcp_stdio_bridge.py):

- Make log file path configurable via PROXYSQL_MCP_BRIDGE_LOG env var
- Add httpx.RequestError exception handling for network issues
- Fix asyncio.CancelledError not being re-raised (HIGH priority)
- Replace deprecated asyncio.get_event_loop() with get_running_loop()

C++ server (lib/MCP_Endpoint.cpp):

- Refactor handle_tools_call() to reduce code duplication
- Handle string responses directly without calling .dump()
- Single shared wrapping block for all response types

Per review: #11
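The CancelledError fix follows the standard asyncio rule: a broad `except` must re-raise cancellation, or task shutdown silustrates silently stalls. A minimal illustration of the pattern (the handler body is a placeholder):

```python
import asyncio

async def handle_message(msg: str) -> None:
    try:
        await asyncio.sleep(0)  # placeholder for real message handling
    except asyncio.CancelledError:
        raise  # never swallow cancellation; let it propagate to the caller
    except Exception as exc:
        print(f"error handling {msg!r}: {exc}")  # report other errors and continue
```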
Fix: MCP stdio bridge - Make tool responses MCP-compliant and add debugging
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @scripts/mcp/proxysql_mcp_stdio_bridge.py:
- Around line 336-345: The endpoint environment default makes the subsequent
emptiness check dead; choose one approach: either remove the default in the
os.getenv call for PROXYSQL_MCP_ENDPOINT so endpoint must be provided (change
endpoint = os.getenv("PROXYSQL_MCP_ENDPOINT", "https://...") to endpoint =
os.getenv("PROXYSQL_MCP_ENDPOINT") and keep the existing validation that writes
to stderr and exits), or keep the default and remove the validation block that
checks if not endpoint (delete the if not endpoint: sys.stderr.write(...)
sys.exit(1) lines). Ensure references to the variable name endpoint and the env
var PROXYSQL_MCP_ENDPOINT are preserved so behavior remains clear.
- Around line 38-42: The module currently opens _log_file at import time using
_log_file_path and defines _log which writes to it; move the file-open out of
global scope and make opening defensive: defer opening the file until main() or
first call to _log, wrap the open in try/except to catch IOErrors and fallback
to sys.stderr (or disable file logging), and ensure the file handle (_log_file)
is closed on exit (use atexit.register or close in main/shutdown). Update _log
to handle the case where _log_file is None or fallback is used so writes don’t
raise, and reference the existing symbols _log_file_path, _log_file, and _log
when making these changes.
In @scripts/mcp/STDIO_BRIDGE_README.md:
- Around line 33-37: The README incorrectly marks PROXYSQL_MCP_ENDPOINT as
required while the script supplies a default; update the table row for
`PROXYSQL_MCP_ENDPOINT` to indicate Required = No (or remove "Yes") and set the
Default column to `https://127.0.0.1:6071/mcp/query` so the documentation
matches the actual behavior of the script, referencing the
`PROXYSQL_MCP_ENDPOINT` variable name.
- Line 48: The example JSON hardcodes a personal path in the "args" array;
replace the literal
"/home/rene/proxysql-vec/scripts/mcp/proxysql_mcp_stdio_bridge.py" with a
generic placeholder (e.g. "./scripts/mcp/proxysql_mcp_stdio_bridge.py" or
"<path_to>/scripts/mcp/proxysql_mcp_stdio_bridge.py") so the "args" entry no
longer contains a developer-specific absolute path; update the "args" value
accordingly in the example.
🧹 Nitpick comments (2)
scripts/mcp/proxysql_mcp_stdio_bridge.py (2)
53-84: Unused `_initialized` flag. The `_initialized` flag is set to `True` on line 84 but never checked anywhere. Either use it to guard against duplicate initialization or remove the dead code.
206-206: Remove extraneous f-string prefix. This f-string has no placeholders.

Fix

```diff
-        _log(f"WRITE stdout: flushed")
+        _log("WRITE stdout: flushed")
```
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- lib/MCP_Endpoint.cpp
- scripts/mcp/STDIO_BRIDGE_README.md
- scripts/mcp/proxysql_mcp_stdio_bridge.py
- src/proxysql.cfg
🧰 Additional context used
🪛 Gitleaks (8.30.0)
scripts/mcp/proxysql_mcp_stdio_bridge.py
[high] 348-348: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🪛 markdownlint-cli2 (0.18.1)
scripts/mcp/STDIO_BRIDGE_README.md
7-7: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.11)
scripts/mcp/proxysql_mcp_stdio_bridge.py
38-38: Probable insecure usage of temporary file or directory: "/tmp/proxysql_mcp_bridge.log"
(S108)
90-90: Avoid specifying long messages outside the exception class
(TRY003)
119-119: Do not catch blind exception: Exception
(BLE001)
124-124: Use explicit conversion flag
Replace with conversion flag
(RUF010)
188-188: Do not catch blind exception: Exception
(BLE001)
203-203: Use explicit conversion flag
Replace with conversion flag
(RUF010)
206-206: f-string without any placeholders
Remove extraneous f prefix
(F541)
257-257: Unused method argument: params
(ARG002)
355-355: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (2)
src/proxysql.cfg (1)
70-71: LGTM! The closing brace and blank line properly terminate the `mcp_variables` block and provide clear separation before the `genai_variables` section, following the libconfig grammar structure.

lib/MCP_Endpoint.cpp (1)
344-380: LGTM - Clean MCP response wrapping implementation. The unwrapping of ProxySQL's internal `{"success": ..., "result": ...}` format and re-wrapping into MCP-compliant structure with `content` array and `isError` boolean is well-implemented.

One minor observation: the spec URL comment on line 367 references "2025-11-25" but the protocol version used elsewhere (line 228, and in the Python bridge) is "2024-11-05". Consider verifying which spec version applies.
```python
_log_file_path = os.getenv("PROXYSQL_MCP_BRIDGE_LOG", "/tmp/proxysql_mcp_bridge.log")
_log_file = open(_log_file_path, "a", buffering=1)

def _log(msg):
    _log_file.write(f"[{datetime.now().strftime('%H:%M:%S.%f')[:-3]}] {msg}\n")
    _log_file.flush()
```
Log file opened at module load without error handling or cleanup.
The log file is opened globally at import time. If the file cannot be created (permissions, disk full), the script crashes before reaching main(). Consider wrapping in try/except or deferring the open.
Additionally, the file handle is never explicitly closed, though this is typically acceptable for long-running processes.
Suggested improvement
```diff
-_log_file_path = os.getenv("PROXYSQL_MCP_BRIDGE_LOG", "/tmp/proxysql_mcp_bridge.log")
-_log_file = open(_log_file_path, "a", buffering=1)
+_log_file_path = os.getenv("PROXYSQL_MCP_BRIDGE_LOG", "/tmp/proxysql_mcp_bridge.log")
+_log_file = None
+
+def _init_log():
+    global _log_file
+    if _log_file is None:
+        try:
+            _log_file = open(_log_file_path, "a", buffering=1)
+        except OSError:
+            _log_file = None
+
 def _log(msg):
-    _log_file.write(f"[{datetime.now().strftime('%H:%M:%S.%f')[:-3]}] {msg}\n")
-    _log_file.flush()
+    _init_log()
+    if _log_file:
+        _log_file.write(f"[{datetime.now().strftime('%H:%M:%S.%f')[:-3]}] {msg}\n")
+        _log_file.flush()
```

🧰 Tools
🪛 Ruff (0.14.11)
38-38: Probable insecure usage of temporary file or directory: "/tmp/proxysql_mcp_bridge.log"
(S108)
🤖 Prompt for AI Agents
In @scripts/mcp/proxysql_mcp_stdio_bridge.py around lines 38 - 42, The module
currently opens _log_file at import time using _log_file_path and defines _log
which writes to it; move the file-open out of global scope and make opening
defensive: defer opening the file until main() or first call to _log, wrap the
open in try/except to catch IOErrors and fallback to sys.stderr (or disable file
logging), and ensure the file handle (_log_file) is closed on exit (use
atexit.register or close in main/shutdown). Update _log to handle the case where
_log_file is None or fallback is used so writes don’t raise, and reference the
existing symbols _log_file_path, _log_file, and _log when making these changes.
```python
endpoint = os.getenv("PROXYSQL_MCP_ENDPOINT", "https://127.0.0.1:6071/mcp/query")
token = os.getenv("PROXYSQL_MCP_TOKEN", "")
insecure_ssl = os.getenv("PROXYSQL_MCP_INSECURE_SSL", "0").lower() in ("1", "true", "yes")

_log(f"START: endpoint={endpoint}, insecure_ssl={insecure_ssl}")

# Validate endpoint
if not endpoint:
    sys.stderr.write("Error: PROXYSQL_MCP_ENDPOINT environment variable is required\n")
    sys.exit(1)
```
Unreachable validation due to default value.
The endpoint variable has a default value on line 336, so the validation on lines 343-345 checking if endpoint is empty will never trigger. Either remove the default to make the variable truly required, or remove the dead validation code.
Option A: Make endpoint truly required (remove default)

```diff
-    endpoint = os.getenv("PROXYSQL_MCP_ENDPOINT", "https://127.0.0.1:6071/mcp/query")
+    endpoint = os.getenv("PROXYSQL_MCP_ENDPOINT")
```

Option B: Remove dead validation (keep default)

```diff
 endpoint = os.getenv("PROXYSQL_MCP_ENDPOINT", "https://127.0.0.1:6071/mcp/query")
 token = os.getenv("PROXYSQL_MCP_TOKEN", "")
 insecure_ssl = os.getenv("PROXYSQL_MCP_INSECURE_SSL", "0").lower() in ("1", "true", "yes")

 _log(f"START: endpoint={endpoint}, insecure_ssl={insecure_ssl}")
-
-    # Validate endpoint
-    if not endpoint:
-        sys.stderr.write("Error: PROXYSQL_MCP_ENDPOINT environment variable is required\n")
-        sys.exit(1)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Option A result:

```python
endpoint = os.getenv("PROXYSQL_MCP_ENDPOINT")
token = os.getenv("PROXYSQL_MCP_TOKEN", "")
insecure_ssl = os.getenv("PROXYSQL_MCP_INSECURE_SSL", "0").lower() in ("1", "true", "yes")

_log(f"START: endpoint={endpoint}, insecure_ssl={insecure_ssl}")

# Validate endpoint
if not endpoint:
    sys.stderr.write("Error: PROXYSQL_MCP_ENDPOINT environment variable is required\n")
    sys.exit(1)
```

Option B result:

```python
endpoint = os.getenv("PROXYSQL_MCP_ENDPOINT", "https://127.0.0.1:6071/mcp/query")
token = os.getenv("PROXYSQL_MCP_TOKEN", "")
insecure_ssl = os.getenv("PROXYSQL_MCP_INSECURE_SSL", "0").lower() in ("1", "true", "yes")

_log(f"START: endpoint={endpoint}, insecure_ssl={insecure_ssl}")
```
🤖 Prompt for AI Agents
In @scripts/mcp/proxysql_mcp_stdio_bridge.py around lines 336 - 345, The
endpoint environment default makes the subsequent emptiness check dead; choose
one approach: either remove the default in the os.getenv call for
PROXYSQL_MCP_ENDPOINT so endpoint must be provided (change endpoint =
os.getenv("PROXYSQL_MCP_ENDPOINT", "https://...") to endpoint =
os.getenv("PROXYSQL_MCP_ENDPOINT") and keep the existing validation that writes
to stderr and exits), or keep the default and remove the validation block that
checks if not endpoint (delete the if not endpoint: sys.stderr.write(...)
sys.exit(1) lines). Ensure references to the variable name endpoint and the env
var PROXYSQL_MCP_ENDPOINT are preserved so behavior remains clear.
…ssist

Critical fixes:

- Remove stray backslash at start of discover_cli.py (causes syntax error)
- Fix escaped newlines (\\n) to real newlines (\n) in discover_cli.py

Documentation fixes:

- Update README.md: remove incorrect requirements_cli.txt reference
- Update README.md: use generic path placeholder instead of /home/rene/...
- Update STDIO_BRIDGE_README.md: mark PROXYSQL_MCP_ENDPOINT as optional with default

Dependency updates:

- Update package versions: httpx 0.27.0->0.28.1, pydantic 2.8.2->2.12.5, python-dotenv 1.0.1->1.2.1, rich 13.7.1->14.2.0

Per review: #10
Actionable comments posted: 0
🧹 Nitpick comments (3)
scripts/mcp/DiscoveryAgent/Rich/discover_cli.py (3)
227-273: Consider improving exception handling for better debugging. The blind `except Exception` catches (lines 229, 242) and missing exception chaining (lines 231, 261, 268) could obscure root causes during debugging. This is functional but could be cleaner.

♻️ Suggested improvements

```diff
 try:
     content = data["choices"][0]["message"]["content"]
-except Exception:
+except (KeyError, IndexError, TypeError) as e:
     self.trace.write({"type": "llm.unexpected_schema", "keys": list(data.keys())})
-    raise LLMError(f"Unexpected LLM response schema. Keys={list(data.keys())}. Body={body_snip}")
+    raise LLMError(f"Unexpected LLM response schema. Keys={list(data.keys())}. Body={body_snip}") from e
```

Similarly for the repair section:

```diff
 if r2.status_code != 200:
-    raise LLMError(f"LLM repair HTTP {r2.status_code}: {r2.text}")
+    raise LLMError(f"LLM repair HTTP {r2.status_code}: {r2.text}") from None
 # ...
 if not str(content2).strip():
-    raise LLMError("LLM repair returned empty content (gateway misconfig or unsupported endpoint).")
+    raise LLMError("LLM repair returned empty content (gateway misconfig or unsupported endpoint).") from None
```
315-329: Minor type annotation improvement. The `planned_tasks` field should use `Optional[List[PlannedTask]]` to match the `None` default, even though the `__post_init__` ensures it's always a list after initialization.

♻️ Suggested fix

```diff
 @dataclass
 class UIState:
     run_id: str
     phase: str = "init"
     iteration: int = 0
-    planned_tasks: List[PlannedTask] = None
+    planned_tasks: Optional[List[PlannedTask]] = None
     last_event: str = ""
     last_error: str = ""
     tool_calls: int = 0
     catalog_writes: int = 0
     insights: int = 0
```
408-413: User intent is always `None` in the planner call. The `planner()` method accepts `user_intent`, but it's hardcoded to `None` in the `run()` loop. If user intent is meant to guide discovery, consider parameterizing this via `cmd_run` args or making it configurable.

If this is intentional for the initial implementation, consider adding a TODO or removing the parameter from the method signature to avoid confusion.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- scripts/mcp/DiscoveryAgent/Rich/README.md
- scripts/mcp/DiscoveryAgent/Rich/discover_cli.py
- scripts/mcp/DiscoveryAgent/Rich/requirements.txt
- scripts/mcp/STDIO_BRIDGE_README.md
✅ Files skipped from review due to trivial changes (1)
- scripts/mcp/DiscoveryAgent/Rich/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
- scripts/mcp/DiscoveryAgent/Rich/requirements.txt
- scripts/mcp/STDIO_BRIDGE_README.md
🧰 Additional context used
🪛 Ruff (0.14.11)
scripts/mcp/DiscoveryAgent/Rich/discover_cli.py
135-135: Avoid specifying long messages outside the exception class
(TRY003)
138-138: Avoid specifying long messages outside the exception class
(TRY003)
143-143: Avoid specifying long messages outside the exception class
(TRY003)
152-152: Avoid specifying long messages outside the exception class
(TRY003)
220-220: Avoid specifying long messages outside the exception class
(TRY003)
225-225: Avoid specifying long messages outside the exception class
(TRY003)
229-229: Do not catch blind exception: Exception
(BLE001)
231-231: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
231-231: Avoid specifying long messages outside the exception class
(TRY003)
238-238: Avoid specifying long messages outside the exception class
(TRY003)
242-242: Do not catch blind exception: Exception
(BLE001)
261-261: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
261-261: Avoid specifying long messages outside the exception class
(TRY003)
268-268: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
268-268: Avoid specifying long messages outside the exception class
(TRY003)
273-273: Avoid specifying long messages outside the exception class
(TRY003)
373-373: Avoid specifying long messages outside the exception class
(TRY003)
375-375: Avoid specifying long messages outside the exception class
(TRY003)
396-396: Avoid specifying long messages outside the exception class
(TRY003)
405-405: Avoid specifying long messages outside the exception class
(TRY003)
519-519: Do not catch blind exception: Exception
(BLE001)
572-572: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🔇 Additional comments (9)
scripts/mcp/DiscoveryAgent/Rich/discover_cli.py (9)
1-63: LGTM! Clear module docstring documenting configuration options, appropriate imports for async HTTP/Pydantic/Rich, and well-structured constants for tool validation with per-expert access control.

64-98: LGTM! Pydantic models are well-structured with appropriate field defaults and type annotations. The use of the `ExpertName` `Literal` ensures proper validation of expert types.

100-110: LGTM! The per-write file opening pattern is appropriate for a trace logger—it ensures durability on crash and avoids holding file handles. The trade-off of I/O overhead is acceptable for debugging purposes.

112-157: LGTM! Solid MCP client implementation with proper JSON-RPC 2.0 structure, tool validation against `KNOWN_MCP_TOOLS`, and appropriate error handling. The detailed exception messages (flagged by static analysis) are acceptable for CLI debugging context.

276-313: LGTM! Well-structured system prompts with clear JSON output schemas and rules. The double-brace escaping for `.format()` substitution is correct.

331-344: LGTM! Defensive helper functions that handle variable MCP response formats gracefully.

444-480: LGTM! Clean Rich UI rendering with proper null handling and a well-structured layout. The task list limiting to 10 items prevents UI overflow.

540-575: LGTM! Clean CLI structure with shared common arguments via `parents` and proper exit code handling for keyboard interrupts. The subcommand pattern is extensible for future commands.

531-538: Implementation is correct—Live re-renders the Layout on each refresh cycle. The `Live` widget keeps a reference to the `Layout` object returned by `render(ui)` and automatically re-renders it on each refresh cycle (every ~125ms at 8fps). Since the `Layout` references the mutable `ui` object through field access (`ui.phase`, `ui.iteration`, etc.), mutations made by the `runner()` task are automatically reflected in subsequent renders without requiring `Live.update()`. Rich does not cache the renderable output; it evaluates it on every refresh.
The catalog_search() and catalog_list() methods in MySQL_Catalog.cpp were manually building JSON strings by concatenating raw TEXT from SQLite without proper escaping. This caused parse errors when stored JSON contained quotes, backslashes, or newlines.

Changes:
- MySQL_Catalog.cpp: Use nlohmann::json to build proper nested JSON in search() and list() methods instead of manual concatenation
- MySQL_Tool_Handler.cpp: Add try-catch for JSON parsing in catalog_get()
- test_catalog.sh: Fix MCP URL path, add jq extraction for MCP protocol responses, add 3 special character tests (CAT013-CAT015)

Test Results: All 15 catalog tests pass, including new tests that verify special characters (quotes, backslashes) are preserved.
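The failure mode generalizes beyond C++; a minimal Python sketch of why manual concatenation breaks while a JSON library does not (the document value here is made up for illustration):

```python
import json

document = 'Test with "quotes" and \\backslashes\\'

# Manual concatenation produces invalid JSON as soon as the value
# contains quotes or backslashes -- the bug this commit fixes in C++.
broken = '{"description": "' + document + '"}'
try:
    json.loads(broken)
except json.JSONDecodeError as exc:
    print(f"manual concatenation fails: {exc}")

# A JSON library (json here, nlohmann::json in the C++ fix) escapes correctly.
safe = json.dumps({"description": document})
print(json.loads(safe)["description"])  # round-trips intact
```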
Implements a 4-agent collaborative system using Claude Code's Task tool and MCP catalog for comprehensive database analysis:
- Structural Agent: Maps tables, relationships, indexes, constraints
- Statistical Agent: Profiles data distributions, patterns, anomalies
- Semantic Agent: Infers business domain and entity types
- Query Agent: Analyzes access patterns and optimization

Agents collaborate via MCP catalog across 4 rounds:
1. Blind exploration → 2. Pattern recognition → 3. Hypothesis testing → 4. Final synthesis

Includes simple_discovery.py demo and comprehensive documentation.
Implement scripts for running Claude Code in non-interactive mode to perform comprehensive database discovery on any database.

Files added:
- headless_db_discovery.sh: Bash script implementation
- headless_db_discovery.py: Python script implementation (recommended)
- HEADLESS_DISCOVERY_README.md: Comprehensive documentation

Features:
- Works with any database accessible via MCP
- Database-agnostic discovery prompt
- Comprehensive analysis: structure, data, semantics, performance
- Markdown report output with ERD, data quality score, recommendations
- CI/CD integration ready
- Supports custom MCP server configuration
- Configurable timeout, output, verbosity

Usage: python scripts/headless_db_discovery.py --database mydb
Add comprehensive error details to help users debug NL2SQL conversion issues.

Changes:
- Add error_code, error_details, http_status_code, provider_used fields to NL2SQLResult
- Add NL2SQLErrorCode enum with structured error codes:
  * SUCCESS, ERR_API_KEY_MISSING, ERR_API_KEY_INVALID, ERR_TIMEOUT
  * ERR_CONNECTION_FAILED, ERR_RATE_LIMITED, ERR_SERVER_ERROR
  * ERR_EMPTY_RESPONSE, ERR_INVALID_RESPONSE, ERR_SQL_INJECTION_DETECTED
  * ERR_VALIDATION_FAILED, ERR_UNKNOWN_PROVIDER, ERR_REQUEST_TOO_LARGE
- Add nl2sql_error_code_to_string() function for error code conversion
- Add format_error_context() helper to create detailed error messages including:
  * Query (truncated if too long)
  * Schema name
  * Provider attempted
  * Endpoint URL
  * Specific error message
- Add set_error_details() helper to populate error fields
- Update error handling in convert() to use new error details
- Track provider_used in successful conversions

This provides much better debugging information when NL2SQL conversions fail, making it easier to identify misconfigurations and connectivity issues.

Fixes #1 - Improve Error Messages
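For illustration, a hedged Python sketch of the error-context idea described above (the enum members and context fields come from the commit message; the 200-character truncation limit is an assumption):

```python
from enum import Enum, auto

class NL2SQLErrorCode(Enum):
    SUCCESS = auto()
    ERR_API_KEY_MISSING = auto()
    ERR_TIMEOUT = auto()
    ERR_CONNECTION_FAILED = auto()
    # ... remaining codes listed in the commit message

def format_error_context(query: str, schema: str, provider: str,
                         endpoint: str, message: str,
                         max_query_len: int = 200) -> str:
    """Build a detailed error message; truncates long queries."""
    if len(query) > max_query_len:
        query = query[:max_query_len] + "..."
    return (f"query={query!r} schema={schema} provider={provider} "
            f"endpoint={endpoint} error={message}")

print(format_error_context("SELECT 1", "testdb", "openai",
                           "https://api.openai.com/v1", "timeout"))
```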
Add comprehensive structured logging for NL2SQL LLM API calls with request correlation, timing metrics, and detailed error context.

Changes:
- Add request_id field to NL2SQLRequest with UUID-like auto-generation
- Add structured logging macros:
  * LOG_LLM_REQUEST: Logs URL, model, prompt length with request ID
  * LOG_LLM_RESPONSE: Logs HTTP status, duration_ms, response preview
  * LOG_LLM_ERROR: Logs error phase, message, and status code
- Update call_generic_openai() signature to accept req_id parameter
- Update call_generic_anthropic() signature to accept req_id parameter
- Add timing metrics to both LLM call functions using clock_gettime()
- Replace existing debug logging with structured logging macros
- Update convert() to pass request_id to LLM calls

Request IDs are generated as UUID-like strings (e.g., "12345678-9abc-def0-1234-567890abcdef") and are included in all log messages for correlation. This allows tracking a single NL2SQL request through all log lines from request to response.

Timing is measured using CLOCK_MONOTONIC for accurate duration tracking of LLM API calls, reported in milliseconds.

This provides much better debugging capability when troubleshooting NL2SQL issues, as administrators can now:
- Correlate all log lines for a single request
- See exact timing of LLM API calls
- Identify which phase of processing failed
- Track request/response metrics

Fixes #2 - Add Structured Logging
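A rough Python analogue of the correlation pattern (UUID-like request ID plus monotonic timing; time.monotonic() stands in for the C++ clock_gettime(CLOCK_MONOTONIC) call, and the log format is illustrative):

```python
import time
import uuid

def call_llm_with_correlation(prompt: str) -> str:
    req_id = str(uuid.uuid4())  # UUID-like ID carried through every log line
    print(f"[{req_id}] LLM request: prompt_len={len(prompt)}")
    start = time.monotonic()    # monotonic clock for accurate durations
    response = "..."            # stand-in for the actual HTTP call
    duration_ms = (time.monotonic() - start) * 1000.0
    print(f"[{req_id}] LLM response: duration_ms={duration_ms:.1f}")
    return response

call_llm_with_correlation("Show customers")
```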
This commit adds configurable retry logic with exponential backoff for NL2SQL LLM API calls.

Changes:
- Add retry configuration to NL2SQLRequest (max_retries, retry_backoff_ms, retry_multiplier, retry_max_backoff_ms)
- Add is_retryable_error() to identify retryable HTTP/CURL errors
- Add sleep_with_jitter() for exponential backoff with 10% jitter
- Add call_generic_openai_with_retry() wrapper
- Add call_generic_anthropic_with_retry() wrapper
- Update NL2SQL_Converter::convert() to use retry wrappers

Default retry behavior:
- 3 retries with 1000ms initial backoff
- 2.0x multiplier, 30000ms max backoff
- Retries on empty responses (transient failures)

Part of: Phase 3 of NL2SQL improvement plan
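A minimal Python sketch of the stated backoff policy (3 retries, 1000 ms initial backoff, 2.0x multiplier, 30000 ms cap, 10% jitter); treating an empty response as retryable stands in for is_retryable_error():

```python
import random
import time

def sleep_with_jitter(backoff_ms: float) -> None:
    # 10% jitter around the nominal backoff, as described above.
    jitter = backoff_ms * 0.1
    time.sleep((backoff_ms + random.uniform(-jitter, jitter)) / 1000.0)

def call_with_retry(call, max_retries=3, retry_backoff_ms=1000.0,
                    retry_multiplier=2.0, retry_max_backoff_ms=30000.0):
    backoff = retry_backoff_ms
    result = None
    for attempt in range(max_retries + 1):  # initial attempt + max_retries
        result = call()
        if result:  # empty response treated as a transient, retryable failure
            return result
        if attempt == max_retries:
            break
        sleep_with_jitter(backoff)
        backoff = min(backoff * retry_multiplier, retry_max_backoff_ms)
    return result
```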
This commit adds comprehensive unit tests for the AI configuration validation functions used in AI_Features_Manager.

Changes:
- Add test/tap/tests/ai_validation-t.cpp with 61 unit tests
- Test URL format validation (validate_url_format)
- Test API key format validation (validate_api_key_format)
- Test numeric range validation (validate_numeric_range)
- Test provider name validation (validate_provider_name)
- Test edge cases and boundary conditions

The test file is self-contained with its own copies of the validation functions to avoid complex linking dependencies on libproxysql.

Test Categories:
- URL validation: 15 tests (http://, https:// protocols)
- API key validation: 14 tests (OpenAI, Anthropic formats)
- Numeric range: 13 tests (min/max boundaries)
- Provider name: 8 tests (openai, anthropic)
- Edge cases: 11 tests (NULL handling, long values)

All 61 tests pass successfully.

Part of: Phase 4 of NL2SQL improvement plan
This commit updates the NL2SQL documentation to reflect the new features added in v0.2.0:

README.md changes:
- Added Request Configuration section with retry parameters
- Added Error Handling section with error code table
- Added Request Correlation section with log format examples
- Updated Results section with error columns
- Updated Troubleshooting with retry behavior documentation
- Added v0.2.0 to Version History

API.md changes:
- Updated NL2SQLRequest struct with request_id and retry config fields
- Updated NL2SQLResult struct with error details fields
- Added NL2SQLErrorCode enum documentation
- Updated Result Format with new columns
- Expanded Error Codes section with structured error codes

TESTING.md changes:
- Added Validation Tests to test suite overview
- Documented ai_validation-t.cpp test categories
- Added instructions for running validation tests
- Documented all 61 test cases across 5 categories
Add comprehensive TAP unit tests for NL2SQL internal functions:
- Error code conversion (5 tests): Validate nl2sql_error_code_to_string() covers all 13 defined error codes plus UNKNOWN_ERROR
- SQL validation patterns (17 tests): Test validate_and_score_sql()
  * Valid SELECT queries (4 tests)
  * Non-SELECT queries (4 tests)
  * Injection pattern detection (4 tests)
  * Edge cases (4 tests): empty, lone keyword, semicolons, complex queries
- Request ID generation (12 tests): Test UUID-like ID generation
  * Format validation (20 assertions for 10 IDs)
  * Uniqueness (100 IDs checked for duplicates)
  * Hexadecimal character validation
- Prompt building (8 tests): Test build_prompt()
  * Basic prompt structure (3 tests)
  * Schema context inclusion (3 tests)
  * Section ordering (1 test)
  * Special character handling (2 tests)

Note: Tests are self-contained with standalone implementations matching the logic in NL2SQL_Converter.cpp.
…ndling, and performance monitoring

- Rename validate_provider_name to validate_provider_format for clarity
- Add null checks and error handling for all strdup() operations
- Enhance error messages with more context and HTTP status codes
- Implement performance monitoring with timing metrics for LLM calls and cache operations
- Add comprehensive test coverage for edge cases, retry scenarios, and performance
- Extend status variables to track performance metrics
- Update MySQL session to report timing information to AI manager
Multi-Agent Database Discovery System using Claude Code
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/mcp/test_catalog.sh (1)
18-60: Guard `jq` parse failures to preserve fallback under `set -e`.

Line 54 uses `jq` inside a command substitution without safeguard. Under `set -e`, if the response isn't valid JSON or `jq` is unavailable, the script aborts immediately before the fallback logic (lines 57–60) can execute. Add `|| true` to prevent premature exit.

🩹 Guard to preserve fallback

```diff
-    extracted=$(echo "${response}" | jq -r 'if .result.content[0].text then .result.content[0].text else . end' 2>/dev/null)
+    extracted=$(echo "${response}" | jq -r 'if .result.content[0].text then .result.content[0].text else . end' 2>/dev/null || true)
```
🤖 Fix all issues with AI agents
In `@doc/multi_agent_database_discovery.md`:
- Around line 153-160: Add a blank line before the "#### True Statistics (After
Deduplication)" heading and another blank line after the table (after the "|
Revenue | $10,886.67 | $3,628.85 |" row) so the block is separated by empty
lines to satisfy markdownlint MD058; locate the heading and the table in the doc
and insert the blank lines accordingly.
- Around line 9-32: The code block containing the ASCII architecture diagram
needs a language tag to satisfy MD040 and improve formatting; update the fenced
code block that currently starts with ``` (the ASCII diagram block in
multi_agent_database_discovery.md) to use ```text so the diagram is explicitly
tagged as text and renders consistently.
In
`@scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/examples/DATABASE_DISCOVERY_REPORT.md`:
- Line 484: Replace the final emphasized line "*Report generated by multi-agent
discovery system via MCP server on 2026-01-14*" with a proper heading or plain
text to satisfy markdownlint MD036; for example change the emphasized asterisks
to a heading like "### Report generated by multi-agent discovery system via MCP
server on 2026-01-14" or remove the asterisks to make it plain text so the line
is no longer treated as an emphasized heading.
- Around line 15-23: Insert a blank line immediately before the "### Key
Metrics" table and another blank line immediately after the table so the table
is separated from surrounding paragraphs; update the section around the "### Key
Metrics" heading and the pipe-delimited table (the rows starting with "| Metric
| Value | Notes |" through the last "| **Production Readiness** | 5-30% | NOT
READY |") to ensure there is an empty line above the table and an empty line
below it to satisfy markdownlint MD058.
- Around line 31-65: The fenced code blocks containing the database ASCII
diagram (the block that begins with "testdb" and the other similar blocks
referenced in the comment) are missing language identifiers; update each
triple-backtick fence to include an appropriate language tag (e.g., use ```text)
so the blocks satisfy MD040 and improve readability—ensure you change every
fenced block noted (the "testdb" block and the other blocks referenced in the
comment) by adding the language tag to the opening fence.
In `@scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/headless_db_discovery.py`:
- Around line 106-111: When args.mcp_file is provided but the file does not
exist, instead of only calling log_error and returning (None, None), abort the
process with a non-zero exit so the script fails immediately; locate the branch
that checks args.mcp_file and os.path.isfile(args.mcp_file) and replace the
current log_error + return None behavior with a termination (e.g., call
sys.exit(1) or raise SystemExit after logging) so the process stops when the
requested MCP configuration file is missing.
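A hedged sketch of the requested fail-fast behavior (function name and message wording are illustrative):

```python
import os
import sys

def require_mcp_file(mcp_file: str) -> str:
    """Abort immediately if an explicitly requested MCP config file is missing."""
    if not os.path.isfile(mcp_file):
        sys.stderr.write(f"[ERROR] MCP configuration file not found: {mcp_file}\n")
        sys.exit(1)  # fail fast instead of returning (None, None)
    return mcp_file
```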
- Around line 113-121: The current bridge path resolution uses incorrect
relative segments and doesn't validate the final path; update the logic that
sets proxysql_endpoint/script_dir/bridge_path to search upward for the real
scripts/mcp/proxysql_mcp_stdio_bridge.py (e.g., check script_dir,
script_dir.parent, and script_dir.parents for a
"scripts/mcp/proxysql_mcp_stdio_bridge.py" location) and set bridge_path to that
discovered file, and if none found call out the error (raise or
processLogger.error + exit) instead of continuing with a non-existent path;
update the variables referenced (proxysql_endpoint, script_dir, bridge_path)
accordingly so the bridge file is resolved to
scripts/mcp/proxysql_mcp_stdio_bridge.py and existence is validated before use.
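One way to implement the upward search, as a hedged pathlib sketch (the relative location scripts/mcp/proxysql_mcp_stdio_bridge.py comes from the prompt; the function name is illustrative):

```python
import sys
from pathlib import Path

def find_bridge(script_dir: Path) -> Path:
    """Search script_dir and each ancestor for scripts/mcp/proxysql_mcp_stdio_bridge.py."""
    for base in (script_dir, *script_dir.parents):
        candidate = base / "scripts" / "mcp" / "proxysql_mcp_stdio_bridge.py"
        if candidate.is_file():
            return candidate
    sys.stderr.write(f"[ERROR] proxysql_mcp_stdio_bridge.py not found above {script_dir}\n")
    sys.exit(1)
```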
In `@scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/headless_db_discovery.sh`:
- Around line 179-180: BRIDGE_PATH is computed incorrectly: from SCRIPT_DIR
(scripts/mcp/DiscoveryAgent/ClaudeCode_Headless) the current
"../mcp/proxysql_mcp_stdio_bridge.py" points to a non-existent
scripts/mcp/DiscoveryAgent/mcp path; update BRIDGE_PATH to point two levels up
to the actual bridge at scripts/mcp/proxysql_mcp_stdio_bridge.py by using
"$SCRIPT_DIR/../../proxysql_mcp_stdio_bridge.py" (i.e., change the BRIDGE_PATH
assignment that references SCRIPT_DIR).
In `@scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/HEADLESS_DISCOVERY_README.md`:
- Line 275: Update the broken markdown link on the line that references
"Multi-Agent Database Discovery System" so it points to the correct location of
the doc: change the href from ../doc/multi_agent_database_discovery.md to
../../../doc/multi_agent_database_discovery.md in HEADLESS_DISCOVERY_README.md
(the line containing the link text "Multi-Agent Database Discovery System") so
the link resolves to the repository root doc/multi_agent_database_discovery.md.
- Around line 159-160: Update the incorrect npm package name used for installing
Claude Code: replace occurrences of "@anthropics/claude-code" with the official
package "@anthropic-ai/claude-code" in the README examples—specifically update
the GitHub Actions snippet that shows "npm install -g `@anthropics/claude-code`"
and the bash installation section so both use "npm install -g
`@anthropic-ai/claude-code`".
- Around line 276-277: Update the two external links in
HEADLESS_DISCOVERY_README.md: replace the Claude Code URL
`https://docs.anthropic.com/claude-code` with
`https://docs.anthropic.com/en/docs/claude-code/overview` and change the MCP URL
`https://modelcontextprotocol.io/` to `https://modelcontextprotocol.io/docs`;
modify the two list entries shown in the diff so they point to the new corrected
URLs.
In `@scripts/mcp/test_catalog.sh`:
- Around line 302-367: Update the script's help/usage text that currently lists
tests up to CAT012 to include CAT013, CAT014, and CAT015; add concise entries
matching the new tests' purposes (CAT013: "Upsert special characters", CAT014:
"Get special chars entry", CAT015: "Cleanup special chars") in the same
help/usage block or function (where the existing CAT001–CAT012 lines are
defined) so the --help output accurately reflects the new tests.
In `@simple_discovery.py`:
- Around line 134-143: Remove the unnecessary f-string prefixes on print
statements that don't interpolate variables: change print(f"\nStructure:"),
print(f"\nData Insights:"), print(f"\nBusiness Understanding:"), print(f" →
This is MySQL's own user management table."), and print(f" → Not application
user data - this is database admin accounts.") to plain strings (e.g.,
print("\nStructure:")) while keeping the other prints that reference
structure[0] and data[0] (like print(f" - Columns: {',
'.join(structure[0]['document']['columns'])}") and print(f" -
{data[0]['document']['insight']}")) unchanged.
🧹 Nitpick comments (8)
scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/examples/DATABASE_QUESTION_CAPABILITIES.md (2)
138-138: Minor grammar: hyphenate compound adjective. "7.5 hour gap" should be "7.5-hour gap" when used as a compound adjective before a noun.

📝 Proposed fix

```diff
-| **Timeline Analysis** | "When did the duplication happen?", "Why is there a 7.5 hour gap between batches?", "Reconstruct the event timeline" |
+| **Timeline Analysis** | "When did the duplication happen?", "Why is there a 7.5-hour gap between batches?", "Reconstruct the event timeline" |
```

287-287: Minor grammar: hyphenate compound adjective. Same hyphenation issue as line 138.

📝 Proposed fix

```diff
-- "Why is there a 7.5 hour gap between data loads?" → Manual intervention (lunch break → evening session)
+- "Why is there a 7.5-hour gap between data loads?" → Manual intervention (lunch break → evening session)
```

scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/headless_db_discovery.sh (6)
86-88: Consider redirecting error logs to stderr. The Python counterpart `log_error` writes to `sys.stderr`, but this Bash version writes to stdout. For consistency and proper stream handling, error messages should go to stderr.

Suggested fix

```diff
 log_error() {
-    echo -e "${RED}[ERROR]${NC} $1"
+    echo -e "${RED}[ERROR]${NC} $1" >&2
 }
```
125-128: Consider validating TIMEOUT is a positive integer. The `TIMEOUT` value is passed directly to the `timeout` command without validation. Non-numeric or negative values would cause a confusing error at runtime.

Suggested validation

```diff
         -t|--timeout)
             TIMEOUT="$2"
+            if ! [[ "$TIMEOUT" =~ ^[0-9]+$ ]] || [ "$TIMEOUT" -le 0 ]; then
+                log_error "Timeout must be a positive integer: $TIMEOUT"
+                exit 1
+            fi
             shift 2
             ;;
```
183-206: Consider using `jq` for JSON construction to handle special characters. Paths with special characters (quotes, backslashes) could produce malformed JSON. While unlikely for typical paths, using `jq` would be more robust.

Alternative using jq (if available)

```bash
# Build JSON safely with jq if available
if command -v jq &>/dev/null; then
    jq -n \
        --arg bridge "$BRIDGE_PATH" \
        --arg endpoint "$PROXYSQL_MCP_ENDPOINT" \
        --arg token "${PROXYSQL_MCP_TOKEN:-}" \
        --arg insecure "${PROXYSQL_MCP_INSECURE_SSL:-}" \
        '{mcpServers: {proxysql: {
            command: "python3",
            args: [$bridge],
            env: ({PROXYSQL_MCP_ENDPOINT: $endpoint}
                  + (if $token != "" then {PROXYSQL_MCP_TOKEN: $token} else {} end)
                  + (if $insecure == "1" then {PROXYSQL_MCP_INSECURE_SSL: "1"} else {} end))
        }}}' > "$MCP_CONFIG_FILE"
else
    # Fall back to heredoc approach
    ...
fi
```
325-325: Quote `$CLAUDE_CMD` to handle paths with spaces. If the Claude executable path contains spaces (e.g., set via `CLAUDE_PATH`), the unquoted variable will cause word splitting issues.

Suggested fix

```diff
-if timeout "${TIMEOUT}s" $CLAUDE_CMD --print --no-session-persistence --permission-mode bypassPermissions $MCP_ARGS <<< "$DISCOVERY_PROMPT" > "$OUTPUT_FILE" 2>&1; then
+if timeout "${TIMEOUT}s" "$CLAUDE_CMD" --print --no-session-persistence --permission-mode bypassPermissions $MCP_ARGS <<< "$DISCOVERY_PROMPT" > "$OUTPUT_FILE" 2>&1; then
```

Note: `$MCP_ARGS` is intentionally unquoted for word splitting into `--mcp-config` and the path argument.
359-363: Redundant cleanup - already handled by EXIT trap. The cleanup function registered with `trap cleanup EXIT` (line 54) already handles this. This explicit cleanup is redundant, though the verbose logging adds visibility. Consider either removing this block or moving the verbose logging into the cleanup function.

Option: Enhance cleanup function with verbose logging

```diff
 cleanup() {
     if [ -n "$MCP_CONFIG_FILE" ] && [[ "$MCP_CONFIG_FILE" == /tmp/tmp.* ]]; then
         rm -f "$MCP_CONFIG_FILE" 2>/dev/null || true
+        [ "$VERBOSE" -eq 1 ] && echo -e "${BLUE}[VERBOSE]${NC} Cleaned up temp MCP config: $MCP_CONFIG_FILE"
     fi
 }
```

Then remove lines 359-363.
143-148: Consider also checking executable permission. The script checks if the Claude binary exists with `-f`, but doesn't verify it's executable. This could lead to a less clear error message if the file exists but isn't executable.

Suggested enhancement

```diff
-if [ ! -f "$CLAUDE_CMD" ]; then
+if [ ! -x "$CLAUDE_CMD" ]; then
     log_error "Claude Code not found at: $CLAUDE_CMD"
-    log_error "Set CLAUDE_PATH environment variable or ensure claude is in ~/.local/bin/"
+    log_error "Set CLAUDE_PATH environment variable or ensure claude is executable in ~/.local/bin/"
     exit 1
 fi
```
```
┌─────────────────────────────────────────────────────────────────────┐
│ Main Agent (Orchestrator)                                           │
│ - Launches 4 specialized subagents in parallel                      │
│ - Coordinates via MCP catalog                                       │
│ - Synthesizes final report                                          │
└────────────────┬────────────────────────────────────────────────────┘
                 │
    ┌────────────┼────────────┬────────────┬────────────┐
    │            │            │            │            │
    ▼            ▼            ▼            ▼            ▼
┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐
│Struct. │  │Statist.│  │Semantic│  │Query   │  │ MCP    │
│ Agent  │  │ Agent  │  │ Agent  │  │ Agent  │  │Catalog │
└────────┘  └────────┘  └────────┘  └────────┘  └────────┘
    │            │            │            │            │
    └────────────┴────────────┴────────────┴────────────┘
                      │
            ▼                    ▼
       ┌─────────┐        ┌─────────────┐
       │ Database│        │ Catalog     │
       │ (testdb)│        │ (Shared Mem)│
       └─────────┘        └─────────────┘
```
Add a language tag to the ASCII diagram block.
Use ```text for the architecture diagram to satisfy MD040 and improve formatting consistency.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
9-9: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In `@doc/multi_agent_database_discovery.md` around lines 9 - 32, The code block
containing the ASCII architecture diagram needs a language tag to satisfy MD040
and improve formatting; update the fenced code block that currently starts with
``` (the ASCII diagram block in multi_agent_database_discovery.md) to use
```text so the diagram is explicitly tagged as text and renders consistently.
#### True Statistics (After Deduplication)
| Metric | Current | Actual |
|--------|---------|--------|
| Customers | 15 | 5 |
| Products | 15 | 5 |
| Orders | 15 | 5 |
| Order Items | 27 | 9 |
| Revenue | $10,886.67 | $3,628.85 |
Surround the table with blank lines.
markdownlint MD058 expects blank lines before/after tables. Please add them around this “True Statistics” table.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
154-154: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
🤖 Prompt for AI Agents
In `@doc/multi_agent_database_discovery.md` around lines 153 - 160, Add a blank
line before the "#### True Statistics (After Deduplication)" heading and another
blank line after the table (after the "| Revenue | $10,886.67 | $3,628.85 |"
row) so the block is separated by empty lines to satisfy markdownlint MD058;
locate the heading and the table in the doc and insert the blank lines
accordingly.
### Key Metrics
| Metric | Value | Notes |
|--------|-------|-------|
| **Schema** | testdb | E-commerce domain |
| **Tables** | 4 base + 1 view | customers, orders, order_items, products |
| **Records** | 72 apparent / 24 unique | 3:1 duplication ratio |
| **Storage** | ~160KB | 67% wasted on duplicates |
| **Data Quality Score** | 25/100 | CRITICAL |
| **Production Readiness** | 5-30% | NOT READY |
Add blank lines around the Key Metrics table.
markdownlint flags MD058 when tables aren’t surrounded by blank lines. Please insert a blank line before and after this table for consistent rendering.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
16-16: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
🤖 Prompt for AI Agents
In
`@scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/examples/DATABASE_DISCOVERY_REPORT.md`
around lines 15 - 23, Insert a blank line immediately before the "### Key
Metrics" table and another blank line immediately after the table so the table
is separated from surrounding paragraphs; update the section around the "### Key
Metrics" heading and the pipe-delimited table (the rows starting with "| Metric
| Value | Notes |" through the last "| **Production Readiness** | 5-30% | NOT
READY |") to ensure there is an empty line above the table and an empty line
below it to satisfy markdownlint MD058.
```
testdb
├── customers (Dimension)
│   ├── id (PK, int)
│   ├── name (varchar)
│   ├── email (varchar, indexed)
│   └── created_at (timestamp)
│
├── products (Dimension)
│   ├── id (PK, int)
│   ├── name (varchar)
│   ├── category (varchar, indexed)
│   ├── price (decimal(10,2))
│   ├── stock (int)
│   └── created_at (timestamp)
│
├── orders (Transaction/Fact)
│   ├── id (PK, int)
│   ├── customer_id (int, indexed → customers)
│   ├── order_date (date)
│   ├── total (decimal(10,2))
│   ├── status (varchar, indexed)
│   └── created_at (timestamp)
│
├── order_items (Junction/Detail)
│   ├── id (PK, int)
│   ├── order_id (int, indexed → orders)
│   ├── product_id (int, indexed → products)
│   ├── quantity (int)
│   ├── price (decimal(10,2))
│   └── created_at (timestamp)
│
└── customer_orders (View)
    └── Aggregation of customers + orders
```
Specify a language for fenced code blocks.
Several fenced blocks are missing a language tag; use `text` (or another appropriate language) to satisfy MD040 and improve readability.
Also applies to: 69-74, 378-383, 386-393, 396-403, 406-413, 455-458
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
31-31: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In
`@scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/examples/DATABASE_DISCOVERY_REPORT.md`
around lines 31 - 65, The fenced code blocks containing the database ASCII
diagram (the block that begins with "testdb" and the other similar blocks
referenced in the comment) are missing language identifiers; update each
triple-backtick fence to include an appropriate language tag (e.g., use ```text)
so the blocks satisfy MD040 and improve readability—ensure you change every
fenced block noted (the "testdb" block and the other blocks referenced in the
comment) by adding the language tag to the opening fence.
---

*Report generated by multi-agent discovery system via MCP server on 2026-01-14*
Use a heading instead of emphasized text.
The final emphasized line is treated as a heading by markdownlint (MD036). Consider converting it to a proper heading level (e.g., ### Report generated ...) or plain text.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
484-484: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🤖 Prompt for AI Agents
In
`@scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/examples/DATABASE_DISCOVERY_REPORT.md`
at line 484, Replace the final emphasized line "*Report generated by multi-agent
discovery system via MCP server on 2026-01-14*" with a proper heading or plain
text to satisfy markdownlint MD036; for example change the emphasized asterisks
to a heading like "### Report generated by multi-agent discovery system via MCP
server on 2026-01-14" or remove the asterisks to make it plain text so the line
is no longer treated as an emphasized heading.
```yaml
      - name: Install Claude Code
        run: npm install -g @anthropics/claude-code
```
🧩 Analysis chain
🌐 Web query:
Claude Code CLI npm package name installation
💡 Result:
The official npm package is @anthropic-ai/claude-code. Install globally (Node.js 18+):
npm install -g @anthropic-ai/claude-code
Verify:
claude --version
(Alternative installs: Homebrew or the one‑line install script.) [1][2]
🏁 Script executed:
```bash
# Search for references to npm packages in the repository related to Claude Code
rg -i "claude-code|@anthropics" --type md --type json --type yaml -A 2 -B 2
```
Update npm package name to match official Claude Code package.
The installation command uses @anthropics/claude-code, but the official npm package is @anthropic-ai/claude-code (note: anthropic-ai, not anthropics). Update the commands to:
```bash
npm install -g @anthropic-ai/claude-code
```

This occurs in the GitHub Actions example (lines 159-160) and the bash installation section.
🤖 Prompt for AI Agents
In `@scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/HEADLESS_DISCOVERY_README.md`
around lines 159 - 160, Update the incorrect npm package name used for
installing Claude Code: replace occurrences of "@anthropics/claude-code" with
the official package "@anthropic-ai/claude-code" in the README
examples—specifically update the GitHub Actions snippet that shows "npm install
-g `@anthropics/claude-code`" and the bash installation section so both use "npm
install -g `@anthropic-ai/claude-code`".
## Related Documentation

- [Multi-Agent Database Discovery System](../doc/multi_agent_database_discovery.md)
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Search for the referenced documentation file
fd -t f "multi_agent_database_discovery.md"
# Also check the directory structure of the mcp folder
echo "=== Directory structure of scripts/mcp/ ==="
fd -t d . scripts/mcp/ | head -20
```
Fix the broken relative path to the documentation file.
The link ../doc/multi_agent_database_discovery.md is incorrect. From scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/, this path resolves to scripts/mcp/doc/, which does not exist. The actual file is located at the repository root: doc/multi_agent_database_discovery.md.
Use the correct relative path: ../../../doc/multi_agent_database_discovery.md
🤖 Prompt for AI Agents
In `@scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/HEADLESS_DISCOVERY_README.md`
at line 275, Update the broken markdown link on the line that references
"Multi-Agent Database Discovery System" so it points to the correct location of
the doc: change the href from ../doc/multi_agent_database_discovery.md to
../../../doc/multi_agent_database_discovery.md in HEADLESS_DISCOVERY_README.md
(the line containing the link text "Multi-Agent Database Discovery System") so
the link resolves to the repository root doc/multi_agent_database_discovery.md.
- [Claude Code Documentation](https://docs.anthropic.com/claude-code)
- [MCP Specification](https://modelcontextprotocol.io/)
🧩 Analysis chain
🌐 Web query:
Claude Code documentation site https://docs.anthropic.com/claude-code
💡 Result:
The official Claude Code documentation is hosted by Anthropic at: docs.anthropic.com/en/docs/claude-code/overview — it contains quickstart, CLI commands, workflow guides, IDE setup, hosting options, security, and troubleshooting. [1][2]
Sources:
- Anthropic — Claude Code documentation (Overview). [docs.anthropic.com/en/docs/claude-code/overview]. [1]
- Anthropic — Claude Code product page. [anthropic.com/claude-code]. [2]
🌐 Web query:
Model Context Protocol specification https://modelcontextprotocol.io/
💡 Result:
The Model Context Protocol (MCP) is an open-source standard for connecting LLM-based applications to external data sources, tools, and workflows. It defines a JSON-RPC–based client/server protocol, a TypeScript-first schema (also available as JSON Schema) as the source of truth, lifecycle/capability negotiation, resources (data, prompts, tools) exposure, and transport/authorization guidance. MCP aims to make integrations reusable across models and hosts (inspired by the Language Server Protocol). See the official docs and spec for full details and versioned revisions. [1][2][3]
Key links:
- Intro / docs: modelcontextprotocol.io/docs (overview and getting started). [1]
- Formal specification (versioned, JSON‑RPC + TypeScript schema). [2]
- Official GitHub (spec, schema, docs repo). [3]
Update external documentation URLs to correct paths.
The documentation links need correction:
- Line 276: Change `https://docs.anthropic.com/claude-code` to `https://docs.anthropic.com/en/docs/claude-code/overview`
- Line 277: Consider updating `https://modelcontextprotocol.io/` to `https://modelcontextprotocol.io/docs` for better user guidance
🤖 Prompt for AI Agents
In `@scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/HEADLESS_DISCOVERY_README.md`
around lines 276 - 277, Update the two external links in
HEADLESS_DISCOVERY_README.md: replace the Claude Code URL
`https://docs.anthropic.com/claude-code` with
`https://docs.anthropic.com/en/docs/claude-code/overview` and change the MCP URL
`https://modelcontextprotocol.io/` to `https://modelcontextprotocol.io/docs`;
modify the two list entries shown in the diff so they point to the new corrected
URLs.
```bash
    # Test 13: Special characters in document (JSON parsing bug test)
    local payload13
    payload13='{
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {
            "name": "catalog_upsert",
            "arguments": {
                "kind": "test",
                "key": "special_chars",
                "document": "{\"description\": \"Test with \\\"quotes\\\" and \\\\backslashes\\\\\"}",
                "tags": "test,special",
                "links": ""
            }
        },
        "id": 13
    }'

    if test_catalog "CAT013" "Upsert special characters" "${payload13}" '"success"[[:space:]]*:[[:space:]]*true'; then
        passed=$((passed + 1))
    else
        failed=$((failed + 1))
    fi

    # Test 14: Verify special characters can be read back
    local payload14
    payload14='{
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {
            "name": "catalog_get",
            "arguments": {
                "kind": "test",
                "key": "special_chars"
            }
        },
        "id": 14
    }'

    if test_catalog "CAT014" "Get special chars entry" "${payload14}" 'quotes'; then
        passed=$((passed + 1))
    else
        failed=$((failed + 1))
    fi

    # Test 15: Cleanup special chars entry
    local payload15
    payload15='{
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {
            "name": "catalog_delete",
            "arguments": {
                "kind": "test",
                "key": "special_chars"
            }
        },
        "id": 15
    }'

    if test_catalog "CAT015" "Cleanup special chars" "${payload15}" '"success"[[:space:]]*:[[:space:]]*true'; then
        passed=$((passed + 1))
    else
        failed=$((failed + 1))
    fi
```
Update help text to include CAT013–CAT015.
The new special‑character tests (Line 302-367) are great, but the --help list still stops at CAT012. Please add CAT013–CAT015 there for clarity.
🤖 Prompt for AI Agents
In `@scripts/mcp/test_catalog.sh` around lines 302 - 367, Update the script's
help/usage text that currently lists tests up to CAT012 to include CAT013,
CAT014, and CAT015; add concise entries matching the new tests' purposes
(CAT013: "Upsert special characters", CAT014: "Get special chars entry", CAT015:
"Cleanup special chars") in the same help/usage block or function (where the
existing CAT001–CAT012 lines are defined) so the --help output accurately
reflects the new tests.
```python
print(f"\nStructure:")
print(f"  - Columns: {', '.join(structure[0]['document']['columns'])}")
print(f"  - Primary Key: {structure[0]['document']['primary_key']}")
print(f"\nData Insights:")
print(f"  - {data[0]['document']['actual_row_count']} rows")
print(f"  - {data[0]['document']['insight']}")
print(f"\nBusiness Understanding:")
print(f"  → This is MySQL's own user management table.")
print(f"  → Contains {data[0]['document']['actual_row_count']} system accounts.")
print(f"  → Not application user data - this is database admin accounts.")
```
🧩 Analysis chain
🏁 Scripts executed:

```bash
fd simple_discovery.py
sed -n '130,145p' simple_discovery.py
rg 'print\(f"' simple_discovery.py | head -20
```
🌐 Web query:
Ruff F541 rule f-string without interpolations
💡 Result:
F541 = "f-string missing placeholders" — Ruff flags f-strings that contain no { } expressions. It’s derived from Pyflakes and meant to catch either accidental/fake f-strings or confusing code; the fix is to use a normal string or actually interpolate. [1][2]
Quick examples/fixes
- Problem: f"Hello, world!" → Fix: "Hello, world!" (remove the leading f) or add a placeholder: f"Hello, {name}!" . [1][3]
How to silence/disable
- Per-line/file: add a noqa comment, e.g. end the line with # noqa: F541 or add # ruff: noqa in the file to ignore all. [2]
- Project-wide: add F541 to the ignore list in pyproject.toml / ruff.toml under [tool.ruff.lint] ignore = ["F541"]. [2][4]
References
- Ruff rule F541 (f-string-missing-placeholders). [1]
- Ruff error-suppression / configuration docs (noqa, ignore, ruff.toml/pyproject.toml). [2][4]
- Flake8 / Pyflakes discussion / users’ fixes (same behaviour: remove f or add placeholders). [3]
Remove unused f-string prefixes.
These strings contain no interpolations, so the `f` prefix is unnecessary (Ruff F541):

- `print(f"\nStructure:")`
- `print(f"\nData Insights:")`
- `print(f"\nBusiness Understanding:")`
- `print(f"  → This is MySQL's own user management table.")`
- `print(f"  → Not application user data - this is database admin accounts.")`
♻️ Suggested change

```diff
-    print(f"\nStructure:")
+    print("\nStructure:")
-    print(f"\nData Insights:")
+    print("\nData Insights:")
-    print(f"\nBusiness Understanding:")
+    print("\nBusiness Understanding:")
-    print(f"  → This is MySQL's own user management table.")
+    print("  → This is MySQL's own user management table.")
-    print(f"  → Not application user data - this is database admin accounts.")
+    print("  → Not application user data - this is database admin accounts.")
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
print("\nStructure:")
print(f"  - Columns: {', '.join(structure[0]['document']['columns'])}")
print(f"  - Primary Key: {structure[0]['document']['primary_key']}")
print("\nData Insights:")
print(f"  - {data[0]['document']['actual_row_count']} rows")
print(f"  - {data[0]['document']['insight']}")
print("\nBusiness Understanding:")
print("  → This is MySQL's own user management table.")
print(f"  → Contains {data[0]['document']['actual_row_count']} system accounts.")
print("  → Not application user data - this is database admin accounts.")
```
🧰 Tools
🪛 Ruff (0.14.13)
134-134: f-string without any placeholders
Remove extraneous f prefix
(F541)
137-137: f-string without any placeholders
Remove extraneous f prefix
(F541)
140-140: f-string without any placeholders
Remove extraneous f prefix
(F541)
141-141: f-string without any placeholders
Remove extraneous f prefix
(F541)
143-143: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
In `@simple_discovery.py` around lines 134 - 143, Remove the unnecessary f-string
prefixes on print statements that don't interpolate variables: change
print(f"\nStructure:"), print(f"\nData Insights:"), print(f"\nBusiness
Understanding:"), print(f" → This is MySQL's own user management table."), and
print(f" → Not application user data - this is database admin accounts.") to
plain strings (e.g., print("\nStructure:")) while keeping the other prints that
reference structure[0] and data[0] (like print(f" - Columns: {',
'.join(structure[0]['document']['columns'])}") and print(f" -
{data[0]['document']['insight']}")) unchanged.
…omaly detection tests

- Fix retry logic to use is_retryable_error function for proper HTTP error handling
- Add exception handling to get_json_int function with try-catch around std::stoi
- Improve validate_numeric_range to use strtol instead of atoi for better error reporting
- Fix Chinese characters in documentation (non-zero -> non-zero)
- Replace placeholder tests with actual comprehensive tests for anomaly detection functionality
- Create new standalone unit test anomaly_detector_unit-t.cpp with 29 tests covering:
  * SQL injection pattern detection (12 tests)
  * Query normalization (8 tests)
  * Risk scoring calculations (5 tests)
  * Configuration validation (4 tests)
- All tests pass successfully, providing meaningful validation of core anomaly detection logic

Thanks to gemini-code-assist for the thorough code review and recommendations.
- Add edge case tests for SQL injection pattern detection:
  * Empty and NULL query handling
  * Very long query processing
- Add edge case tests for query normalization:
  * Empty query handling
  * Unicode character support
  * Nested comment processing
  * Multiple line comment handling
- Add edge case tests for risk scoring:
  * Negative match handling
  * Large match count capping
  * Boundary condition validation
- Fix test assertions for more robust validation
- All 39 tests now pass successfully

These improvements address potential feedback from code reviewers by providing more comprehensive test coverage for edge cases and boundary conditions.
This commit fixes a serious design flaw where AI configuration variables were not integrated with the ProxySQL admin interface. All ai_* variables have been migrated to the GenAI module as genai-* variables.

Changes:
- Added 21 new genai_* variables to GenAI_Thread.h structure
- Implemented get/set functions for all new variables in GenAI_Thread.cpp
- Removed internal variables struct from AI_Features_Manager
- AI_Features_Manager now reads from GloGATH instead of internal state
- Updated documentation to reference genai-* variables
- Fixed debug.cpp assertion for PROXY_DEBUG_NL2SQL and PROXY_DEBUG_ANOMALY

Variable mapping:
- ai_nl2sql_enabled → genai-nl2sql_enabled
- ai_anomaly_detection_enabled → genai-anomaly_enabled
- ai_features_enabled → genai-enabled
- All other ai_* variables follow the same pattern

The flush functions automatically handle all variables in the genai_thread_variables_names array, so database persistence works correctly without additional changes.

Related to: #13
This is a critical architectural fix - NL2SQL was making blocking calls
to LLMs which would block the entire MySQL thread. Now NL2SQL uses the
same async socketpair pattern as the GENAI embed/rerank operations.
Changes:
- Added nl2sql operation type to process_json_query() in GenAI module
- Updated NL2SQL handler to construct JSON query and use async GENAI path
- Added extern declaration for GloAI in GenAI_Thread.cpp
- Falls back to synchronous path only on systems without epoll
Architecture:
- Before: NL2SQL: query → blocking nl2sql->convert() → blocks MySQL thread
- After: NL2SQL: query → JSON GENAI request → async socketpair → non-blocking
JSON protocol for NL2SQL:
GENAI: {"type": "nl2sql", "query": "Show customers", "schema": "mydb"}
The NL2SQL result is delivered asynchronously through the existing
GENAI response handler, making the system fully non-blocking.
Related to: #13
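For illustration, a client-side Python sketch of the JSON payload documented above (the socketpair framing used internally is not shown, and the helper name is hypothetical):

```python
import json

def build_nl2sql_request(query: str, schema: str) -> str:
    # Matches the documented GENAI request shape: {"type": "nl2sql", ...}
    return json.dumps({"type": "nl2sql", "query": query, "schema": schema})

print(build_nl2sql_request("Show customers", "mydb"))
# {"type": "nl2sql", "query": "Show customers", "schema": "mydb"}
```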
…cture

Updated documentation to reflect the architectural changes:
- Changed all ai_* variables to genai-* variables
- Changed LOAD MYSQL VARIABLES to LOAD GENAI VARIABLES
- Clarified that NL2SQL queries run on MySQL module, not Admin
- Updated ARCHITECTURE.md to document async/non-blocking architecture
- Added LOAD GENAI VARIABLES TO RUNTIME after variable changes
- Changed stats_ai_nl2sql_cache to SHOW STATUS LIKE 'genai-nl2sql%'

Key corrections in README.md:
- Variables now use genai-* prefix (genai-nl2sql_enabled, etc.)
- Commands use LOAD GENAI VARIABLES instead of LOAD MYSQL VARIABLES
- Added note that NL2SQL queries are on MySQL module, not Admin interface

Key corrections in ARCHITECTURE.md:
- Updated system flow diagram to show async socketpair path
- Added detailed async flow explanation
- Documented that MySQL threads are NOT blocked during LLM calls

Related to: #13
Added missing #include for GenAI_Thread.h in AI_Features_Manager.cpp to resolve the compilation error in debug mode. Also fixed the remaining reference to variables.ai_features_enabled which should now use GloGATH->variables.genai_enabled. This fixes the "make debug" build failure.
The flush_genai_variables___database_to_runtime() function was only setting internal state in GloGATH but not populating the runtime_global_variables table. This caused variables to appear in global_variables but not in runtime_global_variables after startup.

Fix: Add call to flush_genai_variables___runtime_to_database() with runtime=true to populate the runtime table, following the same pattern used by MCP variables.
Update flush_genai_variables___database_to_runtime() to match the MCP pattern exactly:
- Add 'lock' parameter (default true) for flexibility
- Use ProxySQL_Admin's wrlock()/wrunlock() instead of GloGATH's
- Use consistent variable naming (var_name = name + 6 for 'genai-' prefix)
- Follow exact same locking pattern as MCP variables

This fixes the issue where the runtime_global_variables table was not being populated on startup because the locking pattern was incorrect.
…to_runtime

The flush_genai_variables___database_to_runtime() function was using hardcoded 'admindb' instead of the 'db' parameter passed to the function. This caused the function to always query from admindb, ignoring the actual database parameter.

This fixes the issue where the runtime_global_variables table was not being populated on startup because the query was always hitting the same database regardless of the parameter.
Add a flush_genai_variables___runtime_to_database() call to the central location where all modules populate the runtime_global_variables table. This call was missing, causing genai-* variables to not appear in runtime_global_variables.
- Add has_variable() method to GenAI_Threads_Handler for variable validation
- Add genai- prefix check in is_valid_global_variable()
- Auto-initialize NL2SQL converter when genai-nl2sql_enabled is set to true at runtime
- Make init_nl2sql() public to allow runtime initialization
- Mask API keys in logs (show only first 2 chars, rest as 'x')
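A quick Python sketch of the stated masking rule (first 2 characters visible, the rest replaced with 'x'):

```python
def mask_api_key(key: str) -> str:
    """Show only the first 2 characters; mask the remainder with 'x'."""
    return key[:2] + "x" * max(len(key) - 2, 0)

print(mask_api_key("sk-abcdef123"))  # -> "skxxxxxxxxxx"
```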
- Rename NL2SQL_Converter to LLM_Bridge for generic prompt processing
- Update MySQL protocol handler from /* NL2SQL: */ to /* LLM: */
- Remove SQL-specific fields (sql_query, confidence, tables_used)
- Add GENAI_OP_LLM operation type to GenAI module
- Rename all genai_nl2sql_* variables to genai_llm_*
- Update AI_Features_Manager to use LLM_Bridge
- Deprecate ai_nl2sql_convert MCP tool with error message
- LLM bridge now handles any prompt type via MySQL protocol

This enables generic LLM access (summarization, code generation, translation, analysis) while preserving infrastructure for future NL2SQL implementation via Web UI + external agents.
- Rename doc/NL2SQL/ to doc/LLM_Bridge/
- Update README.md for generic LLM bridge functionality
- Replace NL2SQL terminology with LLM Bridge
- Update variable references from ai_nl2sql_* to genai_llm_*
- Add migration guide from NL2SQL to LLM Bridge
- Document new use cases (summarization, code generation, etc.)
Comprehensive AI Features Implementation: NL2SQL, Anomaly Detection, and Vector Storage
Summary
This PR adds an MCP stdio bridge that enables Claude Code to communicate with ProxySQL's MCP endpoint over stdio, along with comprehensive database discovery tools.
Key Features
MCP stdio Bridge (`scripts/mcp/proxysql_mcp_stdio_bridge.py`)
- Supports `tools/list` and `tools/call`
- Logs to `/tmp/proxysql_mcp_bridge.log` for debugging

Database Discovery Tools (via `lib/MCP_GHMYSQL.cpp`)
- `list_schemas` - List all databases/schemas
- `list_tables` - List tables in a schema
- `describe_table` - Get detailed table schema (columns, types, keys, indexes)
- `get_constraints` - Get foreign keys and constraints
- `sample_rows` - Sample data from a table
- `sample_distinct` - Sample distinct values from a column
- `run_sql_readonly` - Execute read-only SQL queries
- `explain_sql` - Get query execution plans
- `table_profile` - Table statistics (row count, size, distribution)
- `column_profile` - Column statistics (distinct values, null count)
- `suggest_joins` - Suggest table joins based on column analysis
- `find_reference_candidates` - Find tables that might be referenced by a foreign key
- Catalog Tools - LLM external memory for storing and retrieving information

MCP Compliance
- Tool call results are wrapped in a `content` array with an `isError` boolean

Test Plan
- `tools/list` returns available tools
- `tools/call` works for database discovery tools
- `content` wrapping
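A hedged end-to-end sketch of this test plan: spawn the bridge as a subprocess and drive it over stdio JSON-RPC (the working directory, endpoint value, and simplified initialize params are assumptions):

```python
import json
import os
import subprocess

env = dict(os.environ, PROXYSQL_MCP_ENDPOINT="https://127.0.0.1:6071/mcp/query")
proc = subprocess.Popen(
    ["python3", "scripts/mcp/proxysql_mcp_stdio_bridge.py"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, env=env, text=True,
)

def rpc(method: str, params: dict, req_id: int) -> dict:
    # One JSON-RPC message per line over stdio.
    proc.stdin.write(json.dumps(
        {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())

print(rpc("initialize", {}, 0))   # handshake
print(rpc("tools/list", {}, 1))   # should return the available tools
print(rpc("tools/call", {"name": "list_schemas", "arguments": {}}, 2))  # content-wrapped result
proc.terminate()
```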