Fix PR #21 Review: Security, Memory Safety, Thread Safety + Additional Fixes by renecannao · Pull Request #25 · ProxySQL/proxysql-vec

renecannao · 2026-01-21T14:16:33Z

Summary

This PR addresses all recommendations from 3 AI code reviewers (CodeRabbit, Gemini Code Assist, Copilot) for PR #21 which added Full-Text Search (FTS) functionality. It also includes additional cherry-picked fixes unrelated to PR #21.

PR #21 Review Fixes

Security Fixes (Critical)

SQL Injection - Table Name Escaping: Added escape_identifier() helper that doubles backticks for proper SQLite identifier escaping. Applied to all DDL statements (CREATE TABLE, CREATE VIRTUAL TABLE, triggers).
SQL Injection - MATCH Clause FTS5 Operator: Wrapped FTS5 query in double quotes and escaped embedded quotes to prevent FTS5 operator injection (OR, AND, NOT, NEAR, *, "", ()).
sanitize_name() Allowlist Validation: Replaced blocklist with allowlist (only ASCII letters, digits, underscore) with max length enforcement.

Memory Safety Fixes (High)

VLA Replacement: Replaced VLA with std::vector in MySQL_FTS::init(), added delete db; on error path to prevent leak.
Error String Leak: Added (*proxy_sqlite3_free)(error); before return in list_indexes().
Potential Exception: Changed reindex_json["error"] to reindex_json.value("error", std::string("unknown error")).

Thread Safety Fixes (High)

reinit_fts() Missing Lock: Added mutex lock around pointer swap to prevent race condition.
reset_fts_path() Blocking I/O: Moved blocking fts->init() call outside lock, only swap pointer under lock.

Code Cleanup

Removed 7 debug fprintf statements from Query_Tool_Handler.cpp
Removed unused #include <memory> from MySQL_FTS.h

Test Script Security Fixes

MySQL Password Exposure: Changed from -p"${MYSQL_PASSWORD}" to MYSQL_PWD="${MYSQL_PASSWORD}" mysql to prevent password in process list
SQL Injection in INSERT: Added escape_sql() shell function and applied to INSERT statement
CURL_OPTS Quoting: Fixed to curl ${CURL_OPTS:+"${CURL_OPTS}"}
Unused Variables: Removed FTS_INDEX_NAME and SEARCH_QUERIES

Documentation Fixes

Fixed bare URL to markdown link format
Added code block language identifiers (text, bash)

Files Modified

include/MySQL_FTS.h
lib/MySQL_FTS.cpp
lib/MySQL_Tool_Handler.cpp
lib/Query_Tool_Handler.cpp
doc/MCP/FTS_USER_GUIDE.md
scripts/mcp/test_mcp_fts.sh
scripts/mcp/test_mcp_fts_detailed.sh

Additional Fixes

This PR also includes additional cherry-picked fixes unrelated to PR #21.

Verification

Build: cd lib && make clean && make ✅
Ready for testing with: bash scripts/mcp/test_mcp_fts.sh

Closes #21

Implement BM25-ranked full-text search capability for MySQL/MariaDB tables using SQLite-based external FTS index. Changes: - Add MySQL_FTS class for managing SQLite FTS indexes - Add FTS tools: fts_index_table, fts_search, fts_reindex, fts_delete_index, fts_list_indexes, fts_rebuild_all - Add thread-safe FTS lifecycle management with fts_lock mutex - Add reset_fts_path() for runtime FTS database path configuration - Add comprehensive FTS test scripts (test_mcp_fts.sh, test_mcp_fts_detailed.sh)

Add full-text search (FTS) tools to MCP query server

- Add special build rule in Makefile for anomaly_detection-t that includes: - $(OBJ) for global variables (GloVars, GloGATH) - -Wl,--allow-multiple-definition to allow test's main() to override ProxySQL's - ClickHouse client libraries (libclickhouse-cpp-lib.a, libzstdstatic.a, liblz4.a) - SQLite rembed library (libsqlite_rembed.a) - -lscram -lusual for PostgreSQL SCRAM support - Add stub function SQLite3_Server_session_handler required by SQLite3_Server.cpp Resolves compilation errors for anomaly_detection-t test.

- Add #include <string> for C++ std::string support - Add #include <cmath> for sqrt() function - Change format %lld to %ld for chrono duration types (long int, not long long) Resolves compilation errors for vector_db_performance-t test.

- Add #include <string> for C++ std::string support Resolves compilation errors for ai_llm_retry_scenarios-t test.

…leanup Security fixes: - Add escape_identifier() helper for proper SQLite identifier escaping - Replace sanitize_name() with allowlist validation (ASCII letters, digits, underscore only) - Fix MATCH clause FTS5 operator injection by wrapping query in double quotes - Apply escape_identifier() to all DDL statements (CREATE, DROP, triggers) Memory safety fixes: - Replace VLA with std::vector in MySQL_FTS::init(), add delete on error path - Fix memory leak: free error string before return in list_indexes() - Fix reindex_json["error"] potential exception using .value() with default Thread safety fixes: - reinit_fts(): Add mutex lock around pointer swap - reset_fts_path(): Move blocking init() outside lock, only swap pointer under lock Code cleanup: - Remove 7 debug fprintf statements from Query_Tool_Handler.cpp - Remove unused #include <memory> from MySQL_FTS.h Test script security fixes: - Use MYSQL_PWD environment variable instead of -p"..." for password - Add escape_sql() function and apply to INSERT statement - Fix CURL_OPTS quoting: ${CURL_OPTS:+"${CURL_OPTS}"} - Remove unused FTS_INDEX_NAME and SEARCH_QUERIES variables Documentation fixes: - Fix bare URL to markdown link format - Add code block language identifiers (text, bash)

gemini-code-assist · 2026-01-21T14:17:15Z

Summary of Changes

Hello @renecannao, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refines the recently introduced Full-Text Search (FTS) functionality by addressing critical feedback from AI code reviewers. The primary focus is on bolstering the security, memory safety, and thread safety of the FTS implementation, ensuring a more robust and reliable system. Additionally, it integrates the FTS capabilities more deeply into the existing toolset, provides extensive user documentation, and enhances test coverage, making the feature more stable and usable.

Highlights

Security Enhancements: Implemented robust SQL injection prevention for table names and FTS5 MATCH clauses, and switched to an allowlist for sanitize_name() to prevent unexpected characters.
Memory Safety Improvements: Addressed potential memory leaks by replacing VLAs with std::vector, ensuring proper delete calls on error paths, and handling JSON exceptions gracefully.
Thread Safety Corrections: Introduced mutex locking for FTS reinitialization to prevent race conditions and refactored blocking I/O operations to occur outside critical locks.
Full-Text Search (FTS) Integration: Introduced a new MySQL_FTS class and integrated six FTS-related tools (fts_index_table, fts_search, fts_list_indexes, fts_delete_index, fts_reindex, fts_rebuild_all) into the MCP query handler.
Comprehensive FTS Documentation: Added a detailed FTS_USER_GUIDE.md covering overview, architecture, configuration, tool reference, usage examples, API endpoints, best practices, and troubleshooting.
Enhanced FTS Test Coverage: Added two new shell scripts (test_mcp_fts.sh, test_mcp_fts_detailed.sh) to thoroughly validate the FTS functionality, including runtime path changes and stress testing.
General Code Quality & Cleanup: Removed debug fprintf statements, unnecessary includes, and improved test script security by preventing MySQL password exposure and adding SQL escaping.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

coderabbitai · 2026-01-21T14:17:51Z

📝 Walkthrough

Summary by CodeRabbit

Release Notes

New Features
- Added Full Text Search (FTS) system for MySQL data discovery with SQLite-backed indexing.
- New FTS operations: index creation, searching with BM25 ranking, index listing, deletion, reindexing, and rebuild functionality.
- Added FTS configuration support via runtime variables for dynamic path management.
Documentation
- Added comprehensive FTS user guide covering setup, configuration, usage examples, API endpoints, best practices, and troubleshooting.
Tests
- Added automated test suites for FTS validation and integration testing.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Walkthrough

This PR introduces a complete Full-Text Search (FTS) system for MySQL tables using SQLite FTS5 backend. It adds a new MySQL_FTS class for index management, six new MCP tools (fts_index_table, fts_search, fts_list_indexes, fts_delete_index, fts_reindex, fts_rebuild_all), thread-safe integration with MySQL_Tool_Handler, runtime-configurable FTS database path, and extensive test coverage with documentation.

Changes

Cohort / File(s)	Summary
FTS Core Implementation `include/MySQL_FTS.h`, `lib/MySQL_FTS.cpp`	New MySQL_FTS class implementing SQLite FTS5-backed indexing and search. Includes schema initialization, index table creation, data ingestion from MySQL, search with BM25 ranking, and index lifecycle management (create, list, delete, reindex, rebuild). Implements SQL escaping, identifier sanitization, and error handling with JSON result formatting.
MySQL_Tool_Handler Integration `include/MySQL_Tool_Handler.h`, `lib/MySQL_Tool_Handler.cpp`	Extended constructor with new fts_path parameter. Added FTS member (fts) and mutex (fts_lock) for thread-safe access. Introduced public FTS API methods (fts_index_table, fts_search, fts_list_indexes, fts_delete_index, fts_reindex, fts_rebuild_all) and reinit_fts() for runtime FTS path reconfiguration. Moved execute_query from private to public.
MCP Configuration `include/MCP_Thread.h`, `lib/MCP_Thread.cpp`	Added mcp_fts_path runtime-configurable variable. Constructor allocates storage; destructor frees it. Extended get_variable and set_variable to handle "fts_path" with runtime application via reset_fts_path() on the MySQL_Tool_Handler.
Query Tool Dispatch `lib/Query_Tool_Handler.cpp`	Added six FTS tool definitions (fts_index_table, fts_search, fts_list_indexes, fts_delete_index, fts_reindex, fts_rebuild_all) with parameter specifications and descriptions. Extended execute_tool to dispatch FTS operations with parameter extraction and delegation to mysql_handler methods. Removed several debug logging statements.
MCP Server Wiring `lib/ProxySQL_MCP_Server.cpp`	Updated MySQL_Tool_Handler constructor call to pass mcp_fts_path as an additional parameter after mcp_catalog_path.
Build System `lib/Makefile`	Added MySQL_FTS.oo to the _OBJ_CXX list for compilation into libproxysql.
Documentation `doc/MCP/FTS_USER_GUIDE.md`	New comprehensive user guide covering FTS overview, architecture, data flow, admin configuration, quick-start usage, tool reference, API endpoints, best practices, troubleshooting, test scripts, FTS5 syntax appendix, and version history.
FTS Test Suites `scripts/mcp/test_mcp_fts.sh`, `scripts/mcp/test_mcp_fts_detailed.sh`	Two comprehensive Bash test scripts validating FTS functionality. Basic suite includes 21+ tests covering index creation, listing, searching (simple/phrase/cross-table/ranking/pagination), deletion, reindexing, and rebuild. Detailed suite tests runtime configuration changes, tool availability, snippet highlighting, and empty query handling.
Test Infrastructure `test/tap/tests/Makefile`, `test/tap/tests/anomaly_detection-t.cpp`, `test/tap/tests/ai_llm_retry_scenarios-t.cpp`, `test/tap/tests/vector_db_performance-t.cpp`	Added anomaly_detection-t build target. Added forward declarations and stub functions (SQLite3_Server_session_handler) in anomaly_detection-t.cpp. Added include in ai_llm_retry_scenarios-t.cpp. Fixed formatting specifiers from %lld to %ld and added include in vector_db_performance-t.cpp.

Sequence Diagrams

sequenceDiagram
    participant Client
    participant MySQL_Tool_Handler
    participant MySQL_FTS
    participant SQLite DB
    participant MySQL

    Client->>MySQL_Tool_Handler: fts_index_table(schema, table, columns, pk, where)
    MySQL_Tool_Handler->>MySQL_FTS: index_table(schema, table, columns, pk, where)
    MySQL_FTS->>SQLite DB: check index exists
    alt Index exists
        SQLite DB-->>MySQL_FTS: true
    else Create new index
        MySQL_FTS->>SQLite DB: create data_table
        MySQL_FTS->>SQLite DB: create fts_table (FTS5)
        MySQL_FTS->>SQLite DB: create triggers for sync
    end
    MySQL_FTS->>MySQL: execute SELECT query
    MySQL-->>MySQL_FTS: result set
    MySQL_FTS->>SQLite DB: batch insert into data_table
    MySQL_FTS->>SQLite DB: update fts_indexes metadata
    SQLite DB-->>MySQL_FTS: success
    MySQL_FTS-->>MySQL_Tool_Handler: JSON result
    MySQL_Tool_Handler-->>Client: FTS index created

sequenceDiagram
    participant Client
    participant MySQL_Tool_Handler
    participant MySQL_FTS
    participant SQLite DB

    Client->>MySQL_Tool_Handler: fts_search(query, schema, table, limit, offset)
    MySQL_Tool_Handler->>MySQL_FTS: search(query, schema, table, limit, offset)
    MySQL_FTS->>SQLite DB: retrieve index metadata
    SQLite DB-->>MySQL_FTS: matching indexes
    loop For each matching index
        MySQL_FTS->>SQLite DB: FTS5 search with BM25 ranking
        SQLite DB-->>MySQL_FTS: ranked results with snippets
    end
    MySQL_FTS->>MySQL_FTS: aggregate & paginate results
    MySQL_FTS-->>MySQL_Tool_Handler: JSON results array
    MySQL_Tool_Handler-->>Client: search results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Add full-text search (FTS) tools to MCP query server #21 — Implements identical FTS functionality with the same MySQL_FTS class, MySQL_Tool_Handler extensions, MCP_Threads_Handler variable, Query_Tool_Handler dispatch, and test scripts
Use proxy_sqlite3_* wrappers and canonical proxy symbol TU #22 — Introduces proxy_sqlite3 symbol wrappers and SQLite canonical translation unit that directly support how the new MySQL_FTS code links and invokes SQLite APIs
feat: Add MCP query rules and digest statistics [WIP] #19 — Modifies Query_Tool_Handler::execute_tool, which this PR also extends with FTS tool dispatch logic

Poem

🐰 Whiskers twitching with delight,
SQLite indexes shining bright!
FTS searching hops so fast,
BM25 rankings unsurpassed! ✨
MySQL tables indexed true,
MCP tools now see you through!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 44.55% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main changes: addressing security, memory safety, and thread safety issues from PR `#21` review plus additional fixes.
Description check	✅ Passed	The description is comprehensive and directly related to the changeset, detailing security fixes, memory safety improvements, thread safety corrections, and documentation updates.
Linked Issues check	✅ Passed	All PR `#21` objectives are met: six FTS tools implemented, thread-safe operations with mutexes, configurable FTS path with reset support, and comprehensive test scripts. The PR also implements recommended security, memory, and thread-safety fixes from code reviewers.
Out of Scope Changes check	✅ Passed	Most changes are in-scope fixes to PR `#21` implementation. Minor out-of-scope changes include unrelated test file fixes (ai_llm_retry_scenarios-t.cpp, anomaly_detection-t.cpp, vector_db_performance-t.cpp, Makefile) and Makefile modifications, but these are explicitly mentioned as cherry-picked additional fixes.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request introduces a Full Text Search (FTS) module to ProxySQL, leveraging SQLite's FTS5 extension for fast, indexed searching of MySQL data. The architecture includes MCP query endpoint integration, a Query Tool Handler, and a MySQL Tool Handler wrapping the MySQL_FTS class. Configuration is managed via ProxySQL's admin interface, with variables for the FTS database path and MySQL backend connection details. The FTS Tools Reference details six tools for indexing, searching, and managing FTS indexes. The code review highlights several key aspects of the changes, including robust input sanitization and identifier escaping to prevent SQL injection vulnerabilities, thread-safe access to the FTS database, runtime configuration changes for the FTS database path, and the inclusion of comprehensive test cases. Additionally, there were format specifier corrections for printing long int values and a note about a future date in the version history.

gemini-code-assist · 2026-01-21T14:30:09Z

lib/MySQL_Tool_Handler.cpp

+	pthread_mutex_lock(&fts_lock);
+	if (!fts) {
+		json result;
+		result["success"] = false;
+		result["error"] = "FTS not initialized";
+		pthread_mutex_unlock(&fts_lock);
+		return result.dump();
+	}
+
+	std::string out = fts->index_table(schema, table, columns, primary_key, where_clause, this);
+	pthread_mutex_unlock(&fts_lock);
+	return out;
+}
+
+std::string MySQL_Tool_Handler::fts_search(
+	const std::string& query,
+	const std::string& schema,
+	const std::string& table,
+	int limit,
+	int offset
+) {
+	pthread_mutex_lock(&fts_lock);
+	if (!fts) {
+		json result;
+		result["success"] = false;
+		result["error"] = "FTS not initialized";
+		pthread_mutex_unlock(&fts_lock);
+		return result.dump();
+	}
+
+	std::string out = fts->search(query, schema, table, limit, offset);
+	pthread_mutex_unlock(&fts_lock);
+	return out;
+}
+
+std::string MySQL_Tool_Handler::fts_list_indexes() {
+	pthread_mutex_lock(&fts_lock);
+	if (!fts) {
+		json result;
+		result["success"] = false;
+		result["error"] = "FTS not initialized";
+		pthread_mutex_unlock(&fts_lock);
+		return result.dump();
+	}
+
+	std::string out = fts->list_indexes();
+	pthread_mutex_unlock(&fts_lock);
+	return out;
+}
+
+std::string MySQL_Tool_Handler::fts_delete_index(const std::string& schema, const std::string& table) {
+	pthread_mutex_lock(&fts_lock);
+	if (!fts) {
+		json result;
+		result["success"] = false;
+		result["error"] = "FTS not initialized";
+		pthread_mutex_unlock(&fts_lock);
+		return result.dump();
+	}
+
+	std::string out = fts->delete_index(schema, table);
+	pthread_mutex_unlock(&fts_lock);
+	return out;
+}
+
+std::string MySQL_Tool_Handler::fts_reindex(const std::string& schema, const std::string& table) {
+	pthread_mutex_lock(&fts_lock);
+	if (!fts) {
+		json result;
+		result["success"] = false;
+		result["error"] = "FTS not initialized";
+		pthread_mutex_unlock(&fts_lock);
+		return result.dump();
+	}
+
+	std::string out = fts->reindex(schema, table, this);
+	pthread_mutex_unlock(&fts_lock);
+	return out;
+}
+
+std::string MySQL_Tool_Handler::fts_rebuild_all() {
+	pthread_mutex_lock(&fts_lock);
+	if (!fts) {
+		json result;
+		result["success"] = false;
+		result["error"] = "FTS not initialized";
+		pthread_mutex_unlock(&fts_lock);
+		return result.dump();
+	}
+
+	std::string out = fts->rebuild_all(this);
+	pthread_mutex_unlock(&fts_lock);
+	return out;
+}


All FTS tool wrapper functions (fts_index_table, fts_search, etc.) correctly acquire and release fts_lock before and after calling the underlying fts object methods. This ensures thread-safe access to the MySQL_FTS instance and its SQLite database.

gemini-code-assist · 2026-01-21T14:30:09Z

lib/MySQL_FTS.cpp

+std::string MySQL_FTS::sanitize_name(const std::string& name) {
+	const size_t MAX_NAME_LEN = 100;
+	std::string sanitized;
+	// Allowlist: only ASCII letters, digits, underscore
+	for (char c : name) {
+		if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
+		    (c >= '0' && c <= '9') || c == '_') {
+			sanitized.push_back(c);
+		}
+	}
+	// Prevent leading digit (SQLite identifiers can't start with digit)
+	if (!sanitized.empty() && sanitized[0] >= '0' && sanitized[0] <= '9') {
+		sanitized.insert(sanitized.begin(), '_');
+	}
+	// Enforce maximum length
+	if (sanitized.length() > MAX_NAME_LEN) sanitized = sanitized.substr(0, MAX_NAME_LEN);
+	return sanitized;


The sanitize_name function implements a robust allowlist approach, preventing leading digits and enforcing a maximum length. This is a strong security measure against potential SQL injection or unexpected behavior when constructing dynamic table names in SQLite.

gemini-code-assist · 2026-01-21T14:30:09Z

lib/MySQL_FTS.cpp

+std::string MySQL_FTS::escape_identifier(const std::string& identifier) {
+	std::string escaped;
+	escaped.reserve(identifier.length() * 2 + 2);
+	escaped.push_back('`');
+	for (char c : identifier) {
+		escaped.push_back(c);
+		if (c == '`') escaped.push_back('`');  // Double backticks
+	}
+	escaped.push_back('`');
+	return escaped;


The escape_identifier function correctly doubles backticks for SQLite identifier escaping. This is crucial for preventing SQL injection when dynamic table or column names are used in DDL statements.

gemini-code-assist · 2026-01-21T14:30:09Z

lib/MySQL_FTS.cpp

+int MySQL_FTS::create_index_tables(const std::string& schema, const std::string& table) {
+	std::string data_table = get_data_table_name(schema, table);
+	std::string fts_table = get_fts_table_name(schema, table);
+	std::string escaped_data = escape_identifier(data_table);
+	std::string escaped_fts = escape_identifier(fts_table);
+
+	// Create data table
+	std::ostringstream create_data_sql;
+	create_data_sql << "CREATE TABLE IF NOT EXISTS " << escaped_data << " ("
+		"  rowid INTEGER PRIMARY KEY AUTOINCREMENT,"
+		"  schema_name TEXT NOT NULL,"
+		"  table_name TEXT NOT NULL,"
+		"  primary_key_value TEXT NOT NULL,"
+		"  content TEXT NOT NULL,"
+		"  metadata TEXT"
+		");";
+
+	if (!db->execute(create_data_sql.str().c_str())) {
+		proxy_error("Failed to create data table %s\n", data_table.c_str());
+		return -1;
+	}
+
+	// Create FTS5 virtual table with external content
+	std::ostringstream create_fts_sql;
+	create_fts_sql << "CREATE VIRTUAL TABLE IF NOT EXISTS " << escaped_fts << " USING fts5("
+		"  content, metadata,"
+		"  content=" << escaped_data << ","
+		"  content_rowid='rowid',"
+		"  tokenize='porter unicode61'"
+		");";
+
+	if (!db->execute(create_fts_sql.str().c_str())) {
+		proxy_error("Failed to create FTS table %s\n", fts_table.c_str());
+		return -1;
+	}
+
+	// Create triggers for automatic sync (populate the FTS table)
+	std::string base_name = sanitize_name(schema) + "_" + sanitize_name(table);
+	std::string escaped_base = escape_identifier(base_name);
+
+	// Drop existing triggers if any
+	db->execute(("DROP TRIGGER IF EXISTS " + escape_identifier("fts_ai_" + base_name)).c_str());
+	db->execute(("DROP TRIGGER IF EXISTS " + escape_identifier("fts_ad_" + base_name)).c_str());
+	db->execute(("DROP TRIGGER IF EXISTS " + escape_identifier("fts_au_" + base_name)).c_str());
+
+	// AFTER INSERT trigger
+	std::ostringstream ai_sql;
+	ai_sql << "CREATE TRIGGER IF NOT EXISTS " << escape_identifier("fts_ai_" + base_name)
+		<< " AFTER INSERT ON " << escaped_data << " BEGIN"
+		<< "  INSERT INTO " << escaped_fts << "(rowid, content, metadata)"
+		<< "  VALUES (new.rowid, new.content, new.metadata);"
+		<< "END;";
+	db->execute(ai_sql.str().c_str());
+
+	// AFTER DELETE trigger
+	std::ostringstream ad_sql;
+	ad_sql << "CREATE TRIGGER IF NOT EXISTS " << escape_identifier("fts_ad_" + base_name)
+		<< " AFTER DELETE ON " << escaped_data << " BEGIN"
+		<< "  INSERT INTO " << escaped_fts << "(" << escaped_fts << ", rowid, content, metadata)"
+		<< "  VALUES ('delete', old.rowid, old.content, old.metadata);"
+		<< "END;";
+	db->execute(ad_sql.str().c_str());
+
+	// AFTER UPDATE trigger
+	std::ostringstream au_sql;
+	au_sql << "CREATE TRIGGER IF NOT EXISTS " << escape_identifier("fts_au_" + base_name)
+		<< " AFTER UPDATE ON " << escaped_data << " BEGIN"
+		<< "  INSERT INTO " << escaped_fts << "(" << escaped_fts << ", rowid, content, metadata)"
+		<< "  VALUES ('delete', old.rowid, old.content, old.metadata);"
+		<< "  INSERT INTO " << escaped_fts << "(rowid, content, metadata)"
+		<< "  VALUES (new.rowid, new.content, new.metadata);"
+		<< "END;";
+	db->execute(au_sql.str().c_str());
+
+	return 0;
+}


The create_index_tables function correctly uses escape_identifier for table names in CREATE TABLE and CREATE VIRTUAL TABLE statements, as well as for trigger names. This prevents SQL injection vulnerabilities in DDL operations.

gemini-code-assist · 2026-01-21T14:30:09Z

lib/MySQL_FTS.cpp

+		std::string fts_literal = "\"";
+		for (char c : query) {
+			fts_literal.push_back(c);
+			if (c == '"') fts_literal.push_back('"');  // Double quotes
+		}
+		fts_literal.push_back('"');


The FTS5 query is correctly wrapped in double quotes and embedded double quotes are escaped by doubling them. This is an effective measure to prevent FTS5 operator injection, as described in the PR summary.

gemini-code-assist · 2026-01-21T14:30:09Z

lib/MySQL_Tool_Handler.cpp

+int MySQL_Tool_Handler::reinit_fts(const std::string& fts_path) {
+	proxy_info("MySQL_Tool_Handler: Reinitializing FTS with path: %s\n", fts_path.c_str());
+
+	// Check if directory exists (SQLite can't create directories)
+	std::string::size_type last_slash = fts_path.find_last_of("/");
+	if (last_slash != std::string::npos && last_slash > 0) {
+		std::string dir = fts_path.substr(0, last_slash);
+		struct stat st;
+		if (stat(dir.c_str(), &st) != 0 || !S_ISDIR(st.st_mode)) {
+			proxy_error("MySQL_Tool_Handler: Directory does not exist for path '%s' (directory: '%s')\n",
+				fts_path.c_str(), dir.c_str());
+			return -1;
+		}
+	}
+
+	// First, test if we can open the new database (outside lock)
+	MySQL_FTS* new_fts = new MySQL_FTS(fts_path);
+	if (!new_fts) {
+		proxy_error("MySQL_Tool_Handler: Failed to create new FTS handler\n");
+		return -1;
+	}
+
+	if (new_fts->init() != 0) {
+		proxy_error("MySQL_Tool_Handler: Failed to initialize FTS at %s\n", fts_path.c_str());
+		delete new_fts;
+		return -1;  // Return error WITHOUT closing old FTS
+	}
+
+	// Success! Now swap the pointer under lock
+	pthread_mutex_lock(&fts_lock);
+	MySQL_FTS* old_fts = fts;
+	fts = new_fts;
+	pthread_mutex_unlock(&fts_lock);
+	if (old_fts) delete old_fts;
+
+	proxy_info("MySQL_Tool_Handler: FTS reinitialized successfully at %s\n", fts_path.c_str());
+	return 0;
+}


The reinit_fts function includes a check for directory existence using stat() before attempting to create the new FTS database. This is a good practice to prevent errors if the specified path's parent directory does not exist or is not writable.

gemini-code-assist · 2026-01-21T14:30:09Z

doc/MCP/FTS_USER_GUIDE.md

+
+| Version | Date | Changes |
+|---------|------|---------|
+| 0.1.0 | 2025-01 | Initial implementation |


The version history lists a date in the future (2025-01). While this might be a placeholder, it's generally better to use a current or past date for initial documentation releases, or indicate it as a planned future release date explicitly.

gemini-code-assist · 2026-01-21T14:30:10Z

lib/MySQL_FTS.cpp

+
+	if (error) {
+		result["error"] = "Failed to list indexes: " + std::string(error);
+		(*proxy_sqlite3_free)(error);


The proxy_sqlite3_free(error) call ensures that the error string allocated by SQLite is properly freed, preventing a memory leak in the list_indexes function's error path.

gemini-code-assist · 2026-01-21T14:30:10Z

test/tap/tests/vector_db_performance-t.cpp


 	ok(db.size() == large_size, "Large dataset (%zu entries) inserted successfully", large_size);
-	diag("Time to insert %zu entries: %lld ms", large_size, insert_duration.count());
+	diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count());


The format specifier for insert_duration.count() was changed from lld to ld. This is a correctness fix for printing long or long int values, ensuring proper output on different systems.

gemini-code-assist · 2026-01-21T14:30:10Z

test/tap/tests/vector_db_performance-t.cpp


 	diag("Average time per concurrent operation: %lld microseconds", avg_time);
-	diag("Total time for %d operations: %lld microseconds", num_operations, total_duration.count());
+	diag("Total time for %d operations: %ld microseconds", num_operations, total_duration.count());


The format specifier for total_duration.count() was changed from lld to ld. This is a correctness fix for printing long or long int values, ensuring proper output on different systems.

coderabbitai

Actionable comments posted: 13

🤖 Fix all issues with AI agents

In `@lib/MCP_Thread.cpp`:
- Around line 350-364: The code assigns variables.mcp_fts_path = strdup(value)
before calling mysql_tool_handler->reset_fts_path, so on reset failure the
stored path is left inconsistent; change the logic in the "fts_path" branch to
preserve the previous value: save the old pointer (old =
variables.mcp_fts_path), create a new copy (new_val = strdup(value)) but do not
overwrite variables.mcp_fts_path until reset succeeds; call
mysql_tool_handler->reset_fts_path(value) and on success assign
variables.mcp_fts_path = new_val (free old if non-null), on failure free new_val
and leave variables.mcp_fts_path as old, log the error and return -1; ensure no
leaks or double-frees when old is null.

In `@lib/MySQL_FTS.cpp`:
- Around line 483-493: The early-return path that handles (!indexes_result ||
indexes_result->rows.empty()) leaks memory when indexes_result is non-null;
modify that branch so that if indexes_result is non-null you free/delete/cleanup
it (e.g., delete indexes_result or call its provided free method) before
returning, and keep db->rdunlock() in the correct order (free indexes_result if
non-null, then rdunlock(), then set result fields and return). Ensure you
reference and handle the indexes_result variable from the execute_statement
call.
- Around line 329-334: The MySQL identifier usage in the selected_cols loop and
the final FROM clause is quoting names with backticks but not escaping embedded
backticks; add an escape helper (e.g., escape_mysql_identifier) mirroring
escape_identifier() behavior that duplicates any internal '`' and wraps the
result in surrounding backticks, then replace usages building mysql_query for
selected_cols, schema and table to call
escape_mysql_identifier(selected_cols[i]), escape_mysql_identifier(schema), and
escape_mysql_identifier(table) so all identifiers are safely escaped.
- Around line 92-109: sanitize_name can return an empty string when input has
only invalid characters; update MySQL_FTS::sanitize_name to guarantee a
non-empty, collision-resistant identifier by adding a deterministic fallback:
after sanitization and truncation (use MAX_NAME_LEN), if result is empty or
would collide, prepend or replace with a safe prefix (e.g., "_") and append a
short unique suffix derived from the original name (use std::hash<std::string>
of the original input truncated to a few hex chars) so symbols like
sanitize_name and MAX_NAME_LEN are used but the function always returns a valid,
bounded-length, and effectively unique name.
- Around line 326-338: The code in MySQL_FTS.cpp concatenates the user-provided
where_clause directly into the SQL built in the fts_index_table flow (see
mysql_query construction using selected_cols, schema, table and where_clause),
creating an SQL injection risk; fix it by changing the query construction to use
parameterized/prepared statements from your MySQL client API (build the SELECT
with column/backtick quoting as now, but replace dynamic predicates in
where_clause with named or positional placeholders and bind the corresponding
values), or if placeholders cannot represent arbitrary predicates, enforce a
strict whitelist/validation of allowed columns/operators and parse/escape values
before concatenation; locate the code that forms mysql_query and refactor to use
the client’s prepare/bind/execute API or add validation logic for where_clause
and document the expected safe format for callers.

In `@lib/MySQL_Tool_Handler.cpp`:
- Around line 1047-1058: The FTS methods (fts_index_table, fts_search,
fts_reindex, fts_rebuild_all, fts_delete_index) currently hold
pthread_mutex_lock(&fts_lock) across long-running I/O; refactor them to minimize
lock scope by copying the fts pointer under the lock (e.g., lock fts_lock, if
(!fts) return error, else MySQL_FTS* local_fts = fts; unlock fts_lock) and then
call local_fts->... (index_table/search/reindex/...) outside the mutex, or
alternatively add a clear comment above these functions that the mutex
intentionally serializes all FTS operations for correctness; ensure you
reference the same error behavior (returning JSON error when fts is null) and
preserve thread-safety during the pointer swap like reset_fts_path().

In `@scripts/mcp/test_mcp_fts_detailed.sh`:
- Around line 71-74: The mysql_exec function currently passes the password on
the command line (-p"${MYSQL_PASSWORD}") which exposes it; update mysql_exec to
stop using -p and instead invoke mysql with MYSQL_PWD set in the environment for
that invocation (e.g., set MYSQL_PWD="${MYSQL_PASSWORD}" only for the mysql
command) so the password does not appear in the process list; locate the
mysql_exec function and replace the argument-based password with an
environment-based MYSQL_PWD usage and ensure the variable is not exported beyond
the single invocation.
- Around line 251-260: The cleanup block guarded by CLEANUP calls
fts_delete_index twice and attempts to delete fts_test.orders again even though
it was already removed earlier; update the cleanup to avoid failing under set
-euo pipefail by either removing the redundant call that invokes tool_call
"fts_delete_index" for '{"schema":"fts_test","table":"orders"}' or make that
specific delete call tolerant of non-existent indexes (e.g., append a fallback
so the extract_tool_result/echo/jq pipeline does not cause a non-zero exit on
"not found" errors); ensure you modify the code that calls tool_call
"fts_delete_index" and the subsequent extract_tool_result/check logic to
implement the chosen approach.

In `@scripts/mcp/test_mcp_fts.sh`:
- Around line 205-240: The current fallback JSON parsing in extract_json_field
and check_json_bool is broken for jq-style dot-notation and complex paths (e.g.,
".success", ".results | length") and silently fails; update the script to
enforce jq presence at startup (fail fast with a clear error) OR rewrite the
fallback to canonicalize the field by stripping leading dots and support simple
dot-path traversal and array length checks used by extract_inner_result;
specifically modify the startup check to call command -v jq and exit with an
informative message if missing, and update extract_json_field/check_json_bool to
delegate to jq-only or to parse dot paths by splitting on '.' and traversing
nested objects/arrays so patterns like .success and .results|length return
correct values.
- Around line 509-514: The MCP tool responses are wrapped; replace uses of
extract_json_field with extract_inner_field when pulling numeric fields from the
wrapped fts_tool_call response (e.g., change total_matches, results | length,
row_count, rebuilt_count extractions). Locate occurrences where
extract_json_field is used alongside variables like total_matches, result_count,
row_count, rebuilt_count (and where check_json_bool already uses
extract_inner_result) and swap to extract_inner_field so the code unwraps the
MCP JSON-RPC result before evaluating numeric comparisons; keep the same
variable names and call sites (e.g., where response is parsed after
fts_tool_call and in routines that follow check_json_bool).

In `@test/tap/tests/Makefile`:
- Around line 171-173: The Makefile target anomaly_detection-t currently passes
the misspelled preprocessor define -DEXCLUDE_TRACKING_VARAIABLES; update that
flag to the correctly spelled -DEXCLUDE_TRACKING_VARIABLES in the
anomaly_detection-t rule so the code that checks for EXCLUDE_TRACKING_VARIABLES
is enabled as intended.

In `@test/tap/tests/vector_db_performance-t.cpp`:
- Around line 380-381: The second diagnostic uses the wrong/ inconsistent printf
format for total_duration.count(); change the format to match the 64-bit type
(use %lld) and ensure you cast total_duration.count() to (long long) in the diag
call so both diag("Average time per concurrent operation: %lld microseconds",
avg_time) and diag("Total time for %d operations: %lld microseconds",
num_operations, (long long)total_duration.count()) use the same 64-bit specifier
and types; locate these calls (diag with avg_time and diag with
total_duration.count()) and update the format/specifier and cast accordingly.
- Line 325: The diag call uses the wrong printf format specifier for
insert_duration.count(); update the format to safely match the underlying type
by either using "%lld" and casting insert_duration.count() to (long long) or by
casting to long and using "%ld" consistently; change the diag invocation that
prints "Time to insert %zu entries: %ld ms" to use a matching specifier and cast
(e.g., "%zu entries: %lld ms", (long long)insert_duration.count()) so the format
matches insert_duration.count() and aligns with the other timing output.

🧹 Nitpick comments (8)

doc/MCP/FTS_USER_GUIDE.md (1)
1-1: Consider hyphenating "Full-Text Search" for grammatical correctness.

"Full-Text Search" is a compound adjective modifying "User Guide" and should be hyphenated for consistency with standard technical writing conventions. This applies to both the title and the Overview section (line 19).
Suggested fix
-# MCP Full Text Search (FTS) - User Guide
+# MCP Full-Text Search (FTS) - User Guide
And similarly on line 19:
-The MCP Full Text Search (FTS) module provides fast, indexed search capabilities
+The MCP Full-Text Search (FTS) module provides fast, indexed search capabilities
scripts/mcp/test_mcp_fts.sh (2)
147-150: Add curl timeouts to avoid hanging test runs.

If the MCP endpoint is down, curl can block indefinitely. Add --connect-timeout/--max-time (optionally configurable via env).
⏱️ Suggested timeout defaults
-    response=$(curl -s -w "\n%{http_code}" -X POST "${MCP_ENDPOINT}" \
+    response=$(curl -s --connect-timeout 5 --max-time 30 -w "\n%{http_code}" -X POST "${MCP_ENDPOINT}" \
         -H "Content-Type: application/json" \
         -d "${payload}" 2>/dev/null)
352-355: Drop unused delete_response.

Shellcheck flags delete_response as unused. Either log failures or remove the variable.
🧹 Remove unused variable
-    local delete_response
-    delete_response=$(fts_tool_call "fts_delete_index" "{\"schema\": \"${TEST_SCHEMA}\", \"table\": \"${TEST_TABLE}\"}")
+    fts_tool_call "fts_delete_index" "{\"schema\": \"${TEST_SCHEMA}\", \"table\": \"${TEST_TABLE}\"}" >/dev/null
lib/Query_Tool_Handler.cpp (1)
220-227: Tool schema marks columns as string, but runtime expects an array.

create_tool_schema types required params as strings, yet fts_index_table expects a JSON array (tests send an array). This mismatch can confuse schema-driven clients. Consider a custom schema for this tool where columns is an array of strings.
♻️ Example custom schema for fts_index_table
-    tools.push_back(create_tool_schema(
-        "fts_index_table",
-        "Create and populate a full-text search index for a MySQL table",
-        {"schema", "table", "columns", "primary_key"},
-        {{"where_clause", "string"}}
-    ));
+    {
+        json schema = {
+            {"type", "object"},
+            {"properties", {
+                {"schema", {{"type", "string"}, {"description", "schema parameter"}}},
+                {"table", {{"type", "string"}, {"description", "table parameter"}}},
+                {"columns", {{"type", "array"}, {"items", {{"type", "string"}}}, {"description", "columns to index"}}},
+                {"primary_key", {{"type", "string"}, {"description", "primary_key parameter"}}},
+                {"where_clause", {{"type", "string"}, {"description", "where_clause parameter"}}}
+            }},
+            {"required", {"schema", "table", "columns", "primary_key"}}
+        };
+        tools.push_back(create_tool_description(
+            "fts_index_table",
+            "Create and populate a full-text search index for a MySQL table",
+            schema
+        ));
+    }
include/MySQL_FTS.h (1)
90-96: Delete copy and move operations to prevent accidental resource duplication.

The class owns a raw SQLite3DB* pointer that is deleted in the destructor. Since copy/move operations are not explicitly deleted, they're implicitly defined, which means shallow copies could lead to double-free or use-after-free if the class is ever used by value or moved. Although the current codebase only uses pointer instances, it's defensive to explicitly prevent copying/moving.
🔒 Disable copy/move
  MySQL_FTS(const std::string& path);

+ MySQL_FTS(const MySQL_FTS&) = delete;
+ MySQL_FTS& operator=(const MySQL_FTS&) = delete;
+ MySQL_FTS(MySQL_FTS&&) = delete;
+ MySQL_FTS& operator=(MySQL_FTS&&) = delete;

  /**
   * `@brief` Destructor
   */
  ~MySQL_FTS();
lib/MySQL_Tool_Handler.cpp (2)
117-138: Redundant implementations of FTS path reset logic.

Both reset_fts_path() (lines 117-138) and reinit_fts() (lines 1142-1179) implement nearly identical functionality for swapping the FTS instance. The key differences are:

reinit_fts() additionally checks if the directory exists

reinit_fts() returns int while reset_fts_path() returns bool

Consider consolidating into a single implementation to avoid code duplication and potential divergence in behavior.
♻️ Suggested consolidation
 bool MySQL_Tool_Handler::reset_fts_path(const std::string& path) {
-	MySQL_FTS* new_fts = NULL;
-
-	// Initialize new FTS outside lock (blocking I/O)
-	if (!path.empty()) {
-		new_fts = new MySQL_FTS(path);
-		if (new_fts->init()) {
-			proxy_error("Failed to initialize FTS with new path: %s\n", path.c_str());
-			delete new_fts;
-			return false;
-		}
-	}
-
-	// Swap pointer under lock (non-blocking)
-	pthread_mutex_lock(&fts_lock);
-	MySQL_FTS* old_fts = fts;
-	fts = new_fts;
-	pthread_mutex_unlock(&fts_lock);
-	if (old_fts) delete old_fts;
-
-	return true;
+	return reinit_fts(path) == 0;
 }
Then update reinit_fts() to handle empty path case if needed, or keep reset_fts_path() as the primary implementation and have reinit_fts() call it after the directory check.
Also applies to: 1142-1179

303-329: Remove debug fprintf statements before merging.

Multiple debug fprintf(stderr, ...) statements remain throughout the file (e.g., lines 303, 309, 317, 319, 325, 329, 343-356, 401, 413, 419, 493-532, 537-566). The PR objectives mention "Code cleanup: removed seven debug fprintfs" but many still remain.

These should be removed or converted to proper proxy_debug() calls for production code.

Also applies to: 343-356, 401-419, 493-532, 537-566
lib/MySQL_FTS.cpp (1)

359-430: Database lock held during entire indexing operation.

The wrlock() acquired at line 359 is held until line 430, encompassing potentially thousands of row insertions. This could block read operations (search, list_indexes) for extended periods.

Consider committing in batches and releasing/reacquiring the lock between batches, or use a transaction with checkpoints.

coderabbitai · 2026-01-21T14:31:00Z

lib/MCP_Thread.cpp

+	if (!strcmp(name, "fts_path")) {
+		if (variables.mcp_fts_path)
+			free(variables.mcp_fts_path);
+		variables.mcp_fts_path = strdup(value);
+		// Apply at runtime by resetting FTS in the existing handler
+		if (mysql_tool_handler) {
+			proxy_info("MCP: Applying new fts_path at runtime: %s\n", value);
+			if (!mysql_tool_handler->reset_fts_path(value)) {
+				proxy_error("Failed to reset FTS path at runtime\n");
+				return -1;
+			}
+		}
+
+		return 0;
+	}


⚠️ Potential issue | 🟠 Major

Rollback mcp_fts_path on reset failure to keep config consistent.

set_variable("fts_path") updates the stored path before calling reset_fts_path. If reset fails, the function returns -1 but leaves the new path saved, so config no longer reflects the active FTS DB. Preserve the previous value and restore on failure.

🔧 Preserve previous value on failure

- if (variables.mcp_fts_path) - free(variables.mcp_fts_path); - variables.mcp_fts_path = strdup(value); - // Apply at runtime by resetting FTS in the existing handler - if (mysql_tool_handler) { - proxy_info("MCP: Applying new fts_path at runtime: %s\n", value); - if (!mysql_tool_handler->reset_fts_path(value)) { - proxy_error("Failed to reset FTS path at runtime\n"); - return -1; - } - } - - return 0; + char* old = variables.mcp_fts_path; + variables.mcp_fts_path = strdup(value); + // Apply at runtime by resetting FTS in the existing handler + if (mysql_tool_handler) { + proxy_info("MCP: Applying new fts_path at runtime: %s\n", value); + if (!mysql_tool_handler->reset_fts_path(value)) { + proxy_error("Failed to reset FTS path at runtime\n"); + free(variables.mcp_fts_path); + variables.mcp_fts_path = old; + return -1; + } + } + if (old) free(old); + return 0;

🤖 Prompt for AI Agents

In `@lib/MCP_Thread.cpp` around lines 350 - 364, The code assigns variables.mcp_fts_path = strdup(value) before calling mysql_tool_handler->reset_fts_path, so on reset failure the stored path is left inconsistent; change the logic in the "fts_path" branch to preserve the previous value: save the old pointer (old = variables.mcp_fts_path), create a new copy (new_val = strdup(value)) but do not overwrite variables.mcp_fts_path until reset succeeds; call mysql_tool_handler->reset_fts_path(value) and on success assign variables.mcp_fts_path = new_val (free old if non-null), on failure free new_val and leave variables.mcp_fts_path as old, log the error and return -1; ensure no leaks or double-frees when old is null.

coderabbitai · 2026-01-21T14:31:00Z

lib/MySQL_FTS.cpp

+std::string MySQL_FTS::sanitize_name(const std::string& name) {
+	const size_t MAX_NAME_LEN = 100;
+	std::string sanitized;
+	// Allowlist: only ASCII letters, digits, underscore
+	for (char c : name) {
+		if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
+		    (c >= '0' && c <= '9') || c == '_') {
+			sanitized.push_back(c);
+		}
+	}
+	// Prevent leading digit (SQLite identifiers can't start with digit)
+	if (!sanitized.empty() && sanitized[0] >= '0' && sanitized[0] <= '9') {
+		sanitized.insert(sanitized.begin(), '_');
+	}
+	// Enforce maximum length
+	if (sanitized.length() > MAX_NAME_LEN) sanitized = sanitized.substr(0, MAX_NAME_LEN);
+	return sanitized;
+}


⚠️ Potential issue | 🟡 Minor

sanitize_name() may return empty string causing issues.

If the input contains only invalid characters, sanitize_name() returns an empty string. This could lead to invalid table names like fts_data__ or trigger name collisions.

🔧 Suggested fix

std::string MySQL_FTS::sanitize_name(const std::string& name) { const size_t MAX_NAME_LEN = 100; std::string sanitized; // Allowlist: only ASCII letters, digits, underscore for (char c : name) { if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '_') { sanitized.push_back(c); } } + // Return fallback if empty + if (sanitized.empty()) { + sanitized = "_unnamed"; + } // Prevent leading digit (SQLite identifiers can't start with digit) - if (!sanitized.empty() && sanitized[0] >= '0' && sanitized[0] <= '9') { + if (sanitized[0] >= '0' && sanitized[0] <= '9') { sanitized.insert(sanitized.begin(), '_'); } // Enforce maximum length if (sanitized.length() > MAX_NAME_LEN) sanitized = sanitized.substr(0, MAX_NAME_LEN); return sanitized; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

std::string MySQL_FTS::sanitize_name(const std::string& name) {

const size_t MAX_NAME_LEN = 100;

std::string sanitized;

// Allowlist: only ASCII letters, digits, underscore

for (char c : name) {

if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||

(c >= '0' && c <= '9') || c == '_') {

sanitized.push_back(c);

}

}

// Prevent leading digit (SQLite identifiers can't start with digit)

if (!sanitized.empty() && sanitized[0] >= '0' && sanitized[0] <= '9') {

sanitized.insert(sanitized.begin(), '_');

}

// Enforce maximum length

if (sanitized.length() > MAX_NAME_LEN) sanitized = sanitized.substr(0, MAX_NAME_LEN);

return sanitized;

}

std::string MySQL_FTS::sanitize_name(const std::string& name) {

const size_t MAX_NAME_LEN = 100;

std::string sanitized;

// Allowlist: only ASCII letters, digits, underscore

for (char c : name) {

if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||

(c >= '0' && c <= '9') || c == '_') {

sanitized.push_back(c);

}

}

// Return fallback if empty

if (sanitized.empty()) {

sanitized = "_unnamed";

}

// Prevent leading digit (SQLite identifiers can't start with digit)

if (sanitized[0] >= '0' && sanitized[0] <= '9') {

sanitized.insert(sanitized.begin(), '_');

}

// Enforce maximum length

if (sanitized.length() > MAX_NAME_LEN) sanitized = sanitized.substr(0, MAX_NAME_LEN);

return sanitized;

}

🤖 Prompt for AI Agents

In `@lib/MySQL_FTS.cpp` around lines 92 - 109, sanitize_name can return an empty string when input has only invalid characters; update MySQL_FTS::sanitize_name to guarantee a non-empty, collision-resistant identifier by adding a deterministic fallback: after sanitization and truncation (use MAX_NAME_LEN), if result is empty or would collide, prepend or replace with a safe prefix (e.g., "_") and append a short unique suffix derived from the original name (use std::hash<std::string> of the original input truncated to a few hex chars) so symbols like sanitize_name and MAX_NAME_LEN are used but the function always returns a valid, bounded-length, and effectively unique name.

coderabbitai · 2026-01-21T14:31:00Z

lib/MySQL_FTS.cpp

+	// Build MySQL query to fetch data
+	std::ostringstream mysql_query;
+	mysql_query << "SELECT ";
+	for (size_t i = 0; i < selected_cols.size(); i++) {
+		if (i > 0) mysql_query << ", ";
+		mysql_query << "`" << selected_cols[i] << "`";
+	}
+
+	mysql_query << " FROM `" << schema << "`.`" << table << "`";
+
+	if (!where_clause.empty()) {
+		mysql_query << " WHERE " << where_clause;
+	}


⚠️ Potential issue | 🟠 Major

SQL injection risk: User-provided where_clause directly interpolated into query.

The where_clause parameter is directly concatenated into the MySQL query without validation or escaping (line 337). While the PR objectives mention security hardening, this vector remains open.

An attacker could inject arbitrary SQL through the where_clause parameter passed to fts_index_table.

🔒 Suggested mitigation

Consider one of these approaches:

Validate that where_clause contains only safe SQL predicates

Use parameterized queries if the MySQL client library supports it

At minimum, document the security implications and ensure this is only exposed to trusted callers

+ // WARNING: where_clause is passed directly to MySQL - ensure caller validates input if (!where_clause.empty()) { + // Basic sanity check - reject obviously dangerous patterns + std::string upper_where = where_clause; + std::transform(upper_where.begin(), upper_where.end(), upper_where.begin(), ::toupper); + if (upper_where.find("INTO OUTFILE") != std::string::npos || + upper_where.find("LOAD_FILE") != std::string::npos) { + result["error"] = "Dangerous pattern in where_clause"; + return result.dump(); + } mysql_query << " WHERE " << where_clause; }

🤖 Prompt for AI Agents

In `@lib/MySQL_FTS.cpp` around lines 326 - 338, The code in MySQL_FTS.cpp concatenates the user-provided where_clause directly into the SQL built in the fts_index_table flow (see mysql_query construction using selected_cols, schema, table and where_clause), creating an SQL injection risk; fix it by changing the query construction to use parameterized/prepared statements from your MySQL client API (build the SELECT with column/backtick quoting as now, but replace dynamic predicates in where_clause with named or positional placeholders and bind the corresponding values), or if placeholders cannot represent arbitrary predicates, enforce a strict whitelist/validation of allowed columns/operators and parse/escape values before concatenation; locate the code that forms mysql_query and refactor to use the client’s prepare/bind/execute API or add validation logic for where_clause and document the expected safe format for callers.

coderabbitai · 2026-01-21T14:31:00Z

lib/MySQL_FTS.cpp

+	for (size_t i = 0; i < selected_cols.size(); i++) {
+		if (i > 0) mysql_query << ", ";
+		mysql_query << "`" << selected_cols[i] << "`";
+	}
+
+	mysql_query << " FROM `" << schema << "`.`" << table << "`";


⚠️ Potential issue | 🟡 Minor

Identifiers in MySQL query not properly escaped.

The column names from selected_cols and the schema/table names are wrapped in backticks but not escaped for embedded backticks. If a column name contains a backtick, this could cause SQL syntax errors or injection.

The code has escape_identifier() for SQLite but doesn't apply equivalent escaping for MySQL queries.

🔧 Suggested fix

Add a MySQL identifier escape helper or reuse the pattern from escape_identifier():

// Helper for MySQL identifiers (double backticks) std::string escape_mysql_identifier(const std::string& id) { std::string escaped = "`"; for (char c : id) { escaped.push_back(c); if (c == '`') escaped.push_back('`'); } escaped.push_back('`'); return escaped; }

Then use it:

for (size_t i = 0; i < selected_cols.size(); i++) { if (i > 0) mysql_query << ", "; - mysql_query << "`" << selected_cols[i] << "`"; + mysql_query << escape_mysql_identifier(selected_cols[i]); } - mysql_query << " FROM `" << schema << "`.`" << table << "`"; + mysql_query << " FROM " << escape_mysql_identifier(schema) << "." << escape_mysql_identifier(table);

🤖 Prompt for AI Agents

In `@lib/MySQL_FTS.cpp` around lines 329 - 334, The MySQL identifier usage in the selected_cols loop and the final FROM clause is quoting names with backticks but not escaping embedded backticks; add an escape helper (e.g., escape_mysql_identifier) mirroring escape_identifier() behavior that duplicates any internal '`' and wraps the result in surrounding backticks, then replace usages building mysql_query for selected_cols, schema and table to call escape_mysql_identifier(selected_cols[i]), escape_mysql_identifier(schema), and escape_mysql_identifier(table) so all identifiers are safely escaped.

coderabbitai · 2026-01-21T14:31:00Z

lib/MySQL_FTS.cpp

+	db->rdlock();
+	indexes_result = db->execute_statement(indexes_sql.str().c_str(), &error, &cols, &affected);
+
+	if (!indexes_result || indexes_result->rows.empty()) {
+		db->rdunlock();
+		result["success"] = true;
+		result["query"] = query;
+		result["total_matches"] = 0;
+		result["results"] = json::array();
+		return result.dump();
+	}


⚠️ Potential issue | 🟡 Minor

Memory leak: indexes_result not freed before early return.

If indexes_result is non-null but has empty rows, the code returns early (line 492) without freeing indexes_result.

🐛 Proposed fix

if (!indexes_result || indexes_result->rows.empty()) { db->rdunlock(); + if (indexes_result) delete indexes_result; result["success"] = true; result["query"] = query; result["total_matches"] = 0; result["results"] = json::array(); return result.dump(); }

🤖 Prompt for AI Agents

In `@lib/MySQL_FTS.cpp` around lines 483 - 493, The early-return path that handles (!indexes_result || indexes_result->rows.empty()) leaks memory when indexes_result is non-null; modify that branch so that if indexes_result is non-null you free/delete/cleanup it (e.g., delete indexes_result or call its provided free method) before returning, and keep db->rdunlock() in the correct order (free indexes_result if non-null, then rdunlock(), then set result fields and return). Ensure you reference and handle the indexes_result variable from the execute_statement call.

coderabbitai · 2026-01-21T14:31:00Z

scripts/mcp/test_mcp_fts.sh

+# Extract JSON field value
+extract_json_field() {
+    local response="$1"
+    local field="$2"
+
+    if command -v jq >/dev/null 2>&1; then
+        echo "${response}" | jq -r "${field}" 2>/dev/null || echo ""
+    else
+        # Fallback to grep/sed for basic JSON parsing
+        echo "${response}" | grep -o "\"${field}\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" | sed 's/.*: "\(.*\)"/\1/' || echo ""
+    fi
+}
+
+# Check JSON boolean field
+check_json_bool() {
+    local response="$1"
+    local field="$2"
+    local expected="$3"
+
+    # Extract inner result from double-nested structure
+    local inner_result
+    inner_result=$(extract_inner_result "${response}")
+
+    if command -v jq >/dev/null 2>&1; then
+        local actual
+        actual=$(echo "${inner_result}" | jq -r "${field}" 2>/dev/null)
+        [ "${actual}" = "${expected}" ]
+    else
+        # Fallback: check for true/false string
+        if [ "${expected}" = "true" ]; then
+            echo "${inner_result}" | grep -q "\"${field}\"[[:space:]]*:[[:space:]]*true"
+        else
+            echo "${inner_result}" | grep -q "\"${field}\"[[:space:]]*:[[:space:]]*false"
+        fi
+    fi
+}


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the target lines and surrounding context cat -n scripts/mcp/test_mcp_fts.sh | sed -n '200,250p'

Repository: ProxySQL/proxysql-vec

Length of output: 186

🏁 Script executed:

# Also check the second location mentioned cat -n scripts/mcp/test_mcp_fts.sh | sed -n '1280,1300p'

Repository: ProxySQL/proxysql-vec

Length of output: 608

🏁 Script executed:

# Search for all calls to extract_json_field to see what field parameters are passed rg "extract_json_field" scripts/mcp/test_mcp_fts.sh -A 2 -B 2

Repository: ProxySQL/proxysql-vec

Length of output: 2143

🏁 Script executed:

# Search for all calls to check_json_bool to see what field parameters are passed rg "check_json_bool" scripts/mcp/test_mcp_fts.sh -A 2 -B 2

Repository: ProxySQL/proxysql-vec

Length of output: 3831

🏁 Script executed:

# Verify the actual implementation matches the snippet - read exact lines 205-240 cat -n scripts/mcp/test_mcp_fts.sh | sed -n '205,240p'

Repository: ProxySQL/proxysql-vec

Length of output: 1463

🏁 Script executed:

# Check if there's an extract_inner_result function to understand the full context rg "extract_inner_result" scripts/mcp/test_mcp_fts.sh -A 5

Repository: ProxySQL/proxysql-vec

Length of output: 694

jq is de facto required but not enforced; the grep/sed fallback is non-functional and will silently fail tests.

The fallback JSON parsing in extract_json_field and check_json_bool cannot work with jq dot-notation fields (e.g., .success, .total_matches, .results | length). Line 214 and lines 235/237 search grep for literal "${field}", which becomes patterns like "\.success" that never match JSON keys like "success". Complex paths with pipes (.results | length, .indexes[] | select(...)) are unsupported entirely. Without jq, all extraction calls return empty strings silently, causing tests to fail or pass incorrectly. Add a pre-flight jq requirement check at lines 1284–1295, or redesign the fallback to strip leading dots and support basic path traversal.

🔧 Suggested pre-flight guard (make jq mandatory)

# Pre-flight checks log_section "Pre-flight Checks" + if ! command -v jq >/dev/null 2>&1; then + log_error "jq is required for this test suite." + exit 1 + fi + if ! check_mcp_server; then log_error "MCP server check failed. Exiting." exit 1 fi

🤖 Prompt for AI Agents

In `@scripts/mcp/test_mcp_fts.sh` around lines 205 - 240, The current fallback JSON parsing in extract_json_field and check_json_bool is broken for jq-style dot-notation and complex paths (e.g., ".success", ".results | length") and silently fails; update the script to enforce jq presence at startup (fail fast with a clear error) OR rewrite the fallback to canonicalize the field by stripping leading dots and support simple dot-path traversal and array length checks used by extract_inner_result; specifically modify the startup check to call command -v jq and exit with an informative message if missing, and update extract_json_field/check_json_bool to delegate to jq-only or to parse dot paths by splitting on '.' and traversing nested objects/arrays so patterns like .success and .results|length return correct values.

coderabbitai · 2026-01-21T14:31:00Z

scripts/mcp/test_mcp_fts.sh

+    # Check results
+    local total_matches
+    local result_count
+    total_matches=$(extract_json_field "${response}" ".total_matches")
+    result_count=$(extract_json_field "${response}" ".results | length")
+


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

wc -l scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 97

🏁 Script executed:

sed -n '505,520p' scripts/mcp/test_mcp_fts.sh sed -n '535,550p' scripts/mcp/test_mcp_fts.sh sed -n '565,575p' scripts/mcp/test_mcp_fts.sh sed -n '590,605p' scripts/mcp/test_mcp_fts.sh sed -n '690,705p' scripts/mcp/test_mcp_fts.sh sed -n '773,785p' scripts/mcp/test_mcp_fts.sh sed -n '850,865p' scripts/mcp/test_mcp_fts.sh sed -n '930,945p' scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 3718

🏁 Script executed:

grep -n "^extract_json_field\|^extract_inner_field\|^extract_inner_result" scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 148

🏁 Script executed:

rg "function extract_json_field|function extract_inner_field|function extract_inner_result" scripts/mcp/ -A 5

Repository: ProxySQL/proxysql-vec

Length of output: 47

🏁 Script executed:

rg "extract_json_field|extract_inner_field|extract_inner_result" scripts/mcp/ -B 2 -A 2 | head -100

Repository: ProxySQL/proxysql-vec

Length of output: 5618

🏁 Script executed:

rg "^fts_tool_call\(|function fts_tool_call" scripts/mcp/ -A 10

Repository: ProxySQL/proxysql-vec

Length of output: 573

🏁 Script executed:

sed -n '500,520p' scripts/mcp/test_mcp_fts.sh sed -n '530,555p' scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 1494

🏁 Script executed:

sed -n '156,195p' scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 942

🏁 Script executed:

rg "check_json_bool" scripts/mcp/test_mcp_fts.sh -B 2 -A 5 | head -80

Repository: ProxySQL/proxysql-vec

Length of output: 1886

🏁 Script executed:

sed -n '200,280p' scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 2369

🏁 Script executed:

sed -n '305,330p' scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 1070

🏁 Script executed:

sed -n '690,710p' scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 732

🏁 Script executed:

sed -n '850,870p' scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 663

🏁 Script executed:

sed -n '775,785p' scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 426

🏁 Script executed:

sed -n '930,945p' scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 551

🏁 Script executed:

# Let's also verify instances that correctly use extract_inner_field to see the pattern rg "extract_inner_field.*total_matches|extract_inner_field.*row_count" scripts/mcp/test_mcp_fts.sh -B 3 -A 1

Repository: ProxySQL/proxysql-vec

Length of output: 588

Use extract_inner_field consistently for MCP tool result fields.

Tool responses from fts_tool_call are nested in the MCP JSON-RPC structure. While check_json_bool correctly unwraps them via extract_inner_result, the flagged lines extract other fields like .total_matches and .row_count using extract_json_field, which operates on the wrapped response and yields empty/null values. This causes numeric comparisons to fail silently.

Replace extract_json_field with extract_inner_field at these locations:

Lines 509-514: .total_matches, .results | length

Lines 541-543: .total_matches

Lines 568-570: .total_matches

Lines 596-598: .total_matches

Lines 695-698: .total_matches

Lines 778-779: .row_count

Lines 855-857: .rebuilt_count

Lines 935-936: .total_matches

🔧 Example fix (apply similarly to all flagged locations)

- total_matches=$(extract_json_field "${response}" ".total_matches") - result_count=$(extract_json_field "${response}" ".results | length") + total_matches=$(extract_inner_field "${response}" ".total_matches") + result_count=$(extract_inner_field "${response}" ".results | length")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Check results

local total_matches

local result_count

total_matches=$(extract_json_field "${response}" ".total_matches")

result_count=$(extract_json_field "${response}" ".results | length")

# Check results

local total_matches

local result_count

total_matches=$(extract_inner_field "${response}" ".total_matches")

result_count=$(extract_inner_field "${response}" ".results | length")

🤖 Prompt for AI Agents

In `@scripts/mcp/test_mcp_fts.sh` around lines 509 - 514, The MCP tool responses are wrapped; replace uses of extract_json_field with extract_inner_field when pulling numeric fields from the wrapped fts_tool_call response (e.g., change total_matches, results | length, row_count, rebuilt_count extractions). Locate occurrences where extract_json_field is used alongside variables like total_matches, result_count, row_count, rebuilt_count (and where check_json_bool already uses extract_inner_result) and swap to extract_inner_field so the code unwraps the MCP JSON-RPC result before evaluating numeric comparisons; keep the same variable names and call sites (e.g., where response is parsed after fts_tool_call and in routines that follow check_json_bool).

coderabbitai · 2026-01-21T14:31:01Z

test/tap/tests/Makefile

+anomaly_detection-t: anomaly_detection-t.cpp $(TAP_LDIR)/libtap.so
+	$(CXX) -DEXCLUDE_TRACKING_VARAIABLES $< ../tap/SQLite3_Server.cpp -I$(CLICKHOUSE_CPP_IDIR) $(IDIRS) $(LDIRS) -L$(CLICKHOUSE_CPP_LDIR) -L$(LZ4_LDIR) $(OPT) $(OBJ) $(MYLIBSJEMALLOC) $(MYLIBS) $(STATIC_LIBS) $(CLICKHOUSE_CPP_LDIR)/libclickhouse-cpp-lib.a $(CLICKHOUSE_CPP_PATH)/contrib/zstd/zstd/libzstdstatic.a $(LZ4_LDIR)/liblz4.a $(SQLITE3_LDIR)/../libsqlite_rembed.a -lscram -lusual -Wl,--allow-multiple-definition -o $@
+


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, verify the file exists and check the specific lines echo "=== Checking Makefile lines 171-173 ===" sed -n '171,173p' test/tap/tests/Makefile echo "" echo "=== Checking Makefile lines 196 (and surrounding context) ===" sed -n '194,198p' test/tap/tests/Makefile echo "" echo "=== Checking which spelling appears in the Makefile ===" grep -n "EXCLUDE_TRACKING_VAR" test/tap/tests/Makefile

Repository: ProxySQL/proxysql-vec

Length of output: 1906

🏁 Script executed:

# Search the codebase for both spellings echo "=== Checking for EXCLUDE_TRACKING_VARAIABLES (misspelled) ===" rg "EXCLUDE_TRACKING_VARAIABLES" --type cpp --type h echo "" echo "=== Checking for EXCLUDE_TRACKING_VARIABLES (correct) ===" rg "EXCLUDE_TRACKING_VARIABLES" --type cpp --type h

Repository: ProxySQL/proxysql-vec

Length of output: 372

Fix typo in preprocessor define: EXCLUDE_TRACKING_VARAIABLES should be EXCLUDE_TRACKING_VARIABLES.

Line 172 uses -DEXCLUDE_TRACKING_VARAIABLES (misspelled), while the codebase defines and checks for -DEXCLUDE_TRACKING_VARIABLES (correct spelling). This typo means the intended preprocessor guard won't work as the code never references the misspelled symbol.

Suggested fix

anomaly_detection-t: anomaly_detection-t.cpp $(TAP_LDIR)/libtap.so - $(CXX) -DEXCLUDE_TRACKING_VARAIABLES $< ../tap/SQLite3_Server.cpp -I$(CLICKHOUSE_CPP_IDIR) $(IDIRS) $(LDIRS) -L$(CLICKHOUSE_CPP_LDIR) -L$(LZ4_LDIR) $(OPT) $(OBJ) $(MYLIBSJEMALLOC) $(MYLIBS) $(STATIC_LIBS) $(CLICKHOUSE_CPP_LDIR)/libclickhouse-cpp-lib.a $(CLICKHOUSE_CPP_PATH)/contrib/zstd/zstd/libzstdstatic.a $(LZ4_LDIR)/liblz4.a $(SQLITE3_LDIR)/../libsqlite_rembed.a -lscram -lusual -Wl,--allow-multiple-definition -o $@ + $(CXX) -DEXCLUDE_TRACKING_VARIABLES $< ../tap/SQLite3_Server.cpp -I$(CLICKHOUSE_CPP_IDIR) $(IDIRS) $(LDIRS) -L$(CLICKHOUSE_CPP_LDIR) -L$(LZ4_LDIR) $(OPT) $(OBJ) $(MYLIBSJEMALLOC) $(MYLIBS) $(STATIC_LIBS) $(CLICKHOUSE_CPP_LDIR)/libclickhouse-cpp-lib.a $(CLICKHOUSE_CPP_PATH)/contrib/zstd/zstd/libzstdstatic.a $(LZ4_LDIR)/liblz4.a $(SQLITE3_LDIR)/../libsqlite_rembed.a -lscram -lusual -Wl,--allow-multiple-definition -o $@

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

anomaly_detection-t: anomaly_detection-t.cpp $(TAP_LDIR)/libtap.so

$(CXX) -DEXCLUDE_TRACKING_VARAIABLES $< ../tap/SQLite3_Server.cpp -I$(CLICKHOUSE_CPP_IDIR) $(IDIRS) $(LDIRS) -L$(CLICKHOUSE_CPP_LDIR) -L$(LZ4_LDIR) $(OPT) $(OBJ) $(MYLIBSJEMALLOC) $(MYLIBS) $(STATIC_LIBS) $(CLICKHOUSE_CPP_LDIR)/libclickhouse-cpp-lib.a $(CLICKHOUSE_CPP_PATH)/contrib/zstd/zstd/libzstdstatic.a $(LZ4_LDIR)/liblz4.a $(SQLITE3_LDIR)/../libsqlite_rembed.a -lscram -lusual -Wl,--allow-multiple-definition -o $@

anomaly_detection-t: anomaly_detection-t.cpp $(TAP_LDIR)/libtap.so

$(CXX) -DEXCLUDE_TRACKING_VARIABLES $< ../tap/SQLite3_Server.cpp -I$(CLICKHOUSE_CPP_IDIR) $(IDIRS) $(LDIRS) -L$(CLICKHOUSE_CPP_LDIR) -L$(LZ4_LDIR) $(OPT) $(OBJ) $(MYLIBSJEMALLOC) $(MYLIBS) $(STATIC_LIBS) $(CLICKHOUSE_CPP_LDIR)/libclickhouse-cpp-lib.a $(CLICKHOUSE_CPP_PATH)/contrib/zstd/zstd/libzstdstatic.a $(LZ4_LDIR)/liblz4.a $(SQLITE3_LDIR)/../libsqlite_rembed.a -lscram -lusual -Wl,--allow-multiple-definition -o $@

🤖 Prompt for AI Agents

In `@test/tap/tests/Makefile` around lines 171 - 173, The Makefile target anomaly_detection-t currently passes the misspelled preprocessor define -DEXCLUDE_TRACKING_VARAIABLES; update that flag to the correctly spelled -DEXCLUDE_TRACKING_VARIABLES in the anomaly_detection-t rule so the code that checks for EXCLUDE_TRACKING_VARIABLES is enabled as intended.

coderabbitai · 2026-01-21T14:31:01Z

test/tap/tests/vector_db_performance-t.cpp


 	ok(db.size() == large_size, "Large dataset (%zu entries) inserted successfully", large_size);
-	diag("Time to insert %zu entries: %lld ms", large_size, insert_duration.count());
+	diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count());


⚠️ Potential issue | 🟠 Major

Incorrect format specifier may cause undefined behavior.

std::chrono::milliseconds::rep is typically int64_t/long long, not long. Using %ld is incorrect on LLP64 platforms (e.g., Windows 64-bit) where long is 32 bits. This also creates inconsistency with line 332 which still uses %lld for similar timing output.

Suggested fix: revert to %lld or cast explicitly

- diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count()); + diag("Time to insert %zu entries: %lld ms", large_size, static_cast<long long>(insert_duration.count()));

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count());

diag("Time to insert %zu entries: %lld ms", large_size, static_cast<long long>(insert_duration.count()));

🤖 Prompt for AI Agents

In `@test/tap/tests/vector_db_performance-t.cpp` at line 325, The diag call uses the wrong printf format specifier for insert_duration.count(); update the format to safely match the underlying type by either using "%lld" and casting insert_duration.count() to (long long) or by casting to long and using "%ld" consistently; change the diag invocation that prints "Time to insert %zu entries: %ld ms" to use a matching specifier and cast (e.g., "%zu entries: %lld ms", (long long)insert_duration.count()) so the format matches insert_duration.count() and aligns with the other timing output.

coderabbitai · 2026-01-21T14:31:01Z

test/tap/tests/vector_db_performance-t.cpp

 	diag("Average time per concurrent operation: %lld microseconds", avg_time);
-	diag("Total time for %d operations: %lld microseconds", num_operations, total_duration.count());
+	diag("Total time for %d operations: %ld microseconds", num_operations, total_duration.count());


⚠️ Potential issue | 🟠 Major

Inconsistent and incorrect format specifier.

Line 380 correctly uses %lld for avg_time (long long), but line 381 uses %ld for total_duration.count() which also returns a 64-bit integer type. This inconsistency and type mismatch should be corrected.

Suggested fix

diag("Average time per concurrent operation: %lld microseconds", avg_time); - diag("Total time for %d operations: %ld microseconds", num_operations, total_duration.count()); + diag("Total time for %d operations: %lld microseconds", num_operations, static_cast<long long>(total_duration.count()));

🤖 Prompt for AI Agents

In `@test/tap/tests/vector_db_performance-t.cpp` around lines 380 - 381, The second diagnostic uses the wrong/ inconsistent printf format for total_duration.count(); change the format to match the 64-bit type (use %lld) and ensure you cast total_duration.count() to (long long) in the diag call so both diag("Average time per concurrent operation: %lld microseconds", avg_time) and diag("Total time for %d operations: %lld microseconds", num_operations, (long long)total_duration.count()) use the same 64-bit specifier and types; locate these calls (diag with avg_time and diag with total_duration.count()) and update the format/specifier and cast accordingly.

Copilot

Pull request overview

This PR addresses security, memory safety, and thread safety issues identified in PR #21 code reviews for the Full-Text Search (FTS) functionality. It also includes additional cherry-picked fixes for test compilation issues.

Changes:

Security fixes: SQL injection prevention via identifier escaping and FTS5 query wrapping, allowlist-based name sanitization
Memory safety fixes: VLA replacement with std::vector, memory leak fixes for error strings
Thread safety fixes: mutex protection for FTS pointer swaps and runtime path changes
Test improvements: format specifier corrections, missing includes, build rule additions
New comprehensive test scripts for FTS functionality validation

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
test/tap/tests/vector_db_performance-t.cpp	Fixed printf format specifiers for duration types, added missing includes
test/tap/tests/anomaly_detection-t.cpp	Added stub function for SQLite3_Server_session_handler
test/tap/tests/ai_llm_retry_scenarios-t.cpp	Added missing string header include
test/tap/tests/Makefile	Added build rule for anomaly_detection-t test
scripts/mcp/test_mcp_fts_detailed.sh	New detailed FTS validation script (has security issue)
scripts/mcp/test_mcp_fts.sh	New comprehensive FTS test script
lib/MySQL_FTS.cpp	New FTS implementation with security fixes
lib/MySQL_FTS.h	New FTS class header
lib/MySQL_Tool_Handler.cpp	FTS integration with mutex protection
lib/MySQL_Tool_Handler.h	FTS method declarations
lib/Query_Tool_Handler.cpp	FTS tool registration, debug statement removal
lib/ProxySQL_MCP_Server.cpp	FTS path configuration parameter
lib/MCP_Thread.cpp	FTS path variable handling
lib/MCP_Thread.h	FTS path variable declaration
lib/Makefile	Added MySQL_FTS.oo to build
include/MySQL_Tool_Handler.h	FTS declarations and method signatures
include/MySQL_FTS.h	Complete FTS class interface
include/MCP_Thread.h	FTS path variable in struct
doc/MCP/FTS_USER_GUIDE.md	Comprehensive FTS user guide

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-21T14:32:24Z

test/tap/tests/vector_db_performance-t.cpp


 	diag("Average time per concurrent operation: %lld microseconds", avg_time);
-	diag("Total time for %d operations: %lld microseconds", num_operations, total_duration.count());
+	diag("Total time for %d operations: %ld microseconds", num_operations, total_duration.count());


The printf format specifier %lld is correct for long long, but std::chrono::duration::count() returns rep type which is implementation-defined. On some platforms it may be long rather than long long, causing format mismatch warnings. Consider using static_cast<long long>(total_duration.count()) for portability.

Suggested change

diag("Total time for %d operations: %ld microseconds", num_operations, total_duration.count());

diag("Total time for %d operations: %lld microseconds", num_operations, static_cast<long long>(total_duration.count()));

Copilot · 2026-01-21T14:32:24Z

scripts/mcp/test_mcp_fts_detailed.sh

+
+mysql_exec() {
+  local sql="$1"
+  mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" -e "${sql}"


The MySQL password is exposed in the command line arguments using -p"${MYSQL_PASSWORD}". This is a security vulnerability as the password can be seen in process listings by other users. The script should use the MYSQL_PWD environment variable instead, which is already done in other parts of the script. Change this line to: MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -e "${sql}"

Suggested change

mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" -e "${sql}"

MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -e "${sql}"

Copilot · 2026-01-21T14:32:24Z

lib/MySQL_FTS.cpp

+
+	if (error || !resultset || resultset->rows.empty()) {
+		result["error"] = "Index not found for " + schema + "." + table;
+		if (resultset) delete resultset;


Missing error cleanup in reindex function. If an error is set from execute_statement (line 726), it should be freed before returning. Add if (error) (*proxy_sqlite3_free)(error); after line 730 and before returning the error result.

Suggested change

if (resultset) delete resultset;

if (resultset) delete resultset;

if (error) (*proxy_sqlite3_free)(error);

Copilot · 2026-01-21T14:32:24Z

lib/MySQL_FTS.cpp

+std::string MySQL_FTS::sanitize_name(const std::string& name) {
+	const size_t MAX_NAME_LEN = 100;
+	std::string sanitized;
+	// Allowlist: only ASCII letters, digits, underscore
+	for (char c : name) {
+		if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
+		    (c >= '0' && c <= '9') || c == '_') {
+			sanitized.push_back(c);
+		}
+	}
+	// Prevent leading digit (SQLite identifiers can't start with digit)
+	if (!sanitized.empty() && sanitized[0] >= '0' && sanitized[0] <= '9') {
+		sanitized.insert(sanitized.begin(), '_');
+	}
+	// Enforce maximum length
+	if (sanitized.length() > MAX_NAME_LEN) sanitized = sanitized.substr(0, MAX_NAME_LEN);
+	return sanitized;
+}


The sanitize_name function could return an empty string if the input contains no valid characters. An empty sanitized name could lead to issues when constructing table names. Consider adding validation to ensure the sanitized name is not empty, or prepend a default prefix like "fts_" when the result is empty.

Copilot · 2026-01-21T14:32:25Z

test/tap/tests/vector_db_performance-t.cpp


 	ok(db.size() == large_size, "Large dataset (%zu entries) inserted successfully", large_size);
-	diag("Time to insert %zu entries: %lld ms", large_size, insert_duration.count());
+	diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count());


The printf format specifier %lld is correct for long long, but std::chrono::duration::count() returns rep type which is implementation-defined. On some platforms it may be long rather than long long, causing format mismatch warnings. Consider using static_cast<long long>(insert_duration.count()) for portability.

Suggested change

diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count());

diag("Time to insert %zu entries: %lld ms", large_size, static_cast<long long>(insert_duration.count()));

This commit addresses all recommendations from CodeRabbit, Gemini Code Assist, and Copilot for PR #25 (FTS security and code quality improvements). Critical Security Fixes: - MCP_Thread.cpp: Rollback fts_path on reset failure to keep config consistent - MySQL_FTS.cpp: Add escape_mysql_identifier() for MySQL query identifier escaping - MySQL_FTS.cpp: Add unique hash-based fallback to sanitize_name() for empty strings - MySQL_FTS.cpp: Add where_clause validation to block dangerous SQL patterns Memory Safety Fixes: - MySQL_FTS.cpp: Fix indexes_result memory leak on early return in search() - MySQL_FTS.h: Delete copy/move operations to prevent accidental resource duplication Thread Safety Documentation: - MySQL_Tool_Handler.cpp: Add comment explaining FTS lock design rationale Test Script Improvements: - test_mcp_fts.sh: Add curl timeouts (5s connect, 30s max) - test_mcp_fts.sh: Remove unused delete_response variable - test_mcp_fts_detailed.sh: Make cleanup tolerant of non-existent indexes Build Fixes: - Makefile: Fix EXCLUDE_TRACKING_VARAIABLES typo to EXCLUDE_TRACKING_VARIABLES - vector_db_performance-t.cpp: Fix printf format specifiers to %lld with cast Schema Fixes: - Query_Tool_Handler.cpp: Change fts_index_table columns schema from string to array Code Cleanup: - MySQL_Tool_Handler.cpp: Remove all remaining debug fprintf statements (34 lines) - Documentation: Change "Full Text" to "Full-Text" (hyphenated) Total: ~50 fixes across 10 files

rahim-kanji and others added 8 commits January 20, 2026 14:19

Add FTS_User_Guide.md

8e2230c

Merge remote-tracking branch 'v3.1-vec' into v3.1_fts-support

18dd249

Merge pull request #21 from rahim-kanji/v3.1_fts-support

a831670

Add full-text search (FTS) tools to MCP query server

fix: Missing <string> header in ai_llm_retry_scenarios-t

fd5d433

- Add #include <string> for C++ std::string support Resolves compilation errors for ai_llm_retry_scenarios-t test.

renecannao requested a review from Copilot January 21, 2026 14:18

Copilot started reviewing on behalf of renecannao January 21, 2026 14:18 View session

gemini-code-assist bot reviewed Jan 21, 2026

View reviewed changes

coderabbitai bot reviewed Jan 21, 2026

View reviewed changes

Copilot AI reviewed Jan 21, 2026

View reviewed changes

renecannao merged commit f257c6a into v3.1-vec Jan 21, 2026

		anomaly_detection-t: anomaly_detection-t.cpp $(TAP_LDIR)/libtap.so
		$(CXX) -DEXCLUDE_TRACKING_VARAIABLES $< ../tap/SQLite3_Server.cpp -I$(CLICKHOUSE_CPP_IDIR) $(IDIRS) $(LDIRS) -L$(CLICKHOUSE_CPP_LDIR) -L$(LZ4_LDIR) $(OPT) $(OBJ) $(MYLIBSJEMALLOC) $(MYLIBS) $(STATIC_LIBS) $(CLICKHOUSE_CPP_LDIR)/libclickhouse-cpp-lib.a $(CLICKHOUSE_CPP_PATH)/contrib/zstd/zstd/libzstdstatic.a $(LZ4_LDIR)/liblz4.a $(SQLITE3_LDIR)/../libsqlite_rembed.a -lscram -lusual -Wl,--allow-multiple-definition -o $@

	diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count());
	diag("Time to insert %zu entries: %lld ms", large_size, static_cast<long long>(insert_duration.count()));

	diag("Total time for %d operations: %ld microseconds", num_operations, total_duration.count());
	diag("Total time for %d operations: %lld microseconds", num_operations, static_cast<long long>(total_duration.count()));

	mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" -e "${sql}"
	MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -e "${sql}"

	if (resultset) delete resultset;
	if (resultset) delete resultset;
	if (error) (*proxy_sqlite3_free)(error);

Conversation

renecannao commented Jan 21, 2026

Summary

PR #21 Review Fixes

Security Fixes (Critical)

Memory Safety Fixes (High)

Thread Safety Fixes (High)

Code Cleanup

Test Script Security Fixes

Documentation Fixes

Files Modified

Additional Fixes

Verification

Uh oh!

gemini-code-assist bot commented Jan 21, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

coderabbitai bot commented Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Walkthrough

Changes

Sequence Diagrams

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 21, 2026

coderabbitai bot commented Jan 21, 2026 •

edited

Loading