Skip to content

Add full-text search (FTS) tools to MCP query server#21

Merged
renecannao merged 3 commits intoProxySQL:v3.1-vec_21from
rahim-kanji:v3.1_fts-support
Jan 21, 2026
Merged

Add full-text search (FTS) tools to MCP query server#21
renecannao merged 3 commits intoProxySQL:v3.1-vec_21from
rahim-kanji:v3.1_fts-support

Conversation

@rahim-kanji
Copy link

Summary

  • Add full-text search (FTS) capability with BM25 ranking for MySQL/MariaDB tables
  • Implement external SQLite-based FTS index that doesn't require native MySQL FTS support
  • Add 6 new FTS tools for index management and searching

New Tools Added

Tool Description
fts_index_table Create and populate FTS index for a table
fts_search Search indexed data with BM25 ranking
fts_reindex Refresh index with fresh data
fts_delete_index Remove an FTS index
fts_list_indexes List all FTS indexes
fts_rebuild_all Rebuild all FTS indexes

Key Features

  • Thread-safe FTS operations with mutex protection
  • Optional WHERE clause support for partial table indexing
  • Primary key mapping for source data retrieval
  • Configurable FTS database path (runtime resettable)
  • Comprehensive test coverage with 2 test scripts

Technical Details

  • Uses SQLite FTS5 extension for inverted index
  • BM25 algorithm for relevance ranking
  • Stores indexed columns + metadata externally
  • Maintains 1:1 mapping with MySQL primary keys
  • Separate from MySQL - no performance impact on writes

Test Plan

  • scripts/mcp/test_mcp_fts.sh - Basic FTS functionality
  • scripts/mcp/test_mcp_fts_detailed.sh - Comprehensive FTS testing

Implement BM25-ranked full-text search capability for MySQL/MariaDB tables using SQLite-based external FTS index.

Changes:
- Add MySQL_FTS class for managing SQLite FTS indexes
- Add FTS tools: fts_index_table, fts_search, fts_reindex, fts_delete_index, fts_list_indexes, fts_rebuild_all
- Add thread-safe FTS lifecycle management with fts_lock mutex
- Add reset_fts_path() for runtime FTS database path configuration
- Add comprehensive FTS test scripts (test_mcp_fts.sh, test_mcp_fts_detailed.sh)
@coderabbitai
Copy link

coderabbitai bot commented Jan 20, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added Full Text Search (FTS): index, search, list, delete, reindex, rebuild operations and runtime-configurable FTS database path.
  • Tests

    • Added extensive automated and detailed FTS test suites covering index lifecycle, searches (phrase, pagination, BM25), snippets, special cases, and config reloads.
  • Documentation

    • Added comprehensive FTS user guide with usage, configuration, examples, and troubleshooting.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Adds SQLite-backed Full Text Search (FTS): new mcp_fts_path config, MySQL_Tool_Handler FTS wiring and APIs, MCP variable get/set support, Query tools registration, build integration, MySQL_FTS implementation, tests, and user documentation.

Changes

Cohort / File(s) Summary
Configuration Headers
include/MCP_Thread.h, include/MySQL_Tool_Handler.h
Added mcp_fts_path config to MCP_Threads_Handler; extended MySQL_Tool_Handler API and members (FTS pointer, fts_lock), constructor now accepts fts_path, reset_fts_path, reinit_fts, multiple fts_* public methods; execute_query moved to public.
MCP Thread Logic
lib/MCP_Thread.cpp
Registered fts_path variable, defaulted to mcp_fts.db, cleaned up on destroy, added get_variable("fts_path") and set_variable("fts_path") invoking mysql_tool_handler->reset_fts_path(...) at runtime.
MySQL Tool Implementation
lib/MySQL_Tool_Handler.cpp
Implemented FTS lifecycle: constructor/destructor handling, conditional FTS creation, init() integration, reset_fts_path, reinit_fts, and thread-safe wrappers (fts_index_table, fts_search, fts_list_indexes, fts_delete_index, fts_reindex, fts_rebuild_all) with filesystem checks and locking.
FTS Core
include/MySQL_FTS.h, lib/MySQL_FTS.cpp
New MySQL_FTS class and implementation: SQLite-backed index catalog, index/create/delete/reindex/rebuild/search/list APIs, schema/table helpers, JSON results, integration points accepting MySQL_Tool_Handler for data extraction.
Tool Registration
lib/Query_Tool_Handler.cpp
Registered and implemented MCP tools: fts_index_table, fts_search, fts_list_indexes, fts_delete_index, fts_reindex, fts_rebuild_all; parameter extraction and dispatch to MySQL_Tool_Handler.
Build & Integration
lib/Makefile, lib/ProxySQL_MCP_Server.cpp
Added MySQL_FTS.oo to lib objects; updated MySQL_Tool_Handler constructor call sites to pass mcp_fts_path.
Tests
scripts/mcp/test_mcp_fts.sh, scripts/mcp/test_mcp_fts_detailed.sh
Added extensive integration and detailed test suites covering index lifecycle, search variants (BM25, snippets, pagination, special chars), reindex/delete/rebuild, config reloads, custom DB paths, and setup/teardown helpers.
Documentation
doc/MCP/FTS_USER_GUIDE.md
New comprehensive user guide for MCP FTS: architecture, configuration, APIs, examples, troubleshooting, and testing instructions.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client/MCP
    participant QTH as Query_Tool_Handler
    participant MTH as MySQL_Tool_Handler
    participant FTS as MySQL_FTS
    participant DB as SQLite

    rect rgba(200,200,255,0.5)
    Client->>QTH: fts_index_table(schema, table, columns)
    QTH->>MTH: fts_index_table(...) [acquire fts_lock]
    MTH->>FTS: index_table(...)
    FTS->>DB: write index metadata / store index
    DB-->>FTS: OK
    FTS-->>MTH: JSON result
    MTH-->>QTH: return result
    QTH-->>Client: JSON response
    end

    rect rgba(200,255,200,0.5)
    Client->>QTH: fts_search(query, schema, table)
    QTH->>MTH: fts_search(...) [acquire fts_lock]
    MTH->>FTS: search(...)
    FTS->>DB: SELECT using FTS (MATCH/snippet)
    DB-->>FTS: rows + snippets
    FTS-->>MTH: JSON result
    MTH-->>QTH: return result
    QTH-->>Client: JSON response
    end
Loading
sequenceDiagram
    participant Admin as Admin/Config
    participant MCP as MCP_Threads_Handler
    participant MTH as MySQL_Tool_Handler
    participant OldFTS as MySQL_FTS (old)
    participant NewFTS as MySQL_FTS (new)

    rect rgba(255,230,200,0.5)
    Admin->>MCP: set_variable("fts_path", new_path)
    MCP->>MTH: reset_fts_path(new_path)
    MTH->>MTH: validate directory
    MTH->>NewFTS: create & initialize
    NewFTS-->>MTH: initialized
    MTH->>MTH: acquire fts_lock
    MTH->>OldFTS: delete old instance (if any)
    MTH->>MTH: swap fts pointer to NewFTS
    MTH->>MTH: release fts_lock
    MTH-->>MCP: success
    MCP-->>Admin: variable updated
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

Poem

🐰 I hopped through code, found FTS ground,
I planted indexes, snippets abound,
Locks snugly held, swaps done with care,
Queries return highlights everywhere,
A tiny rabbit cheered — search now shared!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 51.32% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately summarizes the main change: adding full-text search (FTS) tools to the MCP query server, which aligns with the comprehensive FTS implementation across multiple files.
Description check ✅ Passed The PR description is well-related to the changeset, providing a clear summary of FTS capabilities, new tools, features, and test coverage that matches the actual implementation changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🤖 Fix all issues with AI agents
In `@lib/MCP_Thread.cpp`:
- Around line 333-347: Currently the code updates variables.mcp_fts_path (via
strdup/free) before calling mysql_tool_handler->reset_fts_path(value), which can
leave the stored config out of sync if reset_fts_path fails; change the flow so
you attempt the runtime update first (call
mysql_tool_handler->reset_fts_path(value) while leaving variables.mcp_fts_path
unchanged), and only on success free() the old variables.mcp_fts_path and assign
strdup(value), returning 0; if reset_fts_path fails, do not modify
variables.mcp_fts_path and return -1. Reference symbols: variables.mcp_fts_path,
mysql_tool_handler, reset_fts_path, strdup, free.

In `@lib/MySQL_Tool_Handler.cpp`:
- Around line 117-137: reset_fts_path deletes the existing FTS before validating
the new path; instead allocate a temporary MySQL_FTS* (e.g., new_fts) with the
provided path, call new_fts->init(), and if init() fails call proxy_error,
delete new_fts and return false while leaving the original fts intact; only on
successful init swap under the existing pthread_mutex_lock (&fts_lock) by
deleting the old fts, assigning fts = new_fts, and then unlock and return true;
also preserve current behavior when path is empty (delete existing fts) and
ensure no double-free by only deleting the old instance after new_fts
successfully initialized.
- Around line 1142-1178: reinit_fts currently mutates the shared
pointer/variable fts without acquiring the FTS mutex; to fix, perform the
expensive new MySQL_FTS(fts_path) and new_fts->init() before taking the lock,
then acquire fts_lock (the same mutex used by other FTS calls) and inside the
critical section delete the old fts and assign fts = new_fts, then release the
lock; use the same locking primitive as the rest of the code (e.g.,
std::lock_guard or equivalent on fts_lock) to ensure no race with in-flight
operations while avoiding holding the lock during initialization.

In `@scripts/mcp/test_mcp_fts_detailed.sh`:
- Around line 71-74: The mysql_exec function exposes MYSQL_PASSWORD on the
process command line via the -p option; change it to avoid inline password usage
by using a safer method such as setting MYSQL_PWD in the environment before
invoking mysql or by using a MySQL option file, e.g., export
MYSQL_PWD="${MYSQL_PASSWORD}" && mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u
"${MYSQL_USER}" -e "${sql}" (and unset MYSQL_PWD afterward); update the
mysql_exec function and apply the same change to the similar invocation in
test_mcp_fts.sh to remove -p"${MYSQL_PASSWORD}" and prevent password leakage to
ps listings.
- Around line 251-260: The cleanup block currently unconditionally attempts to
delete fts_test.orders although it was already deleted earlier; update the
cleanup logic around the calls to tool_call "fts_delete_index" (and the
subsequent extract_tool_result) to either first check existence (e.g., call a
check/existence tool or API) and only call fts_delete_index for existing
indexes, or relax the jq assertion to accept either a success result or the
expected "index not found" error response (e.g., accept .success == true OR
.error indicates non-existent index) so the second delete does not cause the
test to fail.

In `@scripts/mcp/test_mcp_fts.sh`:
- Around line 992-993: The mysql invocations in test_fts_custom_database_path
(e.g., the command populating current_path and other mysql calls) expose the
password on the command line; replace direct use of -p"${MYSQL_PASSWORD}" with a
secure approach such as exporting MYSQL_PWD="${MYSQL_PASSWORD}" for the command
(e.g., MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" ...) or implement
a small helper function (e.g., safe_mysql()) that injects MYSQL_PWD into the
environment and runs mysql, then update all occurrences (current_path assignment
and the other mysql calls referenced) to use that helper to avoid password
exposure. Ensure the helper preserves -s -N and 2>/dev/null flags used in
existing calls.
- Around line 279-283: The mysql_exec function currently exposes the password on
the command line via -p"${MYSQL_PASSWORD}"; update mysql_exec to avoid that by
passing the password through the MYSQL_PWD environment variable instead (e.g.,
invoke mysql with MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P
"${MYSQL_PORT}" -u "${MYSQL_USER}" -e "${sql}" 2>/dev/null) so the password is
not visible in process listings; keep the function name mysql_exec and the same
parameters but remove the inline -p argument.
- Around line 341-343: The INSERT builds SQL with unescaped variables (title,
doc, category, priority) which allows quotes to break the statement; before
calling mysql_exec in the loop, sanitize each variable by replacing every
single-quote (') with two single-quotes ('') so the SQL string is safe (e.g.,
transform ${doc} -> escaped_doc), then use those escaped variables in the
INSERT; update the code around the mysql_exec call and reference the mysql_exec
invocation and the TEST_SCHEMA/TEST_TABLE variables when making the change.
🧹 Nitpick comments (7)
lib/Query_Tool_Handler.cpp (2)

278-300: Remove debug fprintf statements before merging.

These debug statements will pollute stderr in production. Convert to proxy_debug() calls for consistency with the rest of the codebase, or remove entirely.

♻️ Suggested fix
 static std::string get_json_string(const json& j, const std::string& key, const std::string& default_val = "") {
-	fprintf(stderr, "DEBUG: get_json_string key=%s, default='%s'\n", key.c_str(), default_val.c_str());
 	if (j.contains(key)) {
 		const json& val = j[key];
-		fprintf(stderr, "DEBUG: key exists, is_null=%d, is_string=%d\n", val.is_null(), val.is_string());
 		if (!val.is_null()) {
 			if (val.is_string()) {
 				std::string result = val.get<std::string>();
-				fprintf(stderr, "DEBUG: returning string: '%s'\n", result.c_str());
 				return result;
 			} else {
-				fprintf(stderr, "DEBUG: value is not a string, trying dump\n");
 				std::string result = val.dump();
-				fprintf(stderr, "DEBUG: returning dumped: '%s'\n", result.c_str());
 				return result;
 			}
 		}
 	}
-	fprintf(stderr, "DEBUG: returning default: '%s'\n", default_val.c_str());
 	return default_val;
 }

310-312: Debug fprintf in execute_tool should also be removed.

Same issue as the helper function - this debug statement will pollute stderr.

♻️ Suggested fix
 json Query_Tool_Handler::execute_tool(const std::string& tool_name, const json& arguments) {
-	fprintf(stderr, "DEBUG: execute_tool tool_name=%s, arguments=%s\n", tool_name.c_str(), arguments.dump().c_str());
-
 	if (!mysql_handler) {
 		return create_error_response("MySQL handler not initialized");
 	}
scripts/mcp/test_mcp_fts_detailed.sh (2)

101-106: Quote CURL_OPTS to prevent word splitting issues.

When CURL_OPTS is empty, the unquoted expansion can cause unexpected behavior. Use an array or conditional inclusion.

♻️ Suggested fix
 mcp_request() {
   local payload="$1"
-  curl ${CURL_OPTS} -s -X POST "${MCP_ENDPOINT}" \
+  curl ${CURL_OPTS:+"${CURL_OPTS}"} -s -X POST "${MCP_ENDPOINT}" \
     -H "Content-Type: application/json" \
     -d "${payload}"
 }

Or use an array approach for cleaner handling:

if [ "${USE_SSL}" = "true" ]; then
  CURL_OPTS=(-k)
else
  CURL_OPTS=()
fi

# Then use:
curl "${CURL_OPTS[@]}" -s -X POST ...

162-162: Columns argument passed as JSON string literal may cause issues.

The columns argument is passed as a raw JSON array string '["name","email","created_at"]' which is then interpolated into another JSON string. This works but is fragile. Consider validating the JSON structure.

scripts/mcp/test_mcp_fts.sh (3)

58-58: Unused variable FTS_INDEX_NAME.

This variable is defined but never used. Either remove it or use it where ${TEST_SCHEMA}.${TEST_TABLE} is repeated.

-FTS_INDEX_NAME="${TEST_SCHEMA}.${TEST_TABLE}"

74-81: Unused associative array SEARCH_QUERIES.

The SEARCH_QUERIES array is defined but the tests use hardcoded query strings instead. Remove or utilize this array for consistency.


356-359: Unused variable delete_response - consider checking for errors.

The response from fts_delete_index is captured but not validated. While cleanup failures may be acceptable, logging the error could help debugging.

♻️ Suggested improvement
     # Drop FTS index if exists
     local delete_response
     delete_response=$(fts_tool_call "fts_delete_index" "{\"schema\": \"${TEST_SCHEMA}\", \"table\": \"${TEST_TABLE}\"}")
+    log_verbose "FTS index delete response: ${delete_response}"

Comment on lines +333 to +347
if (!strcmp(name, "fts_path")) {
if (variables.mcp_fts_path)
free(variables.mcp_fts_path);
variables.mcp_fts_path = strdup(value);
// Apply at runtime by resetting FTS in the existing handler
if (mysql_tool_handler) {
proxy_info("MCP: Applying new fts_path at runtime: %s\n", value);
if (!mysql_tool_handler->reset_fts_path(value)) {
proxy_error("Failed to reset FTS path at runtime\n");
return -1;
}
}

return 0;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid updating mcp_fts_path when runtime reset fails.

The variable is updated before reset_fts_path succeeds, which can leave config out of sync with the live FTS handler. Swap the stored value only after a successful reset (or roll back on failure).

🔧 Suggested fix (update after successful reset)
 if (!strcmp(name, "fts_path")) {
-	if (variables.mcp_fts_path)
-		free(variables.mcp_fts_path);
-	variables.mcp_fts_path = strdup(value);
-	// Apply at runtime by resetting FTS in the existing handler
-	if (mysql_tool_handler) {
-		proxy_info("MCP: Applying new fts_path at runtime: %s\n", value);
-		if (!mysql_tool_handler->reset_fts_path(value)) {
-			proxy_error("Failed to reset FTS path at runtime\n");
-			return -1;
-		}
-	}
-
-	return 0;
+	char* new_path = strdup(value);
+	if (!new_path)
+		return -1;
+	// Apply at runtime by resetting FTS in the existing handler
+	if (mysql_tool_handler) {
+		proxy_info("MCP: Applying new fts_path at runtime: %s\n", value);
+		if (!mysql_tool_handler->reset_fts_path(value)) {
+			proxy_error("Failed to reset FTS path at runtime\n");
+			free(new_path);
+			return -1;
+		}
+	}
+	if (variables.mcp_fts_path)
+		free(variables.mcp_fts_path);
+	variables.mcp_fts_path = new_path;
+	return 0;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (!strcmp(name, "fts_path")) {
if (variables.mcp_fts_path)
free(variables.mcp_fts_path);
variables.mcp_fts_path = strdup(value);
// Apply at runtime by resetting FTS in the existing handler
if (mysql_tool_handler) {
proxy_info("MCP: Applying new fts_path at runtime: %s\n", value);
if (!mysql_tool_handler->reset_fts_path(value)) {
proxy_error("Failed to reset FTS path at runtime\n");
return -1;
}
}
return 0;
}
if (!strcmp(name, "fts_path")) {
char* new_path = strdup(value);
if (!new_path)
return -1;
// Apply at runtime by resetting FTS in the existing handler
if (mysql_tool_handler) {
proxy_info("MCP: Applying new fts_path at runtime: %s\n", value);
if (!mysql_tool_handler->reset_fts_path(value)) {
proxy_error("Failed to reset FTS path at runtime\n");
free(new_path);
return -1;
}
}
if (variables.mcp_fts_path)
free(variables.mcp_fts_path);
variables.mcp_fts_path = new_path;
return 0;
}
🤖 Prompt for AI Agents
In `@lib/MCP_Thread.cpp` around lines 333 - 347, Currently the code updates
variables.mcp_fts_path (via strdup/free) before calling
mysql_tool_handler->reset_fts_path(value), which can leave the stored config out
of sync if reset_fts_path fails; change the flow so you attempt the runtime
update first (call mysql_tool_handler->reset_fts_path(value) while leaving
variables.mcp_fts_path unchanged), and only on success free() the old
variables.mcp_fts_path and assign strdup(value), returning 0; if reset_fts_path
fails, do not modify variables.mcp_fts_path and return -1. Reference symbols:
variables.mcp_fts_path, mysql_tool_handler, reset_fts_path, strdup, free.

Comment on lines +117 to +137
bool MySQL_Tool_Handler::reset_fts_path(const std::string& path) {
pthread_mutex_lock(&fts_lock);

if (fts) {
delete fts;
fts = NULL;
}

if (!path.empty()) {
fts = new MySQL_FTS(path);
if (fts->init()) {
proxy_error("Failed to initialize FTS with new path: %s\n", path.c_str());
delete fts;
fts = NULL;
pthread_mutex_unlock(&fts_lock);
return false;
}
}

pthread_mutex_unlock(&fts_lock);
return true;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Preserve existing FTS if the new path fails to init.

Currently the old FTS instance is deleted before the new one is validated, which can disable FTS on a bad path. Swap only after a successful init.

🔧 Suggested fix (safe swap)
 bool MySQL_Tool_Handler::reset_fts_path(const std::string& path) {
 	pthread_mutex_lock(&fts_lock);

-	if (fts) {
-		delete fts;
-		fts = NULL;
-	}
-
-	if (!path.empty()) {
-		fts = new MySQL_FTS(path);
-		if (fts->init()) {
-			proxy_error("Failed to initialize FTS with new path: %s\n", path.c_str());
-			delete fts;
-			fts = NULL;
-			pthread_mutex_unlock(&fts_lock);
-			return false;
-		}
-	}
+	MySQL_FTS* new_fts = NULL;
+	if (!path.empty()) {
+		new_fts = new MySQL_FTS(path);
+		if (!new_fts || new_fts->init()) {
+			proxy_error("Failed to initialize FTS with new path: %s\n", path.c_str());
+			delete new_fts;
+			pthread_mutex_unlock(&fts_lock);
+			return false;
+		}
+	}
+	if (fts) {
+		delete fts;
+	}
+	fts = new_fts;

 	pthread_mutex_unlock(&fts_lock);
 	return true;
 }
🤖 Prompt for AI Agents
In `@lib/MySQL_Tool_Handler.cpp` around lines 117 - 137, reset_fts_path deletes
the existing FTS before validating the new path; instead allocate a temporary
MySQL_FTS* (e.g., new_fts) with the provided path, call new_fts->init(), and if
init() fails call proxy_error, delete new_fts and return false while leaving the
original fts intact; only on successful init swap under the existing
pthread_mutex_lock (&fts_lock) by deleting the old fts, assigning fts = new_fts,
and then unlock and return true; also preserve current behavior when path is
empty (delete existing fts) and ensure no double-free by only deleting the old
instance after new_fts successfully initialized.

Comment on lines +1142 to +1178
int MySQL_Tool_Handler::reinit_fts(const std::string& fts_path) {
proxy_info("MySQL_Tool_Handler: Reinitializing FTS with path: %s\n", fts_path.c_str());

// Check if directory exists (SQLite can't create directories)
std::string::size_type last_slash = fts_path.find_last_of("/");
if (last_slash != std::string::npos && last_slash > 0) {
std::string dir = fts_path.substr(0, last_slash);
struct stat st;
if (stat(dir.c_str(), &st) != 0 || !S_ISDIR(st.st_mode)) {
proxy_error("MySQL_Tool_Handler: Directory does not exist for path '%s' (directory: '%s')\n",
fts_path.c_str(), dir.c_str());
return -1;
}
}

// First, test if we can open the new database
MySQL_FTS* new_fts = new MySQL_FTS(fts_path);
if (!new_fts) {
proxy_error("MySQL_Tool_Handler: Failed to create new FTS handler\n");
return -1;
}

if (new_fts->init() != 0) {
proxy_error("MySQL_Tool_Handler: Failed to initialize FTS at %s\n", fts_path.c_str());
delete new_fts;
return -1; // Return error WITHOUT closing old FTS
}

// Success! Now close old and replace with new
if (fts) {
delete fts;
}
fts = new_fts;

proxy_info("MySQL_Tool_Handler: FTS reinitialized successfully at %s\n", fts_path.c_str());
return 0;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard reinit_fts with fts_lock to prevent races.

This method mutates fts without acquiring the same mutex used by FTS calls, which can race with in-flight operations. Lock it consistently.

🔧 Suggested fix (lock around mutation)
 int MySQL_Tool_Handler::reinit_fts(const std::string& fts_path) {
+	pthread_mutex_lock(&fts_lock);
 	proxy_info("MySQL_Tool_Handler: Reinitializing FTS with path: %s\n", fts_path.c_str());

 	// Check if directory exists (SQLite can't create directories)
 	std::string::size_type last_slash = fts_path.find_last_of("/");
 	if (last_slash != std::string::npos && last_slash > 0) {
 		std::string dir = fts_path.substr(0, last_slash);
 		struct stat st;
 		if (stat(dir.c_str(), &st) != 0 || !S_ISDIR(st.st_mode)) {
 			proxy_error("MySQL_Tool_Handler: Directory does not exist for path '%s' (directory: '%s')\n",
 				fts_path.c_str(), dir.c_str());
+			pthread_mutex_unlock(&fts_lock);
 			return -1;
 		}
 	}

 	// First, test if we can open the new database
 	MySQL_FTS* new_fts = new MySQL_FTS(fts_path);
 	if (!new_fts) {
 		proxy_error("MySQL_Tool_Handler: Failed to create new FTS handler\n");
+		pthread_mutex_unlock(&fts_lock);
 		return -1;
 	}

 	if (new_fts->init() != 0) {
 		proxy_error("MySQL_Tool_Handler: Failed to initialize FTS at %s\n", fts_path.c_str());
 		delete new_fts;
+		pthread_mutex_unlock(&fts_lock);
 		return -1;  // Return error WITHOUT closing old FTS
 	}

 	// Success! Now close old and replace with new
 	if (fts) {
 		delete fts;
 	}
 	fts = new_fts;

 	proxy_info("MySQL_Tool_Handler: FTS reinitialized successfully at %s\n", fts_path.c_str());
+	pthread_mutex_unlock(&fts_lock);
 	return 0;
 }
🤖 Prompt for AI Agents
In `@lib/MySQL_Tool_Handler.cpp` around lines 1142 - 1178, reinit_fts currently
mutates the shared pointer/variable fts without acquiring the FTS mutex; to fix,
perform the expensive new MySQL_FTS(fts_path) and new_fts->init() before taking
the lock, then acquire fts_lock (the same mutex used by other FTS calls) and
inside the critical section delete the old fts and assign fts = new_fts, then
release the lock; use the same locking primitive as the rest of the code (e.g.,
std::lock_guard or equivalent on fts_lock) to ensure no race with in-flight
operations while avoiding holding the lock during initialization.

Comment on lines +71 to +74
mysql_exec() {
local sql="$1"
mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" -e "${sql}"
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Password exposed in command line is visible in process list.

Using -p"${MYSQL_PASSWORD}" exposes the password in ps output. Consider using MYSQL_PWD environment variable or a MySQL option file instead.

🔒 Suggested fix using MYSQL_PWD
 mysql_exec() {
   local sql="$1"
-  mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" -e "${sql}"
+  MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -e "${sql}"
 }

Note: The same pattern appears in test_mcp_fts.sh at line 279-282 and should be updated there as well.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
mysql_exec() {
local sql="$1"
mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" -e "${sql}"
}
mysql_exec() {
local sql="$1"
MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -e "${sql}"
}
🤖 Prompt for AI Agents
In `@scripts/mcp/test_mcp_fts_detailed.sh` around lines 71 - 74, The mysql_exec
function exposes MYSQL_PASSWORD on the process command line via the -p option;
change it to avoid inline password usage by using a safer method such as setting
MYSQL_PWD in the environment before invoking mysql or by using a MySQL option
file, e.g., export MYSQL_PWD="${MYSQL_PASSWORD}" && mysql -h "${MYSQL_HOST}" -P
"${MYSQL_PORT}" -u "${MYSQL_USER}" -e "${sql}" (and unset MYSQL_PWD afterward);
update the mysql_exec function and apply the same change to the similar
invocation in test_mcp_fts.sh to remove -p"${MYSQL_PASSWORD}" and prevent
password leakage to ps listings.

Comment on lines +251 to +260
if [ "${CLEANUP}" = "true" ]; then
log "Cleanup: deleting fts_test.customers and fts_test.orders indexes"
delete_resp=$(tool_call "fts_delete_index" '{"schema":"fts_test","table":"customers"}')
delete_resp=$(extract_tool_result "${delete_resp}")
echo "${delete_resp}" | jq -e '.success == true' >/dev/null

delete_resp=$(tool_call "fts_delete_index" '{"schema":"fts_test","table":"orders"}')
delete_resp=$(extract_tool_result "${delete_resp}")
echo "${delete_resp}" | jq -e '.success == true' >/dev/null
fi
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Cleanup deletes fts_test.orders index that was already deleted at line 237.

The test at line 237 deletes fts_test.orders. When --cleanup is specified, line 257-259 attempts to delete it again. While the test verifies .success == true, the backend should return an error for non-existent index (as tested in test_mcp_fts.sh test 18). This assertion may fail.

🐛 Suggested fix - skip already-deleted index or expect failure
 if [ "${CLEANUP}" = "true" ]; then
   log "Cleanup: deleting fts_test.customers and fts_test.orders indexes"
   delete_resp=$(tool_call "fts_delete_index" '{"schema":"fts_test","table":"customers"}')
   delete_resp=$(extract_tool_result "${delete_resp}")
   echo "${delete_resp}" | jq -e '.success == true' >/dev/null

-  delete_resp=$(tool_call "fts_delete_index" '{"schema":"fts_test","table":"orders"}')
-  delete_resp=$(extract_tool_result "${delete_resp}")
-  echo "${delete_resp}" | jq -e '.success == true' >/dev/null
+  # Note: fts_test.orders was already deleted earlier in the test
 fi
🤖 Prompt for AI Agents
In `@scripts/mcp/test_mcp_fts_detailed.sh` around lines 251 - 260, The cleanup
block currently unconditionally attempts to delete fts_test.orders although it
was already deleted earlier; update the cleanup logic around the calls to
tool_call "fts_delete_index" (and the subsequent extract_tool_result) to either
first check existence (e.g., call a check/existence tool or API) and only call
fts_delete_index for existing indexes, or relax the jq assertion to accept
either a success result or the expected "index not found" error response (e.g.,
accept .success == true OR .error indicates non-existent index) so the second
delete does not cause the test to fail.

Comment on lines +279 to +283
mysql_exec() {
local sql="$1"
mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" \
-e "${sql}" 2>/dev/null
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Same password exposure issue as the detailed test script.

🔒 Suggested fix using MYSQL_PWD
 mysql_exec() {
     local sql="$1"
-    mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" \
-        -e "${sql}" 2>/dev/null
+    MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" \
+        -e "${sql}" 2>/dev/null
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
mysql_exec() {
local sql="$1"
mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" \
-e "${sql}" 2>/dev/null
}
mysql_exec() {
local sql="$1"
MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" \
-e "${sql}" 2>/dev/null
}
🤖 Prompt for AI Agents
In `@scripts/mcp/test_mcp_fts.sh` around lines 279 - 283, The mysql_exec function
currently exposes the password on the command line via -p"${MYSQL_PASSWORD}";
update mysql_exec to avoid that by passing the password through the MYSQL_PWD
environment variable instead (e.g., invoke mysql with
MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u
"${MYSQL_USER}" -e "${sql}" 2>/dev/null) so the password is not visible in
process listings; keep the function name mysql_exec and the same parameters but
remove the inline -p argument.

Comment on lines +341 to +343
mysql_exec "INSERT INTO ${TEST_SCHEMA}.${TEST_TABLE} (title, content, category, priority) \
VALUES ('${title}', '${doc}', '${category}', '${priority}');" 2>/dev/null || true
done
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

SQL injection vulnerability in INSERT statement.

The ${doc} variable contains user-defined test data with apostrophes (e.g., "Customer John Smith...") which will break the SQL or could enable injection. Use proper escaping or parameterized queries.

🐛 Suggested fix - escape single quotes
     for doc_id in "${!TEST_DOCUMENTS[@]}"; do
         local doc="${TEST_DOCUMENTS[$doc_id]}"
+        # Escape single quotes for SQL
+        doc="${doc//\'/\'\'}"
         local title="Document ${doc_id}"
         ...
🤖 Prompt for AI Agents
In `@scripts/mcp/test_mcp_fts.sh` around lines 341 - 343, The INSERT builds SQL
with unescaped variables (title, doc, category, priority) which allows quotes to
break the statement; before calling mysql_exec in the loop, sanitize each
variable by replacing every single-quote (') with two single-quotes ('') so the
SQL string is safe (e.g., transform ${doc} -> escaped_doc), then use those
escaped variables in the INSERT; update the code around the mysql_exec call and
reference the mysql_exec invocation and the TEST_SCHEMA/TEST_TABLE variables
when making the change.

Comment on lines +992 to +993
current_path=$(mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" \
-e "SELECT @@mcp-fts_path" -s -N 2>/dev/null | tr -d '\r')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Password exposure in multiple mysql invocations in test_fts_custom_database_path.

Several direct mysql calls in this test function also expose the password via command line. Apply the same MYSQL_PWD fix or create a helper function.

Also applies to: 1005-1006, 1018-1019, 1029-1030, 1089-1092

🤖 Prompt for AI Agents
In `@scripts/mcp/test_mcp_fts.sh` around lines 992 - 993, The mysql invocations in
test_fts_custom_database_path (e.g., the command populating current_path and
other mysql calls) expose the password on the command line; replace direct use
of -p"${MYSQL_PASSWORD}" with a secure approach such as exporting
MYSQL_PWD="${MYSQL_PASSWORD}" for the command (e.g.,
MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" ...) or implement a small
helper function (e.g., safe_mysql()) that injects MYSQL_PWD into the environment
and runs mysql, then update all occurrences (current_path assignment and the
other mysql calls referenced) to use that helper to avoid password exposure.
Ensure the helper preserves -s -N and 2>/dev/null flags used in existing calls.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@doc/MCP/FTS_USER_GUIDE.md`:
- Line 853: Replace the bare URL "https://github.com/ProxySQL/proxysql-vec" with
a labeled Markdown link to satisfy lint and improve rendering; for example
change the line to use a descriptive label like [ProxySQL proxysql-vec GitHub
repository](https://github.com/ProxySQL/proxysql-vec) so the link is not bare.
🧹 Nitpick comments (2)
doc/MCP/FTS_USER_GUIDE.md (2)

32-38: Add language identifiers to fenced code blocks. This is flagged by markdownlint and improves readability/syntax highlighting.

Proposed fix
-```
+```text
 Traditional Query Flow:
 LLM Agent → Full Table Scan → Millions of Rows → Slow Response
@@
-```
+```text
@@
-```
+```text
@@
-```
+```text
@@
-```
+```text
@@
-```
+```text

Also applies to: 46-87, 92-99, 575-577, 773-775, 779-781


1-1: Hyphenate compound adjective “full-text” where it modifies a noun. Keeps phrasing consistent and matches common style guides.

Also applies to: 19-19

## Support

For issues, questions, or contributions:
- GitHub: https://github.com/ProxySQL/proxysql-vec
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Avoid bare URLs in Markdown. Convert to a labeled link to satisfy lint and improve rendering.

Proposed fix
-- GitHub: https://github.com/ProxySQL/proxysql-vec
+- GitHub: [ProxySQL/proxysql-vec](https://github.com/ProxySQL/proxysql-vec)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- GitHub: https://github.com/ProxySQL/proxysql-vec
- GitHub: [ProxySQL/proxysql-vec](https://github.com/ProxySQL/proxysql-vec)
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

853-853: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents
In `@doc/MCP/FTS_USER_GUIDE.md` at line 853, Replace the bare URL
"https://github.com/ProxySQL/proxysql-vec" with a labeled Markdown link to
satisfy lint and improve rendering; for example change the line to use a
descriptive label like [ProxySQL proxysql-vec GitHub
repository](https://github.com/ProxySQL/proxysql-vec) so the link is not bare.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@lib/MySQL_FTS.cpp`:
- Around line 303-316: The query builder in MySQL_FTS.cpp currently appends
identifiers and the where_clause raw (symbols: mysql_query, selected_cols,
schema, table, where_clause), which risks SQL injection; implement an
identifier-escaping helper (e.g., escape_identifier) that doubles any backticks
and use it when writing selected_cols, schema and table into mysql_query, and
stop appending where_clause verbatim—either require the caller to supply a
pre-validated/parameterized filter, validate the clause against an allowed
pattern, or convert to a parameterized WHERE using bound parameters; update the
mysql_query construction to use escape_identifier(schema),
escape_identifier(table), and escaped selected_cols instead of raw values.
- Around line 582-585: The code sets the C-string pointer error and returns
without freeing it, causing a memory leak; in the block where result["error"] is
assigned (the same place where error is checked), call the SQLite free stub
exactly as used in search(): (*proxy_sqlite3_free)(error) before returning
result.dump(); locate the error pointer usage in MySQL_FTS.cpp (the error
variable and the block that builds result["error"]) and add the free call
immediately after constructing the error message (ensuring you do not free error
twice elsewhere).
- Around line 748-756: The code reads reindex_json["error"].get<std::string>()
which can throw if the "error" key is missing; update the failure branch (where
reindex_json is inspected and failed_item is built) to safely obtain the error
string — either check reindex_json.contains("error") before calling get or use
reindex_json.value("error", "unknown error") (or similar default) to populate
failed_item["error"]; keep the rest of the logic (failed.push_back(failed_item),
rebuilt_count handling) unchanged and reference reindex_json, failed_item, and
failed in your change.
- Around line 90-99: sanitize_name currently only replaces '.', '-', and space,
leaving other dangerous characters (backticks, quotes, semicolons, nulls) that
can enable SQL injection when used in CREATE/ALTER statements; change it to an
allowlist-based sanitizer that only permits ASCII letters, digits and
underscore, enforces the first character is a letter, enforces a reasonable max
length, rejects or canonicalizes empty/invalid results (e.g., return error/throw
or prepend a fixed safe prefix), and explicitly strip or reject any NUL bytes or
non-ASCII characters; update the sanitize_name implementation to iterate over
bytes of the input, build a new string containing only [A-Za-z0-9_], ensure the
first character is a letter (or prefix with "t_" if not), truncate to the max
length, and make callers treat an empty/invalid sanitized result as an error
rather than embedding it into SQL.
- Around line 30-33: The init path leaks the allocated SQLite3DB when db->open()
fails; in MySQL_FTS.cpp where you construct SQLite3DB* db and call db->open(),
ensure you free the allocation before returning -1 (either delete db on error or
switch the variable to a smart pointer like std::unique_ptr<SQLite3DB>), and
include the same error logging via proxy_error; update the failure branch around
db->open() to release/delete db (or use unique_ptr so it auto-frees) before
returning.
- Around line 24-37: MySQL_FTS::init currently creates a VLA for db_path and
calls new SQLite3DB() without handling allocation failure; replace the VLA with
a heap buffer using std::vector<char> (copy db_path into it and push a
terminating '\0') and pass path_buf.data() to db->open, and change the
allocation to use new (std::nothrow) SQLite3DB() (or wrap in try/catch) and
check the returned pointer (db) for nullptr before calling db->open; on failure
call proxy_error and return -1; keep the rest of the flow (init_schema)
unchanged and reference the SQLite3DB, db_path, db->open, proxy_error, and
init_schema symbols when making the edits.
- Around line 488-497: The MATCH clause currently interpolates user input via
escape_sql(query) into the constructed search_sql (see fts_table, sanitized_data
and the WHERE ... MATCH usage), which allows FTS5 operator injection; fix by
producing an FTS5-literal-safe value: double any embedded double-quote
characters in the user query and wrap the entire query in double quotes before
inserting into MATCH (i.e., transform query -> "\"" + replace(query, "\"",
"\"\"") + "\""), or better yet switch to a parameterized/bound MATCH parameter
instead of string interpolation; update the code that builds search_sql (the
snippet constructing search_sql and the use of escape_sql) to use the
quoted/doubled-quote form or bind parameters so MATCH receives a literal phrase.
🧹 Nitpick comments (5)
include/MySQL_FTS.h (1)

7-7: Unused include.

<memory> is included but no smart pointers (std::unique_ptr, std::shared_ptr) are used in this class. Consider removing it or switching to smart pointers for SQLite3DB* db member.

lib/MySQL_FTS.cpp (4)

46-61: PRAGMA and index creation failures silently ignored.

Lines 48-50 execute PRAGMA statements without checking return values. While these are typically optional optimizations, failures could indicate underlying issues. Similarly, lines 84-85 don't check if index creation succeeded.


332-392: Batch insert without transaction may be slow and non-atomic.

Rows are inserted individually without an explicit transaction. SQLite auto-commits each statement by default (unless already in a transaction). For large tables, this will be significantly slower and won't be atomic.

Proposed fix - wrap in transaction
 	db->wrlock();
+	db->execute("BEGIN TRANSACTION");

 	try {
 		// ... existing insert loop ...

 		// Update metadata
 		// ...
 		db->execute(metadata_sql.str().c_str());

+		db->execute("COMMIT");
 		db->wrunlock();
 		// ...
 	} catch (const std::exception& e) {
+		db->execute("ROLLBACK");
 		db->wrunlock();
 		// ...
 	}

544-554: In-memory pagination is inefficient for large datasets.

All matching results are fetched into all_results before pagination is applied. For queries matching many rows across multiple indexes, this could consume significant memory. Consider pushing LIMIT and OFFSET to individual index queries or implementing cursor-based pagination.


633-656: Delete operations not wrapped in transaction.

If an error occurs between dropping triggers/tables and deleting metadata, the database could be left in an inconsistent state. Consider wrapping in a transaction.

Comment on lines +24 to +37
int MySQL_FTS::init() {
// Initialize database connection
db = new SQLite3DB();
char path_buf[db_path.size() + 1];
strcpy(path_buf, db_path.c_str());
int rc = db->open(path_buf, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE);
if (rc != SQLITE_OK) {
proxy_error("Failed to open FTS database at %s: %d\n", db_path.c_str(), rc);
return -1;
}

// Initialize schema
return init_schema();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

VLA and potential null pointer dereference.

  1. Line 27 uses a Variable Length Array (VLA) which is a non-standard C++ extension. If db_path is very long, this could cause stack overflow.
  2. Line 29 uses db->open() without checking if new SQLite3DB() returned a valid pointer (though in practice new throws on failure in C++, unless nothrow is used).
Suggested fix using std::vector or c_str() directly
 int MySQL_FTS::init() {
 	// Initialize database connection
 	db = new SQLite3DB();
-	char path_buf[db_path.size() + 1];
-	strcpy(path_buf, db_path.c_str());
-	int rc = db->open(path_buf, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE);
+	int rc = db->open(const_cast<char*>(db_path.c_str()), SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE);
 	if (rc != SQLITE_OK) {
 		proxy_error("Failed to open FTS database at %s: %d\n", db_path.c_str(), rc);
+		delete db;
+		db = NULL;
 		return -1;
 	}

If db->open() requires a mutable buffer, use std::vector<char>:

std::vector<char> path_buf(db_path.begin(), db_path.end());
path_buf.push_back('\0');
int rc = db->open(path_buf.data(), SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE);
🧰 Tools
🪛 Cppcheck (2.19.0)

[warning] 30-30: If memory allocation fails, then there is a possible null pointer dereference

(nullPointerOutOfMemory)

🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 24 - 37, MySQL_FTS::init currently creates a
VLA for db_path and calls new SQLite3DB() without handling allocation failure;
replace the VLA with a heap buffer using std::vector<char> (copy db_path into it
and push a terminating '\0') and pass path_buf.data() to db->open, and change
the allocation to use new (std::nothrow) SQLite3DB() (or wrap in try/catch) and
check the returned pointer (db) for nullptr before calling db->open; on failure
call proxy_error and return -1; keep the rest of the flow (init_schema)
unchanged and reference the SQLite3DB, db_path, db->open, proxy_error, and
init_schema symbols when making the edits.

Comment on lines +30 to +33
if (rc != SQLITE_OK) {
proxy_error("Failed to open FTS database at %s: %d\n", db_path.c_str(), rc);
return -1;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Memory leak on init failure.

If db->open() fails, the allocated SQLite3DB object is not deleted before returning -1, causing a memory leak.

Proposed fix
 	int rc = db->open(path_buf, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE);
 	if (rc != SQLITE_OK) {
 		proxy_error("Failed to open FTS database at %s: %d\n", db_path.c_str(), rc);
+		delete db;
+		db = NULL;
 		return -1;
 	}
🧰 Tools
🪛 Cppcheck (2.19.0)

[warning] 30-30: If memory allocation fails, then there is a possible null pointer dereference

(nullPointerOutOfMemory)

🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 30 - 33, The init path leaks the allocated
SQLite3DB when db->open() fails; in MySQL_FTS.cpp where you construct SQLite3DB*
db and call db->open(), ensure you free the allocation before returning -1
(either delete db on error or switch the variable to a smart pointer like
std::unique_ptr<SQLite3DB>), and include the same error logging via proxy_error;
update the failure branch around db->open() to release/delete db (or use
unique_ptr so it auto-frees) before returning.

Comment on lines +90 to +99
std::string MySQL_FTS::sanitize_name(const std::string& name) {
std::string sanitized = name;
// Replace dots and special characters with underscores
for (size_t i = 0; i < sanitized.length(); i++) {
if (sanitized[i] == '.' || sanitized[i] == '-' || sanitized[i] == ' ') {
sanitized[i] = '_';
}
}
return sanitized;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Incomplete sanitization allows potential SQL injection in table names.

sanitize_name() only replaces ., -, and space characters. It does not handle SQL metacharacters like backticks, quotes, semicolons, or null bytes. Since the result is used directly in SQL statements (e.g., CREATE TABLE ... << sanitized_data), malicious schema/table names could inject SQL.

Proposed fix - use allowlist approach
 std::string MySQL_FTS::sanitize_name(const std::string& name) {
-	std::string sanitized = name;
-	// Replace dots and special characters with underscores
-	for (size_t i = 0; i < sanitized.length(); i++) {
-		if (sanitized[i] == '.' || sanitized[i] == '-' || sanitized[i] == ' ') {
-			sanitized[i] = '_';
+	std::string sanitized;
+	sanitized.reserve(name.length());
+	for (size_t i = 0; i < name.length(); i++) {
+		char c = name[i];
+		// Allow only alphanumeric and underscore
+		if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
+		    (c >= '0' && c <= '9') || c == '_') {
+			sanitized += c;
+		} else {
+			sanitized += '_';
 		}
 	}
 	return sanitized;
 }
🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 90 - 99, sanitize_name currently only
replaces '.', '-', and space, leaving other dangerous characters (backticks,
quotes, semicolons, nulls) that can enable SQL injection when used in
CREATE/ALTER statements; change it to an allowlist-based sanitizer that only
permits ASCII letters, digits and underscore, enforces the first character is a
letter, enforces a reasonable max length, rejects or canonicalizes empty/invalid
results (e.g., return error/throw or prepend a fixed safe prefix), and
explicitly strip or reject any NUL bytes or non-ASCII characters; update the
sanitize_name implementation to iterate over bytes of the input, build a new
string containing only [A-Za-z0-9_], ensure the first character is a letter (or
prefix with "t_" if not), truncate to the max length, and make callers treat an
empty/invalid sanitized result as an error rather than embedding it into SQL.

Comment on lines +303 to +316
// Build MySQL query to fetch data
std::ostringstream mysql_query;
mysql_query << "SELECT ";
for (size_t i = 0; i < selected_cols.size(); i++) {
if (i > 0) mysql_query << ", ";
mysql_query << "`" << selected_cols[i] << "`";
}

mysql_query << " FROM `" << schema << "`.`" << table << "`";

if (!where_clause.empty()) {
mysql_query << " WHERE " << where_clause;
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

SQL injection risk in MySQL query construction.

  1. Line 311: Schema and table names are wrapped in backticks but backticks within the names are not escaped. A schema/table containing a backtick could break out.
  2. Line 314: where_clause is appended directly without any validation or sanitization, allowing arbitrary SQL injection.
Proposed fix

For backtick escaping in identifiers:

std::string escape_identifier(const std::string& name) {
    std::string escaped;
    for (char c : name) {
        if (c == '`') escaped += "``";
        else escaped += c;
    }
    return escaped;
}

For where_clause, consider:

  1. Validating it contains only allowed SQL constructs, or
  2. Documenting that this is a privileged operation and the caller must ensure safety, or
  3. Using parameterized queries where possible
🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 303 - 316, The query builder in MySQL_FTS.cpp
currently appends identifiers and the where_clause raw (symbols: mysql_query,
selected_cols, schema, table, where_clause), which risks SQL injection;
implement an identifier-escaping helper (e.g., escape_identifier) that doubles
any backticks and use it when writing selected_cols, schema and table into
mysql_query, and stop appending where_clause verbatim—either require the caller
to supply a pre-validated/parameterized filter, validate the clause against an
allowed pattern, or convert to a parameterized WHERE using bound parameters;
update the mysql_query construction to use escape_identifier(schema),
escape_identifier(table), and escaped selected_cols instead of raw values.

Comment on lines +488 to +497
// Search query for this index (use table name for MATCH/bm25)
std::ostringstream search_sql;
search_sql << "SELECT d.schema_name, d.table_name, d.primary_key_value, "
<< "snippet(" << fts_table << ", 0, '<mark>', '</mark>', '...', 30) AS snippet, "
<< "d.metadata "
<< "FROM " << fts_table << " "
<< "JOIN " << sanitized_data << " d ON " << fts_table << ".rowid = d.rowid "
<< "WHERE " << fts_table << " MATCH '" << escape_sql(query) << "' "
<< "ORDER BY bm25(" << fts_table << ") ASC "
<< "LIMIT " << limit;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's locate and examine the MySQL_FTS.cpp file
find . -name "MySQL_FTS.cpp" -type f

Repository: ProxySQL/proxysql-vec

Length of output: 84


🏁 Script executed:

# Search for the escape_sql function to understand how it works
rg "escape_sql" -A 5 -B 2 | head -100

Repository: ProxySQL/proxysql-vec

Length of output: 5391


🌐 Web query:

SQLite FTS5 MATCH query escaping requirements and security

💡 Result:

Short summary + practical guidance:

Key points

  • The right-hand side of MATCH is an FTS5 query string that is parsed (supports boolean operators, NEAR, prefixes, column filters, phrases, etc.). User input can therefore be interpreted as query operators, not just a literal term. [1][2]
  • FTS5 allows unquoted barewords (letters, digits, underscore, non-ASCII) and requires quoting (double quotes) for strings containing other characters (spaces, punctuation). Inside a quoted string double quotes are escaped by doubling them. [1]
  • Because the MATCH string is parsed by FTS5, you must treat it like any other language input: untrusted input can change the meaning of the query (i.e., “injection” of operators). [1][3]

Escaping / literal-search rules

  • To make an input treated as a single literal phrase, enclose it in double quotes and double any embedded double-quote characters. Example: input He said "hi" -> MATCH '"He said ""hi"""'. [1]
  • If you need to treat each word literally (not as a phrase), quote each word separately: MATCH '"word1" "word2"'. [1]
  • Bare punctuation/special characters cannot appear unquoted in a bareword; if they’re present you must quote the string. Also bear in mind future FTS5 versions might change bareword allowances, so quoting is safer for untrusted content. [1]

Security / best practices

  • Prefer parameterized queries / prepared statements rather than concatenating SQL. You can bind the MATCH RHS as a parameter (e.g., WHERE ft MATCH ? or use the table-valued form ft(?)). Binding prevents SQL-level injection and is the recommended first defense. [1][2]
  • Because MATCH has its own query language, binding alone does not change how FTS5 parses the bound string — it still can contain FTS operators. If you need to prevent the user from supplying operators (i.e., force a literal token/phrase search), sanitize by quoting the whole input as an FTS5 string (wrap in double quotes, escape internal double-quotes) before binding. Example: bind the value '"user input with ""quotes"""'. [1][3]
  • For strict literal substring matching consider using the trigram tokenizer and LIKE/GLOB on that column (FTS5 trigram supports indexed LIKE/GLOB), or maintain a separate (escaped) column for exact matches. These approaches avoid FTS5 query parsing semantics. [1]
  • Validate inputs and limit features exposed to untrusted callers (e.g., disallow OR/NEAR/column filters) by sanitizing or rejecting characters/patterns you don’t want allowed.

Minimal recommended workflow

  1. Decide whether callers may supply FTS operators. If yes, bind their string directly (prepared statement). If no:
  2. Escape by wrapping the user input in double quotes and doubling any existing double quotes, then bind that value to MATCH ? (or use ft(?)). This forces a phrase-literal interpretation. [1][3]
  3. For substring/complex needs, consider trigram tokenizer + LIKE/GLOB or explicit tokenization on input. [1]

References

  • SQLite FTS5 documentation (query syntax, strings, quoting, table-valued form, trigram tokenizer). [1][2]
  • Community note / examples on escaping MATCH terms (practical quoting advice). [3]

References used
[1] SQLite FTS5 documentation — Query syntax & strings/phrases/quoting.
[2] SQLite FTS5 documentation — table-valued function, MATCH parameter examples (bm25/rank examples).
[3] StackOverflow discussion showing practical quoting examples for MATCH.


🏁 Script executed:

# Read the search() function around lines 488-497 to get full context
sed -n '470,520p' lib/MySQL_FTS.cpp | cat -n

Repository: ProxySQL/proxysql-vec

Length of output: 2292


🏁 Script executed:

# Check if there's any validation on the query parameter in the function signature or before line 488
sed -n '426,497p' lib/MySQL_FTS.cpp | cat -n

Repository: ProxySQL/proxysql-vec

Length of output: 2858


FTS5 query syntax injection risk in MATCH clause.

Line 495 uses escape_sql(query) which only escapes single quotes, but FTS5 MATCH parses special operators and syntax (*, ", AND, OR, NOT, NEAR, etc.). Without FTS5-compliant escaping, untrusted user input can inject these operators to alter query behavior or trigger unintended search patterns.

For example:

  • Input test OR malicious would execute as a boolean OR instead of a literal phrase search
  • Input test* would perform a prefix wildcard search

Use FTS5-compliant escaping: wrap the entire query in double quotes and double any embedded double-quote characters before the MATCH clause (e.g., "user input with ""quotes""" to force literal phrase matching). Alternatively, use parameterized queries with bound parameters to prevent the injection vector.

🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 488 - 497, The MATCH clause currently
interpolates user input via escape_sql(query) into the constructed search_sql
(see fts_table, sanitized_data and the WHERE ... MATCH usage), which allows FTS5
operator injection; fix by producing an FTS5-literal-safe value: double any
embedded double-quote characters in the user query and wrap the entire query in
double quotes before inserting into MATCH (i.e., transform query -> "\"" +
replace(query, "\"", "\"\"") + "\""), or better yet switch to a
parameterized/bound MATCH parameter instead of string interpolation; update the
code that builds search_sql (the snippet constructing search_sql and the use of
escape_sql) to use the quoted/doubled-quote form or bind parameters so MATCH
receives a literal phrase.

Comment on lines +582 to +585
if (error) {
result["error"] = "Failed to list indexes: " + std::string(error);
return result.dump();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Memory leak: error pointer not freed.

When error is set (line 582-583), it should be freed with (*proxy_sqlite3_free)(error) as done in the search() method (line 508).

Proposed fix
 	if (error) {
 		result["error"] = "Failed to list indexes: " + std::string(error);
+		(*proxy_sqlite3_free)(error);
 		return result.dump();
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (error) {
result["error"] = "Failed to list indexes: " + std::string(error);
return result.dump();
}
if (error) {
result["error"] = "Failed to list indexes: " + std::string(error);
(*proxy_sqlite3_free)(error);
return result.dump();
}
🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 582 - 585, The code sets the C-string pointer
error and returns without freeing it, causing a memory leak; in the block where
result["error"] is assigned (the same place where error is checked), call the
SQLite free stub exactly as used in search(): (*proxy_sqlite3_free)(error)
before returning result.dump(); locate the error pointer usage in MySQL_FTS.cpp
(the error variable and the block that builds result["error"]) and add the free
call immediately after constructing the error message (ensuring you do not free
error twice elsewhere).

Comment on lines +748 to +756
if (reindex_json["success"].get<bool>()) {
rebuilt_count++;
} else {
json failed_item;
failed_item["schema"] = schema;
failed_item["table"] = table;
failed_item["error"] = reindex_json["error"].get<std::string>();
failed.push_back(failed_item);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential exception if "error" key is missing.

Line 754 calls reindex_json["error"].get<std::string>() but the index_table method may return without setting an "error" key in some edge cases. Use .value() with a default or check for key existence.

Proposed fix
 		if (reindex_json["success"].get<bool>()) {
 			rebuilt_count++;
 		} else {
 			json failed_item;
 			failed_item["schema"] = schema;
 			failed_item["table"] = table;
-			failed_item["error"] = reindex_json["error"].get<std::string>();
+			failed_item["error"] = reindex_json.value("error", "Unknown error");
 			failed.push_back(failed_item);
 		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (reindex_json["success"].get<bool>()) {
rebuilt_count++;
} else {
json failed_item;
failed_item["schema"] = schema;
failed_item["table"] = table;
failed_item["error"] = reindex_json["error"].get<std::string>();
failed.push_back(failed_item);
}
if (reindex_json["success"].get<bool>()) {
rebuilt_count++;
} else {
json failed_item;
failed_item["schema"] = schema;
failed_item["table"] = table;
failed_item["error"] = reindex_json.value("error", "Unknown error");
failed.push_back(failed_item);
}
🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 748 - 756, The code reads
reindex_json["error"].get<std::string>() which can throw if the "error" key is
missing; update the failure branch (where reindex_json is inspected and
failed_item is built) to safely obtain the error string — either check
reindex_json.contains("error") before calling get or use
reindex_json.value("error", "unknown error") (or similar default) to populate
failed_item["error"]; keep the rest of the logic (failed.push_back(failed_item),
rebuilt_count handling) unchanged and reference reindex_json, failed_item, and
failed in your change.

@renecannao
Copy link

@gemini-code-assist review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant new feature: full-text search (FTS) capabilities for the MCP query server, using SQLite's FTS5 extension. The changes include new C++ classes for managing FTS, integration into the existing tool handling framework, configuration options, extensive documentation, and new test scripts. Overall, this is a well-structured and comprehensive addition. My review focuses on improving security, correctness, and performance in the core implementation, as well as enhancing the clarity of the new user documentation.

Comment on lines +366 to +370
if (row.contains(primary_key_lower) && !row[primary_key_lower].is_null()) {
pk_value = row[primary_key_lower].get<std::string>();
} else {
pk_value = std::to_string(row_count);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

When a primary key value is null or not found for a row during indexing, a fallback key is generated using std::to_string(row_count). This is problematic because this generated key does not correspond to any real key in the source MySQL table, making it impossible to retrieve the original row using the FTS result. This can lead to incorrect data retrieval and confusion. It would be safer to skip indexing rows that have a null or missing primary key and log a warning.

Suggested change
if (row.contains(primary_key_lower) && !row[primary_key_lower].is_null()) {
pk_value = row[primary_key_lower].get<std::string>();
} else {
pk_value = std::to_string(row_count);
}
if (row.contains(primary_key_lower) && !row[primary_key_lower].is_null()) {
pk_value = row[primary_key_lower].get<std::string>();
} else {
proxy_warning("FTS: Skipping row in %s.%s with null or missing primary key '%s'.", schema.c_str(), table.c_str(), primary_key.c_str());
continue; // Skip this row
}

Comment on lines +373 to +384
std::ostringstream insert_sql;
insert_sql << "INSERT INTO " << sanitized_data
<< " (schema_name, table_name, primary_key_value, content, metadata) "
<< "VALUES ('" << escape_sql(schema) << "', '"
<< escape_sql(table) << "', '"
<< escape_sql(pk_value) << "', '"
<< escape_sql(content.str()) << "', '"
<< escape_sql(metadata.dump()) << "');";

if (!db->execute(insert_sql.str().c_str())) {
proxy_error("Failed to insert row into FTS: %s\n", insert_sql.str().c_str());
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The SQL INSERT statement is constructed using string concatenation with manual escaping via escape_sql. This approach is vulnerable to SQL injection if the escaping function is not exhaustive. To ensure security and robustness, it is strongly recommended to use prepared statements with parameter binding (e.g., sqlite3_prepare_v2, sqlite3_bind_text) instead of manual string construction.

<< "d.metadata "
<< "FROM " << fts_table << " "
<< "JOIN " << sanitized_data << " d ON " << fts_table << ".rowid = d.rowid "
<< "WHERE " << fts_table << " MATCH '" << escape_sql(query) << "' "

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The FTS5 MATCH clause is constructed by concatenating a user-provided query string after only escaping single quotes. The FTS5 query syntax is complex, and this simple escaping may not be sufficient to prevent all forms of injection or syntax errors, especially with untrusted input. While MATCH does not support bound parameters for the query string, the input should be more rigorously validated or sanitized to prevent abuse of the FTS5 query syntax (e.g., injection of column filters or function calls).

{
"schema": "sales",
"table": "orders",
"columns": ["order_id","customer_name","notes"],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The example response for fts_list_indexes has an incorrectly formatted columns field. It is shown as a single string ["order_id","customer_name","notes"] instead of a proper JSON array of strings. This could be misleading for users trying to parse the API response.

Suggested change
"columns": ["order_id","customer_name","notes"],
"columns": ["order_id", "customer_name", "notes"],

Comment on lines +472 to +542
// Collect all results from each index
json all_results = json::array();
int total_matches = 0;

for (std::vector<SQLite3_row*>::iterator it = indexes_result->rows.begin();
it != indexes_result->rows.end(); ++it) {
SQLite3_row* row = *it;
const char* idx_schema = row->fields[0];
const char* idx_table = row->fields[1];

if (!idx_schema || !idx_table) continue;

std::string data_table = get_data_table_name(idx_schema, idx_table);
std::string fts_table = get_fts_table_name(idx_schema, idx_table);
std::string sanitized_data = data_table;

// Search query for this index (use table name for MATCH/bm25)
std::ostringstream search_sql;
search_sql << "SELECT d.schema_name, d.table_name, d.primary_key_value, "
<< "snippet(" << fts_table << ", 0, '<mark>', '</mark>', '...', 30) AS snippet, "
<< "d.metadata "
<< "FROM " << fts_table << " "
<< "JOIN " << sanitized_data << " d ON " << fts_table << ".rowid = d.rowid "
<< "WHERE " << fts_table << " MATCH '" << escape_sql(query) << "' "
<< "ORDER BY bm25(" << fts_table << ") ASC "
<< "LIMIT " << limit;

SQLite3_result* idx_resultset = NULL;
error = NULL;
cols = 0;
affected = 0;

idx_resultset = db->execute_statement(search_sql.str().c_str(), &error, &cols, &affected);

if (error) {
proxy_error("FTS search error on %s.%s: %s\n", idx_schema, idx_table, error);
(*proxy_sqlite3_free)(error);
}

if (idx_resultset) {
for (std::vector<SQLite3_row*>::iterator row_it = idx_resultset->rows.begin();
row_it != idx_resultset->rows.end(); ++row_it) {
SQLite3_row* res_row = *row_it;

json match;
match["schema"] = res_row->fields[0] ? res_row->fields[0] : "";
match["table"] = res_row->fields[1] ? res_row->fields[1] : "";
match["primary_key_value"] = res_row->fields[2] ? res_row->fields[2] : "";

match["snippet"] = res_row->fields[3] ? res_row->fields[3] : "";

// Parse metadata JSON
try {
if (res_row->fields[4]) {
match["metadata"] = json::parse(res_row->fields[4]);
} else {
match["metadata"] = json::object();
}
} catch (const json::exception& e) {
match["metadata"] = res_row->fields[4] ? res_row->fields[4] : "";
}

all_results.push_back(match);
total_matches++;
}
delete idx_resultset;
}
}

delete indexes_result;
db->rdunlock();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The search function fetches results from all matching indexes into an in-memory all_results vector and then applies pagination. This can lead to excessive memory consumption if there are many matches across numerous indexes, as the initial LIMIT is applied per index. A more scalable approach would be to perform pagination at the database level, for instance by combining results with UNION ALL and applying a single ORDER BY and LIMIT/OFFSET to the final result set.

Comment on lines +117 to +138
bool MySQL_Tool_Handler::reset_fts_path(const std::string& path) {
pthread_mutex_lock(&fts_lock);

if (fts) {
delete fts;
fts = NULL;
}

if (!path.empty()) {
fts = new MySQL_FTS(path);
if (fts->init()) {
proxy_error("Failed to initialize FTS with new path: %s\n", path.c_str());
delete fts;
fts = NULL;
pthread_mutex_unlock(&fts_lock);
return false;
}
}

pthread_mutex_unlock(&fts_lock);
return true;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There are two functions for updating the FTS path: reset_fts_path and reinit_fts (lines 1142-1178). reset_fts_path is called by the configuration handler but lacks a check for the directory's existence. reinit_fts performs this check but appears to be unused. To improve robustness and eliminate redundant code, the directory existence check from reinit_fts should be integrated into reset_fts_path, and the unused reinit_fts function should be removed.

@renecannao renecannao changed the base branch from v3.1-vec to v3.1-vec_21 January 21, 2026 13:25
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds full-text search (FTS) capabilities to the MCP query server using SQLite's FTS5 extension with BM25 ranking. The implementation allows AI agents to quickly search MySQL table data through an external SQLite index without requiring native MySQL FTS support.

Changes:

  • Added 6 new FTS tools: fts_index_table, fts_search, fts_list_indexes, fts_delete_index, fts_reindex, and fts_rebuild_all
  • Implemented MySQL_FTS class for managing SQLite FTS5 indexes with BM25 ranking
  • Added thread-safe FTS operations with mutex protection and runtime path configuration
  • Included comprehensive test scripts and detailed user documentation

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
scripts/mcp/test_mcp_fts_detailed.sh Detailed FTS validation script testing all tools, runtime config changes, and edge cases
scripts/mcp/test_mcp_fts.sh Comprehensive test suite with 21 test cases covering all FTS functionality
lib/Query_Tool_Handler.cpp Registers 6 FTS tools in MCP protocol and routes calls to MySQL_Tool_Handler
lib/ProxySQL_MCP_Server.cpp Passes FTS path from configuration to MySQL_Tool_Handler constructor
lib/MySQL_Tool_Handler.cpp Wraps MySQL_FTS class with thread-safe mutex protection and runtime path reset
lib/MySQL_FTS.cpp Core FTS implementation using SQLite FTS5 with indexing, search, and management
lib/Makefile Adds MySQL_FTS.oo to build targets
lib/MCP_Thread.cpp Handles mcp_fts_path configuration variable with runtime reload support
include/MySQL_Tool_Handler.h Declares FTS integration methods and FTS pointer member
include/MySQL_FTS.h Defines MySQL_FTS class interface for full-text search operations
include/MCP_Thread.h Adds mcp_fts_path to configuration variables structure
doc/MCP/FTS_USER_GUIDE.md Comprehensive 854-line user guide with architecture, API reference, and examples

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +101 to +111
std::string MySQL_FTS::escape_sql(const std::string& str) {
std::string escaped;
for (size_t i = 0; i < str.length(); i++) {
if (str[i] == '\'') {
escaped += "''";
} else {
escaped += str[i];
}
}
return escaped;
}
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential SQL injection vulnerability: The escape_sql function only escapes single quotes, but table/column names are being directly interpolated into SQL statements without proper quoting or validation. While sanitize_name is used for some identifiers, the FTS search query at line 495 uses escape_sql(query) within a MATCH clause which could allow FTS query syntax injection. Consider using parameterized queries where possible, or implementing more robust escaping/validation for user-controlled inputs.

Copilot uses AI. Check for mistakes.
Comment on lines +1145 to +1154
// Check if directory exists (SQLite can't create directories)
std::string::size_type last_slash = fts_path.find_last_of("/");
if (last_slash != std::string::npos && last_slash > 0) {
std::string dir = fts_path.substr(0, last_slash);
struct stat st;
if (stat(dir.c_str(), &st) != 0 || !S_ISDIR(st.st_mode)) {
proxy_error("MySQL_Tool_Handler: Directory does not exist for path '%s' (directory: '%s')\n",
fts_path.c_str(), dir.c_str());
return -1;
}
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing error handling: If stat() fails, the code returns -1, but if the directory exists but is not writable, this check won't catch it. Consider adding a write permission check or attempting to create a test file in the directory to verify write access before attempting to initialize the FTS database.

Copilot uses AI. Check for mistakes.

if (new_fts->init() != 0) {
proxy_error("MySQL_Tool_Handler: Failed to initialize FTS at %s\n", fts_path.c_str());
delete new_fts;
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential resource leak: If new_fts->init() fails at line 1164, the function deletes new_fts and returns -1 without keeping the old FTS handler. However, there's a comment "Return error WITHOUT closing old FTS" which is correct. But if the old fts is NULL (first initialization), the system will remain in a state where FTS is not initialized. Consider documenting this behavior more clearly or handling the null case explicitly.

Suggested change
delete new_fts;
delete new_fts;
// If this was the first initialization attempt (no previous FTS handler),
// the system remains without an initialized FTS instance.
if (!fts) {
proxy_error("MySQL_Tool_Handler: No previous FTS handler exists; FTS remains uninitialized after failure\n");
}

Copilot uses AI. Check for mistakes.
Comment on lines +251 to +296
json cols_json = json::parse(columns);
if (!cols_json.is_array()) {
result["error"] = "columns must be a JSON array";
return result.dump();
}
} catch (const json::exception& e) {
result["error"] = std::string("Invalid JSON in columns: ") + e.what();
return result.dump();
}

// Check if index already exists
if (index_exists(schema, table)) {
result["error"] = "Index already exists for " + schema + "." + table + ". Use fts_reindex to update.";
return result.dump();
}

// Create index tables
if (create_index_tables(schema, table) != 0) {
result["error"] = "Failed to create index tables";
return result.dump();
}

// Parse columns and build query (ensure primary key is selected)
std::vector<std::string> indexed_cols;
std::vector<std::string> selected_cols;
std::unordered_set<std::string> seen;

try {
json cols_json = json::parse(columns);
if (!cols_json.is_array()) {
result["error"] = "columns must be a JSON array";
return result.dump();
}
for (const auto& col : cols_json) {
std::string col_name = col.get<std::string>();
std::string col_lower = col_name;
std::transform(col_lower.begin(), col_lower.end(), col_lower.begin(), ::tolower);
indexed_cols.push_back(col_lower);
if (seen.insert(col_lower).second) {
selected_cols.push_back(col_name);
}
}
} catch (const json::exception& e) {
result["error"] = std::string("Failed to parse columns: ") + e.what();
return result.dump();
}
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The columns parameter is parsed twice in the index_table method: once at lines 251-259 for validation, and again at lines 278-296 for actual processing. This is inefficient and could lead to inconsistencies if the JSON parsing behaves differently between the two calls. Consider parsing once and reusing the parsed result.

Copilot uses AI. Check for mistakes.
test_fts_search_special_chars() {
# Create a document with special characters
mysql_exec "INSERT INTO ${TEST_SCHEMA}.${TEST_TABLE} (title, content, category, priority) \
VALUES ('Special Chars', 'Test with @ # $ % ^ & * ( ) - _ = + [ ] { } | \\ : ; \" \" < > ? / ~', 'test', 'normal');" 2>/dev/null || true
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the test at line 921, special characters including backslashes are inserted into the database. The backslash should be escaped properly in the bash script string. The current approach might not correctly insert all special characters as intended. Consider using a here-document or different quoting mechanism for the SQL statement.

Suggested change
VALUES ('Special Chars', 'Test with @ # $ % ^ & * ( ) - _ = + [ ] { } | \\ : ; \" \" < > ? / ~', 'test', 'normal');" 2>/dev/null || true
VALUES ('Special Chars', 'Test with @ # $ % ^ & * ( ) - _ = + [ ] { } | \\\\ : ; \" \" < > ? / ~', 'test', 'normal');" 2>/dev/null || true

Copilot uses AI. Check for mistakes.
Comment on lines +117 to +138
bool MySQL_Tool_Handler::reset_fts_path(const std::string& path) {
pthread_mutex_lock(&fts_lock);

if (fts) {
delete fts;
fts = NULL;
}

if (!path.empty()) {
fts = new MySQL_FTS(path);
if (fts->init()) {
proxy_error("Failed to initialize FTS with new path: %s\n", path.c_str());
delete fts;
fts = NULL;
pthread_mutex_unlock(&fts_lock);
return false;
}
}

pthread_mutex_unlock(&fts_lock);
return true;
}
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reset_fts_path method acquires fts_lock, but reinit_fts does not. If both methods can be called from different contexts, there's a potential for inconsistent locking patterns. Consider either making reinit_fts acquire the lock, or documenting that it must only be called when the caller already holds the lock, or consolidating the logic to avoid duplication.

Copilot uses AI. Check for mistakes.
Comment on lines +473 to +554
json all_results = json::array();
int total_matches = 0;

for (std::vector<SQLite3_row*>::iterator it = indexes_result->rows.begin();
it != indexes_result->rows.end(); ++it) {
SQLite3_row* row = *it;
const char* idx_schema = row->fields[0];
const char* idx_table = row->fields[1];

if (!idx_schema || !idx_table) continue;

std::string data_table = get_data_table_name(idx_schema, idx_table);
std::string fts_table = get_fts_table_name(idx_schema, idx_table);
std::string sanitized_data = data_table;

// Search query for this index (use table name for MATCH/bm25)
std::ostringstream search_sql;
search_sql << "SELECT d.schema_name, d.table_name, d.primary_key_value, "
<< "snippet(" << fts_table << ", 0, '<mark>', '</mark>', '...', 30) AS snippet, "
<< "d.metadata "
<< "FROM " << fts_table << " "
<< "JOIN " << sanitized_data << " d ON " << fts_table << ".rowid = d.rowid "
<< "WHERE " << fts_table << " MATCH '" << escape_sql(query) << "' "
<< "ORDER BY bm25(" << fts_table << ") ASC "
<< "LIMIT " << limit;

SQLite3_result* idx_resultset = NULL;
error = NULL;
cols = 0;
affected = 0;

idx_resultset = db->execute_statement(search_sql.str().c_str(), &error, &cols, &affected);

if (error) {
proxy_error("FTS search error on %s.%s: %s\n", idx_schema, idx_table, error);
(*proxy_sqlite3_free)(error);
}

if (idx_resultset) {
for (std::vector<SQLite3_row*>::iterator row_it = idx_resultset->rows.begin();
row_it != idx_resultset->rows.end(); ++row_it) {
SQLite3_row* res_row = *row_it;

json match;
match["schema"] = res_row->fields[0] ? res_row->fields[0] : "";
match["table"] = res_row->fields[1] ? res_row->fields[1] : "";
match["primary_key_value"] = res_row->fields[2] ? res_row->fields[2] : "";

match["snippet"] = res_row->fields[3] ? res_row->fields[3] : "";

// Parse metadata JSON
try {
if (res_row->fields[4]) {
match["metadata"] = json::parse(res_row->fields[4]);
} else {
match["metadata"] = json::object();
}
} catch (const json::exception& e) {
match["metadata"] = res_row->fields[4] ? res_row->fields[4] : "";
}

all_results.push_back(match);
total_matches++;
}
delete idx_resultset;
}
}

delete indexes_result;
db->rdunlock();

// Apply pagination to collected results
int total_size = (int)all_results.size();
int start_idx = offset;
if (start_idx >= total_size) start_idx = total_size;
int end_idx = start_idx + limit;
if (end_idx > total_size) end_idx = total_size;

json paginated_results = json::array();
for (int i = start_idx; i < end_idx; i++) {
paginated_results.push_back(all_results[i]);
}
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The search method performs iteration over potentially large result sets without limiting memory usage. If a search matches millions of rows across multiple indexes, the all_results array at line 473 could grow very large before pagination is applied at line 544. Consider applying the limit during collection rather than after, or implementing streaming/chunked result processing.

Copilot uses AI. Check for mistakes.
Comment on lines +426 to +432
std::string MySQL_FTS::search(
const std::string& query,
const std::string& schema,
const std::string& table,
int limit,
int offset
) {
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing validation: The limit parameter in the search methods could potentially be negative or zero, which might cause unexpected behavior. Consider adding validation to ensure limit is positive (e.g., if (limit <= 0) limit = 100;) and similarly for offset to ensure it's non-negative.

Copilot uses AI. Check for mistakes.
mysql_exec() {
local sql="$1"
mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" \
-e "${sql}" 2>/dev/null
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test script uses set -e which causes the script to exit on any error. However, the mysql_exec function at line 279 redirects stderr to /dev/null, which might hide important error messages. Consider allowing errors to be visible during test execution, or at least logging them for debugging purposes.

Suggested change
-e "${sql}" 2>/dev/null
-e "${sql}"

Copilot uses AI. Check for mistakes.
Comment on lines +27 to +28
char path_buf[db_path.size() + 1];
strcpy(path_buf, db_path.c_str());
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of variable-length array (VLA) char path_buf[db_path.size() + 1] is not standard in C++ and is a C99 feature that may not be supported by all compilers. This can lead to portability issues. Consider using std::string directly with the SQLite3DB API, or dynamically allocate the buffer with new[] and delete[], or use std::vector<char>.

Copilot uses AI. Check for mistakes.
@renecannao renecannao merged commit a831670 into ProxySQL:v3.1-vec_21 Jan 21, 2026
7 checks passed
renecannao added a commit that referenced this pull request Jan 21, 2026
…leanup

Security fixes:
- Add escape_identifier() helper for proper SQLite identifier escaping
- Replace sanitize_name() with allowlist validation (ASCII letters, digits, underscore only)
- Fix MATCH clause FTS5 operator injection by wrapping query in double quotes
- Apply escape_identifier() to all DDL statements (CREATE, DROP, triggers)

Memory safety fixes:
- Replace VLA with std::vector in MySQL_FTS::init(), add delete on error path
- Fix memory leak: free error string before return in list_indexes()
- Fix reindex_json["error"] potential exception using .value() with default

Thread safety fixes:
- reinit_fts(): Add mutex lock around pointer swap
- reset_fts_path(): Move blocking init() outside lock, only swap pointer under lock

Code cleanup:
- Remove 7 debug fprintf statements from Query_Tool_Handler.cpp
- Remove unused #include <memory> from MySQL_FTS.h

Test script security fixes:
- Use MYSQL_PWD environment variable instead of -p"..." for password
- Add escape_sql() function and apply to INSERT statement
- Fix CURL_OPTS quoting: ${CURL_OPTS:+"${CURL_OPTS}"}
- Remove unused FTS_INDEX_NAME and SEARCH_QUERIES variables

Documentation fixes:
- Fix bare URL to markdown link format
- Add code block language identifiers (text, bash)
renecannao added a commit that referenced this pull request Jan 21, 2026
Fix PR #21 Review: Security, Memory Safety, Thread Safety + Additional Fixes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants