Add full-text search (FTS) tools to MCP query server#21
Add full-text search (FTS) tools to MCP query server#21renecannao merged 3 commits intoProxySQL:v3.1-vec_21from
Conversation
Implement BM25-ranked full-text search capability for MySQL/MariaDB tables using SQLite-based external FTS index. Changes: - Add MySQL_FTS class for managing SQLite FTS indexes - Add FTS tools: fts_index_table, fts_search, fts_reindex, fts_delete_index, fts_list_indexes, fts_rebuild_all - Add thread-safe FTS lifecycle management with fts_lock mutex - Add reset_fts_path() for runtime FTS database path configuration - Add comprehensive FTS test scripts (test_mcp_fts.sh, test_mcp_fts_detailed.sh)
📝 WalkthroughSummary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughAdds SQLite-backed Full Text Search (FTS): new Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client/MCP
participant QTH as Query_Tool_Handler
participant MTH as MySQL_Tool_Handler
participant FTS as MySQL_FTS
participant DB as SQLite
rect rgba(200,200,255,0.5)
Client->>QTH: fts_index_table(schema, table, columns)
QTH->>MTH: fts_index_table(...) [acquire fts_lock]
MTH->>FTS: index_table(...)
FTS->>DB: write index metadata / store index
DB-->>FTS: OK
FTS-->>MTH: JSON result
MTH-->>QTH: return result
QTH-->>Client: JSON response
end
rect rgba(200,255,200,0.5)
Client->>QTH: fts_search(query, schema, table)
QTH->>MTH: fts_search(...) [acquire fts_lock]
MTH->>FTS: search(...)
FTS->>DB: SELECT using FTS (MATCH/snippet)
DB-->>FTS: rows + snippets
FTS-->>MTH: JSON result
MTH-->>QTH: return result
QTH-->>Client: JSON response
end
sequenceDiagram
participant Admin as Admin/Config
participant MCP as MCP_Threads_Handler
participant MTH as MySQL_Tool_Handler
participant OldFTS as MySQL_FTS (old)
participant NewFTS as MySQL_FTS (new)
rect rgba(255,230,200,0.5)
Admin->>MCP: set_variable("fts_path", new_path)
MCP->>MTH: reset_fts_path(new_path)
MTH->>MTH: validate directory
MTH->>NewFTS: create & initialize
NewFTS-->>MTH: initialized
MTH->>MTH: acquire fts_lock
MTH->>OldFTS: delete old instance (if any)
MTH->>MTH: swap fts pointer to NewFTS
MTH->>MTH: release fts_lock
MTH-->>MCP: success
MCP-->>Admin: variable updated
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~65 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In `@lib/MCP_Thread.cpp`:
- Around line 333-347: Currently the code updates variables.mcp_fts_path (via
strdup/free) before calling mysql_tool_handler->reset_fts_path(value), which can
leave the stored config out of sync if reset_fts_path fails; change the flow so
you attempt the runtime update first (call
mysql_tool_handler->reset_fts_path(value) while leaving variables.mcp_fts_path
unchanged), and only on success free() the old variables.mcp_fts_path and assign
strdup(value), returning 0; if reset_fts_path fails, do not modify
variables.mcp_fts_path and return -1. Reference symbols: variables.mcp_fts_path,
mysql_tool_handler, reset_fts_path, strdup, free.
In `@lib/MySQL_Tool_Handler.cpp`:
- Around line 117-137: reset_fts_path deletes the existing FTS before validating
the new path; instead allocate a temporary MySQL_FTS* (e.g., new_fts) with the
provided path, call new_fts->init(), and if init() fails call proxy_error,
delete new_fts and return false while leaving the original fts intact; only on
successful init swap under the existing pthread_mutex_lock (&fts_lock) by
deleting the old fts, assigning fts = new_fts, and then unlock and return true;
also preserve current behavior when path is empty (delete existing fts) and
ensure no double-free by only deleting the old instance after new_fts
successfully initialized.
- Around line 1142-1178: reinit_fts currently mutates the shared
pointer/variable fts without acquiring the FTS mutex; to fix, perform the
expensive new MySQL_FTS(fts_path) and new_fts->init() before taking the lock,
then acquire fts_lock (the same mutex used by other FTS calls) and inside the
critical section delete the old fts and assign fts = new_fts, then release the
lock; use the same locking primitive as the rest of the code (e.g.,
std::lock_guard or equivalent on fts_lock) to ensure no race with in-flight
operations while avoiding holding the lock during initialization.
In `@scripts/mcp/test_mcp_fts_detailed.sh`:
- Around line 71-74: The mysql_exec function exposes MYSQL_PASSWORD on the
process command line via the -p option; change it to avoid inline password usage
by using a safer method such as setting MYSQL_PWD in the environment before
invoking mysql or by using a MySQL option file, e.g., export
MYSQL_PWD="${MYSQL_PASSWORD}" && mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u
"${MYSQL_USER}" -e "${sql}" (and unset MYSQL_PWD afterward); update the
mysql_exec function and apply the same change to the similar invocation in
test_mcp_fts.sh to remove -p"${MYSQL_PASSWORD}" and prevent password leakage to
ps listings.
- Around line 251-260: The cleanup block currently unconditionally attempts to
delete fts_test.orders although it was already deleted earlier; update the
cleanup logic around the calls to tool_call "fts_delete_index" (and the
subsequent extract_tool_result) to either first check existence (e.g., call a
check/existence tool or API) and only call fts_delete_index for existing
indexes, or relax the jq assertion to accept either a success result or the
expected "index not found" error response (e.g., accept .success == true OR
.error indicates non-existent index) so the second delete does not cause the
test to fail.
In `@scripts/mcp/test_mcp_fts.sh`:
- Around line 992-993: The mysql invocations in test_fts_custom_database_path
(e.g., the command populating current_path and other mysql calls) expose the
password on the command line; replace direct use of -p"${MYSQL_PASSWORD}" with a
secure approach such as exporting MYSQL_PWD="${MYSQL_PASSWORD}" for the command
(e.g., MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" ...) or implement
a small helper function (e.g., safe_mysql()) that injects MYSQL_PWD into the
environment and runs mysql, then update all occurrences (current_path assignment
and the other mysql calls referenced) to use that helper to avoid password
exposure. Ensure the helper preserves -s -N and 2>/dev/null flags used in
existing calls.
- Around line 279-283: The mysql_exec function currently exposes the password on
the command line via -p"${MYSQL_PASSWORD}"; update mysql_exec to avoid that by
passing the password through the MYSQL_PWD environment variable instead (e.g.,
invoke mysql with MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P
"${MYSQL_PORT}" -u "${MYSQL_USER}" -e "${sql}" 2>/dev/null) so the password is
not visible in process listings; keep the function name mysql_exec and the same
parameters but remove the inline -p argument.
- Around line 341-343: The INSERT builds SQL with unescaped variables (title,
doc, category, priority) which allows quotes to break the statement; before
calling mysql_exec in the loop, sanitize each variable by replacing every
single-quote (') with two single-quotes ('') so the SQL string is safe (e.g.,
transform ${doc} -> escaped_doc), then use those escaped variables in the
INSERT; update the code around the mysql_exec call and reference the mysql_exec
invocation and the TEST_SCHEMA/TEST_TABLE variables when making the change.
🧹 Nitpick comments (7)
lib/Query_Tool_Handler.cpp (2)
278-300: Remove debug fprintf statements before merging.These debug statements will pollute stderr in production. Convert to
proxy_debug()calls for consistency with the rest of the codebase, or remove entirely.♻️ Suggested fix
static std::string get_json_string(const json& j, const std::string& key, const std::string& default_val = "") { - fprintf(stderr, "DEBUG: get_json_string key=%s, default='%s'\n", key.c_str(), default_val.c_str()); if (j.contains(key)) { const json& val = j[key]; - fprintf(stderr, "DEBUG: key exists, is_null=%d, is_string=%d\n", val.is_null(), val.is_string()); if (!val.is_null()) { if (val.is_string()) { std::string result = val.get<std::string>(); - fprintf(stderr, "DEBUG: returning string: '%s'\n", result.c_str()); return result; } else { - fprintf(stderr, "DEBUG: value is not a string, trying dump\n"); std::string result = val.dump(); - fprintf(stderr, "DEBUG: returning dumped: '%s'\n", result.c_str()); return result; } } } - fprintf(stderr, "DEBUG: returning default: '%s'\n", default_val.c_str()); return default_val; }
310-312: Debug fprintf in execute_tool should also be removed.Same issue as the helper function - this debug statement will pollute stderr.
♻️ Suggested fix
json Query_Tool_Handler::execute_tool(const std::string& tool_name, const json& arguments) { - fprintf(stderr, "DEBUG: execute_tool tool_name=%s, arguments=%s\n", tool_name.c_str(), arguments.dump().c_str()); - if (!mysql_handler) { return create_error_response("MySQL handler not initialized"); }scripts/mcp/test_mcp_fts_detailed.sh (2)
101-106: QuoteCURL_OPTSto prevent word splitting issues.When
CURL_OPTSis empty, the unquoted expansion can cause unexpected behavior. Use an array or conditional inclusion.♻️ Suggested fix
mcp_request() { local payload="$1" - curl ${CURL_OPTS} -s -X POST "${MCP_ENDPOINT}" \ + curl ${CURL_OPTS:+"${CURL_OPTS}"} -s -X POST "${MCP_ENDPOINT}" \ -H "Content-Type: application/json" \ -d "${payload}" }Or use an array approach for cleaner handling:
if [ "${USE_SSL}" = "true" ]; then CURL_OPTS=(-k) else CURL_OPTS=() fi # Then use: curl "${CURL_OPTS[@]}" -s -X POST ...
162-162: Columns argument passed as JSON string literal may cause issues.The
columnsargument is passed as a raw JSON array string'["name","email","created_at"]'which is then interpolated into another JSON string. This works but is fragile. Consider validating the JSON structure.scripts/mcp/test_mcp_fts.sh (3)
58-58: Unused variableFTS_INDEX_NAME.This variable is defined but never used. Either remove it or use it where
${TEST_SCHEMA}.${TEST_TABLE}is repeated.-FTS_INDEX_NAME="${TEST_SCHEMA}.${TEST_TABLE}"
74-81: Unused associative arraySEARCH_QUERIES.The
SEARCH_QUERIESarray is defined but the tests use hardcoded query strings instead. Remove or utilize this array for consistency.
356-359: Unused variabledelete_response- consider checking for errors.The response from
fts_delete_indexis captured but not validated. While cleanup failures may be acceptable, logging the error could help debugging.♻️ Suggested improvement
# Drop FTS index if exists local delete_response delete_response=$(fts_tool_call "fts_delete_index" "{\"schema\": \"${TEST_SCHEMA}\", \"table\": \"${TEST_TABLE}\"}") + log_verbose "FTS index delete response: ${delete_response}"
| if (!strcmp(name, "fts_path")) { | ||
| if (variables.mcp_fts_path) | ||
| free(variables.mcp_fts_path); | ||
| variables.mcp_fts_path = strdup(value); | ||
| // Apply at runtime by resetting FTS in the existing handler | ||
| if (mysql_tool_handler) { | ||
| proxy_info("MCP: Applying new fts_path at runtime: %s\n", value); | ||
| if (!mysql_tool_handler->reset_fts_path(value)) { | ||
| proxy_error("Failed to reset FTS path at runtime\n"); | ||
| return -1; | ||
| } | ||
| } | ||
|
|
||
| return 0; | ||
| } |
There was a problem hiding this comment.
Avoid updating mcp_fts_path when runtime reset fails.
The variable is updated before reset_fts_path succeeds, which can leave config out of sync with the live FTS handler. Swap the stored value only after a successful reset (or roll back on failure).
🔧 Suggested fix (update after successful reset)
if (!strcmp(name, "fts_path")) {
- if (variables.mcp_fts_path)
- free(variables.mcp_fts_path);
- variables.mcp_fts_path = strdup(value);
- // Apply at runtime by resetting FTS in the existing handler
- if (mysql_tool_handler) {
- proxy_info("MCP: Applying new fts_path at runtime: %s\n", value);
- if (!mysql_tool_handler->reset_fts_path(value)) {
- proxy_error("Failed to reset FTS path at runtime\n");
- return -1;
- }
- }
-
- return 0;
+ char* new_path = strdup(value);
+ if (!new_path)
+ return -1;
+ // Apply at runtime by resetting FTS in the existing handler
+ if (mysql_tool_handler) {
+ proxy_info("MCP: Applying new fts_path at runtime: %s\n", value);
+ if (!mysql_tool_handler->reset_fts_path(value)) {
+ proxy_error("Failed to reset FTS path at runtime\n");
+ free(new_path);
+ return -1;
+ }
+ }
+ if (variables.mcp_fts_path)
+ free(variables.mcp_fts_path);
+ variables.mcp_fts_path = new_path;
+ return 0;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (!strcmp(name, "fts_path")) { | |
| if (variables.mcp_fts_path) | |
| free(variables.mcp_fts_path); | |
| variables.mcp_fts_path = strdup(value); | |
| // Apply at runtime by resetting FTS in the existing handler | |
| if (mysql_tool_handler) { | |
| proxy_info("MCP: Applying new fts_path at runtime: %s\n", value); | |
| if (!mysql_tool_handler->reset_fts_path(value)) { | |
| proxy_error("Failed to reset FTS path at runtime\n"); | |
| return -1; | |
| } | |
| } | |
| return 0; | |
| } | |
| if (!strcmp(name, "fts_path")) { | |
| char* new_path = strdup(value); | |
| if (!new_path) | |
| return -1; | |
| // Apply at runtime by resetting FTS in the existing handler | |
| if (mysql_tool_handler) { | |
| proxy_info("MCP: Applying new fts_path at runtime: %s\n", value); | |
| if (!mysql_tool_handler->reset_fts_path(value)) { | |
| proxy_error("Failed to reset FTS path at runtime\n"); | |
| free(new_path); | |
| return -1; | |
| } | |
| } | |
| if (variables.mcp_fts_path) | |
| free(variables.mcp_fts_path); | |
| variables.mcp_fts_path = new_path; | |
| return 0; | |
| } |
🤖 Prompt for AI Agents
In `@lib/MCP_Thread.cpp` around lines 333 - 347, Currently the code updates
variables.mcp_fts_path (via strdup/free) before calling
mysql_tool_handler->reset_fts_path(value), which can leave the stored config out
of sync if reset_fts_path fails; change the flow so you attempt the runtime
update first (call mysql_tool_handler->reset_fts_path(value) while leaving
variables.mcp_fts_path unchanged), and only on success free() the old
variables.mcp_fts_path and assign strdup(value), returning 0; if reset_fts_path
fails, do not modify variables.mcp_fts_path and return -1. Reference symbols:
variables.mcp_fts_path, mysql_tool_handler, reset_fts_path, strdup, free.
| bool MySQL_Tool_Handler::reset_fts_path(const std::string& path) { | ||
| pthread_mutex_lock(&fts_lock); | ||
|
|
||
| if (fts) { | ||
| delete fts; | ||
| fts = NULL; | ||
| } | ||
|
|
||
| if (!path.empty()) { | ||
| fts = new MySQL_FTS(path); | ||
| if (fts->init()) { | ||
| proxy_error("Failed to initialize FTS with new path: %s\n", path.c_str()); | ||
| delete fts; | ||
| fts = NULL; | ||
| pthread_mutex_unlock(&fts_lock); | ||
| return false; | ||
| } | ||
| } | ||
|
|
||
| pthread_mutex_unlock(&fts_lock); | ||
| return true; |
There was a problem hiding this comment.
Preserve existing FTS if the new path fails to init.
Currently the old FTS instance is deleted before the new one is validated, which can disable FTS on a bad path. Swap only after a successful init.
🔧 Suggested fix (safe swap)
bool MySQL_Tool_Handler::reset_fts_path(const std::string& path) {
pthread_mutex_lock(&fts_lock);
- if (fts) {
- delete fts;
- fts = NULL;
- }
-
- if (!path.empty()) {
- fts = new MySQL_FTS(path);
- if (fts->init()) {
- proxy_error("Failed to initialize FTS with new path: %s\n", path.c_str());
- delete fts;
- fts = NULL;
- pthread_mutex_unlock(&fts_lock);
- return false;
- }
- }
+ MySQL_FTS* new_fts = NULL;
+ if (!path.empty()) {
+ new_fts = new MySQL_FTS(path);
+ if (!new_fts || new_fts->init()) {
+ proxy_error("Failed to initialize FTS with new path: %s\n", path.c_str());
+ delete new_fts;
+ pthread_mutex_unlock(&fts_lock);
+ return false;
+ }
+ }
+ if (fts) {
+ delete fts;
+ }
+ fts = new_fts;
pthread_mutex_unlock(&fts_lock);
return true;
}🤖 Prompt for AI Agents
In `@lib/MySQL_Tool_Handler.cpp` around lines 117 - 137, reset_fts_path deletes
the existing FTS before validating the new path; instead allocate a temporary
MySQL_FTS* (e.g., new_fts) with the provided path, call new_fts->init(), and if
init() fails call proxy_error, delete new_fts and return false while leaving the
original fts intact; only on successful init swap under the existing
pthread_mutex_lock (&fts_lock) by deleting the old fts, assigning fts = new_fts,
and then unlock and return true; also preserve current behavior when path is
empty (delete existing fts) and ensure no double-free by only deleting the old
instance after new_fts successfully initialized.
| int MySQL_Tool_Handler::reinit_fts(const std::string& fts_path) { | ||
| proxy_info("MySQL_Tool_Handler: Reinitializing FTS with path: %s\n", fts_path.c_str()); | ||
|
|
||
| // Check if directory exists (SQLite can't create directories) | ||
| std::string::size_type last_slash = fts_path.find_last_of("/"); | ||
| if (last_slash != std::string::npos && last_slash > 0) { | ||
| std::string dir = fts_path.substr(0, last_slash); | ||
| struct stat st; | ||
| if (stat(dir.c_str(), &st) != 0 || !S_ISDIR(st.st_mode)) { | ||
| proxy_error("MySQL_Tool_Handler: Directory does not exist for path '%s' (directory: '%s')\n", | ||
| fts_path.c_str(), dir.c_str()); | ||
| return -1; | ||
| } | ||
| } | ||
|
|
||
| // First, test if we can open the new database | ||
| MySQL_FTS* new_fts = new MySQL_FTS(fts_path); | ||
| if (!new_fts) { | ||
| proxy_error("MySQL_Tool_Handler: Failed to create new FTS handler\n"); | ||
| return -1; | ||
| } | ||
|
|
||
| if (new_fts->init() != 0) { | ||
| proxy_error("MySQL_Tool_Handler: Failed to initialize FTS at %s\n", fts_path.c_str()); | ||
| delete new_fts; | ||
| return -1; // Return error WITHOUT closing old FTS | ||
| } | ||
|
|
||
| // Success! Now close old and replace with new | ||
| if (fts) { | ||
| delete fts; | ||
| } | ||
| fts = new_fts; | ||
|
|
||
| proxy_info("MySQL_Tool_Handler: FTS reinitialized successfully at %s\n", fts_path.c_str()); | ||
| return 0; | ||
| } |
There was a problem hiding this comment.
Guard reinit_fts with fts_lock to prevent races.
This method mutates fts without acquiring the same mutex used by FTS calls, which can race with in-flight operations. Lock it consistently.
🔧 Suggested fix (lock around mutation)
int MySQL_Tool_Handler::reinit_fts(const std::string& fts_path) {
+ pthread_mutex_lock(&fts_lock);
proxy_info("MySQL_Tool_Handler: Reinitializing FTS with path: %s\n", fts_path.c_str());
// Check if directory exists (SQLite can't create directories)
std::string::size_type last_slash = fts_path.find_last_of("/");
if (last_slash != std::string::npos && last_slash > 0) {
std::string dir = fts_path.substr(0, last_slash);
struct stat st;
if (stat(dir.c_str(), &st) != 0 || !S_ISDIR(st.st_mode)) {
proxy_error("MySQL_Tool_Handler: Directory does not exist for path '%s' (directory: '%s')\n",
fts_path.c_str(), dir.c_str());
+ pthread_mutex_unlock(&fts_lock);
return -1;
}
}
// First, test if we can open the new database
MySQL_FTS* new_fts = new MySQL_FTS(fts_path);
if (!new_fts) {
proxy_error("MySQL_Tool_Handler: Failed to create new FTS handler\n");
+ pthread_mutex_unlock(&fts_lock);
return -1;
}
if (new_fts->init() != 0) {
proxy_error("MySQL_Tool_Handler: Failed to initialize FTS at %s\n", fts_path.c_str());
delete new_fts;
+ pthread_mutex_unlock(&fts_lock);
return -1; // Return error WITHOUT closing old FTS
}
// Success! Now close old and replace with new
if (fts) {
delete fts;
}
fts = new_fts;
proxy_info("MySQL_Tool_Handler: FTS reinitialized successfully at %s\n", fts_path.c_str());
+ pthread_mutex_unlock(&fts_lock);
return 0;
}🤖 Prompt for AI Agents
In `@lib/MySQL_Tool_Handler.cpp` around lines 1142 - 1178, reinit_fts currently
mutates the shared pointer/variable fts without acquiring the FTS mutex; to fix,
perform the expensive new MySQL_FTS(fts_path) and new_fts->init() before taking
the lock, then acquire fts_lock (the same mutex used by other FTS calls) and
inside the critical section delete the old fts and assign fts = new_fts, then
release the lock; use the same locking primitive as the rest of the code (e.g.,
std::lock_guard or equivalent on fts_lock) to ensure no race with in-flight
operations while avoiding holding the lock during initialization.
| mysql_exec() { | ||
| local sql="$1" | ||
| mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" -e "${sql}" | ||
| } |
There was a problem hiding this comment.
Password exposed in command line is visible in process list.
Using -p"${MYSQL_PASSWORD}" exposes the password in ps output. Consider using MYSQL_PWD environment variable or a MySQL option file instead.
🔒 Suggested fix using MYSQL_PWD
mysql_exec() {
local sql="$1"
- mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" -e "${sql}"
+ MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -e "${sql}"
}Note: The same pattern appears in test_mcp_fts.sh at line 279-282 and should be updated there as well.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| mysql_exec() { | |
| local sql="$1" | |
| mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" -e "${sql}" | |
| } | |
| mysql_exec() { | |
| local sql="$1" | |
| MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -e "${sql}" | |
| } |
🤖 Prompt for AI Agents
In `@scripts/mcp/test_mcp_fts_detailed.sh` around lines 71 - 74, The mysql_exec
function exposes MYSQL_PASSWORD on the process command line via the -p option;
change it to avoid inline password usage by using a safer method such as setting
MYSQL_PWD in the environment before invoking mysql or by using a MySQL option
file, e.g., export MYSQL_PWD="${MYSQL_PASSWORD}" && mysql -h "${MYSQL_HOST}" -P
"${MYSQL_PORT}" -u "${MYSQL_USER}" -e "${sql}" (and unset MYSQL_PWD afterward);
update the mysql_exec function and apply the same change to the similar
invocation in test_mcp_fts.sh to remove -p"${MYSQL_PASSWORD}" and prevent
password leakage to ps listings.
| if [ "${CLEANUP}" = "true" ]; then | ||
| log "Cleanup: deleting fts_test.customers and fts_test.orders indexes" | ||
| delete_resp=$(tool_call "fts_delete_index" '{"schema":"fts_test","table":"customers"}') | ||
| delete_resp=$(extract_tool_result "${delete_resp}") | ||
| echo "${delete_resp}" | jq -e '.success == true' >/dev/null | ||
|
|
||
| delete_resp=$(tool_call "fts_delete_index" '{"schema":"fts_test","table":"orders"}') | ||
| delete_resp=$(extract_tool_result "${delete_resp}") | ||
| echo "${delete_resp}" | jq -e '.success == true' >/dev/null | ||
| fi |
There was a problem hiding this comment.
Cleanup deletes fts_test.orders index that was already deleted at line 237.
The test at line 237 deletes fts_test.orders. When --cleanup is specified, line 257-259 attempts to delete it again. While the test verifies .success == true, the backend should return an error for non-existent index (as tested in test_mcp_fts.sh test 18). This assertion may fail.
🐛 Suggested fix - skip already-deleted index or expect failure
if [ "${CLEANUP}" = "true" ]; then
log "Cleanup: deleting fts_test.customers and fts_test.orders indexes"
delete_resp=$(tool_call "fts_delete_index" '{"schema":"fts_test","table":"customers"}')
delete_resp=$(extract_tool_result "${delete_resp}")
echo "${delete_resp}" | jq -e '.success == true' >/dev/null
- delete_resp=$(tool_call "fts_delete_index" '{"schema":"fts_test","table":"orders"}')
- delete_resp=$(extract_tool_result "${delete_resp}")
- echo "${delete_resp}" | jq -e '.success == true' >/dev/null
+ # Note: fts_test.orders was already deleted earlier in the test
fi🤖 Prompt for AI Agents
In `@scripts/mcp/test_mcp_fts_detailed.sh` around lines 251 - 260, The cleanup
block currently unconditionally attempts to delete fts_test.orders although it
was already deleted earlier; update the cleanup logic around the calls to
tool_call "fts_delete_index" (and the subsequent extract_tool_result) to either
first check existence (e.g., call a check/existence tool or API) and only call
fts_delete_index for existing indexes, or relax the jq assertion to accept
either a success result or the expected "index not found" error response (e.g.,
accept .success == true OR .error indicates non-existent index) so the second
delete does not cause the test to fail.
| mysql_exec() { | ||
| local sql="$1" | ||
| mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" \ | ||
| -e "${sql}" 2>/dev/null | ||
| } |
There was a problem hiding this comment.
Same password exposure issue as the detailed test script.
🔒 Suggested fix using MYSQL_PWD
mysql_exec() {
local sql="$1"
- mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" \
- -e "${sql}" 2>/dev/null
+ MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" \
+ -e "${sql}" 2>/dev/null
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| mysql_exec() { | |
| local sql="$1" | |
| mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" \ | |
| -e "${sql}" 2>/dev/null | |
| } | |
| mysql_exec() { | |
| local sql="$1" | |
| MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" \ | |
| -e "${sql}" 2>/dev/null | |
| } |
🤖 Prompt for AI Agents
In `@scripts/mcp/test_mcp_fts.sh` around lines 279 - 283, The mysql_exec function
currently exposes the password on the command line via -p"${MYSQL_PASSWORD}";
update mysql_exec to avoid that by passing the password through the MYSQL_PWD
environment variable instead (e.g., invoke mysql with
MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u
"${MYSQL_USER}" -e "${sql}" 2>/dev/null) so the password is not visible in
process listings; keep the function name mysql_exec and the same parameters but
remove the inline -p argument.
| mysql_exec "INSERT INTO ${TEST_SCHEMA}.${TEST_TABLE} (title, content, category, priority) \ | ||
| VALUES ('${title}', '${doc}', '${category}', '${priority}');" 2>/dev/null || true | ||
| done |
There was a problem hiding this comment.
SQL injection vulnerability in INSERT statement.
The ${doc} variable contains user-defined test data with apostrophes (e.g., "Customer John Smith...") which will break the SQL or could enable injection. Use proper escaping or parameterized queries.
🐛 Suggested fix - escape single quotes
for doc_id in "${!TEST_DOCUMENTS[@]}"; do
local doc="${TEST_DOCUMENTS[$doc_id]}"
+ # Escape single quotes for SQL
+ doc="${doc//\'/\'\'}"
local title="Document ${doc_id}"
...🤖 Prompt for AI Agents
In `@scripts/mcp/test_mcp_fts.sh` around lines 341 - 343, The INSERT builds SQL
with unescaped variables (title, doc, category, priority) which allows quotes to
break the statement; before calling mysql_exec in the loop, sanitize each
variable by replacing every single-quote (') with two single-quotes ('') so the
SQL string is safe (e.g., transform ${doc} -> escaped_doc), then use those
escaped variables in the INSERT; update the code around the mysql_exec call and
reference the mysql_exec invocation and the TEST_SCHEMA/TEST_TABLE variables
when making the change.
| current_path=$(mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" \ | ||
| -e "SELECT @@mcp-fts_path" -s -N 2>/dev/null | tr -d '\r') |
There was a problem hiding this comment.
Password exposure in multiple mysql invocations in test_fts_custom_database_path.
Several direct mysql calls in this test function also expose the password via command line. Apply the same MYSQL_PWD fix or create a helper function.
Also applies to: 1005-1006, 1018-1019, 1029-1030, 1089-1092
🤖 Prompt for AI Agents
In `@scripts/mcp/test_mcp_fts.sh` around lines 992 - 993, The mysql invocations in
test_fts_custom_database_path (e.g., the command populating current_path and
other mysql calls) expose the password on the command line; replace direct use
of -p"${MYSQL_PASSWORD}" with a secure approach such as exporting
MYSQL_PWD="${MYSQL_PASSWORD}" for the command (e.g.,
MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" ...) or implement a small
helper function (e.g., safe_mysql()) that injects MYSQL_PWD into the environment
and runs mysql, then update all occurrences (current_path assignment and the
other mysql calls referenced) to use that helper to avoid password exposure.
Ensure the helper preserves -s -N and 2>/dev/null flags used in existing calls.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@doc/MCP/FTS_USER_GUIDE.md`:
- Line 853: Replace the bare URL "https://github.com/ProxySQL/proxysql-vec" with
a labeled Markdown link to satisfy lint and improve rendering; for example
change the line to use a descriptive label like [ProxySQL proxysql-vec GitHub
repository](https://github.com/ProxySQL/proxysql-vec) so the link is not bare.
🧹 Nitpick comments (2)
doc/MCP/FTS_USER_GUIDE.md (2)
32-38: Add language identifiers to fenced code blocks. This is flagged by markdownlint and improves readability/syntax highlighting.Proposed fix
-``` +```text Traditional Query Flow: LLM Agent → Full Table Scan → Millions of Rows → Slow Response @@ -``` +```text @@ -``` +```text @@ -``` +```text @@ -``` +```text @@ -``` +```textAlso applies to: 46-87, 92-99, 575-577, 773-775, 779-781
1-1: Hyphenate compound adjective “full-text” where it modifies a noun. Keeps phrasing consistent and matches common style guides.Also applies to: 19-19
| ## Support | ||
|
|
||
| For issues, questions, or contributions: | ||
| - GitHub: https://github.com/ProxySQL/proxysql-vec |
There was a problem hiding this comment.
Avoid bare URLs in Markdown. Convert to a labeled link to satisfy lint and improve rendering.
Proposed fix
-- GitHub: https://github.com/ProxySQL/proxysql-vec
+- GitHub: [ProxySQL/proxysql-vec](https://github.com/ProxySQL/proxysql-vec)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - GitHub: https://github.com/ProxySQL/proxysql-vec | |
| - GitHub: [ProxySQL/proxysql-vec](https://github.com/ProxySQL/proxysql-vec) |
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
853-853: Bare URL used
(MD034, no-bare-urls)
🤖 Prompt for AI Agents
In `@doc/MCP/FTS_USER_GUIDE.md` at line 853, Replace the bare URL
"https://github.com/ProxySQL/proxysql-vec" with a labeled Markdown link to
satisfy lint and improve rendering; for example change the line to use a
descriptive label like [ProxySQL proxysql-vec GitHub
repository](https://github.com/ProxySQL/proxysql-vec) so the link is not bare.
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In `@lib/MySQL_FTS.cpp`:
- Around line 303-316: The query builder in MySQL_FTS.cpp currently appends
identifiers and the where_clause raw (symbols: mysql_query, selected_cols,
schema, table, where_clause), which risks SQL injection; implement an
identifier-escaping helper (e.g., escape_identifier) that doubles any backticks
and use it when writing selected_cols, schema and table into mysql_query, and
stop appending where_clause verbatim—either require the caller to supply a
pre-validated/parameterized filter, validate the clause against an allowed
pattern, or convert to a parameterized WHERE using bound parameters; update the
mysql_query construction to use escape_identifier(schema),
escape_identifier(table), and escaped selected_cols instead of raw values.
- Around line 582-585: The code sets the C-string pointer error and returns
without freeing it, causing a memory leak; in the block where result["error"] is
assigned (the same place where error is checked), call the SQLite free stub
exactly as used in search(): (*proxy_sqlite3_free)(error) before returning
result.dump(); locate the error pointer usage in MySQL_FTS.cpp (the error
variable and the block that builds result["error"]) and add the free call
immediately after constructing the error message (ensuring you do not free error
twice elsewhere).
- Around line 748-756: The code reads reindex_json["error"].get<std::string>()
which can throw if the "error" key is missing; update the failure branch (where
reindex_json is inspected and failed_item is built) to safely obtain the error
string — either check reindex_json.contains("error") before calling get or use
reindex_json.value("error", "unknown error") (or similar default) to populate
failed_item["error"]; keep the rest of the logic (failed.push_back(failed_item),
rebuilt_count handling) unchanged and reference reindex_json, failed_item, and
failed in your change.
- Around line 90-99: sanitize_name currently only replaces '.', '-', and space,
leaving other dangerous characters (backticks, quotes, semicolons, nulls) that
can enable SQL injection when used in CREATE/ALTER statements; change it to an
allowlist-based sanitizer that only permits ASCII letters, digits and
underscore, enforces the first character is a letter, enforces a reasonable max
length, rejects or canonicalizes empty/invalid results (e.g., return error/throw
or prepend a fixed safe prefix), and explicitly strip or reject any NUL bytes or
non-ASCII characters; update the sanitize_name implementation to iterate over
bytes of the input, build a new string containing only [A-Za-z0-9_], ensure the
first character is a letter (or prefix with "t_" if not), truncate to the max
length, and make callers treat an empty/invalid sanitized result as an error
rather than embedding it into SQL.
- Around line 30-33: The init path leaks the allocated SQLite3DB when db->open()
fails; in MySQL_FTS.cpp where you construct SQLite3DB* db and call db->open(),
ensure you free the allocation before returning -1 (either delete db on error or
switch the variable to a smart pointer like std::unique_ptr<SQLite3DB>), and
include the same error logging via proxy_error; update the failure branch around
db->open() to release/delete db (or use unique_ptr so it auto-frees) before
returning.
- Around line 24-37: MySQL_FTS::init currently creates a VLA for db_path and
calls new SQLite3DB() without handling allocation failure; replace the VLA with
a heap buffer using std::vector<char> (copy db_path into it and push a
terminating '\0') and pass path_buf.data() to db->open, and change the
allocation to use new (std::nothrow) SQLite3DB() (or wrap in try/catch) and
check the returned pointer (db) for nullptr before calling db->open; on failure
call proxy_error and return -1; keep the rest of the flow (init_schema)
unchanged and reference the SQLite3DB, db_path, db->open, proxy_error, and
init_schema symbols when making the edits.
- Around line 488-497: The MATCH clause currently interpolates user input via
escape_sql(query) into the constructed search_sql (see fts_table, sanitized_data
and the WHERE ... MATCH usage), which allows FTS5 operator injection; fix by
producing an FTS5-literal-safe value: double any embedded double-quote
characters in the user query and wrap the entire query in double quotes before
inserting into MATCH (i.e., transform query -> "\"" + replace(query, "\"",
"\"\"") + "\""), or better yet switch to a parameterized/bound MATCH parameter
instead of string interpolation; update the code that builds search_sql (the
snippet constructing search_sql and the use of escape_sql) to use the
quoted/doubled-quote form or bind parameters so MATCH receives a literal phrase.
🧹 Nitpick comments (5)
include/MySQL_FTS.h (1)
7-7: Unused include.
<memory>is included but no smart pointers (std::unique_ptr,std::shared_ptr) are used in this class. Consider removing it or switching to smart pointers forSQLite3DB* dbmember.lib/MySQL_FTS.cpp (4)
46-61: PRAGMA and index creation failures silently ignored.Lines 48-50 execute PRAGMA statements without checking return values. While these are typically optional optimizations, failures could indicate underlying issues. Similarly, lines 84-85 don't check if index creation succeeded.
332-392: Batch insert without transaction may be slow and non-atomic.Rows are inserted individually without an explicit transaction. SQLite auto-commits each statement by default (unless already in a transaction). For large tables, this will be significantly slower and won't be atomic.
Proposed fix - wrap in transaction
db->wrlock(); + db->execute("BEGIN TRANSACTION"); try { // ... existing insert loop ... // Update metadata // ... db->execute(metadata_sql.str().c_str()); + db->execute("COMMIT"); db->wrunlock(); // ... } catch (const std::exception& e) { + db->execute("ROLLBACK"); db->wrunlock(); // ... }
544-554: In-memory pagination is inefficient for large datasets.All matching results are fetched into
all_resultsbefore pagination is applied. For queries matching many rows across multiple indexes, this could consume significant memory. Consider pushingLIMITandOFFSETto individual index queries or implementing cursor-based pagination.
633-656: Delete operations not wrapped in transaction.If an error occurs between dropping triggers/tables and deleting metadata, the database could be left in an inconsistent state. Consider wrapping in a transaction.
| int MySQL_FTS::init() { | ||
| // Initialize database connection | ||
| db = new SQLite3DB(); | ||
| char path_buf[db_path.size() + 1]; | ||
| strcpy(path_buf, db_path.c_str()); | ||
| int rc = db->open(path_buf, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE); | ||
| if (rc != SQLITE_OK) { | ||
| proxy_error("Failed to open FTS database at %s: %d\n", db_path.c_str(), rc); | ||
| return -1; | ||
| } | ||
|
|
||
| // Initialize schema | ||
| return init_schema(); | ||
| } |
There was a problem hiding this comment.
VLA and potential null pointer dereference.
- Line 27 uses a Variable Length Array (VLA) which is a non-standard C++ extension. If
db_pathis very long, this could cause stack overflow. - Line 29 uses
db->open()without checking ifnew SQLite3DB()returned a valid pointer (though in practicenewthrows on failure in C++, unlessnothrowis used).
Suggested fix using std::vector or c_str() directly
int MySQL_FTS::init() {
// Initialize database connection
db = new SQLite3DB();
- char path_buf[db_path.size() + 1];
- strcpy(path_buf, db_path.c_str());
- int rc = db->open(path_buf, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE);
+ int rc = db->open(const_cast<char*>(db_path.c_str()), SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE);
if (rc != SQLITE_OK) {
proxy_error("Failed to open FTS database at %s: %d\n", db_path.c_str(), rc);
+ delete db;
+ db = NULL;
return -1;
}If db->open() requires a mutable buffer, use std::vector<char>:
std::vector<char> path_buf(db_path.begin(), db_path.end());
path_buf.push_back('\0');
int rc = db->open(path_buf.data(), SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE);🧰 Tools
🪛 Cppcheck (2.19.0)
[warning] 30-30: If memory allocation fails, then there is a possible null pointer dereference
(nullPointerOutOfMemory)
🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 24 - 37, MySQL_FTS::init currently creates a
VLA for db_path and calls new SQLite3DB() without handling allocation failure;
replace the VLA with a heap buffer using std::vector<char> (copy db_path into it
and push a terminating '\0') and pass path_buf.data() to db->open, and change
the allocation to use new (std::nothrow) SQLite3DB() (or wrap in try/catch) and
check the returned pointer (db) for nullptr before calling db->open; on failure
call proxy_error and return -1; keep the rest of the flow (init_schema)
unchanged and reference the SQLite3DB, db_path, db->open, proxy_error, and
init_schema symbols when making the edits.
| if (rc != SQLITE_OK) { | ||
| proxy_error("Failed to open FTS database at %s: %d\n", db_path.c_str(), rc); | ||
| return -1; | ||
| } |
There was a problem hiding this comment.
Memory leak on init failure.
If db->open() fails, the allocated SQLite3DB object is not deleted before returning -1, causing a memory leak.
Proposed fix
int rc = db->open(path_buf, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE);
if (rc != SQLITE_OK) {
proxy_error("Failed to open FTS database at %s: %d\n", db_path.c_str(), rc);
+ delete db;
+ db = NULL;
return -1;
}🧰 Tools
🪛 Cppcheck (2.19.0)
[warning] 30-30: If memory allocation fails, then there is a possible null pointer dereference
(nullPointerOutOfMemory)
🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 30 - 33, The init path leaks the allocated
SQLite3DB when db->open() fails; in MySQL_FTS.cpp where you construct SQLite3DB*
db and call db->open(), ensure you free the allocation before returning -1
(either delete db on error or switch the variable to a smart pointer like
std::unique_ptr<SQLite3DB>), and include the same error logging via proxy_error;
update the failure branch around db->open() to release/delete db (or use
unique_ptr so it auto-frees) before returning.
| std::string MySQL_FTS::sanitize_name(const std::string& name) { | ||
| std::string sanitized = name; | ||
| // Replace dots and special characters with underscores | ||
| for (size_t i = 0; i < sanitized.length(); i++) { | ||
| if (sanitized[i] == '.' || sanitized[i] == '-' || sanitized[i] == ' ') { | ||
| sanitized[i] = '_'; | ||
| } | ||
| } | ||
| return sanitized; | ||
| } |
There was a problem hiding this comment.
Incomplete sanitization allows potential SQL injection in table names.
sanitize_name() only replaces ., -, and space characters. It does not handle SQL metacharacters like backticks, quotes, semicolons, or null bytes. Since the result is used directly in SQL statements (e.g., CREATE TABLE ... << sanitized_data), malicious schema/table names could inject SQL.
Proposed fix - use allowlist approach
std::string MySQL_FTS::sanitize_name(const std::string& name) {
- std::string sanitized = name;
- // Replace dots and special characters with underscores
- for (size_t i = 0; i < sanitized.length(); i++) {
- if (sanitized[i] == '.' || sanitized[i] == '-' || sanitized[i] == ' ') {
- sanitized[i] = '_';
+ std::string sanitized;
+ sanitized.reserve(name.length());
+ for (size_t i = 0; i < name.length(); i++) {
+ char c = name[i];
+ // Allow only alphanumeric and underscore
+ if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
+ (c >= '0' && c <= '9') || c == '_') {
+ sanitized += c;
+ } else {
+ sanitized += '_';
}
}
return sanitized;
}🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 90 - 99, sanitize_name currently only
replaces '.', '-', and space, leaving other dangerous characters (backticks,
quotes, semicolons, nulls) that can enable SQL injection when used in
CREATE/ALTER statements; change it to an allowlist-based sanitizer that only
permits ASCII letters, digits and underscore, enforces the first character is a
letter, enforces a reasonable max length, rejects or canonicalizes empty/invalid
results (e.g., return error/throw or prepend a fixed safe prefix), and
explicitly strip or reject any NUL bytes or non-ASCII characters; update the
sanitize_name implementation to iterate over bytes of the input, build a new
string containing only [A-Za-z0-9_], ensure the first character is a letter (or
prefix with "t_" if not), truncate to the max length, and make callers treat an
empty/invalid sanitized result as an error rather than embedding it into SQL.
| // Build MySQL query to fetch data | ||
| std::ostringstream mysql_query; | ||
| mysql_query << "SELECT "; | ||
| for (size_t i = 0; i < selected_cols.size(); i++) { | ||
| if (i > 0) mysql_query << ", "; | ||
| mysql_query << "`" << selected_cols[i] << "`"; | ||
| } | ||
|
|
||
| mysql_query << " FROM `" << schema << "`.`" << table << "`"; | ||
|
|
||
| if (!where_clause.empty()) { | ||
| mysql_query << " WHERE " << where_clause; | ||
| } | ||
|
|
There was a problem hiding this comment.
SQL injection risk in MySQL query construction.
- Line 311: Schema and table names are wrapped in backticks but backticks within the names are not escaped. A schema/table containing a backtick could break out.
- Line 314:
where_clauseis appended directly without any validation or sanitization, allowing arbitrary SQL injection.
Proposed fix
For backtick escaping in identifiers:
std::string escape_identifier(const std::string& name) {
std::string escaped;
for (char c : name) {
if (c == '`') escaped += "``";
else escaped += c;
}
return escaped;
}For where_clause, consider:
- Validating it contains only allowed SQL constructs, or
- Documenting that this is a privileged operation and the caller must ensure safety, or
- Using parameterized queries where possible
🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 303 - 316, The query builder in MySQL_FTS.cpp
currently appends identifiers and the where_clause raw (symbols: mysql_query,
selected_cols, schema, table, where_clause), which risks SQL injection;
implement an identifier-escaping helper (e.g., escape_identifier) that doubles
any backticks and use it when writing selected_cols, schema and table into
mysql_query, and stop appending where_clause verbatim—either require the caller
to supply a pre-validated/parameterized filter, validate the clause against an
allowed pattern, or convert to a parameterized WHERE using bound parameters;
update the mysql_query construction to use escape_identifier(schema),
escape_identifier(table), and escaped selected_cols instead of raw values.
| // Search query for this index (use table name for MATCH/bm25) | ||
| std::ostringstream search_sql; | ||
| search_sql << "SELECT d.schema_name, d.table_name, d.primary_key_value, " | ||
| << "snippet(" << fts_table << ", 0, '<mark>', '</mark>', '...', 30) AS snippet, " | ||
| << "d.metadata " | ||
| << "FROM " << fts_table << " " | ||
| << "JOIN " << sanitized_data << " d ON " << fts_table << ".rowid = d.rowid " | ||
| << "WHERE " << fts_table << " MATCH '" << escape_sql(query) << "' " | ||
| << "ORDER BY bm25(" << fts_table << ") ASC " | ||
| << "LIMIT " << limit; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's locate and examine the MySQL_FTS.cpp file
find . -name "MySQL_FTS.cpp" -type fRepository: ProxySQL/proxysql-vec
Length of output: 84
🏁 Script executed:
# Search for the escape_sql function to understand how it works
rg "escape_sql" -A 5 -B 2 | head -100Repository: ProxySQL/proxysql-vec
Length of output: 5391
🌐 Web query:
SQLite FTS5 MATCH query escaping requirements and security
💡 Result:
Short summary + practical guidance:
Key points
- The right-hand side of MATCH is an FTS5 query string that is parsed (supports boolean operators, NEAR, prefixes, column filters, phrases, etc.). User input can therefore be interpreted as query operators, not just a literal term. [1][2]
- FTS5 allows unquoted barewords (letters, digits, underscore, non-ASCII) and requires quoting (double quotes) for strings containing other characters (spaces, punctuation). Inside a quoted string double quotes are escaped by doubling them. [1]
- Because the MATCH string is parsed by FTS5, you must treat it like any other language input: untrusted input can change the meaning of the query (i.e., “injection” of operators). [1][3]
Escaping / literal-search rules
- To make an input treated as a single literal phrase, enclose it in double quotes and double any embedded double-quote characters. Example: input He said "hi" -> MATCH '"He said ""hi"""'. [1]
- If you need to treat each word literally (not as a phrase), quote each word separately: MATCH '"word1" "word2"'. [1]
- Bare punctuation/special characters cannot appear unquoted in a bareword; if they’re present you must quote the string. Also bear in mind future FTS5 versions might change bareword allowances, so quoting is safer for untrusted content. [1]
Security / best practices
- Prefer parameterized queries / prepared statements rather than concatenating SQL. You can bind the MATCH RHS as a parameter (e.g., WHERE ft MATCH ? or use the table-valued form ft(?)). Binding prevents SQL-level injection and is the recommended first defense. [1][2]
- Because MATCH has its own query language, binding alone does not change how FTS5 parses the bound string — it still can contain FTS operators. If you need to prevent the user from supplying operators (i.e., force a literal token/phrase search), sanitize by quoting the whole input as an FTS5 string (wrap in double quotes, escape internal double-quotes) before binding. Example: bind the value '"user input with ""quotes"""'. [1][3]
- For strict literal substring matching consider using the trigram tokenizer and LIKE/GLOB on that column (FTS5 trigram supports indexed LIKE/GLOB), or maintain a separate (escaped) column for exact matches. These approaches avoid FTS5 query parsing semantics. [1]
- Validate inputs and limit features exposed to untrusted callers (e.g., disallow OR/NEAR/column filters) by sanitizing or rejecting characters/patterns you don’t want allowed.
Minimal recommended workflow
- Decide whether callers may supply FTS operators. If yes, bind their string directly (prepared statement). If no:
- Escape by wrapping the user input in double quotes and doubling any existing double quotes, then bind that value to MATCH ? (or use ft(?)). This forces a phrase-literal interpretation. [1][3]
- For substring/complex needs, consider trigram tokenizer + LIKE/GLOB or explicit tokenization on input. [1]
References
- SQLite FTS5 documentation (query syntax, strings, quoting, table-valued form, trigram tokenizer). [1][2]
- Community note / examples on escaping MATCH terms (practical quoting advice). [3]
References used
[1] SQLite FTS5 documentation — Query syntax & strings/phrases/quoting.
[2] SQLite FTS5 documentation — table-valued function, MATCH parameter examples (bm25/rank examples).
[3] StackOverflow discussion showing practical quoting examples for MATCH.
🏁 Script executed:
# Read the search() function around lines 488-497 to get full context
sed -n '470,520p' lib/MySQL_FTS.cpp | cat -nRepository: ProxySQL/proxysql-vec
Length of output: 2292
🏁 Script executed:
# Check if there's any validation on the query parameter in the function signature or before line 488
sed -n '426,497p' lib/MySQL_FTS.cpp | cat -nRepository: ProxySQL/proxysql-vec
Length of output: 2858
FTS5 query syntax injection risk in MATCH clause.
Line 495 uses escape_sql(query) which only escapes single quotes, but FTS5 MATCH parses special operators and syntax (*, ", AND, OR, NOT, NEAR, etc.). Without FTS5-compliant escaping, untrusted user input can inject these operators to alter query behavior or trigger unintended search patterns.
For example:
- Input
test OR maliciouswould execute as a boolean OR instead of a literal phrase search - Input
test*would perform a prefix wildcard search
Use FTS5-compliant escaping: wrap the entire query in double quotes and double any embedded double-quote characters before the MATCH clause (e.g., "user input with ""quotes""" to force literal phrase matching). Alternatively, use parameterized queries with bound parameters to prevent the injection vector.
🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 488 - 497, The MATCH clause currently
interpolates user input via escape_sql(query) into the constructed search_sql
(see fts_table, sanitized_data and the WHERE ... MATCH usage), which allows FTS5
operator injection; fix by producing an FTS5-literal-safe value: double any
embedded double-quote characters in the user query and wrap the entire query in
double quotes before inserting into MATCH (i.e., transform query -> "\"" +
replace(query, "\"", "\"\"") + "\""), or better yet switch to a
parameterized/bound MATCH parameter instead of string interpolation; update the
code that builds search_sql (the snippet constructing search_sql and the use of
escape_sql) to use the quoted/doubled-quote form or bind parameters so MATCH
receives a literal phrase.
| if (error) { | ||
| result["error"] = "Failed to list indexes: " + std::string(error); | ||
| return result.dump(); | ||
| } |
There was a problem hiding this comment.
Memory leak: error pointer not freed.
When error is set (line 582-583), it should be freed with (*proxy_sqlite3_free)(error) as done in the search() method (line 508).
Proposed fix
if (error) {
result["error"] = "Failed to list indexes: " + std::string(error);
+ (*proxy_sqlite3_free)(error);
return result.dump();
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (error) { | |
| result["error"] = "Failed to list indexes: " + std::string(error); | |
| return result.dump(); | |
| } | |
| if (error) { | |
| result["error"] = "Failed to list indexes: " + std::string(error); | |
| (*proxy_sqlite3_free)(error); | |
| return result.dump(); | |
| } |
🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 582 - 585, The code sets the C-string pointer
error and returns without freeing it, causing a memory leak; in the block where
result["error"] is assigned (the same place where error is checked), call the
SQLite free stub exactly as used in search(): (*proxy_sqlite3_free)(error)
before returning result.dump(); locate the error pointer usage in MySQL_FTS.cpp
(the error variable and the block that builds result["error"]) and add the free
call immediately after constructing the error message (ensuring you do not free
error twice elsewhere).
| if (reindex_json["success"].get<bool>()) { | ||
| rebuilt_count++; | ||
| } else { | ||
| json failed_item; | ||
| failed_item["schema"] = schema; | ||
| failed_item["table"] = table; | ||
| failed_item["error"] = reindex_json["error"].get<std::string>(); | ||
| failed.push_back(failed_item); | ||
| } |
There was a problem hiding this comment.
Potential exception if "error" key is missing.
Line 754 calls reindex_json["error"].get<std::string>() but the index_table method may return without setting an "error" key in some edge cases. Use .value() with a default or check for key existence.
Proposed fix
if (reindex_json["success"].get<bool>()) {
rebuilt_count++;
} else {
json failed_item;
failed_item["schema"] = schema;
failed_item["table"] = table;
- failed_item["error"] = reindex_json["error"].get<std::string>();
+ failed_item["error"] = reindex_json.value("error", "Unknown error");
failed.push_back(failed_item);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (reindex_json["success"].get<bool>()) { | |
| rebuilt_count++; | |
| } else { | |
| json failed_item; | |
| failed_item["schema"] = schema; | |
| failed_item["table"] = table; | |
| failed_item["error"] = reindex_json["error"].get<std::string>(); | |
| failed.push_back(failed_item); | |
| } | |
| if (reindex_json["success"].get<bool>()) { | |
| rebuilt_count++; | |
| } else { | |
| json failed_item; | |
| failed_item["schema"] = schema; | |
| failed_item["table"] = table; | |
| failed_item["error"] = reindex_json.value("error", "Unknown error"); | |
| failed.push_back(failed_item); | |
| } |
🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 748 - 756, The code reads
reindex_json["error"].get<std::string>() which can throw if the "error" key is
missing; update the failure branch (where reindex_json is inspected and
failed_item is built) to safely obtain the error string — either check
reindex_json.contains("error") before calling get or use
reindex_json.value("error", "unknown error") (or similar default) to populate
failed_item["error"]; keep the rest of the logic (failed.push_back(failed_item),
rebuilt_count handling) unchanged and reference reindex_json, failed_item, and
failed in your change.
|
@gemini-code-assist review |
There was a problem hiding this comment.
Code Review
This pull request introduces a significant new feature: full-text search (FTS) capabilities for the MCP query server, using SQLite's FTS5 extension. The changes include new C++ classes for managing FTS, integration into the existing tool handling framework, configuration options, extensive documentation, and new test scripts. Overall, this is a well-structured and comprehensive addition. My review focuses on improving security, correctness, and performance in the core implementation, as well as enhancing the clarity of the new user documentation.
| if (row.contains(primary_key_lower) && !row[primary_key_lower].is_null()) { | ||
| pk_value = row[primary_key_lower].get<std::string>(); | ||
| } else { | ||
| pk_value = std::to_string(row_count); | ||
| } |
There was a problem hiding this comment.
When a primary key value is null or not found for a row during indexing, a fallback key is generated using std::to_string(row_count). This is problematic because this generated key does not correspond to any real key in the source MySQL table, making it impossible to retrieve the original row using the FTS result. This can lead to incorrect data retrieval and confusion. It would be safer to skip indexing rows that have a null or missing primary key and log a warning.
| if (row.contains(primary_key_lower) && !row[primary_key_lower].is_null()) { | |
| pk_value = row[primary_key_lower].get<std::string>(); | |
| } else { | |
| pk_value = std::to_string(row_count); | |
| } | |
| if (row.contains(primary_key_lower) && !row[primary_key_lower].is_null()) { | |
| pk_value = row[primary_key_lower].get<std::string>(); | |
| } else { | |
| proxy_warning("FTS: Skipping row in %s.%s with null or missing primary key '%s'.", schema.c_str(), table.c_str(), primary_key.c_str()); | |
| continue; // Skip this row | |
| } |
| std::ostringstream insert_sql; | ||
| insert_sql << "INSERT INTO " << sanitized_data | ||
| << " (schema_name, table_name, primary_key_value, content, metadata) " | ||
| << "VALUES ('" << escape_sql(schema) << "', '" | ||
| << escape_sql(table) << "', '" | ||
| << escape_sql(pk_value) << "', '" | ||
| << escape_sql(content.str()) << "', '" | ||
| << escape_sql(metadata.dump()) << "');"; | ||
|
|
||
| if (!db->execute(insert_sql.str().c_str())) { | ||
| proxy_error("Failed to insert row into FTS: %s\n", insert_sql.str().c_str()); | ||
| } |
There was a problem hiding this comment.
The SQL INSERT statement is constructed using string concatenation with manual escaping via escape_sql. This approach is vulnerable to SQL injection if the escaping function is not exhaustive. To ensure security and robustness, it is strongly recommended to use prepared statements with parameter binding (e.g., sqlite3_prepare_v2, sqlite3_bind_text) instead of manual string construction.
| << "d.metadata " | ||
| << "FROM " << fts_table << " " | ||
| << "JOIN " << sanitized_data << " d ON " << fts_table << ".rowid = d.rowid " | ||
| << "WHERE " << fts_table << " MATCH '" << escape_sql(query) << "' " |
There was a problem hiding this comment.
The FTS5 MATCH clause is constructed by concatenating a user-provided query string after only escaping single quotes. The FTS5 query syntax is complex, and this simple escaping may not be sufficient to prevent all forms of injection or syntax errors, especially with untrusted input. While MATCH does not support bound parameters for the query string, the input should be more rigorously validated or sanitized to prevent abuse of the FTS5 query syntax (e.g., injection of column filters or function calls).
| { | ||
| "schema": "sales", | ||
| "table": "orders", | ||
| "columns": ["order_id","customer_name","notes"], |
There was a problem hiding this comment.
The example response for fts_list_indexes has an incorrectly formatted columns field. It is shown as a single string ["order_id","customer_name","notes"] instead of a proper JSON array of strings. This could be misleading for users trying to parse the API response.
| "columns": ["order_id","customer_name","notes"], | |
| "columns": ["order_id", "customer_name", "notes"], |
| // Collect all results from each index | ||
| json all_results = json::array(); | ||
| int total_matches = 0; | ||
|
|
||
| for (std::vector<SQLite3_row*>::iterator it = indexes_result->rows.begin(); | ||
| it != indexes_result->rows.end(); ++it) { | ||
| SQLite3_row* row = *it; | ||
| const char* idx_schema = row->fields[0]; | ||
| const char* idx_table = row->fields[1]; | ||
|
|
||
| if (!idx_schema || !idx_table) continue; | ||
|
|
||
| std::string data_table = get_data_table_name(idx_schema, idx_table); | ||
| std::string fts_table = get_fts_table_name(idx_schema, idx_table); | ||
| std::string sanitized_data = data_table; | ||
|
|
||
| // Search query for this index (use table name for MATCH/bm25) | ||
| std::ostringstream search_sql; | ||
| search_sql << "SELECT d.schema_name, d.table_name, d.primary_key_value, " | ||
| << "snippet(" << fts_table << ", 0, '<mark>', '</mark>', '...', 30) AS snippet, " | ||
| << "d.metadata " | ||
| << "FROM " << fts_table << " " | ||
| << "JOIN " << sanitized_data << " d ON " << fts_table << ".rowid = d.rowid " | ||
| << "WHERE " << fts_table << " MATCH '" << escape_sql(query) << "' " | ||
| << "ORDER BY bm25(" << fts_table << ") ASC " | ||
| << "LIMIT " << limit; | ||
|
|
||
| SQLite3_result* idx_resultset = NULL; | ||
| error = NULL; | ||
| cols = 0; | ||
| affected = 0; | ||
|
|
||
| idx_resultset = db->execute_statement(search_sql.str().c_str(), &error, &cols, &affected); | ||
|
|
||
| if (error) { | ||
| proxy_error("FTS search error on %s.%s: %s\n", idx_schema, idx_table, error); | ||
| (*proxy_sqlite3_free)(error); | ||
| } | ||
|
|
||
| if (idx_resultset) { | ||
| for (std::vector<SQLite3_row*>::iterator row_it = idx_resultset->rows.begin(); | ||
| row_it != idx_resultset->rows.end(); ++row_it) { | ||
| SQLite3_row* res_row = *row_it; | ||
|
|
||
| json match; | ||
| match["schema"] = res_row->fields[0] ? res_row->fields[0] : ""; | ||
| match["table"] = res_row->fields[1] ? res_row->fields[1] : ""; | ||
| match["primary_key_value"] = res_row->fields[2] ? res_row->fields[2] : ""; | ||
|
|
||
| match["snippet"] = res_row->fields[3] ? res_row->fields[3] : ""; | ||
|
|
||
| // Parse metadata JSON | ||
| try { | ||
| if (res_row->fields[4]) { | ||
| match["metadata"] = json::parse(res_row->fields[4]); | ||
| } else { | ||
| match["metadata"] = json::object(); | ||
| } | ||
| } catch (const json::exception& e) { | ||
| match["metadata"] = res_row->fields[4] ? res_row->fields[4] : ""; | ||
| } | ||
|
|
||
| all_results.push_back(match); | ||
| total_matches++; | ||
| } | ||
| delete idx_resultset; | ||
| } | ||
| } | ||
|
|
||
| delete indexes_result; | ||
| db->rdunlock(); |
There was a problem hiding this comment.
The search function fetches results from all matching indexes into an in-memory all_results vector and then applies pagination. This can lead to excessive memory consumption if there are many matches across numerous indexes, as the initial LIMIT is applied per index. A more scalable approach would be to perform pagination at the database level, for instance by combining results with UNION ALL and applying a single ORDER BY and LIMIT/OFFSET to the final result set.
| bool MySQL_Tool_Handler::reset_fts_path(const std::string& path) { | ||
| pthread_mutex_lock(&fts_lock); | ||
|
|
||
| if (fts) { | ||
| delete fts; | ||
| fts = NULL; | ||
| } | ||
|
|
||
| if (!path.empty()) { | ||
| fts = new MySQL_FTS(path); | ||
| if (fts->init()) { | ||
| proxy_error("Failed to initialize FTS with new path: %s\n", path.c_str()); | ||
| delete fts; | ||
| fts = NULL; | ||
| pthread_mutex_unlock(&fts_lock); | ||
| return false; | ||
| } | ||
| } | ||
|
|
||
| pthread_mutex_unlock(&fts_lock); | ||
| return true; | ||
| } |
There was a problem hiding this comment.
There are two functions for updating the FTS path: reset_fts_path and reinit_fts (lines 1142-1178). reset_fts_path is called by the configuration handler but lacks a check for the directory's existence. reinit_fts performs this check but appears to be unused. To improve robustness and eliminate redundant code, the directory existence check from reinit_fts should be integrated into reset_fts_path, and the unused reinit_fts function should be removed.
There was a problem hiding this comment.
Pull request overview
This pull request adds full-text search (FTS) capabilities to the MCP query server using SQLite's FTS5 extension with BM25 ranking. The implementation allows AI agents to quickly search MySQL table data through an external SQLite index without requiring native MySQL FTS support.
Changes:
- Added 6 new FTS tools:
fts_index_table,fts_search,fts_list_indexes,fts_delete_index,fts_reindex, andfts_rebuild_all - Implemented
MySQL_FTSclass for managing SQLite FTS5 indexes with BM25 ranking - Added thread-safe FTS operations with mutex protection and runtime path configuration
- Included comprehensive test scripts and detailed user documentation
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/mcp/test_mcp_fts_detailed.sh | Detailed FTS validation script testing all tools, runtime config changes, and edge cases |
| scripts/mcp/test_mcp_fts.sh | Comprehensive test suite with 21 test cases covering all FTS functionality |
| lib/Query_Tool_Handler.cpp | Registers 6 FTS tools in MCP protocol and routes calls to MySQL_Tool_Handler |
| lib/ProxySQL_MCP_Server.cpp | Passes FTS path from configuration to MySQL_Tool_Handler constructor |
| lib/MySQL_Tool_Handler.cpp | Wraps MySQL_FTS class with thread-safe mutex protection and runtime path reset |
| lib/MySQL_FTS.cpp | Core FTS implementation using SQLite FTS5 with indexing, search, and management |
| lib/Makefile | Adds MySQL_FTS.oo to build targets |
| lib/MCP_Thread.cpp | Handles mcp_fts_path configuration variable with runtime reload support |
| include/MySQL_Tool_Handler.h | Declares FTS integration methods and FTS pointer member |
| include/MySQL_FTS.h | Defines MySQL_FTS class interface for full-text search operations |
| include/MCP_Thread.h | Adds mcp_fts_path to configuration variables structure |
| doc/MCP/FTS_USER_GUIDE.md | Comprehensive 854-line user guide with architecture, API reference, and examples |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| std::string MySQL_FTS::escape_sql(const std::string& str) { | ||
| std::string escaped; | ||
| for (size_t i = 0; i < str.length(); i++) { | ||
| if (str[i] == '\'') { | ||
| escaped += "''"; | ||
| } else { | ||
| escaped += str[i]; | ||
| } | ||
| } | ||
| return escaped; | ||
| } |
There was a problem hiding this comment.
Potential SQL injection vulnerability: The escape_sql function only escapes single quotes, but table/column names are being directly interpolated into SQL statements without proper quoting or validation. While sanitize_name is used for some identifiers, the FTS search query at line 495 uses escape_sql(query) within a MATCH clause which could allow FTS query syntax injection. Consider using parameterized queries where possible, or implementing more robust escaping/validation for user-controlled inputs.
| // Check if directory exists (SQLite can't create directories) | ||
| std::string::size_type last_slash = fts_path.find_last_of("/"); | ||
| if (last_slash != std::string::npos && last_slash > 0) { | ||
| std::string dir = fts_path.substr(0, last_slash); | ||
| struct stat st; | ||
| if (stat(dir.c_str(), &st) != 0 || !S_ISDIR(st.st_mode)) { | ||
| proxy_error("MySQL_Tool_Handler: Directory does not exist for path '%s' (directory: '%s')\n", | ||
| fts_path.c_str(), dir.c_str()); | ||
| return -1; | ||
| } |
There was a problem hiding this comment.
Missing error handling: If stat() fails, the code returns -1, but if the directory exists but is not writable, this check won't catch it. Consider adding a write permission check or attempting to create a test file in the directory to verify write access before attempting to initialize the FTS database.
|
|
||
| if (new_fts->init() != 0) { | ||
| proxy_error("MySQL_Tool_Handler: Failed to initialize FTS at %s\n", fts_path.c_str()); | ||
| delete new_fts; |
There was a problem hiding this comment.
Potential resource leak: If new_fts->init() fails at line 1164, the function deletes new_fts and returns -1 without keeping the old FTS handler. However, there's a comment "Return error WITHOUT closing old FTS" which is correct. But if the old fts is NULL (first initialization), the system will remain in a state where FTS is not initialized. Consider documenting this behavior more clearly or handling the null case explicitly.
| delete new_fts; | |
| delete new_fts; | |
| // If this was the first initialization attempt (no previous FTS handler), | |
| // the system remains without an initialized FTS instance. | |
| if (!fts) { | |
| proxy_error("MySQL_Tool_Handler: No previous FTS handler exists; FTS remains uninitialized after failure\n"); | |
| } |
| json cols_json = json::parse(columns); | ||
| if (!cols_json.is_array()) { | ||
| result["error"] = "columns must be a JSON array"; | ||
| return result.dump(); | ||
| } | ||
| } catch (const json::exception& e) { | ||
| result["error"] = std::string("Invalid JSON in columns: ") + e.what(); | ||
| return result.dump(); | ||
| } | ||
|
|
||
| // Check if index already exists | ||
| if (index_exists(schema, table)) { | ||
| result["error"] = "Index already exists for " + schema + "." + table + ". Use fts_reindex to update."; | ||
| return result.dump(); | ||
| } | ||
|
|
||
| // Create index tables | ||
| if (create_index_tables(schema, table) != 0) { | ||
| result["error"] = "Failed to create index tables"; | ||
| return result.dump(); | ||
| } | ||
|
|
||
| // Parse columns and build query (ensure primary key is selected) | ||
| std::vector<std::string> indexed_cols; | ||
| std::vector<std::string> selected_cols; | ||
| std::unordered_set<std::string> seen; | ||
|
|
||
| try { | ||
| json cols_json = json::parse(columns); | ||
| if (!cols_json.is_array()) { | ||
| result["error"] = "columns must be a JSON array"; | ||
| return result.dump(); | ||
| } | ||
| for (const auto& col : cols_json) { | ||
| std::string col_name = col.get<std::string>(); | ||
| std::string col_lower = col_name; | ||
| std::transform(col_lower.begin(), col_lower.end(), col_lower.begin(), ::tolower); | ||
| indexed_cols.push_back(col_lower); | ||
| if (seen.insert(col_lower).second) { | ||
| selected_cols.push_back(col_name); | ||
| } | ||
| } | ||
| } catch (const json::exception& e) { | ||
| result["error"] = std::string("Failed to parse columns: ") + e.what(); | ||
| return result.dump(); | ||
| } |
There was a problem hiding this comment.
The columns parameter is parsed twice in the index_table method: once at lines 251-259 for validation, and again at lines 278-296 for actual processing. This is inefficient and could lead to inconsistencies if the JSON parsing behaves differently between the two calls. Consider parsing once and reusing the parsed result.
| test_fts_search_special_chars() { | ||
| # Create a document with special characters | ||
| mysql_exec "INSERT INTO ${TEST_SCHEMA}.${TEST_TABLE} (title, content, category, priority) \ | ||
| VALUES ('Special Chars', 'Test with @ # $ % ^ & * ( ) - _ = + [ ] { } | \\ : ; \" \" < > ? / ~', 'test', 'normal');" 2>/dev/null || true |
There was a problem hiding this comment.
In the test at line 921, special characters including backslashes are inserted into the database. The backslash should be escaped properly in the bash script string. The current approach might not correctly insert all special characters as intended. Consider using a here-document or different quoting mechanism for the SQL statement.
| VALUES ('Special Chars', 'Test with @ # $ % ^ & * ( ) - _ = + [ ] { } | \\ : ; \" \" < > ? / ~', 'test', 'normal');" 2>/dev/null || true | |
| VALUES ('Special Chars', 'Test with @ # $ % ^ & * ( ) - _ = + [ ] { } | \\\\ : ; \" \" < > ? / ~', 'test', 'normal');" 2>/dev/null || true |
| bool MySQL_Tool_Handler::reset_fts_path(const std::string& path) { | ||
| pthread_mutex_lock(&fts_lock); | ||
|
|
||
| if (fts) { | ||
| delete fts; | ||
| fts = NULL; | ||
| } | ||
|
|
||
| if (!path.empty()) { | ||
| fts = new MySQL_FTS(path); | ||
| if (fts->init()) { | ||
| proxy_error("Failed to initialize FTS with new path: %s\n", path.c_str()); | ||
| delete fts; | ||
| fts = NULL; | ||
| pthread_mutex_unlock(&fts_lock); | ||
| return false; | ||
| } | ||
| } | ||
|
|
||
| pthread_mutex_unlock(&fts_lock); | ||
| return true; | ||
| } |
There was a problem hiding this comment.
The reset_fts_path method acquires fts_lock, but reinit_fts does not. If both methods can be called from different contexts, there's a potential for inconsistent locking patterns. Consider either making reinit_fts acquire the lock, or documenting that it must only be called when the caller already holds the lock, or consolidating the logic to avoid duplication.
| json all_results = json::array(); | ||
| int total_matches = 0; | ||
|
|
||
| for (std::vector<SQLite3_row*>::iterator it = indexes_result->rows.begin(); | ||
| it != indexes_result->rows.end(); ++it) { | ||
| SQLite3_row* row = *it; | ||
| const char* idx_schema = row->fields[0]; | ||
| const char* idx_table = row->fields[1]; | ||
|
|
||
| if (!idx_schema || !idx_table) continue; | ||
|
|
||
| std::string data_table = get_data_table_name(idx_schema, idx_table); | ||
| std::string fts_table = get_fts_table_name(idx_schema, idx_table); | ||
| std::string sanitized_data = data_table; | ||
|
|
||
| // Search query for this index (use table name for MATCH/bm25) | ||
| std::ostringstream search_sql; | ||
| search_sql << "SELECT d.schema_name, d.table_name, d.primary_key_value, " | ||
| << "snippet(" << fts_table << ", 0, '<mark>', '</mark>', '...', 30) AS snippet, " | ||
| << "d.metadata " | ||
| << "FROM " << fts_table << " " | ||
| << "JOIN " << sanitized_data << " d ON " << fts_table << ".rowid = d.rowid " | ||
| << "WHERE " << fts_table << " MATCH '" << escape_sql(query) << "' " | ||
| << "ORDER BY bm25(" << fts_table << ") ASC " | ||
| << "LIMIT " << limit; | ||
|
|
||
| SQLite3_result* idx_resultset = NULL; | ||
| error = NULL; | ||
| cols = 0; | ||
| affected = 0; | ||
|
|
||
| idx_resultset = db->execute_statement(search_sql.str().c_str(), &error, &cols, &affected); | ||
|
|
||
| if (error) { | ||
| proxy_error("FTS search error on %s.%s: %s\n", idx_schema, idx_table, error); | ||
| (*proxy_sqlite3_free)(error); | ||
| } | ||
|
|
||
| if (idx_resultset) { | ||
| for (std::vector<SQLite3_row*>::iterator row_it = idx_resultset->rows.begin(); | ||
| row_it != idx_resultset->rows.end(); ++row_it) { | ||
| SQLite3_row* res_row = *row_it; | ||
|
|
||
| json match; | ||
| match["schema"] = res_row->fields[0] ? res_row->fields[0] : ""; | ||
| match["table"] = res_row->fields[1] ? res_row->fields[1] : ""; | ||
| match["primary_key_value"] = res_row->fields[2] ? res_row->fields[2] : ""; | ||
|
|
||
| match["snippet"] = res_row->fields[3] ? res_row->fields[3] : ""; | ||
|
|
||
| // Parse metadata JSON | ||
| try { | ||
| if (res_row->fields[4]) { | ||
| match["metadata"] = json::parse(res_row->fields[4]); | ||
| } else { | ||
| match["metadata"] = json::object(); | ||
| } | ||
| } catch (const json::exception& e) { | ||
| match["metadata"] = res_row->fields[4] ? res_row->fields[4] : ""; | ||
| } | ||
|
|
||
| all_results.push_back(match); | ||
| total_matches++; | ||
| } | ||
| delete idx_resultset; | ||
| } | ||
| } | ||
|
|
||
| delete indexes_result; | ||
| db->rdunlock(); | ||
|
|
||
| // Apply pagination to collected results | ||
| int total_size = (int)all_results.size(); | ||
| int start_idx = offset; | ||
| if (start_idx >= total_size) start_idx = total_size; | ||
| int end_idx = start_idx + limit; | ||
| if (end_idx > total_size) end_idx = total_size; | ||
|
|
||
| json paginated_results = json::array(); | ||
| for (int i = start_idx; i < end_idx; i++) { | ||
| paginated_results.push_back(all_results[i]); | ||
| } |
There was a problem hiding this comment.
The search method performs iteration over potentially large result sets without limiting memory usage. If a search matches millions of rows across multiple indexes, the all_results array at line 473 could grow very large before pagination is applied at line 544. Consider applying the limit during collection rather than after, or implementing streaming/chunked result processing.
| std::string MySQL_FTS::search( | ||
| const std::string& query, | ||
| const std::string& schema, | ||
| const std::string& table, | ||
| int limit, | ||
| int offset | ||
| ) { |
There was a problem hiding this comment.
Missing validation: The limit parameter in the search methods could potentially be negative or zero, which might cause unexpected behavior. Consider adding validation to ensure limit is positive (e.g., if (limit <= 0) limit = 100;) and similarly for offset to ensure it's non-negative.
| mysql_exec() { | ||
| local sql="$1" | ||
| mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" \ | ||
| -e "${sql}" 2>/dev/null |
There was a problem hiding this comment.
The test script uses set -e which causes the script to exit on any error. However, the mysql_exec function at line 279 redirects stderr to /dev/null, which might hide important error messages. Consider allowing errors to be visible during test execution, or at least logging them for debugging purposes.
| -e "${sql}" 2>/dev/null | |
| -e "${sql}" |
| char path_buf[db_path.size() + 1]; | ||
| strcpy(path_buf, db_path.c_str()); |
There was a problem hiding this comment.
The use of variable-length array (VLA) char path_buf[db_path.size() + 1] is not standard in C++ and is a C99 feature that may not be supported by all compilers. This can lead to portability issues. Consider using std::string directly with the SQLite3DB API, or dynamically allocate the buffer with new[] and delete[], or use std::vector<char>.
…leanup
Security fixes:
- Add escape_identifier() helper for proper SQLite identifier escaping
- Replace sanitize_name() with allowlist validation (ASCII letters, digits, underscore only)
- Fix MATCH clause FTS5 operator injection by wrapping query in double quotes
- Apply escape_identifier() to all DDL statements (CREATE, DROP, triggers)
Memory safety fixes:
- Replace VLA with std::vector in MySQL_FTS::init(), add delete on error path
- Fix memory leak: free error string before return in list_indexes()
- Fix reindex_json["error"] potential exception using .value() with default
Thread safety fixes:
- reinit_fts(): Add mutex lock around pointer swap
- reset_fts_path(): Move blocking init() outside lock, only swap pointer under lock
Code cleanup:
- Remove 7 debug fprintf statements from Query_Tool_Handler.cpp
- Remove unused #include <memory> from MySQL_FTS.h
Test script security fixes:
- Use MYSQL_PWD environment variable instead of -p"..." for password
- Add escape_sql() function and apply to INSERT statement
- Fix CURL_OPTS quoting: ${CURL_OPTS:+"${CURL_OPTS}"}
- Remove unused FTS_INDEX_NAME and SEARCH_QUERIES variables
Documentation fixes:
- Fix bare URL to markdown link format
- Add code block language identifiers (text, bash)
Fix PR #21 Review: Security, Memory Safety, Thread Safety + Additional Fixes
Summary
New Tools Added
fts_index_tablefts_searchfts_reindexfts_delete_indexfts_list_indexesfts_rebuild_allKey Features
Technical Details
Test Plan
scripts/mcp/test_mcp_fts.sh- Basic FTS functionalityscripts/mcp/test_mcp_fts_detailed.sh- Comprehensive FTS testing